Forum "Spring" Cleaning - Operation Broomstick

Now we’ve been using Discourse for over eight months, and the quality of life changes since moving have started to really shine through, one older problem we do have is the text format not correctly showing images or mentions to other usernames. I tried looking into bulk updating and it’s not going to work, so cleaning it up means getting hands dirty. Well keyboards.

Going through the posts means people may start to see edits in their notifications, older topics being bumped and maybe a few other random bits but otherwise won’t cause a lot of disruption. So please don’t be alarmed if you get a larger than normal amount of notifications that you wouldn’t usually see.

One change made is now everyone can edit any post they’ve made on the forum, going back to 2020. So if you have time to edit any mentions, you can easily on your own posts by going into the post, using the edit function and changing this:

@“username#p7404

to

@username

and save it, nothing else needed! The rest of the text is fine as is and this will then reinstate the mention.

If anyone wants to help out more, let me know and I can send over a list of posts that need editing and set up temporary trust levels to complete them. Once this part is done we’ll look at image cleanup too.

:broom: :broom: :broom:

22 Likes

Just to let everyone know - the changes above may remove quotes or mentions of other posts in the topic that were linked in Flarum. It’s hard to match them up as the topic and post IDs in Flarum are different in Discourse, so that’s a huge clean up operation which would take too long to fix fully.

So if crucial quotes and mentions are removed from posts, I apologise in advance.

Depending on how technical your team is you could attempt something like this to save manual effort?

Also a good April Fools trick to have in your back pocket lol

1 Like

Thanks for the advice - I had already looked into writing something but the challenge is finding the specific text needing to be edited as it’s often in the middle of other text which needs to be kept in place. I can write the update, but it’s not easy to deploy as a find with position can vary when looking at the raw text.

Sometimes it’s just easier to roll up your sleeves and do it the hard way!

1 Like

No worries, I totally understand

1 Like

Any pointers or help with this appreciated, I have looked and tried things but there’s always someone out there who may see something I don’t.

Of course I am still thinking about April 1st, 2025 too…

Have you looked into regex for this? I wrote this quick regex:

/(@.*)(#[0-9]*)/gm

Which would match any group of characters preceded by an @ followed by a # and some numbers. So if you can find and replace capture groups, like find /(@.*)(#[0-9]*)/gm and replace with $1 (the first capture group, i.e. just the username) seems like that might work.

May be oversimplifying this though as I have no familiarity with discourse!

Edit, alternatively this one:
/@\“(.*)\”#p[0-9]*/gm

Would match the username in quote marks, followed by the p and the number. You could replace this with @$1 to just get the username.

1 Like

The database itself is accessed as Postgres, so regex itself can be substituted for the SIMILAR operator. The hardest part is the amount of characters varies after the @ symbol due to length of user names and raw text.

Here’s an example from a post of mine I found that can be edited. This is the raw post that came over from Flarum:

<r><p><POSTMENTION discussionid="3048" displayname="Mnemogenic" id="159929" number="983">@"Mnemogenic"#p159929</POSTMENTION> It is, I had it when I was younger and it was one of my favourite games on the machine! Didn<FP char="’">'</FP>t realise it was on the GBC so I will check that <DEL><s></s>put<e></e></DEL> out very soon!</p> </r>

So to get it to work again I need to remove @“Mnemogenic”#p159929 and change it to @ Mnemogenic without deleting any other text either side of the >@ or </P… to preserve the original mention and look into updating that at a later date. So finding it, writing something to create a new string to hold it, reinsert it back into the original string in the correct position, updating each post like it and then uploading that back into the database is a tough one. That example above was a small post, as there’s also many that have more than one mention so the regex/SIMILAR would get confused and possibly replace the wrong text.

Edit - the postmention text is related to Flarum only and was an extension used on the old forum, meaning the text itself cannot be easily replicated here. I am however keeping it in the posts (not visible of course, this is the raw text) in case there’s someone who can figure out how to make the link!

I’m not familiar with PostgreSQL but it does support regex, which would then allow you to better account for variations in length. It seems like this would be the best method as the start and end of the target is quite specific. I don’t think you’d get any over reaching

Look for regexp_replace

1 Like

I’d read the same article and the specific plugin on Discourse Data Explorer, which I am using to access the posts at this point doesn’t support that function. However I will have a play outside of that to see I can write a script that uses it to find and update accordingly. If it works we can look at running it on a server level. Thanks for that.

In the meantime, for any Tampermonkey users I threw together this script, the idea is that if you go to edit a post it adds this “Fix Flarum artifacts!” button that runs a few regex on the contents of the post you’re editing. So in theory you should be able to just click edit and then press the button. You can then check the contents of the updated post before saving. If anyone gives this a try let me know how you get on!

Before:

After:

8 Likes

I’m happy to help do this! I’m a particular sort of weirdo whose brain is helped to do creative things by doing repetitive tasks. Feel free to send me a list!

5 Likes

This is great, thank you! I don’t use Tampermonkey but should check it out.

Does it find all instances of it or is it based on first find only? And could you share it so I could have a quick read please?

Thanks, let me just reach out and I’ll sort out how to give you edit access and provide some posts to clean

:metal:t2:

1 Like

Sure, the code itself is visible in the link if you click on the code tab. It’s just a few lines of JS (that I have barely refactored - don’t judge me!).

The actual replace commands are here, it just runs them in sequence:

           text.replace(/@\“(.*)\”#p[0-9]*/, "@$1")
            .replace(/@\"(.*)\"#p[0-9]*/, "@$1")
            .replace(/@\"(.*)\"#[0-9]*(<\/USERMENTION>)/, "@$1$2")
            .replace(/(@.*)(#[0-9]*)/, "$1");

They are matching the following formats, which are different versions of this issue that I’ve seen from looking through my old posts:
@“Kez”#p0000
@"Kez"#p0000
<USERMENTION...>@"Kez"#0000</USERMENTION>
@Kez#0000

I have put in both regular " speech marks and the proper “ ” quotation marks that discourse converts them to, but actually I think just the regular ones are required. I initially thought that’s how it looked as discourse formatted your original post that way.

It scans the text for any strings matching the regex and replaces them with just the @username form (it doesn’t remove any HTML tags, those are just there to make the regex match more specific). Not just the first match.

Then it forces a “change” event so that Discourse will update the preview and link up any matching usernames as if you typed it in. One issue is that some users have changed their names over the years, so a simple find/replace won’t fix them. For example in my preview image DavidNoo is now DaveedNoo so discourse doesn’t pick it up.

In theory I could just add a list of all historical aliases for known changelings to the script and replace them with their current username.

I also don’t know if discourse is generating these links dynamically at runtime or is actually storing the links in the database upon save. So if I tag @Tom for example, it generates a link to your user. How does that show up in the discourse DB? If it’s more than just simply @Tom then running a batch job on the database directly would work, but if discourse is actually parsing the input and converting it to something else or generating relationships - any batch job would also need to replicate that work when updating the DB.

Anyway, the script sidesteps that as you are still submitting the edited post in the normal way. Still it only works on one post at a time! But it could be made quite robust and streamline the workflow considerably if people are interested in that. Fixing other formatting weirdness at the same time for example.

2 Likes

I like it, a lot. Thank you! There are very small changes to the text when you find it, for example I have been searching with PostGres using the where clause of - ‘%@“%’ - which so far has found what I believe to be all of the instances of this appearing.

There are a few differences, if you could include one addition it would be:

.replace(/@"(.)"#[0-9](</POSTMENTION>)/, “@$1$2”)

which is another type, probably the largest I’ve found, where in Flarum it was used as the old plug in to quote text from another post and seems to have caused the biggest problems.

Those posts really can’t be fixed automated, as the ID it came from doesn’t include which piece of text, and more importantly as it’s using old Flarum IDs it doesn’t align to the new one, so it’s really a case of doing it the very hard way. But that’s why I wanted to keep as much original text, so maybe together we can put the collective IC heads together and work out a way to fix it. It does seem to fix in real time, but I need to confirm exactly what part of the text is being updated (posts have two different kinds of text stored in the database and now I’ve done some edits, I can check tomorrow to confirm what edit took place). The link to another user is actually part of the first text style, which moves into the second and is part of Discourse, and not a plugin like Flarum.

It’s a great little script, thank you again for doing it!

1 Like

Okay, I’ll continue to tweak it a bit. I’ve already noticed some issues. :p It’s very much an untested alpha build lol.

1 Like

I’ll take some time and get the Tampermonkey plugin for some fun with it too. Everything helps with this, so let me know what I can do to help.

1 Like

Looking through my post history, there are lots of posts with a functioning @ mention followed by a # and numbers (like here for example). I think they might not be picked up by your search here as you’ve included speech marks.

Sure, just try it out and let me know what’s broken! :p

1 Like

Those ones are, because I am not looking at the text displayed but the actual string in the database which is converted to be visible. What’s happened with that example is someone has edited it already to show the name, but left the number which in Flarum was linked to the original post that was being mentioned.

It’s all fun and games this. Kinda haha

1 Like