The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: DannyS712 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 07:55, Tuesday, March 5, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Solve CW Error #17 - Category duplication

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/PkbwcgsBot

Edit period(s): One time run

Estimated number of pages affected: ~8000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Currently, PkbwcgsBot only fixes a maximum of 300 instances of this error per week. While this certainly helps with the backlog, I'd like to do a one-time run to clean it out. Using AWB, I would do find-and-replace on the regex (\[\[Category:.*\]\])((.|\n)*)\1\n, replacing it with $1$2. I did a few of these manually to perfect the regex (e.g. [1], [2], [3]). While genfixes would also fix this issue, they would not be enabled, so no other edits would be made.
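To illustrate what the find-and-replace above does, here is a minimal sketch in Python. It is not the bot's code (the task runs in AWB, which uses .NET regular expressions); the function name `remove_exact_duplicates` and the sample wikitext are made up for illustration. The pattern is copied unchanged, and AWB's `$1$2` replacement is written `\1\2` in Python.

```python
import re

# The pattern quoted in the function details, unchanged.
EXACT_DUPLICATE = re.compile(r"(\[\[Category:.*\]\])((.|\n)*)\1\n")


def remove_exact_duplicates(wikitext: str) -> str:
    """Drop a later category line that is identical to an earlier one.

    Hypothetical helper for illustration only; AWB's "$1$2" replacement
    corresponds to r"\1\2" in Python's re module.
    """
    return EXACT_DUPLICATE.sub(r"\1\2", wikitext)


# Made-up sample: Category:A is listed twice with identical text.
sample = "Text\n[[Category:A]]\n[[Category:B]]\n[[Category:A]]\n"
print(remove_exact_duplicates(sample))
```

Because the backreference `\1` must repeat the whole first link, including any sort key, two links to the same category with different sort keys do not match and the text is left unchanged.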

Discussion


Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 19:42, 10 March 2019 (UTC)

@Primefac: Should I have AWB autosave, or hit save manually? --DannyS712 (talk) 19:55, 10 March 2019 (UTC)
Does it matter? The results are the same. Primefac (talk) 19:55, 10 March 2019 (UTC)
Trial complete. 50 edits made: [4] (search for "Category duplication"). Two issues: (1) marking the pages as fixed within the wikiproject (I posted on the discussion page to figure out whether the list is regenerated, or how to mark the pages automatically from AWB); and (2) the regex doesn't work if the categories have different sort keys. So, the current regex would work on ~2300 pages. Once I finish those, I can look into a different regex that removes the second instance of a category, even if it has a different sortkey, but that is a separate issue, and should probably be a separate task --DannyS712 (talk) 21:30, 10 March 2019 (UTC)
Update - marking as done is taken care of: the list is automatically updated at the end of the day, so issue 1 is no longer a concern. Issue 2 doesn't prevent the bot from running, but rather just limits the scope, so as far as I can tell I should be able to run the bot overnight (once it's approved). Forgot to @Primefac last time. Thanks, --DannyS712 (talk) 22:33, 10 March 2019 (UTC)
So does that mean the bot skips pages that have duplicate cats but different sortkeys? Primefac (talk) 22:35, 10 March 2019 (UTC)
@Primefac: yes, as it currently stands, I have the bot skip pages where no changes are made. The only change that is made is based on the find-and-replace regex, which relies on either identical sortkeys or having no sortkeys at all. --DannyS712 (talk) 23:15, 10 March 2019 (UTC)

(\[\[Category:[^|\]]*)((?:\|[^\]]*)?\]\])((?:.|\n)*)\n\1(?:\|[^\]]*)?\]\]\n?

Which is replaced with $1$2$3. This is, as you said, more reader-useful. What do you think of an extended trial? --DannyS712 (talk) 03:16, 14 March 2019 (UTC)
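As a rough illustration of the revised pattern, the following Python sketch applies it to made-up wikitext. It is not the bot's code (AWB uses .NET regular expressions); the function name `remove_second_instance` and the samples are assumptions for illustration. The pattern is copied unchanged, and `$1$2$3` is written `\1\2\3` in Python.

```python
import re

# The revised pattern quoted above, unchanged. Group 1 is the link up to the
# category name, group 2 is the optional sort key plus the closing brackets,
# and group 3 is everything between the first and the later instance.
SAME_CATEGORY = re.compile(
    r"(\[\[Category:[^|\]]*)((?:\|[^\]]*)?\]\])((?:.|\n)*)"
    r"\n\1(?:\|[^\]]*)?\]\]\n?"
)


def remove_second_instance(wikitext: str) -> str:
    """Keep the first link to a category and drop a later one.

    Hypothetical helper for illustration only. The later link is removed
    even when its sort key differs from the first link's sort key.
    """
    return SAME_CATEGORY.sub(r"\1\2\3", wikitext)


# Made-up sample: Category:A appears twice with clashing sort keys.
sample = "[[Category:A|x]]\n[[Category:B]]\n[[Category:A|y]]\n"
print(remove_second_instance(sample))
```

In this sketch the first link and its sort key survive, and the later link is dropped together with its own sort key. The pattern also consumes the newline after the removed link.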

As a comment, I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then. And the reason is that I felt this is a future-proofing situation, because someone who wants to update a sort key might only do it in one place, and it won't kick in because there's a dual listing of the category. Or they might remove the category in one place, thinking they removed the category from the article, unaware there's a duplication of it. This wasn't RFC'd or BRFA'd before however. Headbomb {t · c · p · b} 23:46, 20 March 2019 (UTC)
Also is there a particular reason why genfixes are disabled for this? They'd seem worth making on top of the main task, IMO. Headbomb {t · c · p · b} 23:49, 20 March 2019 (UTC)
@Headbomb: I'd prefer not to automatically run genfixes, but if you'd like them enabled I can supervise an extended trial --DannyS712 (talk) 00:11, 21 March 2019 (UTC)
In my experience genfixes have been pretty stable and well tested for a while now. But it's your bot, so it's your call ultimately about whether or not you want to enable them. It just seems to me that if you're going to make some genfix-like edits (duplicate category removal is covered by them after all), you might as well enable the full suite of genfixes. Headbomb {t · c · p · b} 00:15, 21 March 2019 (UTC)
@Headbomb: in that case, sure. Would you be willing to approve an extended trial with both regexes (to also fix category duplication) and also genfixes? --DannyS712 (talk) 00:17, 21 March 2019 (UTC)
Approved for extended trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I'll approve for further trial, but since I'm the one that added "Technically cosmetic, however this is either deemed too much of a bad practice, or prevents future issues deemed egregious enough to warrant a deviation from WP:COSMETICBOT." back then, I'll recuse myself from final approval. Headbomb {t · c · p · b} 00:21, 21 March 2019 (UTC)
@Headbomb: Trial complete. 50 edits, see [5] - for the first 10, I forgot to enable genfixes. I didn't see any errors, except for one where there were multiple repeated categories, which I fixed. Thanks, --DannyS712 (talk) 00:56, 21 March 2019 (UTC)

How does the bot handle cases like this [6]? Should it? Headbomb {t · c · p · b} 01:08, 21 March 2019 (UTC)

@Headbomb: I don't really understand the first question - that case is a bot edit, and I think it should handle it exactly as it did. --DannyS712 (talk) 01:20, 21 March 2019 (UTC)
There are two clashing sortkeys. How does the bot decide which to remove? Headbomb {t · c · p · b} 01:25, 21 March 2019 (UTC)
@Headbomb: it always removes the second instance of a category. If one or both have sortkeys, it still just removes the second instance of the category, and keeps the first, regardless of whether the second had a sort key and the first didn't, etc --DannyS712 (talk) 01:28, 21 March 2019 (UTC)
@Xaosflux: does running it with genfixes and fixing sortkeys too allay your concern about cosmetic-bot? If so, would you be willing to approve this? DannyS712 (talk) 02:39, 22 March 2019 (UTC)
"Adding genfixes" is never a selling point for me, maybe one of the other BAGers. — xaosflux Talk 02:44, 22 March 2019 (UTC)
Well, the genfixes are not guaranteed to be made, so it's rather moot. The real thing to look at is whether future-proofing is enough of a reason to make the edits. Headbomb {t · c · p · b} 02:28, 23 March 2019 (UTC)
@Headbomb: I believe that it is, since it also enables users to set sortkeys that actually work (in addition to fixing the error itself, and any genfixes) --DannyS712 (talk) 03:19, 26 March 2019 (UTC)
((BAGAssistanceNeeded)) the trial has been over for almost a week --DannyS712 (talk) 06:26, 28 March 2019 (UTC)
I'm actually not thrilled with the idea of removing same-category-different-sortkeys, since in my experience people (incorrectly) try to add new ones when they want to correct the sortkey. I'm actually a little surprised no one commented on this. Are you just assuming that the second one is wrong? Primefac (talk) 14:03, 7 April 2019 (UTC)
@Primefac: yes. However, I could try to change it so that it removes the first category rather than the second, if that would be better. --DannyS712 (talk) 17:22, 7 April 2019 (UTC)
Falls afoul of CONTEXT. I'd rather just see them left as-is for someone to adjust manually. Primefac (talk) 17:22, 7 April 2019 (UTC)
@Primefac: then I can go back to the original regex that skipped sort key collisions - I only changed it because of Xaosflux's suggestion above. --DannyS712 (talk) 17:25, 7 April 2019 (UTC)
That's probably for the best. Collisions could be logged somewhere else though, and this task could focus on preventing collisions, rather than fixing them. Headbomb {t · c · p · b} 16:51, 10 April 2019 (UTC)
@Headbomb: Once I make the run, all remaining WCW errors would be due to collisions, so I don't really see the need to log --DannyS712 (talk) 17:06, 10 April 2019 (UTC)
((BAGAssistanceNeeded)) I can do this with skipping pages with different sortkeys, or with removing the second instance of a category regardless of different sortkeys, or remove the first, but I'd like to do it. Can one of the options please be approved? Thanks, --DannyS712 (talk) 21:15, 14 April 2019 (UTC)

Approved. The option approved is skipping pages with different sortkeys. I would personally like to see a log that could be addressed at a later date (perhaps with a new BRFA), but if you have a different way that still catches them all and could be listed to look through in the future, then I guess I would be okay with that. Sorry for the delay, I have been tied up lately with off-wiki matters.

If you would like to request any amendments or clarifications, post on the talk page and ping and we can go from there. --TheSandDoctor Talk 21:35, 26 April 2019 (UTC)
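For the log of sort key collisions suggested above, one possible approach is sketched below in Python. Nothing like this was part of the approved task; the function name `find_sortkey_collisions`, the pattern, and the sample text are all assumptions for illustration. The sketch also simplifies category matching: it treats names as case-sensitive and ignores MediaWiki's name normalization.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical pattern: captures the category name and, if present, the sort key.
CATEGORY_LINK = re.compile(r"\[\[Category:([^|\]]*)(?:\|([^\]]*))?\]\]")


def find_sortkey_collisions(wikitext: str) -> Dict[str, List[Optional[str]]]:
    """Return the categories that are linked more than once with differing sort keys.

    Maps each such category name to its sort keys in page order; None marks
    a link without a sort key. Exact duplicates are not reported.
    """
    keys_by_category: Dict[str, List[Optional[str]]] = defaultdict(list)
    for match in CATEGORY_LINK.finditer(wikitext):
        keys_by_category[match.group(1).strip()].append(match.group(2))
    return {
        name: keys
        for name, keys in keys_by_category.items()
        if len(set(keys)) > 1
    }


# Made-up sample: Category:A collides, Category:B is an exact duplicate.
sample = "[[Category:A|x]]\n[[Category:B]]\n[[Category:A|y]]\n[[Category:B]]\n"
print(find_sortkey_collisions(sample))
```

Pages for which this returns a non-empty result are the ones the approved option skips, so the output could be listed for manual review.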

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.