The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: ~ AmeIiorate U T C @

Automatic or Manually Assisted: Automatic

Programming Language(s): AWB (possibly assisted by a C# app)

Function Summary: Escaping categories in user space pages

Edit period(s) (e.g. Continuous, daily, one time run): Sporadically; most likely once or twice a week.

Already has a bot flag (Y/N): N

Function Details: AilurophobiaBot escapes article categories in userspace pages. It does this by replacing [[Category: with [[:Category:, and the list of pages is then filtered through AWB, although I am toying with the idea of using an app to automate this (unless someone can think of a better way to generate the list?)

It will not escape any category starting with "Wikipedia" (which includes "Wikipedian/s") or "User".

AilurophobiaBot escapes article categories that do not belong in userspace pages. It does this by replacing [[Category: with [[:Category:

The regex includes filtering to prevent it from escaping legitimate userspace categories: it ignores any category containing Wiki, Bot, User, task force (inc. taskforce), Possible, Candidates, proofreaders, translators, workgroup, admin or proxies (matching is case-insensitive). It also ignores three specific categories: Category:Vandalism Control Network members, Category:Non-talk pages that are automatically signed and Category:The IC Star Recipients.
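A minimal sketch of the escaping rule described above, written in Python rather than as the AWB find-and-replace the bot actually uses. The names `escape_categories`, `IGNORE_SUBSTRINGS`, `IGNORE_EXACT` and `CATEGORY_LINK` are hypothetical, and the exact regex the bot runs is not given in this request, so this only illustrates the stated behaviour.

```python
import re

# Substrings that mark a category as acceptable in userspace (from the list above).
# "task.?force" covers both "task force" and "taskforce".
IGNORE_SUBSTRINGS = re.compile(
    r"wiki|bot|user|task.?force|possible|candidates|proofreaders|"
    r"translators|workgroup|admin|proxies",
    re.IGNORECASE,
)

# Specific categories that are always left alone (compared in lower case).
IGNORE_EXACT = {
    "vandalism control network members",
    "non-talk pages that are automatically signed",
    "the ic star recipients",
}

# Opening of an unescaped category link; captures the category name.
# Links that are already escaped ("[[:Category:") do not match.
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)


def escape_categories(wikitext: str) -> str:
    """Turn [[Category:X]] into [[:Category:X]] unless X is on the ignore list."""

    def repl(match: re.Match) -> str:
        name = match.group(1).strip()
        if IGNORE_SUBSTRINGS.search(name) or name.lower() in IGNORE_EXACT:
            return match.group(0)  # leave legitimate userspace categories untouched
        return "[[:Category:" + match.group(1)

    return CATEGORY_LINK.sub(repl, wikitext)
```

Because the filter is substring-based, it errs on the side of skipping: an article category whose name happens to contain "bot" or "user" would be left alone rather than escaped.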

The lists of pages to edit are generated using <categorytree namespaces=User mode=pages>; an example of this is User:AmeIiorate/Badcats/Living People (a list of all userspace pages in the Living People category) api.php, example. The categories I will focus on are those that appear with a lot of pages on User:Ilmari Karonen's Badcats table. Basically, I will create a handful of categorytree pages (some can include multiple listings, such as all the "XXXX births" on one page) and check the pages periodically; when a category returns a sizeable backlog, I will have the bot clear it up. I wrote a C# app to alert me about new pages in Category:Articles with invalid date parameter in template, so I will change it to also alert when userspace pages are added to article categories (such as Living people).
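As an illustration of how such a list could be fetched through api.php, here is a sketch that builds a `list=categorymembers` query restricted to the User namespace. The request does not say which API query is used, so the choice of module is an assumption, and the helper names `userspace_members_url` and `titles_from_response` are hypothetical. Continuation handling (`cmcontinue`) is left out for brevity.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
USER_NAMESPACE = 2  # MediaWiki namespace number for "User:" pages


def userspace_members_url(category: str, limit: int = 500) -> str:
    """Build an api.php URL listing User-namespace pages in the given category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": USER_NAMESPACE,
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(data: dict) -> list[str]:
    """Extract page titles from a decoded categorymembers JSON response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]
```

For example, `userspace_members_url("Living people")` gives the URL to request for that category; the decoded JSON body can then be passed to `titles_from_response`.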

Discussion

You are right about that, so a change of scope: how about if it included categories rather than excluding them? So instead of escaping everything except X, it ignores everything except categories that are definitely article-only. It will just take a bit of time to set up the right categories. ~ AmeIiorate U T C @ 19:43, 18 August 2008 (UTC)
Sounds fine to me. In theory all categories appropriate for userspace should be tagged with ((wikipedia category)) and friends, but in practice we've still got some way to go there. Just remember to be careful not to make any mistakes, use friendly and informative edit summaries (something simple like "adding colons to category links" would be a good start), make it easy for users to find your talk page and try to stay calm when, inevitably, someone doesn't understand what your bot did to their user page and comes angrily complaining to you. In my experience (see e.g. here and here) most people won't mind helpful edits to their user page at all, but if you edit enough of them, eventually someone will complain. —Ilmari Karonen (talk) 21:58, 18 August 2008 (UTC)
The edit summary I had in mind was: escaping category link to prevent this page appearing in main article categories more info ... - question? (if this is approved the bot's userpage will be redone to explain in-depth what it does). I realise people are often quite possessive of their userspace but I am confident I can deal with any hostilities that arise. ~ AmeIiorate U T C @ 22:52, 18 August 2008 (UTC)
Scratch that, using defined categories is not feasible. Therefore, I propose to use Special:Recentchangeslinked to produce lists, for example Changes related to "Category:Living people" (filtered to userspace). I will then readopt my original plan to replace [[Category: with [[:Category:. At the moment the list of pages to edit will be generated manually through Special:Export, and [[Category: will be replaced with [[:Category: along with some filtering to ignore certain categories. This is 'safer' because, if a userspace page has been caught by the RecentChangesLinked result of an article-only category, it should be safe to bulk remove the categories (ignoring the incontestably userspace-only cats). ~ AmeIiorate U T C @ 11:47, 20 August 2008 (UTC)

You say "along with some filtering to ignore certain categories"... How will this filtering work? Details needed. – Quadell (talk) 13:53, 20 August 2008 (UTC)
Similar to before, only with a bigger scope. If the category name contains Wiki, Bot, User, task.?force, Possible or Candidates, it won't be removed - which should cover all categories that are userspace acceptable. ~ AmeIiorate U T C @ 09:53, 21 August 2008 (UTC)

Hmmm... is there a reason why a workflow like this wouldn't work?

  1. Obtain a list of all categories used on user pages, either from the toolserver or from the categorylinks database dump.
  2. Apply the exclusion rules you suggested above. This ought to narrow down the list quite a lot.
  3. Post the remaining list of categories on-wiki for manual review.
  4. Once the list has been reviewed, set the bot to work on it.
  5. Remember the results of the review for later runs, so that only categories that haven't been checked before need to be reviewed each time.

I could help you with step 1: I have both a toolserver account and some scripts for grepping database dumps. You can probably find people to help you with step 3 if there are too many categories for you to do it alone. —Ilmari Karonen (talk) 10:43, 21 August 2008 (UTC)
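Steps 2 to 5 of the workflow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the review results are kept in a local JSON file, and the names `load_reviewed`, `save_reviewed` and `plan_run` are hypothetical, not part of any existing tool. Step 1 (obtaining the category list) and the on-wiki posting itself are not shown.

```python
import json
from pathlib import Path
from typing import Callable


def load_reviewed(path: Path) -> dict[str, bool]:
    """Load earlier review decisions: category name -> True if article-only."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_reviewed(path: Path, reviewed: dict[str, bool]) -> None:
    """Persist review decisions so later runs can skip categories already checked."""
    path.write_text(json.dumps(reviewed, indent=2, sort_keys=True), encoding="utf-8")


def plan_run(
    categories: list[str],
    is_excluded: Callable[[str], bool],
    reviewed: dict[str, bool],
) -> tuple[list[str], list[str]]:
    """Split categories into (needs manual review, approved for the bot)."""
    needs_review, approved = [], []
    for category in categories:
        if is_excluded(category):        # step 2: apply the exclusion rules
            continue
        if category not in reviewed:     # step 3: post these for manual review
            needs_review.append(category)
        elif reviewed[category]:         # step 4: the bot may work on these
            approved.append(category)
    return needs_review, approved
```

Categories that a reviewer marked as acceptable in userspace stay in the file with the value `False`, so they are neither re-reviewed nor edited on later runs.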

Here's a list of suspicious categories found on user pages, and here's the query to generate it (warning: takes a while to run). These are only from root user pages, not from subpages, and exclude any categories that match /[Ww]iki|[Bb]ot|[Uu]ser|[Tt]ask.?[Ff]orce|[Pp]ossible|[Cc]andidates|IP_address|[Ll]icen[cs]e|[Ss]ockpuppet/ or transclude any of ((User category)), ((Sockpuppet category)), ((Wikipedia category)), ((Educat)), ((UsersSpeak)), ((User language subcategory)), ((Userloc-2)), ((Userbox)) or any template beginning with "Usercat". The numbers tell how many user pages are in each category. —Ilmari Karonen (talk) 13:01, 21 August 2008 (UTC)
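To illustrate the name-based part of that filter, the sketch below applies the regex quoted above to a mapping of category names to user-page counts. It assumes names in database form (underscores instead of spaces, as in `IP_address`); the function name `suspicious` and the sample data in the usage note are hypothetical. The template-transclusion checks from the actual query are not reproduced here.

```python
import re

# The exclusion regex quoted in the comment above, unchanged.
SAFE_CATEGORY = re.compile(
    r"[Ww]iki|[Bb]ot|[Uu]ser|[Tt]ask.?[Ff]orce|[Pp]ossible|[Cc]andidates|"
    r"IP_address|[Ll]icen[cs]e|[Ss]ockpuppet"
)


def suspicious(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Return (category, user-page count) pairs not matched by the exclusion regex.

    Results are sorted by count (largest first), then by name.
    """
    rows = [(name, n) for name, n in counts.items() if not SAFE_CATEGORY.search(name)]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

With made-up counts such as `{"Living_people": 120, "Wikipedians_in_Finland": 300, "1980_births": 45}`, only `Living_people` and `1980_births` would be reported, in that order.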
Excellent. Per that information, anything containing "proofreaders", "translators" or "workgroup" is now filtered. Category:Rouge admins and Category:Vandalism Control Network members are also excluded; they were the only two I noticed that shouldn't be removed. ~ AmeIiorate U T C @ 13:32, 21 August 2008 (UTC)
Probably should filter out anything with "admins", "administrator" or "proxies" in it, as well as Category:Accessibility advocates, Category:Birthday Committee, Category:The IC Star Recipients and Category:Non-talk pages that are automatically signed. By the way, here's the same list as a sortable wikitable. Feel free to edit it as needed. —Ilmari Karonen (talk) 05:23, 22 August 2008 (UTC)
Actually, it seems Category:Birthday Committee doesn't belong on userpages, but gets there via transclusions of Wikipedia:Birthday Committee/Calendar subpages that don't have the category wrapped in <noinclude> tags. Fixing that would probably be a bot job in itself, seeing as there are slightly over 366 of them. —Ilmari Karonen (talk) 05:29, 22 August 2008 (UTC)
It now filters anything containing admin or proxies, as well as The IC Star Recipients and Non-talk pages that are automatically signed. Also, if this is approved I'll file a request for a separate task to fix the cats on the Birthday Calendar pages. That sortable list is excellent, thanks. ~ AmeIiorate U T C @ 06:23, 22 August 2008 (UTC)

As a content editor, I'm aware that lots of people use userspace for sandboxes for articles before moving them into the mainspace when ready. It'd be irritating for them if your bot kept fiddling with Cats prior to page moving. You could address this by setting the bot to ignore User:Foo/Sandbox pages and subpages thereof. For those who use sandboxes but don't call them "sandbox" (!) you could assist by setting a time-related filter, to ignore recently created pages or perhaps those recently worked on? Just some thoughts. --Dweller (talk) 11:01, 21 August 2008 (UTC)

As it isn't an omnipresent "watchdog" type bot (like ClueBot), I can't see this causing problems. An article created in a userspace sandbox should only be categorised right before it is moved, so the category links would have to have been added just before the list of miscategorised pages was created, and the page not moved before the bot got to it. For the bot to mess up here would therefore require either a significant delay before moving the page, or incredibly bad luck and timing. ~ AmeIiorate U T C @ 11:54, 21 August 2008 (UTC)
Yeah, I tend to agree with you. Userfied pages would fall foul, but I think there's an argument for supporting the Cats on those pages being "switched off". Re your comment about bad luck/timing, it'd be good to avoid this if possible; perhaps ignoring pages with very recent edits would cover it completely? --Dweller (talk) 12:04, 21 August 2008 (UTC)
It now works as follows: I make one list 24 hours before the bot run and another right before the bot run. The two are compared, and only pages that appear on both lists will be edited. ~ AmeIiorate U T C @ 22:23, 24 August 2008 (UTC)
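The two-list comparison described above amounts to a set intersection. A minimal sketch, assuming each list is a plain sequence of page titles; the function name `pages_to_edit` is hypothetical.

```python
def pages_to_edit(earlier: list[str], latest: list[str]) -> list[str]:
    """Return titles present in both snapshots, in the order of the latest list.

    A page that was only categorised within the last 24 hours appears in
    `latest` but not in `earlier`, so it is skipped on this run.
    """
    seen_earlier = set(earlier)
    emitted: set[str] = set()
    result = []
    for title in latest:
        if title in seen_earlier and title not in emitted:
            emitted.add(title)  # guard against duplicate titles in the list
            result.append(title)
    return result
```

Pages that left the category between the two snapshots (for example, a draft moved to mainspace) drop out as well, since they are missing from the later list.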

I have rewritten the full function details to clarify/outline what has been changed. ~ AmeIiorate U T C @ 10:47, 22 August 2008 (UTC)

((BAGAssistanceNeeded))

Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. It could also link some templates that place the page in a category. BJTalk 19:19, 26 August 2008 (UTC)

Trial complete. Done. ~ AmeIiorate U T C @ 21:39, 26 August 2008 (UTC)

((BAGAssistanceNeeded)) ~ AmeIiorate U T C @ 23:39, 31 August 2008 (UTC)

I can't approve this, but I reviewed all the edits and only found one mistake. [1] It also removed a newline in two of the edits for some reason. BJTalk 12:29, 1 September 2008 (UTC)
Skipping that category was by design, as it contains "Candidates", although I think that filter isn't necessary, as any "candidates" category is put there through a template anyway. Not sure about the newline removal; probably just an AWB quirk. ~ AmeIiorate U T C @ 12:56, 1 September 2008 (UTC)


The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.