The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Kadane (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 16:10, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Not published yet

Function overview: Tags redirects with ((R to disambiguation page)), ((R from unnecessary disambiguation)), and ((R from incomplete disambiguation)) if they meet the criteria described in the function details.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Tag_with_Template:R_from_unnecessary_disambiguation

Edit period(s): Monthly

Estimated number of pages affected: ~56,417 first run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details:
Note: This BRFA only covers the functionality mentioned in Case 2. Case 1 and Case 3 have been stricken
Case 1: If a redirect exists Foo (bar) -> Foo where bar does not equal disambiguation AND Foo is NOT a disambiguation page, then tag Foo (bar) with ((R from unnecessary disambiguation))
Currently 39,963 articles fit this case


Case 2: If a redirect exists Foo (bar) -> Foo where bar does not equal disambiguation AND Foo IS a disambiguation page, then tag Foo (bar) with ((R from incomplete disambiguation)).
Currently 16,427 articles fit this case


Case 3: If a redirect exists Foo (disambiguation) -> Foo AND Foo is a disambiguation page AND Foo (disambiguation) is NOT malformed, then tag Foo (disambiguation) with ((R to disambiguation page))
Currently 27 articles fit this case
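
The three cases above can be read as a small decision function. The following is only a sketch of that logic as described, not the bot's actual code (which was not published at the time of filing); the function name `choose_tag`, its parameters, and the title regex are assumptions made for illustration. Only Case 2 was ultimately approved under this BRFA.

```python
import re
from typing import Optional

# Assumed title shape "Foo (bar)": base title, one space, parenthesised qualifier.
TITLE_RE = re.compile(r"^(?P<base>.+) \((?P<qualifier>[^()]+)\)$")


def choose_tag(redirect_title: str, target_title: str, target_is_dab: bool) -> Optional[str]:
    """Return the template name for a redirect, or None if no case applies."""
    match = TITLE_RE.match(redirect_title)
    # Only redirects of the form "Foo (bar)" -> "Foo" are considered.
    if match is None or match.group("base") != target_title:
        return None
    if match.group("qualifier") == "disambiguation":
        # Case 3: Foo (disambiguation) -> Foo, where Foo is a disambiguation page.
        return "R to disambiguation page" if target_is_dab else None
    if target_is_dab:
        # Case 2: the target is a disambiguation page.
        return "R from incomplete disambiguation"
    # Case 1: the target is not a disambiguation page.
    return "R from unnecessary disambiguation"
```

Titles without a space before the parenthesis, such as `Foo(bar)`, do not match the assumed pattern and are therefore left untagged.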


The following functionality/logic exists for all 3 cases:

Discussion

Comment @Kadane: The following should be tagged as ((R from incomplete disambiguation)) instead of ((R from unnecessary disambiguation))

Extended content

Those can be identified by the landing page being a disambiguation page.

This one should be skipped, or tagged with something else (investigating)

These ones should be skipped as malformed DAB pages (missing space, capital D), but collecting them so they can be RFD's would be good.

Headbomb {t · c · p · b} 17:11, 19 March 2019 (UTC)

Okay I have updated the functional details of the bot to fix the cases you brought up. I will update the table of edits when I make it home. Kadane (talk) 19:23, 19 March 2019 (UTC)
@Headbomb: I have uploaded new edits to User:KadaneBot/Sandbox. It contains 100 edits of each of the cases, with the exception of ((R to disambiguation page)) which only has 22 edits total. I have also included all of the malformed disambiguation pages (these will not be modified by the bot, just included in the log). Kadane (talk) 05:48, 20 March 2019 (UTC)

Better, although

Should be tagged with ((R from incomplete disambiguation)) instead of ((R from unnecessary disambiguation)). Headbomb {t · c · p · b} 09:31, 20 March 2019 (UTC)

@Headbomb: - There was an error in my CSV parsing from the database dump. I forgot to set the parameter quoting=csv.QUOTE_NONE, which resulted in some lines being skipped when the database query was being scanned. Because of this some articles and disambiguation pages were being ignored. This is fixed now. I clicked through most of the cases and I can't find any errors. User:KadaneBot/Sandbox is updated. Kadane (talk) 15:17, 20 March 2019 (UTC)
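
For readers unfamiliar with this failure mode, here is a minimal sketch of how default CSV quoting can swallow rows. The tab-separated sample data is hypothetical, and the real dump format is not shown in this discussion. With the default dialect, a field that begins with a double quote opens a quoted field that runs across line breaks until a closing quote is found.

```python
import csv
import io

# Hypothetical tab-separated dump; the second title starts with a stray double quote.
dump = 'Foo (bar)\tFoo\n"Weird (song)\tWeird\nBaz (film)\tBaz\n'

# Default quoting: the stray quote opens a quoted field that absorbs the following lines.
default_rows = list(csv.reader(io.StringIO(dump), delimiter="\t"))

# QUOTE_NONE: quote characters are treated as ordinary text, so every line is one row.
literal_rows = list(csv.reader(io.StringIO(dump), delimiter="\t", quoting=csv.QUOTE_NONE))

print(len(default_rows), len(literal_rows))
```

Page titles may legitimately contain quote characters, which is presumably why treating them as plain text was the appropriate fix here.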
Of all cases, the following aren't really disambiguation pages.

Maybe a full list should be created so we can purge all cases that shouldn't be tagged. Everything else looks fine though. Headbomb {t · c · p · b} 18:03, 20 March 2019 (UTC)

To save time, that full list to review could exclude things that end in \s\(.* (album|song|single|EP|soundtrack|network|channel|episode|series|film|journal|magazine|website|company|publisher|newspaper|company|station|decade|numeral|number|game|novel|book|gene)\) since those are safe. Headbomb {t · c · p · b} 21:02, 20 March 2019 (UTC)
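
As an illustration, the pattern quoted above could be applied as follows. This is a sketch rather than the bot's code: the names `SAFE_RE` and `needs_review` are hypothetical, the trailing `$` anchor is an assumption based on the phrase "end in", and the duplicated `company` alternative is listed once. Note that, as written, the pattern requires at least one word and a space before the keyword, so a bare `(album)` is not excluded.

```python
import re

# Alternatives taken from the comment above (duplicate "company" listed once).
SAFE_ENDINGS = (
    "album|song|single|EP|soundtrack|network|channel|episode|series|film|"
    "journal|magazine|website|company|publisher|newspaper|station|decade|"
    "numeral|number|game|novel|book|gene"
)
# The "$" anchor is an assumption: the comment says "end in".
SAFE_RE = re.compile(r"\s\(.* (" + SAFE_ENDINGS + r")\)$")


def needs_review(title: str) -> bool:
    """True if the title is NOT covered by the 'safe' endings and should be reviewed."""
    return SAFE_RE.search(title) is None


titles = ["Thriller (Michael Jackson album)", "Mercury (planet)", "Thriller (album)"]
review_list = [t for t in titles if needs_review(t)]
print(review_list)
```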
Alright, all edits have been saved, with the articles that end in what you listed above removed.

Kadane (talk) 21:52, 20 March 2019 (UTC)

Case 3 are all fine, I'll review Case 1 and 2. Headbomb {t · c · p · b} 22:09, 20 March 2019 (UTC)
Actually Always(song)) and a few others with )) are malformed. Headbomb {t · c · p · b} 22:12, 20 March 2019 (UTC)

So are

Extended content
  • Ahmed Ali(footballer)
  • Always(song))
  • Blinded(movie)
  • Chris Collins(Politician)
  • City of Angels(TV Show)
  • Daredevil(comics)
  • Everlasting(BoA)
  • Expm(x)
  • Marlborough(car)
  • Molly(fish)
  • One in a Million(TV series)
  • Paul Hamilton(Footballer)
  • Point(unit)
  • Reckless(2010 novel)
  • Relentless(CD)
  • The Brothers(TV Series)

Headbomb {t · c · p · b} 22:19, 20 March 2019 (UTC)

Ah I was under the impression that we only checked malformed disambig on case 3 (when name ends with (disambiguation)). Updated the logic to check for malformed disambigs for all cases. Kadane (talk) 22:37, 20 March 2019 (UTC)
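
The malformed titles listed above share two visible defects: no space before the opening parenthesis, or an unbalanced closing parenthesis. A check along the following lines would catch both. This is a sketch under that assumption; the function name `is_malformed` is hypothetical and the bot's actual rule is not shown in the discussion.

```python
def is_malformed(title: str) -> bool:
    """Flag titles whose trailing parenthetical is not of the form 'Foo (bar)'."""
    if not title.endswith(")"):
        return False  # no trailing parenthetical qualifier at all
    if title.count("(") != title.count(")"):
        return True  # unbalanced, e.g. "Always(song))"
    opening = title.rfind("(")
    # The last "(" must be preceded by a space, e.g. "Point (unit)" rather than "Point(unit)".
    return opening == 0 or title[opening - 1] != " "
```

Titles flagged this way would be logged for human review rather than edited, in line with the earlier comment that malformed pages are only included in the log.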

There are actually a few more, which I've sent to RFD.

Extended content
  • CCI (Prison disambiguation)
  • Euso (disambugation)
  • First Army (Poland - disambiguation)
  • Gary (disambiguation page)
  • Gradius 2 (Disambiguition)
  • Gradius 2 (disambiguition)
  • Lake Mamacocha (dosambiguation)
  • Lancaster (disambiguation page)
  • Le Mont (disambigution)
  • Momochi (disambuiguation)
  • Pook (disambiguaton)
  • Rizwan Ahmed (disambiguation page)
  • Roger Graham (disambituation)
  • Sarah Palmer (disambiguation page)
  • Shirani, Iran (dismabiguation)
  • Social Justice Coalition (disambiguation page)
  • St. Thomas' Church (disambigaution)
  • Ten (album disambiguation)
  • Tiw (disabiguation)
  • Upstage (disabiguation)
  • Victoria (geographical disambiguation)
  • Wanne (isambiguation)

Headbomb {t · c · p · b} 22:49, 20 March 2019 (UTC)

@Kadane:, actually could you break User:KadaneBot/Task3/Case 1 in sections of 100 KB tops? Those pages are pretty slow to load/edit (I have scripts that classify type of links, which slow down these pages considerably). Headbomb {t · c · p · b} 23:06, 20 March 2019 (UTC)

 Done @Headbomb: Also I am catching disambiguation misspellings as well as other words appearing next to disambiguation between parentheses. If there are any other misspellings they should probably be excluded manually unless there is a pattern. Kadane (talk) 23:15, 20 March 2019 (UTC)
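
How the bot detects misspellings is not described here. One plausible approach, shown purely as an assumption, is fuzzy matching of each word in the qualifier against "disambiguation" using the standard library's `difflib`. The function name `looks_like_bad_dab` and the 0.8 threshold are hypothetical choices.

```python
import difflib
import re

# Captures the text inside a trailing "(...)".
QUALIFIER_RE = re.compile(r"\(([^()]*)\)$")


def looks_like_bad_dab(title: str, threshold: float = 0.8) -> bool:
    """True if the qualifier resembles, but is not exactly, 'disambiguation'."""
    match = QUALIFIER_RE.search(title)
    if match is None:
        return False
    qualifier = match.group(1)
    if qualifier == "disambiguation":
        return False  # well-formed qualifier
    # Compare each word, so extra words ("disambiguation page") are caught too.
    for word in re.findall(r"[A-Za-z]+", qualifier.lower()):
        ratio = difflib.SequenceMatcher(None, word, "disambiguation").ratio()
        if ratio >= threshold:
            return True
    return False
```

Because the comparison is done on lower-cased words, a capitalised `(Disambiguation)` is also flagged, which matches the earlier remark about capital-D titles being malformed.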

Could you also break down redirects into 'species', e.g. all those ending with \s\(.*album\) into a subpage (or section), all those ending with \s\(.*song\) into another, and so on (and everything else considered "Other")? At least for endings in

All case insensitive. Headbomb {t · c · p · b} 23:18, 20 March 2019 (UTC)
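
A sketch of the requested grouping follows. The list of endings is not reproduced in this archive, so only the two keywords named in the request (`album`, `song`) are used; the names `SPECIES`, `species_of`, and `group_by_species`, and the (redirect, target) pair layout, are assumptions for illustration.

```python
import re
from collections import defaultdict

# Only the two keywords named in the request; the full list is not available here.
SPECIES = ["album", "song"]


def species_of(title: str) -> str:
    """Return the first matching species keyword (case-insensitive), else 'Other'."""
    for species in SPECIES:
        if re.search(r"\s\(.*" + re.escape(species) + r"\)$", title, re.IGNORECASE):
            return species
    return "Other"


def group_by_species(pairs):
    """Group hypothetical (redirect, target) pairs by the redirect's species."""
    groups = defaultdict(list)
    for redirect, target in pairs:
        groups[species_of(redirect)].append((redirect, target))
    return dict(groups)
```

For example, `group_by_species([("Always (Bon Jovi Song)", "Always")])` files the pair under `song`, since matching is case-insensitive.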

@Kadane: and could you also put the target page in those lists? Headbomb {t · c · p · b} 23:21, 20 March 2019 (UTC)
I am on my way to class but I can do that in a couple hours. Kadane (talk) 23:23, 20 March 2019 (UTC)
No rush. Enjoy class. Headbomb {t · c · p · b} 23:24, 20 March 2019 (UTC)
@Kadane: any update? Headbomb {t · c · p · b} 20:41, 22 March 2019 (UTC)
Headbomb I got sick and fell behind. This is on my to do list today. Kadane (talk) 21:31, 22 March 2019 (UTC)

Okay all edits have been sorted by 'species' and a list of all pages can be found here. @Headbomb: Kadane (talk) 00:09, 23 March 2019 (UTC)

Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. - Let's start with everything in User:KadaneBot/Task3/Edits/other/Case_3. This is something that could safely be automated. Make sure to run on the most recent version of the pages, since things may be updated. Headbomb {t · c · p · b} 00:11, 23 March 2019 (UTC)

Headbomb - Come to find out Case 3 is already taken care of by RussBot and it ran through and tagged every article in case 3 with ((R to disambiguation)). I could run another database query to see if there are any cases that RussBot has missed, but a task for case 3 seems redundant. What do you think?
Also I made 1 trial edit[1] which resulted in an error because of a misplaced quotation mark in my code. Going forward it will check (correctly) to see if the category has been added since the last database scan. Kadane (talk) 01:20, 23 March 2019 (UTC)
If Case 3 is taken care of by RussBot, then let's leave it to RussBot. We can revisit this if RussBot goes dead. Let's trial case 2 on everything in User:KadaneBot/Task3/Edits/newspaper/Case 2 then. Headbomb {t · c · p · b} 01:23, 23 March 2019 (UTC)
Okay @Headbomb:. I found another error in my code for case 2 that resulted in articles that were already tagged being reported in the edit cases. I have fixed that bug and it has resulted in a large reduction in edits for case 2. This error only affected the database scan and was caught during editing, when the algorithm double-checks that it should edit.
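
The edit-time double check mentioned above might look roughly like this. It is a sketch, not the bot's code: `already_tagged` is a hypothetical helper that only looks for the template by its primary name on the live wikitext, does not consult template aliases, and assumes the usual MediaWiki conventions that the first letter of a template name is case-insensitive and that spaces and underscores are interchangeable.

```python
import re


def already_tagged(wikitext: str, template: str) -> bool:
    """True if the wikitext already transcludes the template by its primary name."""
    head, tail = template[0], template[1:]
    # First letter may be either case; spaces and underscores are interchangeable.
    name = (
        "[" + re.escape(head.upper()) + re.escape(head.lower()) + "]"
        + "[ _]+".join(re.escape(part) for part in tail.split(" "))
    )
    # The name must be followed by a parameter separator or the closing braces.
    pattern = r"\{\{\s*" + name + r"\s*(?:\||\}\})"
    return re.search(pattern, wikitext) is not None
```

Running such a check against the current page text immediately before saving guards against pages that were tagged after the database scan was taken.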

I have completed the trial edits [2] [3] [4]. The rest were false positives. I am hesitant to mark the trial as done with only 3 edits.

May I suggest trialing either User:KadaneBot/Task3/Edits/cricketer/Case 2 (135 edits), User:KadaneBot/Task3/Edits/footballer/Case 2 (60 edits), or User:KadaneBot/Task3/Edits/politician/Case 2 (40 edits)? Kadane (talk) 01:47, 23 March 2019 (UTC)

I picked that category on purpose to see how it would handle those cases and not blow everything up. Side note [5]/[6]/[7] this is a much much better format. And while you don't have to do this, when making edits, you might as well add [8] if you find a #Whatever in the redirect. Headbomb {t · c · p · b} 01:51, 23 March 2019 (UTC)
For a follow up trial, you can do 25 edits in User:KadaneBot/Task3/Edits/other/Case_2/1. Headbomb {t · c · p · b} 01:59, 23 March 2019 (UTC)
You can do the rest of User:KadaneBot/Task3/Edits/other/Case_2/1 to see if all the kinks are worked out. Headbomb {t · c · p · b} 03:14, 23 March 2019 (UTC)

Small whitespace issues: [17], [18]. Headbomb {t · c · p · b} 04:55, 23 March 2019 (UTC)

Dupe disambiguation category: [19], [20]. Also [21]. Headbomb {t · c · p · b} 05:00, 23 March 2019 (UTC)
Weird R catshell thing. [22] Headbomb {t · c · p · b} 05:02, 23 March 2019 (UTC)
Missed an R catshell opportunity [23]. Headbomb {t · c · p · b} 05:07, 23 March 2019 (UTC)
[24] Those with 'alternative' dabs should likely be skipped, or compiled in a separate list for human review. Headbomb {t · c · p · b} 05:11, 23 March 2019 (UTC)
[25] should remove the dupe category for incomplete dabs. Headbomb {t · c · p · b} 05:17, 23 March 2019 (UTC)
One more: [26] (see all aliases) Headbomb {t · c · p · b} 05:23, 23 March 2019 (UTC)

For the whitespace issue, I think you can have something similar to replacing \}\}\n+\{\{ with }}\n{{ and \n\n+ with \n\n. Headbomb {t · c · p · b} 05:29, 23 March 2019 (UTC)
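
Read as two search-and-replace pairs, the suggestion above could be sketched as follows. This assumes the intended replacements are `}}\n{{` (one newline between consecutive templates) and `\n\n` (at most one blank line); the function name `tidy_whitespace` is hypothetical.

```python
import re


def tidy_whitespace(wikitext: str) -> str:
    """Collapse extra newlines between templates and runs of blank lines."""
    # Consecutive templates are separated by exactly one newline.
    wikitext = re.sub(r"\}\}\n+\{\{", "}}\n{{", wikitext)
    # Any run of two or more newlines becomes a single blank line.
    wikitext = re.sub(r"\n\n+", "\n\n", wikitext)
    return wikitext


before = "#REDIRECT [[Foo]]\n\n\n\n{{A}}\n\n\n{{B}}\n"
after = tidy_whitespace(before)
```

The order matters slightly: applying the template rule first means adjacent templates end up on consecutive lines rather than separated by a blank line.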

@Kadane: if you're ready to continue trial, you can tackle User:KadaneBot/Task3/Edits/other/Case_2/3. Headbomb {t · c · p · b} 23:43, 27 March 2019 (UTC)
Okay everything is ready. I have several deadlines in the coming days and will run the trial when real life permits. Should be no later than Saturday 6th and I am hoping that it's much earlier than that. Kadane (talk) 01:16, 28 March 2019 (UTC)
@Kadane: Looks all good to me. Could you update the function overview section to reflect what the BRFA is for 'case 2' only? I'll approve after. Headbomb {t · c · p · b} 16:54, 15 April 2019 (UTC)
@Headbomb: done. Kadane (talk) 17:02, 15 April 2019 (UTC)
 Approved. Headbomb {t · c · p · b} 19:27, 15 April 2019 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.