The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:56, Monday, December 17, 2018 (UTC)

Function overview: Fix CW Error #86 (External link with two brackets)

Automatic, Supervised, or Manual: Supervised

Programming language(s): AWB

Source code available: AWB

Links to relevant discussions (where appropriate):

Edit period(s): Once a week

Estimated number of pages affected: 100 to 200 a week

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: The bot will use AWB to fix error 86 (External link with two brackets). The bot is going to remove the double brackets around the link. For example, [[http://www.google.co.uk]] will become [http://www.google.co.uk]. General fixes will be switched on. Spelling fixing is going to be switched off.

Discussion

Do you mean error #86? Primefac (talk) 22:00, 17 December 2018 (UTC)

@Primefac: Yes, sorry for the confusion. Pkbwcgs (talk) 07:39, 18 December 2018 (UTC)
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 15:55, 23 December 2018 (UTC)
@Primefac: Is it okay if I do the trial with general fixes switched on? The error 86 fixing is part of the general fixes but spell fixing will be turned off. Pkbwcgs (talk) 16:47, 23 December 2018 (UTC)
Can you ensure that if #86 is fixed with genfixes on that it will skip the page? Primefac (talk) 17:13, 23 December 2018 (UTC)
@Primefac: I can, but it will mean very few edits. It is better to do them with GenFixes on because fixing double brackets around weblinks is part of the genfixes. Pkbwcgs (talk) 17:20, 23 December 2018 (UTC)
Okay, let me rephrase - are you doing this on a specific list of pages so that my above question will be moot? Primefac (talk) 18:08, 23 December 2018 (UTC)
@Primefac: Yes, I am doing this on a list of pages. My list of pages is located here. Pkbwcgs (talk) 18:13, 23 December 2018 (UTC)
Okay. Primefac (talk) 19:06, 23 December 2018 (UTC)
I would ideally want to do it with general fixes because error 86 fixing is part of general fixes. I can't code in RegEx to make it do only double bracket fixing. I will see if there is anything in AWB which disables all general fixes apart from double brackets in weblinks fixing. Pkbwcgs (talk) 22:13, 23 December 2018 (UTC)
@Primefac: I have good news. I have found the regular expression to do this task. Now I don't need to do general fixes anymore. Here is my regular expression:

Find: \[\[(https?://[^][<>\s"]+) *((?<= )[^\n\]]*|)\]\]
Replace: [$1 $2]
The bot is going to use that regular expression to complete this task. I have sharpened my programming skills in the last couple of days and I was practising regular expressions recently. Pkbwcgs (talk) 22:21, 24 December 2018 (UTC)

Trial complete. That went okay but I wish it went better. The edits are located here and here but the RegEx either doesn't work on some pages, removes only one set of brackets when there is a set of three or more brackets, or leaves a space before the last bracket. Can anyone please suggest changes to the RegEx? Pkbwcgs (talk) 16:26, 25 December 2018 (UTC)
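
Two of the failure modes reported above (the trailing space and the partial fix for triple brackets) can be reproduced outside AWB. The following is a minimal sketch, assuming Python's re module as a stand-in for AWB's .NET regex engine; the function name fix_double_brackets and the sample URLs are hypothetical, and the brackets inside the character class are escaped for Python.

```python
import re

# Python translation of the Find/Replace pair quoted in this discussion.
# Assumption: AWB applies it with .NET regex semantics, which agree with
# Python's for these inputs. "$1 $2" becomes r"\1 \2" in Python.
FIND = re.compile(r'\[\[(https?://[^\]\[<>\s"]+) *((?<= )[^\n\]]*|)\]\]')
REPLACE = r'[\1 \2]'


def fix_double_brackets(text):
    """Apply the first trial expression to a piece of wikitext."""
    return FIND.sub(REPLACE, text)


if __name__ == "__main__":
    samples = [
        "[[http://www.google.co.uk]]",          # bare link
        "[[http://example.com Example site]]",  # link with a label
        "[[[http://www.google.co.uk]]]",        # three sets of brackets
        "[[https://example.com|Example]]",      # pipe used instead of a space
    ]
    for sample in samples:
        print(repr(sample), "->", repr(fix_double_brackets(sample)))
```

The bare link keeps a space before the closing bracket because the replacement always writes the separator even when the second group is empty, and the pipe survives because the URL character class does not exclude it.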
There are a lot of bad changes (I only made it through about 15 diffs), with the primary reason being "people doing stupid formatting", mainly of the "pipe not space" issue (e.g. 1, 2, 3, 4). While these are GIGO issues, they should still be corrected. I'll note that the first two are with genfixes, and the latter two are without, so it looks like AWB's own genfixes needs, well, fixing.

Your regex simply doesn't take into account the situation where someone uses pipes in an elink (e.g. [[https:google.com|Google]]). I think the best regex would be along the lines of \[\[(http.*?)( |\|)?(.*?)?]] and replacing with [$1 $3]. This should cover all of the junk mentioned above, but you'll need to go back over those 50 edits and fix all of the pipe-not-space elink errors (don't you have a bot task that does this already?). Primefac (talk) 19:17, 25 December 2018 (UTC)

@Primefac: The first four edits were AWB genfixes, the rest of the edits are with my own regular expression. Pkbwcgs (talk) 20:27, 25 December 2018 (UTC)
I like your expression with $2 representing the pipe which should be taken out so it can't be in the replace expression. Pkbwcgs (talk) 20:29, 25 December 2018 (UTC)
Sometimes, I corrected the things manually as well when the RegEx wasn't doing the correct thing. Pkbwcgs (talk) 20:36, 25 December 2018 (UTC)
I fixed everything in today's edits. Pkbwcgs (talk) 20:44, 25 December 2018 (UTC)
Your regular expression still doesn't remove the pipe. Pkbwcgs (talk) 20:49, 25 December 2018 (UTC)
Also, I don't have a bot task that handles pipes inside links. I am thinking of opening another BRFA soon that handles it but I need to come up with some RegEx for that. Pkbwcgs (talk) 20:54, 25 December 2018 (UTC)
Good point, I missed a set of parens. Try \[\[(http.*?)(?:(?: |\|)(.*?))?]], replacing with [$1 $2]. Primefac (talk) 17:29, 26 December 2018 (UTC)
@Primefac: That works! Pkbwcgs (talk) 18:12, 26 December 2018 (UTC)
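
For readers following the regex, here is a rough sketch of how the revised expression treats a pipe or a space between the URL and the label. It assumes Python's re module in place of AWB's .NET engine; the function name unwrap_elink and the sample links are hypothetical.

```python
import re

# Revised expression from the discussion, translated to Python.
# Group 1 is the URL; the optional non-capturing group consumes a space
# or a pipe as the separator, and group 2 is whatever label follows it.
FIND = re.compile(r'\[\[(http.*?)(?:(?: |\|)(.*?))?]]')
REPLACE = r'[\1 \2]'


def unwrap_elink(text):
    """Rewrite [[url|label]] or [[url label]] as [url label]."""
    return FIND.sub(REPLACE, text)


if __name__ == "__main__":
    for sample in ("[[https://example.com|Example]]",
                   "[[https://example.com Example site]]"):
        print(repr(sample), "->", repr(unwrap_elink(sample)))
```

Because the separator sits in a non-capturing group, the pipe is consumed by the match but never written back by the replacement.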

Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 18:15, 26 December 2018 (UTC)

@Primefac: It is still doing it incorrectly sometimes. For example, if it is [[https://www.google.co.uk]], it is replacing it with [https://www.google.co.uk ] which is wrong. We don't need a space before the last square bracket. Pkbwcgs (talk) 19:03, 26 December 2018 (UTC)
So put in a secondary find/replace for _] (_ used to indicate a space) and replace with ]. Primefac (talk) 20:13, 26 December 2018 (UTC)
@Primefac: That is still not working properly. It is now unable to identify the link with double brackets. Pkbwcgs (talk) 20:28, 26 December 2018 (UTC)
Well, the other option is to just do two find statements.
  1. Find \[\[(http[^ \|]*?)]] and replace with [$1]
  2. Find \[\[(http.*?)(?:(?: |\|)(.*?))?]] and replace with [$1 $2]
Do them in that order and it will catch everything. Primefac (talk) 17:59, 27 December 2018 (UTC)
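
The two-statement approach can be sketched as an ordered list of find/replace rules. This is an illustration only, assuming Python's re module rather than AWB itself; the names RULES and apply_rules and the sample text are hypothetical.

```python
import re

# The two find/replace pairs from the comment above, in the order given.
# Rule 1 handles bare links (no space or pipe inside), so no stray space
# is written. Rule 2 then handles links that carry a label.
RULES = [
    (re.compile(r'\[\[(http[^ \|]*?)]]'), r'[\1]'),
    (re.compile(r'\[\[(http.*?)(?:(?: |\|)(.*?))?]]'), r'[\1 \2]'),
]


def apply_rules(text):
    """Run each rule over the whole text, one after the other."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = "See [[http://a.com]] and [[http://b.com|B]]."
    print(apply_rules(sample))
```

Order matters: if rule 2 ran first, it would also match the bare link and leave the space before the closing bracket.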
@Primefac: There is still that annoying space before the closing square bracket of the external link. Do you know how I can get AWB to perform the regular expressions in the order you stated? Pkbwcgs (talk) 18:58, 27 December 2018 (UTC)
The order you put them into AWB is the order they'll run. Primefac (talk) 18:59, 27 December 2018 (UTC)
@Primefac: Trial complete. That was better. Trial edits are located here. The RegEx worked out well for this task. Pkbwcgs (talk) 19:16, 27 December 2018 (UTC)

Approved. As far as the edits themselves, they're perfectly fine. The pages where they're found, and how they're used, are another matter entirely. I would suggest periodically piping the edit list to the MOS and GOCE wikiprojects so that they can fix them. Primefac (talk) 02:38, 28 December 2018 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.