The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Drilnoth (T • C • L)

Automatic or Manually Assisted: Automatic

Programming Language(s): AutoWikiBrowser

Function Overview: Cleans up various common errors in articles using the lists at WikiProject Check Wikipedia.

Edit period(s): Basically whenever I'm editing Wikipedia.

Already has a bot flag (Y/N): N

Function Details: Using AWB and my own regular expressions (after they have been tested using the AutoEd script), DrilBot would use the daily lists at WikiProject Check Wikipedia to find and repair common errors, such as Unicode control characters, bold text in section headings, colons at the end of section headings, misplaced categories and interwikis, and links to the current article. This would be done using the basic "general fixes" of AWB, which repair some of the errors, and custom regular expressions once they have been tested with the assisted editing script AutoEd to ensure a minimum of false positives. It would also do other cleanup at the same time: header name improvements ("Weblinks" → "External links", for example), link simplification and cleanup, and adding bullet points to external links. While it is running, anyone should be able to shut the bot off by posting on its talk page. I'd only run DrilBot when I'm around so that I can deal with any errors quickly. –Drilnoth (T • C • L) 16:27, 7 May 2009 (UTC)
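For illustration only, the heading fixes described above (bold text in headings, trailing colons, and the "Weblinks" → "External links" rename) might be sketched in Python roughly as follows. This is not DrilBot's actual AWB configuration or AutoEd code: the function names, the regular expression, and the rename table are assumptions made for the example, and only the "Weblinks" entry comes from the request itself.

```python
import re

# Hypothetical rename table; only the "Weblinks" entry is taken from the request.
HEADING_RENAMES = {"weblinks": "External links"}

# Matches a wikitext heading line such as "== Title ==" (levels 2 to 6),
# capturing the marker, the padding on each side, and the title text.
HEADING_RE = re.compile(r"^(={2,6})([ \t]*)(.*?)([ \t]*)\1[ \t]*$", re.MULTILINE)


def clean_heading(match):
    """Return a cleaned version of one matched heading line."""
    level, lead, title, trail = match.groups()
    # Remove bold markup only when it wraps the entire title.
    if (len(title) > 6 and title.startswith("'''") and title.endswith("'''")
            and "'''" not in title[3:-3]):
        title = title[3:-3].strip()
    # Drop a colon at the end of the heading.
    if title.endswith(":"):
        title = title[:-1].rstrip()
    # Apply the (assumed) table of preferred heading names.
    title = HEADING_RENAMES.get(title.lower(), title)
    # Keep the original padding so unrelated spacing is not changed.
    return f"{level}{lead}{title}{trail}{level}"


def clean_headings(wikitext):
    """Apply the heading fixes to every heading in a page's wikitext."""
    return HEADING_RE.sub(clean_heading, wikitext)
```

Preserving the original padding around the title keeps the edit limited to the errors being fixed, which matters when false positives and cosmetic-only changes are a concern.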

Discussion

(n.b. I am a member of aforementioned WikiProject.) I suggest anyone interested looks at the main project page, to try to grapple with the scale of the problems on merely a day-to-day basis (new article feed etc.). Would you be prepared to actually watch it, or just be around? - Jarry1250 (t, c) 16:34, 7 May 2009 (UTC)

I am aware of the sheer number of errors there and how many more there are each day; I'd do my best to have DrilBot running more or less constantly while I'm editing to work on some of the backlogs which can be done automatically, so that human editors can focus on doing the things that can't be fixed by bot, like incorrect brackets and ISBNs. –Drilnoth (T • C • L) 16:36, 7 May 2009 (UTC)
Don't worry Drilnoth, I'm sure you know the scale of the problem! (It was the bot people around here I was primarily addressing, ;) ). Well, I've never found much room for false positives on some of the simpler tasks, as Drilnoth says. I'm just slightly too close to trial you myself, but it shouldn't take too long to get the trial to make sure this is fertile ground for a bot, considering the possibility of false positives. - Jarry1250 (t, c) 16:44, 7 May 2009 (UTC)
Ah, gotcha. Yes, there certainly will be some false positives... I won't deny that. AWB's general fixes include some changes which get some edits wrong, e.g. moving all {{for}} tags to the top of the page, but that is uncommon enough, and really is an error with the article which just isn't fixed correctly, that I think the benefits would far outweigh the small number of false positives that the bot would get. –Drilnoth (T • C • L) 16:51, 7 May 2009 (UTC)

The Wikipedia community tends to be extremely intolerant of "bot false-positives", meaning automated changes that need to be reverted. A 5% false positive rate would be far too much, especially with a heavy volume of edits. With that in mind, I have some questions. One, are all these fixes things you've previously used AWB to fix under your own account, manually assisted? And two, if enough people complain about a certain class of false-positive, are you willing to suspend that function, even if you feel it's doing far more good than harm? – Quadell (talk) 19:06, 7 May 2009 (UTC)

I certainly don't think that there would be anywhere near 5% false positives. To answer your first question, yes. I have used AWB's general fixes a lot as can be seen in my contributions. Additionally, any custom-added regular expressions would first be tested out either with manual supervision in AWB or in the AutoEd script, so that it can be ensured that the change can be made reliably. To answer your second question, if anything is causing an at all unacceptable number of false positives I will not hesitate to stop the bot from making that change, regardless of my own feelings in the particular case. A false positive is worse than no edit at all, so deactivating a problematic change is the only logical thing to do. Of course, deactivating due to a single false positive doesn't really make sense in my mind, but if there are multiple complaints or concerns then that change should definitely be deactivated. Bots may be run and maintained by just one user, but I feel that their actions should really be determined by the community, not just one person. –Drilnoth (T • C • L) 19:52, 7 May 2009 (UTC)
Approving bots with super-generic "fixes"-type tasks is somewhat frowned upon now. What fixes, other than AWB general fixes will it be making? Also, bots doing solely AWB general fixes are denied. What makes the additional fixes significant enough that this needs to be done quickly with a bot, rather than just adding them to general fixes and letting people do them when they make more significant edits? Mr.Z-man 06:01, 8 May 2009 (UTC)[reply]
(added some more indentation). There are a few questions here, and I'll do my best to answer all of them.
1) Non-AWB fixes which I'd add would include things like better "unicodifying", removal of problematic Unicode control characters, and some more template/link cleanup as they are tested in AutoEd.
2) The additional fixes themselves aren't what I feel makes this sort of bot needed; rather, it's the massive backlog at WP:CHECKWIKI. When you say "rather than just adding them to general fixes and letting people do them when they make more significant edits?", that isn't really what happens with AWB in relation to CHECKWIKI... you use the lists there to generate a list for AWB and then basically just run the general fixes through it. However, doing this still takes quite a bit of time, so the backlogs there are still building up. Since many of these edits don't require humans to actually look at the article too much, it would be a huge timesaver to have a bot do them. For example, about a week or two ago I ran AWB on a list of about 300 or 400 articles which had the CHECKWIKI error "Link equal to linktext". While doing so, all that I really did was glance over what general fixes were done to make sure that there weren't errors and click "save". This still took probably close to an hour and a half. If a bot had been doing this, I could have been working on some of the other problems which can't be done as easily. In essence, right now those lists are fixed by just using AWB's general fixes without any "more significant edits" being done at the same time, but a bot could handle it faster. These edits also appear on watchlists and in recent changes when they really shouldn't need to be.
Since DrilBot would only use the lists created by CHECKWIKI, almost every article that it edits would have an error which it could fix, so there won't be a ton of edits that just fix things like whitespace or reference order, which I agree would be kind of useless. Right now User:D6 does some of the fixes, but I'm not sure if it should be since I can't find a BRFA for that task. I feel that having an approved bot to help manage the CHECKWIKI lists would make it much easier to maintain since then human editors could focus on the things which bots can't do rather than trying to manage the whole list. If there aren't many false positives and the bot is flagged to prevent its appearance on watchlists/recent changes, I don't really see how this could be problematic.
(also, as a side note, it looks like Lightbot's controversy was because it changed date formatting; DrilBot shouldn't be doing anything that could cause that much controversy since all of the fixes would be to known errors, not things like date-formatting which can vary from article to article). –Drilnoth (T • C • L) 14:13, 8 May 2009 (UTC)
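To make two of the fixes mentioned in this reply concrete, here is a minimal Python sketch of the "Link equal to linktext" repair and of removing invisible Unicode characters. It is an illustration under stated assumptions, not the regular expressions DrilBot or AWB actually use: the function names are hypothetical, the link check is deliberately conservative (it only collapses a piped link when target and label are identical after trimming), and the set of characters removed is an assumed example rather than the list used by WikiProject Check Wikipedia.

```python
import re

# A piped link with exactly one pipe and no nested brackets: [[Target|Label]].
PIPED_LINK_RE = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

# Assumed example set: zero-width space, left-to-right mark, byte order mark.
INVISIBLE_CHARS_RE = re.compile("[\u200b\u200e\ufeff]")


def simplify_links(wikitext):
    """Collapse [[Foo|Foo]] to [[Foo]]; leave every other link untouched."""
    def repl(match):
        target = match.group(1).strip()
        label = match.group(2).strip()
        if target == label:
            return f"[[{target}]]"
        # Label differs from target, so the pipe carries meaning: keep it.
        return match.group(0)
    return PIPED_LINK_RE.sub(repl, wikitext)


def strip_invisible_chars(wikitext):
    """Remove the assumed set of invisible characters from the text."""
    return INVISIBLE_CHARS_RE.sub("", wikitext)
```

Because the pattern requires exactly one pipe, links with several parameters (such as file links with options and captions) are never matched, which is one way to keep the false positive rate low.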

Approved for trial (1 day). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's get a feel for the sorts of changes we're talking about here. – Quadell (talk) 15:17, 8 May 2009 (UTC)

Can do; thanks. –Drilnoth (T • C • L) 15:53, 8 May 2009 (UTC)
Report

The trial went almost perfectly; I'll just report the handful of false positives that I saw (and reverted or fixed):

Shall I continue the trial or stop for now? –Drilnoth (T • C • L) 16:48, 8 May 2009 (UTC)

Yes, you have 22 hours or so left to go. :) – Quadell (talk) 17:04, 8 May 2009 (UTC)
Okay; I wasn't sure if you wanted me to give occasional reports or just lump it all together at the end. –Drilnoth (T • C • L) 17:54, 8 May 2009 (UTC)
I can't speak for Quadell, but I would advise you make the most of the trial - perfect the regexes, get some edits done. Record your fixes (and what you could fix) and report back here at the end. Good going so far, by the way. - Jarry1250 (t, c) 17:57, 8 May 2009 (UTC)
Note

I think you did a lot of good work at WP:CHECKWIKI and I'd happily back any proposal that helps you. -- User:Docu 22:42, 8 May 2009 (UTC)

Thanks; I have a few comments here.
  • I can include which list is being processed, although sometimes I might just omit that part of the summary if I'm doing a short list (like 15-30 items).
  • Of course; I just didn't mention this one because, as I said, it's an extraordinarily rare error.
  • Most definitely; I've already mentioned a few things at WP:AWB/FR, although most of what I'd suggest are already implemented in the next version.
  • I had been under the impression that the "AWB assisted" problems required human attention to fully fix... is that not correct?
  • The way that I see it, if a general fix doesn't have many false positives it is beneficial to apply it at the same time as the CHECKWIKI edits. If a particular change seems prone to false positives I'll deactivate it the way you mentioned, which I already did with date reformatting.
Thanks! –Drilnoth (T • C • L) 01:24, 9 May 2009 (UTC)
Some of the unbalanced brackets issues (both square and curly) are fixed automatically (approx 20-30% in my experience), but most do indeed need human supervision. - Jarry1250 (t, c) 10:08, 9 May 2009 (UTC)
Ah; thanks for the info. –Drilnoth (T • C • L) 13:33, 9 May 2009 (UTC)
Report after trial

I'm going to be away for a few hours, so here's my report on the 500+ edits that DrilBot made yesterday (I checked all of them for errors):

I didn't notice any other potentially problematic edits. –Drilnoth (T • C • L) 13:32, 9 May 2009 (UTC)

Oops, hadn't seen this before: Trial complete. –Drilnoth (T • C • L) 18:15, 9 May 2009 (UTC)
The {{for}} and {{dablink}} problem should now be fixed thanks to an AWB update. –Drilnoth (T • C • L) 15:05, 10 May 2009 (UTC)


This seems fine to me. If there are no objections in the next few days, I'm inclined to approve. – Quadell (talk) 15:34, 10 May 2009 (UTC)

Thanks. I've posted about DrilBot at WP:VPM per Jarry1250's suggestion on my talk page. –Drilnoth (T • C • L) 16:05, 10 May 2009 (UTC)

Approved. Looks good. – Quadell (talk) 01:22, 12 May 2009 (UTC)


The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.