The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Babylonian Armor

Automatic or Manually assisted: Unsupervised

Programming language(s): Python (pywikipedia framework)

Source code available: User:SheepBot/Code

Function overview: SheepBot has been created to add ((dead end)) to dead end pages, of which there is an incredible backlog.

Edit period(s): Continuous

Estimated number of pages affected: 360 pages per hour (at active times)

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: It will find pages from a compiled list at Jason's dead end page list, which is updated daily. It will then work through each page in the list, adding ((subst:dated|Dead end)) to pages that do not contain any wikilinks. It will automatically skip pages that already carry maintenance templates, as well as pages containing ((nobots)) or infobox-type templates (because these may contain wikilinks). It will also handle categories appropriately.
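
In outline, the task amounts to something like the sketch below. This is illustrative only, written against the old pywikipedia (compat) framework: the list URL is a placeholder, the skip list is abbreviated, and the framework calls used here (botMayEdit(), templates(), linkedPages(), put()) are as I recall them and should be checked against the actual source at User:SheepBot/Code.

# Rough sketch of the workflow described above, not the bot's actual code.
import urllib2
import wikipedia  # pywikipedia (compat) core module

LIST_URL = 'http://example.org/deadend_pages.txt'  # placeholder; the real bot reads Jason's report
SKIP_TEMPLATES = [u'dead end', u'wikify', u'article issues']  # plus their redirects in the real bot

site = wikipedia.getSite('en', 'wikipedia')
for line in urllib2.urlopen(LIST_URL):
    title = line.strip().decode('utf-8')
    if not title:
        continue
    page = wikipedia.Page(site, title)
    if not page.botMayEdit('SheepBot'):            # respect ((bots))/((nobots))
        continue
    templates = [t.lower() for t in page.templates()]
    if any(t in templates for t in SKIP_TEMPLATES):  # already tagged
        continue
    mainspace_links = [p for p in page.linkedPages() if p.namespace() == 0]
    if not mainspace_links:                        # a genuine dead end
        page.put(u'{{subst:dated|Dead end}}\n' + page.get(),
                 comment=u'Tagging dead-end page per BRFA')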

Discussion

Archived Discussion

I confirm that I assisted this user in the development and writing of this bot, in person. It was debugged rather thoroughly, and should be free of any major errors. Thanks, The Earwig (Talk | Contribs) 04:58, 4 September 2009 (UTC)[reply]

Quick code review:

  1. Why not use pywikipedia's "botMayEdit" function to check for ((nobots))?
     Done Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  2. Your check for an existing "((dead end))" template is inadequate: it will only catch two of the 9 redirects. The case is similar for ((wikify)) and ((article issues)). A better idea might be to load the actual list of transcluded templates from the API with prop=templates (I'm sure pywikipedia has a function for this).
    We've changed this to use a list of the templates and their redirects and this problem has been solved. Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  3. Should the bot also skip articles up for AFD?
    I decided not to skip articles up for AFD, as not all AFD articles are eventually deleted. Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  4. As far as I know, not all "box" templates include the text "box" in their name. If you take the suggestion in item #8 in this list, you may be able to skip this check completely.
    I have fixed this by using the API see #8 below. Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  5. If a page contains category links, the check for "box" templates will not be performed, which would lead to the page being "falsely" tagged if it doesn't contain any non-category links.
     Done Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  6. The bot counts images (and other files) and interwiki links as links; I don't know whether this is intended behavior or not.
     Done Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  7. Also, BTW, your regular expression checks would false-positive on an article discussing a dead end street, or an article that happens to contain the word "box" in between two unrelated templates. But since either would lead to the bot not editing when it could, IMO that's not a major issue.
    You're right, it's not a major issue, but it has been fixed regardless. Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  8. One more foolproof way to check for links in a page is to use the API's prop=links (I'm sure pywikipedia has a function for this, too). To exclude things like ((fact)) linking to Wikipedia:Citation needed, you would use the API's plnamespace parameter or post-processing on the "ns" elements in the API result to only count links to namespace 0.
     Done, this solves the problem of infoboxes because the API automatically expands templates and reads those links. Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]
  9. I note the bot will continually re-check articles it has checked in the past, and will re-add its template if reverted. The former is probably a waste of resources, and the latter is certainly not a good idea. The bot should remember which pages it has checked in the past and skip them in future runs.
    SheepBot will maintain a text file that lists all the pages it has edited, and will ignore already edited pages in future runs (see the sketch after this review). Babylonian Armor (talk) 17:37, 4 September 2009 (UTC)[reply]

HTH. Anomie 11:50, 4 September 2009 (UTC)[reply]
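
The skip list described in item 9 can be as simple as an append-only text file. A minimal sketch (the file name is arbitrary and not taken from SheepBot's actual code):

# Remember every page the bot has already handled and skip it in later runs.
import codecs

SKIP_FILE = 'edited_pages.txt'  # arbitrary name for this sketch

def load_edited():
    """Return the set of page titles the bot has already handled."""
    try:
        f = codecs.open(SKIP_FILE, 'r', 'utf-8')
    except IOError:                 # first run: no file yet
        return set()
    try:
        return set(line.strip() for line in f)
    finally:
        f.close()

def remember(title):
    """Append a freshly edited title so later runs skip it."""
    f = codecs.open(SKIP_FILE, 'a', 'utf-8')
    f.write(title + u'\n')
    f.close()

# In the main loop:
#     if page.title() in edited: continue
#     ... edit the page ...
#     remember(page.title()); edited.add(page.title())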

The way you're using the API here is rather wrong:

import re
import urllib
import urllib2

# Queries the API for a page's namespace-0 links, then scrapes the titles out
# of the default HTML-formatted output with a regular expression.
def queryLinks(title=""):
    params = {'action':'query', 'prop':'links', 'pllimit':500, 'plnamespace':0}
    params['titles'] = title
    data = urllib.urlencode(params)
    data = urllib2.urlopen("http://en.wikipedia.org/w/api.php", data)
    query = data.read()
    templates = re.findall('<span style="color:blue;">&lt;pl ns=&quot;0&quot; title=&quot;(.*?)&quot; /&gt;</span>', query)
    return templates

The HTML output is meant for programmers to read, not for bots to parse. You should use one of the real result formats. Since 2.6, Python has standard library support for XML or JSON (for Python <2.6 you need simplejson) and there are probably modules available for YAML and WDDX. Personally, I find JSON to be easiest to use. Mr.Z-man 17:59, 4 September 2009 (UTC)[reply]
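
For illustration, a minimal JSON-based sketch of the same queryLinks function along the lines Mr.Z-man suggests (assuming Python 2.6+; error handling and query continuation are omitted):

import json
import urllib
import urllib2

# Ask the API for format=json and walk the decoded result instead of scraping HTML.
def queryLinks(title=""):
    if isinstance(title, unicode):
        title = title.encode('utf-8')
    params = {'action': 'query', 'prop': 'links', 'pllimit': 500,
              'plnamespace': 0, 'format': 'json', 'titles': title}
    data = urllib.urlencode(params)
    result = json.load(urllib2.urlopen("http://en.wikipedia.org/w/api.php", data))
    links = []
    for page in result['query']['pages'].values():
        for link in page.get('links', []):
            links.append(link['title'])
    return links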

 Done Updated said area to include JSON. Babylonian Armor (talk) 19:13, 4 September 2009 (UTC)[reply]
Suggest you upload the skipped list to sheepbot/Dead-end pages skipped and link to the page from strategic locations. Reason: the list of pages you are skipping will gradually grow until it crowds out the ones you aren't. They will probably be true dead-end pages some botphobe has reverted, and will probably only be a handful a day. If humans have access they will probably be able to fix them up. Rich Farmbrough, 09:44, 5 September 2009 (UTC).[reply]
SheepBot does not skip any dead end articles - unless it has already edited them. Babylonian Armor (talk) 13:14, 5 September 2009 (UTC)[reply]

Are there limits to what is considered a dead end article? In general I can't see why an article would fail to have any wikilinks, but I don't see the benefit in labeling articles already identified as problematic in one place. Who is going to edit and add wikilinks to the articles, and why are they not doing this already based on Jason's list? I don't see the benefit of labeling them over wikilinking them. Has this been discussed with wikipedia editors in general, and, if so, where? Why are these articles not being wikilinked? Is this just to categorize them? And if it's about categorizing them, who is dealing with articles in the category to get them out of it? --68.127.234.44 (talk) 03:17, 5 September 2009 (UTC)[reply]

[Progress box: Dead-end pages - monthly subtotals; undated articles: 0]
They are cleaned up by the dead-end pages cleanup project - they last emptied the categories on 2 September "Woot to them". See progress table on the right. I have played with Jason's reports; they could be very useful. Rich Farmbrough, 09:39, 5 September 2009 (UTC).[reply]
Indeed. The Dead-end pages cleanup project looks for pages with dead end templates, not pages without. It will also become more accessible to human users in future runs. And mystery man (68.127.234.44), the reason they aren't being wikilinked is the backlog, in excess of 1,800 articles. And more articles may be added to the list every day. Eventually, I might attempt to program another task for SheepBot, to wikify dead-end articles. Finally, Rich, that isn't the largest amount of progress. Like I said, there is currently a 1,888-article backlog. Babylonian Armor (talk) 13:15, 5 September 2009 (UTC)[reply]
Well, that's the reason I support this task. By the way, the progress box doesn't indicate those small numbers have been cleaned - they are the new arrivals since Sept 2, in their monthly categories. (Mostly caused by reversions and undeletions, I hope, apart from the September group, which I assume are your work.) Since no BAG members are around may I boldly suggest you do a test run of 50? This is likely what they will ask for next. Rich Farmbrough, 09:30, 6 September 2009 (UTC).[reply]
Your answer is not very direct. However, I'm okay with the task as long as there is a group on the receiving end of these categorized articles. However, I also asked: have you discussed it with them? Please post a link to the discussion, or a notice pointing to this discussion, on the Dead-end pages cleanup project. --68.127.234.44 (talk) 04:42, 8 September 2009 (UTC)[reply]
I have yet to discuss it with them, but I plan to discuss it with them after a trial run (if a trial ever happens). I can then show them what SheepBot does, so they would see the work in action, not merely hear fantasies of what it can do. Who knows... The Bot may never actually become accepted, so talking to them now would be rather pointless, but I admire your insight into the topic, and I'm glad you're on Wikipedia... you should create an account! Babylonian Armor (talk) 13:40, 8 September 2009 (UTC)[reply]
Oh another feature you may consider is adding the parameter to "article issues" if it's there, or creating an article issues template if there are going to be more than 2 suitable templates. Rich Farmbrough, 09:33, 6 September 2009 (UTC).[reply]
Don't worry about that. When looking at an article, SheepBot "scans" it for dead end templates, article issue templates and wikify templates. It was extensively covered, as you can see in the source code. ¿Babylonian Armor? (talk) 16:40, 6 September 2009 (UTC)[reply]
As they say, please read what I said, not what you expected me to say. Rich Farmbrough, 17:22, 6 September 2009 (UTC).[reply]
The communication is indirect, but I think the operator is attempting to address concerns, to understand how to run the bot well. I have no more concerns at this time with this bot, however BAG members intend to handle it. --69.225.12.99 (talk) 07:24, 13 September 2009 (UTC)[reply]
I hope I can "Bump" this...

Approved for trial (20 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Mr.Z-man 00:46, 24 September 2009 (UTC)[reply]

Thanks. I'll get right on this. Babylonian Armor (talk) 12:42, 28 September 2009 (UTC)[reply]

Trial complete. Okay, the bot trial was smooth for the most part - it made its 20 edits with no problem. Originally it messed up on 1950, but the bot corrected the error soon after; it also correctly skipped pages. I have listed here the pages that the bot skipped: A More Perfect Union (film), Alan Lorber, Ansty, Dorset. Babylonian Armor (talk) 15:03, 28 September 2009 (UTC)[reply]

The bot didn't correct its error with 1950; the bot's operator reverted the bot. It was a pretty substantial error - the addition of a dead end template to an article filled with dozens of wikilinks! Have you prevented this error from occurring again, Babylonian Armor? I would like to see it recoded for this error and tested again, whatever you find as the reason for the error. --69.225.5.4 (talk) 05:59, 29 September 2009 (UTC)[reply]
Oh, I see, you are the bot operator. It appears from the edit history the bot corrected the error. Still, the error needs to be addressed and corrected. --69.225.5.4 (talk) 06:08, 29 September 2009 (UTC)[reply]
It was a mistake indeed - but at the time of the error, the page 1950 had been vandalized, stripping it of all links and replacing them with text somewhere along the lines of "1952 - Aliens invaded earth except for the great USA!". With no links, the bot saw it as a dead end. After ClueBot reverted the vandalism, and after I removed the wrongly placed template, it did not go back to 1950, and operated as it should have for the rest of the trial. I'd be happy to have another trial to prove that this was a simple vandalism-related error. Babylonian Armor (talk) 10:41, 29 September 2009 (UTC)[reply]
I'm not convinced. From the article history:
  • 21:40, 25 September 2009: 75.182.33.13 vandalizes the article
  • 21:40, 25 September 2009: ClueBot fixes it
  • 14:33, 28 September 2009: SheepBot makes its error
How did the bot get the text of a revision that was immediately reverted almost three days earlier? I don't see anything at first glance in the posted code that should have caused this, but deeper investigation is certainly required. Anomie 12:41, 29 September 2009 (UTC)[reply]
Measures of deeper investigation will be taken at your request - but seeing that it was a one-time mistake (it didn't repeat), I see nothing more than a rare glitch. As you said, there was nothing in the code that could've caused the small error. While it is unlikely that I will find anything as to the reason for this mistake, I will investigate - and fix the code accordingly. Babylonian Armor (talk) 21:11, 29 September 2009 (UTC)[reply]
On the contrary, the code must have caused the error, or something worse. What other factors were in play that could have caused the error? If the bot is not in control of its operations, allowing outside factors to influence what is going on while it is operating, I don't think the bot should be run at all. I see you've agreed to investigate and correct. This is good. But, if the bot's code is not the source of the error, then something more serious is wrong. --69.225.5.4 (talk) 18:13, 1 October 2009 (UTC)[reply]
It could certainly be the case that there is nothing in SheepBot's code causing the error; there could be a bug in pywikipedia, or a bug in MediaWiki, or some odd caching somewhere in the rest of Wikipedia's infrastructure. It could also be a subtle omission of correct error checking for some unusual edge case that isn't obvious at first glance (which is all I personally gave it so far). At any rate, your polemics are unnecessary here at this time. Anomie 19:44, 1 October 2009 (UTC)[reply]
Or something worse, such as you list: a bug in pywikipedia, a bug in MediaWiki, or incorrect caching, all of which could cause more errors on pages all over Wikipedia, whereas this bot was simply confined to a test run.
Without proof of a serious error at a higher level - and that's not for the programmer to check in his code - it's easier to debug code with the assumption that the code caused the error. Assuming right off that the code didn't cause the error requires prior knowledge of an existing bug. Again, not something for the programmer to deal with in his code.
Lol, "my polemics are unnecessary here at this time!" Lol, I'm guessing you meant to say something else, but hell if I know, so I'll get off the personal and back to the topic, with one more comment, if you really don't want polemics, sticking to the topic is an excellent method for not inducing them (WP:NPA contains excellent advice on staying on topic) . --69.225.5.4 (talk) 20:24, 1 October 2009 (UTC)[reply]
It is my belief that an API bug caused this problem. The problem nonetheless has been resolved. Babylonian Armor (talk) 01:39, 2 October 2009 (UTC)[reply]
Also, to reassure everyone, perhaps a second trial is in order? Babylonian Armor (talk) 01:41, 2 October 2009 (UTC)[reply]
My guess was that it was a MediaWiki error. The page was only vandalized for 6 seconds, yet the Toolserver tool and the bot both managed to get bad data. The toolserver tool is written in PHP, so a pywikipedia bug is unlikely. But then I looked at tools:~jason/deadend_pages.php, which has a link to bugzilla:17154 and would have saved me a couple minutes of thinking about it if I had checked that first. Mr.Z-man 02:20, 2 October 2009 (UTC)[reply]
I confirm that bugzilla:17154 certainly would have caused the bot to make that error. Anomie 12:17, 2 October 2009 (UTC)[reply]
Yeah, a second trial would be a good idea, imo, and it can't hurt. It's better coding practice than guessing what the error is if you can't specifically locate it - even if, and sometimes especially if, the source appears to be outside the code itself. --69.225.5.4 (talk) 05:51, 4 October 2009 (UTC)[reply]
Indeed - and in a polite fashion, I call upon any member of the BAG to approve a second trial (Hopefully the definitive trial). Babylonian Armor (talk) 00:36, 5 October 2009 (UTC)[reply]
Sorry to be a bother - but it would be nice to get that second trial :-D Babylonian Armor (talk) 02:42, 11 October 2009 (UTC)[reply]

((BAGAssistanceNeeded)) Babylonian Armor (talk) 18:39, 11 October 2009 (UTC)[reply]

Excuse me? Babylonian Armor (talk) 21:31, 20 October 2009 (UTC)[reply]

Hmmm, no one has handled this yet? Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's give it a good second trial. Anomie 12:32, 21 October 2009 (UTC)[reply]

Yikes - didn't see this - I'll handle it on the weekend! Babylonian Armor (talk) 01:41, 23 October 2009 (UTC)[reply]
((OperatorAssistanceNeeded)) Any progress here? Anomie 01:41, 2 December 2009 (UTC)[reply]
I will attempt to contact this editor off-wiki, and see if they can complete the trial. — The Earwig @ 17:48, 13 December 2009 (UTC)[reply]
((OperatorAssistanceNeeded))Any news? MBisanz talk 01:19, 3 January 2010 (UTC)[reply]
((BotExpired)) Mr.Z-man 20:45, 6 January 2010 (UTC)[reply]
Hey, I would like to re-open this bot request. I was hoping to pick up where I started with that second trial of 50 edits, I just need some sort of confirmation from the BAG! Babylonian Armor (talk) 02:59, 14 March 2010 (UTC)[reply]
Yep, go ahead! Tim1357 (talk) 03:37, 14 March 2010 (UTC)[reply]
A quick update: user had some trouble at first because his source for the dead end pages list was not updating properly. I've coded a quick replacement, tools:~earwig/reports/enwiki/deadend_pages.txt, to be updated hourly. It uses SQL to get a list of the first 50 dead end pages, and the bot will use this from now on until a proper fix can be coded in. — The Earwig (talk) 00:04, 16 March 2010 (UTC)[reply]
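
For illustration, the kind of Toolserver query such a report could be built on. The table and column names follow the MediaWiki schema, but the connection details and the actual query used by the report are assumptions, not taken from the report's source:

import os
import MySQLdb

# Hypothetical connection to the enwiki_p replica; host and credentials file
# are assumptions and may differ from the real setup.
conn = MySQLdb.connect(db='enwiki_p',
                       host='enwiki-p.db.toolserver.org',
                       read_default_file=os.path.expanduser('~/.my.cnf'))
cursor = conn.cursor()
cursor.execute("""
    SELECT page_title
    FROM page
    LEFT JOIN pagelinks ON pl_from = page_id
    WHERE page_namespace = 0
      AND page_is_redirect = 0
      AND pl_from IS NULL
    LIMIT 50
""")
titles = [row[0].replace('_', ' ') for row in cursor.fetchall()]
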
The second trial of 50 edits is now complete. Babylonian Armor (talk) 00:23, 16 March 2010 (UTC)[reply]
Trial complete. — The Earwig (talk) 00:24, 16 March 2010 (UTC)[reply]
Looks good. Why not take the list from http://en.wikipedia.org/wiki/Special:DeadendPages ? Pywikipedia has deadendpages(), but I guess that doesn't filter for not having the tag on it already. Josh Parris 03:13, 16 March 2010 (UTC)[reply]
Deadendpages is often out of date, it updates every few days now. It is much more efficient to run a query daily than to wait for Deadendpages. (X! · talk)  · @604  ·  13:29, 16 March 2010 (UTC)[reply]
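
For reference, using that built-in generator would look roughly like the sketch below; since it reads Special:DeadendPages, it shares that page's stale-cache problem, and the signature shown is as I recall it from the compat framework and should be verified.

import wikipedia  # pywikipedia (compat) core module

site = wikipedia.getSite('en', 'wikipedia')
# number/repeat parameters are assumed from the compat framework's other
# special-page generators.
for page in site.deadendpages(number=50, repeat=False):
    wikipedia.output(page.title())
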
What X! said. Babylonian Armor (talk) 16:24, 16 March 2010 (UTC)[reply]
 Approved. (X! · talk)  · @758  ·  17:11, 16 March 2010 (UTC)[reply]
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.