The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator:Quadell (talk) (random)

Automatic or Manually Assisted: Automatic, but with supervision

Programming Language(s): Perl, using Perlwikipedia

Function Summary: Fixes to references and external links

Edit period(s): In small batches (one category at a time)

Already has a bot flag: Yes

Function Details: The most prominent thing this bot will do is name external links, similar to the way that DumZiBoT does. There are a few differences: DumZiBoT runs on all pages using a database dump, while Polbot will run on one category's worth of articles at a time and will use live data. Also, DumZiBoT changes "[http://www.example.com/subpage]" to "[http://www.example.com/subpage A Subpage]", whereas Polbot will instead change it to "[http://www.example.com/subpage A Subpage] at www.example.com" per feedback on my talk page. And DumZiBoT can handle PDFs, but I can't figure out how to do that, so Polbot will skip PDFs.
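
The link-naming format described above can be sketched as follows. This is an illustration in Python, not the bot's actual Perl code; the function name `name_bare_link` is hypothetical, and the page title is assumed to have been fetched already and is passed in as an argument.

```python
from typing import Optional
from urllib.parse import urlsplit


def name_bare_link(url: str, title: str) -> Optional[str]:
    """Return '[url Title] at host', or None for PDFs, which are skipped."""
    parts = urlsplit(url)
    # Assumption: a PDF is recognised by its file extension alone.
    if parts.path.lower().endswith(".pdf"):
        return None
    # Collapse runs of whitespace so the title stays on one line.
    clean_title = " ".join(title.split())
    return f"[{url} {clean_title}] at {parts.netloc}"


print(name_bare_link("http://www.example.com/subpage", "A Subpage"))
```

Returning None for PDFs leaves the decision about what to do with a skipped link to the caller.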

But wait, there's more! Polbot's 8th function will also make other improvements, to wit:

  1. Convert the <references /> tag to the ((reflist)) template, as AWB does in its auto-fixes.
  2. If there are <ref> tags but no ((reflist)), it'll add a ((reflist)).
    • Where? In Notes or References sections if they exist. If not, then just before the earliest of External links, Sources, Further reading, See also, or just before the categories, or at the end if all else fails.
  3. Fix the double bracketing like [[http://www.example.com]], also inspired by AWB auto-fixes.
  4. Change [http://en.wikipedia.org/wiki/Example] to [[Example]] and [http://pt.wikipedia.org/wiki/Example this] to [[:pt:Example|this]].
  5. When there are bare links to IMDB, such as [http://www.imdb.com/title/tt0099700/], change them to use the various IMDB templates, such as ((imdb title|0099700|Gremlins 2: The New Batch (1990))).
  6. Turn previously-unnamed bare external links (BELs) to references by putting them in <ref> tags. This is so that pages don't intersperse numbered reference links[1] with numbered BELs (like [1]), when both are used as citations and have contradictory numbering schemes. It will ignore BELs in HTML comments or in the ((PDFlink)) template. It will also ignore BELs that are the only thing on a line. This is because...
    • example
      ...looks better, and is closer to the original editor's intent, than...
    • [2]
  7. If any two ref tags have the exact same content, it will merge them. E.g. "<ref>Example</ref>...<ref>Example</ref>" will become "<ref name=botgen1>Example</ref>...<ref name=botgen1 />"
  8. If there have been any changes needed due to the above, then Polbot will perform misc other cleanup tasks at no extra charge.
    • Fix miscapitalized headers (e.g. "See Also")
    • Fix mislinked dates and years (e.g. "[[20th Century]]")
    • Change <i> to '' and <b> to '''
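
As an illustration of item 7 above, merging identical references could look roughly like this. It is a Python sketch, not the bot's Perl source; the function name is hypothetical, only unnamed <ref> tags with exactly identical content are handled, and no check is made for an existing ref that already uses a "botgen" name.

```python
import re
from collections import Counter

# Matches unnamed references only; named or self-closing refs are left alone.
REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)


def merge_duplicate_refs(wikitext: str, prefix: str = "botgen") -> str:
    """Name the first copy of each repeated ref and shorten the later copies."""
    counts = Counter(REF_RE.findall(wikitext))
    names = {}  # ref content -> generated name

    def replace(match):
        content = match.group(1)
        if counts[content] < 2:
            return match.group(0)  # unique ref: leave untouched
        if content not in names:
            names[content] = f"{prefix}{len(names) + 1}"
            return f"<ref name={names[content]}>{content}</ref>"
        return f"<ref name={names[content]} />"

    return REF_RE.sub(replace, wikitext)


print(merge_duplicate_refs("<ref>Example</ref>...<ref>Example</ref>"))
```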

References for examples

  1. ^ like this
  2. ^ [http://www.example.com example]

Discussion

Hey, BAGgers! It's been close to a week with no objections, and I've answered all questions. I hate to be a nag, but I'm going on vacation soon and I'd like to run a trial before I go, if possible. – Quadell (talk) (random) 14:31, 10 July 2008 (UTC)

((BAGAssistanceNeeded))

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. BJTalk 19:20, 10 July 2008 (UTC)

Thanks! Okay, the trial is still ongoing, but I already found a few opportunities for improvement. First, there were way too many changes like this, which seems a tad piddlin', so I changed the code to only save changes if it has something more substantial to change. This edit incorrectly changed a book title, so I took that out; now it only changes complete links such as 20th Century. This was a flat-out bug. I fixed it. And this edit created a bot-generated title of "File Not Found", when the server neglected to return a status of 404; I have changed the bot to read "File Not Found" as a dead link. Developing... – Quadell (talk) (random) 22:31, 10 July 2008 (UTC)
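
The "File Not Found" fix amounts to a soft-404 check on the fetched title. A rough Python sketch follows (the bot itself is Perl). Only "file not found" comes from this discussion; the other blacklist entries, the status codes treated as dead, and the function name are assumptions made for illustration.

```python
from typing import Optional

# "file not found" is the title mentioned in this discussion;
# the remaining entries are assumed examples of similar soft-404 titles.
SOFT_404_TITLES = {"file not found", "page not found", "404 not found"}


def looks_dead(status_code: int, title: Optional[str]) -> bool:
    """Guess whether a link is dead from its HTTP status and page title."""
    # Assumption: only 404 and 410 count as hard failures here, so that
    # sign-in or registration pages are not marked dead by mistake.
    if status_code in (404, 410):
        return True
    if title is None:
        return False
    normalized = " ".join(title.split()).lower()
    return normalized in SOFT_404_TITLES


print(looks_dead(200, "File Not Found"))
```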

This edit? Definitely something I want to avoid. Fixed. Also, what do I do about links that want me to sign in, like in this diff? Do I mark it as a dead link, or ignore it, or what? – Quadell (talk) 22:56, 10 July 2008 (UTC)
 Done Okay, the test run of 50 is done. I fixed the obvious errors I saw. See any subtler errors, or suggestions for improvement? – Quadell (talk) 23:15, 10 July 2008 (UTC)

Current status

Trial is complete. Found errors have been fixed. There is still one open question: what to do about links that want me to sign in or register? If no one has any better ideas, I'll simply ignore them. Anyway, I guess I'm just waiting for final approval, or any further suggestions on improvements. All the best, – Quadell (talk) 02:47, 14 July 2008 (UTC)

One thing before I approve this: replacing <references /> with ((reflist)) was wrong last time I checked, but replacing <div class="references-small"><references /></div> with ((reflist)) was fine. BJTalk 21:52, 17 July 2008 (UTC)
What the code does is this: First it checks to see if there is a <references /> surrounded by two <span>s (or <div>s, or one of each), with no other text inside those divs. If this is the case, the bot replaces the whole "<div (something)><div (something)><references /></div></div>" with ((reflist)) (or ((reflist|2)), or whatever, depending on the div parameters). If a doubly-divved <references /> doesn't exist, it looks for a singly-divved one and does the same. And if that isn't found, it looks for a non-divved <references />, and replaces it with ((reflist)). This way, formatting isn't removed from manually-entered references. For instance, this wikicode:
<div class=references-small>
<references />
* A manual reference
</div>
...is converted to:
<div class=references-small>
((reflist))
* A manual reference
</div>
...so formatting is preserved. This is the same logic that AWB now uses, since the AWB folks addressed the complaints on the issue. I haven't seen Polbot introduce any mal-formatting so far with this, but I'll keep my eyes open.
Having said this, this is a very minor part of the task, and I don't mind leaving that out if it would help the BAGgers sleep peacefully at night. Quadell (talk) 15:09, 15 July 2008 (UTC)
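
The three-step fallback described above (doubly wrapped, then singly wrapped, then bare) might be sketched like this in Python. This is not the bot's Perl code; the names are hypothetical, the template is written with wikitext braces, and the choice of a column count such as reflist|2 from the wrapper's parameters is left out because the mapping is not given here.

```python
import re

REFS = r"<references\s*/>"
OPEN = r"<(?:div|span)\b[^>]*>"
CLOSE = r"</(?:div|span)>"

# Tried in order: two wrappers, one wrapper, then the bare tag.
# A wrapper only counts if it contains nothing except the tag and whitespace.
PATTERNS = [
    re.compile(rf"{OPEN}\s*{OPEN}\s*{REFS}\s*{CLOSE}\s*{CLOSE}", re.IGNORECASE),
    re.compile(rf"{OPEN}\s*{REFS}\s*{CLOSE}", re.IGNORECASE),
    re.compile(REFS, re.IGNORECASE),
]


def references_to_reflist(wikitext: str) -> str:
    """Replace the first <references /> (plus empty wrappers) with {{reflist}}."""
    for pattern in PATTERNS:
        new_text, count = pattern.subn("{{reflist}}", wikitext, count=1)
        if count:
            return new_text
    return wikitext


sample = "<div class=references-small>\n<references />\n* A manual reference\n</div>"
print(references_to_reflist(sample))
```

Because the wrapper patterns allow only whitespace around the tag, a div that also holds manual references falls through to the bare-tag case and keeps its formatting.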

Questions from Dispenser

Some questions from the DumZiBoT BRfA

Unfortunately, I'll be at the H.O.P.E. conference over the next few days and won't be able to answer your responses.

I'm on vacation too, just popping in. Thanks for those links and comments! In brief: it does append ((dead link)) to those links it happens to be looking up anyway to find the titles. The bot's soft-404 detection is rudimentary, and I expect to be adding to it as I review its changes. Its only blacklisted titles are the really obvious soft-404 cases, e.g. "file not found". That link to DumZiBoT's test cases is great! I'll get on that when I'm back in town. Also, the difference between this and DumZiBoT is that DumZiBoT only plows through database dumps, and it doesn't auto-cite books or magazines or journals, so it really has a different scope. There's some overlap in what they do, but really, Polbot#8 will affect different pages in different ways. We're collaborating though.
Okay, I'm diving back into vaca-mode now. Enjoy H.O.P.E.! I have several friends who are going, and are really looking forward to it. All the best, – Quadell (talk) 17:22, 17 July 2008 (UTC)

Update: HOPE is over, my vacation is over, and I'm going through DumZiBoT's test cases. I'll let you know when I'm ready here. – Quadell (talk) 15:02, 24 July 2008 (UTC)

Awe, ready for the second volley. Numbers correspond to the answers above.
  1. Does it also detect the wrappers that are sometimes around it, like the two columns?
  2. Fine.
  3. Fine.
  4. Watch out to make sure you aren't doing [http://en.wikipedia.org/wiki/Category:Firearms Linked category] -> [[Category:Firearms|Linked image]]. And would the first example that you give change a number [1] to example? See also Wikipedia:AWB/FR#External to Interwiki.
  5. Is there a wiki page where users could see and add to the list of templates? Mind you it doesn't need to pull it from the page.
  6. Could I see the source already? The one I use doesn't work when the link is in a ref on the same line.
  7. Same comment as on the DumZiBoT page: I would like to see better ref names, as editors ultimately don't change these names.
  8. This probably isn't useful, but if you wanted more you could always have AWB run the Perl script.
Did you rewrite all of DumZiBoT's code in Perl? If you did, wouldn't it have been easier to call the program? When will you be releasing the source code? – Dispenser 21:15, 25 July 2008 (UTC)
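
The category concern in point 4 can be made concrete with a small Python sketch (the bot itself is Perl). The function names and the list of namespaces that need a leading colon are assumptions for illustration; the en and pt examples are the ones from the function details above.

```python
import re
from urllib.parse import unquote

# Bracketed links to a Wikipedia article, with an optional label.
# URLs carrying a query string (e.g. ?action=edit) deliberately do not match.
WIKI_LINK_RE = re.compile(
    r"\[https?://([a-z\-]+)\.wikipedia\.org/wiki/([^\s\]?]+)(?:\s+([^\]]+))?\]"
)
# Assumed list: without a leading colon these would categorise the page
# or embed a file instead of producing a visible link.
COLON_NAMESPACES = ("category:", "image:", "file:")


def to_wikilink(match, home_lang="en"):
    lang, page, label = match.group(1), match.group(2), match.group(3)
    target = unquote(page).replace("_", " ")
    if lang != home_lang:
        target = f":{lang}:{target}"
    elif target.lower().startswith(COLON_NAMESPACES):
        target = ":" + target
    return f"[[{target}|{label}]]" if label else f"[[{target}]]"


def convert_wikipedia_links(wikitext: str) -> str:
    return WIKI_LINK_RE.sub(to_wikilink, wikitext)


print(convert_wikipedia_links(
    "[http://en.wikipedia.org/wiki/Category:Firearms Linked category]"
))
```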

I appreciate your questions, and I hope I'm understanding them all correctly.

These[http://www.example.com/1] are examples[http://www.example.com/2].
...into:
These<ref>[http://www.example.com/1 Title 1]</ref> are examples.<ref>[http://www.example.com/2 Title 2]</ref>
Is there something the code's not doing that you think it should?
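
For reference, the conversion shown above can be approximated with the following Python sketch. It is not the bot's Perl source: the names are hypothetical, titles are assumed to have been looked up already and are passed in as a dictionary, and the ((PDFlink)) exclusion is omitted.

```python
import re

# The "skip" alternatives are tried first, so links inside HTML comments,
# inside existing refs, or alone on a line are returned unchanged.
SKIP_OR_BEL_RE = re.compile(
    r"(?P<skip><!--.*?-->"
    r"|<ref\b[^>/]*>.*?</ref>"
    r"|^[*#: \t]*\[https?://[^\s\]]+\][ \t]*$)"
    r"|\[(?P<url>https?://[^\s\]]+)\](?P<punct>[.,;:]?)",
    re.DOTALL | re.MULTILINE,
)


def bels_to_refs(wikitext: str, titles: dict) -> str:
    """Wrap unlabelled bare external links in <ref> tags."""

    def wrap(match):
        if match.group("skip") is not None:
            return match.group(0)
        url = match.group("url")
        title = titles.get(url)
        link = f"[{url} {title}]" if title else f"[{url}]"
        # Trailing punctuation moves in front of the reference.
        return f"{match.group('punct')}<ref>{link}</ref>"

    return SKIP_OR_BEL_RE.sub(wrap, wikitext)


text = "These[http://www.example.com/1] are examples[http://www.example.com/2]."
titles = {
    "http://www.example.com/1": "Title 1",
    "http://www.example.com/2": "Title 2",
}
print(bels_to_refs(text, titles))
```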

Current status (2)

I think I've answered all the questions. I believe this bot is ready for Wikipedia, if Wikipedia is ready for this bot. – Quadell (talk) 13:12, 29 July 2008 (UTC)

 Approved. BJTalk 13:17, 29 July 2008 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.