The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: ThaddeusB

Automatic or Manually assisted: Manually assisted at first, to confirm that unusual page formatting doesn't trip it up, then fully automatic.

Programming language(s): Perl

Source code available: http://thaddeusb.awardspace.com/NRHP.txt

Function overview: Add NRHP historic district categories to pages with ((infobox nrhp)) as per this request.

Edit period(s): Most likely one time

Estimated number of pages affected: ~10-20% of the just under 20,000 pages that use ((infobox nrhp)), i.e. roughly 2,000-4,000 pages.

Exclusion compliant (Y/N): N

Already has a bot flag (Y/N): N

Function details: ((infobox nrhp)) currently automatically places pages into Category:Historic districts in the United States if nrhp_type is set to hd or nhld. The National Register of Historic Places Wikiproject wishes to remove this functionality and instead place each HD in its respective State category (e.g. Category:Historic districts in New York). In order to accomplish this, they need a bot to go through & put each HD into a category or else most won't have any HD category at all.
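As an illustration of the check described above (nrhp_type set to hd or nhld), here is a minimal Python sketch. It is not taken from the bot's Perl source; the helper names `infobox_param` and `is_historic_district` are hypothetical, and the regular expression assumes simple `| name = value` parameters without nested templates.

```python
import re

# Values of nrhp_type that mark a historic district, per the description above.
HD_TYPES = {"hd", "nhld"}

def infobox_param(wikitext, name):
    """Return the value of a '| name = value' parameter, or None.

    Hypothetical helper: the value is cut at the next pipe, closing brace,
    or newline, so piped links and nested templates are truncated.
    """
    pattern = r"\|\s*" + re.escape(name) + r"\s*=\s*([^|}\n]*)"
    match = re.search(pattern, wikitext, flags=re.IGNORECASE)
    if not match:
        return None
    value = match.group(1).strip()
    return value or None

def is_historic_district(wikitext):
    """True if the page's infobox declares nrhp_type as hd or nhld."""
    value = infobox_param(wikitext, "nrhp_type")
    return value is not None and value.lower() in HD_TYPES
```

A page whose infobox has no nrhp_type, or a value such as nhl, is treated as not being an HD and would be skipped.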

Instead of just writing code to add the generic category to every page, I have written code to further help the project by automatically putting most articles directly into the correct state category. The logic is as follows:

  1. Load list of pages that use ((infobox nrhp))
  2. Is the page an HD? If no, go to next page. If yes, continue.
  3. Does the page already contain Category:Historic districts in the United States? If so, temporarily strip it out
  4. Does the page contain any other HD category? If so, save without redundant US category (if needed) & go to the next page
  5. Try to determine what state the HD is located in by looking at locmapin parameter of the infobox
  6. If that fails, try the location parameter of the infobox
  7. If that fails, try the text of the article's lead section
  8. If all else fails or 2+ states are possible matches, use the generic US category
  9. Save & continue with the next page
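Steps 3 through 8 above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the bot's actual Perl code: the function names are hypothetical, `STATES` is a small illustrative subset, any category starting with "Category:Historic districts in " is assumed to count as an "other HD category", and two or more state matches at any stage are assumed to mean "ambiguous". The caller is assumed to have already confirmed that the page is an HD (step 2).

```python
import re

US_CAT = "Category:Historic districts in the United States"
# Illustrative subset only; a real run would need every state.
STATES = ["New York", "Ohio", "Virginia", "West Virginia"]

def infobox_param(wikitext, name):
    """Return a simple '| name = value' parameter value, or None (hypothetical helper)."""
    match = re.search(r"\|\s*" + re.escape(name) + r"\s*=\s*([^|}\n]*)", wikitext)
    value = match.group(1).strip() if match else ""
    return value or None

def states_in(text):
    """Return the set of known states named in text, longest names first
    so that 'West Virginia' is not also counted as 'Virginia'."""
    found, remaining = set(), text or ""
    for state in sorted(STATES, key=len, reverse=True):
        pattern = r"\b" + re.escape(state) + r"\b"
        if re.search(pattern, remaining):
            found.add(state)
            remaining = re.sub(pattern, " ", remaining)
    return found

def choose_state(wikitext):
    """Steps 5-8: try locmapin, then location, then the lead section."""
    lead = re.split(r"^==", wikitext, maxsplit=1, flags=re.MULTILINE)[0]
    for source in (infobox_param(wikitext, "locmapin"),
                   infobox_param(wikitext, "location"),
                   lead):
        matches = states_in(source)
        if len(matches) == 1:
            return matches.pop()
        if len(matches) > 1:
            return None  # assumption: 2+ candidates means fall back to the US category
    return None

def categorize(wikitext):
    """Return the page text with exactly one suitable HD category applied."""
    # Step 3: strip any direct national category.
    us_link = r"\[\[\s*" + re.escape(US_CAT) + r"\s*(?:\|[^\]]*)?\]\]\n?"
    stripped = re.sub(us_link, "", wikitext)
    # Step 4: another HD category is already present, so nothing to add.
    if re.search(r"\[\[\s*Category:Historic districts in ", stripped):
        return stripped
    state = choose_state(stripped)
    target = "Category:Historic districts in " + state if state else US_CAT
    return stripped.rstrip("\n") + "\n[[" + target + "]]\n"
```

Keeping `categorize` a pure text-in, text-out function means each proposed change can be displayed and reviewed locally before anything is saved.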

It will also make a log of all changes for possible rapid review by humans. The log of what it would do with the first 100 entries can be found here.

The bot has been programmed with the assumption that this discussion will result in the renaming of 4 non-standard categories. If for some reason that doesn't happen I will have to modify the code slightly.

Discussion

I have tested the first 150 or so results locally and found no issues. However, I plan to run this in "display each change locally before uploading mode" for a while just to be sure it isn't messing up when encountering unusual wiki formatting. --ThaddeusB (talk) 20:15, 17 September 2009 (UTC)

Approved for trial (20 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Mr.Z-man 00:48, 24 September 2009 (UTC)

Trial's fine with me, but I would like the outstanding issue of what is done about pages without locmapin or location parameters decided. --69.225.3.119 (talk) 01:09, 24 September 2009 (UTC)
The bot already outputs the result of each page it loads in a sortable table. If the WikiProject decides they want the list you described - or any other list - I will be happy to generate said list from the output tables at that time. --ThaddeusB (talk) 13:07, 24 September 2009 (UTC)
The original request that I filed at Bot Requests (with which other project members had already agreed) was to have the bot add Cat:HDs in the USA to all of these articles. I'm sure that none of us have a problem with the idea of the bot becoming confused and dumping an article into the nationwide category. Nyttend (talk) 01:27, 25 September 2009 (UTC)
That seems like a workable solution, then these can also be checked by the project members to try to get a state on them, particularly if an output table is generated with all of them. However, the project is on top of this, and however they want to handle it is fine, as they've already raised the issue. --69.225.5.4 (talk) 21:47, 25 September 2009 (UTC)

Trial complete. - Log. The only issue was a typo in the edit summary that caused it to point to a non-existent page, rather than the log. --ThaddeusB (talk) 04:52, 27 September 2009 (UTC)

The trial edits look fine, imo, and the project that requested the bot is monitoring the output, so I don't see any concerns. --69.225.5.4 (talk) 18:20, 27 September 2009 (UTC)

Is it intentional to leave the pages in the United States HD category when adding the more specific HD cat? - Kingpin13 (talk) 04:47, 30 September 2009 (UTC)
I didn't catch that. It shouldn't be done; that's overcategorizing if the state HD categories are subcats of the US HD cat. --69.225.5.4 (talk) 04:59, 30 September 2009 (UTC)
Yuh, I suspect that because the infobox automatically places pages into this category (and that can and will be removed from the template, rather than the pages), ThaddeusB hasn't set the bot up to see if the category is placed directly onto the page. But that's just my guess :) - Kingpin13 (talk) 05:05, 30 September 2009 (UTC)
Any direct categorization should be removed like it did here. Do you have an example where it didn't?
However, every single page is going to be in the national category since it is added by the infobox. That functionality will be removed from the infobox after the run is complete. --ThaddeusB (talk) 12:01, 30 September 2009 (UTC)
Bot seems to have worked just as it should have, except for the log upload failure; is that a major problem? The bot edited plenty of pages where there shouldn't be any historic district category at all, but that's an issue for WP:NRHP to take care of: the problem is that some articles with infobox nrhp shouldn't have them. It's best to have the bot do like it's doing and not try to determine whether the infobox really belongs there, so (in my mind) this is more evidence that the bot is working well. Nyttend (talk) 12:41, 30 September 2009 (UTC)
The log upload failed due to me using the wrong variable name, which has been corrected. --ThaddeusB (talk) 13:39, 30 September 2009 (UTC)

Approved. I was sure that this edit left a direct nation-wide category in, but obviously I must have misread one of the other cats. It appears that this task is wanted/useful, and the bot works great; none of us four seem to have a problem with the actual edits. Good to go - Kingpin13 (talk) 16:40, 30 September 2009 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.