The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Jamo2008 (bot account: JBradley_Bot)

Time filed: 19:36, Sunday July 15, 2012 (UTC)

Automatic, Supervised, or Manual: Automatic, with any page that doesn't follow the expected format being flagged and me coming back to it and updating it manually.

Programming language(s): Java

Source code available: Not currently available

Function overview: Insert 2010 census data into US city articles.

Links to relevant discussions (where appropriate): Originally posted here (permalink). You can also look at my talk page, where several different people have talked to me about the edits I carried out before being informed that I'm supposed to apply for permission. Note that a large number of those edits were reverted since I wasn't approved. I will change my bot so that it adds the 2010 data as a new subsection rather than replacing the 2000 data, and I'll also add some more references.

Edit period(s): Off and on. I have to change links in it every time I switch states, and I have to look at the pages it flagged as not following the expected format to see what I should do with them manually.

Estimated number of pages affected: All cities in the US ~30k if I'm not mistaken

Exclusion compliant (Yes/No): No, I'm not aware of what this does, but if it's as simple as checking for that template anywhere in the article and not editing if it has it then I can easily add that.

Already has a bot flag (Yes/No): No

Function details: 1. It would check whether the opening paragraph of the page has a sentence stating the population. If that sentence already uses the 2010 data, it would be left alone; if it still uses the 2000 data, it would be replaced with "The population was #populationHere at the 2010 census."
2. It would update the area statistics in the infobox.
3. It would update the population statistics in the infobox; if the auto function is currently in use to calculate density, I leave that in place.
4. It would replace the sentence in the Geography section that says how much total, land, and water area the place has with a new sentence that follows the exact same pattern only with up to date info.
5. It would add a new subsection to the Demographics section titled 2010 Census, and insert 3 paragraphs similar to the ones used for the 2000 data, using 2010 census data.
6. I would add a subsection title before the older data, and I would like input as to what that name should be. 2000 Census makes the most sense at first, but several cities have had their median income info updated to a more recent date, and that info would also be in this section, so that title might not make the most sense.
7. For Iowa only, I would be adding historical population tables. This wouldn't be automatic; the bot would only assist me by taking me to the pages where I'd be inserting tables by hand.
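Step 1 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot's actual code (which is not posted here): the class name, method name, and the single regular expression are hypothetical, and a real run would need to recognize many more sentence variants than this one pattern.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadPopulationUpdater {
    // Hypothetical pattern covering one sentence variant only.
    private static final Pattern POP_SENTENCE = Pattern.compile(
            "The population was ([0-9,]+) at the (2000|2010) census\\.");

    // Returns the lead text with a 2000-census sentence swapped for the
    // 2010 figure, or unchanged if it already cites the 2010 census.
    static String update(String lead, int population2010) {
        Matcher m = POP_SENTENCE.matcher(lead);
        if (!m.find()) {
            // Unknown wording: refuse to edit so the page can be reviewed by hand.
            throw new IllegalStateException("no recognized population sentence");
        }
        if (m.group(2).equals("2010")) {
            return lead; // already up to date, leave alone
        }
        String replacement = String.format(Locale.US,
                "The population was %,d at the 2010 census.", population2010);
        return lead.substring(0, m.start()) + replacement + lead.substring(m.end());
    }
}
```

Throwing when nothing matches keeps the "don't edit what you don't recognize" behavior described in the request, at the cost of sending more pages to manual review.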

I was entirely unaware that you had to go through an approval process; I just thought it'd be a neat summer project to write a bot and update cities. I was originally just going to do it for Iowa, but then decided to keep going with it and ended up updating North Dakota, South Dakota, Nebraska, Kansas, and some of Missouri before the majority of my edits were reverted and I was told I needed to apply first. I apologize for any rules I broke; I was unaware of this process at the time. If approved, I will do some more coding so that the bot behaves as above rather than just replacing the 2000 data as it currently does, and I will add more references rather than just referencing the paragraphs in the Demographics section. Jamo2008 (talk) 19:36, 15 July 2012 (UTC)

Discussion

A few questions:
  1. How familiar are you with the Bot policy?
  2. Is there some reason why the source code isn't available? (note that this isn't a requirement, although it is preferred by some to allow better oversight of your code and assist with identification of bugs, and to fit with the whole open source philosophy)
  3. Are you using an existing bot framework for Java, or using your own?
  4. What exactly is the "expected format" you mention?
  5. Nyttend noted on WP:BOTREQ that your bot was "changing around geography numbers against the sources" - it sounds like that's what you'll continue to be doing in step 4. How will you ensure that the data is properly referenced?
  6. On step seven, you mention it takes you to those pages to do by hand, how is it doing this? Would those manually added tables be done by you through the bot's account or your own? Is the bot then fully automatic or manually assisted?

Also, I think this would need to be exclusion compliant; as a bot operator, you do need to know what this entails, although you are right in that it isn't too much more than checking for ((nobots)). That template's documentation page provides more information as well as code snippets you can use. Hersfold non-admin(t/a/c) 15:33, 18 July 2012 (UTC)
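For illustration, a minimal exclusion check in Java might look like the sketch below. It assumes the template appears in page wikitext in the ordinary double-brace form, and it only handles a bare nobots template and a bots template with a deny list; allow lists and opt-out parameters are deliberately left out. The class and method names are hypothetical, and the template's documentation remains the authority on the full rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExclusionCheck {
    // Bare {{nobots}} anywhere in the page.
    private static final Pattern NOBOTS = Pattern.compile(
            "\\{\\{\\s*nobots\\s*\\}\\}", Pattern.CASE_INSENSITIVE);
    // {{bots|deny=Name1,Name2}} ; group 1 is the comma-separated deny list.
    private static final Pattern DENY = Pattern.compile(
            "\\{\\{\\s*bots\\s*\\|\\s*deny\\s*=\\s*([^}|]*)", Pattern.CASE_INSENSITIVE);

    // Returns false if the page asks this bot (or all bots) to stay away.
    static boolean mayEdit(String wikitext, String botName) {
        if (NOBOTS.matcher(wikitext).find()) {
            return false;
        }
        Matcher m = DENY.matcher(wikitext);
        if (m.find()) {
            for (String name : m.group(1).split(",")) {
                String n = name.trim();
                if (n.equalsIgnoreCase("all") || n.equalsIgnoreCase(botName)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The check would run on the fetched wikitext before any edit is built, so an excluded page is skipped without further processing.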

  1. I've read it once so I'm somewhat familiar with it.
  2. It is on Google code svn if anybody really wants to look at it.
  3. I coded it from scratch.
  4. Lots of little things. For example, I check for about 30 different sentences in the opening section of the city page that would have either updated data or old data that needs to be replaced, and if my code doesn't find any of those sentences it will throw an exception and not edit the page. Another example: if certain fields in the infobox, such as |population_density_sq_mi, aren't found, it will throw an exception and not edit.
  5. The reference that is usually in the Geography section links to a page with the 1990, 2000, and 2010 gazetteer files. If you click on the 2010 link and go to places, that is where my data is coming from. My guess is he's going to the 2000 data since that is what the reference was originally meant for. I wasn't changing it because I'm using the same page; you just have to click on a different link. I can simply add another reference to the end of my sentence to be completely explicit that I'm using the 2010 data if needed.
  6. For both the tables and any page that has an exception, I have a cleanup mode where it reads from a file of links, and I simply click a button and it takes me to the next page when I'm done. For tables, I'd just give it a file with all the Iowa city links in it and it would take me to them; I'd hit another button that generates the code for the table and puts it on my paste buffer, so all I have to do is click where I want it and paste it in. Cleanup of pages with exceptions works roughly the same way: it takes me to the page, tries to edit it, then has popup boxes come up and tell me what went wrong. Since the editing process is split into parts, the parts that didn't have exceptions will be correct at that point, and I simply go to the sections that had a problem and fix them by hand, which usually involves clicking a button to generate the needed sentence or paragraph and pasting it in. I can do this from either account; it doesn't matter to me, just let me know which one you'd prefer me to do it on. Jamo2008 (talk) 17:05, 18 July 2012 (UTC)
  1. Good. :-)
  2. Just wondering, not too fussed about it.
  3. Although now that you mention that I am a bit curious to see your implementation (all of my bots are in Java)
  4. Hmm. Ok. I suspect you may end up getting a lot of pages in your "malformed" list, though.
  5. You need to update the reference to link to the correct page. This is an expectation across Wikipedia - a reader shouldn't have to hunt for the information, it should be directly available on the linked page.
  6. Interesting. If the code is generated by the bot, I don't see an issue with it being done from the bot's account. My concern there was that you would be manually entering code and saving it as the bot; any issues may then be misattributed to a problem with the bot rather than operator error. I would like some indication in the bot's edit summaries when it is running in this cleanup mode, however, as it sounds as though that would be closer to manually assisted editing (a la AutoWikiBrowser) as opposed to it running off on its own.
I'd expect that this is probably good for some trial runs with the requested changes, but let's wait for a few more comments first. Hersfold non-admin(t/a/c) 18:06, 18 July 2012 (UTC)
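As a rough picture of the "throw an exception and don't edit" behavior discussed in point 4, the following Java sketch checks each page for a required infobox field and collects the titles that fail into a "malformed" list for later manual cleanup. All names here are hypothetical and the single-field check stands in for the many checks the operator describes; it is not the bot's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class FormatGuard {
    // Hypothetical exception type signalling a page outside the expected format.
    static class UnexpectedFormatException extends Exception {
        UnexpectedFormatException(String message) {
            super(message);
        }
    }

    // Throws if the wikitext has no "|field =" parameter.
    static void requireInfoboxField(String wikitext, String field)
            throws UnexpectedFormatException {
        Pattern p = Pattern.compile("\\|\\s*" + Pattern.quote(field) + "\\s*=");
        if (!p.matcher(wikitext).find()) {
            throw new UnexpectedFormatException("missing infobox field: " + field);
        }
    }

    // pages maps title to wikitext; returns the titles that must be handled by hand.
    static List<String> collectMalformed(Map<String, String> pages, String field) {
        List<String> malformed = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            try {
                requireInfoboxField(page.getValue(), field);
                // A real run would build and save the edit here.
            } catch (UnexpectedFormatException e) {
                malformed.add(page.getKey()); // no edit is made for this page
            }
        }
        return malformed;
    }
}
```

Because the check happens before any edit is built, a page that fails is left untouched rather than partially updated.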

This is more or less just a post to get it on people's watchlists, but I do need to do some coding on this and I'd like to have a yes or no on the trial before I put the time into doing that. Jamo2008 (talk) 02:32, 24 July 2012 (UTC)

Has the bot been changed to update references to the 2000 census data when it changes that data? Thanks, — madman 18:25, 24 July 2012 (UTC)
The current reference associated with the 2000 census data paragraphs in the Demographics section won't need to be updated since those paragraphs are going to be put into their own subsection, and the new paragraphs added for the 2010 data do have a reference in them. In the Geography section, the reference comes after the sentence that gives the coordinates for the location. I'm not going to alter that sentence, so I won't change that reference either; instead I'll add a new reference after the sentence giving the area statistics, since that is the one I'll be updating. In the infobox I will add references to the footnotes fields of the area and population sections. Most of these changes still need to be coded; I'm waiting for the yes or no on a trial before I go ahead and do that so I don't end up wasting my time if I get turned down. Also, as a side note, I've noticed that a template such as ((GR|2)) is currently used for referencing. Is there any way I could adapt this or make my own rather than use regular reference tags for everything? Jamo2008 (talk) 00:03, 25 July 2012 (UTC)
That sounds all right, I think. I'll have to see the results of the trial to be sure. I suspect, as Hersfold does, that you're going to end up with a lot of pages in your "malformed" list, but you are Approved for trial (10 edits) as soon as you incorporate the changes discussed above (including exclusion compliance, as these edits have the possibility of being contentious). Please provide a link to the relevant contributions and/or diffs (only the automatic changes, please) when the trial is complete. Thanks, — madman 19:17, 25 July 2012 (UTC)
Sounds good. I'll post here when I get done with my changes and run it on 10 pages. Jamo2008 (talk) 01:18, 26 July 2012 (UTC)

I've run the 10 edits for my trial on my bot account JBradley_Bot. I had one issue: I had to type in a word on my edits since I was adding a new reference. If I have to do this for every page my bot won't work, but if this goes away after a certain number of edits it will be fine. Also, there is a feature that I commented out for these edits (although none of these pages had this problem, so it didn't really matter) that I would like to explain to see if it's okay if I uncomment it. I'd estimate that, of the pages not in a major metro that I've come across whose Demographics paragraphs claim to be from 2010, 80% really aren't using the 2010 data. What has happened for many of them is that somebody came in, updated the population, changed the census date, and left all the other data the same. This causes a ton of pages to come back as malformed, so I wrote code for the case where the 2000 paragraphs aren't found: it looks for paragraphs claiming to be from 2010 and checks the number of households. If the number is correct, it puts the page in the malformed list so I can look it over to be sure; if the number is wrong, it just deletes the paragraphs and replaces them with mine.
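The household check described above could be sketched in Java along these lines. This is a hedged illustration only: the class, enum, and pattern are hypothetical, the "N households" wording is an assumption about how the demographics paragraph is phrased, and sending unverifiable paragraphs to manual review is a conservative choice made for this sketch rather than something stated in the discussion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleCensusCheck {
    enum Action { FLAG_FOR_REVIEW, REPLACE_PARAGRAPHS }

    // Assumed wording, e.g. "... 168 people, 80 households, and 50 families ..."
    private static final Pattern HOUSEHOLDS =
            Pattern.compile("([0-9][0-9,]*) households");

    // Decides what to do with a paragraph that claims to use 2010 data.
    static Action decide(String paragraph, int households2010) {
        Matcher m = HOUSEHOLDS.matcher(paragraph);
        if (!m.find()) {
            // Cannot verify the claim, so leave it for a human (sketch's assumption).
            return Action.FLAG_FOR_REVIEW;
        }
        int claimed = Integer.parseInt(m.group(1).replace(",", ""));
        // A matching count suggests genuine 2010 data; still reviewed by hand.
        // A mismatch suggests 2000 data with only the date changed.
        return claimed == households2010
                ? Action.FLAG_FOR_REVIEW
                : Action.REPLACE_PARAGRAPHS;
    }
}
```

Only the mismatch case leads to an automatic replacement, which mirrors the behavior proposed for the commented-out feature.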

As of posting this, there appears to be some server lag and the bot's history isn't showing the edits, so I'll list the ten cities I did: Woolstock, Iowa; Rowan, Iowa; Goldfield, Iowa; Manly, Iowa; Kensett, Iowa; Joice, Iowa; Hanlontown, Iowa; Smithland, Iowa; Sloan, Iowa; Salix, Iowa.
Marking this Trial complete so it can be reviewed by me or someone else later. — madman 21:51, 5 August 2012 (UTC)
The trial looks very good. The only change I would request is that the 2010 Census section use the ((convert)) template as the 2000 Census section did, so if a change were made to the template it'd affect everything a contributor might expect and there wouldn't be inconsistent content between sections. If you'll make that change, I see no impediment to this task being approved. Thanks, — madman 04:08, 6 August 2012 (UTC)
I made two changes: the first is to the population density stats in the first demographic paragraph, and the second is to the area stats in the sentence in the Geography section. Here is what the changes look like (wikitext first, then the rendered result).
The [[population density]] was ((Pop density|168|1.06|sqmi|km2)).
The population density was 158.5/sq mi (61.2/km2).
According to the [[United States Census Bureau]], the city has a total area of ((convert|13.44|sqmi|sqkm|2)), of which, ((convert|10.8|sqmi|sqkm|2)) of it is land and ((convert|2.64|sqmi|sqkm|2)) is water.
According to the United States Census Bureau, the city has a total area of 13.44 square miles (34.81 km2), of which, 10.8 square miles (27.97 km2) of it is land and 2.64 square miles (6.84 km2) is water. Jamo2008 (talk) 02:52, 8 August 2012 (UTC)
Excellent. Approved. madman 14:26, 8 August 2012 (UTC)
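A small Java sketch of how such template calls might be generated, and how the rendered km2 figures can be sanity-checked, is given below. It assumes the parenthesised forms quoted above stand for ordinary double-brace template calls in wikitext, and the class and method names are hypothetical. The conversion factor is the exact value of one square mile in square kilometres (1.609344 squared).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class ConvertTemplate {
    // Exact number of square kilometres in one square mile.
    private static final BigDecimal KM2_PER_SQMI = new BigDecimal("2.589988110336");

    // Builds the template call; the figure is passed as text to keep "10.8" as written.
    static String area(String squareMiles) {
        return "{{convert|" + squareMiles + "|sqmi|sqkm|2}}";
    }

    // Reproduces the two-decimal km2 value the template is expected to display.
    static String expectedKm2(String squareMiles) {
        return new BigDecimal(squareMiles)
                .multiply(KM2_PER_SQMI)
                .setScale(2, RoundingMode.HALF_UP)
                .toPlainString();
    }
}
```

With the figures from the sample sentence, `expectedKm2("13.44")` gives `34.81`, `expectedKm2("10.8")` gives `27.97`, and `expectedKm2("2.64")` gives `6.84`, matching the rendered text quoted above.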
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.