The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Fritzpoll (talk)

Automatic or Manually Assisted: Supervised automatic (with additional safeguards - see below)

Programming Language(s): VB .NET using the DotNetWikiBot classes

Function Summary: To insert stub articles for all remaining villages and towns in the world without existing articles - essentially to stop User:Blofeld of SPECTRE and others having to do it manually.

Edit period(s) (e.g. Continuous, daily, one time run): Limited to a certain number of countries per day (which will be selected manually by operator) to run at any time deemed suitable. Lifetime of the bot limited by finite number of countries requiring indexing

Already has a bot flag (Y/N): N

Function Details: The bot will use the website www.maplandia.com to index the National Geospatial Intelligence Agency files (http://earth-info.nga.mil/gns/html/index.html) that I have downloaded to my own computer. The bot will then create a subpage of Wikipedia:WikiProject Missing encyclopedic articles/Places/COUNTRYNAME, where COUNTRYNAME is the country that has been processed. It will list the places in a comma-delimited format:

[[English name(article title)]],original name,district name, latitude, longitude

Because the list will feature many repetitions of each district, a simple Find/Replace will allow adjustments to make sure the districts are properly wikilinked to existing articles. Once the directory has been created and the list has been checked at the various pages of Wikipedia:WikiProject Missing encyclopedic articles/Places, the operator will run a second piece of code to translate the list into stub articles, each of which will have the form of User:Fritzpoll/GeoBot/Example as used by Blofeld and others in their existing work. The bot is thus only able to create articles from complete and sourced data that other editors have checked through a two-step verification process.
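The list format above can be read back with a few lines of code. The following is only a minimal sketch in Python, not the bot's actual VB .NET code: the names `PlaceRecord` and `parse_place_line` are hypothetical, and the sample line is a made-up placeholder rather than real data. It assumes the first field is always a wikilink (so a title containing a comma is still read correctly) and that the remaining four fields contain no commas.

```python
import re
from dataclasses import dataclass

# Matches "[[Article title]]," at the start of a line and captures the rest.
LINE_RE = re.compile(r"^\s*\[\[(?P<title>[^\]]+)\]\]\s*,(?P<rest>.*)$")


@dataclass
class PlaceRecord:
    title: str          # article title, taken from inside the [[...]] link
    original_name: str  # name as given in the source data
    district: str       # district name (may itself be wikilinked later)
    latitude: float
    longitude: float


def parse_place_line(line: str) -> PlaceRecord:
    """Parse one '[[Title]],original name,district, latitude, longitude' line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"line does not start with a wikilink: {line!r}")
    rest = [part.strip() for part in match.group("rest").split(",")]
    if len(rest) != 4:
        raise ValueError(f"expected 4 fields after the title, got {len(rest)}")
    original_name, district, lat, lon = rest
    return PlaceRecord(match.group("title").strip(), original_name,
                       district, float(lat), float(lon))


# Placeholder input, for illustration only.
record = parse_place_line("[[Example Town]],Example,Example District, 12.5, 45.25")
print(record.title, record.latitude, record.longitude)
```

Because the title is taken from the wikilink rather than from a plain comma split, a reviewer can change the first field to a "Place, District" style title without breaking the parse.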

I am happy, if deemed necessary, to manually confirm any actions the bot takes, and naturally am able to restrict the bot for any trial BAG may wish me to run.

Discussion

I would strongly support such a proposal. Now we can start to concentrate on expanding and not creating such articles. This would be a massive operation and I would not be surprised if the number of articles exceeds 3 million within a month of bot approval. I'm an Editorofthewiki[citation needed] 21:14, 19 May 2008 (UTC)

Naturally I would strongly support this bot also, given that it is extracting coordinates from the National Geospatial Intelligence Agency, which is as great a site for obtaining names and coordinates as any, but using maplandia district organization for guidance. Maplandia cannot claim copyright as technically there is no "copying" being done from their website, as the place names are taken from google maps, and anyway the names and coordinates are clearly public domain. Based on the NGIA coordinates and names it is clear this is valid. If this is approved it will be one of the most important developments in wikipedia's history if we can get an article set up for every world location. It would strengthen this encyclopedia very powerfully, particularly if most of the articles can be expanded and developed from national government sources at a later date. In all honesty this should have been considered five years ago. The way Fritz has proposed this is top class I have to say, dare I say it "genius"? ♦Blofeld of SPECTRE♦ $1,000,000? 21:31, 19 May 2008 (UTC)

Endorsement by Keeper I'm a bit of a fish outta water here, being completely technologically dumb, and Bot/Bag inexperienced, but from what I can read from Blofeld, one of our most prolific article creators, and from what I can tell from the hard work of Fritzpoll, this appears to be one of the more astounding and important bot requests I've ever seen. Why? Because it expands Wikipedia into a comprehensive, geographical encyclopedia. It is a bot request for the benefit of our readers, and not just for the efficiency of our writers. This is an excellent bot proposal. I have nothing but confidence that this will be an astounding achievement, saving valuable editor's time, and making Wikipedia the go-to place for geographical information, something it should have been long ago. Just my two cents. Keeper | 76 | Disclaimer 21:38, 19 May 2008 (UTC)[reply]

Support - I am in no way qualified to address the technical issues regarding the bot, so I won't try. I do however think that a bot to perform these functions would be extremely valuable in several ways. One, obviously, it would create articles for all the occupied locations of the world. By so doing, it would also make things easier for individuals writing articles which deal with such locations, as they won't have to try to find an approximate location to put in the article if the article on that specific location already exists. And, of course, it would do all that automatically, saving all of us who have tried to do such work by hand the effort of having to individually create every article. It would also alert any WikiProject or other group that has interest in that area that there is good cause to think that the subject is notable, something that has occasionally been argued in the past. Obviously, it is only a first step in the development of those articles, but an incredibly important first step in any event. John Carter (talk) 21:46, 19 May 2008 (UTC)

Obviously I gave advice on developing this idea and such, so I can't wear my BAG hat here and approve it. I can however say I think it's a good idea and would support, say, a 100 article creation trial if another BAG approved. MBisanz talk 22:21, 19 May 2008 (UTC)

Good, this would clear up Special:Newpages. Sounds useful and, as said above, would help the quality. Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Soxred93 (u t) 22:23, 19 May 2008 (UTC)
Thanks a lot - will be back once 100 articles have been completed Fritzpoll (talk) 22:24, 19 May 2008 (UTC)[reply]
Well, the 100 edits were completed and checked. The checked bot output is at Wikipedia:WikiProject Missing encyclopedic articles/Places/afghanistan. The code won't let me create a page unless the account is bot flagged, presumably as a safety feature. The bot correctly rejected the bluelink at the top of the page, and I have extracted the wikitext output that the bot would have inserted to the new page at User:Fritzpoll/template/test. I'll allow the others to comment on the usability of the process. Where do we go from here? :) Fritzpoll (talk) 22:51, 20 May 2008 (UTC)[reply]
I like the sample, however, would it be OK if you can give a URL for each National Geospatial Intelligence Agency page? It's just a style preference, and it would probably look better as [http://NGIAwebsite.gov/whatever ((PAGENAME))] from the [[National Geospatial Intelligence Agency]]. Perhaps we could create a template so that all the bot has to do is type everything past the / or whatever. I'm an Editorofthewiki[citation needed] 00:13, 21 May 2008 (UTC)[reply]
Problem is that there isn't a single page per country. If you go to their website, it is just a series of download links to the files that contain a lot of this data. I then use maplandia to index this (the NGIA data has a lot of natural features as well) and then make the page. So we can't give a single source, but maybe we can direct the link to the exact page? Fritzpoll (talk) 00:25, 21 May 2008 (UTC)[reply]
I think that would bode better with the verifiability folks... I'm an Editorofthewiki[citation needed] 00:36, 21 May 2008 (UTC)[reply]
I can change the code, but I've reached the limit of my test edits so I will get the bot to reproduce the wikitext and I'll manually transfer it over tomorrow night. Fritzpoll (talk) 00:39, 21 May 2008 (UTC)[reply]

About how many articles total need to be created? WODUP 02:52, 21 May 2008 (UTC)[reply]

Difficult to answer because it depends on how many pre-existing articles there are. The Western world tends to be reasonably well covered (although there will be gaps) whilst Asia and Africa are poorly covered. Based on the front page of www.maplandia.com, assuming no articles on these locations existed, we would be looking at 2.12 million new pages. If we assumed full coverage of Europe and North America, this would drop to 1.37 million. This is why a) the bot is needed to complete at least stub-level coverage of the world, and b) the bot needs to run slowly and in stages. I would anticipate trying to run it country by country at quiet times, and am open to other suggestions if the scale becomes an issue Fritzpoll (talk) 11:53, 21 May 2008 (UTC)
Alright, sounds good. Thank you. WODUP 01:13, 22 May 2008 (UTC)[reply]

Looking at the 100 test pages, I note that there is a page Feyzabad in there, which is currently a redirect to an existing article on what looks to be the same location. So, a few questions: 1. Will you create this page over the redirect, even though it is a valid redirect? 2. In general, will you create the pages at X (location) or at X, Y (location, district)? 3. If the former, what will you do with identical names for different locations? Will only the first be created? (By the way, I support the proposal in general, and am glad that you use a seemingly reliable source and not the dreadful fallingrain.com or something similar) Fram (talk) 06:58, 21 May 2008 (UTC)[reply]

Good questions. 1) No. The bot assumes nothing about Wikipedia, and if the page exists, it ignores it completely. It will, however, output a log to me of pages that it did not create so that they can be reviewed manually 2) This is not a decision to be made by the bot. What the bot does is output the comma-separated list for each place as you have seen in the edits, and I then notify users like Blofeld and Editorofthewiki that the job has been completed. They then review the list and, by adjusting the first field, they can change the final title. If this has to be done a lot for a particular area, I can always make case-by-case adjustments within the bot to append text to the end of the title. But I think it is important to have real editors reviewing the lists before creation. Does this answer your question? Fritzpoll (talk) 11:53, 21 May 2008 (UTC)[reply]
Thanks, I'm satisfied. Good luck with the bot! Fram (talk) 12:20, 21 May 2008 (UTC)[reply]
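The behaviour described in answer 1 (never overwrite an existing page, including a redirect, and log what was skipped for manual review) could be sketched roughly as follows. This is an illustration in Python rather than the bot's real code; `plan_creations` and the `page_exists` callback are hypothetical names, and how existence is actually checked against the wiki is left to the caller.

```python
from typing import Callable, Iterable, List, Tuple


def plan_creations(
    titles: Iterable[str],
    page_exists: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split titles into pages to create and pages skipped because they exist.

    Any existing page counts as "exists", whether it is an article or a
    redirect, so nothing is ever written over existing content.
    """
    to_create: List[str] = []
    skipped: List[str] = []
    for title in titles:
        if page_exists(title):
            skipped.append(title)  # kept for the operator's review log
        else:
            to_create.append(title)
    return to_create, skipped


# Stand-in for a real existence check: a set of titles known to exist.
existing = {"Feyzabad"}
create, log = plan_creations(["Feyzabad", "Example Town"], existing.__contains__)
print("create:", create)
print("skipped, review manually:", log)
```

Keeping the existence check as a passed-in function means the same logic can be exercised against a plain set, as here, before it is pointed at a live wiki.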

((BAGAssistanceNeeded)) Trial limit reached, need comment :) Fritzpoll (talk) 13:59, 21 May 2008 (UTC)[reply]

Amendments have now been made to the bot per feedback. It will now post the list of required information in one go, rather than in individual edits. I have also added the correct categories in for the article creation process per discussions with Blofeld Fritzpoll (talk) 22:14, 21 May 2008 (UTC)[reply]

Support Although I understand nothing of the technical aspects of the bot, this is definitely a great project. I checked several sample pages generated by the bot, and asked a few questions, too, in the related talk pages. Looks excellent.--Dwaipayan (talk) 19:03, 24 May 2008 (UTC)

Even though I'm involved, since you were limited by the CAPTCHA, you are approved for a 100 page creation trial so we can judge the soundness of that task. MBisanz talk 21:12, 26 May 2008 (UTC)[reply]
Thanks very much - that's all I needed to see! Once we've got 100 checked items to create, I'll run the process, and get back to you here. Best wishes Fritzpoll (talk) 21:14, 26 May 2008 (UTC)[reply]

Encarta Links

Sorry for writing without a wiki user - I'm not a regular WP writer. This really looks like a very interesting project - the only thing that I noticed is that on your sample page, the encarta-link at the end does not seem to help very much. In the present form, wouldn't it be better to only have the maplandia-links? They seem to work for pointing to the location in question, whereas encarta doesn't even find the sample town that you used, if one searches for Aju manually. Just my 2 cents. 91.37.145.86 (talk) 14:06, 24 May 2008 (UTC)

I think this is because the initial run forgot to append the coordinates to the end of the URL, which would make the link point to a more detailed map of the area. Thank you very much for pointing this out! Fritzpoll (talk) 14:09, 24 May 2008 (UTC)[reply]

The MSN Encarta map is one of the best atlases on the web. If wikipedia had an advanced atlas like this then the link wouldn't be needed. The mini wiki atlas doesn't have the same level of detail. What you'd need to do is locate the exact location on the link rather than having to search for it ♦Blofeld of SPECTRE♦ $1,000,000? 14:43, 24 May 2008 (UTC)

Trial Completed

The bot run created 100 articles this evening. I am awaiting confirmation that they are correct, but my initial examination highlights only one possible mistake in page creation, to do with a missing space in the title. Overall, I think the bot is operating fine and ready to be flagged, but that is of course up to BAG! Fritzpoll (talk) 20:04, 27 May 2008 (UTC)[reply]

((BAGAssistanceNeeded)) Fritzpoll (talk) 11:09, 30 May 2008 (UTC)[reply]
(See bottom). dihydrogen monoxide (H2O) 11:20, 30 May 2008 (UTC)[reply]

The trial showed it to be extremely efficient and ready to be done on a mass scale. We now have 100 new articles within a few minutes on real world places which are consistently referenced and have maps and locators. Remarkable, the sooner this is approved the sooner the geographical coverage on wikipedia can even up ♦Blofeld of SPECTRE♦ $1,000,000? 20:26, 27 May 2008 (UTC)

I think this is an excellent idea. As it stands, creation of geographic articles has diverted a lot of resources that could potentially be used to firm up other areas of this project, and I think a bot would do a great deal of good work towards eliminating that problem. And as a consequence the encyclopedia can grow at a much faster rate. --User:AlbertHerring Io son l'orecchio e tu la bocca: parla! 20:48, 27 May 2008 (UTC)[reply]

Suggestions from Ganeshk

Could you please hold-off on a larger-run? I have some suggestions to give. I need some more time. Thanks, Ganeshk (talk) 21:56, 27 May 2008 (UTC)[reply]

No further edits can take place until the bot is approved anyway, so don't worry! Fritzpoll (talk) 21:58, 27 May 2008 (UTC)[reply]

Here is the list:

That's it for now. Thanks, Ganeshk (talk) 02:19, 28 May 2008 (UTC)[reply]

I can place it into whatever categories I'm told to - hopefully a consensus between you and the other editors involved can be reached on this. I'm not sure why the country would be excluded though... Altitude data would be nice, but one editor above has expressed disdain for fallingrain.com - don't know exactly why, but I'd rather hear from other editors about this. If the coordinates and elevation are in the infobox, or the top right of the page, does it need to have its own section in the article? As I say, I can do all of these things, but as they appear to contradict other requests, I'll have to wait and see what other editors think. Fritzpoll (talk) 10:39, 28 May 2008 (UTC)


First of all, normally I would recategorize by state level; notice the Burkina Faso etc. categories I have restructured in this way. However, as there are only 1100 articles and 34 provinces for Afghanistan, I thought initially it would be best to have them all in one category and then just add the general province categories. However, I think maybe this may be better organized if it is split. Most countries though will be categorized at state level first, depending on how many settlements there are. As for extracting altitude from falling rain: PLEASE DON'T. I used falling rain in the past and many of the altitude figures are unreliable; ask most people who have worked on geo settlements. One instance was a town on the coast of Madagascar reading at an altitude of 257 m. There are thousands of incorrect data readings on that site. Another instance was where I was adding towns in the mountains of Burma and one said 3500 m and the nearest settlement one mile away read as 460 m. The only thing which appears to be reliable is the coordinates and distance between settlements. There is no need to say something is located at something either, because this is catered for in the infobox and map, and two coordinate mini atlas icons, or even three as is being suggested, in one article is redundant - I often try to remove as many of them from the text as possible as it is untidy. I wouldn't have any objections though to the coordinates in the top right hand corner. As for stub templates, this will have to be arranged with User:Alai and the stub sorting group, but in my experience they are often unwilling to create categories in advance ♦Blofeld of SPECTRE♦ $1,000,000? 12:04, 28 May 2008 (UTC)

  • Replies:
  • I was expecting this bot to be consistent in category usage across countries, not have a separate rule for each country. It would be best if the bot pushed the articles into province-level categories for Afghanistan.
  • Altitudes from fallingrain are a starting point. If they are incorrect, they can be corrected. Something is better than nothing.
  • Geography section: About adding a line: See Wikipedia:WikiProject_Cities/Guideline#Geography. I quote: "Although it is often included as a header or in the infobox, absolute positioning coordinates (eg. template:coor dms) can be included here." I don't think it is untidy. Adding the altitude sentence is helpful too. Adding sections where possible will be helpful for future article expansion. It gives an article better structure than a single-line stub. I would suggest looking at any other sources for other infobox fields. It is easier to do these types of things on a new article, since existing articles will be different from each other and it would be difficult to program updates.
  • Stub templates: I have requested Alai and Grutness to weigh in here. There has got to be a quick way to create these templates. If possible, I am trying to avoid revisiting category splits at a later time.
  • Glad we agreed on top-right coord template.
Thanks, Ganeshk (talk) 14:18, 28 May 2008 (UTC)[reply]
Sorry, I totally disagree with the "something is better than nothing" policy. That sort of outlook is the very reason the media slay us for lack of accuracy. If the source is doubtful, which it is, and has been shown to be inaccurate many times, I seriously don't think it should be trusted just for the sake of adding a figure. Wikipedia should try to be as accurate as possible, and adding figures which have been shown to be way out before is really not a good idea. ♦Blofeld of SPECTRE♦ $1,000,000? 16:16, 28 May 2008 (UTC)
Someone rang? The enlightened compromise is to create upmerged per-state stub templates for now, populate those at our collective leisure, and then create the separate categories when they've passed "threshold". This looks like a classic instance where this is the thing to do. (It's not tragic if it doesn't happen, since if the state categories are placed, I can bot-sort them afterwards, but it would be unnecessary double-handling.) I'm not unutterably opposed to creation of the stub sub-types in advance if we know with high confidence when and how many articles we'll have on a per-state basis, but the upmerged templates route is a pretty flexible solution. Alai (talk) 14:21, 28 May 2008 (UTC)
Just to de-jargon that a little, by "upmerged Afghan state templates", what I mean is, one template per state, initially feeding into the current Category:Afghanistan geography stubs category. (See ((Poland-sports-venue-stub)), for example.) Now, if the bot-op is able to say in advance "Paktia-geo-stub will have 89 articles, and the bot-run to create them will be finished within the week", then we needn't be quite so cautious, and might as well create Category:Paktia Province geography stubs in advance. If the number or timeframe isn't known, however, the upmerger route avoids us sitting around wondering whether we'll have an underpopulated stub category around indefinitely, whether we should be looking to re-upmerge it, and the potential for grief and miscommunication that attends such. I just want to avoid as much angst and double-handling as possible. Especially if the angsty double-handling is on my part. Alai (talk) 14:46, 28 May 2008 (UTC)[reply]
As I say, just a humble bot operator, so I will do as I am asked with regard to categories and stub templates. If there are future issues, I'll be happy to request another task to make the adjustments myself so that you don't have to worry about it. Fritzpoll (talk) 15:16, 28 May 2008 (UTC)[reply]
It's certainly not a big worry in any event. "Best practice" would be to drop a line at Wikipedia:WSS/P before you start each run, and say in effect "about to create a load of stubs for country [X], we should split these up according to first-order/most-useful subdivision [Y], right?", and if no one objects in very short order, create upmerged templates on that basis (or see if any of the stub regulars volunteers to do so, sufficiently promptly). But we can cope with assorted practices short of that, too... Alai (talk) 15:44, 28 May 2008 (UTC)[reply]
I agree with Alai - normally we'd only make stub types after the fact, once stubs are already in existence, but given that there will be bot-runs it makes perfect sense to get them sorted first. I think the only likelihood of problems would be trying to work out viable names and divisions for some countries. We normally go by official subregions (provinces in the case of Afghanistan IIRC), with template names in the form of RegionName-geo-stub, though understandably there have to be variants when there are several possible transliterations of a name or the same name is found in several countries. It's as well to give us a quick chance to assess whether there are likely to be any problems with this before going ahead and making the templates (this is exactly the sort of reason why stub types are proposed rather than simply made on the fly, BTW). As to the categories, upmerging the templates to the main Afghanistan geography stubs category is a perfectly reasonable idea at least as a temporary measure - it should become apparent pretty quickly which provinces would have enough stubs for their own categories. Grutness...wha? 00:26, 29 May 2008 (UTC)[reply]
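The upmerging approach described by Alai and Grutness can be illustrated with a short sketch. It is written in Python purely for illustration; the function names are hypothetical, the "Province geography stubs" category pattern is inferred from the single Paktia example above, and the threshold is a parameter because the discussion does not state its value (the 60 used below is only a placeholder).

```python
def stub_template_name(region: str) -> str:
    """Template name in the 'RegionName-geo-stub' form mentioned above."""
    return f"{region}-geo-stub"


def stub_category(region: str, expected_count: int,
                  country_category: str, threshold: int) -> str:
    """Pick the category a regional stub template should feed into.

    Below the threshold the template is "upmerged" into the country-level
    category; at or above it the region gets a category of its own.
    """
    if expected_count >= threshold:
        return f"Category:{region} Province geography stubs"
    return country_category


# Figures taken from the example in the discussion; the threshold is assumed.
print(stub_template_name("Paktia"))
print(stub_category("Paktia", 89, "Category:Afghanistan geography stubs", threshold=60))
```

With this split, every article carries its regional template from the start, and moving a region to its own category later only means editing the template, not the articles.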
Proposal I hope we are all agreed that the task of the bot is a) useful and b) accomplishable per the trial. To get through the approvals procedure, I therefore propose the following with regard to these details: Before the run of each country for article creation, I will consult widely about the stubs templates and categories to apply. This won't prevent making the lists for people to check through, which I am currently limited in doing because I'm forced to do it manually, and we can have these discussions on a case-by-case basis in parallel with the article checking. How does that sound? Fritzpoll (talk) 23:56, 29 May 2008 (UTC)[reply]

Suggestions from Dr. Cash

I think this bot is a good idea, but it would really be nice if it would automatically put the class & importance assessments in the wikiproject for the talk pages (see this page for Wikipedia:CITIES assessment descriptions; the class for a newly created article would automatically be stub; WP:CITIES has a population-based assessment (you could read in the population and assign the importance assessment based on that)).

Without doing this, the sheer quantity of the articles that you're creating will create a metric buttload's worth of work in the assessments department for various wikiprojects. Dr. Cash (talk) 15:19, 28 May 2008 (UTC)[reply]

If you can find me a reliable source of population data, that's fine! I can dump the Wikiproject tags on, but I have no population data at present. I can always do this in a later run when a source of population data is found Fritzpoll (talk) 15:50, 28 May 2008 (UTC)[reply]

Yes of course, Cash, this is what we plan in regards to project tagging. I'm not sure about that population thing though, as there doesn't seem to be anything available. ♦Blofeld of SPECTRE♦ $1,000,000? 16:18, 28 May 2008 (UTC)

Hmm, I'm not sure where you'd find population data for such a large dataset there? I think you can safely automatically tag the new articles that are created as stub-class, since that's what they likely will be when you create them. Let other editors promote them as they see fit. As for the importance ratings, I suspect many of these will be small towns with low populations, so it may just be safe to automatically assess them as low-importance and let editors adjust that as they see fit as well. Unless you know that a particular town is a provincial, state, or territorial capital in their respective territory -- these are assessed at high-importance. National capitals are assessed as top-importance, but I think we're already covered there anyways. So I'd probably just go with an automatic stub-class/low-importance for these. But failing to put these in is going to create so much work for wikiproject assessments, and the backlog would become so big, that something's got to be done to automatically do it,... Dr. Cash (talk) 16:42, 28 May 2008 (UTC)[reply]
Agreed - I can add the code now, but the additional task is subject to BAG approval (as is the entire bot!). Perhaps some prominent notice to all the relevant Wikiprojects would be good, otherwise I'm bound to forget somebody Fritzpoll (talk) 17:34, 28 May 2008 (UTC)[reply]
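The default assessment rule Dr. Cash outlines (stub-class for every new article; low importance unless the place is a provincial, state or territorial capital, which is high, or a national capital, which is top) might be sketched as below. This is a Python illustration, not the bot's code; the function names are hypothetical, and the banner wikitext, including the template name and its `class`/`importance` parameters, is an assumption rather than something confirmed in this discussion.

```python
def importance_for(is_national_capital: bool = False,
                   is_regional_capital: bool = False) -> str:
    """Default importance rating following the rule described above."""
    if is_national_capital:
        return "top"
    if is_regional_capital:  # provincial, state or territorial capital
        return "high"
    return "low"


def talk_page_banner(is_national_capital: bool = False,
                     is_regional_capital: bool = False) -> str:
    """Build assumed WikiProject banner wikitext for a new stub's talk page."""
    importance = importance_for(is_national_capital, is_regional_capital)
    # Every newly created article starts as stub-class.
    return "{{WikiProject Cities|class=stub|importance=" + importance + "}}"


print(talk_page_banner())
print(talk_page_banner(is_regional_capital=True))
```

Editors remain free to adjust either rating afterwards; the point of the default is only to keep the assessment backlog from growing with each run.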

Wikipedia:FLAG

Hi. Great bot. Could you please not insert a flag into the infoboxes you create. The country name is plenty. Rettetast (talk) 23:24, 29 May 2008 (UTC)[reply]

That's a simple template change, and I can do that on-wiki. Assuming there are no objections, consider it done Fritzpoll (talk) 23:53, 29 May 2008 (UTC)[reply]

Well, it's only a matter of preference really. I don't accept the argument that flags affect an article's neutrality and I don't think they are ugly either. But I have no objections if you remove them ♦Blofeld of SPECTRE♦ $1,000,000? 09:58, 30 May 2008 (UTC)

Approved

Trial seems good, no problems. Go for it.  Approved. dihydrogen monoxide (H2O) 11:20, 30 May 2008 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.