The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Fritzpoll

Automatic or Manually assisted: Automatic

Programming language(s): VB .Net with the DotNetWikiBot Framework

Source code available: No, but can easily be made available on request

Function overview: Scans desired categories on other language Wikipedias and finds articles that don't exist on en-wiki. Adds these articles to a list in project space to allow a group of editors to find articles to write.

Edit period(s): Occasional

Estimated number of pages affected: 1 project space page per run.

Exclusion compliant (Y/N): Not applicable

Already has a bot flag (Y/N): Yes

Function details: Bot will be provided with categories on other language Wikipedias to scan, as well as a project space page to output results to. I envisage this happening in single runs with multiple category/project page pairs per run which is why I'm requesting approval rather than just copy/pasting output from an "offline" process. The bot takes the category, obtains the subcategory tree and then checks all articles within that category for an en: transwiki link. If an article does not have said link, it is output onto the project page. For a better idea, see the test output at Wikipedia:WikiProject Intertranswiki/Spanish/Culture/Bot Run, which is exactly what the output should look like.
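The procedure described above (walk the subcategory tree, collect the articles, keep those without an en: link) can be sketched roughly as follows. This is an illustrative Python sketch only, not the bot's actual VB .Net code: the in-memory dictionaries stand in for whatever the wiki API returns, and every function name, category name, article title and the bullet-list output format is a hypothetical placeholder.

```python
from collections import deque


def collect_articles(root, subcats, members):
    """Walk the subcategory tree breadth-first and return each article once."""
    seen_cats = {root}
    seen_titles = set()
    articles = []
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for title in members.get(cat, []):
            if title not in seen_titles:
                seen_titles.add(title)
                articles.append(title)
        for child in subcats.get(cat, []):
            # Category graphs can contain cycles, so visit each category once.
            if child not in seen_cats:
                seen_cats.add(child)
                queue.append(child)
    return articles


def missing_on_enwiki(articles, langlinks):
    """Keep only the articles that carry no 'en' interlanguage link."""
    return [t for t in articles if "en" not in langlinks.get(t, {})]


def format_report(lang, titles):
    """Render the result as a wikitext bullet list (assumed output format)."""
    return "\n".join(f"* [[:{lang}:{t}]]" for t in titles)


# Hypothetical sample data standing in for API responses.
subcats = {"Category A": ["Category B", "Category C"], "Category B": ["Category A"]}
members = {
    "Category A": ["Article 1"],
    "Category B": ["Article 2", "Article 1"],
    "Category C": ["Article 3"],
}
langlinks = {"Article 1": {"en": "Article one"}, "Article 2": {"de": "Artikel 2"}}

articles = collect_articles("Category A", subcats, members)
missing = missing_on_enwiki(articles, langlinks)
report = format_report("es", missing)
print(report)
```

With the sample data, "Article 1" is dropped because it has an en: link, and the other two are listed. An article whose en: link is missing simply because nobody has added it yet would still be reported, which matches the blue links discussed below.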

Discussion

This is greatly needed to draw up a directory of content missing from other wikis. It can be used as a tool to identify exactly what is missing and to work towards getting these articles into English as part of Wikipedia:WikiProject Intertranswiki. Himalayan 09:17, 7 October 2009 (UTC)

This looks fairly straightforward. I presume the bot will only run when a project approaches and asks you to do so for them? Or will it be running when WikiProject Intertranswiki wants it to? Why are there so many blue links at Wikipedia:WikiProject Intertranswiki/Spanish/Culture/Bot Run? At least one of the pages has existed since 2007. Also, I'd be interested personally in seeing the source code. - Kingpin13 (talk) 11:15, 7 October 2009 (UTC)
Some of the blue links arise because unrelated articles of the same name exist over here, and others because the interwiki links have not been established. I'll run it at the request of WikiProject Intertranswiki, but any other project could in theory approach for category extraction. Code is pretty rough and ready, but I'll post it here - don't judge me by its quality! Fritzpoll (talk) 11:36, 7 October 2009 (UTC)
Nah, most of the code looks pretty good (tho I haven't done vb for years). Although if I were you I'd use a foreach to add all the strings in returner to pp.text. Also, I can't spot anywhere that you've filtered out non-mainspace pages; is this intentional? Naturally the cats should only contain articles, but to be on the safe side, I'd filter the list. This should be a simple matter of calling pList.FilterNamespaces with an int array argument containing only 0 (article space) after loading pList in the For Each cat loop. - Kingpin13 (talk) 11:57, 7 October 2009 (UTC)
Good plan - I'll implement that filter. For..Each is something I'm not entirely used to, so I have a tendency to leap back to the standard form - I'll get used to that in time! Fritzpoll (talk) 12:07, 7 October 2009 (UTC)
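The two suggestions in this exchange (keep only namespace 0, and append the result lines with a for-each loop) might look like the following. This is a hedged Python illustration of the idea, not the DotNetWikiBot API and not the bot's code: the (namespace, title) tuples and both function names are assumptions made for the example.

```python
ARTICLE_NAMESPACE = 0  # the main (article) namespace on MediaWiki wikis


def filter_namespaces(pages, allowed=(ARTICLE_NAMESPACE,)):
    """Keep only pages whose namespace number is in `allowed`."""
    return [page for page in pages if page[0] in allowed]


def append_lines(text, lines):
    """Append each result line to the page text with a for-each style loop."""
    for line in lines:
        text += line + "\n"
    return text


# Hypothetical category members as (namespace, title) pairs.
pages = [(0, "Article 1"), (14, "Category:Stray subcategory"), (10, "Template:Stray box")]

kept = filter_namespaces(pages)
page_text = append_lines("", [f"* [[{title}]]" for _, title in kept])
print(page_text)
```

Filtering right after the category is loaded means stray templates or category pages never reach the en: link check.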
Great. Do you mind starting a thread at Wikipedia talk:WikiProject Intertranswiki about this? Or/and you could consider asking User:xeno and User:ThaddeusB for their input, as members of the project who also are familiar with bots (there might be others but those two "jumped out at me" :D). Unless anything comes up this is probably looking to get approved. - Kingpin13 (talk) 12:16, 7 October 2009 (UTC)
Sure, time for our due diligence. :) I'm an infrequently contributing member of BAG, so I should know these rules! Fritzpoll (talk) 12:19, 7 October 2009 (UTC)
If this task is approved, I'm not sure re-approval would be necessary in order to add those stats in subsequent code updates. Some of those would require some thought, and, yeah, a little time :) Get this approved first, and we can discuss what info to extract in addition. Fritzpoll (talk) 17:07, 7 October 2009 (UTC)
I'm quite sure it wouldn't require a further approval. :) --ThaddeusB (talk) 01:38, 8 October 2009 (UTC)
Thaddeus has a useful idea, but the thing is that this directory is intended to last for a long period, even decades. Article size and status on other Wikipedias is likely to get out of date very quickly, so in a few years' time the missing-list databases will likely contain false info about the current status on the other wiki. However, if we were to also generate lists of FA/GA articles and the top 2000 articles on another Wikipedia, using that traffic tool for instance, this would be a good idea. Himalayan 17:14, 7 October 2009 (UTC)

Doesn't look like we're gonna get much more input from the project, but since the three members who have commented here all seem to support it, and it seems fairly uncontroversial but helpful, it is Approved. Feel free to add on more information about the pages to the list. - Kingpin13 (talk) 08:14, 9 October 2009 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.