The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: WASD (talk)

Automatic or Manually Assisted: Manually assisted; makes no changes, wiki API only

Programming Language(s): PHP, Python

Function Summary: This bot needs a bot flag to use the Wikipedia API when working with large amounts of data. Current job: «Visitor attractions» and its foreign-language counterparts

Edit period(s) (e.g. Continuous, daily, one time run): one time run

Already has a bot flag (Y/N): No (ruWiki only - WASDbot)

Function Details: This bot does not commit any edits. It gathers information from certain categories of Wikipedia (using interwiki links) to update data, create work lists for users, synchronize categories (manually only), and so on.

Current job: «Visitor attractions» and its foreign-language counterparts.

This bot needs a bot flag so that it can use the Wikipedia API with a result limit larger than 500 when working with large amounts of data.
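A minimal sketch of why the flag matters (not the operator's actual code): with list=categorymembers, cmlimit=max resolves to 500 results per request for an ordinary account, but to 5,000 once the account has the apihighlimits right that comes with a bot flag. The example uses Python with the requests library; the category is simply the one named in this request.

import requests

API = "https://en.wikipedia.org/w/api.php"  # any language edition works the same way

params = {
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:Visitor attractions",
    "cmlimit": "max",   # 500 without apihighlimits, 5000 with a bot flag
    "format": "json",
}
members = requests.get(API, params=params).json()["query"]["categorymembers"]
print(len(members))     # capped at the account's per-request limit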

Discussion

I don't really understand what data this bot is collecting and why. BJTalk 22:47, 10 July 2008 (UTC)

This bot collects data about visitor attractions in every Wikipedia. It first analyzes a given category (Category:Visitor attractions, for example), then loads data about that category's subcategories and interwikis, and continues parsing data about each category it finds. The bot then analyzes all articles in the found categories. Finally, it creates lists of visitor attractions in different languages, creates links between them (and their categories), and saves these lists to my disk. This bot does not commit any automatic edits on the wiki! Besides that, the bot collects information about the images and coordinates used in the found articles for future tasks. || WASD (talk) 08:02, 11 July 2008 (UTC)
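A rough sketch of the crawl described above, assuming a breadth-first walk of Category:Visitor attractions through the API: fetch a category's members, queue its subcategories, and keep article titles for the offline lists. The interwiki (langlinks) step and the image/coordinate harvesting are omitted for brevity; names and structure are illustrative, not taken from the bot itself.

import time
from collections import deque

import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(cat):
    """Yield (title, namespace) for every member of a category,
    following API continuation so large categories come back in full."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": cat,
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for m in data["query"]["categorymembers"]:
            yield m["title"], m["ns"]
        if "continue" not in data:
            break
        params.update(data["continue"])
        time.sleep(5)  # the throttle discussed further down

def crawl(root="Category:Visitor attractions"):
    seen, articles, queue = set(), [], deque([root])
    while queue:
        cat = queue.popleft()
        if cat in seen:
            continue
        seen.add(cat)
        for title, ns in category_members(cat):
            if ns == 14:           # namespace 14 = Category
                queue.append(title)
            elif ns == 0:          # mainspace article
                articles.append(title)
    return articles

if __name__ == "__main__":
    print(len(crawl()))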
Sounds like this bot would read lots of data and create offline reports. That sounds fine to me... but would it be better to use a database dump for this? – Quadell (talk) 13:06, 14 July 2008 (UTC)
A dump of every single Wikipedia? Nope, I don't have that much disk space or bandwidth :) || WASD (talk) 08:13, 16 July 2008 (UTC)

How often will this bot be doing reads > 500? Will you be throttling it between large runs? SQLQuery me! 18:57, 15 July 2008 (UTC)

A pause of 5-6 seconds between each API request for 500 (5,000) articles, and pauses of several hours between large runs. || WASD (talk) 08:13, 16 July 2008 (UTC)
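A hedged sketch of the pacing described in the reply above: roughly 5-6 seconds between API requests and a multi-hour pause between large runs. The concrete numbers simply restate the figures from the comment (the 3-hour run pause is an assumed value for "several hours"); nothing here comes from the actual bot.

import random
import time

REQUEST_DELAY = (5.0, 6.0)   # seconds between individual API requests
RUN_PAUSE = 3 * 3600         # assumed pause between large runs ("several hours")

def run_job(batches):
    """Call each batch-fetching callable in turn, sleeping 5-6 s between
    requests, then pause before the next large run begins."""
    results = []
    for fetch in batches:                      # each item is a zero-argument callable
        results.append(fetch())
        time.sleep(random.uniform(*REQUEST_DELAY))
    time.sleep(RUN_PAUSE)                      # wait before starting the next large run
    return results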
Well, at this point I'm inclined to indef-block the account (until it's approved for editing) and approve this bot. Any objections? SQLQuery me! 21:29, 16 July 2008 (UTC)
OperatorAssistanceNeeded template deleted by WASD. BJTalk 06:49, 22 July 2008 (UTC)
Oh, you were speaking to me, not to other admins :) Sure, I'd be satisfied with that decision. || WASD (talk) 05:50, 23 July 2008 (UTC)

I don't know; this seems to be a clear-cut example of something explicitly disallowed under WP:Bot policy, regardless of the delay.

Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted.

It's up to you, but I'd suggest getting a dev opinion first. Q T C 05:57, 23 July 2008 (UTC)

The difference is that this bot does not request individual pages at all; it uses the MediaWiki API for multipage requests. || WASD (talk) 07:50, 23 July 2008 (UTC)
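A minimal sketch of the "multipage request" referred to here: instead of fetching pages one by one, a single query call can carry many titles at once (50 per request for ordinary accounts, 500 with apihighlimits). The example pulls interwiki links for a small batch in one round trip; the titles are illustrative, not taken from the bot's work lists.

import requests

API = "https://en.wikipedia.org/w/api.php"
titles = ["Eiffel Tower", "Statue of Liberty", "Big Ben"]   # illustrative batch

params = {
    "action": "query",
    "prop": "langlinks",
    "titles": "|".join(titles),   # one request covers the whole batch
    "lllimit": "max",
    "format": "json",
}
pages = requests.get(API, params=params).json()["query"]["pages"]
for page in pages.values():
    print(page["title"], len(page.get("langlinks", [])))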

If you need to datamine Wikipedia, use a dump. The API is clearly not meant to make datamining easier. NicDumZ ~ 10:25, 24 July 2008 (UTC)

Would you mind reading the whole discussion? I cannot download a dump of every wiki; it's nearly impossible. || WASD (talk) 13:12, 24 July 2008 (UTC)
If downloading the relevant dump files is too much for you, then scanning the API massively would be too much for both you and the Wikimedia servers. And being rude won't help your case. – Quadell (talk) 17:05, 24 July 2008 (UTC)
No offence, Quadell, but where did you find something rude? It was NicDumZ who didn't read the discussion, not me. Besides, what is the relation between "too much to download" and "too much for the Wikipedia API"? Downloading means ALL of Wikipedia, while scanning touches only a few categories; there is no connection between the size of a database dump and my bot's parser. You are being inconsistent.
I could give you ten other reasons (for example, the dumps of different language Wikipedias differ in date, sometimes by several months), but why should I? || WASD (talk) 20:21, 24 July 2008 (UTC)

The API explicitly allows for content to be taken from it. It has this feature for a reason. If there is a legitimate reason for extracting the data (which there seems to be here) and a bot flag will make this job easier and more efficient, I see no reason to deny the request. Those who are worrying about performance issues really shouldn't be. The bot is appropriately throttling itself, and if it doesn't, the server admins will notice and take care of the issue. --MZMcBride (talk) 20:22, 25 July 2008 (UTC)

Approved. BJTalk 07:29, 26 July 2008 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.