The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: AlekseyFy (talk)

Automatic or Manually Assisted: No edits currently, Automatic if trial is successful

Programming Language(s): Python using wikitools

Function Overview: Automatic disambiguation of some ambiguous links using collaborative filtering.

Edit period(s): Daily

Already has a bot flag (Y/N): N

Function Details: Details of the proposed bot's operation appear on its user page. Right now, I would like to have the bot approved so that it can generate results to compare against human disambiguation without making any edits. If these trials are deemed successful, I will return here to ask for approval for it to commit edits itself.

Discussion

Are you requesting the bot flag just to make larger queries? Or, will the bot be making edits outside of its user space during the trial? How much data will it be transferring from the database? Wronkiew (talk) 04:38, 16 April 2009 (UTC)

The bot will not make any edits during the trial other than reporting its results in its user space, so yes, the request for now is to allow larger queries; I am also requesting the flag because I was advised to here.
As for data transfers, building the Mandarin test model required finding the backlinks for each target, including resolving redirects, which came to roughly 6000 pages. After that, the wikitext of the latest revision of each page had to be downloaded, and then the links in each page needed to be resolved (done through the API using a generator). I don't know how much data was transferred for all of that, but the saved results for that test (which do not store complete wikitext, only the context around the target links) take up roughly 27 MB. The process takes several hours. AlekseyFy (talk) 05:02, 16 April 2009 (UTC)
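As an illustration of the step of storing "only the context around the target links" rather than complete wikitext, here is a minimal sketch in Python. It is not the bot's actual code (which is not shown here); the names `link_contexts` and `normalize_title` and the `window` parameter are hypothetical. It assumes English Wikipedia-style title normalization (underscores become spaces, first letter capitalized) and deliberately ignores templates, nested links, and redirects, which the bot resolves separately through the API.

```python
import re

# Hypothetical sketch: matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Group 1 captures the link target without any section anchor or label.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def normalize_title(title):
    """Assumed enwiki-style normalization: underscores to spaces, trim, capitalize first letter."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def link_contexts(wikitext, target, window=100):
    """Return a snippet of up to `window` characters on each side of every link to `target`."""
    wanted = normalize_title(target)
    contexts = []
    for match in LINK_RE.finditer(wikitext):
        # Skip links that point somewhere other than the ambiguous target.
        if normalize_title(match.group(1)) != wanted:
            continue
        start = max(0, match.start() - window)
        end = min(len(wikitext), match.end() + window)
        contexts.append(wikitext[start:end])
    return contexts
```

For example, `link_contexts("He spoke [[Mandarin]] fluently and ate a [[mandarin orange|mandarin]].", "Mandarin", window=5)` returns `["poke [[Mandarin]] flue"]`: the piped link to "mandarin orange" is skipped because its target differs. Keeping only such snippets is one way the saved results could stay far smaller than the full wikitext of every page.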

Comment: This was previously discussed at Wikipedia:Bot requests/Archive 26#Link disambiguation bot, to general approval. This particular request, downloading the data for 6000 pages with a good objective in mind, looks uncontroversial to me. – Quadell (talk) 14:12, 16 April 2009 (UTC)

Speedily Approved. Go for it. 03:54, 19 April 2009 (UTC)


The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.