The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.


Operator: krimpet

Automatic or Manually Assisted: Automatic, unsupervised

Programming Language(s): Python, using the pywikipedia framework

Function Summary: Will tag the talk pages of open Tor exit nodes to indicate their open status, and tag blocked IPs that are no longer Tor nodes, with relevant categories

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: 4 edits per minute

Already has a bot flag (Y/N): N

Function Details: http://hemlock.ts.wikimedia.org/~krimpet/torlist.txt is an automatically generated list of Tor nodes that exit to the WMF servers. I have a small program, run every 6 hours by a cron job, that queries the authoritative directory, filters out nodes whose exit policy blocks access to the WMF IP ranges (as well as restrictive exit policies like *:80, *:443, and *:*), and writes the remaining nodes to this file.
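The filtering program itself is not shown in the request. As a minimal sketch of the core check, assuming Tor's ordered `accept`/`reject ADDR:PORT` exit policy rules where the first matching rule wins, the following decides whether a node can reach a WMF server on port 80 or 443. It handles IPv4 only, treats an address no rule matches as accepted (an assumption; real descriptors normally end with a catch-all rule), and uses the documentation address 192.0.2.10 as a hypothetical stand-in for a WMF server address. It does not attempt to reproduce the *:80 / *:443 / *:* filter.

```python
import ipaddress


def _port_matches(spec, port):
    """Match a port spec: '*', a single port like '80', or a range like '80-88'."""
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def _addr_matches(spec, addr):
    """Match an address spec: '*', a single IP, or a CIDR / netmask block."""
    if spec == "*":
        return True
    return addr in ipaddress.ip_network(spec, strict=False)


def policy_allows(policy_lines, ip, port):
    """Evaluate exit policy rules in order; the first matching rule decides."""
    addr = ipaddress.ip_address(ip)
    for line in policy_lines:
        action, target = line.split(None, 1)
        addr_spec, port_spec = target.rsplit(":", 1)
        if _addr_matches(addr_spec, addr) and _port_matches(port_spec, port):
            return action == "accept"
    # Assumption: an address that no rule matches is treated as accepted.
    return True


# Hypothetical placeholder (RFC 5737 documentation range), standing in for
# an address inside the WMF IP ranges.
WMF_SAMPLE_IP = "192.0.2.10"


def exits_to_wmf(policy_lines):
    """True if the node would carry HTTP or HTTPS traffic to the sample address."""
    return any(policy_allows(policy_lines, WMF_SAMPLE_IP, port) for port in (80, 443))


# Example: this node rejects the placeholder range, so it would be filtered out.
print(exits_to_wmf(["reject 192.0.2.0/24:*", "accept *:80", "reject *:*"]))  # False
```

Checking a single representative address keeps the sketch short; a fuller version would test every WMF range and decide how to treat policies that only partially cover them.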

I would like to propose a bot, written in Python with pywikipedia and also running on my toolserver account, that uses this list to identify active Tor exits on their talk pages, as well as IPs that are no longer Tor exits but are still blocked, so that administrators can block or unblock them if needed. It would do this simply by tagging IP talk pages with an appropriate category (perhaps Category:Tor exit nodes and Category:Blocked former Tor exit nodes?) and removing the category when it finds that it no longer applies. (I also hope to make this bot portable across WMF wikis, in case other projects want to use it.)
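One way the tagging decision could be made is sketched below; it is not the bot's actual code. It assumes torlist.txt holds one IP per line (blank lines and `#` comments skipped), uses the tentatively proposed category names, and relies on a hypothetical `seen_as_tor` set of IPs the bot has previously recorded as Tor exits, so that only former exits, not every blocked IP, receive the "former" category.

```python
from __future__ import annotations

from typing import Optional

# Category names as tentatively proposed in the request.
TOR_CAT = "Category:Tor exit nodes"
FORMER_CAT = "Category:Blocked former Tor exit nodes"


def parse_torlist(text: str) -> set[str]:
    """Assumed format: one IP per line; blank lines and '#' comments are ignored."""
    ips = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ips.add(line)
    return ips


def plan_tags(
    tor_exits: set[str], blocked: set[str], seen_as_tor: set[str]
) -> dict[str, Optional[str]]:
    """Map each relevant IP to the category its talk page should carry.

    None means the page should carry neither category (any existing tag
    should be removed).
    """
    plan: dict[str, Optional[str]] = {}
    for ip in tor_exits | seen_as_tor:
        if ip in tor_exits:
            plan[ip] = TOR_CAT      # currently an open exit
        elif ip in blocked:
            plan[ip] = FORMER_CAT   # no longer Tor, but still blocked
        else:
            plan[ip] = None         # no longer Tor and not blocked
    return plan
```

Computing the whole plan first and editing afterwards makes it easy to skip pages whose category is already correct, which matters at the requested rate of 4 edits per minute.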

Discussion


([1])

That should work; it worked for me, at least. You should also use regexes for the replacements, but I'm not too familiar with them, so I didn't include them myself. Mønobi 23:27, 27 February 2008 (UTC)

Using a continuous while loop instead of a cron job was an intentional design decision. A full run sometimes takes longer than 6 hours, and ideally only one copy of the bot should be running at a time, so the bot sleeps only while 6 hours have not yet passed; if more than 6 hours have already passed, it starts again immediately. The process sleep()s for two minutes between iterations of the while loop, so virtually no system resources are wasted. The overhead is probably lower than a cron job's, since cron would discard and start a new instance of the Python interpreter every time, instead of keeping one that slumbers in swap space waiting to start again. :) krimpet 16:49, 28 February 2008 (UTC)
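The scheduling described above can be sketched as follows. This is a reconstruction under stated assumptions, not krimpet's code: it assumes the 6-hour interval is measured from the start of the previous run, and `do_full_run`, `clock`, `sleep`, and `max_runs` are hypothetical parameters added so the loop can be tested without waiting hours.

```python
import time

RUN_INTERVAL = 6 * 60 * 60  # start a new full run at most every 6 hours
POLL_INTERVAL = 2 * 60      # sleep two minutes between loop iterations


def run_forever(do_full_run, clock=time.monotonic, sleep=time.sleep, max_runs=None):
    """Call do_full_run() repeatedly, starting no more often than RUN_INTERVAL.

    If a run takes longer than RUN_INTERVAL, the next one starts right away;
    otherwise the loop sleeps in POLL_INTERVAL steps until the interval has
    passed. max_runs exists only for testing; None means loop forever.
    """
    runs = 0
    last_start = None
    while max_runs is None or runs < max_runs:
        now = clock()
        if last_start is None or now - last_start >= RUN_INTERVAL:
            last_start = now
            do_full_run()
            runs += 1
        else:
            sleep(POLL_INTERVAL)
```

Because a single process owns the loop, a slow run simply delays the next one instead of overlapping with it, which is the "only one copy at a time" property described above. Using `time.monotonic` keeps the interval immune to wall-clock adjustments.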
If you tell me what the regexes are actually supposed to do, I could put them together really easily. Q T C 23:42, 27 February 2008 (UTC)
Just do something like:

    import re

    retor = re.compile(r"\[{2}Category:Tor exit nodes\]{2}")
    reformer = re.compile(r"\[{2}Category:Blocked former Tor exit nodes\]{2}")

and then use:

    text = retor.sub('', text)

Of course this wasn't based on any sort of study, so those regexes come with no warranty. Q T C 00:20, 28 February 2008 (UTC)
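Building on those regexes, a hypothetical helper `set_category` could remove and re-add a category link on an IP talk page. It is a sketch only: unlike the regexes above, it also tolerates a sort key (`[[Category:Tor exit nodes|...]]`), underscores in place of spaces, and a lowercase `category` prefix, and it trims trailing newlines from the page text.

```python
import re


def category_regex(category):
    """Match [[Category:NAME]] or [[Category:NAME|sort key]] for one category.

    Spaces in NAME may appear as spaces or underscores, and one trailing
    newline after the link is consumed so no blank line is left behind.
    """
    name = r"[ _]+".join(re.escape(word) for word in category.split())
    return re.compile(r"\[\[\s*[Cc]ategory\s*:\s*" + name + r"\s*(?:\|[^\]]*)?\]\]\n?")


def set_category(text, category, present):
    """Strip every link to `category`, then append one link if `present` is true."""
    text = category_regex(category).sub("", text).rstrip("\n")
    if present:
        link = "[[Category:%s]]" % category
        text = text + "\n" + link if text else link
    return text
```

Stripping before re-adding makes the helper idempotent, so the bot could compare the result with the current text and skip the edit when nothing changed.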

Approved. --uǝʌǝsʎʇɹnoɟʇs(st47) 12:44, 8 March 2008 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.