The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was

Approved.

KrimpBot

Operator: krimpet ✽

Automatic or Manually Assisted: Automatic, unsupervised

Programming Language(s): Python, using the pywikipedia framework

Function Summary: Will tag the talk pages of open Tor exit nodes indicating their open status, and tag blocked non-nodes, with relevant categories

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: 4 edits per minute

Already has a bot flag (Y/N): N

Function Details: http://hemlock.ts.wikimedia.org/~krimpet/torlist.txt is an automatically generated list of Tor nodes exiting to the WMF servers. I have a small program, run every 6 hours by a cron job, that queries the authoritative directory, filters out nodes whose exit policy blocks access to the WMF IP ranges (as well as restrictive exit policies like *:80, *:443, and *:*), and writes the remaining nodes to this file.
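As a rough illustration of the filtering step described above, here is a minimal sketch of checking whether a node's exit policy allows a connection to a given address and port. It is not the actual program: the function names are hypothetical, the target address is a placeholder from a documentation range rather than a real WMF address, only IPv4 patterns are handled, and it assumes rules are evaluated in order with the first match winning and an unmatched address being accepted.

```python
import ipaddress


def rule_matches(pattern, ip, port):
    """Return True if an 'address:port' exit pattern covers ip:port."""
    addr, _, ports = pattern.rpartition(":")
    if addr != "*":
        network = ipaddress.ip_network(addr, strict=False)
        if ipaddress.ip_address(ip) not in network:
            return False
    if ports == "*":
        return True
    # A port spec is either a single port ("80") or a range ("1-1024").
    low, _, high = ports.partition("-")
    return int(low) <= port <= int(high or low)


def exit_allowed(policy, ip, port):
    """Evaluate 'accept'/'reject' rules in order; the first match decides."""
    for line in policy:
        action, pattern = line.split(None, 1)
        if rule_matches(pattern.strip(), ip, port):
            return action == "accept"
    # Assumption: an address matched by no rule is treated as accepted.
    return True


# Placeholder target; a real run would check the WMF ranges instead.
TARGET_IP = "198.51.100.10"
policy = ["reject 10.0.0.0/8:*", "accept *:80", "accept *:443", "reject *:*"]
print(exit_allowed(policy, TARGET_IP, 80))
```

A node would be kept in the list only if a check like this succeeds for the target ranges on the relevant web ports.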

I would like to propose a bot, written in Python with pywikipedia and also running on my toolserver account, that uses this list to identify active Tor exits on their talk pages, as well as identify IPs that are no longer Tor but still blocked, so that administrators can block/unblock if needed. It would do this simply by tagging IP talk pages with an appropriate category (perhaps Category:Tor exit nodes and Category:Blocked former Tor exit nodes?) and removing the category when it finds that the category no longer applies. (I also hope to make this bot portable across WMF wikis, in case other projects want to use it.)
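The tagging logic proposed above can be sketched with plain set operations. This is only an illustration under stated assumptions: the function name and the three input sets are hypothetical, and the real bot would obtain them from the node list, the category members, and the block log through pywikipedia.

```python
def plan_category_changes(live_exits, tagged_active, blocked):
    """Work out which talk pages need a category added or removed.

    live_exits    -- IPs currently in the generated Tor exit list
    tagged_active -- IPs whose talk page carries Category:Tor exit nodes
    blocked       -- IPs that are currently blocked
    """
    live_exits = set(live_exits)
    tagged_active = set(tagged_active)
    blocked = set(blocked)

    no_longer_tor = tagged_active - live_exits
    return {
        # New exits that have not been tagged yet.
        "add_active": live_exits - tagged_active,
        # Tagged IPs that dropped out of the list lose the active category.
        "remove_active": no_longer_tor,
        # Of those, the ones still blocked are flagged for admin review.
        "add_former": no_longer_tor & blocked,
    }


changes = plan_category_changes(
    live_exits={"192.0.2.1", "192.0.2.2"},
    tagged_active={"192.0.2.2", "192.0.2.3", "192.0.2.4"},
    blocked={"192.0.2.3"},
)
print(changes)
```

The addresses in the usage example come from a documentation range and are placeholders only.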

Discussion

Comment. This seems like a very useful function that would help bring a healthy share each of order and accuracy to an often muddled area of wiki administration. Vassyana (talk) 06:56, 20 February 2008 (UTC)
Comment - as part of checkuser duties, I check and block tor nodes all the time. This function would be invaluable - Alison ^❤ 07:03, 20 February 2008 (UTC)
Comment - This is indeed useful Krimpet. Compwhiz II^(Talk)_(Contribs) 11:42, 20 February 2008 (UTC)
Which templates will it use for blocked nodes? Unblocked nodes? This would be really helpful :) SQL^{Query me!} 12:06, 20 February 2008 (UTC)
- I foresee it using user categories, rather than templates. For example, current nodes would be tagged with Category:Tor exit nodes. Every few hours as the list is updated, it would compare the members of that category against the live list, and remove the category from IPs that are no longer Tor; additionally, if someone had blocked the IP as Tor, it would tag it with Category:Blocked former Tor exit nodes. krimpet ✽ 19:55, 20 February 2008 (UTC)
  - Thanks :) I misread, thought I saw something about templates. Those cats sound about right, and wouldn't interfere (would help nicely!) with my similar efforts. SQL ^{Query me!} 01:46, 21 February 2008 (UTC)
I see no point in blocking Tor nodes with no edits to Wikipedia. It's like randomly killing someone because they *might* be a threat later on. Mønobi 03:39, 21 February 2008 (UTC)
- This bot is not designed to block Tor nodes -- there have been proposals to do so in the past with adminbots, but that is not what this bot is intended to do. Rather, it allows users and admins to clearly identify which IPs truly are and aren't Tor, to eliminate the patchy system of guesswork that we currently have, and, in the event of abuse, to know where it's coming from. krimpet ✽ 04:05, 21 February 2008 (UTC)
- I know, but certainly you'll tag Tor nodes that have made zero edits to Wikipedia. Mønobi 22:14, 21 February 2008 (UTC)
  - That is a good point. Is there a specific benefit to tagging talk pages of accounts that haven't ever edited? Is that something you'd be willing to work into the code? I mean, you're running constantly anyhow, so there's no real need or harm, that I can see, in holding off until that IP has actually edited. SQL^{Query me!} 03:04, 22 February 2008 (UTC)
    - Only tagging IPs with edits greatly lessens its utility for checkusers, though: if someone abuses the node while logged in but doesn't edit anonymously, they wouldn't be able to easily check an IP's talk to see if it's an exit node or not. krimpet ✽ 03:58, 22 February 2008 (UTC)
    - For some perspective, we've had a seriously abusive banned editor using TOR to register a bunch of accounts to get past the ACB rangeblocks that have been set up. A system where we can tag and check TOR would be seriously useful in dealing with this guy considering those accounts have basically made "zero edits" when you look at their contribs - Alison ^❤ 04:08, 22 February 2008 (UTC)
      - Click 'What links here', on the IP's userpage :) A couple/few of us maintain up-to-date lists of TOR nodes that allow Wikipedia exit, without tagging the IP's talkpage. I'm not opposed to doing this, for the record, by the way. SQL ^{Query me!} 04:19, 22 February 2008 (UTC)
        The talk page does have some benefits to a list though: one, it means the talk page will be User:Krimpet/Fake link, standing out in CU results; and two, the talk page history will make it easy to see the IP's history as a node, as opposed to a list where all the bot's updates would be clumped together. krimpet ✽ 04:24, 22 February 2008 (UTC)
        Agreed, that'd make it a lot easier to track the 'flip-floppers'. SQL ^{Query me!} 04:27, 22 February 2008 (UTC)
Well, either way, I'd love to see a trial. What's everyone else think? I was thinking maybe 3 days. (I'm not in BAG, so, please note, that I cannot technically approve trials) SQL^{Query me!} 04:32, 22 February 2008 (UTC)
- Indeed, sounds interesting. Approved for trial (250 edits or 7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. - 7 days/250 edits sounds more than reasonable. It's a fairly low-risk bot. Presumably the code is still to be written? Whether it needs writing or not, can you please post a link to a copy of the code (if that is ok with you), so other users can look over it as necessary? --Reedy Boy 15:58, 25 February 2008 (UTC)
  - Ah, thanks, I've got it up and running now :) The code is here, and I welcome any suggestions or improvements. krimpet ✽ 06:49, 26 February 2008 (UTC)
    - Quite frankly, your code is a waste of system resources (I mean in regards to the TS). It should not be in a while loop; you should really use crontab. I've cleaned it up a bit to include exceptions. You'll have to place it on the crontab. Here is the source:

([1])

That should work. It worked for me, at least. You should also use regexes for the replacements, but I'm not too familiar with them, so I didn't include them myself. Mønobi 23:27, 27 February 2008 (UTC)

Using a continuous while loop instead of a cron job was an intentional design decision. It sometimes takes longer than 6 hours for the bot to make a full run, and only one copy of the bot should ideally be running at a time, so it sleeps only if 6 hours have not yet passed; if over 6 hours have already passed, it starts again right away. The process sleep()s for two minutes between iterations of the while loop, so virtually no system resources should be wasted; the overhead is probably less than a cron job would take, since that would mean discarding and creating another instance of the Python interpreter every time, instead of simply having one that slumbers away in swap space waiting to start again. :) krimpet ✽ 16:49, 28 February 2008 (UTC)
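For readers following the design argument, the loop described above might look roughly like the sketch below. It is not the bot's actual code: the names are hypothetical, the 6-hour interval is assumed to be measured from the start of the previous run, and the injectable clock, sleep, and max_runs parameters exist only so the loop can be exercised without waiting.

```python
import time

RUN_INTERVAL = 6 * 60 * 60  # assumed: seconds between the starts of two runs
POLL_INTERVAL = 2 * 60      # the two-minute sleep between loop iterations


def run_forever(run_once, clock=time.time, sleep=time.sleep, max_runs=None):
    """Call run_once() at most once per RUN_INTERVAL, never concurrently."""
    last_start = None
    runs = 0
    while max_runs is None or runs < max_runs:
        now = clock()
        if last_start is None or now - last_start >= RUN_INTERVAL:
            # Either the first run, or 6 hours have passed: start right away.
            last_start = now
            run_once()
            runs += 1
        else:
            # Not due yet: doze briefly and check again.
            sleep(POLL_INTERVAL)
```

Because a single process does both the waiting and the work, a run that overshoots 6 hours simply delays the next one instead of overlapping with it, which is the property a plain cron entry would not give without extra locking.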

If you tell me what the regexes are supposed to actually do, I could put them together really easily. Q ^{T C} 23:42, 27 February 2008 (UTC)

Just do something like:

import re
retor = re.compile(r"\[{2}Category:Tor exit nodes\]{2}")
reformer = re.compile(r"\[{2}Category:Blocked former Tor exit nodes\]{2}")

and then use:

text = re.sub(retor, '', text)

Of course this wasn't based off any sort of study, so those regexes have no warranty. Q ^{T C} 00:20, 28 February 2008 (UTC)
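To show how such patterns could be used to move a page from one category to the other, here is a small self-contained sketch. The helper name swap_category is hypothetical and not part of pywikipedia; it assumes each category tag sits on its own line and that appending the new tag at the end of the page text is acceptable.

```python
import re


def swap_category(text, old, new):
    """Remove [[Category:old]] from text and ensure [[Category:new]] is present."""
    old_re = re.compile(re.escape("[[Category:%s]]" % old) + r"\n?")
    new_tag = "[[Category:%s]]" % new

    body = old_re.sub("", text).rstrip("\n")
    if new_tag in body:
        # Already tagged; leave the page otherwise unchanged.
        return body + "\n"
    return (body + "\n" if body else "") + new_tag + "\n"


page = "Welcome!\n[[Category:Tor exit nodes]]\n"
print(swap_category(page, "Tor exit nodes", "Blocked former Tor exit nodes"))
```

Escaping the tag with re.escape avoids hand-writing the bracket escapes, and checking for the new tag before appending keeps repeated runs from stacking duplicate categories.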

Comment As a side note, I've noticed that it's not correctly replacing the category tags. EG: this one Q ^{T C} 07:40, 28 February 2008 (UTC)
That specific IP, as well, is presently a blocked valid exit node, near as I can tell, but, it's marked as "former"? SQL ^{Query me!} 15:29, 28 February 2008 (UTC)
In the case of that IP, it seems to toggle on and off occasionally between being an active Tor node - see the talk page history for example. (If an admin is going through the former nodes category looking for IPs to unblock, I do encourage them to check the IP's talk page and block history before unblocking.) This particular node is back up and listed in the authoritative directory now, so the bot should come around and tag it as active again soon. krimpet ✽ 16:49, 28 February 2008 (UTC)
Trial looks like it went well (Been tracking it with SQLBot). Suggest approving this very useful bot! SQL^{Query me!} 06:18, 6 March 2008 (UTC)
- I've made a couple of improvements in the bot recently based on feedback, BTW - it now holds off on tagging a node as "former" if it's been tagged as active in the last 72 hours, since many nodes go down intermittently. krimpet ✽ 18:47, 7 March 2008 (UTC)
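The 72-hour hold-off mentioned above could be expressed as a simple time comparison. This is a sketch only: the function name is hypothetical, and it assumes the bot can look up when the page was last tagged as active (for example from the talk page history).

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=72)


def should_tag_former(last_tagged_active, now, grace=GRACE_PERIOD):
    """Only treat a missing node as 'former' once the grace period has passed."""
    return now - last_tagged_active >= grace


now = datetime(2008, 3, 7, 18, 0)
print(should_tag_former(datetime(2008, 3, 5, 18, 0), now))
```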

Approved. --uǝʌǝs ʎʇɹnoɟ ʇs(st47) 12:44, 8 March 2008 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.