The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: -- Cobi(t|c|b|cn)

Automatic or Manually Assisted: Automatic, unsupervised.

Programming Language(s): PHP, my classes.

Function Summary: Post suspected open proxies on IP vandals to Wikipedia:WikiProject on open proxies.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous.

Edit rate requested: maxlag = 10 (but in reality, not more than once or twice an hour.)
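The `maxlag = 10` figure controls when the bot pauses, not how fast it edits. With `maxlag=10`, the servers reject a request with a `maxlag` error whenever database replication lag exceeds 10 seconds, and the bot is expected to back off and retry.

Below is a minimal Python sketch of that loop; it is not ClueBot's PHP code. `send_request` is a hypothetical callable standing in for the real HTTP client. The five attempts and the five-second wait are assumptions.

```python
import time
from typing import Callable, Dict, Optional

MAXLAG_SECONDS = 10  # the value requested in this BRFA


def api_call_with_maxlag(
    send_request: Callable[[Dict[str, str]], dict],
    params: Dict[str, str],
    max_attempts: int = 5,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Send a MediaWiki API request with maxlag set, retrying while servers are lagged.

    send_request is a hypothetical callable that performs the HTTP request and
    returns the decoded JSON response as a dict.
    """
    query = dict(params, maxlag=str(MAXLAG_SECONDS), format="json")
    for _ in range(max_attempts):
        response = send_request(query)
        error = response.get("error", {})
        if error.get("code") != "maxlag":
            return response  # success, or an unrelated error the caller must handle
        sleep(wait_seconds)  # replication lag is above the threshold: back off
    return None  # still lagged after max_attempts; skip this edit for now
```

Because of this back-off, the bot simply stops posting while the servers are struggling. The "once or twice an hour" figure is governed separately, by how often open proxies are actually found.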

Already has a bot flag (Y/N): N

Function Details: ClueBot has found quite a few open proxy IP vandals. I would like to have ClueBot report them to Wikipedia:WikiProject on open proxies.

Discussion

How does it find proxies? ~ Wikihermit 22:45, 2 September 2007 (UTC)
Magically ;)
Actually, it has a proxy scanner which checks several blacklists and actually tries to use the proxy on several ports when it finds anonymous/IP vandalism.
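The blacklist half of the check follows the usual DNSBL convention:

1. Reverse the octets of the IPv4 address.
2. Append the list's zone.
3. Resolve the resulting name.

A listed address typically answers with a 127.0.0.x record, and an unlisted one does not resolve at all.

The Python sketch below is illustrative only. `dnsbl.example.org` is a placeholder zone, not one of the lists the bot used. The resolver is injectable so the logic can be exercised without network access. Real lists encode categories in the returned 127.0.0.x value, so checking only "the open proxy parts" would mean filtering on those codes, which this sketch does not do.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: IPv4 octets reversed, then the zone appended."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the DNSBL zone lists this IP.

    By common DNSBL convention a listed IP resolves to an address in
    127.0.0.0/8, and an unlisted one does not resolve (NXDOMAIN).
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: not listed (or the lookup itself failed)
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```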
The following are the DNSBLs the bot checks (only the open proxy parts):
The following are the protocols/ports it actually tries to use:
  • HTTP: 80 8080 3128 6588 81 8000 8001 8081 808 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 1337 31337 1338 31338 7000
  • SOCKS4: 1080 3128 4914 6826 7198 7366 9036 29992 38884 18844 17771 31121 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 1337 31337 1338 31338 7000
  • SOCKS5: 1080 3128 4438 5104 5113 5262 5634 6552 6561 7464 7810 8130 8148 8520 8814 9100 9186 9447 9578 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 1337 31337 1338 31338 7000
  • ROUTER: 23
  • WINGATE: 23
  • HTTPPOST: 80 81 808 6588 4480 8000 8001 8080 8081 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 1337 31337 1338 31338 7000
-- Cobi(t|c|b|cn) 23:05, 2 September 2007 (UTC)
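The HTTP, SOCKS4, SOCKS5 and HTTPPOST lists above all end with the same 15 ports: 6660 to 6669, 1337, 31337, 1338, 31338 and 7000. The table can therefore be stored with that shared tail factored out.

The Python sketch below holds the lists as a data structure and expands them into one probe per (protocol, port) pair. The names `COMMON_PORTS`, `SCAN_PORTS` and `probe_jobs` are hypothetical. The real scanner's configuration format is not shown here.

```python
from typing import Dict, List, Tuple

# Ports shared by the tail of the HTTP, SOCKS4, SOCKS5 and HTTPPOST lists.
COMMON_PORTS: List[int] = list(range(6660, 6670)) + [1337, 31337, 1338, 31338, 7000]

# Protocol -> ports to try, transcribed from the list in this discussion.
SCAN_PORTS: Dict[str, List[int]] = {
    "HTTP": [80, 8080, 3128, 6588, 81, 8000, 8001, 8081, 808] + COMMON_PORTS,
    "SOCKS4": [1080, 3128, 4914, 6826, 7198, 7366, 9036, 29992, 38884,
               18844, 17771, 31121] + COMMON_PORTS,
    "SOCKS5": [1080, 3128, 4438, 5104, 5113, 5262, 5634, 6552, 6561, 7464,
               7810, 8130, 8148, 8520, 8814, 9100, 9186, 9447, 9578] + COMMON_PORTS,
    "ROUTER": [23],
    "WINGATE": [23],
    "HTTPPOST": [80, 81, 808, 6588, 4480, 8000, 8001, 8080, 8081] + COMMON_PORTS,
}


def probe_jobs(ip: str) -> List[Tuple[str, str, int]]:
    """Expand one suspect IP into (ip, protocol, port) probes, one per list entry."""
    return [(ip, proto, port) for proto, ports in SCAN_PORTS.items() for port in ports]
```

With the lists as posted, each suspect IP expands to 111 probes.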
From above: Actually, it has a proxy scanner which checks several blacklists and actually tries to use the proxy on several ports when it finds anonymous/IP vandalism. It will actually connect to my server via the proxy. If it is successful, then it found an open proxy. -- Cobi(t|c|b|cn) 23:45, 2 September 2007 (UTC)
Right, by "actually tries to use", at what point does it decide "it is open": at, say, getting an edit token from us after relaying through them, or at simply connecting to their port? I've seen legitimate proxies that have these evil ports open but turn out not to be open relays, or that run in read-only mode (restricting POST and similar commands) when accessed. Thanks, — xaosflux Talk 23:53, 2 September 2007 (UTC)
It is actually going to negotiate a connection to, connect to, and receive data from a daemon running on my server. That data will be in the form of a string which the bot will check for before it says it is an open proxy. Are you familiar with BOPM (the Blitzed Open Proxy Monitor)? The actual proxy checking code is theirs. ClueBot provides IPs to check and parses the response. -- Cobi(t|c|b|cn) 00:00, 3 September 2007 (UTC)
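The negotiate-and-check step addresses the distinction xaosflux raises: an open port is not the same as a working relay. BOPM does the real work here.

As a simplified illustration only, this Python sketch covers the HTTP CONNECT case:

- It asks an already-connected proxy to tunnel to the operator's server.
- It reports the proxy as open only if a known string, sent by the daemon on that server, comes back through the tunnel.

The target host, port and expected string are placeholders, and this is not BOPM's code.

```python
import socket


def http_connect_relays(sock: socket.socket, target_host: str, target_port: int,
                        expected: bytes, max_bytes: int = 4096) -> bool:
    """Ask an HTTP proxy (already connected on `sock`) to tunnel to our target.

    Returns True only if the bytes our own daemon sends (`expected`) arrive
    through the tunnel; an open port alone, or a 403/405 reply, is not enough.
    """
    request = f"CONNECT {target_host}:{target_port} HTTP/1.0\r\n\r\n".encode("ascii")
    sock.sendall(request)
    received = b""
    while len(received) < max_bytes:
        chunk = sock.recv(1024)
        if not chunk:          # proxy closed the connection
            break
        received += chunk
        if expected in received:
            return True        # data from our daemon came back: it relays
    return False
```

A proxy that accepts the TCP connection but refuses to relay (or only serves cached pages) never produces the expected string, so it is not reported.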
(Again) From above: Actually, it has a proxy scanner which checks several blacklists and actually tries to use the proxy on several ports when it finds anonymous/IP vandalism. It gets its list from ClueBot's reversion of IP vandalism. -- Cobi(t|c|b|cn) 23:45, 2 September 2007 (UTC)
Thanks, I was trying to see whether it was also going to get candidates from other sources. — xaosflux Talk 23:53, 2 September 2007 (UTC)
Good question. The main reason is that ClueBot hands the IP addresses straight to the proxy scanner, and the proxy scanner hands the results right back. It would be complicated to make the proxy scanner hand them to another bot, and it would be a waste of resources to have ClueBot hand the results off to yet another bot simply to upload them to Wikipedia. Furthermore, ClueBot already reports to AIV. This would be like that, except if it finds an open proxy, it will report to WP:OP instead. -- Cobi(t|c|b|cn) 00:58, 3 September 2007 (UTC)
Ah, so the reporting code for the most part is there, just not to the right place/format then I take it? Kwsn(Ni!) 01:11, 3 September 2007 (UTC)
Yes, ClueBot is a fairly advanced bot. It already handles all of this. All that would need to be done to add this is to call a few functions when it gets the result back from the proxy scanner. -- Cobi(t|c|b|cn) 01:14, 3 September 2007 (UTC)
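Since the reporting path already exists, the change described here amounts to a small hook on the scanner's result. The Python sketch below is hypothetical throughout:

- `ScanResult`, `on_scan_result` and the `post_report` callable are invented names.
- The report line format is an assumption, not the format WP:OP actually uses.
- Results that are not open proxies are ignored here and left to the existing AIV reporting.

```python
from dataclasses import dataclass
from typing import Callable, Optional

OPEN_PROXY_PAGE = "Wikipedia:WikiProject on open proxies"


@dataclass
class ScanResult:
    ip: str
    open_proxy: bool
    protocol: Optional[str] = None  # e.g. "HTTP" or "SOCKS5" when open
    port: Optional[int] = None


def on_scan_result(result: ScanResult, post_report: Callable[[str, str], None]) -> bool:
    """Hook run when the proxy scanner hands a result back.

    Only confirmed open proxies are posted; everything else is left to the
    bot's existing AIV reporting. Returns True if a report was posted.
    """
    if not result.open_proxy:
        return False
    # Hypothetical report format; the real WP:OP format may differ.
    line = f"* {{{{IPvandal|{result.ip}}}}} open {result.protocol} proxy on port {result.port}"
    post_report(OPEN_PROXY_PAGE, line)
    return True
```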
Sounds similar to RonaldBot. I'd say do something similar: Wikipedia:OP isn't 'big', but it's pretty bulky, so posting detected proxies to a user subpage might be more economical. Q T C 12:46, 3 September 2007 (UTC)

If this bot is making a large volume of queries to these DNSBLs, you should consider licensing/mirroring their zone files to reduce load on their servers. They are a public service and should be treated as nicely as possible. :) — madman bum and angel 14:40, 7 September 2007 (UTC)

I have removed most of the DNS blacklists (as you can see above) because they had too many false positives and stale listings. Besides, I know of networks that query such services for every client that connects (more than one query per second); ClueBot's query rate was about 0.4 per second, so I doubt ClueBot added much load to the servers. I have also added more ports for it to check. -- Cobi(t|c|b|cn) 09:08, 8 September 2007 (UTC)
I didn't say you were a big offender, I just said it'd be more courteous to those public services to attempt to reduce the load as much as possible. Those other networks should too. Just my two cents. — madman bum and angel 04:02, 9 September 2007 (UTC)
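madman's suggestion is to mirror the zone files. A lighter step in the same direction, swapped in here purely as an illustration, is to cache each answer locally so that a repeat vandal from the same IP does not trigger another query.

This Python sketch assumes a one-hour cache lifetime; a real caching resolver would instead honor the DNS TTL on each answer. `lookup` is any function that returns whether an IP is listed.

```python
import time
from typing import Callable, Dict, Tuple


class CachedDnsbl:
    """Memoize DNSBL answers for `ttl` seconds so repeat vandals cost one query."""

    def __init__(self, lookup: Callable[[str], bool], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # returns True if the IP is listed
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def is_listed(self, ip: str) -> bool:
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh cached answer: no DNS traffic
        listed = self._lookup(ip)
        self._cache[ip] = (listed, now)
        return listed
```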

Trial is complete. The only things I changed during the trial were removing several blacklists that generated false positives and adding several more ports to scan. -- Cobi(t|c|b|cn) 20:34, 10 September 2007 (UTC)

Looks fine. Approved. As per IRC, it would be worth keeping an up-to-date list of what it checks somewhere, with a link on the bot page. That isn't major, just something I think is worth doing so people know where it's looking. Reedy Boy 21:00, 10 September 2007 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.