The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.


Operator: Christopher Breneman (Crispy1989), Tim1357, and Jacobi Carter (Cobi).

Time filed: 00:35, Monday October 25, 2010 (UTC)

Automatic or Manually assisted: Automatic.

Programming language(s): The core is written in C++ by Christopher Breneman. The interface to Wikipedia is written in PHP by Cobi. The dataset is maintained by Tim.

Source code available: See Christopher Breneman for access to subversion repository.

Function overview: Vandalism detection and reverting using machine learning algorithms.

Links to relevant discussions (where appropriate):

Edit period(s): Continuous.

Estimated number of pages affected: Current statistics indicate approximately 70% of vandalism is caught, so it would be editing approximately 70% of vandalized pages.

Exclusion compliant (Y/N): Yes.

Already has a bot flag (Y/N): No.

Function details: Cluebot-NG is an attempt to revolutionize practical vandalism prevention on Wikipedia. Existing anti-vandal bots use simple static heuristics and, as such, catch a relatively small portion of vandalism while producing an unacceptable number of false positives, many of which are likely never reported. Cluebot-NG shares no code with the original Cluebot, and uses completely different algorithms to detect vandalism. Details of these algorithms can be found at [1]. Because these algorithms must be trained on a dataset, there is also a convenient way to estimate accuracy before a live run: simply running the bot on a portion of its dataset not used for training. Currently, this is yielding a 60% to 70% vandalism detection rate, far above that of current bots.
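The held-out estimate mentioned in the function details can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the `split_dataset` and `detection_rate` helpers, the `classify` callable, and the 20% hold-out fraction are all assumptions made for the example.

```python
import random

def split_dataset(edits, holdout_fraction=0.2, seed=0):
    """Shuffle labelled edits and split them into (training, held-out) lists.

    `edits` is a list of (text, is_vandalism) pairs. The fraction and seed
    are illustrative defaults, not values taken from the bot.
    """
    rng = random.Random(seed)
    shuffled = list(edits)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def detection_rate(classify, holdout):
    """Fraction of held-out vandalism edits that `classify` flags."""
    vandal_texts = [text for text, is_vandalism in holdout if is_vandalism]
    if not vandal_texts:
        return 0.0
    caught = sum(1 for text in vandal_texts if classify(text))
    return caught / len(vandal_texts)
```

The classifier would be trained only on the first list, and `detection_rate` would then be computed on the second, so the estimate reflects edits the classifier has never seen.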

Discussion

Pre-trial Discussion

Approved for trial (14 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Approved for editing at 0.25% FP rate. 0.25% of edits means that on average, roughly 3 out of 1000 edits will be incorrectly reverted, which is lower than our current bots and many of our human editors as well. Crispy, Cobi, and Tim are working continuously on this bot, and it should only improve from here. What's more, with the dataset being improved, the FP rate is actually lower than stated, so this should be an all right FP rate. (X! · talk)  · @234  ·  04:37, 2 November 2010 (UTC)

Trial 1 discussion

Trial complete.

Trial Summary

The trial is now over, and I'd like to take a moment to go over what was found during the trial.

Problems found and fixed during the trial
Outstanding issues that can be fixed by improving the dataset
Things that can be improved
End-of-trial statistics
Overall

The bot performs as expected. The false positive rate (which can still be adjusted if necessary) is set at 0.25%, which, after the revert exemptions, causes only a few false positives per day. This is below the false positive rate of existing bots. The vandalism catch rate, determined by using the random sampling of edits from the review interface, is right around 55%, about an order of magnitude more than existing bots. This puts a very large dent in vandalism on Wikipedia, and will continue to improve.

While there are things that can still be improved to catch more vandalism, the false positive rate will always remain at a fixed percentage. Further improvements will yield a greater vandalism catch rate, but the false positive rate is adjusted by hand, and will not change unless it is decided that it should change.
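The relationship between a hand-set false positive rate and the resulting catch rate can be illustrated with a small sketch. It assumes the classifier emits a numeric vandalism score per edit and that the bot reverts only edits scoring strictly above a threshold; the function names and the scores are hypothetical and not taken from the bot.

```python
def threshold_for_fp_rate(good_scores, max_fp_rate):
    """Pick a score threshold from scores of known-constructive edits.

    Reverting only edits that score strictly above the returned value
    flags at most `max_fp_rate` of the constructive edits supplied.
    """
    allowed = int(len(good_scores) * max_fp_rate)  # tolerated false positives
    ranked = sorted(good_scores, reverse=True)
    if allowed >= len(ranked):
        return float("-inf")  # every edit may be flagged
    return ranked[allowed]

def catch_rate(vandal_scores, threshold):
    """Fraction of known-vandalism edits scoring strictly above the threshold."""
    return sum(1 for s in vandal_scores if s > threshold) / len(vandal_scores)

# Toy scores, invented for illustration only.
good = [0.1, 0.2, 0.3, 0.9]
vandal = [0.2, 0.5, 0.95, 0.4]
loose = threshold_for_fp_rate(good, 0.25)
strict = threshold_for_fp_rate(good, 0.0)
```

In this toy data the stricter setting yields a higher threshold and therefore a lower catch rate. Under the same assumptions, a better classifier separates the two score distributions more cleanly, so the catch rate rises while the chosen false positive rate stays where it was set.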

The single most important thing for improving the bot is improving the dataset. Many people are already contributing large amounts of time to this purpose, and because of this, we can now use a real random sampling for statistics determination. As these people, and others, continue to help, we'll eventually be able to use the random sampling as a training set as well.

Request

I'd like to ask for an extended trial. The bot is production ready, and performs much better than existing bots, both in terms of false positives and vandalism catch rate. But an extended trial will maintain interest in helping us to expand the dataset so it becomes as good as it can be, while still reverting vandalism just as well as it would in production. Crispy1989 (talk) 23:20, 16 November 2010 (UTC)

Approved for extended trial (14 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. It seems the biggest thing needed is the improved dataset. Anomie 04:40, 18 November 2010 (UTC)

Trial 2 discussion

Trial complete. We'll post a summary shortly. -- Cobi(t|c|b) 04:33, 2 December 2010 (UTC)

Trial 2 Summary

Major Events During Trial 2
Controversies

Several controversies not (conspicuously) present during Trial 1 were raised during Trial 2.

Clarifications

These are clarifications of some things that are available elsewhere, but are restated here because they are commonly misunderstood.

Important Documentation

Those not already familiar with how the bot works should read these links. They are critical to understanding its behavior. These were written during Trial 2 in response to numerous repeated questions for the same information.

Support for the Bot

While the bot has generated some controversy, it has also received a large amount of support and praise - this support isn't on the BRFA, but may be useful. Only "pure support" messages are included here - there are others that are part of controversial discussions.

It's also worth noting that this praise is coming from people who are familiar with and used to the old ClueBot, so they are noticing a real difference.

Summary

The bot is performing well within its expected parameters. It was approved for Trial 1 for operation at 0.25% false positives, and it was always well within that limit. Halfway through Trial 2, it was changed to 0.1% false positives at user request, or 1 in 1000 incorrectly reverted edits (also note that this is a maximum).
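As a quick check on the figures above, the conversion from a percentage to a count per 1000 edits can be worked through as below. The helper name is hypothetical, and the sketch assumes the rate is measured against constructive edits.

```python
def max_false_reverts(fp_rate_percent, constructive_edits):
    """Upper bound on incorrectly reverted edits at a given false positive rate."""
    return fp_rate_percent / 100 * constructive_edits

trial_1_bound = max_false_reverts(0.25, 1000)  # Trial 1 setting
trial_2_bound = max_false_reverts(0.1, 1000)   # setting adopted during Trial 2
```

At 0.25% the bound works out to 2.5 per 1000 constructive edits, which the discussion rounds to 3, and at 0.1% it is 1 per 1000.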

Controversy has sprung up, often due to misunderstandings about how various statistics are calculated and used. These have been clarified, and an FAQ page written to explain these issues. The remaining controversy has been addressed (false positive rate has been more than halved, report interface improved, etc.).

Cluebot NG's performance is almost an order of magnitude better than all previous anti-vandal bots. Using novel algorithms and approaches, it truly is the next generation of practical automated vandal-fighting on Wikipedia. And over time, as we continue to work on the bot, its accuracy will improve even more.

Request

The developers request that the bot be approved to operate at a false positive rate of the operators' discretion. We would like the ability to adjust the false positive rate for a few reasons:

We will never set the FP rate to anything above 0.25% (roughly 3 in 1000), and for now, it will remain at 0.1% (1 in 1000), as this is where community support lies. We will also always listen to the community and try to determine consensus if disagreement about the FP rate ever arises again.

After approval, we will restart the bot, so it can continue doing its job of keeping Wikipedia clean, and reducing vandal-fighter workload. Crispy1989 (talk) 04:36, 2 December 2010 (UTC)

False Positive Reporting

Less than 0.1% of constructive or well-intentioned edits are misclassified as vandalism by Cluebot-NG. Please see Information About False Positives for more information about why this happens, and why it is necessary. Reports posted here are reviewed by the bot developers in case anything can be done to the bot to improve its accuracy.



Approval

Approved to operate at the operators' discretion. Reedy 02:24, 3 December 2010 (UTC)

Thanks. The false positive rate will remain at less than 0.1% for the foreseeable future, unless improvements are made to the bot which cause a slightly higher dropoff point than present, or the bot's accuracy improves to the point where it can be lowered without significantly affecting accuracy. Crispy1989 (talk) 02:37, 3 December 2010 (UTC)