The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: FASTILY (TALK)

Time filed: 06:21, Saturday October 22, 2011 (UTC)

Automatic or Manual: Automatic unsupervised

Programming language(s): Java

Source code available: Not currently

Function overview: Bot flags non-free files with old unused revisions.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Tagging_old_revisions_of_non-free_files, [1]

Edit period(s): continuous

Estimated number of pages affected: 50k+

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): Y

Function details: See Wikipedia:Bot_requests#Tagging_old_revisions_of_non-free_files for specification.

Discussion

--MZMcBride (talk) 20:45, 23 October 2011 (UTC)

  • I'll obtain an input list by fetching the transclusions of various non-free license templates.
  • Plenty ;)
  • A crappy rough estimate. Given that about half of the media files uploaded to en.wikipedia are non-free, I did a survey of 16 non-free media files. Of these files, I found 2 of them to have old revisions in need of deletion. If 2/16 = ⅛, then 400k × ⅛ = 50k. -FASTILY (TALK) 00:46, 24 October 2011 (UTC)
  • From my experience, I would say that Fastily's guesstimate of 50k might actually be much, much higher than the actual number of instances. Sven Manguard Wha? 18:09, 2 November 2011 (UTC)
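The back-of-envelope estimate above can be written out as a small calculation. This is only a sketch: the 400k population and the 2-in-16 sample are the figures quoted in the comment, while the class and method names are made up for illustration.

```java
final class BacklogEstimate {

    /**
     * Scales a sample hit rate up to a whole population.
     * Multiplies before dividing so integer division does not discard the fraction early.
     */
    static long estimateAffected(long population, long sampleHits, long sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        return population * sampleHits / sampleSize;
    }

    public static void main(String[] args) {
        // Figures quoted in the discussion: 400k non-free files, 2 hits in a sample of 16.
        System.out.println(estimateAffected(400_000L, 2L, 16L));
    }
}
```

With those inputs the method reproduces the 50k figure. A sample of 16 is very small, so the result is a rough guide at best, which is consistent with the caution expressed in the reply.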

The bot should have something to prevent it re-tagging an image which has already been tagged for deletion and had the tag removed, so you don't end up in a tag war (I can't see that happening very often; however, if this task is 100% foolproof, then it should be done as an adminbot anyway). --Chris 13:00, 26 October 2011 (UTC)

As the original proposer of the idea of such a bot, I will say that an adminbot is not a good idea. First, there's a 7 day waiting period between tagging and revision deletion. Second, some files are tagged with the wrong license tag and thus would require individual review. — Train2104 (talk • contribs • count) 20:28, 26 October 2011 (UTC)
Mistagging is a major problem in files, but almost always in the other direction, with people tagging non-free material under an own-work free license. I've actually only ever seen two or three cases where someone tagged a free file as non-free, and it usually involved someone from one country not really knowing when the copyright for an image made in another country would expire.
The task of flagging these unused revisions is as close to 99.99% foolproof as any task I can think of, so I support that, but actually doing the rev-dels needs to be left to humans. There are cases where a good file was usurped by a bad one, or a good file was usurped by another good file, and both of those situations involve something more complicated than just automatically revdelling the non-active files. Sven Manguard Wha? 12:42, 30 October 2011 (UTC)
I think this should exclude by age. That is to say, if a file was very recently replaced by a free version, then this bot should not tag. This will give article watchers more time to fix obvious errors and not put all the work on admin judgment (especially since article watchers will probably have a lot more context to work from). Gigs (talk) 15:08, 2 November 2011 (UTC)
Most people do not watch the individual images within the articles they watch, and thus would only know that a usurpation occurred if they loaded the article and looked at the images in it. Also, for the sake of discussion, the administrators who actively patrol the files needing revdels category are probably all experienced with file work. This is a rather obscure admin task, after all. Sven Manguard Wha? 18:09, 2 November 2011 (UTC)
As to the concern about the bot repeatedly tagging the same image, I doubt that will happen. Fastily doesn't run most of his bot tasks daily/continuously; he queues up a task, runs that task until the bot has been through everything once, and then gives that task a substantial fallow period. That means that the bot won't be going to the same file multiple times for the same task in the same week, or month, or probably even quarter. Sven Manguard Wha? 08:15, 5 November 2011 (UTC)
And? It's still a possibility. There needs to be some proper protection within the code so that it doesn't happen (all the bot really needs to do is keep a record of images it has tagged in a file and check that). It's better to get things like this right the first time, rather than cutting corners. --Chris 13:33, 7 November 2011 (UTC)
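A minimal sketch of the record-keeping described in the comment above, assuming a plain text file with one file title per line. The class name, method names, and on-disk format are all hypothetical; nothing here is taken from the bot's actual source, which was not published at the time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class TaggedFileLog {
    private final Path logFile;
    private final Set<String> tagged = new HashSet<>();

    /** Loads any titles recorded by earlier runs (assumed format: one title per line). */
    TaggedFileLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            tagged.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
        }
    }

    /** True if an earlier run already tagged this title, so the bot should leave it alone. */
    boolean alreadyTagged(String title) {
        return tagged.contains(title);
    }

    /** Remembers the title in memory and appends it to the log so later runs see it too. */
    void recordTagged(String title) throws IOException {
        if (tagged.add(title)) {
            Files.write(logFile, Collections.singletonList(title), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

A tagging loop would call `alreadyTagged` before editing a page and `recordTagged` after a successful edit. Because the record lives outside the wiki, a human removing the tag does not cause the bot to re-add it on a later pass.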

Comment For a demonstration of what this bot would do, assume that File:Visual demonstration of a file usurpation.png is a non-free image, locally hosted. Note that it has three prior versions (pretend that they are also non-free). The bot would detect that those versions exist, and would place a tag for an admin to take a look at them. The bot, as I understand it, would not revdel the old images, and nothing in the article space would ever be edited. Sven Manguard Wha? 18:09, 2 November 2011 (UTC)

Is there any harm in waiting a little while? This is just a housekeeping issue that has no visible effect. If it's a continuous bot it'll eventually get them. Gigs (talk) 18:12, 2 November 2011 (UTC)
What would be the benefit of waiting a while? There's a one week waiting period between tagging and deletion, and the deletion is done by a human who (supposedly) is checking for usurpations. One problem I have is one of volume. If, one week after the bot starts running, the cat of files needing attention contains tens of thousands of files, admins may shy away from it. How about doing one letter of the alphabet a day to prevent such an issue? — Train2104 (talk • contribs • count) 18:14, 2 November 2011 (UTC)
The benefit would be less human intervention needed to revert a bad change; this bot would be slightly compounding the problem if it jumped in and immediately tagged things that some vandal or careless person messed up. Your concern is different from mine, but I think it might be valid as well. That said, Fastily will probably process thousands of them (if not all of them) by himself, so I don't see the creation of a huge backlog being too big of an issue. Gigs (talk) 19:13, 2 November 2011 (UTC)
I'd help out, but I'm not an admin (yet). This is precisely the kind of thing I'd use the mop for too; Fastily shouldn't have to shoulder this one alone. Sven Manguard Wha? 09:54, 3 November 2011 (UTC)
I'd echo the concerns about mistagging/mislicensing/usurping and the odd 'bad overwrites' - the bot tagging is an excellent idea, but I think that there does need to be an admin's eye to double check the results. I don't see the backlog for this being a huge problem, especially if the bot runs are limited to X number per day. In re the # of files - I'm on #8 of 25 pages of my User:Skier Dude/My Sandbox/Oversize images ongoing project & you can see my deletion #'s are going through the roof! Skier Dude (talk) 04:51, 7 November 2011 (UTC)
Post script - there is a weird problem when deleting some revisions with dimensions of 0x0 or a size of 0K: the revision does not get deleted on its own; instead, all revisions are deleted and the file description page gets nuked as well. If the move is toward an adminbot, this needs to be built in or tagged for special treatment, as those deletions can easily leave redlinks. Skier Dude (talk) 04:58, 7 November 2011 (UTC)
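The volume concern raised earlier in this thread (one letter of the alphabet a day, or a cap of X files per day) amounts to splitting the work queue into daily batches. The following is a sketch under that assumption only; the class name, method name, and the `perDay` parameter are hypothetical, and the thread does not settle on an actual limit.

```java
import java.util.ArrayList;
import java.util.List;

final class DailyBatcher {

    /**
     * Splits a queue of file titles into consecutive batches of at most perDay items,
     * so that the review category grows by a bounded amount each day.
     */
    static <T> List<List<T>> splitIntoDailyBatches(List<T> queue, int perDay) {
        if (perDay <= 0) {
            throw new IllegalArgumentException("perDay must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < queue.size(); start += perDay) {
            int end = Math.min(start + perDay, queue.size());
            // Copy the slice so each batch is independent of the original queue.
            batches.add(new ArrayList<>(queue.subList(start, end)));
        }
        return batches;
    }
}
```

If the queue is sorted by title first, fixed-size batches approximate the letter-a-day idea while keeping each day's workload even, which letter-based batches would not guarantee.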

Okay, for the record, Fastily never said that this was going to be an adminbot. The task is "Bot flags non-free files with old unused revisions", and someone else, not Fastily, suggested that it be an adminbot. If Fastily wants to comment on this himself, he can log in or he can email me again, but until then, can we please not assume that something not in the task description is a fact? Sven Manguard Wha? 07:43, 7 November 2011 (UTC)

I'm comfortable with this going to trial, and I don't see any serious concerns here that haven't been addressed. I would like to see it have a small lockout period before tagging a recently done change, but if that's too hard, I don't think it's critical. Flagging for BAG attention. Gigs (talk) 18:39, 10 November 2011 (UTC)
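The lockout period requested above could be a single timestamp comparison. The sketch below assumes the bot already knows the upload time of the file's newest revision. The length of the lockout is never specified in the discussion, so it is passed in as a parameter, and the names are illustrative.

```java
import java.time.Duration;
import java.time.Instant;

final class LockoutCheck {

    /**
     * Returns true once the newest upload is at least `lockout` old,
     * i.e. the file is no longer "recently changed" and may be tagged.
     */
    static boolean outsideLockout(Instant latestUpload, Instant now, Duration lockout) {
        return !latestUpload.plus(lockout).isAfter(now);
    }
}
```

For example, with a hypothetical seven-day lockout, a file overwritten on 1 November would be skipped on 3 November and eligible on 10 November. Files still inside the window are simply picked up on a later pass.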

Author on Wikibreak on en.WP only (use his Commons talk page for questions). He said he will code and trial run the bot when he comes back Nov 24. — Train2104 (talk • contribs • count) 19:05, 10 November 2011 (UTC)
((BAGAssistanceNeeded)) I'm back from my wikibreak. I've read over the discussion above and find the consensus regarding the specification of the bot to be acceptable. I've finished coding the bot and am ready to begin trial. -FASTILY (TALK) 09:37, 24 November 2011 (UTC)
Approved for trial (250 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --Chris 08:17, 26 November 2011 (UTC)

Trial complete. No issues. -FASTILY (TALK) 06:05, 27 November 2011 (UTC)

Fixed the first bug. I missed a tilde in the code. File:Patty_Hearst.jpg had some major issues/inconsistencies with its file description page that threw off the bot. I've repaired those. Also, I derive my own database reports, which are easier for a bot to read, less taxing on the servers, and more encompassing than the category. -FASTILY (TALK) 21:46, 27 November 2011 (UTC)
There are many more files like that, with free tags and fair use rationales or two contradicting license tags outright. It would be nice if the bot could output those in a list for editors to process. — Train2104 (talk • contribs) 21:52, 27 November 2011 (UTC)
I thought MZMcBride had a database report for that? If not, I'll make one for you. -FASTILY (TALK) 21:54, 27 November 2011 (UTC)
Actually there is. I just overlooked it. Oops. But still, the bot should disregard those files. — Train2104 (talk • contribs) 22:27, 27 November 2011 (UTC)
Δ has a list which is kept updated daily of files that are in both the free use and fair use overarching cats. There's also a DB report, slightly less current. Sven Manguard Wha? 02:20, 28 November 2011 (UTC)
((BAGAssistanceNeeded)) Apart from the little bug mentioned above, which has since been resolved, this bot is ready to go. Could a BAG member please approve this request? -FASTILY (TALK) 07:43, 28 November 2011 (UTC)
Approved. MBisanz talk 18:33, 28 November 2011 (UTC)
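The request that the bot disregard files carrying both free and non-free tags can be sketched as a set-overlap test. This assumes the bot has the set of template names transcluded on each file page. The two tag lists are supplied by the caller, and the template names in the usage note are illustrative placeholders rather than the bot's actual lists.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class LicenseConflictFilter {
    private final Set<String> freeTags;
    private final Set<String> nonFreeTags;

    LicenseConflictFilter(Set<String> freeTags, Set<String> nonFreeTags) {
        // Defensive copies so later changes to the caller's sets do not affect the filter.
        this.freeTags = new HashSet<>(freeTags);
        this.nonFreeTags = new HashSet<>(nonFreeTags);
    }

    /** True when the page carries at least one free tag and at least one non-free tag. */
    boolean hasConflict(Set<String> pageTemplates) {
        return !Collections.disjoint(pageTemplates, freeTags)
                && !Collections.disjoint(pageTemplates, nonFreeTags);
    }
}
```

A page listing, say, both `PD-self` and `Non-free logo` would be reported as conflicting and skipped, leaving it for the human-maintained lists mentioned above.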