The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Erik9 (talk)

Automatic or Manually Assisted: Manually started, performs edits automatically.

Programming Language(s): Uses AWB with autosave function

Function Overview: As an expansion of task 6, adds template:unreferenced dated subcategories of category:articles lacking sources and category:all articles lacking sources to all articles identified as unsourced and not already tagged to that effect, using substantially the same algorithm.

Edit period(s): continuous, as needed

Already has a bot flag (Y/N): Y

This task would add template:unreferenced, using a bot=yes parameter similar to the one recently added to template:BLP unsourced [1] dated subcategories of category:articles lacking sources and category:all articles lacking sources, to all articles identified as unsourced using the same algorithm as task 6, with the exception that the absence of Category:Living people would be required (to avoid overlap with task 6 and placement of a generalized template on articles requiring a more specific notice), and adding the following additional criteria for an article to be tagged:

  1. Does not transclude Template:Wi, or any redirect thereto, which would indicate that the page is a non-article referring the reader to Wiktionary, and that no sources are needed.
  2. Is not a redirect.
  3. Is not a disambiguation page, as indicated by a transclusion of template:disambig (or any redirect thereto) or by the substring "disambig" appearing anywhere in the page title.
  4. Does not transclude Template:Inuse, or any redirect thereto.
  5. Does not have a title containing the substring "list of" or "lists of" (while lists are not exempted from source requirements, it is often considered acceptable to place sources in the articles to which the list links; "lists of" pages usually contain no content subject to WP:VER, but are simply internal directories of articles)
  6. Does not contain any transclusions of any template listed at Category:Citation templates, Category:Deprecated citation templates, Category:Germany external link citation templates, Category:Law citation templates, Category:Medical citation templates, Category:Science citation templates, Category:Specific-source templates, and all subcategories of Category:Specific-source templates.

To avoid producing excessive server load through nearly 3 million API queries to review every article on Wikipedia, the initial identification of unsourced articles would be performed via offline processing of a database dump using AWB's database scanner function. Erik9 (talk) 05:07, 30 May 2009 (UTC)
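As a rough illustration of the offline approach described above, the sketch below streams pages out of a MediaWiki XML export without issuing any API queries. It is a Python stand-in, not AWB's database scanner; the function name `iter_pages` and the tiny `SAMPLE` document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

def iter_pages(stream):
    """Yield (title, wikitext) for every <page> element in a MediaWiki XML export."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's children to limit memory use

# Tiny hand-written sample standing in for a real dump file (assumption).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Foo</title><revision><text>Body of Foo</text></revision></page>
<page><title>Bar</title><revision><text>#REDIRECT [[Foo]]</text></revision></page>
</mediawiki>"""

for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(page_text))
```

With a real dump, the same generator could be fed an open file object, and each yielded page passed to whatever unsourced-article test is in use.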

Furthermore, in consideration of the guidance in the instructions for template:unreferenced to "Consider not adding this template to extremely short articles", the template will not be added to articles transcluding template:stub, or any other template whose name contains the substring "stub" (as all stub templates listed at Wikipedia:WikiProject Stub sorting/Stub types currently do). Instead, the task will add appropriately dated subcategories of Category:Articles lacking sources to unsourced articles transcluding stub templates. Erik9 (talk) 17:51, 31 May 2009 (UTC)
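The stub rule above can be sketched as a small decision function. This is an illustration only, written in Python rather than as AWB regular expressions; the names `transcludes_stub` and `planned_addition` are hypothetical, and the dated category title format is an assumption.

```python
import re

# Captures the template name in {{name}} or {{name|...}}.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?:\||\}\})")

def transcludes_stub(wikitext):
    """True if any transcluded template name contains the substring "stub"."""
    return any("stub" in name.lower() for name in TEMPLATE_NAME.findall(wikitext))

def planned_addition(wikitext, month_year):
    """Return the wikitext the task would add to an article already judged unsourced."""
    if transcludes_stub(wikitext):
        # Assumed naming pattern for the dated subcategory; real titles may differ.
        return "[[Category:Articles lacking sources from %s]]" % month_year
    return "{{unreferenced|date=%s}}" % month_year
```

For example, `planned_addition("A protein motif. {{protein-stub}}", "June 2009")` selects the category branch, while an article with no stub template gets the template branch.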

Discussion

Am I to understand that you propose to tag every single article that doesn't have a ref or url with {{unreferenced}}? What purpose does the multiplication of such cleanup tags serve? (And in any event, it would have to exclude dabs.) Gimmetrow 07:00, 30 May 2009 (UTC)

To consolidate the criteria for tagging articles from here and task 6, mainspace pages would have template:unreferenced dated subcategories of category:articles lacking sources and category:all articles lacking sources added to them if they met all of the following conditions.
  1. Does not contain Category:Living people
  2. Does not contain any transclusions of template:unreferenced, template:BLP unsourced, or any redirects thereto
  3. Does not contain any uses of <ref> tags
  4. Does not contain any raw external links beginning with http://
  5. Does not contain any transclusions of any template listed at Category:External link templates or any subcategory thereof
  6. Does not contain the strings "ISBN", "ISSN", or "OCLC" (case sensitive)
  7. Does not contain any section title beginning with "reference", "footnote", "note", "external link", "source", "citation", "bibliography", "further reading", or "publication"
  8. Has never previously been edited by this bot for the purpose of adding Template:unreferenced, at the same article name.
  9. Does not transclude Template:Wi, or any redirect thereto, which would indicate that the page is a non-article referring the reader to Wiktionary, and that no sources are needed.
  10. Is not a redirect.
  11. Is not a disambiguation page, as indicated by a transclusion of template:disambig (or any redirect thereto) or by the substring "disambig" appearing anywhere in the page title.
  12. Does not transclude Template:Inuse, or any redirect thereto.
  13. Does not have a title containing the substring "list of" or "lists of" (while lists are not exempted from source requirements, it is often considered acceptable to place sources in the articles to which the list links; "lists of" pages usually contain no content subject to WP:VER, but are simply internal directories of articles)
  14. Does not contain any transclusions of any template listed at Category:Citation templates, Category:Deprecated citation templates, Category:Germany external link citation templates, Category:Law citation templates, Category:Medical citation templates, Category:Science citation templates, Category:Specific-source templates, and all subcategories of Category:Specific-source templates.
Any article meeting all of these requirements likely does not have references, but, per Wikipedia:Verifiability, should. Adding template:unreferenced dated subcategories of category:articles lacking sources and category:all articles lacking sources notifies editors of the problem and encourages remediation, both because of the notice produced on the article itself, and because the template adds the problematic articles to Category:Articles lacking sources or appropriate subcategories thereof. Erik9 (talk) 11:29, 30 May 2009 (UTC)[reply]
Also, articles transcluding stub templates will not have template:unreferenced added to them, but will instead be placed into appropriate subcategories of Category:Articles lacking sources, as described above. Erik9 (talk) 17:51, 31 May 2009 (UTC)
What about templates that come with references ({{GR}} is one I think)? - Jarry1250 (t, c) 14:56, 30 May 2009 (UTC)
I've modified the above descriptions to exclude articles transcluding templates listed in Category:Citation templates and appropriate subcategories thereof. I will also add this functionality to task 6 operations, although I am not aware of any false positives occurring due to articles whose sole references were contained in citation templates and not otherwise indicated. Erik9 (talk) 16:16, 30 May 2009 (UTC)
Indeed, task 6 has proceeded without any known false positives, which suggests that the proposed task here would also achieve a very high level of accuracy. Erik9 (talk) 19:24, 30 May 2009 (UTC)
Do stubs need cleanup tags like this? Gimmetrow 19:52, 30 May 2009 (UTC)
Stubs are not exempted from the source requirements of Wikipedia:Verifiability, nor do template:stub or the various subject matter specific stub templates by themselves indicate unsourced status. Erik9 (talk) 19:59, 30 May 2009 (UTC)
Some argue that stubs do not need cleanup tags which indicate the equivalent of "expansion". Even {{Unreferenced}} currently says "Consider not adding this template to extremely short articles", and it used to say more than that. Gimmetrow 15:36, 31 May 2009 (UTC)
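The consolidated conditions above can be approximated by a single predicate over a page title and its wikitext. The following is a minimal sketch under stated assumptions, not the bot's actual AWB configuration: template redirects are not resolved, the template categories in conditions 5 and 14 are represented by a caller-supplied set (`excluded_templates`, a hypothetical parameter), and condition 8 (the bot's own edit history) is omitted because it cannot be read from the wikitext.

```python
import re

TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?:\||\}\})")
SECTION_TITLE = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)
SOURCE_HEADINGS = ("reference", "footnote", "note", "external link", "source",
                   "citation", "bibliography", "further reading", "publication")

def template_names(wikitext):
    """Lower-cased names of all templates transcluded in the wikitext."""
    return {n.lower().replace("_", " ") for n in TEMPLATE_NAME.findall(wikitext)}

def looks_unsourced(title, wikitext, excluded_templates=frozenset()):
    """Approximate conditions 1-7 and 9-14; excluded_templates holds lower-case names."""
    lower_title, lower_text = title.lower(), wikitext.lower()
    names = template_names(wikitext)
    if "category:living people" in lower_text:                      # 1
        return False
    if names & {"unreferenced", "blp unsourced", "wi", "inuse", "disambig"}:
        return False                                                # 2, 9, 11, 12
    if names & excluded_templates:                                  # 5, 14
        return False
    if "<ref" in lower_text or "http://" in lower_text:             # 3, 4
        return False  # "<ref" also matches <references/>, which errs toward skipping
    if any(s in wikitext for s in ("ISBN", "ISSN", "OCLC")):        # 6 (case sensitive)
        return False
    if any(h.lower().startswith(SOURCE_HEADINGS)
           for h in SECTION_TITLE.findall(wikitext)):               # 7
        return False
    if lower_text.lstrip().startswith("#redirect"):                 # 10
        return False
    if "disambig" in lower_title:                                   # 11 (title check)
        return False
    if "list of" in lower_title or "lists of" in lower_title:       # 13
        return False
    return True
```

Every check is biased toward returning `False`, so a page with any hint of sourcing is left alone, which matches the stated aim of avoiding false positives.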
Sourcing is not the equivalent of expansion - it is possible to have a stub which cites sources (or a long, unreferenced article), as stub status is determined primarily by the length of an article's text. Moreover, the characterization of template:unreferenced as a "cleanup tag" is itself questionable -- WP:VER is a core Wikipedia policy, which any completely unreferenced article blatantly violates. As a preliminary measure in remedying these policy violations, it is necessary to identify the offending articles. A list of unsourced articles is conventionally produced by placing them in subcategories of Category:Articles lacking sources. While this categorization is usually effectuated through use of template:unreferenced, if the template is deemed excessively large in proportion to small articles, appropriately dated subcategories of Category:Articles lacking sources could be added to stubs directly. Therefore, I will modify the description of the task to reflect that articles transcluding stub templates will not have template:unreferenced placed on them, but will be appropriately categorized. Erik9 (talk) 17:39, 31 May 2009 (UTC)
Of course, placing an appropriate template or categories on a number of unsourced articles shouldn't be construed as an invitation to deletionist activism: editors are reminded that, per WP:BEFORE, the preferred remedy for unreferenced articles is to add reliable sources, and that AFD is reserved for articles reasonably believed to be unsourceable by editors familiar with their subject matter:

When nominating an article for deletion due to sourcing or notability concerns, make a good-faith attempt to confirm that such sources aren't likely to exist.

Erik9 (talk) 23:15, 31 May 2009 (UTC)
I've updated conditions 6 and 7 as described. With regard to accuracy and avoidance of false positives, my bot has a highly favorable record of performing similar identifications of unsourced articles, restricted to biographies of living persons, under task 6. While I obviously cannot have the bot edit any articles under this task prior to a trial approval, I can produce a "dry run", listing in userspace articles to which the bot would have added template:unreferenced or subcategories of Category:Articles lacking sources. It is believed, however, that Template:Better source, Template:Primary sources, etc need not be explicitly tested for, since, by assumption, such templates are only used where the article is sourced to some degree (and thus should not be identified by the bot as unreferenced at all), but the sources are considered to be inadequate. Testing for every template in Category:Citation and verifiability maintenance templates, and every redirect thereto, would require a great deal of effort to prepare regular expressions in the range of several megabytes, the processing of which would slow the bot's operations considerably, almost certainly without any benefit. Erik9 (talk) 02:17, 1 June 2009 (UTC)[reply]
In addition to having the bot identify lists by means of examining the page titles, I can have it avoid articles containing Category:Outlines, or any category whose title contains the substring "list". Erik9 (talk) 02:27, 1 June 2009 (UTC)
To assist in the evaluation of the accuracy with which this task can be performed, I've created the first dry run, which lists 896 articles to which the bot would have added template:unreferenced or a dated subcategory of category:articles lacking sources. Erik9 (talk) 03:39, 1 June 2009 (UTC)
While most of the articles listed in the first dry run were correctly identified as unsourced, a potential oversight is revealed in articles such as Adaptive predictive coding that transclude template:FS1037C, which can be construed as a source citation. However, not all templates listed in Category:Attribution templates are sufficiently specific as to constitute references (consider Template:USGovernment, for example). Tomorrow, I will begin a careful review of the attribution templates to determine which are acceptable references, and configure the bot to exclude articles transcluding them. Erik9 (talk) 03:52, 1 June 2009 (UTC)[reply]
I've created a second dry run, listing 2604 articles identified as unsourced by a recent reconfiguration of the bot. Erik9 (talk) 04:05, 2 June 2009 (UTC)
Will it remove refimprove when it adds unreferenced, or...? - Jarry1250 (t, c) 14:02, 2 June 2009 (UTC)
In the second dry run, the bot was operated under the assumptions that it would only be adding template:unreferenced or subcategories of category:articles lacking sources to articles which are entirely unsourced, and that template:refimprove or similar templates would only be used on articles which have some sources, but whose sources are considered to be inadequate, and would therefore not appear on an article that the bot would edit. The second assumption hasn't held, however: a number of articles listed in the second dry run incorrectly contain {{refimprove}}, even though they are entirely unsourced. Additionally, one article transcluding the refimprove template, Airspeed Ltd., was incorrectly identified as unsourced because the phrase "further reading" was not tested for in section titles, a problem which I have since remedied. There are two ways to proceed when encountering a "refimprove" or similar template: the bot could assume that the template was correctly placed, and that the article was incorrectly identified as unsourced, and should be skipped. Alternatively, the "refimprove" could be assumed to be incorrect, and removed in the process of placing template:unreferenced or a subcategory of category:articles lacking sources in the article. I prefer the latter approach, as I believe that the bot's determination of source status can be achieved with a high level of accuracy. Erik9 (talk) 00:50, 3 June 2009 (UTC)

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This will probably be the first of several trials. – Quadell (talk) 13:40, 3 June 2009 (UTC)

Done; the results are available at [2]. Erik9 (talk) 03:36, 4 June 2009 (UTC)
Why did the bot add Category:All articles lacking sources to Collagen helix, rather than {{unreferenced}}? – Quadell (talk) 14:49, 4 June 2009 (UTC)
Because the article contains a stub template ({{protein-stub}}). --R'n'B (call me Russ) 15:11, 4 June 2009 (UTC)
I see. I looked through the rest of the contribs, and they look good to me. Anyone else have comments? – Quadell (talk) 15:46, 4 June 2009 (UTC)
You're right. Articles transcluding template:refimprove or similar templates will not be edited under this task. Erik9 (talk) 00:47, 5 June 2009 (UTC)

Approved for trial (150 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Per a previous request, I'm extending the trial for further examination before approval. – Quadell (talk) 12:52, 5 June 2009 (UTC)

Done, [3]. Erik9 (talk) 15:39, 6 June 2009 (UTC)
A comment regarding the current design of this task was made on User_talk:Erik9bot#Unreferenced_categories: Keith D contends that subcategories of Category:Articles lacking sources should never be added to articles directly, but only through template:unreferenced. As originally conceived, this task only involved the addition of template:unreferenced to articles; the direct addition of subcategories of Category:Articles lacking sources to articles transcluding stub templates was implemented to address objections by Gimmetrow that template:unreferenced should not be placed on stubs. Personally, I prefer the original task design under which only template:unreferenced would be added to articles; if few editors share Gimmetrow's concerns with the addition of this template to stubs, I would like to run the task under the original design. Erik9 (talk) 20:26, 6 June 2009 (UTC)[reply]

Objection

I object. Instead of slapping a hideous tag on the tops of articles that have no sources -- a tag in the face of readers, who we should be serving -- why not create a list of such articles needing references somewhere editors can see it? Editors are those people willing and able to add references. Readers outnumber editors by hundreds to one, if not thousands. How does this giant, disruptive, ugly tag help us write an encyclopedia? Think of the readers first! Create a list, a category, a talk page tag -- anything but millions of needless, unsightly blemishes. Antandrus (talk) 21:55, 7 June 2009 (UTC)

It sounds like you're saying that {{Unreferenced}} shouldn't be used at all. Is that what you're saying? – Quadell (talk) 22:16, 7 June 2009 (UTC)
At User_talk:Erik9bot#Unreferenced_categories, Keith D objected to the placement of articles in subcategories of category:articles lacking sources directly, because editors adding sources are unlikely to notice and remove the categories (then subsequently objected to the placement of template:unreferenced in articles, because he regarded the template as too big and noticeable). Under the assumption that Keith D is probably alone in leaving us with no method by which to categorize articles as unreferenced at all, I propose the following resolution to objections that template:unreferenced defaces articles: that we extend the bot's current behavior with respect to stubs, so that the application of template:unreferenced is forgone entirely, and dated subcategories of category:articles lacking sources and category:all articles lacking sources are added directly to any articles identified as unsourced. The creation of "millions of needless, unsightly blemishes" would be completely avoided: since categories used to indicate articles' unsourced statuses are hidden, their addition will not alter articles' appearances by so much as a single pixel. Erik9 (talk) 22:26, 7 June 2009 (UTC)[reply]

I may have missed it above, but how many pages is this bot expected to edit (at least initially)? For the record, off-hand I don't see much reason to object to this task. --MZMcBride (talk) 22:31, 7 June 2009 (UTC)

I estimate that there are at least 50,000 articles which would be identified as unsourced by this task. At this point, I have no strong preference as to whether that identification is performed by means of template:unreferenced or direct addition of subcategories of category:articles lacking sources, so long as something can be approved to categorize the unsourced articles. Erik9 (talk) 22:38, 7 June 2009 (UTC)
A hidden subcategory -- thereby giving a list for people interested in, and able to add references -- would not be a problem for me. In fact that would be useful. What I object to is a giant tag on top of millions of articles. Our QC mechanisms ought to be internal, and announcing to the world that the article lacks references is pointless and intrusive. Any intelligent reader can see that there are no references at the end, or within the article. Quadell, yes, I don't like that tag, or any of the others claiming that an article "needs" this or that (typically inline cites), and it absolutely floors me that no one else is objecting to the proliferation of these odious things! Maintenance tags belong in places that editors can see them.
If only logged-in users could see them, that's another way to solve the problem. Antandrus (talk) 22:54, 7 June 2009 (UTC)
Have you tried nominating the tag for deletion, to see if there's consensus for getting rid of it? – Quadell (talk) 11:02, 8 June 2009 (UTC)
Quadell, you and I have been Wikipedians for more than five years. Both you and I know perfectly well what would happen if I were to do this. While I believe this template is pointless, I know I am in a minority, and that such a debate would generate drama and acrimony; you yourself pointed out below that it has happened exactly that way before; and while I'd get a handful of support votes from that small group of surviving old-timer article writers and editors, the debate would be deluged with flames and "snowball" and "pointy" keep arguments from others. It's not worth the stress. It's a pointless template: that an article lacks references is obvious to any reader who is more than half-awake, and slapping an "unreferenced" template is about as useful as hanging an "I don't have any clothes" sign on naked people walking on the street. Making a hidden category, as Erik suggests, is a perfect solution, since it generates an accessible fix-list. If I ever thought I'd get some serious support I'd be happy to nominate all the hideous top-of-article maintenance tags for deletion, for they're a blight, a plague, and they are metastasizing throughout our project, appearing much faster than they are being removed. If I had my way I'd destroy them all and replace them with hidden mechanisms such as invisible categories accessible to the editors interested in making the requested improvements. Antandrus (talk) 00:51, 9 June 2009 (UTC)[reply]
Of course! Talking here, to -bot approval department desk, subdivision code check 9, is not the correct place for your objection, Antandrus: you need instead to go to Department 4: Tag Rationale Assessment. You see, instead of asking whether it's a good idea to do the thing, the -bot approval only asks whether the -bot will do it the way other things have been done. And this is exactly how Wikipedia turns from volunteer project into a series of Projects and departments and ownerships and fiefdoms. When next I remove the tag placed by this -bot, I will be told that the -bot was approved and, implicitly, it must have been doing a good thing -- even though no one thought about asking the question. Geogre (talk) 11:22, 8 June 2009 (UTC)
Your sarcasm really isn't helpful. However, your objection (below) is. – Quadell (talk) 13:06, 8 June 2009 (UTC)
Object: I'm with Antandrus, here. This isn't about coding. It's about using the tag. "Should the tag be used at all?" Why, no, I don't believe it should. I think, instead, a person should get in and get his or her hands dirty and do some dang work and sofixit. I don't think that bombing runs with tags actually accomplish anything at all. They certainly don't do anything for readers, and we are here to serve the reader not please ourselves. However, if the tag exists, it exists for low-input atrocity articles, and yet a -bot is, by nature, blind. It cannot tell the difference between "Zanzibar is a candy bar made by Mars" as the whole of an article, where putting such a tag might be a way of affixing a flashing light, and something like a hagiography, where the references exist and where the material is common knowledge (i.e. found in dozens of sources and not disputed, and therefore, in any academic setting, never to be cited except internally). So, no, this is not "never use the tag," and no, this is not "wrong department." This is "OBJECT TO A -BOT SLAPPING TAGS ON THE FIRST LINE OF ARTICLES" until such a date as -bots learn to read and tell the difference between an article that needs and an article that does not need a tag. Geogre (talk) 11:22, 8 June 2009 (UTC)
I see that two editors vociferously object to a bot putting these tags on articles, but more than that, it sounds like both object to humans putting these tags on articles as well (though Geogre seems to at first say the tag shouldn't be used at all, and then later say it should be used only on atrocity articles). I don't have any opinion on this, but if the tag isn't useful it should be deleted, it seems to me. It was nominated for deletion at Wikipedia:Templates for deletion/Log/2007 September 1#Template:Unreferenced, and there was additional discussion at Template talk:Unreferenced#This is the stupidest template: why not delete it.3F, some of it needlessly acrimonious. But glancing through the histories, it appears that hundreds of editors have added this template to tens of thousands of articles, so it's clear some editors find it useful. Either (a) the tag is not helpful, (b) the tag is helpful, or (c) the tag is sometimes helpful, but this bot is making mistakes on which articles should get the tag. If it's (a) then the tag should be deleted; if it's (b) then the bot should run; and if it's (c) then please link to articles where this bot added a tag inappropriately. I am not invested in whether this bot runs or not, and I won't approve it without consensus, but please, can we discuss it civilly? – Quadell (talk) 13:06, 8 June 2009 (UTC)
It's quite obvious that there is significant objection to the application of template:unreferenced to 50,000+ articles by means of a bot. As a bot operator, it is my responsibility not to run controversial tasks, whether or not approval could be secured for them, and to provide adequate notice to the editorial community whenever a task has a significant probability of proving controversial (which is why I added a link to this discussion in template:cent [4], though I was under no requirement to do so). Based on comments by Gimmetrow, Antandrus, Keith D, and Geogre opposing the automated addition of template:unreferenced to articles, I am revising the task for which approval is requested: use of template:unreferenced would be forgone entirely, and dated subcategories of category:articles lacking sources and category:all articles lacking sources would be added directly to any articles identified as unsourced. As these categories are hidden, their addition would not alter the appearance of articles to readers, and would serve the sole purpose of creating an accurate list of unsourced articles for the benefit of editors interested in working on them. Erik9 (talk) 23:15, 8 June 2009 (UTC)[reply]
How does it deal with Harvard Referencing? If it can't, I object --Joopercoopers (talk) 14:03, 8 June 2009 (UTC)
Parenthetic references should be accompanied by full bibliographic info somewhere else in the article. The task supposedly excludes articles with sections called "reference[s]", "note[s]", "source[s]" and so on. But it might be an issue for articles where the sources are given in section with a non-standard name, or without a name. Gimmetrow 14:46, 8 June 2009 (UTC)
It's exceedingly unlikely that an article would have references which are completely unformatted in any way: no <ref> tags, no descriptive reference section header, no external links, no ISBN numbers, etc. I would venture that this circumstance is so rare, that the false positives it caused would be fewer in number than those resulting from human error made in the course of categorizing articles manually. In this very rare event, an editor reviewing the categorization of an article as unsourced should add or correct the name of the header for the section in which the references appear when removing the categories. Since the addition of template:unreferenced to articles is no longer a task for which approval is requested, and since category:articles lacking sources and category:all articles lacking sources are hidden, any "defacement" concerns caused by these extraordinarily uncommon potential false positives would be avoided. Erik9 (talk) 23:29, 8 June 2009 (UTC)[reply]

Why did the first dry run include Chomsky (surname) (dab), Claudine (stub) and Clement Martyn Doke ("Publications")? The second dry run included Chesterton Range National Park (stub) and Chief Minister of the Northern Territory (list). These are just from spot-checking a half-dozen links on each page. Would it tag Municipalities of Chiapas? It looks like the task fails on technical grounds alone. Gimmetrow 14:32, 8 June 2009 (UTC)

The bot's regexps were revised significantly as a result of errors detected in the first and second dry runs, such that the "dry run" results are no longer reflective of the bot's current configuration. Indeed, one of the purposes of having a "dry run" is to identify and repair potential problems before they result in incorrect live edits. Since the bot has already undergone two live edit trials for this task [5] [6], totaling 201 edits, apparently without any errors, I suggest that it would be more helpful to consider the most recent live testing as reflective of the bot's current functionality. Though now largely irrelevant, the format of the "dry run" lists should be clarified: they include all articles that the bot would have edited in some way at the time, whether for the purpose of adding template:unsourced (for articles which did not transclude stub templates), or a dated subcategory of category:articles lacking sources (for articles which did transclude stub templates). Finally, Chief Minister of the Northern Territory includes significant non-list, and unsourced, content. Erik9 (talk) 22:55, 8 June 2009 (UTC)
Granted, the algorithm can improve, but the last live run included [7] tagging a stub. I thought stubs would be excluded? I'm not really convinced that this task is using a sufficiently fine tool to be useful. Gimmetrow 01:44, 9 June 2009 (UTC)
The task, at the time it was approved for the second trial [8], described the following behavior with respect to stubs:

Furthermore, in consideration of the guidance in the instructions for template:unreferenced to "Consider not adding this template to extremely short articles", the template will not be added to articles transcluding template:stub, or any other template whose name contains the substring "stub" (as all stub templates listed at Wikipedia:WikiProject Stub sorting/Stub types currently do). Instead, the task will add appropriately dated subcategories of Category:Articles lacking sources to unsourced articles transcluding stub templates. Erik9 (talk) 17:51, 31 May 2009 (UTC)

Therefore, the addition of appropriate categories to an article transcluding a stub template [9], instead of adding template:unreferenced, the default behavior for the task at the time, was correct. Indeed, this special treatment of stubs was performed in response to your comments, so you should be familiar with it. Your claim that articles transcluding stub templates would not be edited at all under this task is blatantly and obviously false; please do not continue to disrupt this discussion with misrepresentations of fact. Erik9 (talk) 02:35, 9 June 2009 (UTC)[reply]
If you check above, you will see that I do not want stubs edited at all. If you cannot respond here without making accusations of disruption, of all things, perhaps you shouldn't be operating bots. Gimmetrow 02:45, 9 June 2009 (UTC)
The trial approval was granted based on my description of the task, not what you would have liked the task description to have been. To describe an edit as an error because, while it was consistent with the approved task, it was inconsistent with your preferences, is a gross misrepresentation of fact, and deliberately and maliciously disruptive. If you continue, I will ask that an administrator ban you from further participation in this discussion. Erik9 (talk) 02:52, 9 June 2009 (UTC)[reply]
I object to more tagging of stubs. If you disagree, fine. Above you say "Stubs are not exempted from the source requirements". That is true, but "source requirements" (WP:V) only require sources for items likely to be challenged. Even then they do not require them to be in any particular machine-readable format. Since the bot cannot fully parse articles to determine if sources either need to be present or are present in a non-standard format, and the bot operator is unwilling to implement rules to avoid false positive tagging, this bot request should be denied. Gimmetrow 16:17, 11 June 2009 (UTC)[reply]
While hypothetical articles which have completely unformatted references may exist, they appear to be exceedingly rare: the bot's two live edit trials for this task [10] [11] seem to not have uncovered any of them; as mentioned above, human error in categorizing unreferenced articles would likely result in a higher rate of false positives. You state above "naive editors put what really are usable references in as external sources", which I believe reflects a misunderstanding of the task for which approval is requested: though what you mean by "external sources" is somewhat vague, any article which has an "external links" section (even an empty one), or any external links using the http:// syntax (or any of hundreds of identified external link templates) would not be categorized as unreferenced. While all members of the Wikipedia community are welcome to comment at BRFAs, I would ask that you adequately familiarize yourself with the technical details of bot requests before doing so. Erik9 (talk) 00:08, 9 June 2009 (UTC)[reply]
Also, the description of the task for which approval was requested was updated in template:cent a mere two minutes before your comment [12]; the MediaWiki software can easily take longer than two minutes to update all transclusions of such a widely used template. Since your objection to the bot's non-identification of "external sources" suggests that you apparently have not reviewed the details of the task presented here, I find it necessary to ask, especially in light of your claim that "New users have enough trouble as it is", whether you object to the original task (which uses template:unreferenced), or the revised task (which only involves hidden categorization)? Erik9 (talk) 01:00, 9 June 2009 (UTC)[reply]
"We want to decrease the extent of formality required in writing WP articles." I cannot disagree more. Wikipedia has to operate under a universal writing standard in order to be considered a legitimate encyclopedia. We cannot allow unsourced articles here and just excuse them as being written by new users. New users, like all users, have to obey our verifiability policy. ThemFromSpace 18:20, 9 June 2009 (UTC)

Suggest a compromise. My stand is that the bot should just slap the unreferenced tag on the articles, and because of the scale of this I hate the idea of just using the hidden categories. However, there seems to be some opposition to that. My suggestion is that the bot, instead of slapping the tag on the top, adds a reference section, and adds ((unreferenced|date=June 2009|bot=yes)) in that section. The |bot=yes parameter then hides the template, but the categories will be added. The empty reference section will also show the sorry state of the article. All that has to be done is some template coding. I am sure that there is someone that can fix that very quickly. Rettetast (talk) 00:41, 9 June 2009 (UTC)

This may be a technically superior method of implementing the basic task for which approval is currently requested (hidden categorization), with the important benefit of leaving "unreferenced" at the top of the wikitext of articles, allowing the template to easily be removed when references are added, instead of searching for two categories (a dated subcategory of category:articles lacking sources and category:all articles lacking sources) at the bottom of the wikitext. Erik9 (talk) 00:48, 9 June 2009 (UTC)[reply]
(ec) This suggested compromise makes the whole task quite pointless. I can stomach the stubs being given just a category. Hopefully when they are large enough to have the stub tag removed, the same editor will take care of the category as well. (I imagine you are placing the cat adjacent to the stub template.) But to add a hidden category to this large a number of articles in such a backlogged category is ridiculous. Without a tag to prompt people, the chance of this category being removed when the article is sourced is next to nothing. They are not likely to even be editing the same section of the article that would show the cat in wikitext. Considering that working through each monthly category takes longer than six months and that we are currently on August 2006, we are looking at 17 years before this month's category is examined in a comprehensive fashion. I really object to the task being done in this manner unless the bot is also going to regularly be run to remove the hidden cat. It is unreasonable to expect people who add sources in the normal course of editing to recognize they need to remove a cat they are very unlikely to notice. And no one working from the categories will be touching these articles soon enough to make the task reasonable. BirgitteSB 13:42, 9 June 2009 (UTC)
Just one comment. Editors don't only work on the oldest categories. People can find articles they are interested in through WikiProject cleanup listings. The categories will help WikiProjects identify their articles. Rettetast (talk) 14:18, 9 June 2009 (UTC)
I am sorry but you do not make sense. There are 140,000 unreferenced articles tagged right now, in 34 monthly subcategories. How can this bot run be necessary to help people populate WikiProject cleanup lists? This monthly run will be huge. It will make the June 09 subcat the largest, most difficult one to use to find interesting articles. There are plenty of articles for people to work on. I am not even opposed to tagging more. But not tagging them while placing an invisible cat that is highly likely to remain in the article long after becoming obsolete is just making pointless work for the future editors who will have to remove these cats from articles that have been referenced in the meantime. Use the template so people recognize they need to remove something, or do it this way and run cat-removal bot tasks regularly. But the revised proposal does nothing to help the encyclopedia nor the maintenance of unreferenced article categories. It will just discourage regular editors from helping with maintenance by hiding it from them, and waste the time that people have committed to referencing articles in the future with false positives. BirgitteSB 17:36, 9 June 2009 (UTC)
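The 17-year figure quoted in this thread can be reproduced with simple arithmetic. The sketch below assumes exactly six months of work per monthly category (the comment says "longer than six months", so this is a lower bound) and counts the months from August 2006 to June 2009; the helper name `months_between` is made up for the illustration.

```python
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

oldest_category = date(2006, 8, 1)   # category currently being worked through
newest_category = date(2009, 6, 1)   # category the bot run would populate

backlog_months = months_between(oldest_category, newest_category)
months_per_category = 6              # assumed lower bound from the discussion

years_until_reached = backlog_months * months_per_category / 12
print(backlog_months, years_until_reached)
```

The month count matches the 34 monthly subcategories mentioned in the same thread.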

Recap

Okay, so Erik9 has changed the scope of this project due to comments above. The bot will not add visible templates to any pages. It will only add an invisible template (to add maintenance categories), and only to those pages that clearly have no references in any format. What do we think of this task under those conditions? – Quadell (talk) 13:39, 9 June 2009 (UTC)[reply]

I edit conflicted with you. But adding this invisible stuff to articles is not at all reasonable unless they are being maintained at a good pace. BirgitteSB 14:04, 9 June 2009 (UTC)
Maintenance categories help WikiProjects identify articles that need work. The tagging of unreferenced BLPs that this bot has done before has helped a lot at WP:FOOTY. Rettetast (talk) 14:14, 9 June 2009 (UTC)
I am not against maintenance categories; please read my stated objections above. BirgitteSB 17:38, 9 June 2009 (UTC)
It's fine with me under the new scope (invisible category, no visible tag). Antandrus (talk) 15:02, 9 June 2009 (UTC)[reply]
I'm not sure where to put this, but this bot is exactly the thing that thousands of articles need to get cleaned up. I've gone through many uncited BLPs with tags placed by bots and only a small percentage of them are cited, and those are usually improperly done. I'd support the tags being visible myself, as I've stumbled across several BLPs with the tag placed by bots and I was able to cite them up to reasonable standards. Bravo to whoever's idea this was; it's a shame the tags aren't going to be visible. ThemFromSpace 18:18, 9 June 2009 (UTC)
Why? You can't use the category? I don't understand why you would "stumble" across one and think it was ok except that a -bot had told you otherwise? I would hope that, even doing Random Page, you're at least looking, if not reading in detail, and that should give you more information than a -bot will gather. Nevertheless, do try using the category system: it will allow you to be very efficient and helpful with these tasks, and, honestly, it's what the tag system should be doing in the first place. Geogre (talk) 13:10, 11 June 2009 (UTC)[reply]
You believe the category system for unreferenced articles is efficient? Look, I understand you dislike tags and I don't care to argue with people about preferences. We can disagree there without argument. But I cannot allow you to claim that the hidden categories alone will achieve anything useful, much less do so efficiently. You are obviously unfamiliar with the practical working of these categories. If you dislike the tags, argue against the bot run entirely. But please don't actively promote messing around with a category system you clearly do not understand and do not care about. Saying the bot run is fine with you if invisible is understandable. But you are just exposing yourself by mocking people for not being content with the "useful and efficient category system". Just because something is invisible and concerns an aspect of Wikipedia that you stay out of, and therefore is not an obviously bad idea to you, doesn't mean it is necessarily a good idea. BirgitteSB 13:59, 11 June 2009 (UTC)
It's fine with me if it's invisible. What I have a problem with is the assumption that a categorization and retrieval need (the idea that we need to file articles into piles whereby we can find those without references, the idea that we should have an easy way of getting to them) should be tantamount to saying that there is an informational need across the board in each of these articles. Without specific reading, there is no way to know this, and the assumption that the mechanical, formal presence of one (no "references" in a formally, wiki markup recognized (i.e. -bot perceptible) way) equals the absolute presence of the other is horrible. To then unleash a -bot that would operate on that assumption and not only assert this, but do so on the top of the article, in visible text, with a tag that tells the reader that there is a fault with the article, is evil. I stay out of discussions of formal matters. If people want to shuffle articles this way or that, it's nothing that concerns me. However, the fact that something does or does not fit a shuffle does not mean that the article has a discursive or rhetorical fault, and this -bot was making that assertion. Geogre (talk) 13:04, 11 June 2009 (UTC)[reply]
I just don't see how a compromise is going to be reached here. You can make the algorithm as clever as you like, it's never going to be clever enough to be able to say - oh look! this article isn't like the others, a naive editor has referenced it in a new way I'm unfamiliar with. I'm not a fan of the tags in the first place, but having a bot slap them on without any judgement or thought seems like folly to me. Can it now cope with parenthetical references - if so, how? Does it just look for parentheses? How does it deal with "Venturi went on to discuss the issue further in his 1972 treatise, Learning from Las Vegas, arguing "xyz quote from the text""? --Joopercoopers (talk) 00:47, 14 June 2009 (UTC)
Most of my objections do not apply if the bot is running as now suggested, but this one, on informal references, certainly does. Adding a screen for OCLC etc. helps, but not sufficiently, as normal citation practice elsewhere often does not include this information. It has to be able to cope with parenthetical references, and with Harvard referencing that does not have a proper references section as it ought to. Articles with such problems are not the serious major problem that truly unreferenced articles are, but just require minor technical upgrades (which partially might well be handled by bot, in fact, though I doubt anyone will find an algorithm which can catch them all). I have seen many, many articles tagged as "unreferenced" although some informal documentation is present (some even get nominated for speedy or prod), which is where I usually encounter them. DGG (talk) 18:32, 15 June 2009 (UTC)

Special Project

I set up Wikipedia:Unreferenced_articles#Special_Project to encourage manual review. Jeepday (talk) 09:44, 27 June 2009 (UTC)[reply]

Bot Specific

  1. Don't auto-tag articles with ((unref)); there are more unformatted references out there than indicated by findings so far.
  2. If you are going to put articles in categories by bot, use a category that indicates it was bot populated. People can go through and add unref.
  3. If you are going to add an unref category by bot, go through by bot and remove them from the category when they get a reference-related tag by a person. We have tried a bot (Wikipedia:Bot_requests/Archive_14#Template:Unreferenced_bot_request) to change unref to ref improve, and it did not work.
  4. If the unref category indicates it is bot populated (and separate from human populated), and the category is hidden, the only impact to people is to those who choose to work in the category. This would be reminiscent of User:Triddle/stubsensor which was fun and got lots of volunteers.
  5. I think many editors at Wikipedia:Unreferenced articles would enjoy reviewing articles in a bot-suspected unref category as a change once in a while. I know I would. We can discuss if it should be a separate project or part of ours over there. In any case I would volunteer to include and maintain its human interactions at Wikipedia:Unreferenced articles until if/when it gets its own project.

Jeepday (talk) 20:38, 9 June 2009 (UTC)[reply]

An excellent idea. I would suggest that performance of the task in the manner described above would be sufficiently uncontroversial for this request to be approved on that basis. Erik9 (talk) 11:45, 12 June 2009 (UTC)[reply]
Okay, in response to user feedback, the specs have changed to Jeepday's suggestion. Does anyone have an objection to running the bot under those specs? – Quadell (talk) 12:01, 12 June 2009 (UTC)[reply]
According to #3, the bot is going to periodically go through the category and check for other tags? (Presumably that means also checking for references added.) How often? And I still think stubs should be excluded entirely from these taggings. Gimmetrow 12:17, 12 June 2009 (UTC)[reply]
All articles categorized pursuant to this task will be reviewed at a minimum frequency of once per week, and removed from the category if they no longer meet the same criteria applied for initial categorization. Erik9 (talk) 02:20, 13 June 2009 (UTC)[reply]

Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Okay, let's try 50 edits under these specs. – Quadell (talk) 22:58, 14 June 2009 (UTC)[reply]

Done, [13]. The trial proceeded correctly except for the categorization of Alexander of Hales, which contained an unformatted parenthetical reference that I have since fixed [14]; this is the first such article encountered during the course of the 250 total edits made pursuant to trials of this task. The present design anticipates the inability to achieve a 100% rate of accuracy due to the existence of an exceedingly small number of articles with unformatted references, through the use of a hidden category specific for the task. It should be noted, of course, that the manually produced Category:All articles lacking sources is likewise not 100% accurate, both due to human error, and the addition of references after template:unreferenced is added to articles, without concurrent removal of the template; the periodic automated review of articles categorized pursuant to this task eliminates the latter problem. Erik9 (talk) 02:18, 15 June 2009 (UTC)
On a minor note, an extra newline was inserted between the articles' existing categories and Category:Articles lacking sources (Erik9bot), a problem which I have since corrected; however, since making large numbers of edits for the sole purpose of adding or removing whitespace is disfavored as creating excessive server load, I will not be removing the newlines from the articles to which they were added. Erik9 (talk) 02:27, 15 June 2009 (UTC)[reply]
I have also written and successfully tested [15] an implementation of the task to remove all articles from Category:Articles lacking sources (Erik9bot) which no longer meet the criteria initially applied for categorization. Erik9 (talk) 02:46, 15 June 2009 (UTC)[reply]
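The periodic removal pass described above might look something like the sketch below. It is illustrative only: the actual task runs through AWB, and both the function name `review_article` and the idea of passing the original categorization check in as a `still_unsourced` callable are assumptions made for this example. Only the category name is taken from the discussion.

```python
from typing import Callable

BOT_CATEGORY = "[[Category:Articles lacking sources (Erik9bot)]]"

def review_article(wikitext: str, still_unsourced: Callable[[str], bool]) -> str:
    """Drop the bot category if the page no longer meets the original criteria."""
    if BOT_CATEGORY not in wikitext or still_unsourced(wikitext):
        # Nothing to do: page is untagged, or it still qualifies for the category.
        return wikitext
    # Remove the whole category line so no stray blank line is left behind.
    kept = [line for line in wikitext.split("\n") if line.strip() != BOT_CATEGORY]
    return "\n".join(kept)
```

Reusing the same predicate for tagging and for review keeps the two passes consistent, which is what the commitment to remove pages that "no longer meet the same criteria applied for initial categorization" requires.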
I certainly have no objections at all to a small trial, to be followed by a larger one of perhaps 500 edits, before going against the entire database, and even then I'd suggest running it in a segment first, with time to review. I think by now the screen is quite subtle, and looks like a reasonable way to approach it. What the real need is, is to identify those unreferenced articles where this also gives a serious doubt about accuracy or completeness or NPOV or spam, rather than treating everything as of equal priority. DGG (talk) 18:37, 15 June 2009 (UTC)

Approved for extended trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. How about you do no more than 100 edits per day until June 25? That'll give interested parties time to fully investigate the bot's performance. (Besides, it gives me the rare opportunity to use the ((BotExtendedTrial)) template.) – Quadell (talk) 13:04, 18 June 2009 (UTC)

Done for June 19 [16]. Also, 15 articles were removed from Category:Articles lacking sources (Erik9bot), as they no longer met the criteria for categorization [17]. Erik9 (talk) 00:27, 19 June 2009 (UTC)[reply]
Done for June 20 [18]. Erik9 (talk) 15:51, 20 June 2009 (UTC)[reply]
Done for June 21 [19]. Erik9 (talk) 14:09, 21 June 2009 (UTC)[reply]
The remaining edits are at [20]. Erik9 (talk) 02:31, 25 June 2009 (UTC)[reply]

Any comments / suggestions / criticism? Last call... – Quadell (talk) 12:32, 25 June 2009 (UTC)[reply]

 Approved. Good to go. – Quadell (talk) 13:11, 27 June 2009 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.