The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.

Cleanbot

Operator: Lightmouse (talk)

Automatic or Manually Assisted: Automatic

Programming Language(s): AWB

Function Summary: Delink dates except for solitary years.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Already has a bot flag (Y/N): No.

Function Details:

Guidance at wp:mosnum says: "The linking of dates purely for the purpose of autoformatting is now deprecated."
Guidance at wp:context also deprecates such links.
Featured Articles, Featured lists, Good Articles, Peer review, and wikiprojects are implementing the guidelines. I suspect that the bots that were adding links to date elements have stopped and many editors are removing date links. Some people are not yet comfortable with a bot delinking solitary years, therefore solitary years are specifically excluded from this bot request.

The code already exists and has been well tested on manual edits.

Discussion

You may be aware that delinking is already taking place on quite a large scale by multiple editors using manual methods. As far as I am aware, the suggestion that you make has not been an issue. There was a discussion about a day+month link being valid because it is a significant annual event but the response seemed to be that annual events should link to the relevant article (e.g. Guy Fawkes Night) rather than the date. Lightmouse (talk) 10:51, 20 October 2008 (UTC)[reply]

In the first line of Guy Fawkes Night, there's a link to November 5. Do you consider this an error? --Carnildo (talk) 21:58, 20 October 2008 (UTC)[reply]
I am not sure. What do you think? Lightmouse (talk) 22:01, 20 October 2008 (UTC)[reply]
I think it's fine, and that your bot shouldn't automatically remove standalone day+month links. --Carnildo (talk) 23:29, 20 October 2008 (UTC)[reply]
The Guy Fawkes Night article does a fine job of clarifying what date it's observed on and why, and a link to a list of other things that have happened on November 5th throughout history contributes nothing (0) to anyone's understanding of the article. I strongly support a bot unlinking this kind of thing along with full dates. Sssoul (talk) 15:25, 21 October 2008 (UTC)
Only a tiny fraction of articles describe annual anniversaries and it is relatively easy to avoid those. I note you used the phrase "standalone day+month links". Are you implying that you would support a bot that removes day+month+year links? Lightmouse (talk) 08:49, 21 October 2008 (UTC)
I don't care either way on full day+month+year links. --Carnildo (talk) 20:01, 21 October 2008 (UTC)[reply]
I'm all for this task, but why aren't you delinking solitary year links as well? I don't see any added value with them either. And can you make the source code for this bot available? Parsing out date syntax is a bit trickier than you might realize. There are all sorts of edge cases that are valid syntax that require special handling. For instance, here is only a partial implementation in JavaScript. The regular expression approach proved to be too limiting, so you'll really need to use a full-on grammar parser. In particular, the linked script does not handle properly delinking dates that are followed by a word beginning with the same letter as the name of any month, because it only uses single character look-ahead. --Cyde Weys 15:07, 20 October 2008 (UTC)[reply]

Thanks for your support. I agree with you that solitary years are still a problem. I am not delinking solitary years because Shereth said that he/she would block me if I did. See http://en.wikipedia.org/w/index.php?title=User_talk:Lightmouse&oldid=244095391#Bot_stopped

I am aware of the problems of date syntax. In fact, I have improved on User:Cyde/monobook.js/dates.js, you might want to replace that with the vastly superior (I think) User:Lightmouse/monobook.js/script.js. I also have several variants of AWB code that can be made available to you. Feel free to contact me at my talk page. Lightmouse (talk) 21:17, 20 October 2008 (UTC)[reply]

Don't be so dramatic, Lightmouse. I never said I would block you. What I did say is that I would block the bot if it resumed delinking them without consensus. MOSNUM does not mention solitary year links. I understand that, in your point of view, they are low-value links (and I am inclined to agree), however it is a perennial issue where editors are complaining on various notice boards about bots unlinking these years without any kind of mandate to do so. Let me make it clear - I do not oppose de-linking years using a bot, but I will take action to prevent its operation until such a mandate has been established. Given the scope and recurring nature of the complaints, the consensus of a small group of editors (such as those watching MOSNUM or the BAG) is insufficient to demonstrate any kind of mandate. For what it is worth, I support the above proposed bot as-is. Shereth 13:38, 21 October 2008 (UTC)[reply]

Thanks. Lightmouse (talk) 13:43, 21 October 2008 (UTC)[reply]

That rfc sought consensus for linking birth/death dates. The rfc failed by 18 oppose and 17 support. That doesn't look to me like consensus for special treatment. However, to respond at the purely technical level, I can't see how a bot can distinguish the purpose of a date link. If birth/death dates are always in a particular format, it might be technically possible (e.g. check if the 5 preceding characters constitute the word 'born' followed by a space); the desirability of such a solution is another matter. Note that it will not delink any birth/death date that is a solitary year. Lightmouse (talk) 14:57, 21 October 2008 (UTC)
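The "5 preceding characters" check mentioned above could look roughly like the following. This is only a minimal sketch: the function name `preceded_by_born` is hypothetical, and the case-insensitive comparison is an assumption not stated in the discussion.

```python
def preceded_by_born(text: str, link_start: int) -> bool:
    """Return True if the 5 characters before link_start are 'born '.

    link_start is the index of the opening '[[' of a date link.
    Lower-casing (so that 'Born ' also matches) is an assumption.
    """
    window = text[max(0, link_start - 5):link_start]
    return window.lower() == "born "


sample = "He was born [[4 July]] [[1900]] in Leeds."
link_start = sample.index("[[")
is_birth_date = preceded_by_born(sample, link_start)
```

A check like this only recognises one fixed phrasing; anything such as "b." or a date inside an infobox would need separate handling.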

Ah, can you point to this rash of edit wars between British and American editors? WP has matured enough that this is no longer the issue it was in 2003, when the date autoformatting was foolishly adopted. DA has been removed from many many thousands of articles (weighted towards the prominent and much visited ones) over the past months, and those who complain are restricted to a tiny, vocal group of WPians. The reactions of normal editors range from highly enthusiastic ("about time", etc) to not caring or having thought about it. It's an opportune time for us all to concentrate on the readers, who gain no benefit from the autoformatting whatsoever (they're not registered, logged in and preferenced), yet experience the significant dilution-through-bright-blue-speckling of the high-value links in the vicinity of dates. The sooner this cancer is cleaned out of WP, the better for the project. It was a mistake, and we should all admit it now. Tony (talk) 03:11, 22 October 2008 (UTC)[reply]

How would the bot deal with links such as [[Independence Day (United States)|July 4th]]? Does it have an algorithm to detect that the article linked to is not a date article, or does it have a list of known anniversaries that should be left alone? --Gerry Ashton (talk) 05:30, 22 October 2008 (UTC)[reply]
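To make the question concrete, here is a rough Python sketch of one way a delinker could leave piped links and solitary years alone. It is not the Cleanbot or AWB code, which is not shown in this discussion; the pattern names and the function `delink_dates` are hypothetical, and a regular expression like this covers only the simplest date syntax.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Link target that is a day+month, e.g. "March 25" or "25 March".
DAY_MONTH = rf"(?:(?:{MONTHS}) \d{{1,2}}|\d{{1,2}} (?:{MONTHS}))"
# Link target that is a year, e.g. "2008" or "1 BC".
YEAR = r"\d{1,4}(?: BC)?"

# An unpiped day+month link, optionally followed by a linked year.
# A piped link such as [[Some article|July 4th]] never matches, because
# the target must be immediately enclosed by "[[" and "]]".
DATE_LINK = re.compile(rf"\[\[({DAY_MONTH})\]\](?:(,? )\[\[({YEAR})\]\])?")


def delink_dates(wikitext: str) -> str:
    """Remove link brackets from day+month links and any attached year."""
    def replace(match: re.Match) -> str:
        day_month, separator, year = match.groups()
        if year is None:
            return day_month
        return f"{day_month}{separator}{year}"
    return DATE_LINK.sub(replace, wikitext)


full_date = delink_dates("[[March 25]], [[1 BC]]")
piped = delink_dates("[[Independence Day (United States)|July 4th]]")
solitary_year = delink_dates("Founded in [[1999]].")
```

In this sketch a solitary year is untouched because the pattern only starts at a day+month link, and the piped link is untouched because its target is not itself a date. Standalone day+month links such as [[5 November]] are delinked, which is the behaviour contested earlier in this discussion.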

Proposal modification

A template or HTML comment shall be created, perhaps called {{NoDateBots}}. Every bot that processes dates shall be required, upon pain of blocking, to recognize this template and not automatically process any page containing this template. Editors may place this template in pages that have unusual date syntax that tends to be misprocessed by bots. --Gerry Ashton (talk) 16:16, 21 October 2008 (UTC)
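A bot could honour such an opt-out marker with a check along these lines. This is a sketch under assumptions: the marker name NoDateBots is only the tentative name suggested above, and the function name `may_process`, the whitespace tolerance, and the case-insensitive match are all hypothetical choices.

```python
import re

# Hypothetical opt-out marker, accepted either as a template call or as
# an HTML comment, since the proposal allows for both forms.
OPT_OUT = re.compile(
    r"\{\{\s*NoDateBots\s*\}\}|<!--\s*NoDateBots\s*-->",
    re.IGNORECASE,
)


def may_process(wikitext: str) -> bool:
    """Return False if the page carries the opt-out marker."""
    return OPT_OUT.search(wikitext) is None


tagged_page = "{{NoDateBots}}\n'''Example''' was founded on [[4 July]]."
untagged_page = "'''Example''' was founded on [[4 July]]."
```

The bot would call `may_process` on the page text before making any edit and skip the page entirely when it returns False.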

This seems reasonable to me and a good way to reduce tensions over this issue. This proposal doesn't come across as too "pro-linking" to me; it just requires us to stop and talk about it if there's something special going on in a particular page; we can always remove the "no datebots" tag on a page if it doesn't seem warranted. The "upon pain of blocking" is a bit dramatic for my taste, but I get the idea. - Dan Dank55 (send/receive) 16:24, 21 October 2008 (UTC)[reply]

This does not sound like a 'proposal modification', it sounds like a 'proposal for a template'. I have no objection to people making other proposals but please can we confine this page to discussion about the bot that will:

Lightmouse (talk) 16:32, 21 October 2008 (UTC)[reply]


OPPOSE. All bots encounter situations they are too stupid to deal properly with. There is nothing stated in the proposal to prevent this bot from engaging in an edit war where the bot makes a mistake, an editor fixes it, the bot reprocesses the article and makes the same mistake again, and so on forever. --Gerry Ashton (talk) 16:44, 21 October 2008 (UTC)[reply]

Can you give an example page where it would be a mistake? Lightmouse (talk) 16:49, 21 October 2008 (UTC)[reply]
Of course I can't provide an example, since the Cleanbot does not yet exist. However, a look at the history of Lightmouse's talk page (and archives) shows numerous examples of LightBot and Lightmouse's use of AutoWikiBot having numerous unintended consequences, and I don't expect the new bot to be any different. Lightmouse's approach to date has been to ask editors to report problems, and try to fix the bot to not make that mistake again. I believe this entire approach is wrong. Once a bot makes a mistake in an article, it should NEVER get another chance to mess up that article. --Gerry Ashton (talk) 17:01, 21 October 2008 (UTC)
Although the example of 5 November in Guy Fawkes' Night should serve; unless Lightmouse is amending the request. If so, he should say so. Septentrionalis PMAnderson 17:20, 21 October 2008 (UTC)[reply]
Well, let's all be clear, all bots make errors. If there are 100 events and the error rate is 1 in 1,000, you might never see the error. Unfortunately, Wikipedia has tens of millions of dates to be delinked so even a 1 in 100,000 error rate will result in errors becoming visible. It just depends whether people want 99,999 good things in exchange for 1 bad thing. If you know of any bot author claiming a zero error rate, let us know. An approach that involves updating bot code after first use is a good thing to be proud of, not a bad thing to be ashamed of.
It is much easier if the rules are simple, many of the problems arise when people ask for extra constraints/exceptions. Some of the simpler ways of avoiding false positives involve pre-filtration of articles. For example, we could avoid:
  • articles that contain a date-related word in the title (e.g. 'Day', 'Night', 'Week', 'Month', '2008 in ...' etc.), i.e. includes 'Guy Fawkes Night'
  • articles that contain the word 'calendar'
  • articles that are in the categories 'Anniversaries' and 'Observances' e.g. includes 'Guy Fawkes Night'
  • articles that are on a whitelist (to be defined but could include 'Guy Fawkes Night')
If I understand your comments so far, that seems to address them. N'est-ce pas? Lightmouse (talk) 17:31, 21 October 2008 (UTC)[reply]
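The pre-filtration rules listed above can be sketched as a single skip test. This is an illustration only: the function name `should_skip` and its parameters are hypothetical, and checking both the title and the page text for 'calendar' is an assumption, since the list does not say which is meant.

```python
import re

# Date-related title words from the list above, plus titles of the
# form "2008 in ...". Matching is case-sensitive on purpose.
TITLE_PATTERN = re.compile(r"\b(?:Day|Night|Week|Month)\b|^\d{4} in ")
SKIP_CATEGORIES = {"Anniversaries", "Observances"}


def should_skip(title, wikitext, categories, whitelist):
    """Return True if the article should be left alone by the bot."""
    if title in whitelist:
        return True
    if TITLE_PATTERN.search(title):
        return True
    # Assumption: 'calendar' is looked for in both title and text.
    if "calendar" in title.lower() or "calendar" in wikitext.lower():
        return True
    if SKIP_CATEGORIES & set(categories):
        return True
    return False


skip_guy_fawkes = should_skip("Guy Fawkes Night", "", [], set())
skip_gregorian = should_skip("Gregorian calendar", "", [], set())
skip_newton = should_skip(
    "Isaac Newton",
    "He was born on [[25 December]] [[1642]].",
    ["Physicists"],
    set(),
)
```

Note that this kind of filter decides before any edit is made; it does not address what happens once a page has already been edited incorrectly.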
Absolutely not. Allowing the bot to trample pages and then require editors to revert the bot and whitelist the article is in direct violation of "is harmless". BJTalk 17:35, 21 October 2008 (UTC)[reply]

Are you suggesting that the other bots have a zero error rate or are you using a different definition of 'is harmless' for this bot? Lightmouse (talk) 17:57, 21 October 2008 (UTC)[reply]

My attitude is that all bots should obey one of two rules:
  1. The bot is only allowed one pass through the articles OR
  2. There is a mechanism to make the bot skip any article forever if an editor notices the bot make an error on its first attempt to process the article.
If this is a new requirement, so be it. --Gerry Ashton (talk) 18:06, 21 October 2008 (UTC)[reply]
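The two rules above could be combined in a small bookkeeping structure like the following sketch. The class name `PassTracker` and its methods are hypothetical, and a real bot would need to persist both sets between runs, which this in-memory example does not do.

```python
class PassTracker:
    """Tracks which articles the bot may still process.

    Rule 1: an article is processed at most once.
    Rule 2: an article reported by an editor is skipped forever.
    """

    def __init__(self, skip_forever=()):
        self.processed = set()
        self.skip_forever = set(skip_forever)

    def should_process(self, title: str) -> bool:
        return title not in self.processed and title not in self.skip_forever

    def mark_processed(self, title: str) -> None:
        self.processed.add(title)

    def report_error(self, title: str) -> None:
        # Called when an editor notices a bad edit on this article.
        self.skip_forever.add(title)


tracker = PassTracker()
first_visit = tracker.should_process("Gregorian calendar")
tracker.mark_processed("Gregorian calendar")
second_visit = tracker.should_process("Gregorian calendar")
tracker.report_error("Sextus Julius Africanus")
reported_visit = tracker.should_process("Sextus Julius Africanus")
```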
No, if the bot makes an error it should be fixed. BJTalk 18:13, 21 October 2008 (UTC)[reply]
Bots cannot always be fixed. If bots were smart enough to always get things right, they could write the articles for us. Of course, any errors found should be evaluated to determine whether they can be fixed, and if not, whether the error rate is low enough. --Gerry Ashton (talk) 19:34, 21 October 2008 (UTC)[reply]
Most bot proposals have a hypothetical error rate of zero, yes. BJTalk 18:13, 21 October 2008 (UTC)[reply]

This bot has a hypothetical error rate of zero. This bot can pass through articles only once. Lightmouse (talk) 18:20, 21 October 2008 (UTC)[reply]

The proposal says:
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
If in fact this bot will have a mechanism to prevent it from visiting an article more than once, that mechanism should be explained. I interpret Lightmouse's statement that "This bot can pass through articles only once" to mean that if it processes, let's say, Gregorian calendar on 5 December 2008, it will never again process the "Gregorian calendar" article, not even on 5 December 2012. Is this interpretation correct? --Gerry Ashton (talk) 18:36, 21 October 2008 (UTC)[reply]

Simple question but the answer isn't so simple. For a start, it will examine 'Gregorian calendar' but it will discover the term 'calendar' and abandon any further processing without making any edits. But that is a minor point.

I would urge Lightmouse to investigate whether there is any mechanism to determine that an article title is actually a redirect. I think it would be a good idea to avoid processing redirects. My intuition is that the kind of articles that have many redirects are just the sort of articles that are apt to have tricky date syntax. --Gerry Ashton (talk) 19:30, 21 October 2008 (UTC)

I have just been told how to avoid being redirected. Frankly, I don't share your pessimism about redirects resulting in a bad edit (although that might depend on the definition of 'bad'). It would help if we could discuss a specific example but now that I know the method, I can do it to gain your support. Unfortunately, positive response from swing voters such as yourself will not be sufficient to get this bot elected to approved status in the face of the other negative responses above. Lightmouse (talk) 19:57, 21 October 2008 (UTC)[reply]

There is a better workaround. I can easily write a custom list provider for AWB, so it won't load any redirects into the list at the time of making. Obviously, this wouldn't account for any that changed in the meantime, but seeing as it can skip them if it finds them, that reduces duplication further.
Snippet from API:
* list=allpages (ap) *
  Enumerate all pages sequentially in a given namespace
Parameters:
<snip>
  apfilterredir  - Which pages to list.
                   One value: all, redirects, nonredirects
                   Default: all
Reedy 22:25, 21 October 2008 (UTC)[reply]
Reedy, thanks. I will take you up on that after approval. Lightmouse (talk) 12:19, 22 October 2008 (UTC)[reply]
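For illustration, a query using the apfilterredir parameter quoted above might be built as follows. This is a Python sketch of the API request only, not the AWB custom list provider Reedy describes; the function name `allpages_query_url` and the default values are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def allpages_query_url(namespace: int = 0, limit: int = 500) -> str:
    """Build a list=allpages query URL that excludes redirects."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,         # 0 is the main article namespace
        "apfilterredir": "nonredirects",  # one of: all, redirects, nonredirects
        "aplimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = allpages_query_url()
```

As Reedy notes, a list built this way reflects redirect status only at the time of the query, so the bot would still need to skip any page that has become a redirect since.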

I oppose this proposal for, among other things, the reasons expressed by Gerry Ashton. Tennis expert (talk) 21:43, 21 October 2008 (UTC)[reply]

Since when does "deprecated" mean "removed ASAP?" This is something that should be put into AWB general fixes, not made into a bot to make God-knows-how-many edits just to do this. Mr.Z-man 23:00, 21 October 2008 (UTC)[reply]

  • I didn't exactly object on the basis of what CleanBot is designed to do; I am concerned both that there may be linked items out there that look like dates, but are not, and also that the coding of the bot may not exactly carry out the design. In either case, it should only get one crack at an article; editors should not have to keep fixing the same article over and over again while any kind of error is fixed in the bot.
  • As for a specific example, it would be interesting to see what it makes of [[March 25]], [[1 BC]] in the Sextus Julius Africanus article. --Gerry Ashton (talk) 05:19, 22 October 2008 (UTC)[reply]
  • BRFA is not the place for debating if the links should be removed. If the removal of date formatting is still contested, consensus needs to be formed before a bot approval can be started. BJTalk 08:13, 22 October 2008 (UTC)[reply]
Also, what's the difference between this bot and Lightbot? Matthewedwards (talk contribs  email) 07:09, 22 October 2008 (UTC)[reply]
Lightbot is not approved to remove date formatting. BJTalk 08:13, 22 October 2008 (UTC)[reply]
Um ... that's news to me, unless you count Tennis expert, who's ruffled quite a few of his tennis-project colleagues over the issue. The removal of date autoformatting appears to be widely and enthusiastically supported by the community. Tony (talk) 12:52, 22 October 2008 (UTC)
I haven't heard anything about it, could you point me to this wide support (anything on a subpage of MOS doesn't count)? BJTalk 13:00, 22 October 2008 (UTC)[reply]
  • You said it had wide community support but I'm still not seeing any evidence of that. For example, when bot matters are discussed on Wikipedia:BOTS a proposal may gain support. Only when brought to the attention of the entire community can it have wide community support, as was done with the adminbots RfC. Local consensus and favorable talk page comments are not sufficient for a bot that is going to make hundreds of thousands of edits. BJTalk 14:38, 22 October 2008 (UTC)[reply]
  • How could it have support when the community was never asked to comment? Where is the village pump proposal to remove all auto-formatting? There has been nothing on {{cent}}, the noticeboards, or watchlist notices. The last thing I can find on {{cent}} is "Proposal to discourage the auto-formatting of dates". There needs to be a widely advertised VP proposal for the removal of auto-formatting in all articles and the exceptions. If the community shows support then a bot request should be filed. BJTalk 15:08, 22 October 2008 (UTC)
  • Perhaps it's frustrating to have missed out on the loooong debate that occurred over some two years (intermittently, but more intensively and continuously during 2008 until the change in August). I'm sorry that you were not aware of it (it was certainly promulgated at VP and elsewhere at the time), but the fact that you missed it does not mean that there was not thorough and extensive debate in many locations. I must ask you to resist the temptation to seek to place retrospective caveats on the community's decision because you were not there at the time. Don't worry, there certainly were naysayers to represent your views, but they were very much in the minority, and still are. This is not the place to discuss that decision; this is to discuss a bot application. I do not want to spend more time debating this at the wrong location at the wrong time. Sorry. Tony (talk) 15:18, 22 October 2008 (UTC)[reply]
  • You seem to be confusing consensus to change the MoS with consensus to then apply that change to every article. It seems the community was informed for the former but not the latter. BJTalk 15:36, 22 October 2008 (UTC)[reply]
Just look at Featured Articles, Good Articles, Peer Review etc. and other popular representations of the best that Wikipedia has to offer. A statistical analysis of the ratio of (date links)/(dates that could be linked) would be interesting. I bet it would be a tiny fraction of a percent. Lightmouse (talk) 13:16, 22 October 2008 (UTC)[reply]
Oh yeah. My first question still stands. I see nowhere in MOSNUM or MOS that stand-alone years should be linked. Matthewedwards (talk contribs  email) 09:17, 22 October 2008 (UTC)[reply]

Matthewedwards, I agree with you that there is massive overlinking of solitary years. Unfortunately, a few editors oppose delinking of solitary years. This proposal is an attempt to work within their constraints by delinking all dates except solitary years. Lightmouse (talk) 10:13, 22 October 2008 (UTC)[reply]

Gerry Ashton mentions two things:
  • "one crack at an article" - yes. See extensive discussions above.
  • what it makes of [[March 25]], [[1 BC]] - it will delink it. That is exactly what a human would do. The bot proposal is "Delink dates except for solitary years."
If you think that particular date needs an exemption, can you be more specific? Trying to help. Lightmouse (talk) 11:13, 22 October 2008 (UTC)[reply]

Such a minor and utterly inconsequential change should not be given approval. It will result in what can only be called "pointless edits". A much better solution is to gather consensus for, and write a patch for AWB that adds this change to its general fixes, so that other bot tasks which are actually doing something useful can fix it. (note: Lightmouse's AWB access has been revoked, before I saw this thread, for the pointless edits. This is the same way as I'd revoke the access of someone who was going through doing solely general fixes, and I believe that this bot request should be treated in the same way as any other which aims to do just gen. fixes). Martinp23 15:22, 22 October 2008 (UTC)

(quite frankly, looking at the debate on this page, I can't honestly believe that the consensus with the MOS/appropriate RfCs is mature enough for a bot to be acting on it at all, through general fixes or otherwise). Martinp23 15:31, 22 October 2008 (UTC)[reply]
Agreed, they are insignificant edits. I don't think a bot can be written that is intelligent enough to ignore dates that are appropriately linked. –xeno (talk) 15:42, 22 October 2008 (UTC)[reply]
Denied. As evidenced by this discussion, this is still far too controversial a task for a bot, there is no consensus outside of the MoS talk pages that date autoformatting should actively be removed rather than simply discouraged, and these edits are too minor for a bot to be worth the effort. Mr.Z-man 16:10, 22 October 2008 (UTC)[reply]
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.