The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Citation bot 4

Operator: Martin (Smith609 – Talk)

Automatic or Manually Assisted: Automatic

Programming Language(s): PHP

Function Overview: Where pages use a mixture of 'citation' and 'cite journal' templates (which produce different output styles), use the dominant template in all cases

Edit period(s): Continuous

Already has a bot flag (Y/N): Y

Function Details: Despite the two templates calling on a common Citation/core, there are still some very minor formatting differences between the output produced by Template:Citation and Template:Cite journal. (When I say 'cite journal', I refer to all the 'cite xxx' series of templates which call on Template:Citation/core.)

I think the only difference is the use of a comma rather than a period to separate fields - insignificant, but enough to irk some editors.

No one reference format is encouraged above others, but each article should be internally consistent.

I envision the bot looking through articles which mostly use {cite journal} and spotting instances where a {citation} has been used by mistake. The bot would count the numbers of each template, work out which was prevalent, and change the others to that format.
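As a rough illustration of the counting step described above, the following Python sketch tallies the two template families in a page's wikitext and reports which one is more common. It is not the bot's actual code (the bot is written in PHP); the function names and the regular expression are assumptions made for this example, and any template whose name starts with "cite " is treated as belonging to the 'cite xxx' family.

```python
import re
from collections import Counter

# Matches the opening of a template call and captures its name,
# e.g. "{{cite journal |title=..." captures "cite journal".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=[|}])")


def template_family(name):
    """Map a template name to 'citation', 'cite', or None (not a citation template)."""
    normalized = " ".join(name.lower().split())
    if normalized == "citation":
        return "citation"
    if normalized.startswith("cite "):
        return "cite"
    return None


def count_families(wikitext):
    """Count how many templates of each family appear in the wikitext."""
    counts = Counter()
    for match in TEMPLATE_NAME.finditer(wikitext):
        family = template_family(match.group(1))
        if family is not None:
            counts[family] += 1
    return counts


def dominant_family(wikitext):
    """Return the more common family, or None on a tie or when nothing was counted."""
    counts = count_families(wikitext)
    citation, cite = counts["citation"], counts["cite"]
    if citation == cite:
        return None
    return "citation" if citation > cite else "cite"
```

A tie returns None so that the caller can leave the page alone rather than pick a format arbitrarily.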

I am unaware of any scenarios where it would be beneficial to use both templates in the same page, and welcome suggestions as to where this might occur. I can suggest solutions to this if necessary.

Discussion

Is this really worth doing? According to your description, it's roughly the equivalent of adding a comma to a great many articles. Does doing so have a significant benefit to the project? --Carnildo (talk) 08:44, 29 March 2009 (UTC)

I'd say 'no', but FA / GA reviewers would say 'yes'. It's very easy to code and spares a lot of man-hours for people trying to get articles to a certain standard. Martin (Smith609 – Talk) 20:53, 29 March 2009 (UTC)

FA/GA reviewer that says yes →. Headbomb {ταλκ κοντριβς – WP Physics} 01:32, 30 March 2009 (UTC)

This sounds good to me, but changing periods to commas is just the sort of trivial minutiae that some editors will get all up in arms about a bot changing against their favorite preference. So it's important that the bot be exclusion-compliant. I'd also recommend you be careful not to run the bot twice on the same page, or it could look like an edit war. (As an aside, it would be very useful to have a user-initiated process to automatically change all cites on a single page to a given format... does such a thing exist?) – Quadell ^(talk) 13:54, 1 April 2009 (UTC)

The bot is already exclusion compliant. If there are cases where the activity is undesired, I'm sure it won't take long for people to let me know and me to write a workaround to not edit these cases. In response to your aside, I don't know of such a thing, but it is possible to run citation bot on demand from the toolbar (importScript("User:Smith609/toolbox.js");, see my userpage for info) - when I code this change the bot will do that on-demand. Martin (Smith609 – Talk) 17:25, 1 April 2009 (UTC)
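For readers unfamiliar with exclusion compliance, the Python sketch below shows one simplified way a bot could check a page for the {{bots}} and {{nobots}} opt-out templates before editing. It is an assumption-laden illustration rather than Citation bot's PHP code: the function name is invented here, and only the basic allow= and deny= forms are handled.

```python
import re

# Finds {{nobots}} or {{bots|...}} and captures the raw parameter text, if any.
BOTS_TEMPLATE = re.compile(
    r"\{\{\s*(nobots|bots)\s*(?:\|([^{}]*))?\}\}", re.IGNORECASE
)


def bot_may_edit(wikitext, bot_name):
    """Return False if the page opts out of edits by bot_name (simplified rules)."""
    bot = bot_name.strip().lower()
    for match in BOTS_TEMPLATE.finditer(wikitext):
        if match.group(1).lower() == "nobots":
            return False  # {{nobots}} is treated here as excluding every bot
        params = {}
        for part in (match.group(2) or "").split("|"):
            key, sep, value = part.partition("=")
            if sep:
                names = [v.strip().lower() for v in value.split(",")]
                params[key.strip().lower()] = names
        denied = params.get("deny")
        if denied is not None and ("all" in denied or bot in denied):
            return False
        allowed = params.get("allow")
        if allowed is not None:
            if "none" in allowed:
                return False
            if "all" not in allowed and bot not in allowed:
                return False
    return True
```

A page without either template is editable by default under these assumed rules.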

Sounds good. What sort of parameters are you considering? (E.g., "4 or more of one format, and 2 or fewer of another") – Quadell ^(talk) 21:13, 1 April 2009 (UTC)

I was thinking of adopting whichever template family (Citation or Cite xxx) was in the majority. Perhaps a >60% rule would be necessary, but I think >50% should be fine. Martin (Smith609 – Talk) 18:46, 2 April 2009 (UTC)

Will this be converting all Citations to Cite journals, or will it simply set the sep= parameter of the Citation template? How will it count citation templates which have the sep= parameter set? Also, if this bot is correcting mistakes only, it should not be changing formats unless one has a clear majority, perhaps >80%. Wronkiew (talk) 21:42, 3 April 2009 (UTC)

I think it's a clear mistake if 60% of cites use one format and 40% use another. – Quadell ^(talk) 22:30, 3 April 2009 (UTC)

I think it is simpler to change all templates to the dominant form; this will also protect against any future drift between the templates. The sep= parameter does complicate matters; I think it will be best if citations using this parameter do not count towards the 50%. This may leave some pages unchanged and requiring manual discretion; I'll have these pages logged somewhere and, depending on the number, resolve them manually. Martin (Smith609 – Talk) 23:04, 3 April 2009 (UTC)
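The decision rule discussed so far (a share threshold, with citations that set a separator left out of the count) might be sketched in Python as follows. This is only one reading of the thread, not the bot's implementation: the function name, the parsed-template input format, and the set of separator parameter names are assumptions made for illustration, and the threshold is a parameter because the thread proposes several values (>50%, >60%, >80%).

```python
# Parameter names treated as "sets a custom separator" (an assumption for this
# sketch, based on the names mentioned in this discussion).
SEPARATOR_PARAMS = ("sep", "separator", "seperator")


def choose_target_family(templates, threshold=0.5):
    """Pick the family to convert to, or None if the page needs manual review.

    templates is a list of (family, params) pairs, where family is
    'citation' or 'cite' and params is a dict of template parameters.
    """
    counted = [
        family
        for family, params in templates
        if not any(name in params for name in SEPARATOR_PARAMS)
    ]
    if not counted:
        return None  # nothing countable; leave for a human
    citation_share = counted.count("citation") / len(counted)
    if citation_share > threshold:
        return "citation"
    if 1 - citation_share > threshold:
        return "cite"
    return None  # no family clears the threshold; log for manual attention
```

Raising the threshold leaves more pages unconverted, which is the trade-off debated in the comments that follow.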

IMO, {{Citation}} setting separator/seperator=. and either using quote, setting postscript=., or manually placing a . after the template should be counted as "cite" style; and similarly for {{cite xxx}} setting separator/seperator=, and either using quote or setting postscript empty. You'll also need to maintain mappings between the two formats' templates if there are parameter incompatibilities. Manually-formatted refs, unrecognized citation templates, and refs with separator/seperator or postscript not matching the above rules would probably have to preclude any bot action beyond listing for human attention.

Given that the comma-vs-period issue is something people get up in arms about, I suggest this BRFA be advertised at the appropriate talk pages (that would include the major citation template talk pages, WT:Citation templates, and probably WP:VPR). In particular, ask people to come here and comment on whether the "50% rule" is acceptable or if a higher bar should be used with articles under the bar being posted somewhere for human attention (e.g. post on the article's talk page and categorize it in a hidden category "Category:Articles with inconsistent reference formats"). Anomie ⚔ 19:19, 5 April 2009 (UTC)
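The counting rule suggested above for templates that override their separator or postscript could be expressed roughly as in the Python sketch below. It is one interpretation of that comment, not code from the bot; the function name, the "manual" return value, and the followed_by_period flag (standing in for "a . was typed after the template") are assumptions for illustration.

```python
def effective_style(family, params, followed_by_period=False):
    """Classify a template by the style it actually renders in.

    Returns 'citation' or 'cite' when the style is clear, or 'manual' when the
    separator/postscript combination fits neither rule and needs a human.
    """
    separator = params.get("separator", params.get("seperator"))
    postscript = params.get("postscript")
    has_quote = bool(params.get("quote"))

    if separator is None and postscript is None:
        return family  # no overrides: the template renders in its own style

    if family == "citation":
        # {{Citation}} made to look like the 'cite' style.
        if separator == "." and (has_quote or postscript == "." or followed_by_period):
            return "cite"
    elif family == "cite":
        # {{cite xxx}} made to look like the 'citation' style.
        if separator == "," and (has_quote or postscript == ""):
            return "citation"
    return "manual"
```

Anything returning "manual" would, per the comment above, block automatic conversion and be listed for human attention instead.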

Done - advertised at citation, cite book, cite journal, WT:CT, WP:VPR. Martin (Smith609 – Talk) 20:09, 13 April 2009 (UTC)

Many instances of {{citation}} should not be {{cite journal}} but rather {{cite conference}}, {{cite book}}, {{cite news}}, etc. If you convert the cite XXX templates to citation consistently, this would not be an issue, but how do you propose to determine the correct kind of cite XXX template to convert a citation template into? —David Eppstein (talk) 21:14, 13 April 2009 (UTC)

Good question. Most of the 'cite x' templates produce identical output; multiple templates are needed to support 'unusual' parameters. I think the following logic would work:

If it has a URL, use 'cite web'... unless:
- it has a journal parameter: then use 'cite journal'
- it has an isbn=; chapter=; etc: then use 'cite book'
- it has a contribution=; then use 'cite conference'

I will probably have to expand the logic slightly to accommodate other cases but will take a thorough look at each citation type when coding the logic. Martin (Smith609 – Talk) 15:49, 14 April 2009 (UTC)
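The selection logic listed above can be sketched in Python as follows. This is only an illustration of those four rules under stated assumptions: the function name is invented, the "etc" in the cite book rule is not expanded beyond isbn and chapter, and a citation matching none of the rules returns None because the thread does not say what should happen in that case.

```python
def pick_cite_template(params):
    """Choose a 'cite xxx' template name for a {{citation}} with these parameters."""

    def has(name):
        # A parameter counts only if it is present and non-blank.
        return bool(params.get(name, "").strip())

    if has("journal"):
        return "cite journal"
    if has("isbn") or has("chapter"):
        return "cite book"
    if has("contribution"):
        return "cite conference"
    if has("url"):
        return "cite web"
    return None  # not covered by the rules in this discussion
```

The exceptions are checked before the URL test, so a journal article that also has a URL still becomes 'cite journal'.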

And another question. Suppose that an article in which {{cite}} templates are dominant contains a {{citation}} template that is followed by punctuation other than a period as part of a larger block of text. How do you propose to convert it in that case, since {{cite}} insists on using a period to terminate? —David Eppstein (talk) 01:10, 14 April 2009 (UTC)

Using the |ps= parameter. Martin (Smith609 – Talk) 15:49, 14 April 2009 (UTC)

As WP:CITE notes, citation formats & templates are both contentious. I don't think having a bot sort out the mess is the right solution for most articles. 50% is not the right metric; it is not great enough to show a consensus that the contributors to an article agree on a method. From discussions I've read, it seems that longevity of the different cite methods is at least as important as the actual number of times different methods are used in an article. I also think that the most divisive articles will probably be those that are least established, with only a handful of references. I can easily see either the same number of references for different template systems or a similar enough number that there would not be consensus on the talk page for a manual change, let alone one made by a bot. That all being said, this would be useful for correcting recently-added citations to well-established pages. Actual thresholds are a bit fuzzy & up for debate. I'd err on the side of conservatism; say ~75% & also that the references being changed were added recently. --Karnesky (talk) 21:20, 13 April 2009 (UTC)

Bear in mind that this bot will always be changing from articles that are clearly not in line with policy -- different cite styles -- to being in line with policy. If somehow the consensus were to use one style, but the majority of links used the other style (how could that be?), this bot would at least make the links consistent. I just can't see a situation where changing an article from using 40% one style and 60% another, to using 100% one style, would be a bad thing. I can get as petulant about commas-vs-periods as the next guy, I guess, but if I feel strongly that an article should use a certain style then I'm going to make sure the majority of refs in the article actually use that style. – Quadell ^(talk) 01:04, 14 April 2009 (UTC)

Again: I see this bot being useful for established articles, but it may do more harm than good to stubs and shorter/newer articles. WP:CITE states "You should follow the style already established in an article, if it has one. Where there is disagreement, the style used by the first editor to use one should be respected." Editor A may start a stub with 1-2 references. Editor B comes along, not familiar with this etiquette & adds 2-3 references in some other style. If this conflict was resolved on the talk page of the article, it is likely that Editor A's format would be used. This is contrary to what the proposed bot's actions would be. --Karnesky (talk) 16:27, 14 April 2009 (UTC)

I would consider the majority style to represent the 'style established in the document'. If the original author had not already changed the format of the new references, then it is probably safe to assume that they didn't disagree with Editor B's decision. If a user did disagree, then it would be easy for them to use the bot to enforce whichever style a talk-page discussion decided was appropriate. To be honest, in the case of stubs, I don't imagine that editors are likely to be that concerned about reference formatting - the lack of content is more likely to be at the forefront of their to-do list. Martin (Smith609 – Talk) 18:05, 14 April 2009 (UTC)

But time is needed to allow Editor A to change the format of Editor B's additions before the bot does, right? Below, I've proposed that the bot puts a notice on the talk page & waits before a change. --Karnesky (talk) 16:20, 15 April 2009 (UTC)

I also think it's a mistake to run a bot which changes the style of an article's citations based on a simple majority of existing cites. I think, in this context, that the existence of multiple citation styles in an article is likely to indicate that a "disagreement" exists about which style should be used, in which case WP:CITE defaults to "the style used by the first editor". The Manual of Style's General principles re-iterates: "editors should not change an article from one guideline-defined style to another without a substantial reason unrelated to mere choice of style ... where there is disagreement over which style to use in an article, defer to the style used by the first major contributor."

There may be instances where a substantial reason exists to override the original editor's style choice, but that's the kind of case-by-case decision making which is poorly suited to bot edits. Most cases where subsequent editors fail to follow the original editor's style represent erroneous editing, not a deliberate consensus that there's reason to change. A bot certainly can't determine consensus from silence, or from the state of the page at the time it edits, because it can't read the talk page or page history.

More broadly, I don't think this bot proposal does enough good to justify the friction it's likely to cause. If it's only really worth doing for good and featured article candidates, then it doesn't make sense to run the bot across the whole of en:wiki in order to edit a tiny minority of articles which are already being worked over in detail by contributors who already have other automated tools available. Baileypalblue (talk) 06:07, 15 April 2009 (UTC)

When WP:CITE says "editors should not change an article from one guideline-defined style to another", I don't think it means you shouldn't change articles from having mixed styles to having one consistent style. I think it means you shouldn't change an article from having one consistent style to another, which this bot wouldn't do. I understand your concerns, but "the friction it's likely to cause", as you put it, would be confined to those editors who (a) strongly feel that article X should use one cite style, (b) did nothing while article X used a different style more than half of the time, and (c) would rather complain about a bot standardizing the style to the "wrong" standard than fix the cites in at least the majority of cases to the preferred style. And I don't find such complaints to be particularly credible, honestly. – Quadell ^(talk) 12:26, 15 April 2009 (UTC)

I agree that WP:CITE encourages switching from mixed citation styles to a consistent style -- the question is, which style should be implemented? Identifying the problem is not enough, the solution must be correct, and thus far I don't think the bot proposal has hit on the right resolution. Moreover, I don't think it's productive to dismiss the concerns of hypothetical future opponents as not "credible", particularly when their concerns *are* trivial -- that way lies more heated conflict. I can't recall exactly the maxim that says conflict is most bitter when the stakes are lowest, but I think it applies here. The fact remains that the proposed bot will violate style guidelines at least some of the time by overriding the style choice of original editors without demonstrating sufficient cause. Telling aggrieved editors they should have been more aggressive in editing the citation styles of others is not a good way to calm the waters; indeed, a simple majority rule seems like an incitement to edit-warring. At a minimum, if such a simple majority rule is going to be implemented, I think it first needs to gain a more explicit consensus at WP:CITE. Baileypalblue (talk) 15:49, 15 April 2009 (UTC)

By "more than half of the time" in (b), you really mean "in over half the number of references," right? However, timing is an important issue: if a page has mismatched references for months, there'd be a smaller case to be made that there was consensus. You'd probably be correct that nobody cared enough to stick with the format used by the original author and the bot's actions would be appreciated. But this case can be contrasted with fresher edits, for which the article's authors weren't allowed a chance to follow the guidelines manually before the bot disregards it and mucks up the page.

Are you opposed to implementing a delay between the time the bot notices citation discrepancies & when it fixes them? A notice could be placed on the talk page & then I'd imagine few objections to this proposal--it'd work like the image tagging bots.

I also disagree completely with your stance on (c); multiple commentators on this page have pointed out that it isn't just the number of citations that matter, and you really haven't refuted that point concretely. --Karnesky (talk) 16:14, 15 April 2009 (UTC)

It looks like there are four ways to proceed here.

1. The bot could just operate as-is, on the theory that conflicts are unlikely, and can be avoided/undone by reverting the bot and fixing the formatting in those rare cases, or using NOBOTS. I like this idea, but several others seem to oppose it.
2. The bot could run with a higher percentage required than 50%. This would leave many articles out of compliance, and would not fix the case of "a few old and established cites, and then many new cites in a different format".
3. The bot could run in two stages, as Karnesky suggests. First it would leave a talk note informing users of the discrepancy in styles, and saying that they would be autoconverted to use the majority style after X days. Then it would actually convert the articles after a delay, where still applicable.
4. The bot could simply be denied. I think that would be a shame.

Is this a fair summary? – Quadell ^(talk) 16:35, 16 April 2009 (UTC)
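The two-stage option in the summary above (notify first, convert after a delay, and only where still applicable) could be modelled with a small decision function like the Python sketch below. The function name, the action labels, and the seven-day default are placeholders invented for this example; the thread leaves the delay as "X days".

```python
from datetime import datetime, timedelta


def next_action(dominant, notified_at, now, wait=timedelta(days=7)):
    """Decide what a two-stage bot should do on this visit to a page.

    dominant is the majority family as recomputed now (None if no majority),
    notified_at is when the talk-page notice was left (None if never).
    """
    if dominant is None:
        return "skip"  # styles no longer mixed, or no clear majority
    if notified_at is None:
        return "notify"  # stage one: leave a talk-page note
    if now - notified_at < wait:
        return "wait"  # editors still have time to fix things by hand
    return "convert"  # stage two: apply the majority style
```

Because the dominant family is recomputed on every visit, a page that editors have already tidied by hand during the waiting period is skipped rather than converted.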

I'd add a couple of options to your summary. One would be for the bot to require manual approval before changing the citation format; this could operate in concert with talk page messages. Again, this would be a shame, as it would deny the bot its full potential. Another would be for the bot to consult the article history and determine the original citation style in 'close-cut' cases.

I would greatly prefer the first option you presented. I would suggest that the best way to proceed would be to run the bot on a trial basis, perhaps on around 50-100 articles. This would allow us to judge whether any editors will actually complain about the bot's activity in practice. Are there any concrete objections to such a trial? Martin (Smith609 – Talk) 17:11, 16 April 2009 (UTC)

Having a criterion that was stronger than a simple majority and/or a waiting period could also be good trial(s): if there were no significant issues, the percentage needed and/or the number of days could be decreased.

I am skeptical of trying a small number of pages: as I have said, I'd expect that most problems would be for very recent edits to articles with a small number of references. Any trial would begin by having fewer of these cases. As the bot cleaned up pages that have been left to have differences for a while, more and more contentious cases would come to light. I'd be extremely reluctant to support a permanent bot because of this, but perhaps multiple short trials would be acceptable.

If the complexity added by implementing a two-stage bot is not significant, I see no real downsides to that approach. Is the complexity greater than I'd estimate or are there other downsides? --Karnesky (talk) 21:25, 16 April 2009 (UTC)

It would be easy enough to trial the function on a few articles at a time. The trial should include a large enough sample size that it includes articles which are likely to be contentious. How rare do you think contentious cases are? 1 in a hundred articles? 1 in a thousand?

Also, by 'contentious case', do you mean 'cases where the bot switches to citation where it should have switched to cite xxx'? In such cases, it would be extremely simple for editors (on reaching consensus, of course) to use the bot to change the format to cite xxx - the bot could even include a link in its edit summary to allow this to be implemented. Of course, as soon as contentions arise, it is possible for me to stop the bot and code a solution to the specific situation. Martin (Smith609 – Talk) 22:05, 16 April 2009 (UTC)

Just to be clear: I would most prefer a two-stage bot, and still have not seen objections to using this approach over others (such as working with a small subset). Do you have any?

By 'contentious case,' I mean 'cases where the bot makes a citation switch when there is not consensus for that change.' In some cases, discussion on a talk page may adopt the style picked by the bot. That such a discussion would need to take place may suggest the bot is too aggressive. --Karnesky (talk) 22:34, 16 April 2009 (UTC)

Sorry, I'd misunderstood what you meant by 'two-stage'. There is no huge technical barrier to this; the main objection would be on the grounds of resource-usage. It would also make it impossible for me to integrate this function into the existing tasks of the bot. I would personally object to cluttering up thousands of talk pages, but that objection is over-rulable. Perhaps a good way to get started would be to use the 'two-stage' approach for a trial of n articles. It could target relatively major articles in this trial. Then we would have an idea of the extent to which the bot's edits would be contentious - if no complaints were raised after a certain number of edits, we could consider allowing the bot to perform its edits without getting human permission first. Does that sound workable? Martin (Smith609 – Talk) 22:44, 16 April 2009 (UTC)

That sounds reasonable to me. I wouldn't necessarily target only "major articles," though. Starting with recently changed pages would make most sense to me: these should have a good mix of both established & newer pages and also have editors available to comment on the bot's changes, if needed. --Karnesky (talk) 23:52, 16 April 2009 (UTC)

I think the major articles are the ones where this bot will be most useful, and the new, small articles are the ones this bot is most likely to mess up. One of the factors that will affect its accuracy is the number of existing citations. I recommend setting a minimum citation cutoff below which the intended/original citation style will be impossible to accurately determine. Otherwise, you'll run into cases where an editor adds a few cites in the wrong style, then Citationbot converts the whole article in violation of the guidelines. Wronkiew (talk) 16:35, 17 April 2009 (UTC)

Surely it is better for an article to have a single, consistent formatting than a mix? If the bot chooses the 'wrong' format, then its edit serves two purposes: it [1] highlights the fact that the citations require attention; and [2] can provide a link in the edit summary to format the citations as policy would request. An article with its citations in a mixed format is no less in breach of policy than an article with its citations in a format which the original author did not put them in. Martin (Smith609 – Talk) 16:54, 17 April 2009 (UTC)

Wronkiew, I agree completely with your premise (that small articles are most likely where problems will be found), but disagree with your conclusion. Establishing a minimum number of citations, like establishing a criterion other than "simple majority," would seem to be arbitrary & would leave many articles with mismatched citations. What advantages would implementing a minimum number of citations have over the proposed compromise of first alerting the article's talk page to a pending change & waiting a bit? As an editor, the latter approach would show me issues with the article & would allow me a chance to fix them, according to the guideline. In the former, I might never realize there was a problem. (Martin does point out some disadvantages of the two-stage bot that he would face as a bot operator, but if there are any downsides from the perspective of an editor or reader, we should hash those out.) --Karnesky (talk) 17:16, 17 April 2009 (UTC)

There are reasonable situations to use both templates, in the same page, without being "inconsistent." For example, users may prefer cite book overall, but:

- wish to include a book with a large number of authors (more than the 4 which cite book allows)
  - NB: Both 'cite book' and 'citation' now support 9 authors. Martin (Smith609 – Talk)
- wish to cite an essay contribution which is printed in a book (cite book has no 'contribution' parameter)
  - NB: The contribution parameter is supported by cite book (although it is not documented). Martin (Smith609 – Talk)

and therefore opt to use citation in these cases. Kellen^T 10:14, 17 April 2009 (UTC)

In these cases, the user may prefer to not use the same template, but he/she should still use the same style. I think that would have to be done manually, without either template, so this bot wouldn't affect those cases. – Quadell ^(talk) 12:50, 17 April 2009 (UTC)

Kellen, That's exactly the kind of feedback I was hoping for here. In the two cases you mention, cite book is equivalent to citation; if you can think of any other cases, though, that would be incredibly helpful. Thanks, Martin (Smith609 – Talk) 15:45, 17 April 2009 (UTC)

Ha! Time to update the cite book docs then, I guess. Kellen^T 22:23, 17 April 2009 (UTC)

I'm going to wait a few more days for comment, but it looks like we have a good plan for a trial here. – Quadell ^(talk) 12:50, 17 April 2009 (UTC)

Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. This is for the 2-stage method, trying to choose ones that have been recently edited (where possible). It's a rather large trial, but, as I understand it, we need a large trial to determine whether the 2-stage method is necessary. – Quadell ^(talk) 12:57, 20 April 2009 (UTC)

Great, thanks. Depending on the weather, it will probably be this weekend or the next before I get round to the coding. I will keep you posted here. Martin (Smith609 – Talk) 14:48, 20 April 2009 (UTC)

Any updates? – Quadell ^(talk) 20:45, 30 April 2009 (UTC)

Afraid I was busy all weekend, and I still don't have internet access at home (frustratingly!); I'll post an update once I've got things coded. Martin (Smith609 – Talk) 21:35, 30 April 2009 (UTC)

Responses now that bot is running

Gentle nudge. – Quadell ^(talk) 14:07, 14 May 2009 (UTC)

Running. Contributions are being made by User:Citation bot 4; the algorithm I've put together relies on the usual bot script, so some edits may not change citation types. Bug fixes are in progress; please ignore anything before Sensorimotor rhythm, but report anything else I don't spot! Martin (Smith609 – Talk) 21:16, 14 May 2009 (UTC)

Most of these edits are just the run-of-the-mill wonderful doi updates that your bot always does. Very few have to do with this BRFA, but I found a few, e.g. [1] and [2]. These appear to be working just the way you'd intended. – Quadell ^(talk) 22:07, 14 May 2009 (UTC)

Yes, sorry, it's difficult to separate the two out neatly. Should I let the bot run to around, say, 250 edits instead of 100? (I estimate that around 1 in 4 or 5 of the edits it is making are related to this BRFA). Martin (Smith609 – Talk) 22:21, 14 May 2009 (UTC)

That would be a good plan, I think. – Quadell ^(talk) 22:26, 14 May 2009 (UTC)

Hmm. This edit seems to have changed the only instance of "cite journal" to "citation", though there were no other cite templates on the page. Is this intentional? – Quadell ^(talk) 22:11, 14 May 2009 (UTC)

There are several {citation} templates - check under the 'External links' section. Martin (Smith609 – Talk) 22:19, 14 May 2009 (UTC)

Oops. – Quadell ^(talk) 22:27, 14 May 2009 (UTC)

Trial complete. - 250 edits made. Martin (Smith609 – Talk) 15:44, 15 May 2009 (UTC)

My spot checks have not revealed any problems. Anybody else see a problem with the results? – Quadell ^(talk) 00:16, 17 May 2009 (UTC)

Approved. Looks good. – Quadell ^(talk) 18:02, 18 May 2009 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.