
Introduction

This is the pre-RfC workshop for the RfC(s) about article creation and deletion at scale. Per the rules below, please feel free to add proposed issues or solutions; other suggestions, comments, questions or replies should be made within your own section.

This pre-RfC discussion has been announced at the articles for deletion talk page, the Arbitration Noticeboard, the administrators' noticeboard, the Bot policy talk page, and the Village pump (policy).

Background

Page-related actions done at scale can overwhelm the community's ability to monitor them adequately and participate effectively. The issue is exacerbated in the case of article creation at scale because it escapes the normal notification system.

In the past, Wikipedia did not discourage article creation at scale (see Definitions below) under the assumption this was the best way to achieve broad coverage of vast subjects such as sports, plant and animal life, and geography. There exists a policy that automated or semi-automated creation requires a bot request for approval. More recently, concerns have been raised in multiple venues that the continuing creation of such articles has overwhelmed editors’ ability to track and assess these articles, and that the churn has become a waste of time and a cause of disruption. In an August 2022 decision, the Arbitration Committee (ArbCom) ordered an RfC addressing "how to handle mass nominations at Articles for Deletion" (termed "AfD at scale").

A strong argument has been made that article creation at scale (sometimes known as mass, rapid, or large-scale creation) is one of the causes of dysfunction at AfD with regard to article deletions at scale, and that addressing this issue is a necessary precursor to the ArbCom-ordered RfC addressing AfD at scale.

Purpose of this discussion

This discussion is to identify the issues with article creation/deletion at scale, to workshop initial proposals in preparation for the RfC(s), and to decide how to handle the RfCs.

Specifically, you are asked to address the questions:

  1. What are the primary problematic issues surrounding article creation or deletion at scale which should be addressed in policy? (Proposed issues)
  2. How might we address these issues? (Proposed solutions)
  3. How should we structure the discussions? That is, do we need to run two RfCs, or can we run one? And if we do need two, do they need to be run consecutively or can they overlap?

Rules

  1. All editors are required to maintain a proper level of decorum. Rudeness, hostility, casting aspersions, and battleground mentality will not be tolerated. Inappropriate conduct will result in a partial block (p-block) from this discussion.
  2. The sole purpose of this discussion is to identify problematic issues surrounding article creation/deletion at scale and to workshop proposed solutions to be used in the resulting RfC(s). It is not a venue for personal opinion on past creation or creators of such articles or about previous tolerance of such creations, nor about past mass deletions, ditto. Editors posting off-topic may be p-blocked from this discussion.
  3. All comments must be about problematic issues and proposed policy changes surrounding article creation/deletion at scale or about structuring the resulting RfC(s). Comments about any contributor are prohibited and will result in a p-block from this discussion. Any violations will be reverted, removed, or redacted.
  4. Please do not make changes in issues/solutions that have already been posted. Anyone is permitted to post additional issues/solutions, below the existing ones. Moderators may at their discretion merge, edit, or condense issues/proposals at any point in the process. Any user may suggest such changes.
  5. Please make all proposals within seven days of the start of this discussion. Subsequent proposals may be brought up in an editor's own section on the talk page for consideration and inclusion at the discretion of the moderators.
  6. Please use subsections to number proposed solutions to correspond to a particular issue; that is, if you have a second proposed solution for Issue 1, number that as Proposed solution 1.2 and insert it between Proposed solution 1 and Proposed solution 2.
  7. This discussion will be unthreaded. Please create your own section within the comments section, placing your username in the section header. Within your own section you may present your opinions on the proposed issues or proposals to be addressed, post questions to other editors, or respond to other editors. Threaded comments will be moved or removed by moderators/clerk.
  8. Within their comment section each editor is limited to 800 words, including questions to and replies to other editors. (word count tool) Overlength statements will be collapsed until shortened.
  9. If you believe someone has violated these rules, please speak to a moderator on their talk page, not here. If you believe the moderators are behaving inappropriately, please speak to an arbcom member on their talk page or by email.
  10. This discussion will be open for at least 7 days and will be closed by the moderators at their discretion.
  11. Per their order, any appeals of a moderator decision may only be made to the Arbitration Committee at WP:Arbitration/Requests/Clarification and Amendment.

Moderators of this discussion

The Arbitration Committee has appointed two moderators for this discussion and the RfCs:

Additional clerking help: MJL (talk · contribs)

Statistics for mass creation

  1. Editors who have created more than seven articles in the past week, including lists and disambiguation pages
  2. Editors who have created more than seven articles in the past week, excluding lists and disambiguation pages
  3. Editors who have created more than ten articles in June
  4. Editors who have created more than ten articles in July
  5. Editors who have created more than ten articles in August
  6. Editors who have created more than 100 articles in the past year
  7. Editors who have created more than 100 articles in the past year, by month
  8. Editors who created more than 10 articles in 2021, by month
  9. Editors who created more than 10 articles in 2020, by month
  10. Editors who created more than 10 articles in 2019, by month
  11. Editors by number of articles created in the past five years

Notes:

  1. None of these contain redirects that were converted into articles by the listed editor, but they do contain redirects that were converted into articles by other editors. I'm looking into fixing the latter; the former can be fixed for smaller datasets, but is too intensive for larger ones.
  2. External link counts can be suggestive of an article's quality, but they can also be meaningless: a low number may be because a large number of offline sources were used, while a high number may be because a template that provides links to a large number of database sources was added.
  1. Articles by editor by day over one year (1138 editor-days exceeded 10 articles; 163 exceeded 25)
  2. Articles by editor by week over one year (922 editor-weeks exceeded 20 articles, 150 exceeded 50)
  3. Articles by editor by month over one year (640 editor-months exceeded 40 articles, 123 exceeded 100)
  4. Articles by editor by year since 2020 (1156 editor-years exceeded 80 articles; 407 exceeded 200)

Note that these do attempt to exclude false positives from editors converting redirects created by the original editor, but some still exist, and this attempt does result in some false negatives. This is also the reason why a hard technical limit will be difficult; we will need some way to identify editors converting redirects into articles, and count those articles towards their count rather than towards the count of the original article creator. BilledMammal
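The counting described above can be illustrated with a small sketch. It is not the query used to produce the reports linked in this section; the `Creation` record and its field names are hypothetical, and it assumes the log already records who converted a redirect into an article. Under that assumption, each article is credited to the converting editor rather than to the redirect's original creator, and editor-days over a threshold are tallied.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Creation:
    """Hypothetical log record; field names are illustrative only."""
    title: str
    creator: str                        # editor who made the first revision
    became_article_on: date             # day the page first held article content
    converted_by: Optional[str] = None  # editor who turned a redirect into an article


def credited_editor(c):
    """Credit the article to whoever made it an article, not the redirect's creator."""
    return c.converted_by or c.creator


def editor_days_over(creations, threshold):
    """Return {(editor, day): count} for editor-days with more than `threshold` articles."""
    counts = Counter((credited_editor(c), c.became_article_on) for c in creations)
    return {key: n for key, n in counts.items() if n > threshold}


# Eleven direct creations by EditorA, plus one redirect by EditorA that EditorB converted.
sample = [Creation(f"Stub {i}", "EditorA", date(2022, 8, 1)) for i in range(11)]
sample.append(Creation("Old redirect", "EditorA", date(2022, 8, 1), converted_by="EditorB"))
over = editor_days_over(sample, threshold=10)
```

The same tally works per week, month, or year by replacing the day in the key with the corresponding period. The hard part noted above, identifying the converting editor in the first place, is assumed away here.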

Proposed questions for first of two RfCs

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.



These are the suggested solutions to the issue of article creations at scale that I was able to distill from this workshop. Note that I’ve intentionally combined/shortened/simplified as much as possible, so please point out if I've:

  1. Combined proposed questions that need to be separate
  2. Left out an important consideration or rationale
  3. Missed something altogether

I've created sections below these for endorsement/nonendorsement and any comments or suggestions. Valereee (talk) 14:14, 7 September 2022 (UTC)


1. Clarify SNG policy

Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions. Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG. (Note: there was another suggestion to require 2 sources, which I'd originally thought to add as an alternative, but I thought it might discourage consensus.)

2. New Creations Report

Develop a bot to produce a report listing new creations that is sortable/filterable by editor, category, time range.

3. Creator-at-scale permission

Create a userright to allow creation at scale. Users without this permission would be prevented from creating more than 25 articles/day or 50/week or 100/month or 500/year. Create a dedicated forum to request this right and where requesting and granting this right can be discussed.

4. Require consideration of alternatives to creation

Create policy to require consideration of alternatives to creation, with sanctions for those who do not adhere to such policy.

5. Clarify WP:BEFORE

Creations under SNGs can be assumed to be cited to the best readily-available sources.

6. Clarify SNG policy

Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions.

7. Require a GNG-quality source

Require all articles created under SNGs (other than those which confer notability) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable source.

8. Mass creations noticeboard

Create a dedicated noticeboard to allow for consensus for, notifications of, reports of, and discussions of mass creations and the sources used for such creations. (Details to be developed there.)
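
For illustration only, the thresholds in proposal 3 (25 articles/day, 50/week, 100/month, 500/year) could be checked along the following lines. This is a sketch under stated assumptions, not a description of any existing MediaWiki feature: the function name and the `has_permission` flag are hypothetical, the windows are treated as rolling rather than calendar periods, and "month" and "year" are taken as 30 and 365 days.

```python
from datetime import datetime, timedelta

# Limits taken from proposal 3: (rolling window, maximum creations inside that window).
LIMITS = [
    (timedelta(days=1), 25),
    (timedelta(days=7), 50),
    (timedelta(days=30), 100),   # assumption: "month" treated as 30 days
    (timedelta(days=365), 500),  # assumption: "year" treated as 365 days
]


def may_create(previous, now, has_permission=False):
    """Return True if one more creation at `now` stays within every limit.

    `previous` is an iterable of datetimes of the editor's earlier creations.
    """
    if has_permission:  # hypothetical creator-at-scale userright
        return True
    previous = list(previous)
    for window, maximum in LIMITS:
        recent = sum(1 for t in previous if now - window < t <= now)
        if recent + 1 > maximum:
            return False
    return True


now = datetime(2022, 9, 7, 12, 0)
today = [now - timedelta(minutes=i) for i in range(1, 26)]   # 25 creations in the past day
spread = [now - timedelta(hours=3 * i) for i in range(1, 51)]  # 50 creations in under a week
```

With these sample histories, `may_create(today, now)` and `may_create(spread, now)` are both False, while dropping one creation from either list, or passing `has_permission=True`, makes the check pass. As noted in the statistics section, any such count depends on attributing redirect conversions to the right editor.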


Please endorse/not endorse for inclusion in the RfC or make suggestions for each question within the sections below.

Question 1: Clarify SNG policy

Question split to Q6 & Q7

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.



  • Endorse. I think going with "one source" is the right call; people who want two sources will just say so in their !vote, and the closers will be perfectly capable of determining the consensus. Levivich😃 16:43, 7 September 2022 (UTC)
    Reading the comments below, I can see the benefit of splitting the first sentence of Q1 from the second sentence and just running some version of the 2nd sentence, e.g. "require any articles on topics that must meet GNG to have at least one GNG source", or some variation thereof. This punts on the question of which topics must meet GNG, but I think that's OK (as it doesn't relate solely to mass creation). Levivich😃 01:22, 8 September 2022 (UTC)
  • Agree with Levivich. However, I would reword "Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG." to "Require all creations not under SNGs that confer notability to have at least one source which would plausibly contribute to GNG.", to make it clear that this restriction also applies to creations that are not under any SNG. BilledMammal (talk) 18:38, 7 September 2022 (UTC)
    @BilledMammal, it seems like that broadens it significantly. Valereee (talk) 19:38, 7 September 2022 (UTC)
    I don't think so, unless the intent was for it to not apply to creations that are not covered by any SNG? BilledMammal (talk) 19:45, 7 September 2022 (UTC)
  • Endorse.—S Marshall T/C 22:50, 7 September 2022 (UTC)
  • Oppose the "clarify SNG vs GNG" part. I don't necessarily think this is a bad idea, in principle, but it's a very broad thing to do that is mostly unrelated to the issue at hand, mass creations of low-quality stubs, and something that is better done incrementally as an individual process for SNGs rather than as some kind of catch-all where we can expect most participants to be unfamiliar with the requirements of the individual subjects under discussion. Additionally I think doing this clarification well requires some thoughtfulness about its long-term consequences that might be lost in a poll of editors inflamed by the mass-stub issue. Instead, it is reasonable to expect many respondents to take the position "we must do something about mass creations, this is something, therefore we must do it", regardless of whether the proposed clarifications actually affect mass creations. Additionally, this part is misleadingly named, and inappropriately bundled, because "Require all creations under SNGs that do not confer notability to have at least one source" is not about clarification of SNGs. I would be supportive of polling on a separate bullet point that is just this requirement, with a better title and without the imposition of a new process to clarify SNGs. —David Eppstein (talk) 23:14, 7 September 2022 (UTC)
    • PS "to eliminate contradictions": if this means, eliminate places where the policy says two things that contradict each other, about the same articles, then that's again a laudable goal (although, as above, beyond the scope of this RfC). If it means, fit all SNGs to a single Procrustean bed, eliminating all ways in which some of them do things differently than others, then it's a total non-starter. For one thing, it would eliminate the stricter rules for NCORP. For another, it would eliminate most of our articles about living academics. Essentially, it would eliminate all SNGs, because what would be the point of having an SNG that could only say to follow GNG? At the very least, the wording here is far too ambiguous. —David Eppstein (talk) 05:59, 8 September 2022 (UTC)
      It doesn't mean reducing all SNGs to the same level; we might as well get rid of them. When I made the proposal above, I referred specifically to clarifying whether or not they grant notability independent from GNG, because that isn't clear in many cases. Vanamonde (Talk) 06:23, 8 September 2022 (UTC)
  • Oppose in current form. While I am not against clarifying the relationship of SNGs to the GNG, I do not support the wording on the number of sources required. My position is that simply requiring one source per article does not address the problem of large numbers of articles being created from entries in a database, which is part of what triggered this RfC, and was involved in the case about Carlossuarez46. I would support either requiring two reliable sources, as proposed by BilledMammal, or, as I proposed at the start of this discussion, requiring one additional reliable source in addition to any source from a database. - 23:19, 7 September 2022 (UTC) Donald Albury 23:29, 7 September 2022 (UTC) (re-signed)
    "simply requiring one source per article does not address the problem of large numbers of articles being created from entries in a database" but this would require one GNG source, which would address the problem of articles sourced only to database sources. Levivich😃 01:21, 8 September 2022 (UTC)
    Assuming you believe that there isn't any database in the world that would constitute "a GNG source", which is not something the community has decided ...and if you think there isn't, then I invite you to look at this database entry, which contains about 400 complete sentences about the subject. WhatamIdoing (talk) 01:47, 8 September 2022 (UTC)
    To take this requirement even farther to the point of absurdity: would prohibiting articles created from entries in a database mean we are disallowed from using Google (a database) to find sources for our new articles? —David Eppstein (talk) 06:04, 8 September 2022 (UTC)
    Both of these arguments are straw men. There is no proposal to disallow database sources, it's to require GNG sources. If a database source meets GNG then it would be a GNG source. And David I'm sure you understand the difference between citing to a database and using a database to find a source, and I'm sure you don't cite to Google search results. :-) Levivich😃 13:35, 8 September 2022 (UTC)
    @Donald Albury, are you satisfied, with the understanding that in order to support GNG, a simple short mention -- what I think you mean when you talk about database entries, as opposed to the significant coverage in the entry WAID is linking to above -- wouldn't be sufficient and so Q1 would require at least one other source? Valereee (talk) 14:49, 8 September 2022 (UTC)
  • Endorse but make the one source "in-depth, detailed coverage" that satisfies V and NOR. Atsme 💬 📧 23:52, 7 September 2022 (UTC)
    Neither V nor NOR require in-depth detailed coverage of their sources. Simple claims can be based on simple sources. The requirement for a source to have depth is purely a GNG thing, not V or NOR. —David Eppstein (talk) 06:07, 8 September 2022 (UTC)
  • Oppose in current form per David Eppstein. This conflates multiple things, misses important aspects (e.g. consideration of consequences of changes) and on its own will not solve the main problems. Thryduulf (talk) 00:06, 8 September 2022 (UTC)
  • I doubt this will result in a clear, simple resolution. Even if I am wrong and the RFC results in this proposal being not only agreed to, but also all of the pages updated with clear statements, we will still have fights over what constitutes a "source which would plausibly contribute to GNG", because everyone knows that two sentences about *my* important subjects indicates notability, but that twice as many sentences about *their* worthless subject are not only useless but probably also copied from a secret press release after bribing the publisher. We will also struggle because we have never resolved whether the GNG's requirement of "multiple sources" that are independent, secondary, reliable and containing significant coverage means that a source that is independent, reliable and SIGCOV but not technically secondary (e.g., WP:PRIMARYNEWS) is something that "counts" towards notability, and we definitely haven't found an objective or consistent way to measure significant coverage (two consecutive sentences? 200 words? Ten severable facts that belong in an encyclopedia article?). Bottom line: I think this will fail to reach a decision, and even if it does, I think it will fail to solve the problems as they appear in individual articles. WhatamIdoing (talk) 01:02, 8 September 2022 (UTC)
  • Endorse with a few caveats. Based on the arguments I'm seeing at sports AfDs, we should make it clear that a single SIGCOV source is required to avoid speedy deletion but not necessarily sufficient to meet GNG/SNG or pass AfD. Regarding GNG vs SNG, a common point of conflict is that "meets either the general notability guideline (GNG) below, or the criteria outlined in a subject-specific notability guideline (SNG) listed in the box on the right" (from WP:N) is often interpreted to mean that an article is notable if it meets any criteria within a SNG, even if the SNG lead states that it is subordinate to GNG, so it would be good to clear up. This could be as simple as adding a note at WP:N to that effect. –dlthewave 01:47, 8 September 2022 (UTC)
  • Weakly oppose as stated. "Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions." This could be read as if we are clarifying the relationship to GNG of each SNG, individually. "Require all creations under SNGs that do not confer notability to have at least one source which would plausibly contribute to GNG." What about "require all GNG-based SNGs..." to cut down on ambiguity? I would definitely support any proposal that required at least one (or two, or three, or five...) GNG-contributing source for new creations, but anticipate resistance if we don't also have an idea of how we would enforce this. Also, based on the way this is going at NSPORT, we will definitely have editors insisting an injury report or listicle entry or anything with the subject's name in the headline or single sentences they believe "demonstrate significance" (e.g. "X, a preeminent Y-er with a remarkable career lasting 11 decades, won the prestigious Z award yesterday.") are SIGCOV, so we'll probably end up addressing that eventually too (not in this RfC). JoelleJay (talk) 02:46, 8 September 2022 (UTC)
  • Weak oppose the clarification portion, and I agree that this question is really two parts. No comment for now on the second. My intuition is that this (very broad) question intersects too narrowly with an RfC on "article creation at scale/en masse". I haven't had much time for Wikipedia recently, but I'm really disappointed we got this far -- and are soon having the full RfC?? -- without considering examples. Are people disgruntled over WP:NACADEMIC? (Mass-created stubs on academics would be funny to see. FIRST LAST (born XXXX) is a PROFESSION at the INSTITUTE studying TOPIC. Their dissertation was called TITLE (YEAR).) Over the panoply at WP:NSPORTS? (I see a lot of athletes at NPP, but I dare not patrol those articles because I simply haven't the brain cells to understand that guideline.) Or maybe it's WP:GEOLAND? Then, even if whatever SNG in question were finally deemed subservient to GNG, would that solve the issue of mass creation? Ovinus (talk) 03:34, 8 September 2022 (UTC)
  • Endorse: Clarification of any policy that is frequently misinterpreted is a good thing, and misinterpretation of these policies does appear to be a problem with some mass creators. Contravening a clearly understandable policy becomes a behavioural problem, allowing a different set of remedies for unacceptable conduct. How and where these policies should be clarified is another question. Differences in application of the policies may be appropriate for occasional article creators vs. creators of batches of similar stubs · · · Peter Southwood (talk): 05:14, 8 September 2022 (UTC)
  • Endorse, and no objections to a split. I don't understand some of the arguments above; David Eppstein, the sufficiency of NSPORTS for supporting creation or continued existence is responsible for approximately half the deletion-related conflict we see; how is it too specific an issue for this discussion? Vanamonde (Talk) 06:12, 8 September 2022 (UTC)
    NSPORTS has been the subject of a recent referendum that made big changes in our interpretation of it and is a fresh wound that needs healing, not immediate reopening. Most of the recent conflict has been because those changes have not had time to settle and become established, especially among some editors who were content with the old consensus, and because they involve a lot of changes to which actual articles we should keep. In that specific case, I think asking for another poll and another do-over is a mistake. But the proposed wording goes far beyond NSPORTS and asks us to revisit the details and independence of all SNGs. That is a huge can of worms that I would prefer not to open and especially not to subject to what is essentially the whim of a torch-and-pitchfork mob focused on a monster and not paying attention to the nearby straw roofs that their torches are lighting on fire. —David Eppstein (talk) 06:17, 8 September 2022 (UTC)
    I can understand not wanting to revisit a difficult conversation, but if we are to avoid discussing NSPORTS and other SNGs used to justify mass creation, we should give this RfC up right now. The issues are inseparable. Vanamonde (Talk) 06:21, 8 September 2022 (UTC)
    NSPORTS, under the new consensus, does not support mass creation. Neither does NGEO. Both of those are (now) the type of SNG that merely suggests to editors situations where sourcing is likely to exist, but defer to the GNG in requiring that the sourcing actually exist. What needs clarification is not what these guidelines say about notability, but rather how clearly actual notability (and not just the likelihood of notability) needs to be demonstrated at article creation time. The wording of the question is also problematic in a different way: it is worded in a way that assumes that there are only two kinds of SNG (those that defer to GNG and those that are independent of GNG). There is a third kind: SNGs that go beyond GNG in their restriction on what kinds of sources can convey notability. Both NCORP (which requires that sources be nonlocal) and, arguably NPOL (which at least as practiced, if not in its literal wording, prevents using coverage of unelected candidates for notability) are of that type. For articles that would fall under one of those SNGs, should the source that is provided meet the stricter standards of the SNG? Your question doesn't say. And who is to judge which SNG or GNG is the right choice for any individual article? It's not always an easy question (for instance the line between WP:PROF and WP:AUTHOR can be very unclear). —David Eppstein (talk) 06:34, 8 September 2022 (UTC)
  • Completely endorse. Would go with two sources as well. --WhoIs 127.0.0.1 ping/loopback 06:39, 8 September 2022 (UTC)
    See, User:Vanamonde93, this kind of answer is exactly why it is a very bad idea to bundle things together. You have asked two questions, and received a positive answer to one that would be used as evidence of consensus for the other even though it does not address it at all. —David Eppstein (talk) 06:57, 8 September 2022 (UTC)
    @David Eppstein: But I didn't bundle them, and am unopposed to bundling, so I don't see why you are directing that remark at me. To answer your point above; NSPORTS never did support mass creation explicitly, but was still used as justification for it. As such the recent RfC changes nothing. NGEO does confer notability independent of GNG, for a subset of geographic features that meet GEOLAND (and if two admins disagree on this point, it makes the need for a clarification that much more obvious). The question doesn't address NCORP at all, and I'm well aware that it's more restrictive than GNG (I've said so elsewhere over the course of this discussion). Most fundamentally, "how clearly actual notability (and not just the likelihood of notability) needs to be demonstrated at article creation time" can't be tackled when thousands of creations, and thousands of AfD !votes, have treated the likelihood of notability as actual notability. We need community consensus affirming that those are different, and how they are different. Vanamonde (Talk) 07:02, 8 September 2022 (UTC)
    Re: "have treated the likelihood of notability as actual notability. We need community consensus affirming that those are different": then state that much more specifically and unambiguously as a poll question, rather than asking "whether each specific SNG directly confers notability independent of GNG", a completely different question that the SNGs already largely state answers to. If you want consensus that likely notability is different from actual notability, and that actual notability needs to be demonstrated, you won't get it from a question that doesn't address that issue. The only reason to ask "whether each specific SNG directly confers notability independent of GNG" is to change what the SNGs already state about which kind of SNG each one is. If you didn't want to change that relation, you could just read the SNG instead of asking in a poll. Changing the individual consensus of all the SNGs at once, as a byproduct of a process focused on something else, is exactly what I'm opposed to. Clarifying the interpretation of "likely notability" is much better focused but is not what the current wording asks for.
    But now I'm confused about your role. Are you a moderator here, overseeing the process in a neutral way and making sure order and consensus are maintained, or are you leading the charge, pushing for change and setting an agenda that is worded in a non-neutral way that guides participants in the direction you think they should be guided? I thought it was the former but this interaction tends to make me think it is the latter instead. —David Eppstein (talk) 07:10, 8 September 2022 (UTC)
    Flattered as I am to be confused with Valereee, we're not the same person. I'm not a moderator, and I certainly have opinions about this issue, so I wouldn't have volunteered to be one. I think we need community consensus affirming that most SNGs do not confer automatic notability, but the question is worded more broadly out of fairness; if the community at large wants to declare that all SNGs do, that's an option they ought to have. Hence my support for the current question. I assume there will be considerable wordsmithing before it's actually posted, and also these aren't actual proposals, I assume; we'd want the community to !vote on what clarification would look like. Vanamonde (Talk) 07:24, 8 September 2022 (UTC) (Added post-ec): I wish you were right that we could "just read the SNG"; but we can't. Because WP:N is confusingly worded, because language use isn't consistent, and because some woolly notion of AfD conventions has not infrequently been used to argue against a simple reading of the SNGs. If we don't change the status of any SNGs, we're still left with wording and convention issues. Vanamonde (Talk) 07:24, 8 September 2022 (UTC)
    Oh, have I confused your identities? Oops. I do apologize, and would not have been as argumentative had I not been confused. —David Eppstein (talk) 07:27, 8 September 2022 (UTC)
    I'm sorry, what about my endorsing the question being in the RFC (and endorsing the proposer's initial thoughts about 2 sources) makes it 'the kind of answer' that implies any consensus beyond it should be in the RFC? Did you reply to the correct comment? I don't really understand the reply and it doesn't seem to make much sense. --WhoIs 127.0.0.1 ping/loopback 08:03, 8 September 2022 (UTC)
  • Lol...for anyone who was wondering why we needed the first part of the discussion to be unthreaded and to allow limited word counts, this is why. Valereee (talk) 12:40, 8 September 2022 (UTC)
  • Comment - This doesn't touch GeoStub articles which are a massive problem. SPORTSBIO articles already have this requirement. What exactly is this directed to? Species stubs? FOARP (talk) 14:26, 8 September 2022 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Question 2: New creations report

Can proceed without consensus

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.



  • This seems like a potentially useful thing that could support any outcome of this RfC, but doesn't require consensus here to do, so maybe not necessary to include. — Rhododendrites talk \\ 15:14, 7 September 2022 (UTC)
  • I agree with Rhododendrites. Any interested person should feel free to go ahead and work on this proposal now. isaacl (talk) 15:42, 7 September 2022 (UTC)
  • I agree with the above: it's a great idea, and should be removed because it doesn't need consensus and its removal would simplify the RFC. Interested editors should feel free to go work on it (and to ping me if they want help). Levivich😃 16:45, 7 September 2022 (UTC)
    I'll leave this here for a day or so to give others a chance to chime in, unless someone pings me to say 'We've started work on this.' :D Valereee (talk) 17:33, 7 September 2022 (UTC)
    The database side of this is dead simple; it "just" needs a friendly UI. Example without filtering, though that's just as easy. —Cryptic 23:12, 7 September 2022 (UTC)
  • Endorse. Should be in place regardless of the outcome. --Enos733 (talk) 17:45, 7 September 2022 (UTC)
  • This idea seems like it has a lot in common with the existing new pages feed. ~ ONUnicorn(Talk|Contribs)problem solving 17:51, 7 September 2022 (UTC)
  • I feel this is out of scope.—S Marshall T/C 22:52, 7 September 2022 (UTC)
  • Agree with Rhododendrites: useful but doable without polling and therefore not poll-worthy. —David Eppstein (talk) 22:58, 7 September 2022 (UTC)
  • Agree, in fact, any and all stats that we can get are A-OK with me! Atsme 💬 📧 23:54, 7 September 2022 (UTC)
  • I agree with Rhododendrites. Collecting stats and producing reports doesn't need consensus. Thryduulf (talk) 00:07, 8 September 2022 (UTC)
  • I don't think that the description as written will work. Specifically, I don't think that "filterable by category" is achievable in a wikitext table (which is the only way I know to have anything "sortable" on wiki) because articles can have multiple cats. I think this would have to be done in Toolforge. WhatamIdoing (talk) 01:06, 8 September 2022 (UTC)
  • Yes please -- I'd attempt to make it, but I haven't the time. Numbers are needed to inform the full RfC. (Btw, where did this oft-cited 25 articles/day figure come from? Divine inspiration?) Ovinus (talk) 03:37, 8 September 2022 (UTC)
  • Does not require consensus and would be useful to very useful, depending on what it reports. Should there be a project discussion to consider what we would like to see in the output? When could we see a prototype? · · · Peter Southwood (talk): 05:50, 8 September 2022 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Question 3: Creator-at-scale permission

So we're confident a software rate limit that ignores the creation of redirects is doable? Just checking. Valereee (talk) 18:16, 7 September 2022 (UTC)
Maybe? I think this is something we would need to ask the WMF for. BilledMammal (talk) 18:35, 7 September 2022 (UTC)
Maybe also split this one into two: (1) what specific rate limits for a rate limit policy, if any (I still like the proposed thresholds of 25d/50w/100m/500y), and (2) whether/how to enforce that, e.g. a software rate limit, a userright to exceed it, or both. Levivich😃 16:55, 8 September 2022 (UTC)
Instead, we need a policy that constrains actual mass creation; something that says if you want to create more than 10 highly similar articles, you need approval for that group. BilledMammal (talk) 18:35, 7 September 2022 (UTC)
The question here is whether the question goes to an RFC. I think that the proposal here is close to your alternative - allowing a highly productive article creator to get an advanced user right (approved by a group) to create more than (10/25/X) articles in a specific period - Enos733 (talk) 20:32, 7 September 2022 (UTC)
I'm suggesting a slightly different one go to RfC. My objection is that this will require productive but otherwise unproblematic editors to go through the process, while also not constraining problematic ones. The ideal here would be to maintain something similar to WP:MASSCREATE; require each "group" of mass created articles to receive consensus, rather than editors with this permission having carte blanche to create whatever they like. BilledMammal (talk) 20:37, 7 September 2022 (UTC)
@BilledMammal, you see asking for a userright one time -- a userright which like any other can be removed if it's abused, which means it's not really 'carte blanche' -- as more onerous than asking for consensus for every planned mass creation? Valereee (talk) 12:50, 8 September 2022 (UTC)
It will be more onerous for prolific editors who don't engage in mass creation, and that is something I want to avoid. In addition, I don't think we want to approve mass creation by editor, I think we want to approve it by mass creation; while we might approve an editor mass creating articles on topic A, that doesn't mean we would approve the same editor mass creating articles on topic B.
In addition, I'm not convinced that we'll remove the user right in a timely manner if it is abused; my earlier example of Lugnuts abusing autopatrolled for years demonstrates that. BilledMammal (talk) 04:17, 9 September 2022 (UTC)

Question 4: Require consideration of alternatives to creation

Not endorsed for inclusion in RfC
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

  • I don't know how this would work in practice. A subject should meet English Wikipedia's standards for having an article in order for an article on it to be created, but it's still an editorial decision if a new article is the best way to organize the information comprised within the overall domain. Thus editors are always having to judge if content on a subject best fits within another article or a separate article. The only way I can think of to establish if this has been done is for all new articles to be discussed first. If that is the intent, then I think it should be proposed directly. isaacl (talk) 15:52, 7 September 2022 (UTC)
  • Maybe I'm not using my imagination, but a policy to require consideration of alternatives to creation, with sanctions for those who do not adhere to such policy seems like a near-non-starter. The broader community is pretty reluctant to erect barriers to creating content except where it's clearly enforceable, can be clearly communicated, and would clearly prevent more problematic content than good content, and I'm not sure this would qualify. — Rhododendrites talk \\ 16:41, 7 September 2022 (UTC)
  • As above, I don't see how it is possible to require consideration or to determine who has not followed this requirement. "You must think about this, or else!" :-) Maybe it could be rephrased into something broader, like asking whether our PAGs should be changed to encourage alternatives to creation, and if so, how? Levivich😃 16:50, 7 September 2022 (UTC)
    I included this here because it was a serious suggestion, but I too was unsure how this could be made to work. Finally decided 'Just moderating here.' :D Valereee (talk) 17:31, 7 September 2022 (UTC)
    So, basically like AtD, which is also touted as "required" and "a policy" despite no language existing on what violating it would look like or how to enforce it... JoelleJay (talk) 03:11, 8 September 2022 (UTC)
  • I too feel this is unworkable.—S Marshall T/C 22:54, 7 September 2022 (UTC)
  • Let alone not knowing how this would work, I'm not even sure what it's supposed to mean. I have to have read someone's policy page on alternatives before creating? I have to have, in my mind at the time of creation, the possibility of alternatives? I have to go through some checkbox saying I know there are alternatives? Anything I can think of that this might mean either comes across to me as ineffective or both ineffective and bureaucratic. —David Eppstein (talk) 23:09, 7 September 2022 (UTC)
    I think it's supposed to mean that you consider things like making a list article with 10 (detailed) list items instead of 10 articles. WhatamIdoing (talk) 01:42, 8 September 2022 (UTC)
    That doesn't explain what "consider" means, which is more my question. —David Eppstein (talk) 07:25, 8 September 2022 (UTC)
  • I don't see anything to react to in this proposal. - Donald Albury 23:28, 7 September 2022 (UTC)
  • Oppose per David Eppstein. Also, I'm unsure what is even meant by "alternatives to creation" - adding content to existing articles? Doing nothing? Writing a draft (or is that creation)? Asking somebody else to create it for me? Thryduulf (talk) 00:15, 8 September 2022 (UTC)
  • Nope, not seeing it. Atsme 💬 📧 01:17, 8 September 2022 (UTC)
  • How would we objectively measure compliance? I cannot see how this could work. It looks unmeasurable, unenforceable, and more likely to be used for personal attacks than anything constructive. Convince me otherwise with evidence. · · · Peter Southwood (talk): 05:31, 8 September 2022 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Question 5: Clarify WP:BEFORE

Not endorsed for inclusion in creation RfC
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

  • I think it's a good question but it's about deletion not creation IMO. Whether articles can be assumed to be cited to the best readily-available sources depends on what sources they are required to have when they are created; thus the answer to this Q5 depends on the answer to Q1. This should be a question in the second RFC about deletion. Levivich😃 16:53, 7 September 2022 (UTC)
    Yeah, I waffled on whether this belonged in this RfC. Valereee (talk) 17:29, 7 September 2022 (UTC)
  • I think we should be allowed to presume the best sources in the article at the end of a full AfD are the best available. That idea might fit better in the second RfC.—S Marshall T/C 22:57, 7 September 2022 (UTC)
  • Oppose. I think any relaxation of the principle that deletion-nominators should actually perform BEFORE, and its proposed replacement by a principle that articles as written can be assumed to have been written with the best possible sources, is a bad idea based on false premises. —David Eppstein (talk) 23:00, 7 September 2022 (UTC)
  • Oppose. Out-of-scope for an RfC about article creation. - Donald Albury 23:31, 7 September 2022 (UTC)
  • Oppose per David Eppstein and Donald Albury. Clarifying BEFORE would be good, but it needs to be strengthened (e.g. always requiring someone to look for sources in the place they are most likely to exist before nominating on the grounds of verifiability or notability) and enforced rather than weakened as suggested here (contrary to WP:V, which states that articles must be verifiable, not that they must be verified). Thryduulf (talk) 00:19, 8 September 2022 (UTC)
  • If it is determined that this question is out of scope then so be it; however, I will go on the record to say BEFORE is essential and should be the first (mandatory?) step prior to deletion. We are getting too many articles at AfD despite NEXIST and CONTN – the process is being misused for discussions that belong on the article TP or with the article creator. I also need specifics as to what some believe is in need of clarity. Atsme 💬 📧 01:24, 8 September 2022 (UTC)
  • This is not a functional RFC question. It might be helpful to have a discussion around BEFORE, but instead of doing this, this proposed question asks editors to vote on a statement of fact, when they don't have the information necessary to determine whether the statement is true or false (especially for any subjects they're unfamiliar with). Also, I suspect that what's intended here isn't "In our experience, certain articles (almost) always have the best readily-available sourcing at the time of creation", but instead "We never require editors to follow BEFORE if they believe that an article was created under an SNG". And that, by the way, highlights another problem: How would the AFD nom even know whether a given article is a "creation under SNGs"? Articles don't come with color-coded badges that say "I'm a GNG subject" or "I'm an SNG subject". WhatamIdoing (talk) 01:54, 8 September 2022 (UTC)
  • Support, although it might be more appropriate for the second RFC, yes. This is a vital open question with clear relevance to article creation, since many of the people who have most prolifically created articles have cited the interpretation that makes it mandatory in a way that makes it clear that they are leaning on this belief as part of what makes mass article-creations possible. To respond to David Eppstein specifically - there is currently clearly no consensus backing your opinion that WP:BEFORE searches are mandatory; no one, to my knowledge, has ever been sanctioned for "failing" to perform the search that you believe they are required to perform (nor could they be, since there's no consensus backing that interpretation and a clear contradiction between it and WP:BURDEN; or between it and WP:NEXIST, which merely says that such searches are "strongly encouraged.") If you believe that BEFORE searches ought to be mandatory, you should be pushing for an RFC to clearly establish this, but do not state or imply that it is mandatory currently - there is no consensus backing that position. --Aquillion (talk) 01:57, 8 September 2022 (UTC)
  • Clarification may be useful, but how would one objectively measure compliance? Without a measure for compliance, how would it be enforceable? Either we require evidence of sufficient notability or we do not. I think we should require strong evidence of notability when creating "at scale" (batches of articles on closely related topics), as those editors would be expected to be competent, but keep the status quo for occasional creation to remain reasonably friendly to new editors and occasional article creators. · · · Peter Southwood (talk): 05:44, 8 September 2022 (UTC)
  • Support wholeheartedly. BEFORE is not currently required, just encouraged. I'd like to see a compromise here. Make BEFORE a requirement, and make the BEFORE search be sources IN or MENTIONED IN the article. Can't type up ref formatting? We got you. Make us go on a wild goose chase to prove YOUR work is encyclopedic? That's a no. --WhoIs 127.0.0.1 ping/loopback 06:43, 8 September 2022 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Question 6: Clarify SNG policy

Useful question for a later discussion
The following discussion has been closed. Please do not modify it.

Clarify at WP:N to make explicit whether each specific SNG directly confers notability independent of GNG and to eliminate contradictions. Please limit yourself to a single brief comment. If you must argue a point with another editor, please just take it to their talk; if they convince you to change your mind, come back and revise your single brief comment. Remember that we aren't !voting on these questions, simply trying to refine them and gain consensus on including them in the RfC.

Discussion of Q6

  • Splitting this from Q1. Valereee (talk) 15:37, 8 September 2022 (UTC)
  • Weak non-endorse, as this question could, and I think on balance should, be run as a separate RFC altogether, because it's a broad question (there are lots of SNGs) and it affects non-mass-created articles just as much as mass-created articles. Levivich😃 16:59, 8 September 2022 (UTC)
  • Weak non-endorse for exactly the same reasons as Levivich. Thryduulf (talk) 21:49, 8 September 2022 (UTC)
  • Perfectly fine question for a different time (e.g., 2023). Also, if anyone is making a note to run this later, please spend a while contemplating the other obvious way to address GNG/SNG questions, namely asking editors whether they want a rule that says "The English Wikipedia will not have any articles on subjects for which editors cannot find at least two independent/third-party reliable sources, which together contain enough information to write a short encyclopedia article about the subject," which solves the problem in a different way. WhatamIdoing (talk) 02:23, 9 September 2022 (UTC)
  • Agree with the previous three editors. Any future discussion would also need to carefully consider WP:NCORP and other SNGs that apply stricter limits than GNG. BilledMammal (talk) 04:25, 9 September 2022 (UTC)
  • Endorse, because the vast majority of conflict that I have seen at AfD related to mass creation and deletion has to do with GNG and SNG differences, and conflicts in their interpretation. I do not object to handling it separately, but if we don't handle it, nothing else we come up with is going to be meaningful. Vanamonde (Talk) 08:42, 9 September 2022 (UTC)
  • Basically agree with first four responders above. This will likely be useful to pursue, but maybe later, as a separate RFC. - Donald Albury 13:32, 9 September 2022 (UTC)
  • As far as article creation goes, this is probably not necessary if Q7 is asked. It's probably more relevant to the AfD RfC. Scolaire (talk) 16:35, 9 September 2022 (UTC)

Question 7: Require a GNG-quality source

Please limit yourself to a single brief comment. If you must argue a point with another editor, please just take it to their talk; if they convince you to change your mind, come back and revise your single brief comment. Remember that we aren't !voting on these questions, simply trying to refine them and gain consensus on including them in the RfC.

Proposed wording 1: Require all articles created under SNGs (other than those which confer notability) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Proposed wording 2: Require all articles (except those not required to meet GNG) to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Proposed wording 3: Require all WP:MASSCREATEd articles to have at least one source which would plausibly contribute to GNG: that is, that constitutes significant coverage in an independent reliable secondary source.

Please in your response indicate whether you'd endorse one, two, or all three for inclusion in the RfC, not whether you'd personally support or oppose. Here we're just trying to refine and gain consensus on wording to include in the RfC. Valereee (talk) 17:41, 8 September 2022 (UTC)

Discussion of Q7

Question 8: Mass creations noticeboard

Create a dedicated noticeboard to allow for consensus for, notifications of, reports of, and discussions of mass creations and the sources used for such creations. (Details to be developed there.)

Discussion of Q8

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

General discussion (please discuss specific proposed questions above in their own sections)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.



The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

An alternative to a multi-question/multi-variable sequence of RfCs

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.



Valereee, I wasn't sure where to put this, but it seemed useful to discuss here rather than your talk page. If it's disruptive or confusing down here, feel free to move it to my section above.

There are many people who have been following the issues of mass creation/deletion. Instead of treating every variable as unknown for the purposes of an RfC, what if we worked together to workshop an actual process for mass creation based on what we've seen in the various threads -- a process that could be refined later, but provides a starting point? Large, multi-stage, multi-question RfCs are thorough, and can produce good results, but they can also be complicated, result in some confusing/contradicting outcomes, and produce results that are hard to modify or implement. The risk of proposing a specific process for the community to !vote on is that the specificity has the potential to lose people who feel passionately about a particular detail, but it can also be productive in giving people something actionable to work with (and later implement). I'm thinking of, for example, when WP:NCORP was completely overhauled, and we had an RfC about using the rewrite as the new starting point rather than debating each and every change. — Rhododendrites talk \\ 20:52, 7 September 2022 (UTC)

A process to workshop

So you want to create a bunch of articles.

Does this guidance apply to you?

  1. Are you planning to create more than 50 new articles in the span of a month or 500 in the span of a year?
  2. Are those articles on a similar topic, similar theme, or are they based on the same set of sources?
  3. Will the articles be created manually, rather than through use of a bot or tools like AutoWikiBrowser (these must go through the Bot Approvals Group)?

If the answer to all of these is yes, this guidance applies to you. (Note that even if the answer is no, if an uninvolved administrator has determined your editing fits within the spirit of these requirements, you will still be expected to follow them).
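The three-question test above can be sketched as a small decision function. This is only an illustration of the proposed wording, not part of the proposal: the names `CreationPlan` and `guidance_applies` are hypothetical, and reading "500 in the span of a year" as "more than 500" is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted from the proposed process; all names here are hypothetical.
MONTHLY_THRESHOLD = 50
YEARLY_THRESHOLD = 500  # assumption: "more than" also applies to the yearly figure


@dataclass
class CreationPlan:
    articles_per_month: int         # planned new articles in any one month
    articles_per_year: int          # planned new articles in any one year
    similar_topic_or_sources: bool  # similar topic, theme, or same set of sources
    created_manually: bool          # False means bot/AWB, handled by the Bot Approvals Group


def guidance_applies(plan: CreationPlan, admin_determination: bool = False) -> bool:
    """Return True if the proposed mass-creation guidance would apply."""
    exceeds_rate = (
        plan.articles_per_month > MONTHLY_THRESHOLD
        or plan.articles_per_year > YEARLY_THRESHOLD
    )
    all_yes = exceeds_rate and plan.similar_topic_or_sources and plan.created_manually
    # An uninvolved administrator's determination can bring a plan into scope
    # even when one of the three answers is "no".
    return all_yes or admin_determination
```

For example, a plan for 60 similar, manually created articles in a month would be in scope, while the same plan run through a bot would go to the Bot Approvals Group instead.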

You must post a notice to [new venue to be created] with the following information:

  1. The approximate number of articles you will create
  2. The approximate time frame for creation
  3. A description of the overall topic/theme
  4. Which notability criteria you will be using, and the kind of sourcing you will use to demonstrate that each article meets the criteria

[some additional work on how long these discussions stay open, who approves them, if there's an appeals process, etc. could be added here or deferred to a separate RfC on process for that new venue]

Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases. While there are no firm requirements about the level of quality an article must reach when created, many in the community have a strong preference for mass created articles to be more than one- or two-sentence stubs.

If articles are created after [date this goes into effect] that do not comply with these rules, notice should be posted at [the new venue] for review. An uninvolved administrator may, at their discretion, and with feedback from the community, speedy delete the articles under [criterion TBD, but it should be one that allows refunds], draftify/userfy, or in unusual circumstances even allow keeping the articles and requiring they go through AfD. — Rhododendrites talk \\ 20:52, 7 September 2022 (UTC)

Discussion (proposing a process)

I think that's okay – could be a workable process, could be helpful – but I don't think it solves all the problems. Some of the problems described above have relatively little to do with mass creation per se. Consider, e.g.:
  • complaints about the NPP backlog, even though most of them aren't mass-created articles, and even though most mass-created articles are really quick and easy to process.
  • complaints about article quality, even though mass-creation of FAs, or heavily sourced jewels of stubs, is just as much mass creation as two-sentence stubs.
I saw some articles created recently by an editor in one of these mass-creation discussions. They had a small infobox and said things like: "Geographic Place is a place near Other Place in State, Country. Part of a Film was filmed there." The sources were a single government database (for the location) and a single article (for the film's name). There were a few of these. Even if they were pre-approved, consider the effects from the two viewpoints above:
  • If this editor doesn't have Wikipedia:Autopatrolled, then the creation of those articles means that someone in NPP has to look at them. It ultimately doesn't much matter whether one editor writes n of these or n editors each write one of these; n articles created is still n articles for NPP to process, and because of how burdensome we've made NPP over the years, the NPP reviewer might spend more time running their checklist than the editor spent writing the two-sentence, two-source article. You can probably imagine why some NPPers shudder at the idea of anyone mass-creating any articles. (NPP used to be all about CSD, but these days, they're trying to be one-stop shopping for every aspect of quality control, including everything from article titles to stub-tagging, even including typo fixing.)
  • It's kind of a lousy article. This doesn't bother me, personally, but it does bother some editors. They are disgusted by the idea of "inadequate" articles. They don't want "embarrassing" articles. They don't actually care about mass creation per se, except to the extent that mass creation is sometimes associated with the creation of extremely brief articles. So for this group, your process doesn't directly address their real problem (short, boring articles on subjects I don't care about), and it might even actively authorize an increase in their problem.
I think we could address these problems (e.g., stop telling NPPers to be human grammar checkers; agree on whether embarrassing articles and imperfect edits are still part of the glorious wiki process, or if only perfect editors are welcome now), and these problems are only partly connected to mass creation, but I don't think your proposed process will appeal to either of these groups, because they have problems that it will not solve. WhatamIdoing (talk) 02:28, 8 September 2022 (UTC)
@WhatamIdoing: "complaints about the NPP backlog" - I think it does get at this by requiring that any mass creation post a request with some basic information about notability/sourcing. That should ensure anyone reviewing at NPP should have an easy time. These are all compromises, of course.
"complaints about article quality" - There's the line above that starts "While there are no firm requirements about the level of quality...". I feel like it's about what we could find consensus for (a recommendation). I'm quite doubtful that a proposal to require a certain size/quality would find consensus, and the request process should ensure any debates over sourcing re: SNG/GNG are sorted out in advance. It does sort of move those debates rather than solve them, but I think finding a SNG vs. GNG solution is outside the scope of this RfC anyway.
"The sources were a single government database (for the location) and a single article (for the film's name)" - If the source for the location is a database, that's addressed above ("Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases"). The point of a process like this would be to ensure something like that doesn't get pre-approved.
"doesn't much matter whether one editor writes n of these or n editors each write one of these" - but the latter is outside the scope of this RfC, isn't it?
"some NPPers shudder at the idea of anyone mass-creating any articles" - I do get that, even if it's hard to draw clear lines about what "mass-creating" means. Regardless, again, for this approach of proposing a process IMO it's important to try to find something that would be broadly acceptable even if none of the sides feel like they entirely got their way.
"It's kind of a lousy article" - Sort of similar to above, but I think this is one of the perspectives that has the potential to sink the productivity of this RfC. There's just no way a large RfC is going to codify requirements for quality beyond something really, really minimal (like enough sourcing to show notability and a recommendation that you do more than a couple sentences). YMMV. — Rhododendrites talk \\ 03:01, 9 September 2022 (UTC)
  • "Greater than N should be an option." - By this do you mean propose a process but ask a separate question about the specific numbers? If so, that seems reasonable to me.
  • "It is possible that an editor may start with no specific intention to mass create" - If we're only talking about creating a large number of articles on a similar topic/theme/using the same sourcing, this is a rule that people will simply need to be aware of (like people are unaware of 3RR until they are, at which point they realize that they'll need to count). Creating that many iterative articles in a month isn't something that just sneaks up on you, I don't think (though I've never done it myself). Ultimately there's not really any difference between someone who mass creates 50 articles in a month and, oops, didn't realize, and someone who just mass creates 50 articles and didn't request permission first. The difference, I suppose, is awareness, and certainly new rules require some flexibility to ensure people are aware of them. Perhaps I've misunderstood.
  • "Would this be a setting permission or just a finding at the discussion?" - I don't understand. You mean permission (or lack thereof)? I'd think it would just be an "ok" rather than something technical. — Rhododendrites talk \\ 03:12, 9 September 2022 (UTC)
To take one of the quarry searches provided in the above Statistics section and sort for shortest average content, here's 100+ extremely short stubs on species in the genus Carex, created in maybe three months and all sourced to the same database. Before that it was all the entries in a different genus, same database. This editor is working their way through the database in order and creating an article for each entry. They work inconsistently, a few articles a day, then a day off, then ten articles. (Courtesy ping to Hughesdarren)
Surely this is something we'd want the process to include, but that rate wouldn't necessarily be captured by 50 per month or 500 per year, and I don't imagine this editor is necessarily planning to create at that rate...it probably happens one month and not the next, or one year and not the next. Valereee (talk) 14:00, 8 September 2022 (UTC)
This is a good point, and a good reason to workshop this (I don't know how a sprawling, many-question RfC would quite address all of these sorts of situations, either). The process above includes "Mass created articles must include sufficient sourcing to show notability, and cannot be based only on simple statistical databases." It may be worth lowering the threshold when databases are concerned (more than 20/month or 200/year, for example). — Rhododendrites talk \\ 03:16, 9 September 2022 (UTC)
@Rhododendrites, yes, my concern is that this is a pretty complex proposal. Could we boil it down to a single idea, like "Create a noticeboard for reporting and discussing mass creations (see WP:MARV for a current similar discussion)"? The exact details perhaps could later be worked out there -- what 'mass creation' entails, for instance, and which planned creations would need to be reported up front to allow for consensus to be gained before the work is done. Valereee (talk) 13:02, 9 September 2022 (UTC)
That's why I suggested putting a limit on the number of stubby, undersourced articles an editor can have in their creations, and requiring them to expand/source them before they can add another stub. JoelleJay (talk) 20:43, 9 September 2022 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

The database problem

Summary: Some entries in some databases may represent significant coverage

There are several comments above that contrast "GNG sources" and "databases". I think this is overly simplistic, and I am concerned that this is going to turn into a destructive meme during the RFC.

On the one hand, there are databases that are not independent, do not contain significant coverage, or are otherwise not reliable for any encyclopedic purpose. See, e.g., a database that matches ISBNs with bibliographic data about the book registered to that number. It may be reliable, but it does not have much information in it. You can write some information from a small record like that ("Alice Expert wrote a book, The Sun is Really Big, in 2007") but you can't really turn it into a whole encyclopedia article.

On the other hand, there are database records like https://omim.org/entry/609423, which contains both more complete sentences in prose and more inline citations than most of our articles ever will.

While the extremes might be tolerably obvious, in the extensive middle ground, it will be difficult for editors to decide, fairly and without bias, which ones contain enough information to "count". There will always be a tendency for editors to evaluate the amount of content in a database entry according to whether or not they believe the subject is "worthy". The sports fans will always approve of databases about sports; the "Wikipedia is serious business" folks will always prefer databases about academic subjects.


And then there is the other problem, which is that some people can get more out of some database entries than others of us.

Consider https://www.fishbase.se/summary/Entomocorus-benjamini.html which I found in a stub created in 2009. A lot of us are going to look at that (please do glance at it now) and say "Ugh, what a useless source". There is not a single sentence on the page. And others of us are going to say that it's a good source with tons of information in it. For those who don't "speak" biology, here's what that source says, in plain English:

That's 15 to 20 severable facts from the main page of a single database entry that doesn't contain a single complete sentence. Just what I've written here is almost long enough to qualify for WP:DYK.

If you click through some of the links to subpages, you find that the fish has been reported in at least two countries (Bolivia and Brazil), that it has a Valid name (that's a thing for animals), that it's been entered into the Catalogue of Life (which will interest Wikidata more than us), that it's found in inland waters (which you already knew, if you knew where the Madeira River is, but, hey, now it's officially a statement that this source directly supports), and that it's a Native species of the Madeira region and endemic in the Neotropical realm, plus a complete taxonomic hierarchy (perfect for filling out those infoboxes) and a list of specific places (turns out it's deep rivers, not lakes) where it's been reported in the academic literature, including citations to those reports.

This single source contains plenty of objective, encyclopedic information. But not everyone can see that, even if they're genuinely trying. All some people can see is "The article contains two sentences and the lone cited source is Greek to me."


I hope this illustrates the two problems of talking about "databases". I think the end result of talking about "database sources" is going to be destructive. We're going to end up with one-size-fits-none claims, and with people dismissing rich sources of information because they don't understand them, rather than because of any limitations in the databases themselves. I think we need to be more descriptive and specific, like "I don't want editors mass-creating articles from sources that contain very few actual facts that would be appropriate for an encyclopedia article. It doesn't actually matter whether that fact-deficient source is 'a database entry' or 'a long feature story in a gossip rag'. We need sources that contain a lot of information, no matter what format that information is presented in."

Of course, if your main issue is "I don't want editors mass-creating two-sentence articles", then that's a separate problem. But I still encourage you to not blame "databases" when it would be possible to use only that database to write a much longer article. WhatamIdoing (talk) 04:23, 9 September 2022 (UTC)

Closing workshopping

Closing this as I think we've got enough input. The RfC will be at WP:ACAS and will be announced in various fora. No confirmed timeline yet, sorry! Valereee (talk) 14:47, 10 September 2022 (UTC)

@MJL, feel free to archive the entire page, we'll want it blank when we start the RfC. Valereee (talk) 14:48, 10 September 2022 (UTC)