Problem statement

We allow new editors to choose just about any unused name (with some restrictions) they wish.

This policy has led to many problems, some of which will get worse, some of which are already serious. I have a possible solution, which won't undo the problems already caused, but can stem the growth in problems.

Problem 1: Someone persuades Jane Doe she should try editing Wikipedia. She wants to use her own name. However, it is already in use. The odds are high that the other user is inactive, but unless the number of edits is low enough to usurp, the name is not available. Even if it is eligible to usurp, that's a barrier, and maybe she just decides not to bother. For every active editor name, there are over 200 inactive names, so the odds are high, and getting higher, that the desired user name will be a problem.

Problem 2: Suppose Jane Doe does register, becomes an active editor with thousands of edits, then decides to work on another language or project. However, she is told to get a unified login, and there's a problem. The user name exists on another project, not currently active, but with too many edits to usurp. She can pick a new name, but that's a burden. Update: No longer an issue with unified login.

Problem 3: Because so many companies have a presence on Wikipedia, and it is so ubiquitous in searches, companies without a presence want to add one. A typical employee at XYZ Corp is told to create an article, and the first thing they do is create an account with user name XYZ Corp. Shortly thereafter, they are blocked indefinitely, and while we provide an explanation, it doesn't always sink in, and they are angry and confused and write to OTRS. Worse, they don't know the difference between a user name and an article, so I just spent almost two hours with an angry editor wanting to know why they were told they could not create an article about XYZ Corp. After searching for deleted titles (maybe they meant XYZCorp, or XYZ corp, or XYZ Corporation), and checking the salt list, I finally figured out that their user name XYZ corp was blocked. Please don't tell me that the account creation page clearly lets them know that they should not create such a user name; it might even be true, but it isn't sinking in. This is not an isolated incident; I see several instances a week where an editor has assumed that their user name should be the same as the article name.

Problem 4: Jane Doe creates an account, and starts creating an article, then is bombarded with templated acronym soup. As a community, we ought to talk to new editors differently than we do established ones, and many do, but some do not. It is all too easy with semi-automated tools to drop a template on the editor, without checking to see that the editor is brand-new, and ought to be handled differently.

Problem 5: Jane Doe creates an account, gets involved and then gets some unwanted attention. She wants a new user name, which is possible, but has its own set of issues, if nothing other than the bureaucratic efforts to handle the transfer.

Luckily, I have a solution, and it will address every one of these problems, either to make it better, or at least stop it from becoming worse. The solution would have helped more years ago, but I cannot turn back the clock. Let's at least stop making the problem worse.

Solution in a nutshell

Solution: anyone creating a new account will not have the option/burden of selecting a name; it will be assigned. After some period of time, they can choose a name in the usual way.

Details and discussion

We can work out the details, but if the solution were implemented April 1 of this year, I would suggest that the first account is assigned EditorEn130401.00001. The next one gets EditorEn130401.00002. The first one on the next day gets EditorEn130402.00001. I trust the pattern is obvious. This works best if done for all languages, hence the "En". (This name is on the long side. As an alternative, drop "Editor"; the generated name then starts with the two-letter code for the home language wiki, followed by a date, followed by a sequence number.)
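The pattern can be written down in a few lines. This is only a sketch of the naming scheme described above, not a proposal for how MediaWiki would implement it; the function name `assigned_name` and its parameters are made up for illustration, and the five-digit counter is simply what the examples above imply.

```python
from datetime import date


def assigned_name(lang_code: str, signup_day: date, sequence: int,
                  prefix: str = "Editor") -> str:
    """Build a placeholder user name such as EditorEn130401.00001.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    # Home-wiki language code with a leading capital, e.g. "en" -> "En".
    code = lang_code.capitalize()
    # Two-digit year, month and day, then a zero-padded per-day counter.
    return f"{prefix}{code}{signup_day:%y%m%d}.{sequence:05d}"


print(assigned_name("en", date(2013, 4, 1), 1))
print(assigned_name("en", date(2013, 4, 1), 2))
print(assigned_name("en", date(2013, 4, 2), 1))
# Shorter alternative without the "Editor" prefix.
print(assigned_name("en", date(2013, 4, 2), 1, prefix=""))
```

A five-digit counter covers 99,999 sign-ups per wiki per day; a busier day would need a wider field.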

After the editor reaches 1000 edits (or some combination of edit count and time), the editor will be invited to choose a new name of their own choosing, but they should first read the user name policy. They may have been presented with the user name policy on day one, but it didn't sink in. Now they will have a better understanding of the place, and are less likely to pick their employer, or a lame name. They may choose their own name, but they will have a better appreciation of whether this is a good idea. Over 99 out of 100 initial editors will never get to this stage, so will not deplete the inventory of "real" names. At the same time we assign a real name, I think we should do this on a unified basis, to avoid the problems when someone wants to extend to another language, and finds their name is in use. This automatic unification would be a bad idea if we did it under the present way of assigning names, but the number of people meeting the threshold is small enough (100k or so) that it should not be a problem.

I fully anticipate some opposition to this idea. I think much of it will be of the inertia/IDONTLIKEIT variety, but if there are legitimate issues, I want to address them.

I'll start by explaining what I see as advantages beyond the obvious.

The major disadvantage (other than the effort required to implement) is that some new editors may feel they are second-class citizens if their user name is numerical. I appreciate that concern, but I think on balance it is a benefit. We treat our brand-new editors badly, and sometimes this is simply because we do not recognize that they are new. With this naming scheme, it will be obvious who is new, and maybe someone will even design specialized messages that might be triggered based upon the editor name.

One advantage is that there will be no excuse for not knowing that an editor is new. If the editor name is EditorEn130402.01234, you know that they are a newish editor, and you may need to be careful about how you explain things. As a second benefit, you know exactly how new they are. If you see that editor on 3 April, you know they are in their second day, but if you see them on 3 May, you know they've been around for a month, are still new, but may be expected to know a little more.
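Because the sign-up date is embedded in the name, a semi-automated tool could read it directly before dropping a template. The sketch below assumes names follow the pattern suggested earlier (an optional "Editor" prefix, a two-letter wiki code, a six-digit date and a five-digit sequence); `NAME_PATTERN` and `account_age_days` are hypothetical names, not part of any existing tool.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed format: optional "Editor", two-letter code, YYMMDD, ".", 5 digits.
NAME_PATTERN = re.compile(r"^(?:Editor)?([A-Z][a-z])(\d{6})\.(\d{5})$")


def account_age_days(user_name: str, today: date) -> Optional[int]:
    """Return days since sign-up for an assigned name, or None otherwise."""
    match = NAME_PATTERN.match(user_name)
    if match is None:
        return None  # A chosen name, so presumably an established editor.
    try:
        signup_day = datetime.strptime(match.group(2), "%y%m%d").date()
    except ValueError:
        return None  # Six digits that do not form a real calendar date.
    return (today - signup_day).days


# Seen on 3 April: one day after sign-up, i.e. in their second day.
print(account_age_days("EditorEn130402.01234", date(2013, 4, 3)))
# Seen on 3 May: about a month in.
print(account_age_days("EditorEn130402.01234", date(2013, 5, 3)))
# A chosen name carries no date.
print(account_age_days("Jane Doe", date(2013, 5, 3)))
```

A tool could use the returned value to pick a gentler message for anyone still in their first weeks.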

Another advantage is that we don't exhaust the finite list of interesting names as quickly, and we won't have as many problems assigning names on a unified basis. 99 of 100 people signing up as an editor will give up for a variety of reasons before getting to 1000 edits, so will never use up a "good name".

This will eliminate the problem of COI names, although of course it won't eliminate, and may possibly exacerbate, COI editing, as they won't be as easily caught. However, while it is obvious that there is a COI when user XYZ Corp edits the article of the same name, I don't think anyone is having difficulty identifying COI editing. There might be the rare occasion where an editor is assigned an initial user name, and after 1000 edits is too clueless to understand that they shouldn't choose a name like XYZ Corp, but these should be rare. It happens often now because it seems like a good idea at the time. I submit that after a few months' experience, and then reading the policy before choosing a real name, they will understand in most cases.

I also think this will help with sockpuppet issues, but that is not an area where I have much experience, so I'll defer to others whether there is any advantage. My understanding is that some puppet masters create a stable of names, get a handful of edits so they become auto-confirmed, then can use them if needed. Under this approach, unless they are willing to do 1000 edits, the name will jump out as not an established name, and that might help with identification.

Numbers

Exact numbers are not needed to see the magnitude of the problem or the magnitude of how much better we would be if this had been implemented at the start.

But here are some numbers, some rougher than others.

English Wikipedia Accounts 47,203,983
Number of accounts with over 1000 edits: ~25,000^[1]
Number of accounts with over 316 edits: ~56,000^[1]

This means we have assigned over 18 million names, most of which are not eligible to be used by anyone else, or we have to jump through some hoops to use the name. While my proposal would have more benefit if initiated at the beginning, I am not recommending that existing low edit count users be renamed. They should be grandfathered.

Under the proposal, only 25,000 names would be "used up". If the threshold of 1000 were dropped to 300, the number climbs to only 56,000, leaving over 18.5 million names available. This would virtually eliminate the need for USURPS, would cut back the requests for user rename, and would virtually eliminate the rename requests that need to be rejected because a name is already in use by someone.

Even if one notes that most of the 18 million do not have a single edit, and the only ones worth worrying about are the 4.3 million with at least one edit,^[1] the same conclusions hold.
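For anyone who wants to check the "99 of 100" claim against the conservative base of accounts with at least one edit, the arithmetic is short. The figures are the rough ones quoted above, so the results are approximations, and the variable names are only illustrative.

```python
# Rough figures quoted above (approximations, not exact counts).
accounts_with_an_edit = 4_300_000
over_1000_edits = 25_000
over_316_edits = 56_000

# Share of editing accounts that ever reach each threshold.
share_1000 = over_1000_edits / accounts_with_an_edit
share_316 = over_316_edits / accounts_with_an_edit

print(f"Reach 1000 edits: {share_1000:.2%}")
print(f"Reach 316 edits:  {share_316:.2%}")
print(f"Never reach 1000 edits: {1 - share_1000:.2%}")
```

On these figures about 0.58% of accounts with at least one edit pass 1000 edits, and about 1.30% pass 316, so well over 99 in 100 never reach the higher threshold even on this smaller base.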

Evidence of a problem

On 11 October 2017, I picked up an OTRS inquiry from someone who was quite unhappy that they were being accused of vandalism. This turns out to be the dynamic IP range blocking problem which also ought to be addressed, but I mention it to note that the interaction doesn't start off with a happy reader, it starts off with someone who correctly feels they have been unjustly accused.

I explained that I would be happy to create a username for them to solve the problem of getting misdirected vandalism accusations and blocks. They proposed a name, I tried creating it and found out that it was in use. Well, not exactly in use but it had been registered in 2009. Not a single edit ever made but the username is not available.

I wrote back and apologized that their desired username was not available. They gave me a second option. Almost exactly the same problem. In this case their desired username was registered in 2015 and made one edit then abandoned. I'm now in the awkward position of asking a person to come up with a third option for a username. Had my proposal gone into place both of those usernames would be available.

After returning home from a meeting, I saw an email from the individual suggesting a third option.

Also taken. By someone with two edits, both deleted. Yet another example of a name that would still be available if we had enacted a sensible naming process.
