Problem statement

We allow new editors to choose just about any unused name (with some restrictions) they wish.

This policy has led to many problems, some of which will get worse, some of which are already serious. I have a possible solution, which won't undo the problems already caused, but can stem the growth in problems.


Luckily, I have a solution, and it will address every one of these problems, either by making it better or at least by stopping it from becoming worse. The solution would have helped more years ago, but I cannot turn back the clock. Let's at least stop making the problem worse.

Solution in a nutshell

Anyone creating a new account will not have the option (or burden) of selecting a name; one will be assigned. After some period of time, they can choose a name in the usual way.

Details and discussion

We can work out the details, but if the solution were implemented on April 1 of this year, I would suggest that the first account be assigned EditorEn130401.00001. The next one gets EditorEn130401.00002. The first one on the next day gets EditorEn130402.00001. I trust the pattern is obvious. This works best if done for all languages, hence the "En". (This name is on the long side; as an alternative, drop "Editor", so that the generated name starts with the two-letter code for the home language wiki, followed by a date, followed by a sequence number.)
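The pattern described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not a proposal for how the software should implement it; the function name assigned_name and its parameters are made up for this example, and the five-digit limit on the daily sequence is an assumption read off the sample names.

```python
from datetime import date


def assigned_name(lang: str, signup: date, seq: int, prefix: str = "Editor") -> str:
    """Build a provisional name such as EditorEn130401.00001.

    lang   -- two-letter code of the home language wiki, e.g. "en"
    signup -- the day the account is created
    seq    -- position of this account among that day's sign-ups (1-based)
    prefix -- "Editor" for the long form, "" for the shorter alternative
    """
    # Assumption: the sequence is always printed as five digits.
    if not 1 <= seq <= 99999:
        raise ValueError("sequence number must fit in five digits")
    # yymmdd date stamp, then a dot, then the zero-padded sequence number.
    return f"{prefix}{lang.capitalize()}{signup:%y%m%d}.{seq:05d}"


print(assigned_name("en", date(2013, 4, 1), 1))
print(assigned_name("en", date(2013, 4, 1), 2))
print(assigned_name("en", date(2013, 4, 2), 1))
print(assigned_name("en", date(2013, 4, 1), 1, prefix=""))
```

The last call shows the shorter alternative with "Editor" dropped.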

After the editor reaches 1000 edits (or some combination of edit count and time), the editor will be invited to choose a new name of their own choosing, but they should first read the user name policy. They may have been presented with the user name policy on day one, but it didn't sink in. Now they will have a better understanding of the place, and are less likely to pick their employer, or a lame name. They may choose their own name, but they will have a better appreciation of whether this is a good idea. Over 99 out of 100 initial editors will never get to this stage, so will not deplete the inventory of "real" names. At the same time we assign a real name, I think we should do this on a unified basis, to avoid the problems that arise when someone wants to extend to another language and finds their name is in use. This automatic unification would be a bad idea if we did it under the present way of assigning names, but the number of people meeting the threshold is small enough (100k or so) that it should not be a problem.

I fully anticipate some opposition to this idea. I think much of it will be of the inertia/IDONTLIKEIT variety, but if there are legitimate issues, I want to address them.

I'll start by explaining what I see as advantages beyond the obvious.

The major disadvantage (other than the effort required to implement) is that some new editors may feel they are second class citizens if their user name is numerical. I appreciate that concern, but I think on balance it is a benefit. We treat our brand-new editors badly, and sometimes this is simply because we do not recognize that they are new. With this naming scheme, it will be obvious who is new, and maybe someone will even design specialized messages that might be triggered based upon the editor name.

One advantage is that there will be no excuse for not knowing that an editor is new. If the editor name is EditorEn130402.01234, you know that they are a newish editor, and you may need to be careful about how you explain things. As a second benefit, you know exactly how new they are. If you see that editor on 3 April, you know they are in their second day, but if you see them on 3 May, you know they've been around for a month, are still new, but may be expected to know a little more.
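As a rough sketch of how a tool (or a specialized welcome message) might read the age of an account straight from the name, the following Python parses the date stamp and counts the days since sign-up. The function account_age_days and the pattern it uses are hypothetical; the pattern assumes the two-letter language code and five-digit sequence used in the sample names, with the "Editor" prefix optional.

```python
import re
from datetime import date, datetime
from typing import Optional

# Optional "Editor" prefix, two-letter wiki code, yymmdd date, dot, sequence.
ASSIGNED_NAME = re.compile(r"^(?:Editor)?([A-Z][a-z])(\d{6})\.(\d{5})$")


def account_age_days(name: str, today: date) -> Optional[int]:
    """Return days since sign-up, or None if the name is not an assigned one."""
    match = ASSIGNED_NAME.match(name)
    if match is None:
        return None  # a chosen name, so presumably an established editor
    try:
        signup = datetime.strptime(match.group(2), "%y%m%d").date()
    except ValueError:
        return None  # six digits that do not form a real date
    return (today - signup).days


print(account_age_days("EditorEn130402.01234", date(2013, 4, 3)))  # 1
print(account_age_days("EditorEn130402.01234", date(2013, 5, 3)))  # 31
print(account_age_days("XYZ Corp", date(2013, 5, 3)))              # None
```

An age of 1 corresponds to an editor in their second day; an age of 31 to one who has been around for a month.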

Another advantage is that we don't exhaust the finite list of interesting names as quickly, and we won't have as many problems assigning names on a unified basis. 99 of 100 people signing up as an editor will give up for a variety of reasons before getting to 1000 edits, so will never use up a "good name".

This will eliminate the problem of COI names, although of course it won't eliminate, and may possibly exacerbate, COI editing, as such editors won't be as easily caught. However, while it is obvious that there is a COI when user XYZ Corp edits the article of the same name, I don't think anyone is having difficulty identifying COI editing. There might be the rare occasion where an editor is assigned an initial user name, and after 1000 edits is too clueless to understand that they shouldn't choose a name like XYZ Corp, but these should be rare. It happens often now because it seems like a good idea at the time. I submit that after a few months' experience, and then reading the policy before choosing a real name, they will understand in most cases.

I also think this will help with sockpuppet issues, but that is not an area where I have much experience, so I'll defer to others on whether there is any advantage. My understanding is that some puppet masters create a stable of names, make a handful of edits so the accounts become auto-confirmed, and then use them if needed. Under this approach, unless they are willing to do 1000 edits, the name will jump out as not an established name, and that might help with identification.

Numbers

Exact numbers are not needed to see the magnitude of the problem or the magnitude of how much better we would be if this had been implemented at the start.

But here are some numbers, some rougher than others.

This means we have assigned over 18 million names, most of which are not eligible to be used by anyone else, or we have to jump through some hoops to use the name. While my proposal would have more benefit if initiated at the beginning, I am not recommending that existing low edit count users be renamed. They should be grandfathered.

Under the proposal, only 25,000 names would be "used up". If the threshold of 1000 were dropped to 300, the number climbs to only 56,000, leaving over 18.5 million names available. This would virtually eliminate the need for USURPS, would cut back the requests for user rename, and would virtually eliminate the rename requests that need to be rejected because a name is already in use by someone.

Even if one notes that most of the 18 million do not have a single edit, and the only ones worth worrying about are the 4.3 million with at least one edit,[1] the same conclusions hold.

Evidence of a problem

On 11 October 2017, I picked up an OTRS inquiry from someone who was quite unhappy that they were being accused of vandalism. This turns out to be the dynamic IP range blocking problem which also ought to be addressed, but I mention it to note that the interaction doesn't start off with a happy reader, it starts off with someone who correctly feels they have been unjustly accused.

I explained that I would be happy to create a username for them to solve the problem of getting misdirected vandalism accusations and blocks. They proposed a name, I tried creating it and found out that it was in use. Well, not exactly in use but it had been registered in 2009. Not a single edit ever made but the username is not available.

I wrote back and apologized that their desired username was not available. They gave me a second option. Almost exactly the same problem. In this case their desired username was registered in 2015, made one edit, and was then abandoned. I'm now in the awkward position of asking a person to come up with a third option for a username. Had my proposal gone into place, both of those usernames would be available.

After returning home from a meeting, I saw an email from the individual suggesting a third option.

Also taken. By someone with two edits, both deleted. Yet another example of a name that would still be available if we had enacted a sensible naming process.

References

  1. "Wikipedia Statistics - Tables - English". Stats.wikimedia.org. Retrieved 2013-03-11.