Note

The datasets below are old (2006-2007), tiny, and not useful except as a historical reference.

Random article survey

I was bored waiting for my very slow program to run, so I clicked "random article" 250 times and kept track of what kinds of articles popped up. 48 articles (19.2%) were stubs or had at least one cleanup tag. (I tried to count "citation needed" as a cleanup tag but may have missed a few.) The results as of 11 Nov 2006:

| Type of article | Number | Percent of sample |
| --- | --- | --- |
| Biography | 60 | 24% |
| Places/geographical locations | 34 | 13.6% |
| TV shows/movies | 17 | 6.8% |
| Disambiguation | 15 | 6% |
| Music/bands/albums | 14 | 5.6% |
| Company/product/service | 13 | 5.2% |
| History/war | 12 | 4.8% |
| Politics/government | 9 | 3.6% |
| Sports | 8 | 3.2% |
| Organisms | 8 | 3.2% |
| Definitions/common phrases/common objects | 7 | 2.8% |
| Architecture/buildings | 7 | 2.8% |
| Mythology/religion | 5 | 2% |
| Astronomy/physics/space science | 5 | 2% |
| Software/computing | 5 | 2% |
| Games (including video) | 4 | 1.6% |
| Literature/publications | 4 | 1.6% |
| Biology/medicine | 3 | 1.2% |
| Food/drink | 3 | 1.2% |
| Schools | 3 | 1.2% |
| Math | 2 | 0.8% |
| Nonsense/unclassifiable | 2 | 0.8% |
| Visual arts | 2 | 0.8% |
| Philosophy/ethics | 2 | 0.8% |
| Linguistics/languages | 2 | 0.8% |
| Charities/nonprofit organizations | 2 | 0.8% |
| Economics/finance | 1 | 0.4% |
| Deleted and protected | 1 | 0.4% |

"Biography" is probably somewhat inflated because I classified every article about an individual real person as a biography, including historical figures. Articles about fictional characters went in the category of the corresponding fiction (TV, myth, etc.).

Obviously this is a lousy way to determine Wikipedia coverage: 250 articles is a tiny sample. But the advantage over, say, counting category populations is that it avoids double-counting articles that sit in multiple categories and can find articles that are uncategorized or miscategorized. Special:Random also (as far as I know) excludes recently created articles that haven't yet been indexed, which filters out lots of nonsense speedy-deletion candidates. I don't think Special:Random excludes deletion candidates, but none of these had prod or AfD templates.
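To put a rough number on "tiny", here is a minimal sketch of the 95% Wilson score interval for a few of the shares in the table. It assumes the 250 clicks behave like a simple random sample of articles, which Special:Random may only approximate. The function name `wilson_interval` and the choice of the Wilson method are illustrative and not part of the original survey.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


SAMPLE_SIZE = 250  # number of random articles in the survey

# Counts copied from the table above; only three rows shown for brevity.
counts = {
    "Biography": 60,
    "Places/geographical locations": 34,
    "Math": 2,
}

for label, k in counts.items():
    low, high = wilson_interval(k, SAMPLE_SIZE)
    print(f"{label}: {k / SAMPLE_SIZE:.1%} (95% interval {low:.1%} to {high:.1%})")
```

Under these assumptions the 24% biography share is compatible with anything from roughly 19% to 30%, and the two-article categories are barely distinguishable from zero, so only the largest categories say much about overall coverage.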

First-glance observations:

Recent mainspace changes survey

Inspired by Wikipedia:Wikipedia is failing and User:Worldtraveller/Wikipedia is failing (NB: leaving the redlink, in case further moves occur), I looked at a sample of 250 mainspace edits made between 04:43 and 04:46 UTC on 18 Feb 2007. (It would be interesting to gather these statistics again at a time when US schools are in session.) In this sample there were 159 edits by registered users, 89 edits by anonymous users, and 2 edits to a subsequently deleted image description page. The percentages below exclude those 2 edits and take the remaining 248 as the total sample.

| Change type | Percent of total sample (n = 248) | Percent by registered editors (n = 248) | Percent by anonymous editors (n = 248) | Percent of all registered edits (n = 159) | Percent of all anonymous edits (n = 89) |
| --- | --- | --- | --- | --- | --- |
| Substantial content changes | 5.2% | 4.0% | 1.2% | 6.3% | 3.4% |
| Minor content changes | 28.6% | 17.3% | 11.3% | 27.0% | 31.5% |
| Copyediting/formatting/wikilinking | 40.7% | 27.4% | 13.3% | 42.8% | 37.1% |
| Tagging/maintenance | 8.5% | 6.5% | 2.0% | 10.1% | 5.6% |
| Vandalism reversion | 8.9% | 7.3% | 1.6% | 11.3% | 4.5% |
| Vandalism | 8.1% | 1.6% | 6.5% | 2.5% | 18.0% |

Other than determining whether an edit was vandalism, I did not make any value judgments. Thus, 'minor content changes' includes a considerable amount of unsourced material and original research that will certainly be reverted.
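The five percentage columns in the table are linked by simple arithmetic: a share of all 248 edits equals the within-group share scaled by that group's size (159 or 89) over 248, and the registered and anonymous shares should add up to the overall figure. The following is a minimal sketch that checks this using only the numbers printed in the table. The variable names and the 0.1 percentage-point rounding tolerance are illustrative choices, not part of the original survey.

```python
TOTAL, REGISTERED, ANONYMOUS = 248, 159, 89

# Each tuple, copied from the table:
# (% of total, % of total by registered, % of total by anonymous,
#  % of registered edits, % of anonymous edits)
rows = {
    "Substantial content changes": (5.2, 4.0, 1.2, 6.3, 3.4),
    "Minor content changes": (28.6, 17.3, 11.3, 27.0, 31.5),
    "Copyediting/formatting/wikilinking": (40.7, 27.4, 13.3, 42.8, 37.1),
    "Tagging/maintenance": (8.5, 6.5, 2.0, 10.1, 5.6),
    "Vandalism reversion": (8.9, 7.3, 1.6, 11.3, 4.5),
    "Vandalism": (8.1, 1.6, 6.5, 2.5, 18.0),
}

TOLERANCE = 0.1  # percentage points, to absorb rounding to one decimal place


def check_row(total, reg, anon, within_reg, within_anon):
    """Return True if the five figures in a row agree with each other."""
    return (
        abs(reg + anon - total) <= TOLERANCE
        and abs(within_reg * REGISTERED / TOTAL - reg) <= TOLERANCE
        and abs(within_anon * ANONYMOUS / TOTAL - anon) <= TOLERANCE
    )


for name, values in rows.items():
    status = "consistent" if check_row(*values) else "check by hand"
    print(f"{name}: {status}")
```

The same scaling explains why vandalism is 18.0% of anonymous edits but only 6.5% of the whole sample: anonymous edits make up 89 of the 248 edits counted.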

Other observations:

General thoughts: