This is a WikiProject, an area for focused collaboration among Wikipedians. New participants are welcome; please feel free to participate!
This WikiProject aims primarily to provide a consistent treatment of each human language on Wikipedia. Many languages already have extensive pages, and the systematic information on those pages is not presented in a consistent way. The purpose of this WikiProject is to present that information consistently, and to ensure that each of the major areas is covered at least briefly for each language.
These are only suggestions, things to give you focus and to get you going, and you shouldn't feel obligated in the least to follow them. However, try to stick to the format for the Infobox for each language. See the template for an example Infobox.
The easiest way to get started writing for a language that doesn't already have an article or to convert an article to the WikiProject format is to start with the template.
Articles for deletion
Categories for discussion
Templates for discussion
Articles to be merged
Articles to be split
Articles for creation
Featured articles marked in bold have appeared on the Main Page.
Languages and language families
Formerly recognized content
Add the ((WikiProject Languages)) project banner template to the talk pages of any language-related articles. To rate the article on the quality scale, add one of the following parameters:
class=FA for featured articles
class=A for A-class articles
class=GA for good articles
class=B for B-class articles
class=start for Start-class articles
class=stub for Stub-class articles (which may not necessarily have a "stub" message on them!)
class=NA for non-articles (templates, images, etc.)
See WP:GRADES for pointers on classification.
Language articles by quality and importance
WikiWork factors: ω = 72,191, Ω = 5.46
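The WikiWork factors can be illustrated with a small calculation. The sketch below assumes the usual weighting, in which each article counts the number of quality classes it still has to climb to reach FA, with ω as the total and Ω as the average per article. The weights, the function name `wikiwork`, and the example counts are assumptions made for illustration; none of them are taken from this page.

```python
# Assumed WikiWork weights: the number of quality classes an article
# must climb to reach FA (an assumption; not stated on this page).
WEIGHTS = {"FA": 0, "A": 1, "GA": 2, "B": 3, "C": 4, "Start": 5, "Stub": 6}


def wikiwork(counts):
    """Return (omega, relative omega) for a mapping of class -> article count."""
    total_articles = sum(counts.get(cls, 0) for cls in WEIGHTS)
    omega = sum(WEIGHTS[cls] * counts.get(cls, 0) for cls in WEIGHTS)
    relative = omega / total_articles if total_articles else 0.0
    return omega, relative


# Hypothetical counts, for illustration only (not the project's real figures).
example_counts = {"FA": 2, "GA": 3, "B": 10, "C": 20, "Start": 40, "Stub": 25}
omega, relative = wikiwork(example_counts)
print(omega, round(relative, 2))
```

Under the same assumption, dividing the ω above by Ω suggests roughly 13,200 assessed articles, and a value of Ω between 5 and 6 would mean that the typical article sits between Start and Stub class.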
Further information: Category:Wikipedia lists of language names from common sources
The guidelines for article titles for languages are at Wikipedia:Naming conventions (languages). In short, most language articles should be titled XXX language. Reasons for this recommendation:
When there is nothing to disambiguate a language name from, such as Hindi, Esperanto or Inuktitut, there is no need for the "language".
Whether the varieties of Arabic and Chinese should be called "languages" or "dialects" continues to be a highly controversial issue. The current convention is: use NAME + Arabic for Arabic varieties (e.g. Egyptian Arabic) and NAME + Chinese for Chinese varieties (e.g. Mandarin Chinese). Infoboxes are put at both Arabic language and Chinese language and at their first-level subdivisions. However, where there is little controversy that a variety of Arabic or Chinese is a dialect (when it is demonstrably intelligible to other dialects), then 'dialect' is acceptable in the title.
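The titling conventions above can be sketched as a small helper. This is only an illustration of the rules as summarized here, not a tool the project uses: the function name `suggest_title`, the `macro` parameter, and the short `UNAMBIGUOUS` set are hypothetical, and the sketch does not attempt the case-by-case judgement of when 'dialect' is acceptable in a title.

```python
# Names this page gives as needing no disambiguation; a real list would be longer.
UNAMBIGUOUS = {"Hindi", "Esperanto", "Inuktitut"}


def suggest_title(name, macro=None, unambiguous=UNAMBIGUOUS):
    """Suggest an article title for a language or variety.

    `macro` is "Arabic" or "Chinese" for varieties of those languages
    (a hypothetical parameter introduced for this sketch).
    """
    if macro in ("Arabic", "Chinese"):
        # Convention: NAME + Arabic / NAME + Chinese for varieties.
        return f"{name} {macro}"
    if name in unambiguous:
        # Nothing to disambiguate from, so no "language" suffix.
        return name
    # Default recommendation: "XXX language".
    return f"{name} language"


print(suggest_title("Egyptian", macro="Arabic"))
print(suggest_title("Hindi"))
print(suggest_title("Moriori"))
```

The check for Arabic and Chinese varieties comes first so that a variety name is never given the generic "language" suffix by mistake.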
Even in cases in which there is a consensus that varieties of a language have a dialect status, the number and divisions between such dialects are often vaguely defined, and controversies exist among dialectologists over whether certain varieties should be treated in a unified way or are best understood as separate though related varieties. Separate articles should only be written on varieties (e.g., Estuary English) or related groups of varieties (e.g., Hispanic English) that have been well-enough studied by linguists that at least a minimal body of literature exists about that variety or group of varieties, as a distinct dialect or group of dialects. Phonological, morphosyntactic, or lexical variation that may be considered subdialectal should be noted as "differences within X dialect", where X is a dialect as discussed in the relevant literature. Controversies over dialect status can be noted in articles as such, but should also be based on citable work. Names used to refer to that dialect in the title should be preferred over folk-linguistic terms (e.g., Inland North versus Midwestern Accent).
If you would like to create an article on a new language, you can use ((subst:New language article)) to help streamline the process. An example structure and explanation of the sections can be found at /Template for oral languages and /Template (sign language) for sign languages. Language articles are subject to Wikipedia's inclusion criteria.
Population data has been mostly updated from Ethnologue 16 to 17. However, an unknown number of articles which did not have the ref field set to "e16" slipped through the cracks; an example is Cumanagoto, which did not have a ref'd population figure because E16 had mistakenly listed it as extinct. Articles which are not ref'd to Ethnologue could be checked in case E17 has a more recent figure.
User:PotatoBot helps keep ISO redirects in sync with changing WP articles and ISO standards. The results of the latest run are displayed at ISO 639 log and ISO 639 language articles missing.
Names at Spurious_languages#Spurious_according_to_Glottolog with asterisks have not been addressed.
Red links should either be redirected or have their own articles.
Articles with red links
99.9% of ISO language names have articles, though not always one-to-one (e.g. Fulani, Zhuang, and Mazatec); the remainder are spurious, dubious, or insufficiently attested to justify their own article, and are redirected to an article stating that.
The lists below are of self-links in our articles, language names from various sources which do not have articles or redirects, and suspicious cases to keep track of.
Lists of obscure names from common refs
Circular and suspicious links
Cases to track
Images for articles in Category:Wikipedia requested photographs of languages.
(no article Ashéninka people; Keres functions as the lang article but reads as a family article)
Only language varieties are included here. Subjects such as 'French language in Jordan' and 'Westernized Chinese language', though in bad shape, are not listed because they would not be representative of the many unreferenced articles that are not about specific varieties.
(same search terms as missing sources)
The following ISO requests for new languages from previous years were still open in January 2016. The articles should be updated if they are accepted. (See the current list, reviewed to 2021-02.)
Old open ISO change requests
2020-039 tki Iraqi Turkman language
2020-009 nww Ndwewe language
2019-007 rrm Moriori language
2011-041 vsn Vedic Sanskrit
2009-081 elr Katharevousa Greek
2009-060 ecg Ecclesiastical Greek
2006-084 gkm Medieval Greek
including WP:AFD, WP:PROD and other processes
The following are language articles which come under repeated POV attack, often for ethnic or nationalistic reasons. Feel free to add ones you've noticed, and to remove languages which have not been a problem for some time. That way, if one of us drops out from editing, the articles we've been watching hopefully won't go to pot.
Ethnologue has long been the default source for language data on WP, despite its often poor referencing. It was the only global reference that was freely available online when the majority of WP language articles were created, but it has since become a very expensive pay site (US$2,400/year with maps as of 2021). For those editors who do not have access to Ethnologue (and perhaps also for those who do), a combination of Glottolog, for classification and for general sourcing, and the Endangered Languages Project, for demographic data, is probably the most reliable default combination of free online sources, though there are also reliable specialized sites such as AIATSIS for Australia. Linguist List/MultiTree maintains some value for long-extinct languages.
[Note, 2021-01-03: Links to ELP should be added to all relevant infoboxes this year. Upon request, ELP has sent us an ELP-to-ISO/LL code mapping, so we're just waiting for approval for a bot to add the links.]
There are several advantages to Ethnologue: for many languages, it's the only demographic data we have; for others, it provides a check on the politicization and population inflation that we experience when we allow advocates of a language to cherry-pick sources. Nonetheless, Ethnologue data needs to be carefully evaluated. Besides the now prohibitive cost, there are a few common and serious problems:
Such problems are understandable: Ethnologue is an enormous project with a very small editorial team. For years, Ethnologue had a reputation for being unresponsive, so many linguists do not bother to correct the errors they find, but since ca. 2012 they have been appreciative of feedback, and the quality of their coverage has improved markedly. Nonetheless, Ethnologue's claims should be checked against its sources (when they can be identified) whenever possible, and other sources should be used when they are available and Ethnologue's sources cannot be identified.
Glottolog is a reliably cited and well-researched alternative to Ethnologue. Apart from not covering demographics, it does a generally superior job, for instance in verifying and updating the classifications it adopts, in marking languages as 'spurious' when they cannot be verified to exist, and most importantly in citing its sources both for the languages and for their classifications. But it is largely the work of a single person (Harald Hammarström), and he has not had the time to improve on Ethnologue for all the languages of the world, so in some cases Glottolog is not (yet) an independent source. In most cases Hammarström has personally vetted the sources, even to the extent of doing his own comparison of the raw lexical or morphological data to evaluate which classification is the most accurate, though it may take some digging for the reader to determine all his evidence. He does not, however, distinguish whether a language with no known relatives is an isolate (a family of one) or simply unclassified due to lack of data or research, listing all such cases as 'isolates'. Maps are included, but the locations are points rather than areas as in Ethnologue (not that the areas in Ethnologue are necessarily accurate), and in some cases appear to be offset from where the language is actually spoken (all points on the map shifted by seemingly the same amount and direction, a problem that besets our automated location maps as well). Finally, Glottolog should not be relied on for dialects, as they were copied wholesale from MultiTree without verification and are often spurious. Only in a few cases has Glottolog since evaluated dialect data. (Dialects are typeset in italics, languages in boldface. Where the Glottolog dialects differ from those of MultiTree, they are likely to be Hammarström's or a colleague's work and thus reliable.)
The Endangered Languages Project does not attempt to include all the world's languages (it ignores languages with millions of speakers, for example, and as of 2021 doesn't cover some poorly documented areas of the world), but as of 2021 it has articles on 3585 languages/lects, 285 without ISO codes. ELP concentrates on demographic data, and tries to provide the most recent reliable sources for speaker population, transmission rate, bilingualism, etc., and so nicely complements Glottolog. In some cases it provides the date of the data, not just the date of its publication. Non-demographic data is minimal, and (like Glottolog) the maps show the languages as points rather than areas (e.g. Comorian, where the location dot is in the middle of the ocean between the islands). There are indications that some of the data has been input by people who don't understand it, such as locations of African languages being on the wrong side of the continent. It should therefore be used for its references rather than as a reliable source in its own right.
AIATSIS presents data from multiple sources for the indigenous languages spoken within the national borders of Australia. Its primary focus is on identifying the many names found in the literature, resolving synonyms and ambiguities, evaluating whether putative lects can be confirmed to be distinct languages or dialects, and identifying which names might benefit from further investigation of archival sources.
Linguist List / MultiTree is a former undergrad student project that includes a large number of language names not found in Ethnologue, but their identification is highly unreliable, and can often be seen to be spurious with even a cursory glance at the literature. Since the creation of Glottolog they are no longer of much value as a source of references for living languages, though they do provide some informative expert summaries of the literature for long-extinct languages.
ISO 639-3 is only a reliable source for ISO codes and names. It should not be relied on for preferred names or spellings, whether a lect is a distinct language or a dialect, or whether it is still spoken. For example, despite its stated ideal of distinguishing languages by mutual intelligibility, for political reasons ISO maintains separate 639-3 codes for Serbian and Croatian, Urdu and Modern Standard Hindi, and Malaysian and Indonesian, despite private acknowledgements that doing so violates their stated aim. (Such pluricentric distinctions would be better maintained at ISO 639-2.) However, because ISO 639-3 codes are widely used to identify languages, WP language articles should include the ISO 639-3 name in the lead or a dedicated section if they use something different for the article name, and we should create redirects for those ISO names and for the codes themselves.
Global Recordings Network copies much of its data from Ethnologue, misidentifies alternative names as languages, and contradicts itself with speaker numbers.
((interlinear)) for aligning interlinear glossing
((IPA)) to format IPA
((IPAc-en)) to convert ASCII input to normalized IPA for English, etc. (we have IPA templates for many other languages, which link to a pronunciation key)
((Infobox country languages))
((Infobox language family))
((Infobox language game))
Add ((WikiProject Languages)) to talk pages of relevant articles. Articles with this template are put into Category:WikiProject Languages articles.
Language stubs should be tagged with the most appropriate template of these:
After you sign up, you can add the project userbox to your user page by adding the following:
((User WikiProject Languages)). Your username will then automatically be added to the Category:WikiProject Language members.
This WikiProject is a descendant of WikiProject Linguistics. It has descendants of its own, most of which aren't particularly active at present.
If you'd like to help out, be contacted by others interested in this WikiProject's subject, and receive task assignments and project-related updates on your talk page, please add your name here: