
In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts and may be completely unrelated to them.

Collocations are commonly grouped into about seven main types: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.

Collocation extraction is a computational technique that finds collocations in a document or corpus, using methods from computational linguistics that resemble those of data mining.
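A typical first step in extraction is to count which word pairs co-occur at all, before any statistical test is applied. The Python sketch below is illustrative only: the helper names `tokenize` and `candidate_pairs`, the lowercase letters-only tokenization, the look-ahead `window`, and the `min_count` cut-off are all assumptions, not the procedure of any particular extraction tool. For simplicity it also ignores sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase words made of letters and apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def candidate_pairs(tokens, window=2, min_count=2):
    """Count ordered pairs (w1, w2) where w2 occurs within `window` tokens
    after w1, keeping only pairs seen at least `min_count` times."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # Slicing stops safely at the end of the token list.
        for w2 in tokens[i + 1 : i + 1 + window]:
            counts[(w1, w2)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    toks = tokenize("The water is crystal clear. The lake is crystal clear and cold.")
    for pair, n in sorted(candidate_pairs(toks, window=1).items()):
        print(pair, n)
```

Raw counts like these favour frequent function-word pairs, which is why extraction systems go on to score candidates with a measure of association.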

Expanded definition

Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as crystal clear, middle management, nuclear family, and cosmetic surgery are examples of collocated pairs of words.

Collocations can be in a syntactic relation (such as verb–object: make and decision), a lexical relation (such as antonymy), or no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation an interesting area for language teaching.

Corpus linguists specify a key word in context (KWIC) and identify the words immediately surrounding it. This shows how the key word is typically used.
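A KWIC concordance can be sketched in a few lines of Python. This is a minimal illustration, assuming pre-tokenized input and a fixed context `width` measured in tokens; the function name `kwic` and its parameters are hypothetical and not taken from any concordancing program.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence
    of `keyword`, with up to `width` tokens of context on each side."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width) : i])
            right = " ".join(tokens[i + 1 : i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    tokens = "we made a decision and then made a mistake".split()
    for left, node, right in kwic(tokens, "made", width=2):
        # Right-align the left context so the key words line up in one column.
        print(f"{left:>12} [{node}] {right}")
```

Aligning every occurrence of the key word in a single column makes recurring neighbours, such as the object of made, easy to spot by eye.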

The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether a co-occurrence is due purely to chance or is statistically significant. Because language is not random, most candidate word pairs turn out to be statistically significant, so the association scores are mainly used to rank the results. Commonly used measures of association include mutual information, t-scores, and log-likelihood.^[1]^[2]
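Two of these measures can be computed from simple counts. The sketch below is a hedged illustration, not a reference implementation: it assumes a bigram count `c12`, unigram counts `c1` and `c2`, and a corpus size `n`, and it uses pointwise mutual information (in bits) and Dunning-style log-likelihood (G²) over the usual 2×2 contingency table. The function names and the example counts are hypothetical.

```python
import math


def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits: log2(P(w1 w2) / (P(w1) P(w2)))."""
    return math.log2((c12 * n) / (c1 * c2))


def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio G^2 for the 2x2 contingency table of bigram w1 w2."""
    observed = [
        [c12, c1 - c12],               # w1 followed by w2 / by something else
        [c2 - c12, n - c1 - c2 + c12], # other word then w2 / neither
    ]
    row = [c1, n - c1]
    col = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row[i] * col[j] / n    # expected count under independence
            if o > 0:                  # a zero cell contributes 0 (limit of o*ln(o/e))
                g2 += o * math.log(o / e)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts: (c12, c1, c2, n) for three word pairs.
    pairs = {"pair A": (1, 10, 10, 100), "pair B": (10, 10, 10, 100), "pair C": (5, 20, 30, 100)}
    # Rank pairs by log-likelihood, highest (most strongly associated) first.
    for name, counts in sorted(pairs.items(), key=lambda kv: -log_likelihood(*kv[1])):
        print(f"{name}: PMI={pmi(*counts):.3f} G2={log_likelihood(*counts):.3f}")
```

A pair that occurs exactly as often as independence predicts scores zero on both measures; the scores are then used, as described above, to rank candidates rather than to make yes/no decisions.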

Rather than select a single definition, Gledhill^[3] proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates;^[4]^[5]^[6] construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern,^[7] or as a relation between a base and its collocative partners;^[8] and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form.^[9]^[10] These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in a continuum:

Free combination ↔ bound collocation ↔ frozen idiom

In dictionaries

In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language.^[11] Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred",^[12] more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software, making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English included boxes or panels with lists of frequent collocations.^[13]

There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language.^[14] These include (for Spanish) Redes: Diccionario combinatorio del español contemporáneo (2004), (for French) Le Robert: Dictionnaire des combinaisons de mots (2007), and (for English) the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010).^[15]

Statistically significant collocation

Student's t-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.^[16] For a bigram $w_{1}w_{2}$, let $P(w_{1}) = \frac{\#w_{1}}{N}$ be the unconditional probability of occurrence of $w_{1}$ in a corpus of size $N$, and let $P(w_{2}) = \frac{\#w_{2}}{N}$ be the unconditional probability of occurrence of $w_{2}$ in the corpus. The t-score for the bigram $w_{1}w_{2}$ is calculated as:

$$t = \frac{\bar{x} - \mu}{\sqrt{\frac{s^{2}}{N}}},$$

where $\bar{x} = \frac{\#w_{1}w_{2}}{N}$ is the sample mean of the occurrence of $w_{1}w_{2}$, $\#w_{1}w_{2}$ is the number of occurrences of $w_{1}w_{2}$, $\mu = P(w_{1})P(w_{2})$ is the probability of $w_{1}w_{2}$ under the null hypothesis that $w_{1}$ and $w_{2}$ appear independently in the text, and $s^{2} = \bar{x}(1 - \bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the t-test is equivalent to a Z-test.
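The t-score formula translates directly into code. In this minimal sketch the argument names `c12`, `c1`, `c2` and `n` stand for $\#w_{1}w_{2}$, $\#w_{1}$, $\#w_{2}$ and $N$; those names, the example counts, and the choice of 2.576 (the large-sample critical value for a one-tailed test at the 0.005 level) are assumptions made for illustration.

```python
import math


def t_score(c12, c1, c2, n):
    """t-score of bigram w1 w2 from counts #w1w2 = c12, #w1 = c1, #w2 = c2
    in a corpus of size n."""
    if c12 <= 0:
        # s^2 = x_bar * (1 - x_bar) is 0 for an unseen bigram, so t is undefined.
        raise ValueError("bigram must occur at least once")
    x_bar = c12 / n              # sample mean: relative frequency of w1 w2
    mu = (c1 / n) * (c2 / n)     # mean expected if w1 and w2 are independent
    s2 = x_bar * (1 - x_bar)     # sample variance, approximately x_bar when x_bar is small
    return (x_bar - mu) / math.sqrt(s2 / n)


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    t = t_score(c12=8, c1=15828, c2=4675, n=14307668)
    print(f"t = {t:.4f}")
    # With large N, compare against a standard normal critical value.
    print("significant" if t > 2.576 else "not significant")
```

With these counts the bigram occurs only slightly more often than independence predicts, so $t$ comes out close to 1 and falls well short of the critical value, even though the pair occurs several times.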

References
