Categorization is the ability and activity of recognizing shared features or similarities between elements of one's experience of the world (such as objects, events, or ideas), and of organizing and classifying that experience by associating those elements with a more abstract group (that is, a category, class, or type) on the basis of their traits, features, similarities, or other criteria that are universal to the group. Categorization is considered one of the most fundamental cognitive abilities, and as such it is studied particularly in psychology and cognitive linguistics.
Categorization is sometimes considered synonymous with classification. Categorization and classification allow humans to organize the things, objects, and ideas that exist around them and to simplify their understanding of the world. Categorization is something that humans and other organisms do: "doing the right thing with the right kind of thing." The activity of categorizing things can be nonverbal or verbal. For humans, both concrete objects and abstract ideas are recognized, differentiated, and understood through categorization. Objects are usually categorized for some adaptive or pragmatic purpose.
Categorization is grounded in the features that distinguish the category's members from nonmembers. Categorization is important in learning, prediction, inference, decision making, language, and many forms of organisms' interaction with their environments.
Categories are distinct collections of concrete or abstract instances (category members) that are considered equivalent by the cognitive system. Using category knowledge requires one to access mental representations that define the core features of category members (cognitive psychologists refer to these category-specific mental representations as concepts).
Categorization theorists often describe the categorization of objects using taxonomies with three hierarchical levels of abstraction. For example, a plant could be identified at a high level of abstraction by simply labeling it a flower, at a medium level of abstraction by specifying that the flower is a rose, or at a low level of abstraction by further specifying this particular rose as a dog rose. Categories in a taxonomy are related to one another via class inclusion, with the highest level of abstraction being the most inclusive and the lowest level of abstraction being the least inclusive. From most to least inclusive, the three levels are commonly called the superordinate, basic, and subordinate levels.
Main article: Categories (Aristotle)
The classical theory of categorization is a term used in cognitive linguistics to denote the approach to categorization that appears in Plato and Aristotle and that has been highly influential and dominant in Western culture, particularly in philosophy, linguistics, and psychology. Aristotle's categorical method of analysis was transmitted to the scholastic medieval university through Porphyry's Isagoge. The classical view of categories can be summarized in three assumptions: a category can be described as a list of necessary and sufficient features that its members must have; categories are discrete, in that they have clearly defined boundaries (an element either belongs to a category or does not, with no possibilities in between); and all members of a category have the same status (no member belongs to the category more than any other). In the classical view, categories need to be clearly defined, mutually exclusive, and collectively exhaustive; this way, any entity in the given classification universe belongs unequivocally to one, and only one, of the proposed categories.
The classical view of categories first appeared in the context of Western philosophy in the work of Plato, who, in his Statesman dialogue, introduces the approach of grouping objects based on their similar properties. This approach was further explored and systematized by Aristotle in his Categories treatise, where he analyzes the differences between classes and objects. Aristotle also applied the classical categorization scheme extensively in his approach to the classification of living beings (which uses the technique of applying successive narrowing questions such as "Is it an animal or vegetable?", "How many feet does it have?", "Does it have fur or feathers?", "Can it fly?"...), thereby establishing the basis for natural taxonomy.
Examples of the use of the classical view of categories can be found in the Western philosophical works of Descartes, Blaise Pascal, Spinoza, and John Locke, and in the 20th century in those of Bertrand Russell, G.E. Moore, and the logical positivists. It has been a cornerstone of analytic philosophy and its conceptual analysis, with more recent formulations proposed in the 1990s by Frank Cameron Jackson and Christopher Peacocke. At the beginning of the 20th century, the question of categories was introduced into the empirical social sciences by Durkheim and Mauss, whose pioneering work has been revisited in contemporary scholarship.
The classical model of categorization has been used at least since the 1960s by linguists of the structural semantics paradigm, notably Jerrold Katz and Jerry Fodor in 1963, who in turn influenced its adoption by psychologists such as Allan M. Collins and M. Ross Quillian.
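As an illustration, the all-or-nothing character of classical membership can be sketched in a few lines of Python. The category names and defining features below are hypothetical and chosen only for the example; the sketch assumes that a definition is a set of features that are jointly necessary and sufficient.

```python
# Hypothetical definitions, invented for illustration only.
CATEGORY_DEFINITIONS = {
    "bachelor": {"human", "adult", "male", "unmarried"},
    "triangle": {"closed figure", "three sides"},
}


def is_member(entity_features, category):
    """Classical membership: every defining feature must be present."""
    return CATEGORY_DEFINITIONS[category] <= set(entity_features)


# An entity with all defining features is a member, whatever else is true of it.
print(is_member({"human", "adult", "male", "unmarried", "tall"}, "bachelor"))
# An entity missing even one defining feature is simply not a member.
print(is_member({"human", "adult", "male"}, "bachelor"))
```

Because the test is a plain subset check, extra features are ignored and there are no degrees of membership: an entity either satisfies the whole definition or it does not.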
Modern versions of classical categorization theory study how the brain learns and represents categories by detecting the features that distinguish members from nonmembers.
Main article: Prototype theory
Pioneering research by psychologist Eleanor Rosch and colleagues, beginning in 1973, introduced prototype theory, according to which categorization can also be viewed as the process of grouping things based on prototypes. This approach has been highly influential, particularly in cognitive linguistics. It was based in part on previous insights, in particular the formulation of a category model based on family resemblance by Wittgenstein (1953) and Roger Brown's How shall a thing be called? (1958).
Prototype theory was subsequently adopted by cognitive linguists such as George Lakoff. Prototype theory is an example of a similarity-based approach to categorization, in which a stored category representation is used to assess the similarity of candidate category members. Under prototype theory, this stored representation consists of a summary representation of the category's members. This prototype stimulus can take various forms. It might be a central tendency that represents the category's average member; a modal stimulus representing either the most frequent instance or a stimulus composed of the most common category features; or, lastly, the "ideal" category member, or a caricature that emphasizes the distinctive features of the category. An important property of this prototype representation is that it does not necessarily correspond to an actual instance of the category in the world. Furthermore, prototypes are highly sensitive to context. For example, while one's prototype for the category of beverages may be soda or seltzer, the context of brunch might lead one to select a mimosa as the prototypical beverage.
The prototype theory claims that members of a given category share a family resemblance, and categories are defined by sets of typical features (as opposed to all members possessing necessary and sufficient features).
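A minimal sketch of categorization by similarity to a central-tendency prototype follows. It assumes that stimuli are points in a numeric feature space and that similarity decreases with Euclidean distance; both are simplifying assumptions, and the function names and data are illustrative.

```python
import math


def prototype(members):
    """Central-tendency prototype: the per-dimension mean of the members."""
    n = len(members)
    return tuple(sum(m[d] for m in members) / n for d in range(len(members[0])))


def classify_by_prototype(stimulus, categories):
    """Return the category whose prototype lies closest to the stimulus."""
    prototypes = {name: prototype(ms) for name, ms in categories.items()}
    return min(prototypes, key=lambda name: math.dist(stimulus, prototypes[name]))


# Hypothetical two-dimensional stimuli for two categories.
categories = {
    "A": [(1, 2), (2, 1), (3, 3)],
    "B": [(8, 8), (9, 9), (10, 10)],
}
print(prototype(categories["A"]))            # the average member of A
print(classify_by_prototype((3, 2), categories))
```

In this data the prototype of category A is the point (2.0, 2.0), which is not itself one of the stored members, in line with the observation that a prototype need not be an actual instance.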
Main article: Exemplar theory
Another instance of the similarity-based approach to categorization, exemplar theory likewise compares the similarity of candidate category members to stored memory representations. Under exemplar theory, all known instances of a category are stored in memory as exemplars. When evaluating an unfamiliar entity's category membership, exemplars from potentially relevant categories are retrieved from memory, and the entity's similarity to those exemplars is summed to formulate a categorization decision. Medin and Schaffer's (1978) context model employs a nearest-neighbor approach which, rather than summing an entity's similarities to relevant exemplars, multiplies them to provide weighted similarities that reflect the entity's proximity to relevant exemplars. This effectively biases categorization decisions towards the exemplars most similar to the to-be-categorized entity.
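The difference between an additive and a multiplicative similarity rule can be illustrated with a small sketch. It assumes binary features and one common reading of the multiplicative rule, in which per-feature similarities are multiplied within each exemplar and the resulting exemplar similarities are then added up per category. The mismatch parameter s, the function names, and the data are hypothetical.

```python
def additive_similarity(a, b):
    """Count the features on which two items match."""
    return sum(1 for x, y in zip(a, b) if x == y)


def multiplicative_similarity(a, b, s=0.2):
    """Multiply in a factor s (0 < s < 1) for every mismatching feature."""
    sim = 1.0
    for x, y in zip(a, b):
        if x != y:
            sim *= s
    return sim


def classify(stimulus, exemplars, similarity):
    """Pick the category with the largest total similarity to the stimulus."""
    evidence = {
        label: sum(similarity(stimulus, e) for e in items)
        for label, items in exemplars.items()
    }
    return max(evidence, key=evidence.get)


# Category A holds one near-match and one total mismatch;
# category B holds two moderate matches.
exemplars = {
    "A": [(1, 1, 1, 0), (0, 0, 0, 0)],
    "B": [(1, 1, 0, 0), (0, 0, 1, 1)],
}
stimulus = (1, 1, 1, 1)
print(classify(stimulus, exemplars, additive_similarity))
print(classify(stimulus, exemplars, multiplicative_similarity))
```

With these values the additive rule favors category B (total 4 against 3), while the multiplicative rule favors category A, because the single near-match in A outweighs the two moderate matches in B.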
Main article: Conceptual clustering
Conceptual clustering is a machine learning paradigm for unsupervised classification that was defined by Ryszard S. Michalski in 1980. It is a modern variation of the classical approach to categorization, and derives from attempts to explain how knowledge is represented. In this approach, classes (clusters or entities) are generated by first formulating their conceptual descriptions and then classifying the entities according to the descriptions.
Conceptual clustering developed mainly during the 1980s, as a paradigm for unsupervised machine learning. It is distinguished from ordinary data clustering by generating a concept description for each generated category.
Conceptual clustering is closely related to fuzzy set theory, in which objects may belong to one or more groups, in varying degrees of fitness. A cognitive approach accepts that natural categories are graded (they tend to be fuzzy at their boundaries) and inconsistent in the status of their constituent members. The idea of necessary and sufficient conditions is almost never met in categories of naturally occurring things.
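Graded membership of this kind can be illustrated with a simple membership function. The sketch below assumes a single numeric feature and a linear ramp between two thresholds; the categories "tall" and "short" and the heights in centimetres are invented for the example.

```python
def graded_membership(value, low, high):
    """Degree of membership rising linearly from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def tall(height_cm):
    """Hypothetical fuzzy category: fully 'tall' at 190 cm, not at all at 170 cm."""
    return graded_membership(height_cm, 170, 190)


def short(height_cm):
    """Complementary category, so one person can partly belong to both."""
    return 1.0 - tall(height_cm)


for height in (160, 180, 200):
    print(height, tall(height), short(height))
```

A height between the two thresholds belongs to both categories to some degree, which is the kind of boundary case that a strict necessary-and-sufficient definition cannot express.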
Main article: Category learning
While an exhaustive discussion of category learning is beyond the scope of this article, a brief overview of category learning and its associated theories is useful in understanding formal models of categorization.
If categorization research investigates how categories are maintained and used, the field of category learning seeks to understand how categories are acquired in the first place. To accomplish this, researchers often employ novel categories of arbitrary objects (e.g., dot matrices) to ensure that participants are entirely unfamiliar with the stimuli. Category learning researchers have generally focused on two distinct forms of category learning. Classification learning tasks participants with predicting category labels for a stimulus based on its provided features. Classification learning is centered around learning between-category information and the diagnostic features of categories. In contrast, inference learning tasks participants with inferring the presence/value of a category feature based on a provided category label and/or the presence of other category features. Inference learning is centered on learning within-category information and the category's prototypical features.
Category learning tasks can generally be divided into two types, supervised and unsupervised learning. Supervised learning tasks provide learners with category labels. Learners then use information extracted from labeled example categories to classify stimuli into the appropriate category, which may involve the abstraction of a rule or concept relating observed object features to category labels. Unsupervised learning tasks do not provide learners with category labels. Learners must therefore recognize inherent structures in a data set and group stimuli together by similarity into classes. Unsupervised learning is thus a process of generating a classification structure. Tasks used to study category learning take various forms.
Category learning researchers have proposed various theories for how humans learn categories. Prevailing theories of category learning include the prototype theory, the exemplar theory, and the decision bound theory.
The prototype theory suggests that to learn a category, one must learn the category's prototype. Subsequent categorization of novel stimuli is then accomplished by selecting the category with the most similar prototype.
The exemplar theory suggests that to learn a category, one must learn about the exemplars that belong to that category. Subsequent categorization of a novel stimulus is then accomplished by computing its similarity to the known exemplars of potentially relevant categories and selecting the category that contains the most similar exemplars.
Decision bound theory suggests that to learn a category, one must either learn the regions of a stimulus space associated with particular responses or the boundaries (the decision bounds) that divide these response regions. Categorization of a novel stimulus is then accomplished by determining which response region it is contained within.
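A decision bound can be sketched as a function that reports which response region a stimulus falls in. The example below assumes a linear bound in a two-dimensional stimulus space; the weights, bias, and response labels are hypothetical, and nothing in the theory requires the bound to be linear.

```python
def make_linear_bound(weights, bias, above="A", below="B"):
    """Build a responder for the bound sum(w_i * x_i) + bias = 0."""

    def respond(stimulus):
        value = sum(w * x for w, x in zip(weights, stimulus)) + bias
        # Stimuli exactly on the bound are assigned to the `below` region here.
        return above if value > 0 else below

    return respond


# Bound x + y = 10: points with a larger sum fall in region A.
respond = make_linear_bound(weights=(1.0, 1.0), bias=-10.0)
print(respond((8, 7)))
print(respond((2, 3)))
```

Learning under this view amounts to adjusting the weights and bias (or the regions themselves) rather than storing prototypes or exemplars.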
Computational models of categorization have been developed to test theories about how humans represent and use category information. To accomplish this, categorization models can be fit to experimental data to see how well the predictions afforded by the model line up with human performance. Based on the model's success at explaining the data, theorists are able to draw conclusions about the accuracy of their theories and their theory's relevance to human category representations.
To effectively capture how humans represent and use category information, categorization models generally operate under variations of the same three basic assumptions. First, the model must make some kind of assumption about the internal representation of the stimulus (e.g., representing the perception of a stimulus as a point in a multi-dimensional space). Second, the model must make an assumption about the specific information that needs to be accessed in order to formulate a response (e.g., exemplar models require the collection of all available exemplars for each category). Third, the model must make an assumption about how a response is selected given the available information.
Though all categorization models make these three assumptions, they distinguish themselves by the ways in which they represent and transform an input into a response representation. The internal knowledge structures of various categorization models reflect the specific representation(s) they use to perform these transformations. Typical representations employed by models include exemplars, prototypes, and rules.
Weighted Features Prototype Model
An early instantiation of the prototype model was produced by Reed in the early 1970s. Reed (1972) conducted a series of experiments to compare the performance of 18 models in explaining data from a categorization task that required participants to sort faces into one of two categories. Results suggested that the best-performing model was the weighted features prototype model, which belonged to the family of average distance models. Unlike traditional average distance models, however, this model differentially weighted the most distinguishing features of the two categories. Given this model's performance, Reed (1972) concluded that the strategy participants used during the face categorization task was to construct prototype representations for each of the two categories of faces and categorize test patterns into the category associated with the most similar prototype. Furthermore, results suggested that similarity was determined by each category's most discriminating features.
Generalized Context Model
Medin and Schaffer's (1978) context model was expanded upon by Nosofsky (1986), resulting in the Generalized Context Model (GCM). The GCM is an exemplar model that stores exemplars of stimuli as exhaustive combinations of the features associated with each exemplar. By storing these combinations, the model establishes contexts for the features of each exemplar, which are defined by all other features with which that feature co-occurs. The GCM computes the similarity of an exemplar and a stimulus in two steps. First, the GCM computes the psychological distance between the exemplar and the stimulus. This is accomplished by summing the absolute values of the dimensional differences between the exemplar and the stimulus. For example, suppose an exemplar has a value of 18 on dimension X and the stimulus has a value of 42 on dimension X; the resulting dimensional difference would be 24. Second, once psychological distance has been evaluated, an exponential decay function determines the similarity of the exemplar and the stimulus: a distance of 0 results in a similarity of 1, and similarity decreases exponentially as distance increases. Categorical responses are then generated by evaluating the similarity of the stimulus to each category's exemplars, where each exemplar provides a "vote" for its respective category that varies in strength based on the exemplar's similarity to the stimulus and the strength of the exemplar's association with the category. This effectively assigns each category a selection probability that is determined by the proportion of votes it receives, which can then be fit to data.
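The two-step similarity computation and the voting rule described above can be sketched as follows. This is a simplified illustration rather than the full published model: the sensitivity parameter c, the function names, and the example exemplars are assumptions, and components such as attention weights and response biases are omitted.

```python
import math


def city_block_distance(exemplar, stimulus):
    """Step 1: sum of absolute differences across stimulus dimensions."""
    return sum(abs(e - s) for e, s in zip(exemplar, stimulus))


def similarity(exemplar, stimulus, c=1.0):
    """Step 2: exponential decay of similarity with distance (1 at distance 0)."""
    return math.exp(-c * city_block_distance(exemplar, stimulus))


def category_probabilities(stimulus, exemplars, c=1.0):
    """Each exemplar votes for its category; votes are normalized to proportions.

    `exemplars` is a list of (features, category, association_strength) tuples.
    """
    votes = {}
    for features, category, strength in exemplars:
        vote = strength * similarity(features, stimulus, c)
        votes[category] = votes.get(category, 0.0) + vote
    total = sum(votes.values())
    return {category: vote / total for category, vote in votes.items()}


# The worked example from the text: values 18 and 42 on one dimension.
print(city_block_distance((18,), (42,)))

# Hypothetical stored exemplars in a two-dimensional space.
exemplars = [
    ((1.0, 1.0), "A", 1.0),
    ((1.5, 1.0), "A", 1.0),
    ((4.0, 4.0), "B", 1.0),
]
print(category_probabilities((1.2, 1.1), exemplars))
```

The returned proportions sum to 1 and play the role of the selection probabilities that would be compared with observed response frequencies when fitting the model to data.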
RULEX (Rule-Plus-Exception) Model
While simple logical rules are ineffective at learning poorly defined category structures, some proponents of the rule-based theory of categorization suggest that an imperfect rule can be used to learn such category structures if exceptions to that rule are also stored and considered. To formalize this proposal, Nosofsky and colleagues (1994) designed the RULEX model. The RULEX model attempts to form a decision tree composed of sequential tests of an object's attribute values. Categorization of the object is then determined by the outcome of these sequential tests. The RULEX model uses several kinds of search: exact, imperfect, conjunctive, and exception searches.
The method that RULEX uses to perform these searches is as follows. First, RULEX attempts an exact search. If successful, RULEX will continue to apply that rule until a misclassification occurs. If the exact search fails to identify a rule, either an imperfect or a conjunctive search will begin. A sufficient, though imperfect, rule acquired during one of these search phases will become permanently implemented, and the RULEX model will then begin to search for exceptions. If no rule is acquired, the model will attempt the search it did not perform in the previous phase. If successful, RULEX will permanently implement the rule and then begin an exception search. If none of the previous search methods is successful, RULEX will default to searching only for exceptions, despite lacking an associated rule, which equates to acquiring a random rule.
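The rule-plus-exception idea can be illustrated with a heavily simplified sketch. It is not the published RULEX algorithm: it assumes binary features, two hypothetical categories labeled A and B, and single-dimension rules only, and it replaces the model's search phases with a pick of the most accurate rule above a hypothetical accuracy threshold. Items that the chosen rule misclassifies are stored as exceptions.

```python
def apply_rule(rule, features):
    """A rule (dimension, value) responds 'A' if features[dimension] == value."""
    dimension, value = rule
    return "A" if features[dimension] == value else "B"


def rule_accuracy(rule, items):
    """Proportion of training items that the rule classifies correctly."""
    return sum(apply_rule(rule, f) == label for f, label in items) / len(items)


def find_rule(items, threshold=0.7):
    """Return the best single-dimension rule, or None if none is good enough."""
    n_dims = len(items[0][0])
    candidates = [(d, v) for d in range(n_dims) for v in (0, 1)]
    best = max(candidates, key=lambda rule: rule_accuracy(rule, items))
    return best if rule_accuracy(best, items) >= threshold else None


def find_exceptions(rule, items):
    """Store every training item that the rule gets wrong, with its true label."""
    return {f: label for f, label in items if apply_rule(rule, f) != label}


def classify(features, rule, exceptions):
    """Stored exceptions take priority; otherwise fall back on the rule."""
    if features in exceptions:
        return exceptions[features]
    return apply_rule(rule, features)


# Hypothetical training set: dimension 0 predicts the category, with one exception.
items = [
    ((1, 1, 0), "A"), ((1, 0, 1), "A"), ((1, 1, 1), "A"),
    ((0, 1, 1), "B"), ((0, 0, 0), "B"), ((0, 1, 0), "A"),
]
rule = find_rule(items)
exceptions = find_exceptions(rule, items)
print(rule, exceptions)
print(classify((0, 1, 0), rule, exceptions))
```

In this data the imperfect rule on dimension 0 is correct for five of the six items, and the remaining item is handled by the stored exception rather than by a more complicated rule.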
SUSTAIN (Supervised and Unsupervised Stratified Adaptive Incremental Network)
It is often the case that learned category representations vary depending on the learner's goals, as well as how categories are used during learning. Thus, some categorization researchers suggest that a proper model of categorization needs to be able to account for the variability present in the learner's goals, tasks, and strategies. This proposal was realized by Love and colleagues (2004) through the creation of SUSTAIN, a flexible clustering model capable of accommodating both simple and complex categorization problems through incremental adaptation to the specifics of problems.
In practice, the SUSTAIN model first converts a stimulus' perceptual information into features that are organized along a set of dimensions. The representational space that encompasses these dimensions is then distorted (e.g., stretched or shrunk) to reflect the importance of each feature based on inputs from an attentional mechanism. A set of clusters (specific instances grouped by similarity) associated with distinct categories then compete to respond to the stimulus, with the stimulus being subsequently assigned to the cluster whose representational space is closest to the stimulus'. The unknown stimulus dimension value (e.g., category label) is then predicted by the winning cluster, which, in turn, informs the categorization decision.
The flexibility of the SUSTAIN model is realized through its ability to employ both supervised and unsupervised learning at the cluster level. If SUSTAIN incorrectly predicts a stimulus as belonging to a particular cluster, corrective feedback (i.e., supervised learning) would signal SUSTAIN to recruit an additional cluster that represents the misclassified stimulus. Therefore, subsequent exposures to the stimulus (or a similar alternative) would be assigned to the correct cluster. SUSTAIN will also employ unsupervised learning to recruit an additional cluster if the similarity between the stimulus and the closest cluster does not exceed a threshold, as the model recognizes the weak predictive utility that would result from such a cluster assignment. SUSTAIN also exhibits flexibility in how it solves both simple and complex categorization problems. Initially, the internal representation of SUSTAIN contains only a single cluster, thus biasing the model towards simple solutions. As problems become increasingly complex (e.g., requiring solutions consisting of multiple stimulus dimensions), additional clusters are incrementally recruited so that SUSTAIN can handle the rise in complexity.
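The two recruitment conditions can be illustrated with a simplified sketch. It is not the published SUSTAIN model: the attentional mechanism is omitted, cluster competition is reduced to picking the nearest cluster, and the model here starts empty and recruits its first cluster on the first stimulus. The class name, parameters, and similarity function are hypothetical.

```python
import math


class ClusterModel:
    """Toy incremental clustering model with two recruitment rules."""

    def __init__(self, learning_rate=0.1, threshold=0.5):
        self.clusters = []  # each cluster: {"center": [...], "label": ...}
        self.learning_rate = learning_rate
        self.threshold = threshold

    def _nearest(self, stimulus):
        return min(
            self.clusters,
            key=lambda c: math.dist(c["center"], stimulus),
            default=None,
        )

    def _recruit(self, stimulus, label):
        self.clusters.append({"center": list(stimulus), "label": label})

    def _adjust(self, cluster, stimulus):
        # Move the winning cluster a small step toward the stimulus.
        cluster["center"] = [
            c + self.learning_rate * (s - c)
            for c, s in zip(cluster["center"], stimulus)
        ]

    def predict(self, stimulus):
        winner = self._nearest(stimulus)
        return None if winner is None else winner["label"]

    def learn_supervised(self, stimulus, label):
        # Recruit a new cluster when the winner predicts the wrong label.
        winner = self._nearest(stimulus)
        if winner is None or winner["label"] != label:
            self._recruit(stimulus, label)
        else:
            self._adjust(winner, stimulus)

    def learn_unsupervised(self, stimulus):
        # Recruit a new cluster when even the closest one is too dissimilar.
        winner = self._nearest(stimulus)
        if winner is None:
            self._recruit(stimulus, None)
            return
        sim = math.exp(-math.dist(winner["center"], stimulus))
        if sim < self.threshold:
            self._recruit(stimulus, None)
        else:
            self._adjust(winner, stimulus)


model = ClusterModel()
for stimulus, label in [((1, 1), "A"), ((1.2, 0.9), "A"), ((5, 5), "B")]:
    model.learn_supervised(stimulus, label)
print(len(model.clusters), model.predict((4.8, 5.1)))
```

In this run the second stimulus is absorbed by the existing cluster, while the third triggers recruitment because the nearest cluster predicts the wrong label, so the number of clusters grows only as the problem demands.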
Social categorization consists of putting human beings into groups in order to identify them based on different criteria. Categorization is a process studied by scholars in cognitive science but can also be studied as a social activity. Social categorization is different from the categorization of other things because it implies that people create categories for themselves and others as human beings. Groups can be created based on ethnicity, country of origin, religion, sexual identity, social privileges, economic privileges, etc. Various ways to sort people exist according to one's schemas. People belong to various social groups because of their ethnicity, religion, or age.
Social categories based on age, race, and gender are used by people when they encounter a new person. Because some of these categories refer to physical traits, they are often used automatically when people don't know each other. These categories are not objective and depend on how people see the world around them. They allow people to identify themselves with similar people and to identify people who are different. They are useful in one's identity formation with the people around them. One can build their own identity by identifying themselves in a group or by rejecting another group.
Social categorization is similar to other types of categorization since it aims at simplifying the understanding of people. However, creating social categories implies that people will position themselves in relation to other groups. A hierarchy in group relations can appear as a result of social categorization.
Scholars argue that the categorization process starts at a young age, when children begin to learn about the world and the people around them. Children learn to recognize people according to categories based on similarities and differences. Social categories made by adults also affect children's understanding of the world. Children learn about social groups by hearing generalities about these groups from their parents, and they can then develop prejudices about people as a result of these generalities.
Another aspect of social categorization, discussed by Stephen Reicher and Nick Hopkins, relates to political domination. They argue that political leaders use social categories to influence political debates.
The activity of sorting people according to subjective or objective criteria can be seen as a negative process because of its tendency to lead to violence by one group against another. Similarities bring together people who share common traits, but differences between groups can lead to tensions and then to violence between those groups. The creation of social groups is responsible for a hierarchization of relations between groups. These hierarchical relations contribute to the promotion of stereotypes about people and groups, sometimes based on subjective criteria. Social categories can encourage people to associate stereotypes with groups of people. Associating stereotypes with a group, and with the people who belong to it, can lead to forms of discrimination against those people. The perception of a group and the stereotypes associated with it have an impact on social relations and activities.
Some social categories carry more weight than others in society. For instance, historically and still today, the category of "race" is one of the first categories used to sort people. However, only a few categories of race are commonly used, such as "Black", "White", and "Asian". This reduces the multitude of ethnicities to a few categories based mostly on people's skin color.
The process of sorting people creates a vision of the other as "different", which can lead to the dehumanization of people. Scholars discuss intergroup relations using social identity theory, developed by H. Tajfel. Historically, many instances of social categorization have led to forms of domination or violence by a dominant group against a dominated group. Periods of colonisation are examples of times when people from one group chose to dominate and control people belonging to other groups because they considered them inferior. Racism, discrimination, and violence can occur as consequences of social categorization. When people see others as different, they tend to develop hierarchical relations with other groups.
There cannot be categorization without the possibility of miscategorization. To do "the right thing with the right kind of thing", there has to be both a right and a wrong thing to do. Not only does a category of which "everything" is a member lead logically to the Russell paradox ("is it or is it not a member of itself?"), but without the possibility of error, there is no way to detect or define what distinguishes category members from nonmembers.
An example of the absence of nonmembers is the problem of the poverty of the stimulus in language learning by the child: children learning the language do not hear or make errors in the rules of Universal Grammar (UG). Hence they never get corrected for errors in UG. Yet children's speech obeys the rules of UG, and speakers can immediately detect that something is wrong if a linguist generates (deliberately) an utterance that violates UG. Hence speakers can categorize what is UG-compliant and UG-noncompliant. Linguists have concluded from this that the rules of UG must be somehow encoded innately in the human brain.
Ordinary categories, however, such as "dogs," have abundant examples of nonmembers (cats, for example). So it is possible to learn, by trial and error, with error-correction, to detect and define what distinguishes dogs from non-dogs, and hence to correctly categorize them. This kind of learning, called reinforcement learning in the behavioral literature and supervised learning in the computational literature, is fundamentally dependent on the possibility of error, and error-correction. Miscategorization—examples of nonmembers of the category—must always exist, not only to make the category learnable, but for the category to exist and be definable at all.