Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.[1] It is sometimes referred to as the selection effect. The phrase "selection bias" most often refers to the distortion of a statistical analysis, resulting from the method of collecting samples. If the selection bias is not taken into account, then some conclusions of the study may be false.


Sampling bias

Sampling bias is systematic error due to a non-random sample of a population,[2] causing some members of the population to be less likely to be included than others, resulting in a biased sample, defined as a statistical sample of a population (or non-human factors) in which all participants are not equally balanced or objectively represented.[3] It is mostly classified as a subtype of selection bias,[4] sometimes specifically termed sample selection bias,[5][6][7] but some classify it as a separate type of bias.[8]

A distinction of sampling bias (albeit not a universally accepted one) is that it undermines the external validity of a test (the ability of its results to be generalized to the rest of the population), while selection bias mainly addresses internal validity for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.

Examples of sampling bias include self-selection, pre-screening of trial participants, discounting trial subjects/tests that did not run to completion and migration bias by excluding subjects who have recently moved into or out of the study area, length-time bias, where slowly developing disease with better prognosis is detected, and lead time bias, where disease is diagnosed earlier participants than in comparison populations, although the average course of disease is the same.

Time interval





Attrition bias is a kind of selection bias caused by attrition (loss of participants),[13] discounting trial subjects/tests that did not run to completion. It is closely related to the survivorship bias, where only the subjects that "survived" a process are included in the analysis or the failure bias, where only the subjects that "failed" a process are included. It includes dropout, nonresponse (lower response rate), withdrawal and protocol deviators. It gives biased results where it is unequal in regard to exposure and/or outcome. For example, in a test of a dieting program, the researcher may simply reject everyone who drops out of the trial, but most of those who drop out are those for whom it was not working. Different loss of subjects in intervention and comparison group may change the characteristics of these groups and outcomes irrespective of the studied intervention.[13]

Lost to follow-up, is another form of Attrition bias, mainly occurring in medicinal studies over a lengthy time period. Non-Response or Retention bias can be influenced by a number of both tangible and intangible factors, such as; wealth, education, altruism, initial understanding of the study and it's requirements.[14] Researcher's may also be incapable of conducting follow-up contact resulting from inadequate identifying information and contact details collected during the initial recruitment and research phase.[15]

Observer selection

Philosopher Nick Bostrom has argued that data are filtered not only by study design and measurement, but by the necessary precondition that there has to be someone doing a study. In situations where the existence of the observer or the study is correlated with the data, observation selection effects occur, and anthropic reasoning is required.[16]

An example is the past impact event record of Earth: if large impacts cause mass extinctions and ecological disruptions precluding the evolution of intelligent observers for long periods, no one will observe any evidence of large impacts in the recent past (since they would have prevented intelligent observers from evolving). Hence there is a potential bias in the impact record of Earth.[17] Astronomical existential risks might similarly be underestimated due to selection bias, and an anthropic correction has to be introduced.[18]

Volunteer Bias

Self-selection bias or a volunteer bias in studies offer further threats to the validity of a study as these participants may have intrinsically different characteristics from the target population of the study.[19] Studies have shown that volunteers tend to come from a higher social standing than from a lower socio-economic background.[20] Further, to this another study shows that women are more probable to volunteer for studies than males. Volunteer bias is evident throughout the study life-cycle, from recruitment to follow-ups. More generally speaking volunteer response can be put down to individual altruism, a desire for approval, personal relation to the study topic and other reasons.[20][14] As with most instances mitigation in the case of volunteer bias is an increased sample size.[citation needed]


In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though Heckman correction may be used in special cases. An assessment of the degree of selection bias can be made by examining correlations between exogenous (background) variables and a treatment indicator. However, in regression models, it is correlation between unobserved determinants of the outcome and unobserved determinants of selection into the sample which bias estimates, and this correlation between unobservables cannot be directly assessed by the observed determinants of treatment.[21]

When data are selected for fitting or forecast purposes, a coalitional game can be set up so that a fitting or forecast accuracy function can be defined on all subsets of the data variables.

Related issues

Selection bias is closely related to:

See also


  1. ^ Dictionary of Cancer Terms → selection bias. Retrieved on September 23, 2009.
  2. ^ Medical Dictionary - 'Sampling Bias' Retrieved on September 23, 2009
  3. ^ TheFreeDictionary → biased sample. Retrieved on 2009-09-23. Site in turn cites: Mosby's Medical Dictionary, 8th edition.
  4. ^ Dictionary of Cancer Terms → Selection Bias. Retrieved on September 23, 2009.
  5. ^ Ards, Sheila; Chung, Chanjin; Myers, Samuel L. (1998). "The effects of sample selection bias on racial differences in child abuse reporting". Child Abuse & Neglect. 22 (2): 103–115. doi:10.1016/S0145-2134(97)00131-2. PMID 9504213.
  6. ^ Cortes, Corinna; Mohri, Mehryar; Riley, Michael; Rostamizadeh, Afshin (2008). Sample Selection Bias Correction Theory (PDF). Algorithmic Learning Theory. Lecture Notes in Computer Science. 5254. pp. 38–53. arXiv:0805.2775. CiteSeerX doi:10.1007/978-3-540-87987-9_8. ISBN 978-3-540-87986-2. S2CID 842488.
  7. ^ Cortes, Corinna; Mohri, Mehryar (2014). "Domain adaptation and sample bias correction theory and algorithm for regression" (PDF). Theoretical Computer Science. 519: 103–126. CiteSeerX doi:10.1016/j.tcs.2013.09.027.
  8. ^ Fadem, Barbara (2009). Behavioral Science. Lippincott Williams & Wilkins. p. 262. ISBN 978-0-7817-8257-9.
  9. ^ a b Feinstein AR; Horwitz RI (November 1978). "A critique of the statistical evidence associating estrogens with endometrial cancer". Cancer Res. 38 (11 Pt 2): 4001–5. PMID 698947.
  10. ^ Tamim H; Monfared AA; LeLorier J (March 2007). "Application of lag-time into exposure definitions to control for protopathic bias". Pharmacoepidemiol Drug Saf. 16 (3): 250–8. doi:10.1002/pds.1360. PMID 17245804. S2CID 25648490.
  11. ^ Matthew R. Weir (2005). Hypertension (Key Diseases) (Acp Key Diseases Series). Philadelphia, Pa: American College of Physicians. p. 159. ISBN 978-1-930513-58-7.
  12. ^ Kruskal, William H. (1960). "Some Remarks on Wild Observations". Technometrics. 2 (1): 1–3. doi:10.1080/00401706.1960.10489875.
  13. ^ a b Jüni, P.; Egger, Matthias (2005). "Empirical evidence of attrition bias in clinical trials". International Journal of Epidemiology. 34 (1): 87–88. doi:10.1093/ije/dyh406. PMID 15649954.
  14. ^ a b Jordan, Sue; Watkins, Alan; Storey, Mel; Allen, Steven J.; Brooks, Caroline J.; Garaiova, Iveta; Heaven, Martin L.; Jones, Ruth; Plummer, Sue F.; Russell, Ian T.; Thornton, Catherine A. (2013-07-09). "Volunteer Bias in Recruitment, Retention, and Blood Sample Donation in a Randomised Controlled Trial Involving Mothers and Their Children at Six Months and Two Years: A Longitudinal Analysis". PLOS ONE. 8 (7): e67912. Bibcode:2013PLoSO...867912J. doi:10.1371/journal.pone.0067912. ISSN 1932-6203. PMC 3706448. PMID 23874465.
  15. ^ Small, W. P. (1967-05-06). "Lost to Follow-Up". The Lancet. Originally published as Volume 1, Issue 7497. 289 (7497): 997–999. doi:10.1016/S0140-6736(67)92377-X. ISSN 0140-6736. PMID 4164620.
  16. ^ Bostrom, Nick (2002). Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge. ISBN 978-0-415-93858-7.
  17. ^ Ćirković, M. M.; Sandberg, A.; Bostrom, N. (2010). "Anthropic Shadow: Observation Selection Effects and Human Extinction Risks". Risk Analysis. 30 (10): 1495–506. doi:10.1111/j.1539-6924.2010.01460.x. PMID 20626690.
  18. ^ Tegmark, M.; Bostrom, N. (2005). "Astrophysics: Is a doomsday catastrophe likely?". Nature. 438 (7069): 754. Bibcode:2005Natur.438..754T. doi:10.1038/438754a. PMID 16341005. S2CID 4390013.
  19. ^ Tripepi, Giovanni; Jager, Kitty J.; Dekker, Friedo W.; Zoccali, Carmine (2010). "Selection Bias and Information Bias in Clinical Research". Nephron Clinical Practice. 115 (2): c94–c99. doi:10.1159/000312871. ISSN 1660-2110. PMID 20407272.
  20. ^ a b "Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2020-10-29.
  21. ^ Heckman, J. J. (1979). "Sample Selection Bias as a Specification Error". Econometrica. 47 (1): 153–161. doi:10.2307/1912352. JSTOR 1912352.