Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The science of why things occur is called etiology. Causal inference is said to provide the evidence of causality theorized by causal reasoning.
Causal inference is widely studied across all sciences. Several innovations in the development and implementation of methodology designed to determine causality have proliferated in recent decades. Causal inference remains especially difficult where experimentation is difficult or impossible, which is common throughout most sciences.
The approaches to causal inference are broadly applicable across all types of scientific disciplines, and many methods of causal inference that were designed for certain disciplines have found use in other disciplines. This article outlines the basic process behind causal inference and details some of the more conventional tests used across different disciplines; however, this should not be mistaken as a suggestion that these methods apply only to those disciplines, merely that they are the most commonly used in that discipline.
Causal inference is difficult to perform and there is significant debate amongst scientists about the proper way to determine causality. Despite other innovations, there remain concerns of misattribution by scientists of correlative results as causal, of the usage of incorrect methodologies by scientists, and of deliberate manipulation by scientists of analytical results in order to obtain statistically significant estimates. Particular concern is raised in the use of regression models, especially linear regression models.
Inferring the cause of something has been described as:
Further information: Causality and Causal analysis
Causal inference is conducted via the study of systems where the measure of one variable is suspected to affect the measure of another. Causal inference is conducted with regard to the scientific method. The first step of causal inference is to formulate a falsifiable null hypothesis, which is subsequently tested with statistical methods. Frequentist statistical inference is the use of statistical methods to determine the probability that the data occur under the null hypothesis by chance; Bayesian inference is used to determine the effect of an independent variable. Statistical inference is generally used to determine the difference between variations in the original data that are random variation or the effect of a well-specified causal mechanism. Notably, correlation does not imply causation, so the study of causality is as concerned with the study of potential causal mechanisms as it is with variation amongst the data. A frequently sought after standard of causal inference is an experiment wherein treatment is randomly assigned but all other confounding factors are held constant. Most of the efforts in causal inference are in the attempt to replicate experimental conditions.
Epidemiological studies employ different epidemiological methods of collecting and measuring evidence of risk factors and effect and different ways of measuring association between the two. Results of a 2020 review of methods for causal inference found that using existing literature for clinical training programs can be challenging. This is because published articles often assume an advanced technical background, they may be written from multiple statistical, epidemiological, computer science, or philosophical perspectives, methodological approaches continue to expand rapidly, and many aspects of causal inference receive limited coverage.
Common frameworks for causal inference include the causal pie model (component-cause), Pearl's structural causal model (causal diagram + do-calculus), structural equation modeling, and Rubin causal model (potential-outcome), which are often used in areas such as social sciences and epidemiology.
Further information: Experiment
Experimental verification of causal mechanisms is possible using experimental methods. The main motivation behind an experiment is to hold other experimental variables constant while purposefully manipulating the variable of interest. If the experiment produces statistically significant effects as a result of only the treatment variable being manipulated, there is grounds to believe that a causal effect can be assigned to the treatment variable, assuming that other standards for experimental design have been met.
Further information: Quasi-experiment
Quasi-experimental verification of causal mechanisms is conducted when traditional experimental methods are unavailable. This may be the result of prohibitive costs of conducting an experiment, or the inherent infeasibility of conducting an experiment, especially experiments that are concerned with large systems such as economies of electoral systems, or for treatments that are considered to present a danger to the well-being of test subjects. Quasi-experiments may also occur where information is withheld for legal reasons.
Epidemiology studies patterns of health and disease in defined populations of living beings in order to infer causes and effects. An association between an exposure to a putative risk factor and a disease may be suggestive of, but is not equivalent to causality because correlation does not imply causation. Historically, Koch's postulates have been used since the 19th century to decide if a microorganism was the cause of a disease. In the 20th century the Bradford Hill criteria, described in 1965 have been used to assess causality of variables outside microbiology, although even these criteria are not exclusive ways to determine causality.
In molecular epidemiology the phenomena studied are on a molecular biology level, including genetics, where biomarkers are evidence of cause or effects.
A recent trend[when?] is to identify evidence for influence of the exposure on molecular pathology within diseased tissue or cells, in the emerging interdisciplinary field of molecular pathological epidemiology (MPE).[third-party source needed] Linking the exposure to molecular pathologic signatures of the disease can help to assess causality.[third-party source needed] Considering the inherent nature of heterogeneity of a given disease, the unique disease principle, disease phenotyping and subtyping are trends in biomedical and public health sciences, exemplified as personalized medicine and precision medicine.[third-party source needed]
Determination of cause and effect from joint observational data for two time-independent variables, say X and Y, has been tackled using asymmetry between evidence for some model in the directions, X → Y and Y → X. The primary approaches are based on Algorithmic information theory models and noise models.
Incorporate an independent noise term in the model to compare the evidences of the two directions.
Here are some of the noise models for the hypothesis Y → X with the noise E:
The common assumption in these models are:
On an intuitive level, the idea is that the factorization of the joint distribution P(Cause, Effect) into P(Cause)*P(Effect | Cause) typically yields models of lower total complexity than the factorization into P(Effect)*P(Cause | Effect). Although the notion of "complexity" is intuitively appealing, it is not obvious how it should be precisely defined. A different family of methods attempt to discover causal "footprints" from large amounts of labeled data, and allow the prediction of more flexible causal relations.
Despite the advancements in the development of methodologies used to determine causality, significant weaknesses in determining causality remain. These weaknesses can be attributed both to the inherent difficulty of determining causal relations in complex systems but also to cases of scientific malpractice.
Separate from the difficulties of causal inference, the perception that large numbers of scholars in the social sciences engage in non-scientific methodology exists among some large groups of social scientists. Criticism of economists and social scientists as passing off descriptive studies as causal studies are rife within those fields.
In the sciences, especially in the social sciences, there is concern among scholars that scientific malpractice is widespread. As scientific study is a broad topic, there are theoretically limitless ways to have a causal inference undermined through no fault of a researcher. Nonetheless, there remain concerns among scientists that large numbers of researchers do not perform basic duties or practice sufficiently diverse methods in causal inference.[failed verification]
One prominent example of common non-causal methodology is the erroneous assumption of correlative properties as causal properties. There is no inherent causality in phenomena that correlate. Regression models are designed to measure variance within data relative to a theoretical model: there is nothing to suggest that data that presents high levels of covariance have any meaningful relationship (absent a proposed causal mechanism with predictive properties or a random assignment of treatment). The use of flawed methodology has been claimed to be widespread, with common examples of such malpractice being the overuse of correlative models, especially the overuse of regression models and particularly linear regression models. The presupposition that two correlated phenomena are inherently related is a logical fallacy known as spurious correlation. Some social scientists claim that widespread use of methodology that attributes causality to spurious correlations have been detrimental to the integrity of the social sciences, although improvements stemming from better methodologies have been noted.
A potential effect of scientific studies that erroneously conflate correlation with causality is an increase in the number of scientific findings whose results are not reproducible by third parties. Such non-reproducibility is a logical consequence of findings that correlation only temporarily being overgeneralized into mechanisms that have no inherent relationship, where new data does not contain the previous, idiosyncratic correlations of the original data. Debates over the effect of malpractice versus the effect of the inherent difficulties of searching for causality are ongoing. Critics of widely practiced methodologies argue that researchers have engaged statistical manipulation in to publish articles that supposedly demonstrate evidence of causality but are actually examples of spurious correlation being touted as evidence of causality: such endeavors may be referred to as P hacking. To prevent this, some have advocated that researchers preregister their research designs prior to conducting to their studies so that they do not inadvertently overemphasize a nonreproducible finding that was not the initial subject of inquiry but was found to be statistically significant during data analysis.