
In statistics, specifically regression analysis, a binary regression estimates a relationship between one or more explanatory variables and a single binary output variable. Generally, the model yields the probability of each of the two alternatives rather than a single predicted value, as linear regression does.

Binary regression is usually analyzed as a special case of binomial regression, with a single outcome ($n=1$), and one of the two alternatives considered as "success" and coded as 1: the value is the count of successes in 1 trial, either 0 or 1. The most common binary regression models are the logit model (logistic regression) and the probit model (probit regression).

Applications

Binary regression is principally applied either for prediction (binary classification), or for estimating the association between the explanatory variables and the output. In economics, binary regressions are used to model binary choice.

Interpretations

Binary regression models can be interpreted as latent variable models, together with a measurement model; or as probabilistic models, directly modeling the probability.

Latent variable model

The latent variable interpretation has traditionally been used in bioassay, yielding the probit model, where normal variance and a cutoff are assumed. The latent variable interpretation is also used in item response theory (IRT).

Formally, the latent variable interpretation posits that the outcome y is related to a vector of explanatory variables x by

$y=1[y^{*}>0]$

where $1[\cdot]$ denotes the indicator function, $y^{*}=x\beta +\varepsilon$ and $\varepsilon \mid x\sim G$, $\beta$ is a vector of parameters, and $G$ is a probability distribution.
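The probability of a success implied by this setup can be written out in one line. The short derivation below is a sketch that treats $G$ as the cumulative distribution function of $\varepsilon$ given $x$ and assumes that this distribution is continuous; the symmetry condition in the last step is an additional assumption, satisfied by the normal and logistic distributions.

$$P(y=1\mid x)=P(y^{*}>0\mid x)=P(\varepsilon >-x\beta \mid x)=1-G(-x\beta )$$

If $G$ is symmetric about zero, then $1-G(-x\beta )=G(x\beta )$, so the success probability is simply $G$ evaluated at the linear index $x\beta$.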

This model can be applied in many economic contexts. For instance, the outcome can be the decision of a manager whether to invest in a program, $y^{*}$ is the expected net discounted cash flow, and x is a vector of variables that can affect the cash flow of this program. Then the manager will invest only when she expects the net discounted cash flow to be positive.^[1]

Often, the error term $\varepsilon$ is assumed to follow a normal distribution conditional on the explanatory variables x. This generates the standard probit model.^[2]
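The link between the latent variable and the probit probability can be checked numerically. The following is a minimal sketch, not a fitting procedure: it assumes a standard normal error (mean 0, variance 1), and the values of `x` and `beta`, as well as the function names, are hypothetical and chosen only for illustration. It draws the latent variable many times and compares the share of positive draws with the standard normal CDF evaluated at $x\beta$.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_probit_frequency(x, beta, n_draws=200_000, seed=0):
    """Share of draws in which y* = x.beta + eps is positive, eps ~ N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    index = sum(xi * bi for xi, bi in zip(x, beta))
    successes = sum(
        1 for _ in range(n_draws) if index + rng.gauss(0.0, 1.0) > 0
    )
    return successes / n_draws


# Hypothetical values: the first entry of x acts as an intercept term.
x = [1.0, 0.5]
beta = [-0.2, 1.0]

index = sum(xi * bi for xi, bi in zip(x, beta))
empirical = simulate_probit_frequency(x, beta)
theoretical = normal_cdf(index)

print(f"simulated P(y=1|x): {empirical:.4f}")
print(f"Phi(x.beta):        {theoretical:.4f}")
```

With a large number of draws the two printed values should agree closely, which is the sense in which the normal error assumption "generates" the probit model.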

Probabilistic model

The simplest direct probabilistic model is the logit model, which models the log-odds as a linear function of the explanatory variable or variables. The logit model is "simplest" in the sense of generalized linear models (GLIM): the log-odds are the natural parameter for the exponential family of the Bernoulli distribution, and thus it is the simplest to use for computations.
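The statement that the log-odds are linear in the explanatory variables can be illustrated with a few lines of code. This is a sketch under assumed values: the parameter vector `beta` and the two input vectors are hypothetical, and the function names are introduced here only for illustration. It shows that raising one explanatory variable by one unit shifts the log-odds by exactly the corresponding coefficient, while the probability itself stays between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Logistic function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit_probability(x, beta):
    """P(y = 1 | x) when the log-odds equal the linear index x.beta."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return inverse_logit(eta)


# Hypothetical parameters: intercept -1.0 and slope 0.8.
beta = [-1.0, 0.8]

p_low = logit_probability([1.0, 2.0], beta)   # second variable equals 2
p_high = logit_probability([1.0, 3.0], beta)  # second variable equals 3

# The difference in log-odds recovers the slope coefficient.
log_odds_change = logit(p_high) - logit(p_low)

print(f"P(y=1) at x=2: {p_low:.4f}")
print(f"P(y=1) at x=3: {p_high:.4f}")
print(f"change in log-odds: {log_odds_change:.4f}")
```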

Another direct probabilistic model is the linear probability model, which models the probability itself as a linear function of the explanatory variables. A drawback of the linear probability model is that, for some values of the explanatory variables, the model will predict probabilities less than zero or greater than one.
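The drawback described above is easy to reproduce. The sketch below fits a linear probability model with one explanatory variable by ordinary least squares, using the closed-form formulas for simple regression. The data are made up for illustration, and the helper name `fit_simple_ols` is hypothetical. Even within the observed range of the explanatory variable, the fitted line yields a value below zero at one end and above one at the other.

```python
def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: a binary outcome that becomes more likely as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_simple_ols(xs, ys)

# Fitted "probabilities" at the smallest and largest observed x.
pred_low = intercept + slope * xs[0]
pred_high = intercept + slope * xs[-1]

print(f"fitted value at x={xs[0]}: {pred_low:.4f}")
print(f"fitted value at x={xs[-1]}: {pred_high:.4f}")
```

A logit or probit model fitted to the same data would keep every predicted probability strictly between 0 and 1, because the linear index is passed through a distribution function.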
