In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of the possible values a random variable can take, weighted by the probability of those outcomes. Because it is an average, the expected value need not be one of the possible outcomes or appear in the sample data set; it is not necessarily the value one would "expect" to observe in any single trial.
The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration.
The expected value of a random variable X is often denoted by E(X), E[X], or EX, with E also often set in blackboard bold as 𝔼.^{[1]}^{[2]}^{[3]}
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes in a fair way between two players, who have to end their game before it is properly finished.^{[4]} This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.
He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.^{[5]}
The Dutch mathematician Christiaan Huygens also considered the problem of points and presented a solution based on the same principle as the solutions of Pascal and Fermat. He published his treatise on probability theory, "De ratiociniis in ludo aleæ", in 1657 (see Huygens (1657)), just after visiting Paris. The book extended the concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability.
In the foreword to his treatise, Huygens wrote:
It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs.
— Edwards (2002)
In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.^{[6]}
Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes:^{[7]}
That any one Chance or Expectation to win any thing is worth just such a Sum, as wou'd procure in the same Chance and Expectation at a fair Lay. ... If I expect a or b, and have an equal chance of gaining them, my Expectation is worth (a+b)/2.
More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", where the concept of expected value was defined explicitly:^{[8]}
... this advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for. We will call this advantage mathematical hope.
The use of the letter E to denote "expected value" goes back to W. A. Whitworth in 1901.^{[9]} The symbol has since become popular for English writers. In German, E stands for Erwartungswert, in Spanish for esperanza matemática, and in French for espérance mathématique.^{[10]}
When "E" is used to denote "expected value", authors use a variety of stylizations: the expectation operator can be stylized as E (upright), E (italic), or 𝔼 (in blackboard bold), while a variety of bracket notations (such as E(X), E[X], and EX) are all used.
Another popular notation is μ_{X}, whereas ⟨X⟩, ⟨X⟩_{av}, and X̄ are commonly used in physics,^{[11]} and M(X) in Russian-language literature.
As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.
Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector X. It is defined component by component, as E[X]_{i} = E[X_{i}]. Similarly, one may define the expected value of a random matrix X with components X_{ij} by E[X]_{ij} = E[X_{ij}].
Consider a random variable X with a finite list x_{1}, ..., x_{k} of possible outcomes, each of which (respectively) has probability p_{1}, ..., p_{k} of occurring. The expectation of X is defined as^{[12]}
E[X] = x_{1}p_{1} + x_{2}p_{2} + ⋅⋅⋅ + x_{k}p_{k}.
Since the probabilities must satisfy p_{1} + ⋅⋅⋅ + p_{k} = 1, it is natural to interpret E[X] as a weighted average of the x_{i} values, with weights given by their probabilities p_{i}.
In the special case that all possible outcomes are equiprobable (that is, p_{1} = ⋅⋅⋅ = p_{k}), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
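The finite definition is a probability-weighted sum, which can be sketched in a few lines of Python. The function name `expected_value` and the loaded-die probabilities are illustrative assumptions, not part of the text above; exact fractions are used so that the check that the probabilities sum to 1 is not affected by rounding.

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    """Return the probability-weighted average x_1*p_1 + ... + x_k*p_k."""
    if len(outcomes) != len(probabilities):
        raise ValueError("need exactly one probability per outcome")
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

die_faces = [1, 2, 3, 4, 5, 6]

# Equiprobable case: the weighted average reduces to the ordinary average.
fair_probs = [Fraction(1, 6)] * 6
fair_mean = expected_value(die_faces, fair_probs)

# Hypothetical loaded die: a six comes up half of the time.
loaded_probs = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]
loaded_mean = expected_value(die_faces, loaded_probs)

print(fair_mean, loaded_mean)
```

For the fair die the result, 7/2, is not itself a possible outcome of a roll.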
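Informally, the expectation of a random variable with a countably infinite set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that
E[X] = ∑_{i=1}^{∞} x_{i}p_{i},
where x_{1}, x_{2}, ... are the possible outcomes of the random variable X and p_{1}, p_{2}, ... are their corresponding probabilities.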
However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.
For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands.^{[14]} In the alternative case that the infinite sum does not converge absolutely, one says the random variable does not have finite expectation.^{[14]}
Now consider a random variable X which has a probability density function given by a function f on the real number line. This means that the probability of X taking on a value in any given open interval is given by the integral of f over that interval. The expectation of X is then given by the integral^{[15]}
E[X] = ∫_{−∞}^{∞} x f(x) dx.
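Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of X is given by the Cauchy distribution Cauchy(0, π), so that f(x) = (x^{2} + π^{2})^{−1}. It is straightforward to compute in this case that
∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x / (x^{2} + π^{2}) dx = (1/2) ln((b^{2} + π^{2}) / (a^{2} + π^{2})).
The limit of this expression as a → −∞ and b → ∞ does not exist: if the limits are taken so that a = −b, then the limit is zero, while if the constraint 2a = −b is imposed, then the limit is ln(2).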
To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with E[X] left undefined otherwise.^{[17]} However, measure-theoretic notions as given below can be used to give a systematic definition of E[X] for more general random variables X.
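The dependence on how the integration limits grow can be checked numerically. For f(x) = (x^{2} + π^{2})^{−1}, the integral of x f(x) over [a, b] has the closed form (1/2) ln((b^{2} + π^{2}) / (a^{2} + π^{2})). The sketch below evaluates it along two different paths to infinity; the function name `truncated_mean` is an illustrative assumption.

```python
import math

def truncated_mean(a, b):
    """Closed form of the integral of x / (x**2 + pi**2) over [a, b]."""
    return 0.5 * math.log((b * b + math.pi ** 2) / (a * a + math.pi ** 2))

for b in (1e2, 1e4, 1e6):
    symmetric = truncated_mean(-b, b)    # limits tied by a = -b
    skewed = truncated_mean(-b / 2, b)   # limits tied by 2a = -b
    print(f"b={b:.0e}  a=-b: {symmetric:.6f}  2a=-b: {skewed:.6f}")

print("ln(2) =", math.log(2))
```

The symmetric truncation is zero for every b, while the skewed one approaches ln(2), so the improper integral has no single value.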
All definitions of the expected value may be expressed in the language of measure theory. In general, if X is a real-valued random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E[X], is defined as the Lebesgue integral^{[18]}
E[X] = ∫_{Ω} X dP.
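A random variable X is said to be absolutely continuous if there is a nonnegative measurable function f on the real line such that P(X ∈ A) = ∫_{A} f(x) dx for every Borel set A. This condition has several equivalent formulations, although the equivalence is nontrivial to establish.^{[20]} In this definition, f is called the probability density function of X (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration,^{[21]} combined with the law of the unconscious statistician,^{[22]} it follows that
E[X] = ∫_{Ω} X dP = ∫_{ℝ} x f(x) dx
for any absolutely continuous random variable X.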
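The expected value of any real-valued random variable X can also be defined on the graph of its cumulative distribution function F by a nearby equality of areas. In fact, E[X] = μ with a real number μ if and only if the two surfaces in the x-y-plane, described by
x ≤ μ, 0 ≤ y ≤ F(x) and x ≥ μ, F(x) ≤ y ≤ 1
respectively, have the same finite area, i.e. if
∫_{−∞}^{μ} F(x) dx = ∫_{μ}^{∞} (1 − F(x)) dx
and both improper Riemann integrals converge. This is equivalent to the representation
E[X] = ∫_{0}^{∞} (1 − F(x)) dx − ∫_{−∞}^{0} F(x) dx,
also with convergent integrals.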
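For a nonnegative random variable the area representation reads E[X] = ∫_{0}^{∞} (1 − F(x)) dx, where F is the cumulative distribution function. When X takes only nonnegative integer values, 1 − F is constant on each interval [k, k+1), so the area becomes the sum of 1 − F(k) over k = 0, 1, 2, .... The following minimal sketch checks this for a fair die; the helper name `cdf` is an illustrative assumption.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)  # each face of a fair die

def cdf(x):
    """F(x) = P(X <= x) for the fair die."""
    return sum(prob for face in faces if face <= x)

# Area under 1 - F: one unit-width strip per integer k = 0, ..., 5.
tail_area = sum(1 - cdf(k) for k in range(max(faces)))

# Direct weighted average, for comparison.
direct_mean = sum(face * prob for face in faces)

print(tail_area, direct_mean)
```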
Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of ±∞. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes x_{i} = 2^{i}, with associated probabilities p_{i} = 2^{−i}, for i ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has
E[X] = ∑_{i=1}^{∞} x_{i}p_{i} = 2 ⋅ (1/2) + 4 ⋅ (1/4) + 8 ⋅ (1/8) + ⋅⋅⋅ = 1 + 1 + 1 + ⋅⋅⋅,
so it is natural to say that the expected value equals +∞.
There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral.^{[19]} The first fundamental observation is that, whichever of the above definitions is followed, any nonnegative random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, the expected value can be defined as +∞. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable X, one defines the positive and negative parts by X^{+} = max(X, 0) and X^{−} = −min(X, 0). These are nonnegative random variables, and it can be directly checked that X = X^{+} − X^{−}. Since E[X^{+}] and E[X^{−}] are both then defined as either nonnegative numbers or +∞, it is then natural to define:
E[X] = E[X^{+}] − E[X^{−}] if E[X^{+}] < ∞ and E[X^{−}] < ∞;
E[X] = +∞ if E[X^{+}] = ∞ and E[X^{−}] < ∞;
E[X] = −∞ if E[X^{+}] < ∞ and E[X^{−}] = ∞;
E[X] is undefined if E[X^{+}] = ∞ and E[X^{−}] = ∞.
According to this definition, E[X] exists and is finite if and only if E[X^{ +}] and E[X^{ −}] are both finite. Due to the formula |X| = X^{ +} + X^{ −}, this is the case if and only if E|X| is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.
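The two observations above can be illustrated with a short sketch. The first function sums the first n terms of the St. Petersburg series x_{i}p_{i} = 2^{i} ⋅ 2^{−i}, each of which equals 1, so the partial sums grow without bound. The second combines E[X^{+}] and E[X^{−}] under the usual convention that the difference is taken whenever at least one part is finite. Both function names, and the use of `None` for "undefined", are illustrative assumptions.

```python
import math
from fractions import Fraction

def st_petersburg_partial_sum(n_terms):
    """Sum of x_i * p_i = 2**i * 2**(-i) for i = 1, ..., n_terms."""
    return sum(Fraction(2 ** i) * Fraction(1, 2 ** i)
               for i in range(1, n_terms + 1))

def combine_parts(e_pos, e_neg):
    """E[X] from E[X+] and E[X-], each a nonnegative float or math.inf.

    Returns None when both parts are infinite (E[X] is undefined).
    """
    if math.isinf(e_pos) and math.isinf(e_neg):
        return None
    return e_pos - e_neg  # finite, +inf, or -inf

for n in (10, 100, 1000):
    print(n, st_petersburg_partial_sum(n))  # partial sum equals n

print(combine_parts(math.inf, 0.0))       # St. Petersburg: E[X-] = 0
print(combine_parts(math.inf, math.inf))  # both parts infinite: undefined
```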
The following table gives the expected values of some commonly occurring probability distributions. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.
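Distribution | Notation | Mean E(X) |
---|---|---|
Bernoulli^{[24]} | X ~ b(1, p) | 0 ⋅ (1 − p) + 1 ⋅ p = p |
Binomial^{[25]} | X ~ B(n, p) | ∑_{i=0}^{n} i (n choose i) p^{i}(1 − p)^{n−i} = np |
Poisson^{[26]} | X ~ Po(λ) | ∑_{i=0}^{∞} i e^{−λ}λ^{i} / i! = λ |
Geometric^{[27]} | X ~ Geometric(p) | ∑_{i=1}^{∞} i p(1 − p)^{i−1} = 1/p |
Uniform^{[28]} | X ~ U(a, b) | ∫_{a}^{b} x / (b − a) dx = (a + b)/2 |
Exponential^{[29]} | X ~ exp(λ) | ∫_{0}^{∞} λx e^{−λx} dx = 1/λ |
Normal^{[30]} | X ~ N(μ, σ^{2}) | (2πσ^{2})^{−1/2} ∫_{−∞}^{∞} x e^{−(x−μ)^{2}/(2σ^{2})} dx = μ |
Standard Normal^{[31]} | X ~ N(0, 1) | (2π)^{−1/2} ∫_{−∞}^{∞} x e^{−x^{2}/2} dx = 0 |
Pareto^{[32]} | X ~ Par(α, k) | ∫_{k}^{∞} αk^{α}x^{−α} dx = αk/(α − 1) if α > 1, and +∞ if 0 < α ≤ 1 |
Cauchy^{[33]} | X ~ Cauchy(x_{0}, γ) | (1/π) ∫_{−∞}^{∞} γx / ((x − x_{0})^{2} + γ^{2}) dx is undefined |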
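Some of the standard closed forms can be checked by evaluating the defining sums directly: the binomial mean np, the Poisson mean λ, and the mean 1/p of a geometric distribution supported on 1, 2, 3, .... The sketch below is a numerical sanity check, not a derivation; the function names and the truncation lengths of the infinite sums are illustrative assumptions.

```python
import math

def binomial_mean(n, p):
    """Sum of i * C(n, i) * p**i * (1 - p)**(n - i) over i = 0, ..., n."""
    return sum(i * math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(n + 1))

def poisson_mean(lam, terms=100):
    """Truncated sum of i * exp(-lam) * lam**i / i! over i = 1, ..., terms - 1."""
    total = 0.0
    pmf = math.exp(-lam)       # P(X = 0)
    for i in range(1, terms):
        pmf *= lam / i         # P(X = i) from P(X = i - 1)
        total += i * pmf
    return total

def geometric_dist_mean(p, terms=10_000):
    """Truncated sum of i * p * (1 - p)**(i - 1) over i = 1, ..., terms - 1."""
    return sum(i * p * (1 - p) ** (i - 1) for i in range(1, terms))

print(binomial_mean(10, 0.3))     # close to n * p = 3
print(poisson_mean(4.0))          # close to lambda = 4
print(geometric_dist_mean(0.25))  # close to 1 / p = 4
```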
The basic properties below (and their names in bold) replicate or follow immediately from those of the Lebesgue integral. Note that the letters "a.s." stand for "almost surely", a central property of the Lebesgue integral. Basically, one says that an inequality like X ≥ 0 is true almost surely, when the probability measure attributes zero-mass to the complementary event {X < 0}.
Concentration inequalities control the likelihood of a random variable taking on large values. Markov's inequality is among the best-known and simplest to prove: for a nonnegative random variable X and any positive number a, it states that^{[37]}
P(X ≥ a) ≤ E[X] / a.
If X is any random variable with finite expectation, then Markov's inequality may be applied to the random variable |X−E[X]|^{2} to obtain Chebyshev's inequality
P(|X − E[X]| ≥ a) ≤ Var[X] / a^{2},
where Var is the variance and a is any positive number.
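Both bounds can be verified exactly on a small example. Markov's inequality states P(X ≥ a) ≤ E[X]/a for nonnegative X and a > 0, and Chebyshev's inequality states P(|X − E[X]| ≥ c) ≤ Var[X]/c^{2} for c > 0. The sketch below checks them for a fair die with the arbitrarily chosen thresholds a = 5 and c = 2; the helper name `probability_of` is an illustrative assumption.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # fair die

mean = sum(x * prob for x in faces)
variance = sum((x - mean) ** 2 * prob for x in faces)

def probability_of(event):
    """Total probability of the faces for which event(face) is true."""
    return sum(prob for x in faces if event(x))

a = 5
markov_lhs = probability_of(lambda x: x >= a)
markov_rhs = mean / a

c = 2
chebyshev_lhs = probability_of(lambda x: abs(x - mean) >= c)
chebyshev_rhs = variance / c ** 2

print(markov_lhs, "<=", markov_rhs)
print(chebyshev_lhs, "<=", chebyshev_rhs)
```

In both cases the true tail probability is 1/3, while the bounds are 7/10 and 35/48, which shows that the inequalities hold but need not be tight.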
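The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory.
Jensen's inequality: let f: ℝ → ℝ be a convex function and X a random variable with finite expectation. Then f(E[X]) ≤ E[f(X)].
Hölder's inequality: if p > 1 and q > 1 are numbers satisfying 1/p + 1/q = 1, then E|XY| ≤ (E|X|^{p})^{1/p} (E|Y|^{q})^{1/q} for any random variables X and Y.
Minkowski inequality: given any number p ≥ 1, for any random variables X and Y with E|X|^{p} and E|Y|^{p} both finite, (E|X + Y|^{p})^{1/p} ≤ (E|X|^{p})^{1/p} + (E|Y|^{p})^{1/p}.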
The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
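Two of these inequalities can be checked exactly on a fair die. Jensen's inequality, f(E[X]) ≤ E[f(X)] for convex f, is tested with f(x) = x^{2}. Hölder's inequality is tested in the special case p = q = 2, where it reads E|XY| ≤ (E[X^{2}])^{1/2} (E[Y^{2}])^{1/2}; both sides are squared so that the comparison stays in exact fractions. The choice Y = 7 − X and the helper name `expect` are illustrative assumptions.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # fair die

def expect(g):
    """E[g(X)] for the fair die."""
    return sum(g(x) * prob for x in faces)

# Jensen with the convex function f(x) = x**2.
jensen_lhs = expect(lambda x: x) ** 2   # f(E[X])
jensen_rhs = expect(lambda x: x ** 2)   # E[f(X)]

# Hoelder with p = q = 2 and Y = 7 - X, compared after squaring both sides.
holder_lhs_sq = expect(lambda x: abs(x * (7 - x))) ** 2
holder_rhs_sq = expect(lambda x: x ** 2) * expect(lambda x: (7 - x) ** 2)

print(jensen_lhs, "<=", jensen_rhs)
print(holder_lhs_sq, "<=", holder_rhs_sq)
```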
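In general, it is not the case that E[X_{n}] → E[X] even if X_{n} → X pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let U be a random variable distributed uniformly on [0, 1]. For n ≥ 1, define a sequence of random variables
X_{n} = n ⋅ 1{U ∈ (0, 1/n)},
with 1{A} being the indicator function of the event A. Then X_{n} → 0 pointwise, but E[X_{n}] = n ⋅ P(U ∈ (0, 1/n)) = n ⋅ (1/n) = 1 for each n. Hence lim_{n→∞} E[X_{n}] = 1 ≠ 0 = E[lim_{n→∞} X_{n}].
Analogously, for a general sequence of random variables {Y_{n} : n ≥ 0}, the expected value operator is not σ-additive, i.e.
E[∑_{n=0}^{∞} Y_{n}] ≠ ∑_{n=0}^{∞} E[Y_{n}].
An example is easily obtained by setting Y_{0} = X_{1} and Y_{n} = X_{n+1} − X_{n} for n ≥ 1, where X_{n} is as in the previous example.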
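The counterexample can be made concrete with U uniform on [0, 1] and X_{n} = n when 0 < U < 1/n, and 0 otherwise. The sketch below approximates E[X_{n}] by averaging over an evenly spaced grid of midpoints in [0, 1] (a simple Riemann sum, used here in place of random sampling so that the output is deterministic) and evaluates the sequence at one fixed value of U. The grid size, the fixed point 0.37, and the function names are illustrative assumptions.

```python
def x_n(n, u):
    """X_n = n on the event 0 < u < 1/n, and 0 otherwise."""
    return n if 0 < u < 1 / n else 0

GRID = 100_000
midpoints = [(k + 0.5) / GRID for k in range(GRID)]

def grid_mean(n):
    """Riemann-sum approximation of E[X_n] for U uniform on [0, 1]."""
    return sum(x_n(n, u) for u in midpoints) / GRID

# The expectation stays at 1 for every n ...
means = {n: grid_mean(n) for n in (1, 10, 100, 1000)}
print(means)

# ... while at any fixed u the sequence is eventually 0.
u_fixed = 0.37
pointwise = [x_n(n, u_fixed) for n in (1, 2, 3, 10, 100)]
print(pointwise)
```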
A number of convergence results, including the monotone convergence theorem and the dominated convergence theorem, specify exact conditions which allow one to interchange limits and expectations.
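The probability density function f_{X} of a scalar random variable X is related to its characteristic function φ_{X} by the inversion formula:
f_{X}(x) = (1/2π) ∫_{ℝ} e^{−itx} φ_{X}(t) dt.
For the expected value of g(X) (where g: ℝ → ℝ is a Borel function), we can use this inversion formula to obtain
E[g(X)] = (1/2π) ∫_{ℝ} g(x) [∫_{ℝ} e^{−itx} φ_{X}(t) dt] dx.
If E[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem,
E[g(X)] = (1/2π) ∫_{ℝ} G(t) φ_{X}(t) dt,
where G(t) = ∫_{ℝ} g(x) e^{−itx} dx is the Fourier transform of g(x).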
The expectation of a random variable plays an important role in a variety of contexts.
In statistics, where one seeks estimates for unknown parameters based on available data gained from samples, the sample mean serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean is considered to meet the desirable criterion for a "good" estimator in being unbiased; that is, the expected value of the estimate is equal to the true value of the underlying parameter.
For a different example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function.
It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.
The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of X − E[X]. The moments of some random variables can be used to specify their distributions, via their moment generating functions.
To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.
This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. P(X ∈ A) = E[1_{A}(X)], where 1_{A} is the indicator function of the set A.
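A minimal Monte Carlo sketch of this idea: the sample mean of repeated die rolls estimates E[X] = 3.5, and the sample mean of the indicator of the event {X ≥ 5} estimates the probability P(X ≥ 5) = 1/3. The fixed seed, the sample size of 200,000, and all function names are illustrative assumptions chosen for reproducibility.

```python
import random

def monte_carlo_mean(draw, n_draws, rng):
    """Arithmetic mean of n_draws independent samples produced by draw(rng)."""
    return sum(draw(rng) for _ in range(n_draws)) / n_draws

def roll_die(rng):
    """One roll of a fair six-sided die."""
    return rng.randint(1, 6)

def rolled_at_least_five(rng):
    """Indicator of the event {X >= 5}: 1 if it occurs, 0 otherwise."""
    return 1 if roll_die(rng) >= 5 else 0

rng = random.Random(0)  # fixed seed so the run is reproducible

mean_estimate = monte_carlo_mean(roll_die, 200_000, rng)
prob_estimate = monte_carlo_mean(rolled_at_least_five, 200_000, rng)

print(mean_estimate)  # close to 3.5
print(prob_estimate)  # close to 1/3
```

With this many draws the standard error of each estimate is well below 0.01, so the estimates should agree with the exact values to about two decimal places.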
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a discrete random variable with values x_{i} and corresponding probabilities p_{i}. Now consider a weightless rod on which are placed weights, at locations x_{i} along the rod and having masses p_{i} (whose sum is one). The point at which the rod balances is E[X].
Expected values can also be used to compute the variance, by means of the computational formula for the variance
Var(X) = E[X^{2}] − (E[X])^{2}.
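The computational formula Var(X) = E[X^{2}] − (E[X])^{2} can be compared against the definition of the variance as the expected squared deviation from the mean. The sketch below does so for a fair die in exact fractions; the helper name `expect` is an illustrative assumption.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # fair die

def expect(g):
    """E[g(X)] for the fair die."""
    return sum(g(x) * prob for x in faces)

mean = expect(lambda x: x)
second_moment = expect(lambda x: x ** 2)

variance_by_formula = second_moment - mean ** 2
variance_by_definition = expect(lambda x: (x - mean) ** 2)

print(variance_by_formula, variance_by_definition)
```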
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator Â operating on a quantum state vector |ψ⟩ is written as ⟨Â⟩ = ⟨ψ|Â|ψ⟩. The uncertainty in Â can be calculated by the formula (ΔA)^{2} = ⟨Â^{2}⟩ − ⟨Â⟩^{2}.