
In Bayesian statistics, a **hyperparameter** is a parameter of a prior distribution; the term is used to distinguish such parameters from the parameters of the model for the underlying system under analysis.

For example, if one is using a beta distribution to model the distribution of the parameter *p* of a Bernoulli distribution, then:

*p* is a parameter of the underlying system (Bernoulli distribution), and *α* and *β* are parameters of the prior distribution (beta distribution), hence *hyper*parameters.
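A minimal sketch of this two-level model, assuming Python's standard-library `random.Random.betavariate` for the beta draw; the function name `sample_bernoulli_with_beta_prior` and the values of α and β are illustrative only. The hyperparameters α and β are fixed inputs, while *p* is drawn from the beta prior and then governs the Bernoulli observations.

```python
import random


def sample_bernoulli_with_beta_prior(alpha, beta, n, seed=None):
    """Draw p ~ Beta(alpha, beta), then n observations x_i ~ Bernoulli(p).

    alpha and beta are hyperparameters (parameters of the prior);
    p is the parameter of the underlying Bernoulli model.
    """
    rng = random.Random(seed)
    p = rng.betavariate(alpha, beta)  # parameter of the underlying system
    xs = [1 if rng.random() < p else 0 for _ in range(n)]  # Bernoulli(p) outcomes
    return p, xs


if __name__ == "__main__":
    p, xs = sample_bernoulli_with_beta_prior(alpha=2.0, beta=5.0, n=10, seed=0)
    print(f"p = {p:.3f}, observations = {xs}")
```

Fixing the seed only makes the illustration reproducible; changing α and β changes which values of *p* are likely to be drawn.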

One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior.

One often uses a prior that comes from a parametric family of probability distributions. This is done partly for explicitness (one can write down a distribution and choose its form by varying the hyperparameter, rather than trying to produce an arbitrary function), and partly so that one can *vary* the hyperparameter, particularly in the method of *conjugate priors* or for *sensitivity analysis*.
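As a sketch of what choosing the form by varying the hyperparameter means for the beta family, the hypothetical helper `beta_pdf` below, written with only Python's standard `math` module, evaluates the Beta(α, β) density. Each pair (α, β) picks out one member of the same family; the specific pairs are arbitrary examples.

```python
import math


def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1.

    Changing the hyperparameters alpha and beta selects a different member
    of the same parametric family instead of requiring an arbitrary function.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Log of the normalising constant 1 / B(alpha, beta), via log-gamma for stability.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p))


if __name__ == "__main__":
    # Three priors from one family: flat, centred on 0.5, and skewed towards 0.
    for alpha, beta in [(1, 1), (5, 5), (2, 8)]:
        mean = alpha / (alpha + beta)
        print(f"Beta({alpha}, {beta}): mean = {mean:.2f}, "
              f"density at 0.5 = {beta_pdf(0.5, alpha, beta):.3f}")
```

Beta(1, 1) is the uniform distribution on (0, 1), so its density is 1 everywhere; larger equal values of α and β concentrate the prior around 0.5.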

Main article: Conjugate prior

When a conjugate prior is used, the posterior distribution belongs to the same family as the prior but has different hyperparameters, which reflect the information added by the data: in subjective terms, one's beliefs have been updated. For a general prior distribution, computing the posterior can be very involved, and the posterior may have an unusual or hard-to-describe form; with a conjugate prior, there is generally a simple formula relating the hyperparameters of the posterior to those of the prior, so computing the posterior distribution is very easy.
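For the beta-Bernoulli example, the standard conjugate update is: after observing *k* successes in *n* trials, a Beta(α, β) prior becomes a Beta(α + *k*, β + *n* − *k*) posterior. A minimal sketch, with the hypothetical function name `beta_bernoulli_update`, shows that only the hyperparameters change:

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Return the posterior hyperparameters after Bernoulli observations.

    With a Beta(alpha, beta) prior on p and 0/1 observations, the posterior is
    Beta(alpha + k, beta + n - k), where k is the number of 1s and n the count.
    """
    n = len(observations)
    k = sum(observations)
    return alpha + k, beta + n - k


if __name__ == "__main__":
    post_alpha, post_beta = beta_bernoulli_update(2, 2, [1, 1, 0, 1, 0, 1])
    print(post_alpha, post_beta)  # 6 4
```

No integration is needed: the posterior is obtained by adding counts to the prior hyperparameters.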

Main article: Sensitivity analysis

A key concern of users of Bayesian statistics, and a criticism raised by its critics, is the dependence of the posterior distribution on one's prior. Hyperparameters address this by allowing one to vary them easily and see how the posterior distribution (and statistics derived from it, such as credible intervals) changes: one can see how *sensitive* one's conclusions are to one's prior assumptions. This process is called *sensitivity analysis*.
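A minimal sensitivity-analysis sketch for the beta-Bernoulli example, assuming illustrative data of 7 successes in 10 trials and a few arbitrary priors; the name `posterior_summary` is hypothetical. It reports the posterior mean and standard deviation, computed in closed form from the conjugate posterior Beta(α + *k*, β + *n* − *k*). A credible interval would additionally need the beta quantile function, which Python's standard library does not provide.

```python
import math


def posterior_summary(alpha, beta, k, n):
    """Posterior mean and standard deviation of p under a Beta(alpha, beta) prior
    after k successes in n Bernoulli trials (conjugate beta-Bernoulli model)."""
    a, b = alpha + k, beta + n - k  # posterior hyperparameters
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of Beta(a, b)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    k, n = 7, 10  # illustrative data: 7 successes in 10 trials
    # Vary the hyperparameters and watch how the conclusions move.
    for alpha, beta in [(1, 1), (2, 2), (10, 10), (1, 5)]:
        mean, sd = posterior_summary(alpha, beta, k, n)
        print(f"prior Beta({alpha}, {beta}): posterior mean {mean:.3f}, sd {sd:.3f}")
```

If the posterior summaries barely move across reasonable priors, the conclusion is robust to the choice of prior; large swings indicate that the data alone do not settle the question.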

Similarly, one may use a prior distribution with a range for a hyperparameter, perhaps reflecting uncertainty about the correct prior to take, and carry this through to a range for the final uncertainty.^{[1]}
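One simple way to turn a range of hyperparameter values into a range for the conclusion, assumed here for illustration rather than taken from the cited source, is to evaluate the posterior over a grid of hyperparameter values and report the extremes. The grid, the data (7 successes in 10 trials), and the name `posterior_mean_range` are hypothetical.

```python
import itertools


def posterior_mean_range(alphas, betas, k, n):
    """Smallest and largest posterior mean of p over a grid of Beta(alpha, beta)
    priors, after k successes in n Bernoulli trials."""
    means = [
        (alpha + k) / (alpha + beta + n)  # conjugate posterior mean
        for alpha, beta in itertools.product(alphas, betas)
    ]
    return min(means), max(means)


if __name__ == "__main__":
    # Hypothetical ranges for the hyperparameters, reflecting doubt about the prior.
    low, high = posterior_mean_range([1, 2, 4], [1, 2, 4], k=7, n=10)
    print(f"posterior mean of p lies in [{low:.3f}, {high:.3f}]")
```

With these values the posterior mean ranges from 8/15 ≈ 0.533 (α = 1, β = 4) to 11/15 ≈ 0.733 (α = 4, β = 1).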

Main article: Hyperprior

Instead of using a single value for a given hyperparameter, one can instead consider a probability distribution of the hyperparameter itself; this is called a "hyperprior." In principle, one may iterate this, calling parameters of a hyperprior "hyperhyperparameters," and so forth.
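A minimal sketch of a hyperprior for the beta-Bernoulli example, assuming for simplicity a discrete hyperprior that puts weight on a small number of (α, β) pairs; the function names and values are hypothetical. Sampling proceeds top-down: hyperparameters from the hyperprior, then *p* from the beta prior, then the data.

```python
import random


def sample_hierarchical(hyperprior, n, seed=None):
    """Sample from a three-level model:
    (alpha, beta) ~ hyperprior, p ~ Beta(alpha, beta), x_i ~ Bernoulli(p).

    hyperprior is a list of ((alpha, beta), weight) pairs, i.e. a discrete
    distribution over the hyperparameters.
    """
    rng = random.Random(seed)
    settings = [ab for ab, _ in hyperprior]
    weights = [w for _, w in hyperprior]
    alpha, beta = rng.choices(settings, weights=weights, k=1)[0]  # draw hyperparameters
    p = rng.betavariate(alpha, beta)  # draw the model parameter
    xs = [1 if rng.random() < p else 0 for _ in range(n)]  # draw the data
    return (alpha, beta), p, xs


def prior_mean_of_p(hyperprior):
    """Mean of p after averaging the beta prior over the hyperprior."""
    total = sum(w for _, w in hyperprior)
    return sum(w * a / (a + b) for (a, b), w in hyperprior) / total


if __name__ == "__main__":
    # Hypothetical hyperprior: equal belief in a prior centred on 0.2 and one centred on 0.8.
    hyperprior = [((2, 8), 0.5), ((8, 2), 0.5)]
    print(prior_mean_of_p(hyperprior))
    print(sample_hierarchical(hyperprior, n=5, seed=1))
```

Averaging over this hyperprior makes the effective prior on *p* a mixture of two beta distributions, which is no longer a single beta distribution.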