Normal-Wishart
Notation	$({\boldsymbol {\mu}},{\boldsymbol {\Lambda}})\sim \mathrm {NW} ({\boldsymbol {\mu}}_{0},\lambda ,\mathbf {W} ,\nu )$
Parameters	${\boldsymbol {\mu}}_{0}\in \mathbb {R} ^{D}$ location (real vector); $\lambda >0$ (real); $\mathbf {W} \in \mathbb {R} ^{D\times D}$ scale matrix (pos. def.); $\nu >D-1$ (real)
Support	${\boldsymbol {\mu}}\in \mathbb {R} ^{D}$; ${\boldsymbol {\Lambda}}\in \mathbb {R} ^{D\times D}$ precision matrix (pos. def.)
PDF	$f({\boldsymbol {\mu}},{\boldsymbol {\Lambda}}\mid {\boldsymbol {\mu}}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu}}\mid {\boldsymbol {\mu}}_{0},(\lambda {\boldsymbol {\Lambda}})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda}}\mid \mathbf {W} ,\nu )$

In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).^[1]

Definition


Suppose

$${\boldsymbol {\mu}}\mid {\boldsymbol {\mu}}_{0},\lambda ,{\boldsymbol {\Lambda}}\sim {\mathcal {N}}({\boldsymbol {\mu}}_{0},(\lambda {\boldsymbol {\Lambda}})^{-1})$$

has a multivariate normal distribution with mean ${\boldsymbol {\mu}}_{0}$ and covariance matrix $(\lambda {\boldsymbol {\Lambda}})^{-1}$, where

$${\boldsymbol {\Lambda}}\mid \mathbf {W} ,\nu \sim {\mathcal {W}}(\mathbf {W} ,\nu )$$

has a Wishart distribution. Then $({\boldsymbol {\mu}},{\boldsymbol {\Lambda}})$ has a normal-Wishart distribution, denoted as

$$({\boldsymbol {\mu}},{\boldsymbol {\Lambda}})\sim \mathrm {NW} ({\boldsymbol {\mu}}_{0},\lambda ,\mathbf {W} ,\nu ).$$

Characterization


Probability density function


$$f({\boldsymbol {\mu}},{\boldsymbol {\Lambda}}\mid {\boldsymbol {\mu}}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu}}\mid {\boldsymbol {\mu}}_{0},(\lambda {\boldsymbol {\Lambda}})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda}}\mid \mathbf {W} ,\nu )$$
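
Because the density factorizes into a normal term and a Wishart term, its logarithm is the sum of the two log-densities. The following is a minimal sketch of that computation using NumPy and the standard library only. The function names `log_multigamma` and `normal_wishart_logpdf` are illustrative and do not come from any library, and the Wishart term uses the standard Wishart density with scale matrix $\mathbf {W}$ and $\nu$ degrees of freedom.

```python
import math

import numpy as np


def log_multigamma(a, d):
    """Log of the multivariate gamma function Gamma_d(a)."""
    return d * (d - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, d + 1)
    )


def normal_wishart_logpdf(mu, Lam, mu0, lam, W, nu):
    """Log-density of NW(mu0, lam, W, nu) at the point (mu, Lam).

    Lam is assumed to be a symmetric positive-definite precision matrix.
    """
    mu = np.asarray(mu, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    W = np.asarray(W, dtype=float)
    d = mu0.shape[0]

    _, logdet_Lam = np.linalg.slogdet(Lam)
    _, logdet_W = np.linalg.slogdet(W)
    diff = mu - mu0

    # log N(mu | mu0, (lam * Lam)^{-1})
    log_normal = (
        0.5 * (d * math.log(lam) + logdet_Lam - d * math.log(2.0 * math.pi))
        - 0.5 * lam * (diff @ Lam @ diff)
    )

    # log W(Lam | W, nu)
    log_wishart = (
        0.5 * (nu - d - 1) * logdet_Lam
        - 0.5 * np.trace(np.linalg.solve(W, Lam))
        - 0.5 * nu * d * math.log(2.0)
        - 0.5 * nu * logdet_W
        - log_multigamma(0.5 * nu, d)
    )
    return float(log_normal + log_wishart)
```

Working in log space avoids overflow and underflow in the determinant and gamma-function terms when $D$ or $\nu$ is large.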

Properties


Scaling


Marginal distributions


By construction, the marginal distribution over ${\boldsymbol {\Lambda}}$ is a Wishart distribution, and the conditional distribution over ${\boldsymbol {\mu}}$ given ${\boldsymbol {\Lambda}}$ is a multivariate normal distribution. The marginal distribution over ${\boldsymbol {\mu}}$ is a multivariate t-distribution.
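
For reference, the parameters of the marginal over ${\boldsymbol {\mu}}$ can be written out explicitly. This is the standard result of integrating ${\boldsymbol {\Lambda}}$ out of the joint density rather than something stated in the text above, so treat it as a sketch. The notation $t_{k}(\mathbf {m} ,{\boldsymbol {\Sigma}})$ is an assumption made here, and denotes a multivariate t-distribution with $k$ degrees of freedom, location $\mathbf {m}$ and scale matrix ${\boldsymbol {\Sigma}}$:

```latex
{\boldsymbol{\mu}} \sim t_{\nu-D+1}\!\left({\boldsymbol{\mu}}_{0},\ \frac{1}{\lambda(\nu-D+1)}\,\mathbf{W}^{-1}\right)
```

For $D=1$ this reduces to a Student's t-distribution with $\nu$ degrees of freedom, in agreement with the marginal of the normal-gamma distribution.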

Posterior distribution of the parameters


After making $n$ observations ${\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{n}$ with sample mean ${\boldsymbol {\bar {x}}}={\frac {1}{n}}\sum _{i=1}^{n}{\boldsymbol {x}}_{i}$, the posterior distribution of the parameters is

$$({\boldsymbol {\mu}},{\boldsymbol {\Lambda}})\sim \mathrm {NW} ({\boldsymbol {\mu}}_{n},\lambda _{n},\mathbf {W} _{n},\nu _{n}),$$

where

$$\lambda _{n}=\lambda +n,$$

$${\boldsymbol {\mu}}_{n}={\frac {\lambda {\boldsymbol {\mu}}_{0}+n{\boldsymbol {\bar {x}}}}{\lambda +n}},$$

$$\nu _{n}=\nu +n,$$

$$\mathbf {W} _{n}^{-1}=\mathbf {W} ^{-1}+\sum _{i=1}^{n}({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})^{T}+{\frac {n\lambda }{n+\lambda }}({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu}}_{0})({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu}}_{0})^{T}.$$

^[2]
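
The update equations above translate almost line for line into code. The sketch below assumes the observations are stacked as the rows of an $n\times D$ NumPy array; the function name `normal_wishart_posterior` and its argument order are illustrative choices, not an established API.

```python
import numpy as np


def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Return (mu_n, lam_n, W_n, nu_n) given the n observations in the rows of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu0 = np.asarray(mu0, dtype=float)
    W = np.asarray(W, dtype=float)

    n = X.shape[0]
    xbar = X.mean(axis=0)          # sample mean
    centered = X - xbar
    S = centered.T @ centered      # scatter matrix about the sample mean

    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * xbar) / lam_n

    # The update is linear in W^{-1}, so it is done on the inverse scale matrix.
    d = (xbar - mu0)[:, None]
    W_n_inv = np.linalg.inv(W) + S + (n * lam / (n + lam)) * (d @ d.T)
    W_n = np.linalg.inv(W_n_inv)
    return mu_n, lam_n, W_n, nu_n
```

Because the posterior is again normal-Wishart, the returned values can be passed back in as the prior for a later batch of observations.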

Generating normal-Wishart random variates


Generation of random variates is straightforward:

Sample ${\boldsymbol {\Lambda}}$ from a Wishart distribution with parameters $\mathbf {W}$ and $\nu$
Sample ${\boldsymbol {\mu}}$ from a multivariate normal distribution with mean ${\boldsymbol {\mu}}_{0}$ and covariance matrix $(\lambda {\boldsymbol {\Lambda}})^{-1}$
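
The two steps above can be sketched with NumPy alone. NumPy has no built-in Wishart sampler, so this sketch swaps in the Bartlett decomposition for the first step, which also handles non-integer $\nu >D-1$. The function names `sample_wishart` and `sample_normal_wishart` are illustrative, not part of any library.

```python
import numpy as np


def sample_wishart(W, nu, rng):
    """Draw Lambda ~ Wishart(W, nu) via the Bartlett decomposition."""
    d = W.shape[0]
    L = np.linalg.cholesky(W)
    A = np.zeros((d, d))
    for i in range(d):
        # Diagonal: square roots of chi-squared draws with nu - i degrees of freedom.
        A[i, i] = np.sqrt(rng.chisquare(nu - i))
        # Strictly lower triangle: independent standard normal draws.
        A[i, :i] = rng.standard_normal(i)
    LA = L @ A
    return LA @ LA.T


def sample_normal_wishart(mu0, lam, W, nu, rng=None):
    """Draw (mu, Lambda) ~ NW(mu0, lam, W, nu)."""
    rng = np.random.default_rng() if rng is None else rng
    mu0 = np.asarray(mu0, dtype=float)
    W = np.asarray(W, dtype=float)

    # Step 1: precision matrix from the Wishart distribution.
    Lam = sample_wishart(W, nu, rng)

    # Step 2: mu ~ N(mu0, (lam * Lam)^{-1}). With lam * Lam = C C^T,
    # solving C^T y = z for standard normal z gives y with that covariance.
    C = np.linalg.cholesky(lam * Lam)
    z = rng.standard_normal(mu0.shape[0])
    mu = mu0 + np.linalg.solve(C.T, z)
    return mu, Lam
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the draws reproducible. Sampling ${\boldsymbol {\mu}}$ through the Cholesky factor of the precision matrix avoids forming the covariance matrix explicitly.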

Related distributions


The normal-inverse-Wishart distribution is essentially the same distribution parameterized by the covariance matrix rather than the precision matrix.
The normal-gamma distribution is the one-dimensional equivalent.
The multivariate normal distribution and Wishart distribution are the component distributions out of which this distribution is made.

Notes


^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. Page 690.
^ Cross Validated, https://stats.stackexchange.com/q/324925

References


Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
