Notation: $(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu)$

Parameters:

- $\boldsymbol{\mu}_0 \in \mathbb{R}^{D}$: location (vector of reals)
- $\lambda > 0$ (real)
- $\mathbf{W} \in \mathbb{R}^{D \times D}$: scale matrix (pos. def.)
- $\nu > D - 1$ (real)

Support: $\boldsymbol{\mu} \in \mathbb{R}^{D}$; $\boldsymbol{\Lambda} \in \mathbb{R}^{D \times D}$ precision matrix (pos. def.)

PDF: $f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\ \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$

In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).

## Definition

Suppose

$\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, \lambda, \boldsymbol{\Lambda} \sim \mathcal{N}(\boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})$ has a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $(\lambda \boldsymbol{\Lambda})^{-1}$, where

$\boldsymbol{\Lambda} \mid \mathbf{W}, \nu \sim \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$ has a Wishart distribution. Then $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ has a normal-Wishart distribution, denoted as

$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu).$

## Characterization

### Probability density function

$f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\ \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$

## Properties

### Marginal distributions

By construction, the marginal distribution over $\boldsymbol{\Lambda}$ is a Wishart distribution, and the conditional distribution over $\boldsymbol{\mu}$ given $\boldsymbol{\Lambda}$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol{\mu}$ is a multivariate t-distribution.
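
For reference, the marginal of $\boldsymbol{\mu}$ can be written out explicitly. The parameterisation below (degrees of freedom, location, scale matrix) is the standard result obtained by integrating $\boldsymbol{\Lambda}$ out of the joint density; it is restated here rather than taken from the text above, and the covariance expression additionally assumes $\nu > D + 1$ so that $\mathbb{E}[\boldsymbol{\Lambda}^{-1}]$ exists:

$\boldsymbol{\mu} \sim t_{\nu - D + 1}\!\left(\boldsymbol{\mu}_0,\ \frac{\mathbf{W}^{-1}}{\lambda (\nu - D + 1)}\right), \qquad \operatorname{Cov}[\boldsymbol{\mu}] = \mathbb{E}\!\left[(\lambda \boldsymbol{\Lambda})^{-1}\right] = \frac{\mathbf{W}^{-1}}{\lambda (\nu - D - 1)}.$

The two expressions agree: a multivariate t-distribution with $\nu' = \nu - D + 1$ degrees of freedom and scale matrix $\boldsymbol{\Sigma}$ has covariance $\frac{\nu'}{\nu' - 2}\boldsymbol{\Sigma}$, which reduces to the right-hand side above.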

## Posterior distribution of the parameters

After making $n$ observations $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$, the posterior distribution of the parameters is

$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_n, \lambda_n, \mathbf{W}_n, \nu_n),$ where

$\lambda_n = \lambda + n,$

$\boldsymbol{\mu}_n = \frac{\lambda \boldsymbol{\mu}_0 + n \boldsymbol{\bar{x}}}{\lambda + n},$

$\nu_n = \nu + n,$

$\mathbf{W}_n^{-1} = \mathbf{W}^{-1} + \sum_{i=1}^{n} (\boldsymbol{x}_i - \boldsymbol{\bar{x}})(\boldsymbol{x}_i - \boldsymbol{\bar{x}})^{T} + \frac{n \lambda}{n + \lambda} (\boldsymbol{\bar{x}} - \boldsymbol{\mu}_0)(\boldsymbol{\bar{x}} - \boldsymbol{\mu}_0)^{T}.$

Here $\boldsymbol{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} \boldsymbol{x}_i$ denotes the sample mean of the observations.

## Generating normal-Wishart random variates
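
As a rough sketch tying the posterior update above to the two sampling steps listed below, the following Python code first computes the updated parameters and then draws one $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ pair. The function names `posterior_params` and `sample_normal_wishart` are hypothetical, chosen for illustration. The sketch assumes NumPy and SciPy are available, that $D \geq 2$, and that SciPy's `wishart` (parameterised by `df` and `scale`) corresponds to $\mathcal{W}(\mathbf{W}, \nu)$ with `df` $= \nu$ and `scale` $= \mathbf{W}$.

```python
import numpy as np
from scipy.stats import wishart


def posterior_params(x, mu0, lam, W, nu):
    """Conjugate update NW(mu0, lam, W, nu) -> NW(mu_n, lam_n, W_n, nu_n).

    x is an (n, D) array of observations. Hypothetical helper, not a library API.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[0]
    xbar = x.mean(axis=0)
    centered = x - xbar
    scatter = centered.T @ centered            # sum_i (x_i - xbar)(x_i - xbar)^T
    d = (xbar - mu0)[:, None]                  # column vector xbar - mu0
    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * xbar) / lam_n
    W_n_inv = np.linalg.inv(W) + scatter + (n * lam / (n + lam)) * (d @ d.T)
    W_n = np.linalg.inv(W_n_inv)
    W_n = 0.5 * (W_n + W_n.T)                  # remove round-off asymmetry
    return mu_n, lam_n, W_n, nu_n


def sample_normal_wishart(mu0, lam, W, nu, rng):
    """Draw one (mu, Lambda) pair: Lambda from the Wishart, then mu given Lambda."""
    Lambda = wishart.rvs(df=nu, scale=W, random_state=rng)
    cov = np.linalg.inv(lam * Lambda)          # covariance (lam * Lambda)^{-1}
    cov = 0.5 * (cov + cov.T)
    mu = rng.multivariate_normal(mu0, cov)
    return mu, Lambda


# Usage: update an illustrative prior with synthetic data, then draw from the posterior.
rng = np.random.default_rng(0)
D = 2
mu0, lam, W, nu = np.zeros(D), 1.0, np.eye(D), D + 2.0
x = rng.normal(size=(50, D))
mu_n, lam_n, W_n, nu_n = posterior_params(x, mu0, lam, W, nu)
mu, Lambda = sample_normal_wishart(mu_n, lam_n, W_n, nu_n, rng)
```

Inverting $\lambda \boldsymbol{\Lambda}$ explicitly keeps the code close to the formulas; for large $D$, working with a Cholesky factor of the precision matrix would likely be preferable numerically.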

Generation of random variates is straightforward:

1. Sample $\boldsymbol{\Lambda}$ from a Wishart distribution with parameters $\mathbf{W}$ and $\nu$.
2. Sample $\boldsymbol{\mu}$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $(\lambda \boldsymbol{\Lambda})^{-1}$.

## Related distributions

## References

1. Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. p. 690.
2. Cross Validated, https://stats.stackexchange.com/q/324925