Notation: ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )}$
Parameters: ${\displaystyle {\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}}$ location (real vector); ${\displaystyle \lambda >0}$ (real); ${\displaystyle {\boldsymbol {\Psi }}\in \mathbb {R} ^{D\times D}}$ inverse scale matrix (pos. def.); ${\displaystyle \nu >D-1}$ (real)
Support: ${\displaystyle {\boldsymbol {\mu }}\in \mathbb {R} ^{D}}$; ${\displaystyle {\boldsymbol {\Sigma }}\in \mathbb {R} ^{D\times D}}$ covariance matrix (pos. def.)
PDF: ${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},{\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }})\ {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

## Definition

Suppose

${\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Sigma }}\sim {\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right)}$

has a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle {\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }}}$, where

${\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu \sim {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$

has an inverse Wishart distribution. Then ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$ has a normal-inverse-Wishart distribution, denoted as

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}$

## Characterization

### Probability density function

${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right){\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$

The full version of the PDF, in the parameterization with ${\displaystyle {\boldsymbol {\delta }}={\boldsymbol {\mu }}_{0}}$, ${\displaystyle \gamma =\lambda }$ and ${\displaystyle \alpha =\nu }$, is as follows:[2]

${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\delta }},\gamma ,{\boldsymbol {\Psi }},\alpha )={\frac {\gamma ^{D/2}|{\boldsymbol {\Psi }}|^{\alpha /2}|{\boldsymbol {\Sigma }}|^{-{\frac {\alpha +D+2}{2}}}}{(2\pi )^{D/2}2^{\frac {\alpha D}{2}}\Gamma _{D}({\frac {\alpha }{2}})}}\exp \left\{-{\frac {1}{2}}\operatorname {Tr} ({\boldsymbol {\Psi \Sigma }}^{-1})-{\frac {\gamma }{2}}({\boldsymbol {\mu }}-{\boldsymbol {\delta }})^{T}{\boldsymbol {\Sigma }}^{-1}({\boldsymbol {\mu }}-{\boldsymbol {\delta }})\right\}}$

Here ${\displaystyle \Gamma _{D}[\cdot ]}$ is the multivariate gamma function and ${\displaystyle \operatorname {Tr} ({\boldsymbol {\Psi }})}$ is the trace of the given matrix.

## Properties

### Marginal distributions

By construction, the marginal distribution over ${\displaystyle {\boldsymbol {\Sigma ))}$ is an inverse Wishart distribution, and the conditional distribution over ${\displaystyle {\boldsymbol {\mu ))}$ given ${\displaystyle {\boldsymbol {\Sigma ))}$ is a multivariate normal distribution. The marginal distribution over ${\displaystyle {\boldsymbol {\mu ))}$ is a multivariate t-distribution.
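Explicitly, the marginal over ${\displaystyle {\boldsymbol {\mu }}}$ is the following multivariate t-distribution (a standard result; see Murphy's note[1] for the derivation):

${\displaystyle {\boldsymbol {\mu }}\sim t_{\nu -D+1}\left({\boldsymbol {\mu }}_{0},{\frac {\boldsymbol {\Psi }}{\lambda (\nu -D+1)}}\right).}$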

## Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

${\displaystyle {\boldsymbol {y}}_{i}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$

where ${\displaystyle {\boldsymbol {y}}}$ is an ${\displaystyle n\times p}$ matrix and ${\displaystyle {\boldsymbol {y}}_{i}}$ (of length ${\displaystyle p}$) is row ${\displaystyle i}$ of the matrix.

When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}$

The resulting posterior distribution for the mean and covariance matrix will also be normal-inverse-Wishart:

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})|y\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{n},\lambda _{n},{\boldsymbol {\Psi }}_{n},\nu _{n}),}$

where

${\displaystyle {\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\bar {\boldsymbol {y}}}}{\lambda +n}}}$
${\displaystyle \lambda _{n}=\lambda +n}$
${\displaystyle \nu _{n}=\nu +n}$
${\displaystyle {\boldsymbol {\Psi }}_{n}={\boldsymbol {\Psi }}+{\boldsymbol {S}}+{\frac {\lambda n}{\lambda +n}}({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})^{T}\quad \mathrm {with} \quad {\boldsymbol {S}}=\sum _{i=1}^{n}({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})^{T}.}$
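The update equations above translate directly into code. The following NumPy sketch is illustrative (the function name and interface are this sketch's own, not from the cited sources):

```python
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Update a NIW(mu0, lam, Psi, nu) prior given an (n, p) data matrix y."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    # Scatter matrix about the sample mean: S = sum_i (y_i - ybar)(y_i - ybar)^T
    S = (y - ybar).T @ (y - ybar)
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    diff = ybar - mu0
    Psi_n = Psi + S + (lam * n / (lam + n)) * np.outer(diff, diff)
    return mu_n, lam_n, Psi_n, nu_n
```

Note that only the sample mean and the scatter matrix of the data enter the update, reflecting that these are sufficient statistics for the multivariate normal.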

To sample from the joint posterior of ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$, first draw ${\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {y}}\sim {\mathcal {W}}^{-1}({\boldsymbol {\Psi }}_{n},\nu _{n})}$, then draw ${\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\Sigma }},{\boldsymbol {y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }}_{n},{\boldsymbol {\Sigma }}/\lambda _{n})}$. To draw from the posterior predictive distribution of a new observation, draw ${\displaystyle {\boldsymbol {\tilde {y}}}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }},{\boldsymbol {y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$, given the already drawn values of ${\displaystyle {\boldsymbol {\mu }}}$ and ${\displaystyle {\boldsymbol {\Sigma }}}$.[3]
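The posterior-predictive recipe above can be sketched as follows with SciPy's inverse Wishart sampler (the function name and interface here are illustrative assumptions):

```python
import numpy as np
from scipy.stats import invwishart

def sample_posterior_predictive(mu_n, lam_n, Psi_n, nu_n, n_draws=1000, rng=None):
    """Draw new observations y~ from the NIW posterior predictive.

    For each draw: Sigma ~ W^{-1}(Psi_n, nu_n), mu ~ N(mu_n, Sigma/lam_n),
    then y~ ~ N(mu, Sigma).
    """
    rng = np.random.default_rng(rng)
    p = len(mu_n)
    draws = np.empty((n_draws, p))
    for i in range(n_draws):
        Sigma = np.atleast_2d(invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng))
        mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
        draws[i] = rng.multivariate_normal(mu, Sigma)
    return draws
```

Averaging over many such draws approximates posterior-predictive expectations without ever computing the predictive density in closed form.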

## Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

1. Sample ${\displaystyle {\boldsymbol {\Sigma }}}$ from an inverse Wishart distribution with parameters ${\displaystyle {\boldsymbol {\Psi }}}$ and ${\displaystyle \nu }$.
2. Sample ${\displaystyle {\boldsymbol {\mu }}}$ from a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle {\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }}}$.
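The two steps above can be sketched in a few lines of NumPy/SciPy (a minimal illustration; the function name is this sketch's own):

```python
import numpy as np
from scipy.stats import invwishart

def sample_niw(mu0, lam, Psi, nu, rng=None):
    """Draw one (mu, Sigma) variate from NIW(mu0, lam, Psi, nu)."""
    rng = np.random.default_rng(rng)
    # Step 1: Sigma ~ inverse Wishart with inverse scale Psi and nu degrees of freedom
    Sigma = np.atleast_2d(invwishart.rvs(df=nu, scale=Psi, random_state=rng))
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam)
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma
```

Because the two conditionals are sampled exactly, the pair returned is an exact (not approximate) draw from the joint distribution.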

## Related distributions

• The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )}$ then ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}^{-1})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }}^{-1},\nu )}$.
• The normal-inverse-gamma distribution is the one-dimensional equivalent.
• The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.

## Notes

1. ^ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
2. ^ Prince, Simon J. D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
3. ^ Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.

## References

• Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
• Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."