import pymc as pm
import numpy as np
import arviz as az
4. Priors as Hidden Mixtures
Mixture Distributions / Scale Mixtures
A mixture distribution is a probability distribution constructed as a weighted combination of two or more other probability distributions; the weights can be deterministic or random. For example, if a distribution is parameterized by a location and a scale parameter, the following procedure creates a new distribution called a scale mixture:

1. Fix the location parameter to a constant.
2. Vary the scale parameter according to another distribution.

This idea is not inherently Bayesian, but it does leverage the hierarchical structure we've discussed. You might be surprised to learn that some of the most important distributions from frequentist courses can be constructed this way.
Consider \(X \mid \sigma^{2} \sim \mathcal{N}(0,\sigma^{2})\). If, instead of fixing \(\sigma^{2}\), we draw it from an \(\text{Inverse Gamma}(\frac{\nu}{2}, \frac{\nu}{2})\) distribution, where \(\nu\) is some fixed positive real value:

\[
X \mid \sigma^{2} \sim \mathcal{N}(0, \sigma^{2}), \qquad \sigma^{2} \sim \text{Inverse Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right)
\]

we can apply the definition of a mixture distribution (see the "Uncountable mixtures" heading of the Wikipedia Mixture Distribution article) to derive the resulting marginal density:

\[
f(x) = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{x^{2}}{2\sigma^{2}}} \cdot \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \left(\sigma^{2}\right)^{-\frac{\nu}{2}-1} e^{-\frac{\nu}{2\sigma^{2}}} \, d\sigma^{2} = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},
\]

which shows that \(X\) has a Student's \(t\)-distribution with \(\nu\) degrees of freedom. Notice that we never observe \(\sigma^{2}\): we integrate it out. This is the hallmark of a hidden mixture: the mixing variable (here, \(\sigma^{2}\)) is latent.
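To see this numerically, here is a small simulation sketch (an addition, not from the original notes): drawing \(\sigma^{2}\) from the inverse gamma and then \(x\) conditionally from the normal should match direct Student's \(t\) draws.

rng = np.random.default_rng(1)
nu = 6
n = 100_000

# numpy has no inverse gamma sampler; take the reciprocal of a gamma draw
# (gamma rate nu/2 corresponds to numpy's scale parameter 2/nu)
sigma2 = 1 / rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x_mixture = rng.normal(0, np.sqrt(sigma2))
x_direct = rng.standard_t(nu, size=n)

# the empirical quantiles should agree up to simulation noise
print(np.quantile(x_mixture, [0.05, 0.25, 0.5, 0.75, 0.95]))
print(np.quantile(x_direct, [0.05, 0.25, 0.5, 0.75, 0.95]))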
Adapted from Unit 7: tasmixture.odc and Jeremymus.odc
Student's t prior with normal likelihood vs. normal prior with gamma precision
This example demonstrates the equivalence of two models: a Student's t prior on the mean, and a normal prior whose precision follows a gamma distribution. Variable names from the original BUGS problem are retained, even though they are somewhat inconsistent.

Note that BUGS parameterizes the Student's t distribution with a precision parameter tau, which corresponds to PyMC's lam parameter (sigma = lam ** -0.5), so we pass lam=tau below.
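As a quick sanity check on that parameterization (an added sketch, not from the original notes), the log-density with lam=0.4 should match the one with sigma = 0.4 ** -0.5:

# sigma = lam ** -0.5, so both lines should print the same value
print(pm.logp(pm.StudentT.dist(nu=6, mu=6, lam=0.4), 5.0).eval())
print(pm.logp(pm.StudentT.dist(nu=6, mu=6, sigma=0.4**-0.5), 5.0).eval())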
df = 6              # degrees of freedom (nu)
x = y = 10          # observed data for both models
mu0 = 6             # prior location
tau1 = 10           # likelihood precision
tau = 0.4           # Student's t precision (BUGS parameterization)
a = df / 2          # gamma shape
b = df / (2 * tau)  # gamma rate
with pm.Model() as m1:
    # model 1: Student's t prior on the mean (lam is the precision parameter)
    mu1 = pm.StudentT("mu1", nu=df, mu=mu0, lam=tau)
    pm.Normal("X", mu1, tau=tau1, observed=x)

    # model 2: normal prior with gamma-distributed precision
    prec = pm.Gamma("prec", a, b)
    mu2 = pm.Normal("mu2", mu0, tau=prec)
    pm.Normal("Y", mu2, tau=tau1, observed=y)

    trace = pm.sample(30000)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu1, prec, mu2]
Sampling 4 chains for 1_000 tune and 30_000 draw iterations (4_000 + 120_000 draws total) took 6 seconds.
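To confirm the equivalence, compare the marginal posteriors of mu1 and mu2, which should match up to Monte Carlo error. A minimal check (an addition, not part of the original output):

az.summary(trace, var_names=["mu1", "mu2"], kind="stats")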
Jeremy's IQ - Normal prior as a scale mixture of uniforms
Any symmetric unimodal distribution is a scale mixture of uniforms:

\[
y \mid \mu, \sigma^{2} \sim \mathcal{N}(\mu, \sigma^{2})
\]

is the same as

\[
y \mid \mu, d \sim U\left(\mu - \sigma\sqrt{d},\; \mu + \sigma\sqrt{d}\right), \qquad d \sim \text{Gamma}\left(\frac{3}{2}, \frac{s}{2}\right)
\]

where \(s = 1\) recovers the normal distribution exactly, \(s \lt 1\) gives heavy tails, and \(s \gt 1\) gives light tails.
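A quick simulation sketch (an addition, not from the original notes) checks the \(s = 1\) case: mixing uniforms over \(\mu \pm \sigma\sqrt{d}\) with \(d \sim \text{Gamma}(3/2, 1/2)\) should reproduce normal draws.

rng = np.random.default_rng(2)
n = 100_000
mu_sim, sigma_sim = 110, np.sqrt(120)  # prior used in the example below

d_sim = rng.gamma(shape=1.5, scale=2.0, size=n)  # rate 1/2 -> numpy scale 2
theta_mix = rng.uniform(mu_sim - sigma_sim * np.sqrt(d_sim),
                        mu_sim + sigma_sim * np.sqrt(d_sim))
theta_normal = rng.normal(mu_sim, sigma_sim, size=n)

# quantiles should match up to simulation noise
print(np.quantile(theta_mix, [0.05, 0.5, 0.95]))
print(np.quantile(theta_normal, [0.05, 0.5, 0.95]))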
precy = 0.0125        # likelihood precision (variance 80)
precth = 0.0083333    # prior precision (variance 120)
mu = 110              # prior mean
s_list = [0.5, 1, 2]  # tail-control parameter values
y = 98                # observed test score
summaries = []
for s in s_list:
    beta = s / 2
    with pm.Model() as m2:
        d = pm.Gamma("d", 1.5, beta)
        # uniform bounds: mu +/- sigma * sqrt(d), where sigma**2 = 1/precth
        lb = mu - (d / precth) ** 0.5
        ub = mu + (d / precth) ** 0.5
        theta = pm.Uniform("theta", lb, ub)
        pm.Normal("y", mu=theta, tau=precy, observed=y)
        trace = pm.sample(3000)
    summaries.append(az.summary(trace, kind="stats"))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
for s, summary in zip(s_list, summaries):
    print(f"{s=}")
    print(summary)
    print("\n")
s=0.5
mean sd hdi_3% hdi_97%
d 5.214 4.249 0.043 12.921
theta 101.093 7.655 87.282 115.913
s=1
mean sd hdi_3% hdi_97%
d 2.838 2.287 0.008 6.886
theta 102.950 6.902 90.222 116.055
s=2
mean sd hdi_3% hdi_97%
d 1.512 1.192 0.014 3.715
theta 104.845 5.870 93.745 115.580
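Note how the tail behavior controls shrinkage: the heavy-tailed prior (s=0.5) leaves the posterior mean of \(\theta\) closest to the observed \(y = 98\), while the light-tailed prior (s=2) pulls it furthest toward the prior mean of 110.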
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Wed Mar 22 2023
Python implementation: CPython
Python version : 3.11.0
IPython version : 8.9.0
pytensor: 2.10.1
pymc : 5.1.2
numpy: 1.24.2
arviz: 0.14.0