import pymc as pm
import numpy as np
import arviz as az

4. Priors as Hidden Mixtures*#

Mixture Distributions / Scale Mixtures:#

A mixture distribution is a probability distribution that is constructed as a weighted combination of two or more other probability distributions. Weights can be deterministic or random. For example, if a distribution is parameterized by a location and scale parameter, then the following procedure creates a new distribution called a scale mixture:

  • Fix the location parameter to a constant

  • Vary the scale parameter according to another distribution

This idea alone is not inherently Bayesian, but it does leverage the hierarchical structure that we’ve talked about. You might be surprised to learn that some of the most important distributions you’ve encountered in frequentist courses can be constructed this way.

Consider \(X\sim \mathcal{N}(0,\sigma^{2})\). Instead of fixing \(\sigma^{2}\) explicitly, suppose we draw it from an \(\text{Inverse Gamma}(\frac{\nu}{2} , \frac{\nu}{2})\) distribution, where \(\nu\) is some fixed positive real value:

\[\begin{split} \begin{align*} X&\sim \mathcal{N}(0,\sigma^{2})\\ \sigma^{2} &\sim \text{InvGamma}(\frac{\nu}{2} , \frac{\nu}{2}) \end{align*} \end{split}\]

We can apply the definition of a mixture distribution from Wikipedia (see the “Uncountable mixtures” heading) to derive the resulting density function:

\[\begin{align*} p(x) &= \int_0^\infty \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} \right) \cdot \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} (\sigma^2)^{-\nu/2-1} \exp\left( -\frac{\nu/2}{\sigma^2} \right) d\sigma^2 \\[2em] &= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}} \int_0^\infty (\sigma^2)^{-\frac{1}{2}-\frac{\nu}{2}-1} \exp\left( -\frac{x^2+\nu}{2\sigma^2} \right) d\sigma^2 \\[2em] &= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}} \int_0^\infty (\sigma^2)^{-\frac{\nu+1}{2}-1} \exp\left( -\frac{x^2+\nu}{2\sigma^2} \right) d\sigma^2\\ &= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}} \int_0^\infty s^{\frac{\nu+1}{2}-1} \exp\left( -\frac{x^2+\nu}{2} s \right) ds\\ &= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}} \cdot \frac{\Gamma\left( \frac{\nu+1}{2} \right)}{\left( \frac{x^2+\nu}{2} \right)^{(\nu+1)/2}}\\ &= \frac{\Gamma\left( \frac{\nu+1}{2} \right)} {\Gamma(\nu/2)\sqrt{\nu\pi} }\left(1+\frac{x^2}{\nu}\right)^{-(\nu+1)/2} \end{align*}\]

In the fourth line we substitute \(s = 1/\sigma^{2}\), which turns the integral into a Gamma integral, \(\int_0^\infty s^{a-1} e^{-bs}\, ds = \Gamma(a)/b^{a}\). The result shows that \(X\) has a Student’s \(t\)-distribution with \(\nu\) degrees of freedom. Notice that we never observe \(\sigma^{2}\): we integrate it out. This is the hallmark of a hidden mixture: the mixing variable (here, \(\sigma^{2}\)) is latent.
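As a quick numerical check of this derivation (a sketch that is not part of the original example; the seed, sample size, and \(\nu = 5\) are arbitrary choices), we can draw \(\sigma^{2}\) by inverting a Gamma draw, generate \(X\) conditionally normal, and compare quantiles with direct Student’s \(t\) draws:

rng = np.random.default_rng(0)
nu = 5
n = 200_000
# InvGamma(nu/2, nu/2) is the reciprocal of a Gamma(shape=nu/2, rate=nu/2) draw
sigma2 = 1 / rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x_mix = rng.normal(0, np.sqrt(sigma2))  # scale-mixture draws
x_t = rng.standard_t(df=nu, size=n)     # direct Student's t draws
print(np.quantile(x_mix, [0.05, 0.25, 0.5, 0.75, 0.95]))
print(np.quantile(x_t, [0.05, 0.25, 0.5, 0.75, 0.95]))

The two sets of quantiles agree up to Monte Carlo error.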

Adapted from Unit 7: tasmixture.odc and Jeremymus.odc

Student T likelihood with Normal prior vs Normal likelihood with Gamma prior#

This example demonstrates the equivalence of two ways of writing the same model: a Student’s t prior on the mean versus a normal prior on the mean whose precision is given a Gamma prior. Variable names from the original BUGS problem are retained for now, even though they are somewhat confusing.

Confusingly, the Student’s T distribution parameter sigma in PyMC is equivalent to the tau parameter in the BUGS parameterization.

df = 6              # degrees of freedom
x = y = 10          # observed data for both formulations
mu0 = 6             # prior location
tau1 = 10           # likelihood precision
tau = 0.4           # BUGS tau parameter (passed as sigma below; see note above)
a = df / 2          # Gamma shape
b = df / (2 * tau)  # Gamma rate

with pm.Model() as m1:
    # formulation 1: Student's t prior on the mean
    mu1 = pm.StudentT("mu1", nu=df, mu=mu0, sigma=tau)
    pm.Normal("X", mu1, tau=tau1, observed=x)

    # formulation 2: normal prior on the mean, Gamma prior on its precision
    prec = pm.Gamma("prec", a, b)
    mu2 = pm.Normal("mu2", mu0, tau=prec)
    pm.Normal("Y", mu2, tau=tau1, observed=y)

    trace = pm.sample(30000)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu1, prec, mu2]

Sampling 4 chains for 1_000 tune and 30_000 draw iterations (4_000 + 120_000 draws total) took 6 seconds.
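To compare the two formulations directly, one option (not shown in the original output) is to summarize the two posteriors side by side; if the parameterizations match, the summaries for mu1 and mu2 should agree up to Monte Carlo error:

az.summary(trace, var_names=["mu1", "mu2"], kind="stats")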

Jeremy’s IQ - Normal prior as a scale mixture of uniforms#

Any symmetric unimodal distribution is a scale mixture of uniforms.

\[y|\mu, \sigma^2 \sim \mathcal{N}(\mu, \sigma^2)\]

is the same as

\[y|\mu, d \sim U(\mu - \sigma d^{1/2}, \mu + \sigma d^{1/2})\]
\[d \sim Ga(3/2, 1/2)\]

More generally, if \(d \sim Ga(3/2, s/2)\), then \(s \lt 1\) gives heavier tails than the normal and \(s \gt 1\) gives lighter tails.
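Here is a quick numerical check of the \(s=1\) case (a sketch, not part of the original notebook; the seed, sample size, and the choice \(\mu = 0,\ \sigma = 1\) are arbitrary): draw \(d\) from \(Ga(3/2, 1/2)\), draw \(y\) uniformly on \((\mu - \sigma\sqrt{d},\ \mu + \sigma\sqrt{d})\), and compare quantiles with direct normal draws.

rng = np.random.default_rng(0)
n = 200_000
d = rng.gamma(shape=1.5, scale=2.0, size=n)   # Ga(3/2, rate 1/2) -> scale 2
y_mix = rng.uniform(-np.sqrt(d), np.sqrt(d))  # U(mu - sigma*sqrt(d), mu + sigma*sqrt(d)) with mu=0, sigma=1
y_norm = rng.normal(0, 1, size=n)             # direct N(0, 1) draws
print(np.quantile(y_mix, [0.05, 0.25, 0.5, 0.75, 0.95]))
print(np.quantile(y_norm, [0.05, 0.25, 0.5, 0.75, 0.95]))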

precy = 0.0125      # likelihood precision (1/80)
precth = 0.0083333  # prior precision (1/120)
mu = 110            # prior mean
s_list = [0.5, 1, 2]
y = 98              # observed value

summaries = []

for s in s_list:
    beta = s / 2  # Gamma rate parameter
    with pm.Model() as m2:
        d = pm.Gamma("d", 1.5, beta)
        # uniform bounds are mu +/- sigma_theta * sqrt(d), where sigma_theta**2 = 1 / precth
        lb = mu - (d / precth) ** 0.5
        ub = mu + (d / precth) ** 0.5

        theta = pm.Uniform("theta", lb, ub)

        pm.Normal("y", mu=theta, tau=precy, observed=y)

        trace = pm.sample(3000)

    summaries.append(az.summary(trace, kind="stats"))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
100.00% [16000/16000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
100.00% [16000/16000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
100.00% [16000/16000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
for s, summary in zip(s_list, summaries):
    print(f"{s=}")
    print(summary)
    print("\n")
s=0.5
          mean     sd  hdi_3%  hdi_97%
d        5.214  4.249   0.043   12.921
theta  101.093  7.655  87.282  115.913


s=1
          mean     sd  hdi_3%  hdi_97%
d        2.838  2.287   0.008    6.886
theta  102.950  6.902  90.222  116.055


s=2
          mean     sd  hdi_3%  hdi_97%
d        1.512  1.192   0.014    3.715
theta  104.845  5.870  93.745  115.580
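As a sanity check on the \(s=1\) run, the model reduces to the familiar conjugate normal-normal setup, so \(\theta\)'s posterior has a closed form (this check is not part of the original notebook; it just applies the standard precision-weighted formulas):

post_prec = precy + precth                        # posterior precision
post_mean = (precy * y + precth * mu) / post_prec
post_sd = post_prec**-0.5
print(post_mean, post_sd)  # about 102.8 and 6.93, close to the s=1 summary above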
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Wed Mar 22 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.9.0

pytensor: 2.10.1

pymc : 5.1.2
numpy: 1.24.2
arviz: 0.14.0