import pymc as pm
import numpy as np
import arviz as az
4. Priors as Hidden Mixtures
Mixture Distributions / Scale Mixtures
A mixture distribution is a probability distribution constructed as a weighted combination of two or more other probability distributions; the weights can be deterministic or random. For example, if a distribution is parameterized by a location and a scale parameter, the following procedure creates a new distribution called a scale mixture:

1. Fix the location parameter to a constant.
2. Vary the scale parameter according to another distribution.

This idea is not inherently Bayesian, but it does leverage the hierarchical structure we've discussed. You might be surprised to learn that some of the most important distributions from frequentist courses can be constructed this way.
Consider \(X \mid \sigma^{2} \sim \mathcal{N}(0,\sigma^{2})\). If, instead of fixing \(\sigma^{2}\), we draw it from an \(\text{Inverse Gamma}(\frac{\nu}{2}, \frac{\nu}{2})\) distribution, where \(\nu\) is some fixed positive real value:

\[
X \mid \sigma^{2} \sim \mathcal{N}(0, \sigma^{2}), \qquad \sigma^{2} \sim \text{Inverse Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right)
\]

we can apply the definition of a mixture distribution (see the "Uncountable mixtures" heading of the Wikipedia Mixture Distribution article) to derive the resulting marginal density:

\[
f(x) = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{x^{2}}{2\sigma^{2}}} \cdot \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \left(\sigma^{2}\right)^{-\frac{\nu}{2}-1} e^{-\frac{\nu}{2\sigma^{2}}} \, d\sigma^{2} = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},
\]

which shows that \(X\) has a Student's \(t\)-distribution with \(\nu\) degrees of freedom. Notice that we never observe \(\sigma^{2}\): we integrate it out. This is the hallmark of a hidden mixture: the mixing variable (here, \(\sigma^{2}\)) is latent.
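To see this numerically, here is a small simulation sketch (an addition, not from the original notes): drawing \(\sigma^{2}\) from the inverse gamma and then \(x\) conditionally from the normal should match direct Student's \(t\) draws.

rng = np.random.default_rng(1)
nu = 6
n = 100_000

# numpy has no inverse gamma sampler; take the reciprocal of a gamma draw
# (gamma rate nu/2 corresponds to numpy's scale parameter 2/nu)
sigma2 = 1 / rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x_mixture = rng.normal(0, np.sqrt(sigma2))
x_direct = rng.standard_t(nu, size=n)

# the empirical quantiles should agree up to simulation noise
print(np.quantile(x_mixture, [0.05, 0.25, 0.5, 0.75, 0.95]))
print(np.quantile(x_direct, [0.05, 0.25, 0.5, 0.75, 0.95]))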
Adapted from Unit 7: tasmixture.odc and Jeremymus.odc
Student's t prior with normal likelihood vs. normal prior with gamma precision
This example demonstrates the equivalence of two models: a Student's t prior on the mean, and a normal prior whose precision follows a gamma distribution. Variable names from the original BUGS problem are retained, even though they are somewhat inconsistent.

Note that BUGS parameterizes the Student's t distribution with a precision parameter tau, which corresponds to PyMC's lam parameter (sigma = lam ** -0.5), so we pass lam=tau below.
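As a quick sanity check on that parameterization (an added sketch, not from the original notes), the log-density with lam=0.4 should match the one with sigma = 0.4 ** -0.5:

# sigma = lam ** -0.5, so both lines should print the same value
print(pm.logp(pm.StudentT.dist(nu=6, mu=6, lam=0.4), 5.0).eval())
print(pm.logp(pm.StudentT.dist(nu=6, mu=6, sigma=0.4**-0.5), 5.0).eval())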
df = 6              # degrees of freedom (nu)
x = y = 10          # observed data for both models
mu0 = 6             # prior location
tau1 = 10           # likelihood precision
tau = 0.4           # Student's t precision (BUGS parameterization)
a = df / 2          # gamma shape
b = df / (2 * tau)  # gamma rate
with pm.Model() as m1:
    # model 1: Student's t prior on the mean (lam is the precision parameter)
    mu1 = pm.StudentT("mu1", nu=df, mu=mu0, lam=tau)
    pm.Normal("X", mu1, tau=tau1, observed=x)

    # model 2: normal prior with gamma-distributed precision
    prec = pm.Gamma("prec", a, b)
    mu2 = pm.Normal("mu2", mu0, tau=prec)
    pm.Normal("Y", mu2, tau=tau1, observed=y)

    trace = pm.sample(30000)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu1, prec, mu2]
Sampling 4 chains for 1_000 tune and 30_000 draw iterations (4_000 + 120_000 draws total) took 6 seconds.
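To confirm the equivalence, compare the marginal posteriors of mu1 and mu2, which should match up to Monte Carlo error. A minimal check (an addition, not part of the original output):

az.summary(trace, var_names=["mu1", "mu2"], kind="stats")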
Jeremy's IQ - Normal prior as a scale mixture of uniforms
Any symmetric unimodal distribution is a scale mixture of uniforms:

\[
y \mid \mu, \sigma^{2} \sim \mathcal{N}(\mu, \sigma^{2})
\]

is the same as

\[
y \mid \mu, d \sim U\left(\mu - \sigma\sqrt{d},\; \mu + \sigma\sqrt{d}\right), \qquad d \sim \text{Gamma}\left(\frac{3}{2}, \frac{s}{2}\right)
\]

where \(s = 1\) recovers the normal distribution exactly, \(s \lt 1\) gives heavy tails, and \(s \gt 1\) gives light tails.
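A quick simulation sketch (an addition, not from the original notes) checks the \(s = 1\) case: mixing uniforms over \(\mu \pm \sigma\sqrt{d}\) with \(d \sim \text{Gamma}(3/2, 1/2)\) should reproduce normal draws.

rng = np.random.default_rng(2)
n = 100_000
mu_sim, sigma_sim = 110, np.sqrt(120)  # prior used in the example below

d_sim = rng.gamma(shape=1.5, scale=2.0, size=n)  # rate 1/2 -> numpy scale 2
theta_mix = rng.uniform(mu_sim - sigma_sim * np.sqrt(d_sim),
                        mu_sim + sigma_sim * np.sqrt(d_sim))
theta_normal = rng.normal(mu_sim, sigma_sim, size=n)

# quantiles should match up to simulation noise
print(np.quantile(theta_mix, [0.05, 0.5, 0.95]))
print(np.quantile(theta_normal, [0.05, 0.5, 0.95]))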
precy = 0.0125        # likelihood precision (variance 80)
precth = 0.0083333    # prior precision (variance 120)
mu = 110              # prior mean
s_list = [0.5, 1, 2]  # tail-control parameter values
y = 98                # observed test score
summaries = []
for s in s_list:
    beta = s / 2
    with pm.Model() as m2:
        d = pm.Gamma("d", 1.5, beta)
        # uniform bounds: mu +/- sigma * sqrt(d), where sigma**2 = 1/precth
        lb = mu - (d / precth) ** 0.5
        ub = mu + (d / precth) ** 0.5
        theta = pm.Uniform("theta", lb, ub)
        pm.Normal("y", mu=theta, tau=precy, observed=y)
        trace = pm.sample(3000)
    summaries.append(az.summary(trace, kind="stats"))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [d, theta]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
for s, summary in zip(s_list, summaries):
    print(f"{s=}")
    print(summary)
    print("\n")
s=0.5
mean sd hdi_3% hdi_97%
d 5.214 4.249 0.043 12.921
theta 101.093 7.655 87.282 115.913
s=1
mean sd hdi_3% hdi_97%
d 2.838 2.287 0.008 6.886
theta 102.950 6.902 90.222 116.055
s=2
mean sd hdi_3% hdi_97%
d 1.512 1.192 0.014 3.715
theta 104.845 5.870 93.745 115.580
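Note how the tail behavior controls shrinkage: the heavy-tailed prior (s=0.5) leaves the posterior mean of \(\theta\) closest to the observed \(y = 98\), while the light-tailed prior (s=2) pulls it furthest toward the prior mean of 110.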
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Wed Mar 22 2023
Python implementation: CPython
Python version : 3.11.0
IPython version : 8.9.0
pytensor: 2.10.1
pymc : 5.1.2
numpy: 1.24.2
arviz: 0.14.0