import pymc as pm
import pandas as pd
import numpy as np
import arviz as az
import pytensor.tensor as pt
2. Dental Development*#
Adapted from Unit 10: growth.odc.
The lecture version of the response data is here.
The updated data, which should match the original paper, is available here.
Problem statement: MVN with gender-Specific means but common precision matrix#
This dataset on dental development was first introduced by Potthoff and Roy in their 1964 paper. It comprises longitudinal observations of 11 girls (gender=1) and 15 boys (gender=2).
Each subject has 4 observations, centered at times -3, -1, 1, and 3, where the units are measured in years.
The measurement for each subject is the distance (in mm) from the center of the pituitary to the pterygomaxillary fissure.
Potthoff and Roy (1964). “A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems,” Biometrika, 51, pp. 313-326.
Lecture errata#
There were only 15 rows for boys in the lecture data. I’ve fixed the dataset to match the study in this version.
Notes#
There’s an interesting discussion in the PyMC Github issues tracker on problems with the Wishart prior, and an intro to using the Lewandowski-Kurowicka-Joe distribution as a prior on the covariance matrix here. Another option might be to use Wishart-Bartlett rather than LKJCholeskyCov, but the PyMC devs still don’t recommend that.
I currently have a working version below, but am not sure if it’s correct because it doesn’t fully match the BUGS results. I split the likelihoods for male and female with a shared covariance matrix. It could just be that the difference is from the different prior on the covariance matrix, but it could also be something else.
dental = pd.read_csv("../data/dental_new.csv")
time = np.array([-3, -1, 1, 3])
girls = dental.query("Girl == 1")
boys = dental.query("Girl == 0")
girls_y = girls.iloc[:, :-1].to_numpy()
boys_y = boys.iloc[:, :-1].to_numpy()
girls_y.shape
(11, 4)
with pm.Model() as m_double:
beta1 = pm.Normal("beta1", 20, tau=0.001, shape=2)
beta2 = pm.Normal("beta2", 1, tau=0.001, shape=2)
sd_dist = pm.Normal.dist(0, 2, shape=4)
T, corr, _ = pm.LKJCholeskyCov(
"T", n=4, eta=2, sd_dist=sd_dist, compute_corr=True
)
mu_male = pm.Deterministic("mu_boys", beta1[0] + beta2[0] * time)
mu_female = pm.Deterministic("mu_girls", beta1[1] + beta2[1] * time)
pm.MvNormal(
"likelihood_boys", mu_male, chol=T, shape=(16, 4), observed=boys_y
)
pm.MvNormal(
"likelihood_girls", mu_female, chol=T, shape=(11, 4), observed=girls_y
)
pm.Deterministic("corr", corr)
trace = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta1, beta2, T]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
az.summary(trace, var_names="beta", filter_vars="like")
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
beta1[0] | 24.943 | 0.462 | 24.079 | 25.812 | 0.006 | 0.004 | 5482.0 | 2889.0 | 1.00 |
beta1[1] | 22.647 | 0.504 | 21.713 | 23.595 | 0.007 | 0.005 | 5038.0 | 2715.0 | 1.01 |
beta2[0] | 0.864 | 0.101 | 0.680 | 1.062 | 0.002 | 0.001 | 4338.0 | 3040.0 | 1.00 |
beta2[1] | 0.476 | 0.111 | 0.262 | 0.676 | 0.002 | 0.001 | 4939.0 | 2816.0 | 1.00 |
az.summary(trace, var_names="corr")
/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
(between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
corr[0, 0] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 4000.0 | 4000.0 | NaN |
corr[0, 1] | 0.415 | 0.147 | 0.129 | 0.667 | 0.002 | 0.002 | 3706.0 | 3057.0 | 1.0 |
corr[0, 2] | 0.507 | 0.135 | 0.268 | 0.759 | 0.002 | 0.002 | 3677.0 | 3099.0 | 1.0 |
corr[0, 3] | 0.340 | 0.156 | 0.043 | 0.616 | 0.003 | 0.002 | 2954.0 | 2893.0 | 1.0 |
corr[1, 0] | 0.415 | 0.147 | 0.129 | 0.667 | 0.002 | 0.002 | 3706.0 | 3057.0 | 1.0 |
corr[1, 1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 3944.0 | 3758.0 | 1.0 |
corr[1, 2] | 0.411 | 0.147 | 0.134 | 0.673 | 0.002 | 0.002 | 3677.0 | 3032.0 | 1.0 |
corr[1, 3] | 0.623 | 0.116 | 0.405 | 0.823 | 0.002 | 0.001 | 3346.0 | 2861.0 | 1.0 |
corr[2, 0] | 0.507 | 0.135 | 0.268 | 0.759 | 0.002 | 0.002 | 3677.0 | 3099.0 | 1.0 |
corr[2, 1] | 0.411 | 0.147 | 0.134 | 0.673 | 0.002 | 0.002 | 3677.0 | 3032.0 | 1.0 |
corr[2, 2] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 3923.0 | 4000.0 | 1.0 |
corr[2, 3] | 0.609 | 0.114 | 0.404 | 0.812 | 0.002 | 0.001 | 4690.0 | 2838.0 | 1.0 |
corr[3, 0] | 0.340 | 0.156 | 0.043 | 0.616 | 0.003 | 0.002 | 2954.0 | 2893.0 | 1.0 |
corr[3, 1] | 0.623 | 0.116 | 0.405 | 0.823 | 0.002 | 0.001 | 3346.0 | 2861.0 | 1.0 |
corr[3, 2] | 0.609 | 0.114 | 0.404 | 0.812 | 0.002 | 0.001 | 4690.0 | 2838.0 | 1.0 |
corr[3, 3] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 3779.0 | 3644.0 | 1.0 |
az.summary(trace, var_names=["mu_boys", "mu_girls"])
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
mu_boys[0] | 22.353 | 0.526 | 21.368 | 23.332 | 0.007 | 0.005 | 5618.0 | 2769.0 | 1.00 |
mu_boys[1] | 24.080 | 0.462 | 23.213 | 24.935 | 0.006 | 0.004 | 5699.0 | 2916.0 | 1.00 |
mu_boys[2] | 25.807 | 0.483 | 24.914 | 26.713 | 0.007 | 0.005 | 5210.0 | 2903.0 | 1.00 |
mu_boys[3] | 27.534 | 0.578 | 26.501 | 28.665 | 0.008 | 0.006 | 4760.0 | 2889.0 | 1.00 |
mu_girls[0] | 21.219 | 0.571 | 20.184 | 22.331 | 0.008 | 0.005 | 5669.0 | 2573.0 | 1.00 |
mu_girls[1] | 22.171 | 0.504 | 21.227 | 23.098 | 0.007 | 0.005 | 5311.0 | 2836.0 | 1.00 |
mu_girls[2] | 23.123 | 0.529 | 22.109 | 24.084 | 0.008 | 0.005 | 4816.0 | 2660.0 | 1.01 |
mu_girls[3] | 24.075 | 0.635 | 22.900 | 25.262 | 0.009 | 0.007 | 4559.0 | 2841.0 | 1.00 |
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Tue Nov 19 2024
Python implementation: CPython
Python version : 3.12.7
IPython version : 8.29.0
pytensor: 2.26.0
arviz : 0.20.0
pandas : 2.2.3
pytensor: 2.26.0
pymc : 5.18.0
numpy : 1.26.4