import pymc as pm
import pandas as pd
import numpy as np
import arviz as az
import pytensor.tensor as pt

2. Dental Development*#

Adapted from Unit 10: growth.odc.

The lecture version of the response data is here.

The updated data, which should match the original paper, is available here.

Problem statement: MVN with gender-Specific means but common precision matrix#

This dataset on dental development was first introduced by Potthoff and Roy in their 1964 paper. It comprises longitudinal observations of 11 girls (gender=1) and 15 boys (gender=2).

Each subject has 4 observations, centered at times -3, -1, 1, and 3, where the units are measured in years.

The measurement for each subject is the distance (in mm) from the center of the pituitary to the pterygomaxillary fissure.

Potthoff and Roy (1964). “A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems,” Biometrika, 51, pp. 313-326.

Lecture errata#

There were only 15 rows for boys in the lecture data. I’ve fixed the dataset to match the study in this version.

Notes#

There’s an interesting discussion in the PyMC Github issues tracker on problems with the Wishart prior, and an intro to using the Lewandowski-Kurowicka-Joe distribution as a prior on the covariance matrix here. Another option might be to use Wishart-Bartlett rather than LKJCholeskyCov, but the PyMC devs still don’t recommend that.

I currently have a working version below, but am not sure if it’s correct because it doesn’t fully match the BUGS results. I split the likelihoods for male and female with a shared covariance matrix. It could just be that the difference is from the different prior on the covariance matrix, but it could also be something else.

dental = pd.read_csv("../data/dental_new.csv")
time = np.array([-3, -1, 1, 3])
girls = dental.query("Girl == 1")
boys = dental.query("Girl == 0")
girls_y = girls.iloc[:, :-1].to_numpy()
boys_y = boys.iloc[:, :-1].to_numpy()
girls_y.shape
(11, 4)
with pm.Model() as m_double:
    beta1 = pm.Normal("beta1", 20, tau=0.001, shape=2)
    beta2 = pm.Normal("beta2", 1, tau=0.001, shape=2)

    sd_dist = pm.Normal.dist(0, 2, shape=4)
    T, corr, _ = pm.LKJCholeskyCov(
        "T", n=4, eta=2, sd_dist=sd_dist, compute_corr=True
    )

    mu_male = pm.Deterministic("mu_boys", beta1[0] + beta2[0] * time)
    mu_female = pm.Deterministic("mu_girls", beta1[1] + beta2[1] * time)

    pm.MvNormal(
        "likelihood_boys", mu_male, chol=T, shape=(16, 4), observed=boys_y
    )
    pm.MvNormal(
        "likelihood_girls", mu_female, chol=T, shape=(11, 4), observed=girls_y
    )

    pm.Deterministic("corr", corr)

    trace = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta1, beta2, T]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
az.summary(trace, var_names="beta", filter_vars="like")
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta1[0] 24.943 0.462 24.079 25.812 0.006 0.004 5482.0 2889.0 1.00
beta1[1] 22.647 0.504 21.713 23.595 0.007 0.005 5038.0 2715.0 1.01
beta2[0] 0.864 0.101 0.680 1.062 0.002 0.001 4338.0 3040.0 1.00
beta2[1] 0.476 0.111 0.262 0.676 0.002 0.001 4939.0 2816.0 1.00
az.summary(trace, var_names="corr")
/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
corr[0, 0] 1.000 0.000 1.000 1.000 0.000 0.000 4000.0 4000.0 NaN
corr[0, 1] 0.415 0.147 0.129 0.667 0.002 0.002 3706.0 3057.0 1.0
corr[0, 2] 0.507 0.135 0.268 0.759 0.002 0.002 3677.0 3099.0 1.0
corr[0, 3] 0.340 0.156 0.043 0.616 0.003 0.002 2954.0 2893.0 1.0
corr[1, 0] 0.415 0.147 0.129 0.667 0.002 0.002 3706.0 3057.0 1.0
corr[1, 1] 1.000 0.000 1.000 1.000 0.000 0.000 3944.0 3758.0 1.0
corr[1, 2] 0.411 0.147 0.134 0.673 0.002 0.002 3677.0 3032.0 1.0
corr[1, 3] 0.623 0.116 0.405 0.823 0.002 0.001 3346.0 2861.0 1.0
corr[2, 0] 0.507 0.135 0.268 0.759 0.002 0.002 3677.0 3099.0 1.0
corr[2, 1] 0.411 0.147 0.134 0.673 0.002 0.002 3677.0 3032.0 1.0
corr[2, 2] 1.000 0.000 1.000 1.000 0.000 0.000 3923.0 4000.0 1.0
corr[2, 3] 0.609 0.114 0.404 0.812 0.002 0.001 4690.0 2838.0 1.0
corr[3, 0] 0.340 0.156 0.043 0.616 0.003 0.002 2954.0 2893.0 1.0
corr[3, 1] 0.623 0.116 0.405 0.823 0.002 0.001 3346.0 2861.0 1.0
corr[3, 2] 0.609 0.114 0.404 0.812 0.002 0.001 4690.0 2838.0 1.0
corr[3, 3] 1.000 0.000 1.000 1.000 0.000 0.000 3779.0 3644.0 1.0
az.summary(trace, var_names=["mu_boys", "mu_girls"])
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
mu_boys[0] 22.353 0.526 21.368 23.332 0.007 0.005 5618.0 2769.0 1.00
mu_boys[1] 24.080 0.462 23.213 24.935 0.006 0.004 5699.0 2916.0 1.00
mu_boys[2] 25.807 0.483 24.914 26.713 0.007 0.005 5210.0 2903.0 1.00
mu_boys[3] 27.534 0.578 26.501 28.665 0.008 0.006 4760.0 2889.0 1.00
mu_girls[0] 21.219 0.571 20.184 22.331 0.008 0.005 5669.0 2573.0 1.00
mu_girls[1] 22.171 0.504 21.227 23.098 0.007 0.005 5311.0 2836.0 1.00
mu_girls[2] 23.123 0.529 22.109 24.084 0.008 0.005 4816.0 2660.0 1.01
mu_girls[3] 24.075 0.635 22.900 25.262 0.009 0.007 4559.0 2841.0 1.00
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Tue Nov 19 2024

Python implementation: CPython
Python version       : 3.12.7
IPython version      : 8.29.0

pytensor: 2.26.0

arviz   : 0.20.0
pandas  : 2.2.3
pytensor: 2.26.0
pymc    : 5.18.0
numpy   : 1.26.4