import pymc as pm
import pandas as pd
import numpy as np
import arviz as az
import pytensor.tensor as pt

2. Dental Development*#

Adapted from Unit 10: growth.odc.

The lecture version of the response data is here.

The updated data, which should match the original paper, is available here.

Problem statement: MVN with gender-specific means but common precision matrix#

This dataset on dental development was first introduced by Potthoff and Roy in their 1964 paper. It comprises longitudinal observations of 11 girls (gender=1) and 16 boys (gender=2).

Each subject has 4 observations at the centered times -3, -1, 1, and 3, measured in years (the measurements were taken at ages 8, 10, 12, and 14, centered at age 11).

The measurement for each subject is the distance (in mm) from the center of the pituitary to the pterygomaxillary fissure.

Potthoff and Roy (1964). “A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems,” Biometrika, 51, pp. 313-326.
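Just to make the time scale concrete: the centered times are the measurement ages (8, 10, 12, and 14 in the original study) shifted by the midpoint age of 11.

ages = np.array([8, 10, 12, 14])  # measurement ages in the original study
print(ages - 11)  # centered at age 11 -> [-3 -1  1  3]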

Lecture errata#

The lecture data had only 15 rows for boys, while the original study has 16. I’ve fixed the dataset to match the study in this version.

Notes#

There’s an interesting discussion in the PyMC GitHub issue tracker on problems with the Wishart prior, and an intro to using the Lewandowski-Kurowicka-Joe (LKJ) distribution as a prior on the covariance matrix here. Another option might be to use Wishart-Bartlett rather than LKJCholeskyCov, but the PyMC devs still don’t recommend that.
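For orientation before the full model, here’s a minimal sketch of the LKJCholeskyCov approach (the Exponential prior on the standard deviations and the variable names here are just illustrative choices, not what the model below uses):

with pm.Model():
    # illustrative prior on the 4 standard deviations (any positive distribution works)
    sd_dist = pm.Exponential.dist(1.0, shape=4)
    # Cholesky factor of the covariance matrix, with an LKJ prior on the correlations
    chol, corr, stds = pm.LKJCholeskyCov("chol_cov", n=4, eta=2, sd_dist=sd_dist, compute_corr=True)
    # recover the full covariance matrix if needed
    cov = pm.Deterministic("cov", pt.dot(chol, chol.T))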

I currently have a working version below, but I’m not sure it’s correct because it doesn’t fully match the BUGS results. I split the likelihood into separate terms for girls and boys that share a common covariance matrix. The difference could just come from using a different prior on the covariance matrix (LKJ here rather than the Wishart used in the BUGS model), but it could also be something else.

dental = pd.read_csv("../data/dental_new.csv")

# centered measurement times (ages 8, 10, 12, 14 centered at age 11)
time = np.array([-3, -1, 1, 3])

girls = dental.query("Girl == 1")
boys = dental.query("Girl == 0")

# keep only the four distance measurements per subject
girls_y = girls.iloc[:, :-1].to_numpy()  # shape (11, 4)
boys_y = boys.iloc[:, :-1].to_numpy()  # shape (16, 4)

with pm.Model() as m_double:
    # intercepts and slopes; index 0 is girls, index 1 is boys
    beta1 = pm.Normal("beta1", 20, tau=0.001, shape=2)
    beta2 = pm.Normal("beta2", 1, tau=0.001, shape=2)

    # LKJ prior on the shared covariance matrix via its Cholesky factor
    # (note: sd_dist is a prior on standard deviations, so a positive
    # distribution such as HalfNormal would be more conventional here)
    sd_dist = pm.Normal.dist(0, 2, shape=4)
    T, corr, _ = pm.LKJCholeskyCov("T", n=4, eta=2, sd_dist=sd_dist, compute_corr=True)

    # gender-specific mean growth curves
    mu_girls = pm.Deterministic("mu_girls", beta1[0] + beta2[0] * time)
    mu_boys = pm.Deterministic("mu_boys", beta1[1] + beta2[1] * time)

    # separate likelihoods for girls and boys sharing the same Cholesky factor
    pm.MvNormal("likelihood_girls", mu_girls, chol=T, shape=(11, 4), observed=girls_y)
    pm.MvNormal("likelihood_boys", mu_boys, chol=T, shape=(16, 4), observed=boys_y)

    pm.Deterministic("corr", corr)

    trace = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta1, beta2, T]
100.00% [8000/8000 00:02<00:00 Sampling 4 chains, 0 divergences]
/Users/aaron/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/compile/function/types.py:970: RuntimeWarning: invalid value encountered in accumulate
  self.vm()
/Users/aaron/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/compile/function/types.py:970: RuntimeWarning: invalid value encountered in accumulate
  self.vm()
/Users/aaron/mambaforge/envs/pymc/lib/python3.11/site-packages/pytensor/compile/function/types.py:970: RuntimeWarning: invalid value encountered in accumulate
  self.vm()
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
az.summary(trace, var_names="beta", filter_vars="like")
|          | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|----------|--------|-------|--------|---------|-----------|---------|----------|----------|-------|
| beta1[0] | 22.664 | 0.509 | 21.744 | 23.647  | 0.007     | 0.005   | 5807.0   | 2831.0   | 1.0   |
| beta1[1] | 24.940 | 0.447 | 24.129 | 25.789  | 0.006     | 0.004   | 5139.0   | 2933.0   | 1.0   |
| beta2[0] | 0.476  | 0.110 | 0.269  | 0.682   | 0.002     | 0.001   | 5355.0   | 2797.0   | 1.0   |
| beta2[1] | 0.865  | 0.102 | 0.664  | 1.045   | 0.001     | 0.001   | 5063.0   | 3028.0   | 1.0   |
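One thing we can do with these draws (this isn’t in the lecture code, and it assumes PyMC’s default dimension name beta2_dim_0) is look directly at the posterior difference in slopes between boys and girls:

beta2_post = trace.posterior["beta2"]
slope_diff = beta2_post.sel(beta2_dim_0=1) - beta2_post.sel(beta2_dim_0=0)  # boys minus girls
print(f"P(boys slope > girls slope) = {(slope_diff > 0).mean().item():.3f}")
print(f"posterior mean difference = {slope_diff.mean().item():.3f}")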
az.summary(trace, var_names="corr")
/Users/aaron/mambaforge/envs/pymc/lib/python3.11/site-packages/arviz/stats/diagnostics.py:592: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
|            | mean  | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|------------|-------|-------|--------|---------|-----------|---------|----------|----------|-------|
| corr[0, 0] | 1.000 | 0.000 | 1.000  | 1.000   | 0.000     | 0.000   | 4000.0   | 4000.0   | NaN   |
| corr[0, 1] | 0.413 | 0.146 | 0.147  | 0.683   | 0.003     | 0.002   | 3404.0   | 2840.0   | 1.0   |
| corr[0, 2] | 0.500 | 0.138 | 0.235  | 0.741   | 0.002     | 0.002   | 3368.0   | 2659.0   | 1.0   |
| corr[0, 3] | 0.334 | 0.158 | 0.025  | 0.613   | 0.003     | 0.002   | 2490.0   | 2991.0   | 1.0   |
| corr[1, 0] | 0.413 | 0.146 | 0.147  | 0.683   | 0.003     | 0.002   | 3404.0   | 2840.0   | 1.0   |
| corr[1, 1] | 1.000 | 0.000 | 1.000  | 1.000   | 0.000     | 0.000   | 3503.0   | 3524.0   | 1.0   |
| corr[1, 2] | 0.413 | 0.142 | 0.148  | 0.669   | 0.002     | 0.002   | 3780.0   | 2744.0   | 1.0   |
| corr[1, 3] | 0.626 | 0.117 | 0.408  | 0.823   | 0.002     | 0.001   | 3561.0   | 3120.0   | 1.0   |
| corr[2, 0] | 0.500 | 0.138 | 0.235  | 0.741   | 0.002     | 0.002   | 3368.0   | 2659.0   | 1.0   |
| corr[2, 1] | 0.413 | 0.142 | 0.148  | 0.669   | 0.002     | 0.002   | 3780.0   | 2744.0   | 1.0   |
| corr[2, 2] | 1.000 | 0.000 | 1.000  | 1.000   | 0.000     | 0.000   | 3911.0   | 3785.0   | 1.0   |
| corr[2, 3] | 0.608 | 0.113 | 0.392  | 0.800   | 0.002     | 0.001   | 4558.0   | 2589.0   | 1.0   |
| corr[3, 0] | 0.334 | 0.158 | 0.025  | 0.613   | 0.003     | 0.002   | 2490.0   | 2991.0   | 1.0   |
| corr[3, 1] | 0.626 | 0.117 | 0.408  | 0.823   | 0.002     | 0.001   | 3561.0   | 3120.0   | 1.0   |
| corr[3, 2] | 0.608 | 0.113 | 0.392  | 0.800   | 0.002     | 0.001   | 4558.0   | 2589.0   | 1.0   |
| corr[3, 3] | 1.000 | 0.000 | 1.000  | 1.000   | 0.000     | 0.000   | 3707.0   | 3953.0   | 1.0   |
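If you want the covariance matrix itself rather than the correlations, you can reassemble it from the posterior draws. This sketch assumes the standard deviations were stored in the trace under the default name T_stds (PyMC’s store_in_trace=True default for LKJCholeskyCov):

post = trace.posterior
stds_mean = post["T_stds"].mean(dim=("chain", "draw")).values  # shape (4,)
corr_mean = post["corr"].mean(dim=("chain", "draw")).values  # shape (4, 4)
# approximate posterior-mean covariance matrix, diag(s) @ R @ diag(s)
cov_mean = np.diag(stds_mean) @ corr_mean @ np.diag(stds_mean)
print(np.round(cov_mean, 2))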
az.summary(trace, var_names=["mu_girls", "mu_boys"])

|             | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|-------------|--------|-------|--------|---------|-----------|---------|----------|----------|-------|
| mu_girls[0] | 21.235 | 0.578 | 20.207 | 22.399  | 0.008     | 0.005   | 5646.0   | 3012.0   | 1.0   |
| mu_girls[1] | 22.188 | 0.510 | 21.279 | 23.194  | 0.007     | 0.005   | 5785.0   | 3034.0   | 1.0   |
| mu_girls[2] | 23.140 | 0.531 | 22.117 | 24.106  | 0.007     | 0.005   | 5622.0   | 3215.0   | 1.0   |
| mu_girls[3] | 24.092 | 0.634 | 22.858 | 25.239  | 0.009     | 0.006   | 5449.0   | 3052.0   | 1.0   |
| mu_boys[0]  | 22.344 | 0.508 | 21.342 | 23.232  | 0.007     | 0.005   | 5833.0   | 3474.0   | 1.0   |
| mu_boys[1]  | 24.075 | 0.445 | 23.246 | 24.900  | 0.006     | 0.004   | 5368.0   | 3210.0   | 1.0   |
| mu_boys[2]  | 25.805 | 0.471 | 24.960 | 26.695  | 0.007     | 0.005   | 5010.0   | 3140.0   | 1.0   |
| mu_boys[3]  | 27.536 | 0.572 | 26.458 | 28.584  | 0.008     | 0.006   | 4758.0   | 3029.0   | 1.0   |
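As a quick visual check (not part of the original notebook), we can plot the posterior-mean growth lines against the observed group means:

import matplotlib.pyplot as plt

post = trace.posterior
for name, y in [("mu_girls", girls_y), ("mu_boys", boys_y)]:
    fitted = post[name].mean(dim=("chain", "draw")).values  # posterior-mean line at the 4 times
    plt.plot(time, fitted, label=f"{name} (posterior mean)")
    plt.plot(time, y.mean(axis=0), "o", label=f"{name} (observed mean)")
plt.xlabel("centered time (years)")
plt.ylabel("distance (mm)")
plt.legend()
plt.show()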
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Tue Nov 14 2023

Python implementation: CPython
Python version       : 3.11.5
IPython version      : 8.15.0

pytensor: 2.17.1

pymc    : 5.9.0
pandas  : 2.1.0
numpy   : 1.25.2
arviz   : 0.16.1
pytensor: 2.17.1