Dental Development*

2. Dental Development*#

The lecture version of the response data is here.

The updated data, which should match the original paper, is available here.

Problem statement: MVN with gender-Specific means but common precision matrix#

This dataset on dental development was first introduced by Potthoff and Roy in their 1964 paper. It comprises longitudinal observations of 11 girls (gender=1) and 15 boys (gender=2).

Each subject has 4 observations, centered at times -3, -1, 1, and 3, where the units are measured in years.

The measurement for each subject is the distance (in mm) from the center of the pituitary to the pterygomaxillary fissure.

Potthoff and Roy (1964). “A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems,” Biometrika, 51, pp. 313-326.

Lecture errata#

There were only 15 rows for boys in the lecture data. I’ve fixed the dataset to match the study in this version.

Notes#

There’s an interesting discussion in the PyMC Github issues tracker on problems with the Wishart prior, and an intro to using the Lewandowski-Kurowicka-Joe distribution as a prior on the covariance matrix here. Another option might be to use Wishart-Bartlett rather than LKJCholeskyCov, but the PyMC devs still don’t recommend that.

I currently have a working version below, but am not sure if it’s correct because it doesn’t fully match the BUGS results. I split the likelihoods for male and female with a shared covariance matrix. It could just be that the difference is from the different prior on the covariance matrix, but it could also be something else.

dental = pd.read_csv("../data/dental_new.csv")

time = np.array([-3, -1, 1, 3])
girls = dental.query("Girl == 1")
boys = dental.query("Girl == 0")
girls_y = girls.iloc[:, :-1].to_numpy()
boys_y = boys.iloc[:, :-1].to_numpy()

girls_y.shape

(11, 4)

with pm.Model() as m_double:
    beta1 = pm.Normal("beta1", 20, tau=0.001, shape=2)
    beta2 = pm.Normal("beta2", 1, tau=0.001, shape=2)

    sd_dist = pm.Normal.dist(0, 2, shape=4)
    T, corr, _ = pm.LKJCholeskyCov(
        "T", n=4, eta=2, sd_dist=sd_dist, compute_corr=True
    )

    mu_male = pm.Deterministic("mu_boys", beta1[0] + beta2[0] * time)
    mu_female = pm.Deterministic("mu_girls", beta1[1] + beta2[1] * time)

    pm.MvNormal(
        "likelihood_boys", mu_male, chol=T, shape=(16, 4), observed=boys_y
    )
    pm.MvNormal(
        "likelihood_girls", mu_female, chol=T, shape=(11, 4), observed=girls_y
    )

    pm.Deterministic("corr", corr)

    trace = pm.sample(1000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta1, beta2, T]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.

az.summary(trace, var_names="beta", filter_vars="like")

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
beta1[0]	24.943	0.462	24.079	25.812	0.006	0.004	5482.0	2889.0	1.00
beta1[1]	22.647	0.504	21.713	23.595	0.007	0.005	5038.0	2715.0	1.01
beta2[0]	0.864	0.101	0.680	1.062	0.002	0.001	4338.0	3040.0	1.00
beta2[1]	0.476	0.111	0.262	0.676	0.002	0.001	4939.0	2816.0	1.00

az.summary(trace, var_names="corr")

/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/arviz/stats/diagnostics.py:596: RuntimeWarning: invalid value encountered in scalar divide
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
corr[0, 0]	1.000	0.000	1.000	1.000	0.000	0.000	4000.0	4000.0	NaN
corr[0, 1]	0.415	0.147	0.129	0.667	0.002	0.002	3706.0	3057.0	1.0
corr[0, 2]	0.507	0.135	0.268	0.759	0.002	0.002	3677.0	3099.0	1.0
corr[0, 3]	0.340	0.156	0.043	0.616	0.003	0.002	2954.0	2893.0	1.0
corr[1, 0]	0.415	0.147	0.129	0.667	0.002	0.002	3706.0	3057.0	1.0
corr[1, 1]	1.000	0.000	1.000	1.000	0.000	0.000	3944.0	3758.0	1.0
corr[1, 2]	0.411	0.147	0.134	0.673	0.002	0.002	3677.0	3032.0	1.0
corr[1, 3]	0.623	0.116	0.405	0.823	0.002	0.001	3346.0	2861.0	1.0
corr[2, 0]	0.507	0.135	0.268	0.759	0.002	0.002	3677.0	3099.0	1.0
corr[2, 1]	0.411	0.147	0.134	0.673	0.002	0.002	3677.0	3032.0	1.0
corr[2, 2]	1.000	0.000	1.000	1.000	0.000	0.000	3923.0	4000.0	1.0
corr[2, 3]	0.609	0.114	0.404	0.812	0.002	0.001	4690.0	2838.0	1.0
corr[3, 0]	0.340	0.156	0.043	0.616	0.003	0.002	2954.0	2893.0	1.0
corr[3, 1]	0.623	0.116	0.405	0.823	0.002	0.001	3346.0	2861.0	1.0
corr[3, 2]	0.609	0.114	0.404	0.812	0.002	0.001	4690.0	2838.0	1.0
corr[3, 3]	1.000	0.000	1.000	1.000	0.000	0.000	3779.0	3644.0	1.0

az.summary(trace, var_names=["mu_boys", "mu_girls"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
mu_boys[0]	22.353	0.526	21.368	23.332	0.007	0.005	5618.0	2769.0	1.00
mu_boys[1]	24.080	0.462	23.213	24.935	0.006	0.004	5699.0	2916.0	1.00
mu_boys[2]	25.807	0.483	24.914	26.713	0.007	0.005	5210.0	2903.0	1.00
mu_boys[3]	27.534	0.578	26.501	28.665	0.008	0.006	4760.0	2889.0	1.00
mu_girls[0]	21.219	0.571	20.184	22.331	0.008	0.005	5669.0	2573.0	1.00
mu_girls[1]	22.171	0.504	21.227	23.098	0.007	0.005	5311.0	2836.0	1.00
mu_girls[2]	23.123	0.529	22.109	24.084	0.008	0.005	4816.0	2660.0	1.01
mu_girls[3]	24.075	0.635	22.900	25.262	0.009	0.007	4559.0	2841.0	1.00

%load_ext watermark
%watermark -n -u -v -iv -p pytensor

Last updated: Tue Nov 19 2024

Python implementation: CPython
Python version       : 3.12.7
IPython version      : 8.29.0

pytensor: 2.26.0

arviz   : 0.20.0
pandas  : 2.2.3
pytensor: 2.26.0
pymc    : 5.18.0
numpy   : 1.26.4

Dental Development*

Contents

2. Dental Development*#

Problem statement: MVN with gender-Specific means but common precision matrix#

Lecture errata#

Notes#