import pymc as pm
import numpy as np
import arviz as az
import pandas as pd

6. Gesell*#

Adapted from Unit 9: Gesell.odc.

Gesell Developmental Schedules measure child development. The scoring was devised by the American child psychologist and pediatrician Arnold Gesell, who founded the Clinic of Child Development at Yale in 1911 and directed it for many years. The Gesell Developmental Schedules gauge the status of a child's motor and language development and personal-social and adaptive behaviors.

For 21 children, the age in months at which they first spoke and their Gesell Adaptive Score (the result of an aptitude test taken much later) were recorded. These data were originally collected by L. M. Linde of UCLA but were first published in M. R. Mickey, O. J. Dunn, and V. Clark, “Note on the use of stepwise regression in detecting outliers,” Computers and Biomedical Research, 1 (1967), pp. 105–111.

The data have been used by several authors, e.g., N. R. Draper and J. A. John, “Influential observations and outliers in regression,” Technometrics, 23 (1981), pp. 21-26.

Let \(X\) be the age in months a child speaks his/her first word and let \(Y\) be the Gesell adaptive score, a measure of a child’s aptitude (observed later on):

| Child | Age | Score |
| --- | --- | --- |
| 1 | 15 | 95 |
| 2 | 26 | 71 |
| 3 | 10 | 83 |
| 4 | 9 | 91 |
| 5 | 15 | 102 |
| 6 | 20 | 87 |
| 7 | 18 | 93 |
| 8 | 11 | 100 |
| 9 | 8 | 104 |
| 10 | 20 | 94 |
| 11 | 7 | 113 |
| 12 | 9 | 96 |
| 13 | 10 | 83 |
| 14 | 11 | 84 |
| 15 | 11 | 102 |
| 16 | 10 | 100 |
| 17 | 12 | 105 |
| 18 | 42 | 57 |
| 19 | 17 | 121 |
| 20 | 11 | 86 |
| 21 | 10 | 100 |

A correlation of \(r = -0.640\) indicates a moderately strong negative relationship between age at first word spoken and Gesell score.
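The correlation is easy to verify directly from the table above; a quick check with NumPy:

```python
import numpy as np

# Gesell data: age at first word (months) and adaptive score
x = np.array([15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7,
              9, 10, 11, 11, 10, 12, 42, 17, 11, 10])
y = np.array([95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113,
              96, 83, 84, 102, 100, 105, 57, 121, 86, 100])

# Pearson correlation between age and score
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # -0.64
```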

How does the child’s aptitude change with how long it takes them to speak?

Are there any influential observations/potential outliers?

The solution follows a two-stage modeling procedure:

  1. Obtain the posterior mean ICPOs, then take their reciprocals (these are the CPOs) and use them as input for model B.

  2. Read in the CPOs and calibrate them against the posterior ordinates.
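For reference, step 1 rests on the standard identity for the conditional predictive ordinate (not spelled out in the original): the CPO for observation \(i\) is the leave-one-out predictive density, and it equals the reciprocal of the posterior mean of the inverse likelihood,

\[
\mathrm{CPO}_i = f(y_i \mid y_{-i}) = \left[\, \mathbb{E}_{\theta \mid y}\!\left( \frac{1}{f(y_i \mid \theta)} \right) \right]^{-1},
\]

so averaging the ICPO draws \(1/f(y_i \mid \theta)\) and inverting the mean estimates \(\mathrm{CPO}_i\).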

data = pd.read_csv("../data/gesell.csv")
y = data["Score"].to_numpy()
x = data["Age"].to_numpy()

# the same data hardcoded, equivalent to the CSV above
# fmt: off
x = np.array(
    (15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7, 9, 10, 11, 11, 10, 12, 42, 17, 11, 10)
)

y = np.array(
    (95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113, 96, 83, 84, 102, 100, 105, 57, 121, 86, 100)
)
# fmt: on

Part 1#

with pm.Model() as model_a:
    tau = pm.Exponential("tau", 0.001)  # error precision
    s2 = pm.Deterministic("s2", 1 / tau)  # error variance
    b = pm.Normal("b", 0, tau=0.00001, shape=2)  # vague priors: intercept, slope

    m = b[0] + b[1] * x  # regression mean
    r = y - m  # residuals
    # normal density of each observation at the current parameter values
    f = pm.math.sqrt(tau / (2 * np.pi)) * pm.math.exp(-0.5 * tau * r**2)
    # inverse conditional predictive ordinate per draw
    icpo = pm.Deterministic("icpo", 1 / f)
    pm.Normal("likelihood", mu=m, tau=tau, observed=y)

    trace = pm.sample(100000, target_accept=0.99)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [tau, b]

Sampling 4 chains for 1_000 tune and 100_000 draw iterations (4_000 + 400_000 draws total) took 58 seconds.
icpo_df = az.summary(trace, var_names="icpo", round_to=5, filter_vars="like")
icpo_df
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
icpo[0] 28.48327 4.77307 20.41117 37.51935 0.01270 0.01188 148960.05627 173110.29799 1.00002
icpo[1] 47.92952 24.85691 22.41912 86.46739 0.05466 0.31052 220511.62804 259434.33259 1.00001
icpo[2] 92.84870 50.92318 35.89598 174.35683 0.11278 0.35184 190963.10961 229419.07626 1.00002
icpo[3] 40.58560 10.56225 24.85333 59.60008 0.02511 0.03683 189295.97665 227151.47057 1.00000
icpo[4] 40.80165 8.67577 26.81541 56.64274 0.01550 0.02208 346546.49815 269409.34522 1.00003
icpo[5] 28.33425 4.90017 20.00769 37.45983 0.01337 0.01309 141828.52652 168852.19890 1.00002
icpo[6] 29.65078 5.16877 20.88954 39.27953 0.01308 0.01266 165804.19985 194693.88076 1.00002
icpo[7] 28.91738 4.95209 20.51775 38.21846 0.01332 0.01266 146811.99594 170878.63863 1.00002
icpo[8] 29.79045 5.55431 20.54595 40.03911 0.01531 0.01700 143278.03986 166315.48737 1.00002
icpo[9] 34.92688 7.72001 22.90260 49.05657 0.01627 0.02298 244349.99573 252190.08120 1.00001
icpo[10] 51.87899 20.47637 26.15311 86.36591 0.05005 0.09936 168305.49618 224023.92178 1.00001
icpo[11] 30.18129 5.55633 20.95247 40.49282 0.01484 0.01645 152106.11097 174638.16992 1.00001
icpo[12] 92.84870 50.92318 35.89598 174.35683 0.11278 0.35184 190963.10961 229419.07626 1.00002
icpo[13] 67.01750 26.18279 33.21117 111.86408 0.05529 0.11506 222353.75040 261778.86813 1.00002
icpo[14] 30.86499 5.49400 21.67276 41.18366 0.01406 0.01380 164015.12371 191403.07774 1.00003
icpo[15] 28.42835 4.87392 20.07938 37.49944 0.01339 0.01291 140597.34183 162885.94627 1.00002
icpo[16] 39.67712 8.55294 25.87483 55.14984 0.01787 0.02203 244943.44626 254942.08236 1.00002
icpo[17] 66.70702 899.07151 18.14270 135.15314 2.04616 393.60416 167435.38657 198769.29890 1.00002
icpo[18] 6440.91285 93978.34059 76.80006 16744.04601 176.78980 31602.63650 181705.03772 219452.62340 1.00001
icpo[19] 52.36048 15.96305 30.01894 80.82660 0.03352 0.05572 233658.27862 266970.25560 1.00001
icpo[20] 28.42835 4.87392 20.07938 37.49944 0.01339 0.01291 140597.34183 162885.94627 1.00002

ICPO results match BUGS.

cpo_means = 1 / icpo_df["mean"].to_numpy()
cpo_means
array([0.0351101 , 0.02082093, 0.01081196, 0.02470228, 0.02447827,
       0.03530324, 0.03372785, 0.03456731, 0.03353138, 0.02862887,
       0.01916826, 0.0331743 , 0.01081196, 0.01496989, 0.03236822,
       0.03516906, 0.02515469, 0.01424554, 0.00016167, 0.01915313,
       0.03516906])
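Taking the reciprocal of the mean ICPO makes each CPO the harmonic mean of the pointwise density over posterior draws. A self-contained sketch of that identity with toy draws (the values are made up, not from the trace):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "posterior draws" of a single observation's density f(y_i | theta)
f_draws = rng.uniform(0.01, 0.05, size=1000)

icpo_draws = 1.0 / f_draws        # ICPO per draw
cpo = 1.0 / icpo_draws.mean()     # reciprocal of the mean ICPO

# identical to the harmonic mean of the density draws
harmonic_mean = len(f_draws) / np.sum(1.0 / f_draws)
print(np.isclose(cpo, harmonic_mean))  # True
```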

Part 2#

# -2 * log(CPO) per observation; the sum (MPQ) is a deviance-style fit summary
mlcpo = -2 * np.log(cpo_means)
MPQ = np.sum(mlcpo)
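MPQ behaves like a deviance: a model that predicts its observations better (uniformly larger CPOs) gets a lower value. A tiny illustration with made-up CPO vectors:

```python
import numpy as np

def mpq(cpo):
    # -2 * sum of log CPOs, lower is better
    return -2 * np.sum(np.log(cpo))

cpo_good = np.array([0.04, 0.03, 0.05])    # observations well predicted
cpo_poor = np.array([0.010, 0.020, 0.005]) # observations poorly predicted
print(mpq(cpo_good) < mpq(cpo_poor))  # True
```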
with pm.Model() as model_b:
    tau = pm.Gamma("tau", 1, 0.002)
    s2 = pm.Deterministic("s2", 1 / tau)
    b = pm.Normal("b", 0, tau=0.00001, shape=2)

    m = b[0] + b[1] * x
    r = y - m
    # posterior ordinate: normal density of each observation
    f = pm.Deterministic(
        "f", (tau / (2 * np.pi)) ** 0.5 * pm.math.exp(-0.5 * tau * r**2)
    )
    # calibration ratio: CPO relative to the posterior ordinate
    c = pm.Deterministic("c", cpo_means / f)

    # divergence-style calibration measures; all are 0 when c = 1
    g1 = pm.Deterministic("g1", c * pm.math.log(c))
    g2 = -pm.math.log(c)
    g3 = 0.5 * np.absolute((c - 1))
    g4 = (c - 1) ** 2

    pm.Deterministic("s_g1", pm.math.sum(g1))
    pm.Deterministic("s_g2", pm.math.sum(g2))
    pm.Deterministic("s_g3", pm.math.sum(g3))
    pm.Deterministic("s_g4", pm.math.sum(g4))

    pm.Normal("likelihood", mu=m, tau=tau, observed=y)

    traceb = pm.sample(
        2000, target_accept=0.95, idata_kwargs=dict(log_likelihood=True)
    )
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [tau, b]

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 1 seconds.
az.summary(
    traceb, var_names="c", stat_focus="median", filter_vars="like", kind="stats"
)
median mad eti_3% eti_97%
c[0] 0.975 0.100 0.749 1.352
c[1] 0.853 0.184 0.541 2.066
c[2] 0.862 0.225 0.484 2.278
c[3] 0.952 0.141 0.674 1.638
c[4] 0.959 0.123 0.706 1.472
c[5] 0.975 0.103 0.740 1.372
c[6] 0.976 0.105 0.745 1.372
c[7] 0.974 0.103 0.743 1.355
c[8] 0.969 0.108 0.731 1.396
c[9] 0.960 0.125 0.709 1.499
c[10] 0.892 0.179 0.571 1.865
c[11] 0.973 0.109 0.733 1.413
c[12] 0.862 0.225 0.484 2.278
c[13] 0.912 0.185 0.577 1.909
c[14] 0.972 0.106 0.736 1.372
c[15] 0.975 0.102 0.743 1.363
c[16] 0.958 0.123 0.702 1.474
c[17] 0.502 0.117 0.321 2.784
c[18] 0.249 0.169 0.038 5.029
c[19] 0.938 0.158 0.638 1.722
c[20] 0.975 0.102 0.743 1.363
az.summary(traceb, var_names="s_g", filter_vars="like", kind="stats")
mean sd hdi_3% hdi_97%
s_g1 2.970 39.716 -2.703 8.927
s_g2 2.409 1.224 0.231 3.887
s_g3 2.802 3.926 1.167 5.292
s_g4 55.516 2490.039 0.592 14.393
az.summary(traceb, var_names="g1", kind="stats")
mean sd hdi_3% hdi_97%
g1[0] 0.011 0.174 -0.252 0.322
g1[1] 0.081 0.895 -0.363 0.957
g1[2] 0.123 0.898 -0.368 1.187
g1[3] 0.039 0.323 -0.312 0.563
g1[4] 0.018 0.233 -0.288 0.432
g1[5] 0.012 0.180 -0.258 0.338
g1[6] 0.013 0.181 -0.253 0.339
g1[7] 0.011 0.178 -0.249 0.336
g1[8] 0.012 0.198 -0.262 0.364
g1[9] 0.022 0.245 -0.282 0.458
g1[10] 0.051 0.579 -0.357 0.788
g1[11] 0.019 0.205 -0.275 0.371
g1[12] 0.123 0.898 -0.368 1.187
g1[13] 0.071 0.537 -0.364 0.842
g1[14] 0.011 0.187 -0.267 0.333
g1[15] 0.012 0.179 -0.261 0.325
g1[16] 0.016 0.240 -0.280 0.426
g1[17] 0.609 15.244 -0.368 1.017
g1[18] 1.656 36.209 -0.368 2.677
g1[19] 0.048 0.385 -0.332 0.658
g1[20] 0.012 0.179 -0.261 0.325
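The four g statistics measure how far each calibration ratio \(c_i\) sits from 1, and all of them vanish under perfect calibration. A self-contained sketch with toy ratios (not taken from the trace):

```python
import numpy as np

def calibration_measures(c):
    # the four divergence-type measures used in model_b, summed over observations
    g1 = (c * np.log(c)).sum()        # Kullback-Leibler-type
    g2 = (-np.log(c)).sum()           # log-score-type
    g3 = (0.5 * np.abs(c - 1)).sum()  # total-variation-type
    g4 = ((c - 1) ** 2).sum()         # chi-square-type
    return g1, g2, g3, g4

# perfectly calibrated ratios (all c_i = 1): every measure is zero
perfect = calibration_measures(np.ones(5))

# a ratio far from 1 (an outlier-like observation) registers clearly,
# e.g. in the chi-square-type measure: (0.25 - 1)^2 = 0.5625
outlier = calibration_measures(np.array([1.0, 1.0, 0.25]))
```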
az.waic(traceb)
/Users/aaron/miniforge3/envs/pymc_spr26/lib/python3.14/site-packages/arviz/stats/stats.py:1652: UserWarning: For one or more samples the posterior variance of the log predictive densities exceeds 0.4. This could be indication of WAIC starting to fail. 
See http://arxiv.org/abs/1507.04544 for details
  warnings.warn(
Computed from 8000 posterior samples and 21 observations log-likelihood matrix.

          Estimate       SE
elpd_waic   -82.57     5.02
p_waic        3.28        -

There has been a warning during the calculation. Please check the results.
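As a cross-check on the WAIC warning, PSIS-LOO reports a Pareto-k diagnostic per observation, which would likely flag the same influential children here (children 18 and 19 stand out in the ICPO table); on the real trace this would be `az.loo(traceb, pointwise=True)`. A self-contained sketch on synthetic log-likelihood draws, so the shapes and API are visible without rerunning the sampler:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)
# synthetic stand-in for traceb: 4 chains x 500 draws x 21 observations
log_lik = rng.normal(-4.0, 0.1, size=(4, 500, 21))
idata = az.from_dict(
    posterior={"b": rng.normal(size=(4, 500, 2))},
    log_likelihood={"likelihood": log_lik},
)

loo = az.loo(idata, pointwise=True)
# one Pareto-k value per observation; k > 0.7 marks unreliable / influential points
print(loo.pareto_k.shape)
```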
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Tue, 17 Mar 2026

Python implementation: CPython
Python version       : 3.14.3
IPython version      : 9.11.0

pytensor: 2.38.2

arviz : 0.23.4
numpy : 2.4.2
pandas: 3.0.1
pymc  : 5.28.1