import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt

8. Prediction*

We previously covered Bayesian prediction in Unit 4, Lesson 13. Recall that the core of Bayesian prediction is the posterior predictive distribution

\[ f(x_{n+1}\vert x_{1} , \cdots , x_{n}) = \int_{\Theta}f(x_{n+1} \vert \theta)\,\pi(\theta \vert x_{1} , \cdots , x_{n})\,d\theta \]

Think of this as a “weighted average” of the likelihood for new data, with “weights” prescribed by \(\pi(\theta \vert x_{1} , \cdots , x_{n})\). This is a departure from the “plug-in predictions” you may have encountered in a frequentist course: the result is an entire distribution of predictions, not a single point. It should be clear by now that analytical methods are often intractable, and the posterior predictive distribution is no different. Fortunately, PyMC has some tools that make this easier.
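Before turning to PyMC, it may help to see the integral above approximated directly by Monte Carlo: draw \(\theta\) from the posterior, then draw a new observation from the likelihood at that \(\theta\). A minimal NumPy sketch using a conjugate normal-normal toy model (all numbers here are invented purely for illustration), where the predictive variance has the known closed form \(\sigma^2 + \text{Var}(\theta \vert x)\):

```python
import numpy as np

rng = np.random.default_rng(42)

# toy conjugate setup: x_i ~ N(theta, 1), prior theta ~ N(0, 10^2)
x = rng.normal(2.0, 1.0, size=50)
n = x.size
prior_var, lik_var = 100.0, 1.0

# exact conjugate posterior for theta: N(post_mean, post_var)
post_var = 1.0 / (1.0 / prior_var + n / lik_var)
post_mean = post_var * (x.sum() / lik_var)

# Monte Carlo version of the integral: for each posterior draw of theta,
# draw x_{n+1} from the likelihood -> draws from the posterior predictive
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
x_new = rng.normal(theta_draws, np.sqrt(lik_var))

# the analytic predictive variance is lik_var + post_var; the MC
# estimate should land very close to it
print(x_new.mean(), x_new.var(), lik_var + post_var)
```

Note that the predictive draws are more spread out than the likelihood alone: parameter uncertainty propagates into the predictions, which is exactly what plug-in prediction throws away.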

Taste of Cheese*

Adapted from Unit 6: cheese.odc.

The link in the original .odc file is dead. I downloaded the data from here and have a copy here.

Problem Statement

As cheddar cheese matures, a variety of chemical processes take place. The taste of matured cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the Latrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. Overall taste scores were obtained by combining the scores from several tasters.

Can the score be predicted well by the predictors: Acetic, H2S, and Lactic?

data = pd.read_csv("../data/cheese.csv", index_col=0)
X = data[["Acetic", "H2S", "Lactic"]].to_numpy()
# add intercept column to X
X_aug = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
y = data["taste"].to_numpy()

# out of sample data for our later predictions
new_obs = np.array([[1.0, 5.0, 7.1, 1.5]])
data.head(5)
   taste  Acetic    H2S  Lactic
1   12.3   4.543  3.135    0.86
2   20.9   5.159  5.043    1.53
3   39.0   5.366  5.438    1.57
4   47.9   5.759  7.496    1.81
5    5.6   4.663  3.807    0.99

There are multiple ways to get predictions on out-of-sample data in PyMC. One way is to set up a shared variable with pm.Data in the original model, then use pm.set_data to swap in the new observations before calling pm.sample_posterior_predictive. If you’re doing this, it’s useful to tie the shape of the likelihood to the number of rows in your X data, so the model resizes automatically when the data changes. Note the shape=X_data.shape[0] line in the likelihood below.

Another way is to do what we call “BUGS-style” posterior predictives. This is just setting up the posterior predictive within the model context, as labeled with a comment below.

with pm.Model() as m:
    # associate data with model
    X_data = pm.Data("X", X_aug)
    y_data = pm.Data("y", y)

    # priors
    beta = pm.Normal("beta", mu=0, sigma=1000, shape=X.shape[1] + 1)
    tau = pm.Gamma("tau", alpha=0.001, beta=0.001)
    sigma = pm.Deterministic("sigma", 1 / pm.math.sqrt(tau))

    mu = pm.math.dot(X_data, beta)

    # likelihood
    pm.Normal(
        "taste_score",
        mu=mu,
        sigma=sigma,
        observed=y_data,
        shape=X_data.shape[0],
    )

    oos_mean_response = pm.Deterministic(
        "oos_mean_response", pm.math.dot(new_obs, beta)
    )
    pm.Normal("oos_prediction", oos_mean_response, sigma=sigma)

    # start sampling
    trace = pm.sample(5000, target_accept=0.95)
    # this first sample_posterior_predictive is with the in-sample data.
    # it's used for the az.r2_score function
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, tau, oos_prediction]

Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 15 seconds.
Sampling: [taste_score]

az.summary(trace, hdi_prob=0.95)
                        mean      sd  hdi_2.5%  hdi_97.5%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
beta[0]              -28.623  20.216   -68.795     10.327      0.219    0.161    8525.0   10493.0    1.0
beta[1]                0.252   4.558    -8.983      8.947      0.051    0.038    7861.0    9713.0    1.0
beta[2]                3.911   1.283     1.413      6.408      0.013    0.009    9885.0   12008.0    1.0
beta[3]               19.799   8.883     2.955     38.032      0.086    0.070   10558.0   10750.0    1.0
oos_prediction[0]     30.158  11.190     8.515     52.880      0.102    0.088   12048.0   11751.0    1.0
tau                    0.010   0.003     0.005      0.015      0.000    0.000   10985.0   11872.0    1.0
sigma                 10.429   1.512     7.697     13.403      0.015    0.014   10985.0   11872.0    1.0
oos_mean_response[0]  30.104   3.715    22.496     37.211      0.039    0.027    9210.0   11373.0    1.0
y_pred = trace.posterior_predictive.stack(sample=("chain", "draw"))[
    "taste_score"
].values.T
az.r2_score(y, y_pred)
r2        0.576355
r2_std    0.076174
dtype: float64
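The quantity az.r2_score reports is a Bayesian R² in the spirit of Gelman et al. (2019): an R² computed per posterior-predictive draw, yielding a distribution rather than a single number. The sketch below shows the per-draw computation on fabricated data (the draws are invented for illustration, and ArviZ’s exact implementation may differ in details):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=30)
# fake posterior-predictive draws, shape (n_draws, n_obs)
y_pred = y + rng.normal(scale=0.5, size=(1000, 30))

# per-draw R^2: variance of the fit over (variance of the fit + residual variance)
var_fit = y_pred.var(axis=1)
var_res = (y[None, :] - y_pred).var(axis=1)
r2_draws = var_fit / (var_fit + var_res)

# summarizing the distribution mirrors the r2 / r2_std output above
print(r2_draws.mean(), r2_draws.std())
```

Because each draw gives its own R², the r2_std line in the output above is the spread of this distribution, not a standard error estimated by resampling.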

PyMC may warn about divergences for this model unless we increase the target_accept parameter of pm.sample. This is because PyMC's NUTS sampler runs more diagnostics than older Gibbs-based software, so it surfaces problems with its exploration of the parameter space that would otherwise go unnoticed. Divergences indicate the results may be biased.

For further reading, check out Diagnosing Biased Inference with Divergences.

# single prediction on out-of-sample data (new_obs defined above)
pm.set_data({"X": new_obs}, model=m)
ppc = pm.sample_posterior_predictive(trace, model=m, predictions=True)
Sampling: [taste_score]

az.summary(ppc.predictions, hdi_prob=0.95)
                  mean     sd  hdi_2.5%  hdi_97.5%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
taste_score[0]  30.132  11.18     8.127     52.018      0.084     0.06   17839.0   18593.0    1.0
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Tue Mar 24 2026

Python implementation: CPython
Python version       : 3.13.8
IPython version      : 9.6.0

pytensor: 2.28.3

pandas    : 2.3.3
matplotlib: 3.10.6
arviz     : 0.22.0
pymc      : 5.21.1
numpy     : 2.3.3