1. Model Fit, Selection, and Diagnostics#
Deviance#
For a model with likelihood \(f(y|\theta)\), the deviance is
\[ D(\theta) = -2\log(f(y|\theta)) \]
When comparing two models by deviance, the model with the smaller deviance fits the data better. An alternative is the saturated deviance,
\[ D_s(\theta) = -2(\log(f(y|\theta)) - \log(f(y|\theta_s))) \]
where \(f(y|\theta_s)\) is the likelihood of the saturated model, that is, a model that fits the data perfectly. A few examples of saturated deviances:
\[\begin{split}\begin{align*}
y_i \sim Bin(n_i,\theta_i): & \space \space D_s(\underset{\sim}{\theta}) =
2 \sum_i \left ( y_i \log \frac{y_i/n_i}{\theta_i} + (n_i - y_i)\log \frac{1-y_i/n_i}{1-\theta_i} \right ) \\
y_i \sim Poi(\theta_i): & \space \space D_s(\underset{\sim}{\theta}) =
2 \sum_i \left ( y_i \log \frac{y_i}{\theta_i} - (y_i - \theta_i) \right ) \\
y_i \sim N(\theta_i,\sigma_i^2): & \space \space D_s(\underset{\sim}{\theta}) =
\sum_i \left ( \frac{y_i - \theta_i}{\sigma_i} \right ) ^2 \\
\end{align*}\end{split}\]
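The closed-form expressions above are just \(-2(\log f(y|\theta) - \log f(y|\theta_s))\) worked out for each likelihood. A quick sketch, using hypothetical Poisson counts (not data from this section), verifies that the tabulated Poisson formula agrees with the definition:

```python
import numpy as np
from math import lgamma

y = np.array([2, 4, 1, 7, 3])        # hypothetical counts (all nonzero)
theta = np.full(len(y), y.mean())    # a fitted mean for each observation

def pois_loglik(y, theta):
    # sum of Poisson log-pmfs: y*log(theta) - theta - log(y!)
    return sum(yi * np.log(ti) - ti - lgamma(yi + 1) for yi, ti in zip(y, theta))

# saturated deviance from the closed-form Poisson expression above
D_formula = 2 * np.sum(y * np.log(y / theta) - (y - theta))

# and directly from the definition: -2*(loglik at theta - loglik of saturated model)
D_def = -2 * (pois_loglik(y, theta) - pois_loglik(y, y))

print(D_formula, D_def)  # the two quantities agree
```

The `log(y!)` terms cancel in the difference, which is why the closed form needs no factorials.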
Deviance example#
Consider two models, Weibull and Exponential, for the data
\[ y = (1,1,2,2,3,4,4,5,5,8) \]
with non-informative priors. Which model has the smaller deviance?
```python
import pymc as pm
import arviz as az
import numpy as np

y = np.array([1, 1, 2, 2, 3, 4, 4, 5, 5, 8])

# Weibull model
with pm.Model() as m1:
    # priors
    mu = pm.Normal("mu", 0, tau=0.001)
    gamma = pm.Gamma("gamma", 0.001, 0.001)
    # convert the BUGS Weibull(mu, gamma) parameterization to PyMC's Weibull(alpha, beta)
    beta = gamma ** (-1 / mu)
    # likelihood
    like = pm.Weibull("like", mu, beta, observed=y)
    # track the deviance, -2 * log-likelihood, as a deterministic node
    log_like = pm.math.log(mu * y ** (mu - 1) * pm.math.exp(-((y / beta) ** mu)) / beta**mu)
    deviance = pm.Deterministic("deviance", -2 * pm.math.sum(log_like))
    trace1 = pm.sample()

az.summary(trace1, var_names=["deviance"])
```
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, gamma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 0 seconds.
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| deviance | 42.911 | 2.121 | 40.869 | 46.741 | 0.08 | 0.126 | 984.0 | 906.0 | 1.0 |
```python
# Exponential model
with pm.Model() as m2:
    # prior
    lambda_ = pm.Gamma("lambda_", 0.001, 0.001)
    # likelihood
    like = pm.Exponential("like", lambda_, observed=y)
    # track the deviance, -2 * log-likelihood
    log_like = pm.math.log(lambda_ * pm.math.exp(-lambda_ * y))
    deviance = pm.Deterministic("deviance", -2 * pm.math.sum(log_like))
    trace2 = pm.sample()

az.summary(trace2, var_names="deviance")
```
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [lambda_]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 0 seconds.
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| deviance | 46.045 | 1.402 | 45.055 | 48.603 | 0.033 | 0.058 | 1864.0 | 2570.0 | 1.0 |
We see the Weibull model has a lower posterior mean deviance, and therefore fits the data better than the Exponential model.
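As a quick sanity check (not part of the original models), the Exponential deviance can be evaluated analytically at the maximum-likelihood estimate \(\hat\lambda = 1/\bar y\), the smallest value the deviance can take:

```python
import numpy as np

y = np.array([1, 1, 2, 2, 3, 4, 4, 5, 5, 8])

# MLE of the Exponential rate is 1 / ybar
lam_hat = 1 / y.mean()

# plug-in deviance: D(lambda_hat) = -2 * sum(log f(y_i | lambda_hat))
D = -2 * np.sum(np.log(lam_hat) - lam_hat * y)
print(D)  # about 45.06
```

Because the posterior deviance cannot fall below this minimum, the value agrees with the lower edge of the Exponential model's posterior deviance summary above.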
Extra#
Papers to check out: