import arviz as az
import pymc as pm
from pymc.math import log, sqr

10. The Zero Trick and Custom Likelihoods*#

Zero-trick Jeremy#

This introduces the “zero trick”, which is a method for specifying custom likelihoods in BUGS. For a more detailed treatment of these methods, see Ntzoufras [2009], page 276, which is where I got this explanation. These tricks are unnecessary in PyMC since we can just define custom distributions directly, but they do seem to work.

Adapted from Unit 6: zerotrickjeremy.odc.

Here’s the model we’re using:

\[\begin{split} \begin{align*} X_i \mid \theta &\sim N(\theta, 80) \\ \theta &\sim N(110, 120) \end{align*} \end{split}\]

Of course, BUGS can handle this model just fine. But, let’s pretend for a moment that there is no built-in normal distribution. The zero trick takes advantages of some properties of the Poisson distribution to recreate an arbitrary likelihood.

Given a log-likelihood of the form \(l_i = \log f(y; \theta)\),

\[\begin{split} \begin{align*} f(y | \theta) &= \prod_{i=1}^{n} e^{l_i} && \text{Log-likelihood to likelihood.}\\ &= \prod_{i=1}^{n} \frac{e^{-(-l_i)}(-l_i)^0}{0!} && \text{Matching the form of the Poisson PMF.} \\ &= \prod_{i=1}^{n} f_P(0; -l_i) && \text{Poisson evaluated at zero with mean $-l_i$.} \end{align*} \end{split}\]

Ntzoufras [2009] page 276.

But the rate, \(\lambda\), can’t be negative. So we need to add a constant, C, to keep that from happening.

The normal log-likelihood is:

\[l_i = -0.5 \log(2\pi) - \log(\sigma^2) - \frac{(y_i - \mu_i)^2}{2\sigma^2}\]

But the constant terms won’t affect the posterior.

Here’s the model in PyMC:

y = 98
μ = 110
σ = 80**0.5
τ = 120**0.5
C = 10000

inits = {"θ": 100}
with pm.Model() as m:
    θ = pm.Flat("θ")

    λ1 = pm.Deterministic("λ1", log(σ) + 0.5 * sqr(((y - θ) / σ)) + C)
    λ2 = pm.Deterministic("λ2", log(τ) + 0.5 * sqr(((θ - μ) / τ)) + C)

    z1 = pm.Poisson("z1", λ1, observed=0)
    z2 = pm.Poisson("z2", λ2, observed=0)

    trace = pm.sample(5000, tune=1000, initvals=inits, target_accept=0.88)
Hide code cell output
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [θ]
100.00% [24000/24000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
az.summary(trace, hdi_prob=0.95, var_names="θ", kind="stats")
mean sd hdi_2.5% hdi_97.5%
θ 102.852 6.972 89.6 116.956

Custom likelihoods in PyMC#

PyMC is a lot more flexible. For one, many more built-in distributions are available, including mixtures and zero-inflated models, which is where I’ve seen the zero trick used most often (Karlis and Ntzoufras [2008], Ntzoufras [2009] page 288).

If you need a custom likelihood, take a look at the pm.Potential or pm.CustomDist classes.

%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Sun Aug 06 2023

Python implementation: CPython
Python version       : 3.11.4
IPython version      : 8.14.0

pytensor: 2.14.2

pymc : 5.7.1
arviz: 0.16.1