14. Change Point Problem#

In this lesson, we’re trying to infer the “change point” in the widely-used UK coal mining disaster dataset, originally published by Maguire et al. [1952] and updated by Jarrett [1979]. The dataset spans 1851 to 1962 and includes only incidents in which at least 10 people died. The idea is that at some point the number of disasters decreased significantly, so let’s figure out in which year that happened.

Model#

The model comes from Carlin et al. [1992]. This paper is really readable and useful for double-checking your understanding of the concepts we’ve seen so far. The model itself is a great example of how intuitive and flexible Bayesian hierarchical frameworks can be.

We assume two different Poisson processes governing the rates of coal mining disasters. Before the change point \(m\), we’ll call the Poisson likelihood’s rate \(\lambda\). After \(m\), the rate will be \(\mu\) for our second likelihood.

We’ll assume the prior on \(m\) is discrete uniform on \([1, n]\), where \(n=112\) is the number of years in the dataset. Priors for \(\lambda\) and \(\mu\) are gamma distributions with hyperparameters \( (\alpha, \beta) \) and \( (\gamma, \delta) \), respectively.

Summarized:

\[\begin{split} \begin{align*} x_i \mid \lambda &\sim \text{Pois}(\lambda) && \text{For the first $ m $ years.} \\ x_i \mid \mu &\sim \text{Pois}(\mu) && \text{For the remaining $ n-m $ years.} \\ m &\sim \text{Uniform}(1, n) && \\ \lambda &\sim \text{Gamma}(\alpha, \beta) && \\ \mu &\sim \text{Gamma}(\gamma, \delta) && \end{align*} \end{split}\]
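To make the generative story concrete, here’s a quick simulation from the model. Every numeric value below is a placeholder I made up for illustration, not anything estimated from the real data.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 112             # years 1851-1962
m = 40              # hypothetical change point (year index)
lam, mu = 3.0, 1.0  # hypothetical disaster rates before and after m

# Simulate n years of counts: rate lambda up to m, rate mu afterwards.
x = np.concatenate([
    rng.poisson(lam, size=m),
    rng.poisson(mu, size=n - m),
])
```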

Odds ratio#

This model will find some change point, since one is built into our assumptions. But how do we know whether that change point is actually meaningful? Carlin et al. [1992] say: “The question of whether or not a change has occurred is addressed through the posterior odds for no change.”

In our case, that’s \(P(m=n \mid X)/\left(1 - P(m=n \mid X)\right)\): the posterior probability that \(m = n = 112\), meaning the change point falls at the very end of the series and no change happened within the observed period, against the probability that it falls earlier.
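Once the Gibbs sampler is running, we can estimate this directly from the posterior draws of \(m\). A minimal sketch, where `m_samples` is a stand-in array rather than real sampler output:

```python
import numpy as np

n = 112

# Stand-in for posterior draws of m from the Gibbs sampler.
m_samples = np.random.default_rng(0).integers(35, 46, size=10_000)

# Posterior odds for "no change": P(m = n | X) against its complement.
p_no_change = np.mean(m_samples == n)
odds_no_change = p_no_change / (1 - p_no_change)
```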

Conditionals#

The posterior is proportional to the product of the likelihoods and the priors for \( \lambda \), \( \mu \), and \( m \).

\[\begin{split} \begin{align*} p(\lambda, \mu, m | X) &\propto \left[ \prod_{i=1}^{m} \text{Poisson}(x_i | \lambda) \right] \left[ \prod_{i=m+1}^{n} \text{Poisson}(x_i | \mu) \right]\cdot \text{Gamma}(\lambda | \alpha, \beta) \cdot \text{Gamma}(\mu | \gamma, \delta) \cdot \text{Uniform}(m | 1, n) \\ &\propto \left[ \lambda^{\sum_{i=1}^{m} x_i} e^{-m\lambda} \right] \left[ \mu^{\sum_{i=m+1}^{n} x_i} e^{-(n-m)\mu} \right] \lambda^{\alpha - 1} e^{-\beta \lambda} \mu^{\gamma - 1} e^{-\delta \mu} \\ &\propto \lambda^{\alpha + \sum_{i=1}^{m} x_i - 1} e^{-(\beta + m)\lambda} \mu^{\gamma + \sum_{i=m+1}^{n} x_i - 1} e^{-(\delta + n - m)\mu} \end{align*} \end{split}\]

So, the full conditional for \(\lambda\) is:

\[ \lambda | \mu, m, X \sim \text{Gamma} \left( \alpha + \sum_{i=1}^{m} x_i, \beta + m \right) \]

And for \(\mu\):

\[ \mu | \lambda, m, X \sim \text{Gamma} \left( \gamma + \sum_{i=m+1}^{n} x_i, \delta + (n - m) \right) \]
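Since both full conditionals are gammas, each Gibbs update for \(\lambda\) and \(\mu\) is a single draw. One detail worth flagging: our Gamma notation uses a rate parameter, while NumPy’s `gamma` takes a shape and a *scale*, so we pass the reciprocal of the rate. A sketch with placeholder data and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders: fake disaster counts and the current m in a Gibbs sweep.
x = rng.poisson(2.0, size=112)
n, m = len(x), 40
alpha, beta, gamma_, delta = 4.0, 1.0, 0.5, 1.0  # made-up hyperparameters

# lambda | mu, m, X ~ Gamma(alpha + sum_{i=1}^{m} x_i, beta + m)
lam = rng.gamma(alpha + x[:m].sum(), 1.0 / (beta + m))

# mu | lambda, m, X ~ Gamma(gamma + sum_{i=m+1}^{n} x_i, delta + n - m)
mu = rng.gamma(gamma_ + x[m:].sum(), 1.0 / (delta + n - m))
```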

A lot of the time I skip full derivations, but finding the full conditional for \(m\) is a little different from what we’ve seen in lectures so far, and I didn’t immediately understand it, so I worked out the steps below.

Let’s start with the joint distribution again, dropping the priors: the gamma priors don’t involve \(m\), and the uniform prior is constant in \(m\), so neither contributes.

\[\begin{split} \begin{align*} \pi(m|\mu, \lambda, X) &\propto \left[ \prod_{i=1}^{m} \text{Poisson}(x_i | \lambda) \right] \left[ \prod_{i=m+1}^{n} \text{Poisson}(x_i | \mu) \right] \\ &\propto \left( \prod_{i=1}^{m} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \prod_{i=m+1}^{n} \frac{\mu^{x_i} e^{-\mu}}{x_i!} \right) \\ & \propto e^{-m\lambda} e^{-(n-m)\mu} \left( \prod_{i=1}^{m} \frac{\lambda^{x_i}}{x_i!} \right) \left( \prod_{i=m+1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Group exponential terms.} \\ & \propto e^{m(\mu -\lambda)} e^{-\mu n} \left( \prod_{i=1}^{m} \frac{\lambda^{x_i}}{x_i!} \right) \left( \prod_{i=m+1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Rearrange exponential terms.} \\ & \propto e^{m(\mu -\lambda)} e^{-\mu n} \left( \prod_{i=1}^{m} \frac{\left(\lambda \frac{\mu}{\mu} \right)^{x_i}}{x_i!} \right) \left( \prod_{i=m+1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Introduce a factor of 1 as } \frac{\mu}{\mu} \text{.} \\ & \propto e^{m(\mu -\lambda)} e^{-\mu n} \left( \prod_{i=1}^{m} \frac{\left(\mu \frac{\lambda}{\mu} \right)^{x_i}}{x_i!} \right) \left( \prod_{i=m+1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Rearrange terms.} \\ & \propto e^{m(\mu -\lambda)} e^{-\mu n} \left(\frac{\lambda}{\mu}\right)^{\sum_{i=1}^{m} x_i} \left( \prod_{i=1}^{m} \frac{\mu^{x_i}}{x_i!} \right) \left( \prod_{i=m+1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Pull the } \frac{\lambda}{\mu} \text{ factors out of the product.} \\ & \propto e^{m(\mu -\lambda)} e^{-\mu n} \left(\frac{\lambda}{\mu}\right)^{\sum_{i=1}^{m} x_i} \left( \prod_{i=1}^{n} \frac{\mu^{x_i}}{x_i!} \right) && \text{Combine the products; together they run from $1$ to $n$.}\\ & \propto e^{m(\mu -\lambda)} \left(\frac{\lambda}{\mu}\right)^{\sum_{i=1}^{m} x_i} && \text{Drop factors that don't depend on $m$.} \end{align*} \end{split}\]

We know the full conditional PMF for \(m\) is proportional to \(\pi(m) = e^{m(\mu -\lambda)} \left(\frac{\lambda}{\mu}\right)^{\sum_{i=1}^{m} x_i}\). To normalize it, we divide by \(\pi\) summed over all possible values of \(m\).

\[P(m=k \mid \lambda, \mu, X) = \frac{\pi(k)}{\sum_{i=1}^{n} \pi(i)}, \quad k = 1, \dots, n\]
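Translated to code, we evaluate \(\log \pi(k)\) for every \(k\) and normalize with the usual max-subtraction trick, since \(e^{m(\mu-\lambda)}\) and the power of \(\lambda/\mu\) can overflow if computed directly. A sketch, again with placeholder values standing in for `x`, `lam`, and `mu`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders: fake counts and the current lambda, mu in a Gibbs sweep.
x = rng.poisson(2.0, size=112)
n = len(x)
lam, mu = 3.0, 1.0

# log pi(k) = k*(mu - lam) + log(lam/mu) * sum_{i=1}^{k} x_i, for k = 1..n
k = np.arange(1, n + 1)
log_pi = k * (mu - lam) + np.log(lam / mu) * np.cumsum(x)

# Normalize on the log scale, exponentiate, and draw m from the discrete PMF.
log_pi -= log_pi.max()
pmf = np.exp(log_pi)
pmf /= pmf.sum()
m = rng.choice(k, p=pmf)
```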

Now that we have all these full conditionals, we can code the Gibbs sampler in the next lesson.