8. Metropolis–Hastings: Weibull-Exponential Example#

In this lecture, Professor Vidakovic goes over an example from Robert [2007] page 305.

Model#

\[\begin{split} \begin{align*} T_i | \alpha, \eta &\sim \text{Weib}(\alpha, \eta) \\ \alpha &\sim \text{Exp}(1) \\ \eta &\sim \text{Ga}(\beta, \xi) \end{align*} \end{split}\]

We can write the joint posterior distribution as:

\[ \pi(\alpha, \eta | T_1,\ldots,T_n) \propto \left( \prod_{i=1}^{n} \alpha \eta t_i^{\alpha - 1} e^{-\eta t_i^{\alpha}} \right) e^{-\alpha} \eta^{\beta - 1} e^{-\xi \eta} \]
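Before simplifying anything, it can help to see this density as code. Below is a minimal sketch (not from the lecture) of the unnormalized log posterior, assuming NumPy; the function name `log_posterior` and the argument order are my own choices. Working on the log scale avoids overflow in the product over observations.

```python
import numpy as np

def log_posterior(alpha, eta, t, beta, xi):
    """Unnormalized log joint posterior of (alpha, eta) given data t.

    Likelihood: prod_i alpha * eta * t_i^(alpha-1) * exp(-eta * t_i^alpha)
    Priors: alpha ~ Exp(1), eta ~ Gamma(beta, xi) (rate parameterization).
    """
    if alpha <= 0 or eta <= 0:
        return -np.inf  # outside the support
    log_lik = np.sum(np.log(alpha) + np.log(eta)
                     + (alpha - 1) * np.log(t) - eta * t**alpha)
    log_prior = -alpha + (beta - 1) * np.log(eta) - xi * eta
    return log_lik + log_prior
```

With a single observation `t = 1` and `alpha = eta = beta = xi = 1`, each term can be checked by hand: the log likelihood is `-1` and the log prior is `-2`.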

Proposal#

The proposal distribution is:

\[q(\alpha',\eta'|\alpha, \eta) \propto \frac{1}{\alpha \eta} \exp\left({-\frac{\alpha'}{\alpha}-\frac{\eta'}{\eta}}\right)\]

But how is this the product of two exponentials with that first term? It’s because the proposal for each parameter is an exponential distribution whose *mean* equals the current value of that parameter. Since the mean of an \(\text{Exp}(\lambda)\) distribution is \(1/\lambda\), the rate for each proposal is the reciprocal of the current value.

Exponential PDF:

\(\lambda e^{-\lambda x}\)

For alpha:

\(\alpha = 1/\lambda\)

\(\lambda = 1/\alpha\)

\(\frac{1}{\alpha} e^{-\frac{1}{\alpha} \alpha'}\)

Likewise for eta:

\(\frac{1}{\eta} e^{-\frac{1}{\eta} \eta'}\)

Multiply them to get the given proposal:

\(\frac{1}{\alpha}\frac{1}{\eta} e^{-\frac{\alpha'}{\alpha}} e^{-\frac{\eta'}{\eta} } = \frac{1}{\alpha \eta} e^{-\frac{\alpha'}{\alpha}-\frac{\eta'}{\eta}}\)
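The construction above translates directly into code. Here is a short sketch assuming NumPy; the names `propose` and `log_proposal` are mine, not from the lecture. Note that NumPy’s `scale` argument is the mean \(1/\lambda\), which is exactly what we want here.

```python
import numpy as np

def propose(alpha, eta, rng):
    # Draw alpha' ~ Exp(rate 1/alpha) and eta' ~ Exp(rate 1/eta),
    # i.e. exponentials whose means are the current values.
    return rng.exponential(scale=alpha), rng.exponential(scale=eta)

def log_proposal(alpha_new, eta_new, alpha, eta):
    # log q(alpha', eta' | alpha, eta)
    #   = -log(alpha * eta) - alpha'/alpha - eta'/eta
    return -np.log(alpha * eta) - alpha_new / alpha - eta_new / eta

rng = np.random.default_rng(0)
a_prop, e_prop = propose(2.0, 3.0, rng)
```

At `alpha' = eta' = alpha = eta = 1`, the log proposal density is \(-2\), matching \(\frac{1}{\alpha\eta}e^{-\alpha'/\alpha - \eta'/\eta}\) evaluated there.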

Acceptance criteria#

We will accept the proposal with probability \(\rho\):

\[ \rho = \min\left(1, \frac { \left[ \prod_{i=1}^{n} \alpha' \eta' t_i^{\alpha' - 1} e^{-\eta' t_i^{\alpha'}} \right] e^{-\alpha'} \eta'^{\beta - 1} e^{-\xi \eta'} \cdot \frac{1}{\alpha'} \frac{1}{\eta'} e^{-\frac{\alpha}{\alpha'}} e^{-\frac{\eta}{\eta'}} } { \left[ \prod_{i=1}^{n} \alpha \eta t_i^{\alpha - 1} e^{-\eta t_i^{\alpha}} \right] e^{-\alpha} \eta^{\beta - 1} e^{-\xi \eta} \cdot \frac{1}{\alpha} \frac{1}{\eta} e^{-\frac{\alpha'}{\alpha}} e^{-\frac{\eta'}{\eta}} }\right) \]

This is becoming unwieldy. Remember, \(\rho\) is a ratio: the likelihood times the prior evaluated at the proposed values, times the proposal density of moving *back* from the proposed values to the current ones, divided by the likelihood times the prior at the last accepted values, times the proposal density of the forward move. This is why, on the next page, I divide the prior, likelihood, and proposal into separate functions. It’s easy to make a mistake when your equations get this long, even if you want to go to the trouble of simplifying them.
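Keeping with that separation into pieces, here is a minimal, self-contained sketch of the full sampler, assuming NumPy and working on the log scale; the helper names, starting values, and default settings are my assumptions, not from the lecture.

```python
import numpy as np

def log_posterior(alpha, eta, t, beta, xi):
    """Unnormalized log posterior: Weibull likelihood, Exp(1) and Gamma priors."""
    if alpha <= 0 or eta <= 0:
        return -np.inf
    log_lik = np.sum(np.log(alpha) + np.log(eta)
                     + (alpha - 1) * np.log(t) - eta * t**alpha)
    return log_lik - alpha + (beta - 1) * np.log(eta) - xi * eta

def log_q(a_to, e_to, a_from, e_from):
    # log q(a_to, e_to | a_from, e_from): independent exponentials
    # with means a_from and e_from.
    return -np.log(a_from * e_from) - a_to / a_from - e_to / e_from

def metropolis_hastings(t, beta, xi, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    alpha, eta = 1.0, 1.0  # arbitrary starting point
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        a_prop = rng.exponential(scale=alpha)
        e_prop = rng.exponential(scale=eta)
        # log of the ratio r: target and reverse proposal at the proposed
        # values, minus target and forward proposal at the current values.
        log_r = (log_posterior(a_prop, e_prop, t, beta, xi)
                 + log_q(alpha, eta, a_prop, e_prop)
                 - log_posterior(alpha, eta, t, beta, xi)
                 - log_q(a_prop, e_prop, alpha, eta))
        if np.log(rng.uniform()) < log_r:  # accept with probability min(1, r)
            alpha, eta = a_prop, e_prop
        samples[i] = alpha, eta
    return samples

chain = metropolis_hastings(np.array([0.5, 1.0, 1.5]), beta=2.0, xi=2.0,
                            n_iter=500, seed=3)
```

Comparing `np.log(rng.uniform())` to `log_r` is the log-scale version of accepting with probability \(\min(1, r)\): when \(\log r \ge 0\) the draw always passes.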

What is the purpose of the \(\min(1, r)\) step to get \(\rho\)? Consider what happens if \(r\) is greater than 1: the latest proposed values evaluate higher than the previous ones, so we accept with certainty. This pushes the sampler towards areas of the sample space that are more probable, while still allowing occasional moves to less probable areas when \(r < 1\).