3. Laplace’s Method#

Lecture errata#

There are two mistakes in the lecture slides, and one extra result (item 2 below) is provided:

  1. In the expression for univariate \(Q\), we should retain the negative sign: it should be

\[Q=\left[-\frac{\partial^2}{\partial\theta^2}\log\left(g\left(\theta\right)\right)\right] \]

instead of

\[Q=\left[\frac{\partial^2}{\partial\theta^2}\log\left(g\left(\theta\right)\right)\right]\]
  2. We know that \(r+\alpha-1>0\): setting \(\frac{d}{d\theta}\log(g(\theta))=\frac{r+\alpha-1}{\theta}-(\beta+x)=0\) gives \(\frac{r+\alpha-1}{\theta}=\beta+x\), and since \(\beta,x,\theta>0\) the result follows.

  3. The second derivative of \(\log(g(\theta))\) should be \(-\frac{r+\alpha-1}{\theta^2}\), not \(-\frac{r+\alpha-1}{(x+\beta)^2}\). The important point is that the second derivative is negative due to (2), so the value obtained for \(\hat{\theta}\) is a maximum.

All credit goes to Head TA Greg for these lecture corrections!
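
As a sanity check, here is a minimal SymPy sketch that reproduces the corrected derivatives. The form \(g(\theta) \propto \theta^{r+\alpha-1}e^{-(\beta+x)\theta}\) is an assumption inferred from the first derivative quoted in item 2; the symbol names are just placeholders.

```python
import sympy as sp

theta, r, alpha, beta, x = sp.symbols("theta r alpha beta x", positive=True)

# log of the assumed unnormalized posterior:
# g(theta) proportional to theta**(r + alpha - 1) * exp(-(beta + x)*theta)
log_g = (r + alpha - 1) * sp.log(theta) - (beta + x) * theta

first = sp.diff(log_g, theta)        # (r + alpha - 1)/theta - (beta + x)
second = sp.diff(log_g, theta, 2)    # -(r + alpha - 1)/theta**2, not -(r + alpha - 1)/(x + beta)**2

theta_hat = sp.solve(sp.Eq(first, 0), theta)[0]    # MAP: (r + alpha - 1)/(beta + x)
Q = sp.simplify(-second.subs(theta, theta_hat))    # (beta + x)**2/(r + alpha - 1), positive as expected

print(first, second, theta_hat, Q, sep="\n")
```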

Notes#

Laplace’s method is another integral approximation technique. This is faster than MCMC, but not as flexible.

We expand the log of the function around its mode in a second-order Taylor expansion. This process results in a quadratic approximation of the function in the log space, which translates to a normal approximation in the original space.

This method is only useful for unimodal, twice-differentiable distributions that aren’t too skewed. You can extend the concept to apply to more challenging distributions with Integrated Nested Laplace Approximations (INLA), but we don’t go into that in this course.

Say we have a posterior \(g(\theta|x)\). We can’t normalize it for whatever reason, so we’ll try to approximate it using a normal distribution:

\[g(\theta|x) \approx N(\hat{\theta},Q^{-1})\]

A normal distribution is defined by mean and variance (or precision), so we need to find these parameters.

From Bayes’ theorem, we know:

\[g(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int f(x|\theta)\pi(\theta)d\theta}\]

We want to approximate the joint log-likelihood, so let’s manipulate the above a bit:

\[g(\theta|x) = \frac{e^{\log\left(f(x|\theta)\pi(\theta)\right)}}{\int e^{\log\left( f(x|\theta)\pi(\theta)\right)}d\theta}\]

The numerator's exponent, \(\log\left(f(x|\theta)\pi(\theta)\right)\), is the log-posterior up to an additive constant. Let's shorten it to \(h(\theta)\) for now. We're going to approximate it with a second-order Taylor expansion around the mode \(\hat{\theta}\), which is:

\[h(\theta) \approx h(\hat{\theta}) + (\theta - \hat{\theta})^T \nabla h(\hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta - \hat{\theta})\]

in the multivariate case, or

\[h(\theta) \approx h(\hat{\theta}) + h'(\hat{\theta})(\theta - \hat{\theta}) + \frac{1}{2}h''(\hat{\theta})(\theta - \hat{\theta})^2\]

in the univariate case. We can drop the first-order term, since the gradient of \(h\) evaluated at the mode \(\hat{\theta}\) (the MAP solution) is always zero, which leaves us with

\[h(\theta) \approx h(\hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta - \hat{\theta})\]
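
To get a feel for what this quadratic approximation does, here is a small numerical sketch (a hypothetical example, not from the lecture) using the same gamma-style form as in the errata, with made-up values \(r+\alpha = 6\) and \(\beta+x = 2\). Near the mode the quadratic tracks \(h(\theta)\) closely and falls away in the tails.

```python
# Hypothetical numbers for the gamma-style example above: a = r + alpha = 6, b = beta + x = 2,
# so h(theta) = (a - 1)*log(theta) - b*theta (dropping constants).
import numpy as np

a, b = 6.0, 2.0
h = lambda t: (a - 1) * np.log(t) - b * t

theta_hat = (a - 1) / b               # mode of h: 2.5
Q = (a - 1) / theta_hat**2            # -h''(theta_hat) = 0.8
quad = lambda t: h(theta_hat) - 0.5 * Q * (t - theta_hat) ** 2

for t in [1.0, 2.0, 2.5, 3.0, 4.0]:
    print(f"theta = {t:.1f}   h = {h(t):+.4f}   quadratic = {quad(t):+.4f}")
```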

Let’s plug that back into the equation above:

\[g(\theta|x) \approx \frac{e^{h(\hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta -\hat{\theta})}}{\int e^{h(\hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta - \hat{\theta})} d\theta}\]

We can pull the first term, \(e^{h(\hat{\theta})}\), out of both the numerator and the denominator, since it's a constant, and cancel it:

\[\begin{split} \begin{align*} g(\theta|x) & \approx \frac{e^{h(\hat{\theta})}e^{\frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta -\hat{\theta})}}{e^{h(\hat{\theta})}\int e^{\frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta - \hat{\theta})\, }d\theta} \\ & = \frac{e^{\frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta -\hat{\theta})}}{\int e^{\frac{1}{2}(\theta - \hat{\theta})^T \nabla^2 h(\hat{\theta})(\theta - \hat{\theta})\, }d\theta} \end{align*} \end{split}\]

Now let \(Q\), the inverse of our covariance matrix, be the negative Hessian of \(h\) at the mode: \(Q = -\nabla^2 h(\hat{\theta}) = -\left.\left[\nabla^2 \log\left(f(x|\theta)\pi(\theta)\right)\right]\right|_{\theta=\hat{\theta}}\). Then:

\[g(\theta|x) \approx \frac{1}{\int e^{-\frac{1}{2}(\theta - \hat{\theta})^T Q(\theta - \hat{\theta})\, }d\theta} e^{-\frac{1}{2}(\theta - \hat{\theta})^T Q(\theta -\hat{\theta})}\]
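
The denominator is a standard Gaussian integral, which has a closed form (assuming \(Q\) is positive definite):

\[\int e^{-\frac{1}{2}(\theta - \hat{\theta})^T Q(\theta - \hat{\theta})}\, d\theta = \sqrt{\frac{(2\pi)^n}{|Q|}}\]

where \(n\) is the dimension of \(\theta\).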

The multivariate normal distribution is specified as:

\[ f(x | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)} \]

where \(|\Sigma|\) is the determinant of \(\Sigma\). Substituting \(\Sigma = Q^{-1}\) (so that \(\Sigma^{-1} = Q\) and \(|\Sigma| = 1/|Q|\)), the two expressions match: our approximation is the density of a \(N(\hat{\theta},Q^{-1})\) distribution. We'll check how well this approximation works with some examples in the next lecture.
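
As a preview, here is a minimal numerical sketch of the whole procedure on a hypothetical example (not the one from the next lecture): we take a \(Gamma(6, 2)\) "posterior" known only up to a constant, find \(\hat{\theta}\) with a numerical optimizer, estimate \(Q\) with a finite-difference second derivative, and compare the \(N(\hat{\theta},Q^{-1})\) approximation to the exact distribution. All values here are made up for illustration.

```python
# Minimal Laplace sketch on a hypothetical unnormalized posterior:
# g(theta) proportional to theta**(a - 1) * exp(-b*theta), a Gamma(a, b) kernel with a = 6, b = 2.
import numpy as np
from scipy import optimize, stats

a, b = 6.0, 2.0
log_g = lambda t: (a - 1) * np.log(t) - b * t      # log of the unnormalized posterior

# find the mode (MAP) numerically by minimizing the negative log-posterior
res = optimize.minimize_scalar(lambda t: -log_g(t), bounds=(1e-6, 50.0), method="bounded")
theta_hat = res.x                                  # analytically (a - 1)/b = 2.5

# precision Q = -d^2/dtheta^2 log g(theta) at the mode, via a central finite difference
eps = 1e-4
Q = -(log_g(theta_hat + eps) - 2 * log_g(theta_hat) + log_g(theta_hat - eps)) / eps**2

laplace = stats.norm(loc=theta_hat, scale=Q**-0.5)  # the N(theta_hat, Q^{-1}) approximation
exact = stats.gamma(a, scale=1.0 / b)               # the true normalized posterior

print(f"mode: {theta_hat:.4f} (analytic {(a - 1) / b})")
print(f"variance: {1.0 / Q:.4f} (analytic (a - 1)/b**2 = {(a - 1) / b**2})")
print(f"P(1 < theta < 4): exact {exact.cdf(4) - exact.cdf(1):.4f}, "
      f"Laplace {laplace.cdf(4) - laplace.cdf(1):.4f}")
```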