18. Parametric#

Lecture errata#

Sample variance#

The sample variance in Jeremy’s IQ example here was calculated incorrectly. The variance of the sample \((98, 107, 89, 88, 108)\), whose mean is \(98\), is \(S^2 = \frac{0^2+9^2+9^2+10^2+10^2}{5-1} = \frac{362}{4} = 90.5\). The calculation may be updated as follows:

\[ \hat{B} = \frac{\sigma^2}{\sigma^2 + \hat{\tau}^2} = \frac{80}{80 + (90.5 - 80)} = \frac{80}{90.5} \]

Then the estimated parameters are:

\[ \hat{\theta}_1 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 98 = 98 \]
\[ \hat{\theta}_2 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 107 = 99.044, \text{ etc.} \]
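Here is a quick NumPy check of the corrected numbers (a sanity check I added, not part of the slides):

```python
# Quick check of the corrected sample variance and shrinkage factor.
import numpy as np

scores = np.array([98, 107, 89, 88, 108])
sigma2 = 80

s2 = scores.var(ddof=1)              # sample variance with n - 1 in the denominator
tau2_hat = max(0.0, s2 - sigma2)     # (s^2 - sigma^2)_+
B_hat = sigma2 / (sigma2 + tau2_hat)

print(s2, tau2_hat, B_hat)           # 90.5, 10.5, approximately 0.884
```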

Incorrect conditional on slide 3#

The last line of slide 3 in the lecture is:

\[\prod_{i=1}^{n} m_i(x_i|\theta_i)\]

but it should be:

\[\prod_{i=1}^{n} m_i(x_i|\xi)\]

Lecture Notes#

Since we get so many questions about this lecture, I’m going to include the entire set of slides here, along with additional algebraic steps.

Empirical Bayes can be categorized into parametric and non-parametric approaches, a taxonomy proposed by Carl Morris in his 1983 JASA paper (Morris [1983]).

Empirical Bayes#

  • Carl Morris (1983, JASA paper) divided Empirical Bayes into parametric and non-parametric.

Given the following:

\[\begin{split} \begin{align*} X_i \mid \theta_i &\overset{\text{ind.}}{\sim} f_i(x_i \mid \theta_i), \quad i = 1, 2, \dots, n \\ \theta_i &\overset{\text{i.i.d.}}{\sim} \pi(\theta_i \mid \xi), \quad \xi \text{ is a common hyperparameter} \end{align*} \end{split}\]

Then the marginal distribution of \(x_i\) given \(\xi\) is:

\[ m_i(x_i \mid \xi) = \int f_i(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi) d\theta_i \]

The joint marginal distribution is:

\[\begin{split} \begin{align*} m(\underset{\sim}{x} \mid \xi) &= \int \prod_{i=1}^{n} f_i(x_i \mid \theta_i) \cdot \prod_{i=1}^{n} \pi(\theta_i \mid \xi) d\theta_1 \cdots d\theta_n \\ &= \prod_{i=1}^{n} \int f_i(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi) d\theta_i \\ &= \prod_{i=1}^{n} m_i(x_i \mid \xi) \quad \text{(independent)} \end{align*} \end{split}\]

From \( m(\underset{\sim}{x} \mid \xi) = \prod_{i=1}^{n} m_i(x_i \mid \xi) \): \( X_i \) are marginally independent if \( \theta_i \overset{\text{i.i.d.}}{\sim} \pi(\theta_i \mid \xi) \).


If \( f_i \equiv f \), then \( X_i \) are i.i.d. (marginally).

Also, the posterior is:

\[ \pi(\theta_i \mid X_i, \xi) = \frac{f(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi)}{m(x_i \mid \xi)} \]

  • \( \xi \) is unknown and can be estimated from \( X_1, X_2, \dots, X_n \) via:

    • MLE (the so-called MLE II approach). This is what we’ll focus on.

    • MM (moment matching)

Jeremy in Empirical Bayes#

Let \( \underset{\sim}{X} = (98, 107, 89, 88, 108) \) be Jeremy’s scores on \( n = 5 \) independent IQ tests.

For this data:

\[\begin{split} \begin{align*} X_i \mid \theta_i &\overset{\text{ind.}}{\sim} N(\theta_i, \sigma^2), \quad \sigma^2 = 80 \\ \theta_i &\overset{\text{i.i.d.}}{\sim} N(\mu, \tau^2), \quad \text{Goal: estimate } \theta_i \end{align*} \end{split}\]

The joint marginal distribution of \( \underset{\sim}{X} \) is:

\[ m(\underset{\sim}{x} \mid \mu, \tau^2) = \prod_{i=1}^{n} m(x_i \mid \mu, \tau^2) \]

Given that:

\[ X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2 + \tau^2) \]

This implies:

\[\begin{split} \begin{align*} m(\underset{\sim}{x} \mid \mu, \tau^2) &=\prod_{i=1}^{n}m(x_i \mid \mu, \tau^2) \\ &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi(\sigma^2 + \tau^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\sigma^2 + \tau^2)}\right) \end{align*} \end{split}\]
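If it isn’t obvious that the marginal of \( X_i \) is normal with variance \( \sigma^2 + \tau^2 \), a quick Monte Carlo sketch makes it concrete (the values \( \mu = 100 \) and \( \tau^2 = 50 \) are arbitrary illustration values, not from the example):

```python
# Monte Carlo check that X_i ~ N(mu, sigma^2 + tau^2) marginally.
# mu = 100 and tau2 = 50 are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(0)
mu, tau2, sigma2 = 100.0, 50.0, 80.0
n = 1_000_000

theta = rng.normal(mu, np.sqrt(tau2), size=n)   # theta_i ~ N(mu, tau^2)
x = rng.normal(theta, np.sqrt(sigma2))          # X_i | theta_i ~ N(theta_i, sigma^2)

print(x.mean(), x.var())   # roughly 100 and 130 = sigma^2 + tau^2
```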

Then, the MLE of \( \mu \) is \( \hat{\mu} = \bar{X} \) and of \( \tau^2 \) is:

\[ \hat{\tau}^2 = (s^2 - \sigma^2)_+ \equiv \text{max}\{0, s^2 - \sigma^2\}, \quad s^2 = \text{sample variance for } X. \]

With these estimators from the data \( \underset{\sim}{X} \), the (estimated) posterior becomes:

\[ \pi(\theta_i \mid X_i, \hat{\mu}, \hat{\tau}^2) = N(\hat{B} \hat{\mu} + (1 - \hat{B}) x_i, (1 - \hat{B}) \cdot \sigma^2), \]

where \( \hat{\mu} = \bar{X} \), \( \hat{\tau}^2 = (s^2 - \sigma^2)_+ \), and \( \hat{B} = \frac{\sigma^2}{\sigma^2 + \hat{\tau}^2} \).

Thus, for Jeremy’s data:

\[ s^2 = 90.5, \]
\[ \hat{B} = \frac{\sigma^2}{\sigma^2 + \hat{\tau}^2} = \frac{80}{80 + (90.5 - 80)} = \frac{80}{90.5}. \]

Then, the estimated parameters are:

\[ \hat{\theta}_1 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 98 = 98, \]
\[ \hat{\theta}_2 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 107 = 99.044, \text{ etc.} \]
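Putting the pieces together, here is a short NumPy sketch (my addition, not from the slides) that reproduces all five posterior means:

```python
# Empirical Bayes posterior means for Jeremy's five scores.
import numpy as np

x = np.array([98, 107, 89, 88, 108])
sigma2 = 80

mu_hat = x.mean()                             # 98
tau2_hat = max(0.0, x.var(ddof=1) - sigma2)   # 10.5
B_hat = sigma2 / (sigma2 + tau2_hat)          # 80 / 90.5

theta_hat = B_hat * mu_hat + (1 - B_hat) * x
print(theta_hat)   # approximately [98.0, 99.044, 96.956, 96.840, 99.160]
```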

Poisson-Exponential example#

\[\begin{split} \begin{align*} X_i\mid \lambda_i &\sim \text{Pois}(\lambda_i), \quad i = 1, 2, \dots, n \\ \lambda_i &\sim \text{Exp}(\mu), \quad \pi(\lambda_i) = \mu e^{-\mu \lambda_i} \end{align*} \end{split}\]

Find the Empirical Bayes estimators of \( \lambda_i \).

  • Bayes estimator: \( X_i \sim \text{Pois}(\lambda_i), \quad \lambda_i \sim \text{Exp}(\mu) \)

    \( \lambda_i \mid X_i \sim \text{Ga}(x_i + 1, 1 + \mu) \) by conjugacy.

    \( \mathbb{E}(\lambda_i \mid X_i) = \frac{x_i + 1}{1 + \mu}, \) but \( \mu \) may not be known…

  • Empirical Bayes:

The marginal distribution \( m(x_i) \) is:

\[\begin{split} \begin{align*} m(x_i) &= \int_{0}^{+\infty} \frac{\lambda_i^{x_i}}{x_i!} e^{-\lambda_i} \cdot \mu e^{-\mu \lambda_i} \, d\lambda_i\\ &= \frac{\mu}{x_i!} \int_{0}^{+\infty} \lambda_i^{x_i} e^{-(1+\mu)\lambda_i} \, d\lambda_i \\ &= \frac{\mu}{x_i!} \cdot \frac{\Gamma(x_i + 1)}{(1+\mu)^{x_i+1}} \\ &= \left(\frac{1}{1+\mu}\right)^{x_i} \cdot \frac{\mu}{1+\mu}, \quad x_i = 0, 1, \ldots \end{align*} \end{split}\]

This is a geometric distribution! Denote \( \frac{\mu}{1+\mu} = p \):

\[ \text{Ge}(p): \quad P(X_i = x_i) = (1-p)^{x_i} \cdot p \]

The likelihood is:

\[ L = \prod_{i=1}^{n} m(x_i) = (1-p)^{\sum x_i} \cdot p^n \]

Taking the log-likelihood:

\[ \ell = \log L = \log(1-p) \cdot \sum_{i=1}^{n} x_i + n \log p \]

Taking the derivative and setting it equal to 0:

\[ \ell' = -\frac{\sum x_i}{1-p} + \frac{n}{p} = 0 \]

Solving for \( p \):

\[ \hat{p} = \frac{n}{n + \sum x_i} = \frac{1}{1 + \bar{x}} \]

Thus,

\[ \frac{\mu}{1+\mu} = \frac{1}{1+\bar{X}} \quad \Rightarrow \quad \hat{\mu} = \frac{1}{\bar{X}} \]

Back to Bayes estimator with \( \mu \) estimated from the data:

\[ \frac{X_i + 1}{1 + \mu} \quad \rightarrow \quad \frac{X_i + 1}{1 + \hat{\mu}} = \frac{X_i + 1}{1 + \frac{1}{\bar{X}}} = \frac{\bar{X}}{1 + \bar{X}} \cdot (X_i + 1). \]

Thus,

\[ \hat{\lambda}_i = \frac{\bar{X}}{1 + \bar{X}}(X_i + 1) \]
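To see the whole recipe end to end, here’s a small simulation sketch (the true \( \mu = 0.5 \) and \( n = 500 \) are arbitrary choices, not from the lecture): draw the \( \lambda_i \) and \( X_i \), estimate \( \hat{\mu} = 1/\bar{X} \), and shrink each \( X_i + 1 \).

```python
# Empirical Bayes for the Poisson-Exponential model on simulated data.
# The true rate mu = 0.5 and n = 500 are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(1)
mu_true, n = 0.5, 500

lam = rng.exponential(scale=1 / mu_true, size=n)   # lambda_i ~ Exp(mu), rate mu
x = rng.poisson(lam)                               # X_i | lambda_i ~ Pois(lambda_i)

xbar = x.mean()
mu_hat = 1 / xbar                                  # MLE of mu from the marginal
lam_hat = xbar / (1 + xbar) * (x + 1)              # EB estimates of lambda_i

print(mu_hat)        # should land near 0.5
print(lam_hat[:5])   # shrunken estimates for the first five observations
```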

The long way#

Here we’ll ignore conjugacy and the Geometric distribution.

Let \( X_i \mid \lambda_i \) be independently distributed following a Poisson distribution with parameter \( \lambda_i \), i.e.

\[\begin{split} \begin{align} X_i \mid \lambda_i &\overset{\text{ind.}}{\sim} \text{Poisson}(\lambda_i) \\ \lambda_i &\overset{\text{i.i.d.}}{\sim} \text{Exp}(\mu) \end{align} \end{split}\]

for \( i = 1, \ldots, n \), where \( \mu \) is unknown. Find the empirical Bayes estimator of \( \lambda_i \), \( i = 1, \ldots, n \).

Note: If \(Z \sim \text{Gamma}(a, b)\), then its pdf is

\[ p(z) = \frac{b^a}{\Gamma(a)} z^{a-1} e^{-bz} \quad \text{for} \quad z \geq 0, \, a, b > 0. \]

Solution#

From the form of the likelihood and the prior, we have:

\[ f(X_i \mid \lambda_i) = \frac{\lambda_i^{X_i}}{X_i!} e^{-\lambda_i} \quad \text{for} \quad X_i = 0, 1, 2, \ldots \]

and

\[ \pi(\lambda_i) = \frac{\mu}{\Gamma(1)} e^{- \mu \lambda_i} \]

Compute the marginal for \( X_i \).

The marginal for \( X_i \) is the distribution of \( X_i \) averaged over all possible values of \( \lambda_i \):

\[ m(X_i) = \int_0^\infty \frac{\lambda_i^{X_i}}{X_i!} e^{-\lambda_i} \cdot \frac{\mu}{\Gamma(1)} e^{-\mu \lambda_i} \, d\lambda_i, \]
\[ m(X_i) = \frac{\mu}{X_i!} \int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} \, d\lambda_i \]

where \( \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} \) is the kernel of a Gamma distribution with parameters \( (X_i + 1, \mu + 1) \). This will integrate to 1 if we normalize it, so we’ll multiply by 1 in the form of the normalizing constant and its reciprocal:

\[\begin{split} \begin{align} m(X_i) &= \frac{\mu}{X_i!} \frac{\Gamma(\alpha)}{\beta^\alpha}\frac{\beta^\alpha}{\Gamma(\alpha)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i && \text{Multiply by 1.} \\ &= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}}\frac{{(\mu + 1)}^{X_i + 1}}{\Gamma(X_i + 1)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i && \text{Substitute known parameter values.} \\ &= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}}\left[\frac{{(\mu + 1)}^{X_i + 1}}{\Gamma(X_i + 1)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i\right] && \text{[Gamma dist.] will integrate to 1 now.}\\ &= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}} \\ &= \frac{\mu}{(\mu + 1)^{X_i + 1}} && \text{Remember, } \Gamma(n)=(n-1)!\\ \end{align} \end{split}\]
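As a check on this algebra, the closed form \( m(X_i) = \frac{\mu}{(\mu + 1)^{X_i + 1}} \) can be compared against direct numerical integration of the Poisson likelihood times the exponential prior (a sketch with an arbitrary \( \mu = 2 \)):

```python
# Numerical check of the closed-form marginal m(x) = mu / (mu + 1)**(x + 1).
# mu = 2.0 is an arbitrary value for illustration.
from math import exp, factorial
from scipy.integrate import quad

mu = 2.0

def integrand(lam, x):
    # Poisson(x | lam) likelihood times the Exp(mu) prior density
    return lam**x / factorial(x) * exp(-lam) * mu * exp(-mu * lam)

for x in range(5):
    numeric, _ = quad(integrand, 0, 50, args=(x,))
    closed_form = mu / (mu + 1) ** (x + 1)
    print(x, numeric, closed_form)   # the two values should agree
```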

So, the marginal for all of \( X \) is just the product of the individual marginals (independence is enough here):

\[ m(X) = \prod_{i=1}^n m(X_i) = \prod_{i=1}^n \frac{\mu}{(\mu + 1)^{X_i + 1}}. \]

We view \( m(X) \) as a function of the unknown parameter \( \mu \), and find a parameter estimate for \( \mu \) using Maximum Likelihood Estimation (MLE) as follows:

Convert to log-likelihood:

\[ \log m(X) = \sum_{i=1}^n \left\{ \log \mu - (X_i + 1) \log(1 + \mu) \right\} \]

Take the derivative and set equal to 0:

\[ \frac{\partial \log m(X)}{\partial \mu} = \sum_{i=1}^n \left( \frac{1}{\mu} - \frac{X_i + 1}{1 + \mu} \right) = 0, \]
\[ \frac{n}{\mu} = \frac{n \bar{X} + n}{1 + \mu}, \]
\[ n (1 + \mu) = \mu n (\bar{X} + 1), \]
\[ n + n\mu = \mu n \bar{X} + n\mu. \]

Solving for \( \mu \) yields:

\[ \hat{\mu} = \frac{1}{\bar{X}}. \]
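The same answer also comes out of a direct numerical maximization of \( \log m(X) \); here’s a sketch using scipy.optimize on a small made-up data set:

```python
# Numerically maximize log m(X) and compare with the closed form mu_hat = 1 / xbar.
# The data vector below is made up for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0, 3, 1, 2, 5, 1, 0, 2])

def neg_log_marginal(mu):
    return -np.sum(np.log(mu) - (x + 1) * np.log(1 + mu))

res = minimize_scalar(neg_log_marginal, bounds=(1e-6, 100), method="bounded")
print(res.x, 1 / x.mean())   # both approximately 0.571 = 1 / 1.75
```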

Finally, the posterior of \( \lambda_i \) given \( X_i \) is:

\[ p(\lambda_i \mid X_i) \propto p(X_i \mid \lambda_i) p(\lambda_i) \propto e^{-\lambda_i} \lambda_i^{X_i} \times e^{-\mu \lambda_i} = e^{-(1+\mu)\lambda_i} \lambda_i^{X_i}, \]
\[ \Rightarrow \lambda_i \mid X_i \sim \text{Gamma}(X_i + 1, 1 + \mu). \]

Hence, the empirical Bayes (EB) estimate of \( \lambda_i \) is obtained as follows:

\[ \text{EB estimate of } \lambda_i: \hat{\lambda}_i = \mathbb{E}(\lambda_i \mid X_i) = \frac{X_i + 1}{1 + \hat{\mu}} = \frac{X_i + 1}{1 + \frac{1}{\bar{X}}} = \frac{\bar{X}}{1 + \bar{X}} (X_i + 1). \]