Empirical Bayes
Given the following:
\[\begin{split}
\begin{align*}
X_i \mid \theta_i &\overset{\text{ind.}}{\sim} f_i(x_i \mid \theta_i), \quad i = 1, 2, \dots, n \\
\theta_i &\overset{\text{i.i.d.}}{\sim} \pi(\theta_i \mid \xi), \quad \xi \text{ is a common hyperparameter}
\end{align*}
\end{split}\]
Then the marginal distribution of \(x_i\) given \(\xi\) is:
\[
m_i(x_i \mid \xi) = \int f_i(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi) d\theta_i
\]
The joint marginal distribution is:
\[\begin{split}
\begin{align*}
m(\underset{\sim}{x} \mid \xi) &= \int \prod_{i=1}^{n} f_i(x_i \mid \theta_i) \cdot \prod_{i=1}^{n} \pi(\theta_i \mid \xi) d\theta_1 \cdots d\theta_n \\
&= \prod_{i=1}^{n} \int f_i(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi) d\theta_i \\
&= \prod_{i=1}^{n} m_i(x_i \mid \xi) \quad \text{(independent)}
\end{align*}
\end{split}\]
From \( m(\underset{\sim}{x} \mid \xi) = \prod_{i=1}^{n} m_i(x_i \mid \xi) \), the \( X_i \) are marginally independent whenever \( \theta_i \overset{\text{i.i.d.}}{\sim} \pi(\theta_i \mid \xi) \).
If, in addition, \( f_i \equiv f \), then the \( X_i \) are i.i.d. (marginally).
Also, the posterior is:
\[
\pi(\theta_i \mid X_i, \xi) = \frac{f(x_i \mid \theta_i) \cdot \pi(\theta_i \mid \xi)}{m(x_i \mid \xi)}
\]
Jeremy in Empirical Bayes
Let \( \underset{\sim}{X} = (98, 107, 89, 88, 108) \) be Jeremy’s scores on \( n = 5 \) independent IQ tests.
For this data:
\[\begin{split}
\begin{align*}
X_i \mid \theta_i &\overset{\text{ind.}}{\sim} N(\theta_i, \sigma^2), \quad \sigma^2 = 80 \\
\theta_i &\overset{\text{i.i.d.}}{\sim} N(\mu, \tau^2), \quad \text{Goal: estimate } \theta_i
\end{align*}
\end{split}\]
The joint marginal distribution of \( \underset{\sim}{X} \) is:
\[
m(\underset{\sim}{x} \mid \mu, \tau^2) = \prod_{i=1}^{n} m(x_i \mid \mu, \tau^2)
\]
Given that:
\[
X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2 + \tau^2)
\]
This implies:
\[\begin{split}
\begin{align*}
m(\underset{\sim}{x} \mid \mu, \tau^2) &=\prod_{i=1}^{n}m(x_i \mid \mu, \tau^2) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi(\sigma^2 + \tau^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\sigma^2 + \tau^2)}\right)
\end{align*}
\end{split}\]
Then, the MLE of \( \mu \) is \( \hat{\mu} = \bar{X} \) and of \( \tau^2 \) is:
\[
\hat{\tau}^2 = (s^2 - \sigma^2)_+ \equiv \text{max}\{0, s^2 - \sigma^2\}, \quad s^2 = \text{sample variance for } X.
\]
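As a quick illustration of these hyperparameter estimates, here is a minimal simulation sketch in Python (the values \( \mu = 100 \), \( \tau^2 = 20 \), and \( n = 10{,}000 \) are hypothetical, chosen only for this check):
```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 10_000, 80
mu_true, tau2_true = 100.0, 20.0  # hypothetical hyperparameters

# theta_i ~ N(mu, tau^2), then X_i | theta_i ~ N(theta_i, sigma^2)
theta = rng.normal(mu_true, np.sqrt(tau2_true), size=n)
x = rng.normal(theta, np.sqrt(sigma2))

mu_hat = x.mean()                            # should be close to 100
tau2_hat = max(0.0, x.var(ddof=1) - sigma2)  # should be close to 20
print(mu_hat, tau2_hat)
```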
With these estimators from the data \( \underset{\sim}{X} \), the (estimated) posterior becomes:
\[
\pi(\theta_i \mid X_i, \hat{\mu}, \hat{\tau}^2) = N(\hat{B} \hat{\mu} + (1 - \hat{B}) x_i, (1 - \hat{B}) \cdot \sigma^2),
\]
where \( \hat{\mu} = \bar{X} \), \( \hat{\tau}^2 = (s^2 - \sigma^2)_+ \), and \( \hat{B} = \frac{\sigma^2}{\sigma^2 + \hat{\tau}^2} \).
Thus, for Jeremy’s data:
\[
s^2 = 90.5, \qquad \hat{\tau}^2 = (90.5 - 80)_+ = 10.5,
\]
\[
\hat{B} = \frac{\sigma^2}{\sigma^2 + \hat{\tau}^2} = \frac{80}{80 + (90.5 - 80)} = \frac{80}{90.5}.
\]
Then, the estimated parameters are:
\[
\hat{\theta}_1 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 98 = 98,
\]
\[
\hat{\theta}_2 = \frac{80}{90.5} \cdot 98 + \frac{10.5}{90.5} \cdot 107 = 99.044, \quad \text{etc.}
\]
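These numbers are easy to reproduce; a minimal Python sketch using Jeremy's scores and \( \sigma^2 = 80 \) from above:
```python
import numpy as np

x = np.array([98, 107, 89, 88, 108])  # Jeremy's five IQ scores
sigma2 = 80                            # known sampling variance

mu_hat = x.mean()                      # 98.0
s2 = x.var(ddof=1)                     # sample variance: 90.5
tau2_hat = max(0.0, s2 - sigma2)       # (s^2 - sigma^2)_+ = 10.5
B_hat = sigma2 / (sigma2 + tau2_hat)   # 80 / 90.5

theta_hat = B_hat * mu_hat + (1 - B_hat) * x
print(theta_hat)  # approximately [98.0, 99.044, 96.956, 96.840, 99.160]
```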
Poisson-Exponential example:
\[\begin{split}
\begin{align*}
X_i\mid \lambda_i &\sim \text{Pois}(\lambda_i), \quad i = 1, 2, \dots, n \\
\lambda_i &\sim \text{Exp}(\mu), \quad \pi(\lambda_i) = \mu e^{-\mu \lambda_i}
\end{align*}
\end{split}\]
Find the Empirical Bayes estimators of \( \lambda_i \).
Bayes estimator: \( X_i \sim \text{Pois}(\lambda_i), \quad \lambda_i \sim \text{Exp}(\mu) \)
\( \lambda_i \mid X_i \sim \text{Ga}(x_i + 1, 1 + \mu) \) by conjugacy.
\( \mathbb{E}(\lambda_i \mid X_i) = \frac{x_i + 1}{1 + \mu}, \) but \( \mu \) may not be known…
Empirical Bayes:
The marginal distribution \( m(x_i) \) is:
\[\begin{split}
\begin{align*}
m(x_i) &= \int_{0}^{+\infty} \frac{\lambda_i^{x_i}}{x_i!} e^{-\lambda_i} \cdot \mu e^{-\mu \lambda_i} d\lambda_i\\
&= \frac{\mu}{x_i!} \int_{0}^{+\infty} \lambda_i^{x_i} e^{-(1+\mu)\lambda_i} d\lambda_i \\
&= \frac{\mu}{x_i!} \cdot \frac{x_i!}{(1+\mu)^{x_i+1}} \\
&= \left(\frac{1}{1+\mu}\right)^{x_i} \cdot \frac{\mu}{1+\mu}, \quad x_i = 0, 1, \ldots
\end{align*}
\end{split}\]
This is a geometric distribution! Denote \( \frac{\mu}{1+\mu} = p \):
\[
\text{Ge}(p): \quad P(X_i = x_i) = (1-p)^{x_i} \cdot p
\]
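As a sanity check, here is a small simulation sketch comparing the empirical marginal pmf of \( X_i \) with the \( \text{Ge}(p) \) pmf \( (1-p)^{x} p \); the value \( \mu = 2 \) is hypothetical:
```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0       # hypothetical hyperparameter
n = 100_000

lam = rng.exponential(scale=1 / mu, size=n)  # lambda_i ~ Exp(mu) (rate mu)
x = rng.poisson(lam)                         # X_i | lambda_i ~ Pois(lambda_i)

p = mu / (1 + mu)
for k in range(5):
    print(k, np.mean(x == k), (1 - p) ** k * p)  # empirical vs. Ge(p) pmf
```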
The likelihood is:
\[
L = \prod_{i=1}^{n} m(x_i) = (1-p)^{\sum x_i} \cdot p^n
\]
Taking the log-likelihood:
\[
\ell = \log L = \left(\sum_{i=1}^{n} x_i\right) \log(1-p) + n \log p
\]
Taking the derivative and setting it equal to 0:
\[
\ell' = -\frac{\sum x_i}{1-p} + \frac{n}{p} = 0
\]
Solving for \( p \):
\[
\hat{p} = \frac{n}{n + \sum x_i} = \frac{1}{1 + \bar{x}}
\]
Thus,
\[
\frac{\mu}{1+\mu} = \frac{1}{1+\bar{X}} \quad \Rightarrow \quad \hat{\mu} = \frac{1}{\bar{X}}
\]
Back to Bayes estimator with \( \mu \) estimated from the data:
\[
\frac{X_i + 1}{1 + \mu} \quad \rightarrow \quad \frac{X_i + 1}{1 + \hat{\mu}} = \frac{X_i + 1}{1 + \frac{1}{\bar{X}}} = \frac{\bar{X}}{1 + \bar{X}} \cdot (X_i + 1).
\]
Thus,
\[
\hat{\lambda}_i = \frac{\bar{X}}{1 + \bar{X}}(X_i + 1)
\]
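A minimal sketch of this estimator on hypothetical count data (the vector below is made up purely for illustration):
```python
import numpy as np

x = np.array([0, 3, 1, 5, 2, 2])  # hypothetical Poisson counts
x_bar = x.mean()

mu_hat = 1 / x_bar                           # EB estimate of mu
lambda_hat = x_bar / (1 + x_bar) * (x + 1)   # EB estimates of lambda_i
print(mu_hat, lambda_hat)
```
Since \( \frac{\bar{X}}{1+\bar{X}}(X_i+1) = \frac{\bar{X}}{1+\bar{X}} X_i + \frac{1}{1+\bar{X}} \bar{X} \), each estimate is a convex combination of the raw count \( X_i \) and the overall mean \( \bar{X} \): the counts are shrunk toward \( \bar{X} \).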
The long way
Here we’ll ignore conjugacy and the Geometric distribution.
Let \( X_i \mid \lambda_i \) be independently distributed following a Poisson distribution with parameter \( \lambda_i \), i.e.
\[\begin{split}
\begin{align}
X_i \mid \lambda_i &\overset{\text{ind.}}{\sim} \text{Poisson}(\lambda_i) \\
\lambda_i &\overset{\text{i.i.d.}}{\sim} \text{Exp}(\mu)
\end{align}
\end{split}\]
for \( i = 1, \ldots, n \), where \( \mu \) is unknown. Find the empirical Bayes estimator of \( \lambda_i \), \( i = 1, \ldots, n \).
Note: If \(Z \sim \text{Gamma}(a, b)\), then its pdf is
\[
p(z) = \frac{b^a}{\Gamma(a)} z^{a-1} e^{-bz} \quad \text{for} \quad z \geq 0, \, a, b > 0.
\]
Solution
From the form of the likelihood and the prior, we have:
\[
f(X_i \mid \lambda_i) = \frac{\lambda_i^{X_i}}{X_i!} e^{-\lambda_i} \quad \text{for} \quad X_i = 0, 1, 2, \ldots
\]
and
\[
\pi(\lambda_i) = \frac{\mu}{\Gamma(1)} e^{- \mu \lambda_i}
\]
Compute the marginal for \( X_i \).
The marginal for \( X_i \) is the distribution of \( X_i \) averaged over all possible values of \( \lambda_i \):
\[
m(X_i) = \int_0^\infty \frac{\lambda_i^{X_i}}{X_i!} e^{-\lambda_i} \cdot \frac{\mu}{\Gamma(1)} e^{-\mu \lambda_i} \, d\lambda_i,
\]
\[
m(X_i) = \frac{\mu}{X_i!} \int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} \, d\lambda_i
\]
where \( \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} \) is the kernel of a Gamma distribution with parameters \( (X_i + 1, \mu + 1) \). This will integrate to 1 if we normalize it, so we’ll multiply by 1 in the form of the normalizing constant and its reciprocal:
\[\begin{split}
\begin{align}
m(X_i) &= \frac{\mu}{X_i!} \frac{\Gamma(\alpha)}{\beta^\alpha}\frac{\beta^\alpha}{\Gamma(\alpha)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i && \text{Multiply by 1.} \\
&= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}}\frac{{(\mu + 1)}^{X_i + 1}}{\Gamma(X_i + 1)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i && \text{Substitute known parameter values.} \\
&= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}}\left[\frac{{(\mu + 1)}^{X_i + 1}}{\Gamma(X_i + 1)}\int_0^\infty \lambda_i^{X_i} e^{-(\mu + 1) \lambda_i} d\lambda_i\right] && \text{[Gamma dist.] will integrate to 1 now.}\\
&= \frac{\mu}{X_i!} \frac{\Gamma(X_i + 1)}{(\mu + 1)^{X_i + 1}} \\
&= \frac{\mu}{(\mu + 1)^{X_i + 1}} && \text{Remember, } \Gamma(n)=(n-1)!\\
\end{align}
\end{split}\]
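This closed form can be spot-checked numerically; a short sketch integrating the Poisson-times-Exponential integrand with scipy's `quad` for a few values of \( X_i \), using a hypothetical \( \mu = 2 \):
```python
from math import exp, factorial, inf
from scipy.integrate import quad

mu = 2.0  # hypothetical hyperparameter

def integrand(lam, x):
    # Poisson(lambda_i) likelihood times Exp(mu) prior density
    return lam**x / factorial(x) * exp(-lam) * mu * exp(-mu * lam)

for x in range(4):
    numeric, _ = quad(integrand, 0, inf, args=(x,))
    closed_form = mu / (mu + 1) ** (x + 1)
    print(x, numeric, closed_form)  # the two columns should agree
```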
So, the marginal for all of \( X \) is the product of the individual marginals (independence, not i.i.d., is enough here):
\[
m(X) = \prod_{i=1}^n m(X_i) = \prod_{i=1}^n \frac{\mu}{(\mu + 1)^{X_i + 1}}.
\]
We view \( m(X) \) as a function of the unknown parameter \( \mu \), and find a parameter estimate for \( \mu \) using Maximum Likelihood Estimation (MLE) as follows:
Convert to log-likelihood:
\[
\log m(X) = \sum_{i=1}^n \left\{ \log \mu - (X_i + 1) \log(1 + \mu) \right\}
\]
Take the derivative and set equal to 0:
\[
\frac{\partial \log m(X)}{\partial \mu} = \sum_{i=1}^n \left( \frac{1}{\mu} - \frac{X_i + 1}{1 + \mu} \right) = 0,
\]
\[
\frac{n}{\mu} = \frac{n \bar{X} + n}{1 + \mu},
\]
\[
n (1 + \mu) = \mu n (\bar{X} + 1),
\]
\[
n + n\mu = \mu n \bar{X} + n\mu.
\]
Solving for \( \mu \) yields:
\[
\hat{\mu} = \frac{1}{\bar{X}}.
\]
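This MLE can also be confirmed numerically; a sketch minimizing the negative log marginal likelihood with scipy's `minimize_scalar` on the same hypothetical counts used earlier:
```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0, 3, 1, 5, 2, 2])  # hypothetical Poisson counts

def neg_log_marginal(mu):
    # -log m(X) = -sum_i [log(mu) - (x_i + 1) * log(1 + mu)]
    return -np.sum(np.log(mu) - (x + 1) * np.log(1 + mu))

res = minimize_scalar(neg_log_marginal, bounds=(1e-6, 50), method="bounded")
print(res.x, 1 / x.mean())  # both approximately 0.4615
```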
Finally, the posterior of \( \lambda_i \) given \( X_i \) is:
\[
p(\lambda_i \mid X_i) \propto p(X_i \mid \lambda_i) p(\lambda_i) \propto e^{-\lambda_i} \lambda_i^{X_i} \times e^{-\mu \lambda_i} = e^{-(1+\mu)\lambda_i} \lambda_i^{X_i},
\]
\[
\Rightarrow \lambda_i \mid X_i \sim \text{Gamma}(X_i + 1, 1 + \mu).
\]
Hence, the empirical Bayes (EB) estimate of \( \lambda_i \) is obtained as follows:
\[
\text{EB estimate of } \lambda_i: \hat{\lambda}_i = \mathbb{E}(\lambda_i \mid X_i) = \frac{X_i + 1}{1 + \hat{\mu}} = \frac{X_i + 1}{1 + \frac{1}{\bar{X}}} = \frac{\bar{X}}{1 + \bar{X}} (X_i + 1).
\]