Supplementary Exercises 4.8#

Warning

This page contains solutions! We recommend attempting each problem before peeking.

Note

Question 1 was a duplicate of 4.3 question 16.

2. Mosaic Virus#

A single leaf is taken from each of 8 different tobacco plants. Each leaf is then divided in half, and each half is given one of two preparations of mosaic virus. The researchers wanted to examine whether there is a difference in the mean number of lesions produced by the two preparations. Here is the raw data:

\[\begin{split} \begin{array}{ccc} \text{Plant} & \text{Prep 1} & \text{Prep 2} \\ 1 & 38 & 29 \\ 2 & 40 & 35 \\ 3 & 26 & 31 \\ 4 & 33 & 31 \\ 5 & 21 & 14 \\ 6 & 27 & 37 \\ 7 & 41 & 22 \\ 8 & 36 & 25 \\ \end{array} \end{split}\]

Assume a normal distribution for the differences between the paired samples. Using a PPL, find:

  1. the 95% credible set for \(\mu_1 - \mu_2\), and

  2. posterior probability of hypothesis \(H_1: \mu_1 - \mu_2 \geq 0\).

Use noninformative priors.

Hint: Since this is a paired two-sample problem, a single model should be placed on the differences.

Hide code cell content
import pymc as pm
import numpy as np
import arviz as az

prep1 = np.array([38, 40, 26, 33, 21, 27, 41, 36])
prep2 = np.array([29, 35, 31, 31, 14, 37, 22, 25])
diff = prep1 - prep2

with pm.Model() as m:
    tau = pm.Gamma("precision", 0.001, 0.001)
    mu = pm.Normal("mean", 0, tau=0.0001)
    sigma2 = pm.Deterministic("variance", 1 / tau)

    pm.Normal("likelihood", mu, tau=tau, observed=diff)

    trace = pm.sample(3000)

results = az.summary(trace, hdi_prob=0.95)

results
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [precision, mean]
100.00% [16000/16000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean | 4.723 | 3.771 | -2.838 | 12.063 | 0.048 | 0.036 | 6552.0 | 5865.0 | 1.0 |
| precision | 0.012 | 0.006 | 0.002 | 0.025 | 0.000 | 0.000 | 7070.0 | 7276.0 | 1.0 |
| variance | 115.911 | 90.591 | 26.121 | 271.055 | 1.174 | 0.830 | 7070.0 | 7276.0 | 1.0 |
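
The summary above answers part 1. For part 2, one way to estimate the posterior probability of \(H_1\) is the proportion of posterior draws of the difference that are nonnegative. A minimal follow-up, assuming the trace from the cell above is still in memory:

# part 2: P(mu_1 - mu_2 >= 0 | data), estimated as the fraction of
# posterior draws of the "mean" variable that are nonnegative
prob_h1 = (trace.posterior["mean"] >= 0).mean().item()
round(prob_h1, 3)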

3. FIGO#

Despite the excellent prognosis of FIGO stage I, type I endometrial cancers, a substantial number of patients experience recurrence and die from this disease. Zeimet et al. [2013] conducted a retrospective multicenter cohort study to determine the expression of L1CAM by immunohistochemistry in 1021 endometrial cancer specimens with the goal of predicting clinical outcomes. Of the 1021 included cancers, 17.7% were rated L1CAM-positive. Of these L1CAM-positive cancers, 51.4% recurred during follow-up compared with 2.9% of L1CAM-negative cancers. Patients with L1CAM-positive cancers had poorer disease-free and overall survival.

It is stated that L1CAM has been the best-ever published prognostic factor in FIGO stage I, type I endometrial cancers and shows clear superiority over the standardly used multifactor risk score. L1CAM expression in type I cancers indicates the need for adjuvant treatment. This adhesion molecule might serve as a treatment target for the fully humanized anti-L1CAM antibody currently under development for clinical use.

| FIGO I/I Endometrial Cancer | Recurred | Did Not Recur | Total |
| --- | --- | --- | --- |
| L1CAM Positive | | | |
| L1CAM Negative | | | |
| Total | | | 1021 |

  1. Using the information supplied, fill in the table (round the entries to the closest integer).

  2. The estimators of the population sensitivity and specificity are simple relative frequencies (ratios): True Positives (TP)/Recurred and True Negatives (TN)/Did Not Recur. Consider now a Bayesian version of this problem. Using a PPL, model TP and TN as Binomials, place priors on population sensitivity (\(p_1\)) and specificity (\(p_2\)), and find their Bayesian estimators. Explore the estimators for your favorite choice of priors on \(p_1\) and \(p_2\): Jeffreys’, uniform (0, 1), flat on logit, etc.

Hide code cell content
import pymc as pm
import numpy as np
import arviz as az

total = 1021
totalpositive = 181  # L1CAM-positive: 0.177 * 1021 = 180.72, rounded to 181
totalnegative = 840  # L1CAM-negative: 1021 - 181
tp = 93   # true positives (positive and recurred): 0.514 * 181 = 93.03, rounded to 93
fp = 88   # false positives (positive, did not recur): 181 - 93
fn = 24   # false negatives (negative, recurred): 0.029 * 840 = 24.36, rounded to 24
tn = 816  # true negatives (negative, did not recur): 840 - 24

with pm.Model() as m:

    # priors
    prior_sensitivity = pm.Uniform("sensitivity", 0, 1)
    prior_specificity = pm.Uniform("specificity", 0, 1)

    # Binomial Likelihoods
    sensitivity_likelihood = pm.Binomial(
        "sensitivity_likelihood", p=prior_sensitivity, n=tp + fn, observed=tp
    )
    specificity_likelihood = pm.Binomial(
        "specificity_likelihood", p=prior_specificity, n=tn + fp, observed=tn
    )

    # sampling
    trace = pm.sample(5000, target_accept=0.95)

az.summary(trace, hdi_prob=0.95)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sensitivity, specificity]
100.00% [24000/24000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 2 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sensitivity | 0.790 | 0.037 | 0.718 | 0.863 | 0.0 | 0.0 | 12138.0 | 11296.0 | 1.0 |
| specificity | 0.902 | 0.010 | 0.882 | 0.920 | 0.0 | 0.0 | 14302.0 | 12679.0 | 1.0 |
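
For comparison with the posterior means above, the classical estimators from the problem statement are the simple relative frequencies. A short check, assuming the counts tp, fn, tn, fp from the cell above are still in scope:

# frequentist estimators: TP / Recurred and TN / Did Not Recur
sens_freq = tp / (tp + fn)  # 93 / 117
spec_freq = tn / (tn + fp)  # 816 / 904
round(sens_freq, 3), round(spec_freq, 3)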

4. Histocompatibility#

A patient who is waiting for an organ transplant needs a histocompatible donor who matches the patient’s human leukocyte antigen (HLA) type.

For a given patient, the number of matching donors per 1000 National Blood Bank records is modeled as Poisson with unknown rate \(\lambda\). If a randomly selected group of 1000 records showed exactly one match, estimate \(\lambda\) in Bayesian fashion.

For \(\lambda\) assume:

  1. Gamma prior \(\text{Ga}(\alpha=2, \beta=1)\);

  2. half-flat prior \(\lambda \propto 1\), for \(\lambda > 0\);

  3. invariance prior \(\pi(\lambda) = \frac{1}{\lambda}\), for \(\lambda > 0\);

  4. Jeffreys prior \(\pi(\lambda) = \sqrt{\frac{1}{\lambda}}\), for \(\lambda > 0\).
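
All four priors can be read as (possibly improper) \(\text{Ga}(a, b)\) kernels, so the conjugate update gives a \(\text{Ga}(a + x, b + n)\) posterior for a single observed count \(x = 1\) with \(n = 1\). A minimal numerical sketch of the resulting posterior means:

# conjugate Gamma-Poisson updates for the four priors above;
# each prior is a (possibly improper) kernel lambda^(a-1) * exp(-b * lambda)
x, n = 1, 1  # one match observed in a single group of 1000 records

priors = {
    "1. Ga(2, 1)": (2.0, 1.0),
    "2. flat, lambda^0": (1.0, 0.0),
    "3. invariance, 1/lambda": (0.0, 0.0),
    "4. Jeffreys, lambda^(-1/2)": (0.5, 0.0),
}

for name, (a, b) in priors.items():
    a_post, b_post = a + x, b + n
    print(f"{name:27} posterior Ga({a_post:.1f}, {b_post:.1f}), mean = {a_post / b_post:.2f}")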

5. Neurons Fire in Potter’s Lab#

The data set consists of 989 firing times recorded in a cell culture of neurons: the time instants at which a neuron sent a signal (a spike) to another linked neuron. The cells, from the cortex of an embryonic rat brain, were cultured for 18 days on multielectrode arrays. The measurements were taken while the culture was stimulated at a rate of 1 Hz. From this data set, the counts of firings in consecutive time intervals of length 20 milliseconds were derived:

\[\begin{split} \begin{array}{ccccc} 20 & 19 & 26 & 20 & 24 \\ 21 & 24 & 29 & 21 & 17 \\ 23 & 21 & 19 & 23 & 17 \\ 30 & 20 & 20 & 18 & 16 \\ 14 & 17 & 15 & 25 & 21 \\ 16 & 14 & 18 & 22 & 25 \\ 17 & 25 & 24 & 18 & 13 \\ 12 & 19 & 17 & 19 & 19 \\ 19 & 23 & 17 & 17 & 21 \\ 15 & 19 & 15 & 23 & 22 \\ \end{array} \end{split}\]

Code for a NumPy array containing these values is in the first hidden code cell, below.

It is believed that the counts are distributed as Poisson with an unknown parameter \( \lambda \). An expert believes that the number of counts in the interval of 20 milliseconds should be about 15.

  1. What is the likelihood function for these 50 observations?

  2. Using the information the expert provided, elicit an appropriate Gamma prior. Is such a prior unique?

  3. For the prior suggested in (2), find the Bayes’ estimator of \( \lambda \). How does this estimator compare to the MLE?

Hide code cell content
import numpy as np

# fmt: off
firing_counts = np.array([
    20, 19, 26, 20, 24,
    21, 24, 29, 21, 17,
    23, 21, 19, 23, 17,
    30, 20, 20, 18, 16,
    14, 17, 15, 25, 21,
    16, 14, 18, 22, 25,
    17, 25, 24, 18, 13,
    12, 19, 17, 19, 19,
    19, 23, 17, 17, 21,
    15, 19, 15, 23, 22
])
# fmt: on
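
As a numerical follow-up to parts 2 and 3: matching only the expert's prior mean does not determine the gamma prior uniquely, so the sketch below uses one admissible choice, \(\text{Ga}(15, 1)\) (prior mean \(15/1 = 15\)), and compares the conjugate posterior mean with the MLE. It assumes the firing_counts array from the cell above is in scope.

# one admissible elicitation (not unique): Gamma(alpha, beta) with mean alpha/beta = 15
alpha, beta = 15.0, 1.0

n = firing_counts.size       # 50 intervals
total = firing_counts.sum()  # 989 spikes in total

# conjugate Gamma-Poisson update: posterior is Gamma(alpha + sum(x), beta + n)
post_mean = (alpha + total) / (beta + n)
mle = total / n  # sample mean

print(f"MLE = {mle:.3f}, posterior mean = {post_mean:.3f}")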

6. Elicit Inverse Gamma Prior#

Specify the inverse gamma prior

\[ \pi(\theta) = \frac{\beta^\alpha \exp\{-\beta/\theta\}}{\Gamma(\alpha)\theta^{\alpha+1}}, \quad \theta > 0; \, \alpha, \beta > 0 \]

if \( E[\theta] = 2 \) and \( \text{Var}(\theta) = 12 \) are elicited from the experts.
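
One possible route: for \(\alpha > 2\) this density has mean \(E[\theta] = \beta/(\alpha - 1)\) and variance \(\text{Var}(\theta) = \beta^2/[(\alpha - 1)^2(\alpha - 2)]\), so matching the elicited moments gives

\[ \frac{\beta}{\alpha - 1} = 2, \qquad \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)} = \frac{4}{\alpha - 2} = 12 \quad \Rightarrow \quad \alpha = \frac{7}{3}, \; \beta = \frac{8}{3}. \]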

7. Derive Jeffreys’ Priors for Poisson \(\lambda\), Bernoulli \(p\), and Geometric \(p\).#

Recall that Jeffreys’ prior for parameter \(\theta\) in the likelihood \(f(x | \theta)\) is defined as

\[\pi(\theta) \propto \left| \text{det}(I(\theta)) \right|^{1/2}\]

where, for univariate parameters,

\[ I(\theta) = E \left[ \left( \frac{d \log f(x | \theta)}{d\theta} \right)^2 \right] = -E \left[ \frac{d^2 \log f(x | \theta)}{d\theta^2} \right] \]

and expectation is taken with respect to the random variable \(X \sim f(x | \theta)\).

  1. Show that Jeffreys’ prior for Poisson distribution \(f(x | \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}\), \(\lambda \geq 0\), is \(\pi(\lambda) = \sqrt{\frac{1}{\lambda}}\).

  2. Show that Jeffreys’ prior for Bernoulli distribution \(f(x | p) = p^x (1 - p)^{1-x}\), \(0 \leq p \leq 1\), is \(\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}\), which is the beta \(\text{Be}(1/2, 1/2)\) distribution (or Arcsin distribution).

  3. Show that Jeffreys’ prior for Geometric distribution \(f(x | p) = (1 - p)^{x-1} p\), \(x = 1, 2, \ldots\) ; \(0 \leq p \leq 1\), is \(\pi(p) \propto \frac{1}{p \sqrt{1-p}}\).
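
A sketch of the Poisson case, as a template (the Bernoulli and Geometric parts follow the same recipe, using \(E[X] = p\) and \(E[X] = 1/p\), respectively): from \(\log f(x \mid \lambda) = x \log \lambda - \lambda - \log x!\),

\[ \frac{d^2 \log f(x \mid \lambda)}{d\lambda^2} = -\frac{x}{\lambda^2}, \qquad I(\lambda) = E\left[\frac{X}{\lambda^2}\right] = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}, \qquad \pi(\lambda) \propto \sqrt{I(\lambda)} = \frac{1}{\sqrt{\lambda}}. \]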

8. Two Scenarios for the Probability of Success#

An experiment may lead to success with probability \( p \), which is to be estimated. Two series of experiments were conducted:

  • In the first scenario, the experiment is repeated independently 10 times, and the number of successes realized was 1.

  • In the second scenario, the experiment was repeated until success, and the number of repetitions was 10.

  1. The two likelihoods are Binomial and Geometric, and the moment-matching estimate for the probability of success in both cases is \( \hat{p} = 0.1 \). However, classical inference for the two cases (confidence intervals, testing, etc.) is different. Is there any difference in the Bayesian inferences? Why or why not?

  2. For either of the two scenarios, find the Bayes estimator of \(p\) if the prior is \( \pi(p) = \frac{1}{p\sqrt{1-p}}\).
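
A quick numerical check of both parts: reading the prior kernel \(p^{-1}(1 - p)^{-1/2}\) as a formal \(\text{Be}(0, 1/2)\), both likelihoods contribute the same kernel \(p(1 - p)^9\), so the posterior is \(\text{Be}(1, 9.5)\) in either scenario, which is exactly why the Bayesian inferences coincide. A minimal sketch:

from scipy.stats import beta

# prior kernel p^(-1) (1-p)^(-1/2) ~ formal Be(0, 1/2);
# likelihood kernel in both scenarios is p (1-p)^9,
# so the posterior is Be(0 + 1, 1/2 + 9) = Be(1, 9.5)
post = beta(1, 9.5)

print(round(post.mean(), 4))                       # Bayes estimator, about 0.0952
print([round(v, 4) for v in post.interval(0.95)])  # 95% equal-tailed credible set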

9. Jeffreys’ Prior for Normal Precision#

The Jeffreys’ prior on the normal scale \(\sigma \) is \( \pi(\sigma) = \frac{1}{\sigma}\). Consider the precision parameter \(\tau = \frac{1}{\sigma^2}\).

Using the invariance property, show that Jeffreys’ prior for \(\tau\) is \(\pi(\tau) = \frac{1}{\tau}\).
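
A sketch of the substitution: with \(\sigma = \tau^{-1/2}\), the change-of-variables formula gives

\[ \pi(\tau) = \pi_\sigma\big(\sigma(\tau)\big) \left| \frac{d\sigma}{d\tau} \right| \propto \tau^{1/2} \cdot \frac{1}{2}\tau^{-3/2} \propto \frac{1}{\tau}. \]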

10. Derive Jeffreys’ Prior for Maxwell’s \(\theta\)#

  1. Show that Jeffreys’ prior for Maxwell’s rate parameter \( \theta \) is proportional to \( \frac{1}{\theta} \). Maxwell density is given by

    \[ f(x \mid \theta) = \sqrt{\frac{2}{\pi}} \theta^{3/2} x^2 \exp\left( -\frac{1}{2} \theta x^2 \right), \quad x \geq 0, \, \theta > 0 \]
  2. Show that the flat prior on \(\log \theta\) is equivalent to \(\frac{1}{\theta}\) prior on \(\theta\).

11. “Quasi” Jeffreys’ Priors#

Jeffreys himself often recommended priors different from Jeffreys’ priors. For example, for Poisson rate \( \lambda \) he recommended \( \pi(\lambda) \propto \frac{1}{\lambda} \) instead of \( \pi(\lambda) \propto \sqrt{\frac{1}{\lambda}} \).

For \( (\mu, \sigma^2) \), Jeffreys recommended \( \pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2} \). This prior is obtained as the product of separate one-dimensional Jeffreys’ priors for \( \mu \) and \( \sigma^2 \).

Show that the simultaneous Jeffreys’ prior for the two-dimensional parameter \( (\mu, \sigma^2) \) is \( \pi(\mu, \sigma^2) \propto \frac{1}{\sigma^3} \).

12. Haldane Prior for Binomial p#

Haldane [1932] suggested a fully noninformative prior for binomial \( p \) as \( \pi(p) \propto \frac{1}{p(1-p)} \) [beta \(\text{Be}(0, 0)\) distribution].

  1. Show that Haldane prior is equivalent to a flat prior on \(\text{logit}(p)\).

  2. Suppose \( X \sim \text{Bin}(n, p) \) is observed. What is the posterior? What is the Bayes estimator of \( p \)?

  3. What is the predictive distribution for a single future Bernoulli \( Y \)? What is the prediction for \( Y \)?

Note

While tracking down the Haldane citation, I found this paper (Etz and Wagenmakers [2016]) on some Bayesian history relating to Jeffreys and Haldane and this blog post commenting on it by one of my favorite Bayesian writers, Christian P. Robert.

Just leaving these links here if anyone is interested!

13. Eliciting a Normal Prior#

We are eliciting a normal prior \( N(\mu, \sigma^2) \) from an expert who can specify percentiles. If the 20th and 70th percentiles are specified as 2.7 and 4.8, respectively, how should \( \mu \) and \( \sigma \) be elicited?

Hide code cell content
from scipy.stats import norm

z_20 = norm.ppf(0.20)
z_70 = norm.ppf(0.70)

z_20, z_70
(-0.8416212335729142, 0.5244005127080407)
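
With the standard normal percentiles in hand, \(\mu\) and \(\sigma\) follow from solving the two linear equations \(\mu + z_{0.20}\sigma = 2.7\) and \(\mu + z_{0.70}\sigma = 4.8\). A short continuation, assuming z_20 and z_70 from the cell above are in scope:

# solve mu + z_20 * sigma = 2.7 and mu + z_70 * sigma = 4.8
sigma = (4.8 - 2.7) / (z_70 - z_20)
mu = 2.7 - z_20 * sigma

round(mu, 4), round(sigma, 4)  # approximately (3.9938, 1.5373)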

14. Jigsaw#

An experiment with a sample of 18 nursery-school children involved the elapsed time required to put together a small jigsaw puzzle. The times were:

\[\begin{split} \begin{array}{cccccc} 3.1 & 3.2 & 3.4 & 3.6 & 3.7 & 4.2 \\ 4.3 & 4.5 & 4.7 & 5.2 & 5.6 & 6.0 \\ 6.1 & 6.6 & 7.3 & 8.2 & 10.8 & 13.6 \\ \end{array} \end{split}\]

Assume that data are coming from normal \( N(\mu, \sigma^2) \) with \( \sigma^2 = 8 \). For parameter \( \mu \), set a normal prior with mean 5 and variance 6.

  1. Find the Bayes estimator and 95% credible set for the population mean \( \mu \).

  2. Find the posterior probability of hypothesis \( H_0: \mu \leq 5 \).

  3. What is your prediction for a single future observation?

Hide code cell content
# part 1
import numpy as np
import scipy.stats as ss

alpha = 0.05

# posterior from the conjugate normal-normal update with known sigma^2 = 8,
# prior N(5, 6), n = 18, and sample mean xbar = 5.7833:
#   1 / var_post = 1/6 + 18/8                    -> var_post  = 0.41379
#   mean_post = var_post * (5/6 + 18 * xbar / 8) -> mean_post = 5.72931
mean = 5.72931
var = 0.41379

post = ss.norm(loc=mean, scale=var**0.5)

round(post.ppf(alpha / 2), 4), round(post.ppf(1 - alpha / 2), 4)
(4.4685, 6.9901)
Hide code cell content
# part 1 HPD
from scipy.optimize import fsolve


def conditions(x, post, alpha):
    lwr, upr = x

    cond_1 = post.pdf(upr) - post.pdf(lwr)
    cond_2 = post.cdf(upr) - post.cdf(lwr) - (1 - alpha)

    return cond_1, cond_2


fsolve(conditions, (4.5, 7.0), args=(post, alpha))
array([4.46853355, 6.99008645])
Hide code cell content
round(post.cdf(5), 4)  # part 2: posterior probability of H0: mu <= 5
0.1284
Hide code cell content
# pymc solution for Q14
import pymc as pm
import numpy as np
import arviz as az

# puzzle completion times from the problem statement
data = np.array([3.1, 3.2, 3.4, 3.6, 3.7, 4.2,
                 4.3, 4.5, 4.7, 5.2, 5.6, 6.0,
                 6.1, 6.6, 7.3, 8.2, 10.8, 13.6])

with pm.Model() as m:
    mu_prior = pm.Normal("mu", 5, sigma=6**0.5)

    likelihood = pm.Normal("lik", mu_prior, sigma=8**0.5, observed=data)

    trace = pm.sample(5000)
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)


print(trace.posterior_predictive.to_array().mean())

az.summary(trace, hdi_prob=0.95, kind="stats")
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu]
100.00% [24000/24000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
Sampling: [lik]
100.00% [20000/20000 00:00<00:00]
<xarray.DataArray ()> Size: 8B
array(5.73871452)
| | mean | sd | hdi_2.5% | hdi_97.5% |
| --- | --- | --- | --- | --- |
| mu | 5.727 | 0.651 | 4.451 | 6.968 |

15. Jeremy and Poisson#

Jeremy believes that the normal model for his IQ test scores is not appropriate. After all, the scores are reported as integers, so he proposes that the scores be modeled as Poisson:

\[y \sim \text{Poisson}(\theta)\]

An expert versed in GT students’ intellectual abilities is asked to elicit a prior on \(\theta\). The expert elicits a gamma prior:

\[\theta \sim \text{Gamma}(30, 0.25)\]

Jeremy gets the test and scores \(y = 98\).

  1. What is the Bayes estimator of \(\theta\)? Find this estimator exactly.

  2. Using a PPL, confirm that simulations agree with the theoretical result in (1).
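
For part 1, the gamma prior is conjugate for the Poisson likelihood, so the exact posterior is available in closed form and can be compared with the simulation below:

\[ \theta \mid y \sim \text{Gamma}(30 + 98, \; 0.25 + 1) = \text{Gamma}(128, 1.25), \qquad E[\theta \mid y] = \frac{128}{1.25} = 102.4. \]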

Hide code cell content
# pymc solution for Q15
import pymc as pm
import arviz as az

with pm.Model() as m:
    theta_prior = pm.Gamma("theta", 30, 0.25)

    likelihood = pm.Poisson("lik", theta_prior, observed=[98])

    trace = pm.sample(4000)

az.summary(trace, hdi_prob=0.95, kind="stats")
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [theta]
100.00% [20000/20000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 4_000 draw iterations (4_000 + 16_000 draws total) took 1 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% |
| --- | --- | --- | --- | --- |
| theta | 102.301 | 9.06 | 85.352 | 120.737 |

16. NPEB for p in the Geometric Distribution#

A geometric random variable \(X\) counts the number of failures before the first success, when the probability of success is \(p\) (and of failure \(1 - p\)). The PMF of \(X\) is:

\[ P(X = x) = (1 - p)^x \times p, \quad x = 0, 1, 2, \ldots; \quad 0 \leq p \leq 1 \]

We simulated a sample of size 2400 from a geometric distribution with a probability of success 0.32. The following (summarized) sample was obtained:

| \(x\) | Frequency |
| --- | --- |
| 0 | 758 |
| 1 | 527 |
| 2 | 379 |
| 3 | 229 |
| 4 | 162 |
| 5 | 121 |
| 6 | 79 |
| 7 | 56 |
| 8 | 30 |
| 9 | 20 |
| 10 | 15 |
| 11 | 6 |
| 12 | 6 |
| 13 | 4 |
| 14 | 1 |
| 15 | 0 |
| 16 | 0 |
| 17 | 4 |
| 18 | 1 |
| 19 | 1 |
| 20 | 1 |
| 21+ | 0 |
| Total | 2400 |

  1. Develop a Nonparametric Empirical Bayes Estimator if the prior on \(p\) is \(g(p)\), \(0 \leq p \leq 1\).

  2. Compute the empirical Bayes estimator developed in 1. on the simulated sample for different values of \(x\).
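
One route to part 1 is a Robbins-style argument: the marginal PMF is \(m(x) = \int_0^1 (1 - p)^x p \, g(p) \, dp\), and \(m(x) - m(x + 1) = \int_0^1 (1 - p)^x p^2 g(p) \, dp\), so the Bayes estimator is \(E[p \mid X = x] = 1 - m(x + 1)/m(x)\). Replacing the marginals with the observed relative frequencies gives a nonparametric empirical Bayes estimator. A minimal sketch on the simulated sample, with the frequency table stored as a NumPy array:

import numpy as np

# observed frequencies for x = 0, 1, ..., 20 (21+ had zero counts)
freq = np.array([758, 527, 379, 229, 162, 121, 79, 56, 30, 20,
                 15, 6, 6, 4, 1, 0, 0, 4, 1, 1, 1])

# NPEB estimate: p_hat(x) = 1 - f(x + 1) / f(x), defined where f(x) > 0
with np.errstate(divide="ignore", invalid="ignore"):
    p_hat = 1 - freq[1:] / freq[:-1]

for x, p in enumerate(p_hat[:6]):
    print(f"x = {x}: p_hat = {p:.3f}")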

17. Lifetimes and Predictive Distribution#

Suppose that \(T_1, \ldots, T_n\) are exponential \(\text{Exp}(\theta)\) lifetimes, where \(\theta\) is the rate parameter. Let the prior on \(\theta\) be exponential \(\text{Exp}(\tau)\), where \(\tau\) is also a rate parameter.

Denote with \(T\) the total observed lifetime \(\sum_{i=1}^n T_i\). Then, \(T\) is gamma \(\text{Gamma}(n, \theta)\) distributed. Show:

  1. Marginal (prior predictive) for \(T\) is

\[ m_T(t) = \frac{n\tau t^{n-1}}{(\tau+t)^{n+1}}, \quad t > 0 \]
  2. Posterior for \(\theta\) given \(T = t\) is \(\text{Gamma}(n + 1, \tau + t)\).

\[ \pi(\theta \mid t) = \frac{\theta^n (\tau + t)^{n+1}}{\Gamma(n + 1)} \exp\{- (\tau + t) \theta\} \]
  3. Posterior predictive distribution for a new \(T^*\), given \(T = t\), is

\[ f(t^* \mid t) = \int_0^\infty \theta \exp \{- \theta t^* \} \pi(\theta \mid t) \, d\theta = \frac{(n + 1)(\tau + t)^{n+1}}{(\tau + t + t^*)^{n+2}} \]
  4. Expected value (with respect to the posterior predictive distribution) of \(T^*\) (that is, the prediction for a new \(T^*\)) is

\[ E(T^* \mid T = t) = \frac{\tau + t}{n} \]

18. Normal Likelihood with Improper Priors#

Let \(X_1, \ldots, X_n\) be iid normals \(N (\theta, \sigma^2)\), where

  1. \(\theta\) is the parameter of interest, and \(\sigma^2\) is known. Assume a flat prior on \(\theta\):

\[ \pi(\theta) = 1, \, -\infty < \theta < \infty \]

Show that the posterior is

\[ [\theta \mid X_1, \ldots, X_n] \sim N \left( \bar{X}, \frac{\sigma^2}{n} \right) \]

where \(\bar{X}\) is the mean of the observations.

  2. \(\sigma^2\) is the parameter of interest, and \(\theta\) is known. Let the prior on \(\sigma^2\) be

\[ \pi(\sigma^2) = \frac{1}{\sigma^2}, \quad \sigma^2 > 0. \]

Show that the posterior is inverse gamma

\[ \sigma^2 \mid X_1, \ldots, X_n \sim \text{IG} \left( \frac{n}{2}, \frac{\sum_{i=1}^n (X_i - \theta)^2}{2} \right) \]

where \(\text{IG}(a, b)\) stands for the distribution with a density

\[ f(y) = \frac{b^a}{\Gamma(a)} y^{-a-1} e^{-b/y}, \quad a, b > 0, \, y > 0. \]