Supplementary Exercises 4.8#

Warning

This page contains solutions! We recommend attempting each problem before peeking.

Note

Question 1 was a duplicate of 4.3 question 16.

2. Mosaic Virus#

A single leaf is taken from each of 8 different tobacco plants. Each leaf is then divided in half, and each half is given one of two preparations of mosaic virus. The researchers wanted to examine whether there is a difference in the mean number of lesions produced by the two preparations. Here is the raw data:

\[\begin{split} \begin{array}{ccc} \text{Plant} & \text{Prep 1} & \text{Prep 2} \\ 1 & 38 & 29 \\ 2 & 40 & 35 \\ 3 & 26 & 31 \\ 4 & 33 & 31 \\ 5 & 21 & 14 \\ 6 & 27 & 37 \\ 7 & 41 & 22 \\ 8 & 36 & 25 \\ \end{array} \end{split}\]

Assume a normal distribution for the differences between the paired samples. Using a PPL, find:

  1. the 95% credible set for \(\mu_1 - \mu_2\), and

  2. posterior probability of hypothesis \(H_1: \mu_1 - \mu_2 \geq 0\).

Use noninformative priors.

Hint: Since this is a paired two-sample problem, a single model should be placed on the differences.

Hide code cell content
import pymc as pm
import numpy as np
import arviz as az

prep1 = np.array([38, 40, 26, 33, 21, 27, 41, 36])
prep2 = np.array([29, 35, 31, 31, 14, 37, 22, 25])
diff = prep1 - prep2

with pm.Model() as m:
    tau = pm.Gamma("precision", 0.001, 0.001)
    mu = pm.Normal("mean", 0, tau=0.0001)
    sigma2 = pm.Deterministic("variance", 1 / tau)

    pm.Normal("likelihood", mu, tau=tau, observed=diff)

    trace = pm.sample(3000)

results = az.summary(trace, hdi_prob=0.95)

results
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [precision, mean]
100.00% [16000/16000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean | 4.723 | 3.771 | -2.838 | 12.063 | 0.048 | 0.036 | 6552.0 | 5865.0 | 1.0 |
| precision | 0.012 | 0.006 | 0.002 | 0.025 | 0.000 | 0.000 | 7070.0 | 7276.0 | 1.0 |
| variance | 115.911 | 90.591 | 26.121 | 271.055 | 1.174 | 0.830 | 7070.0 | 7276.0 | 1.0 |
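
The summary above answers part 1. For part 2, one way to estimate the posterior probability of \(H_1\) is the proportion of posterior draws of the difference that are nonnegative. A minimal follow-up, assuming the trace from the cell above is still in memory:

# part 2: P(mu_1 - mu_2 >= 0 | data), estimated as the fraction of
# posterior draws of the "mean" variable that are nonnegative
prob_h1 = (trace.posterior["mean"] >= 0).mean().item()
round(prob_h1, 3)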

3. FIGO#

Despite the excellent prognosis of FIGO stage I, type I endometrial cancers, a substantial number of patients experience recurrence and die from this disease. Zeimet et al. [2013] conducted a retrospective multicenter cohort study to determine the expression of L1CAM by immunohistochemistry in 1021 endometrial cancer specimens with the goal of predicting clinical outcomes. Of the 1021 included cancers, 17.7% were rated L1CAM-positive. Of these L1CAM-positive cancers, 51.4% recurred during follow-up compared with 2.9% of L1CAM-negative cancers. Patients with L1CAM-positive cancers had poorer disease-free and overall survival.

It is stated that L1CAM has been the best-ever published prognostic factor in FIGO stage I, type I endometrial cancers and shows clear superiority over the standardly used multifactor risk score. L1CAM expression in type I cancers indicates the need for adjuvant treatment. This adhesion molecule might serve as a treatment target for the fully humanized anti-L1CAM antibody currently under development for clinical use.

| FIGO I/I Endometrial Cancer | Recurred | Did Not Recur | Total |
| --- | --- | --- | --- |
| L1CAM Positive | | | |
| L1CAM Negative | | | |
| Total | | | 1021 |

  1. Using the information supplied, fill in the table (round the entries to the closest integer).

  2. The estimators of the population sensitivity and specificity are simple relative frequencies (ratios): True Positives (TP)/Recurred and True Negatives (TN)/Did Not Recur. Consider now a Bayesian version of this problem. Using a PPL, model TP and TN as Binomials, place priors on population sensitivity (\(p_1\)) and specificity (\(p_2\)), and find their Bayesian estimators. Explore the estimators for your favorite choice of priors on \(p_1\) and \(p_2\): Jeffreys’, uniform (0, 1), flat on logit, etc.

Hide code cell content
import pymc as pm
import numpy as np
import arviz as az

total = 1021
totalpositive = 181  # L1CAM-positive: 0.177 * 1021 = 180.72, rounded to 181
totalnegative = 840  # L1CAM-negative: 1021 - 181
tp = 93   # true positives (positive and recurred): 0.514 * 181 = 93.03, rounded to 93
fp = 88   # false positives (positive, did not recur): 181 - 93
fn = 24   # false negatives (negative, recurred): 0.029 * 840 = 24.36, rounded to 24
tn = 816  # true negatives (negative, did not recur): 840 - 24

with pm.Model() as m:

    # priors
    prior_sensitivity = pm.Uniform("sensitivity", 0, 1)
    prior_specificity = pm.Uniform("specificity", 0, 1)

    # Binomial Likelihoods
    sensitivity_likelihood = pm.Binomial(
        "sensitivity_likelihood", p=prior_sensitivity, n=tp + fn, observed=tp
    )
    specificity_likelihood = pm.Binomial(
        "specificity_likelihood", p=prior_specificity, n=tn + fp, observed=tn
    )

    # sampling
    trace = pm.sample(5000, target_accept=0.95)

az.summary(trace, hdi_prob=0.95)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sensitivity, specificity]
100.00% [24000/24000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 2 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sensitivity | 0.790 | 0.037 | 0.718 | 0.863 | 0.0 | 0.0 | 12138.0 | 11296.0 | 1.0 |
| specificity | 0.902 | 0.010 | 0.882 | 0.920 | 0.0 | 0.0 | 14302.0 | 12679.0 | 1.0 |
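
For comparison with the posterior means above, the classical estimators from the problem statement are the simple relative frequencies. A short check, assuming the counts tp, fn, tn, fp from the cell above are still in scope:

# frequentist estimators: TP / Recurred and TN / Did Not Recur
sens_freq = tp / (tp + fn)  # 93 / 117
spec_freq = tn / (tn + fp)  # 816 / 904
round(sens_freq, 3), round(spec_freq, 3)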

4. Histocompatibility#

A patient who is waiting for an organ transplant needs a histocompatible donor who matches the patient’s human leukocyte antigen (HLA) type.

For a given patient, the number of matching donors per 1000 National Blood Bank records is modeled as Poisson with unknown rate \(\lambda\). If a randomly selected group of 1000 records showed exactly one match, estimate \(\lambda\) in Bayesian fashion.

For \(\lambda\) assume:

  1. Gamma prior \(\text{Ga}(\alpha=2, \beta=1)\);

  2. half-flat prior \(\lambda \propto 1\), for \(\lambda > 0\);

  3. invariance prior \(\pi(\lambda) = \frac{1}{\lambda}\), for \(\lambda > 0\);

  4. Jeffreys prior \(\pi(\lambda) = \sqrt{\frac{1}{\lambda}}\), for \(\lambda > 0\).
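
All four priors can be read as (possibly improper) \(\text{Ga}(a, b)\) kernels, so the conjugate update gives a \(\text{Ga}(a + x, b + n)\) posterior for a single observed count \(x = 1\) with \(n = 1\). A minimal numerical sketch of the resulting posterior means:

# conjugate Gamma-Poisson updates for the four priors above;
# each prior is a (possibly improper) kernel lambda^(a-1) * exp(-b * lambda)
x, n = 1, 1  # one match observed in a single group of 1000 records

priors = {
    "1. Ga(2, 1)": (2.0, 1.0),
    "2. flat, lambda^0": (1.0, 0.0),
    "3. invariance, 1/lambda": (0.0, 0.0),
    "4. Jeffreys, lambda^(-1/2)": (0.5, 0.0),
}

for name, (a, b) in priors.items():
    a_post, b_post = a + x, b + n
    print(f"{name:27} posterior Ga({a_post:.1f}, {b_post:.1f}), mean = {a_post / b_post:.2f}")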

5. Neurons Fire in Potter’s Lab#

The data set consists of 989 firing times recorded in a cell culture of neurons: the time instants at which a neuron sent a signal (a spike) to another linked neuron. The cells, from the cortex of an embryonic rat brain, were cultured for 18 days on multielectrode arrays. The measurements were taken while the culture was stimulated at a rate of 1 Hz. From this data set, the counts of firings in consecutive time intervals of length 20 milliseconds were derived:

\[\begin{split} \begin{array}{ccccc} 20 & 19 & 26 & 20 & 24 \\ 21 & 24 & 29 & 21 & 17 \\ 23 & 21 & 19 & 23 & 17 \\ 30 & 20 & 20 & 18 & 16 \\ 14 & 17 & 15 & 25 & 21 \\ 16 & 14 & 18 & 22 & 25 \\ 17 & 25 & 24 & 18 & 13 \\ 12 & 19 & 17 & 19 & 19 \\ 19 & 23 & 17 & 17 & 21 \\ 15 & 19 & 15 & 23 & 22 \\ \end{array} \end{split}\]

Code for a NumPy array containing these values is in the first hidden code cell, below.

It is believed that the counts are distributed as Poisson with an unknown parameter \( \lambda \). An expert believes that the number of counts in the interval of 20 milliseconds should be about 15.

  1. What is the likelihood function for these 50 observations?

  2. Using the information the expert provided, elicit an appropriate Gamma prior. Is such a prior unique?

  3. For the prior suggested in (2), find the Bayes’ estimator of \( \lambda \). How does this estimator compare to the MLE?

Hide code cell content
import numpy as np

# fmt: off
firing_counts = np.array([
    20, 19, 26, 20, 24,
    21, 24, 29, 21, 17,
    23, 21, 19, 23, 17,
    30, 20, 20, 18, 16,
    14, 17, 15, 25, 21,
    16, 14, 18, 22, 25,
    17, 25, 24, 18, 13,
    12, 19, 17, 19, 19,
    19, 23, 17, 17, 21,
    15, 19, 15, 23, 22
])
# fmt: on
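
As a numerical follow-up to parts 2 and 3: matching only the expert's prior mean does not determine the gamma prior uniquely, so the sketch below uses one admissible choice, \(\text{Ga}(15, 1)\) (prior mean \(15/1 = 15\)), and compares the conjugate posterior mean with the MLE. It assumes the firing_counts array from the cell above is in scope.

# one admissible elicitation (not unique): Gamma(alpha, beta) with mean alpha/beta = 15
alpha, beta = 15.0, 1.0

n = firing_counts.size       # 50 intervals
total = firing_counts.sum()  # 989 spikes in total

# conjugate Gamma-Poisson update: posterior is Gamma(alpha + sum(x), beta + n)
post_mean = (alpha + total) / (beta + n)
mle = total / n  # sample mean

print(f"MLE = {mle:.3f}, posterior mean = {post_mean:.3f}")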

6. Elicit Inverse Gamma Prior#

Specify the inverse gamma prior

\[ \pi(\theta) = \frac{\beta^\alpha \exp\{-\beta/\theta\}}{\Gamma(\alpha)\theta^{\alpha+1}}, \quad \theta > 0; \, \alpha, \beta > 0 \]

if \( E[\theta] = 2 \) and \( \text{Var}(\theta) = 12 \) are elicited from the experts.
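
One possible route: for \(\alpha > 2\) this density has mean \(E[\theta] = \beta/(\alpha - 1)\) and variance \(\text{Var}(\theta) = \beta^2/[(\alpha - 1)^2(\alpha - 2)]\), so matching the elicited moments gives

\[ \frac{\beta}{\alpha - 1} = 2, \qquad \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)} = \frac{4}{\alpha - 2} = 12 \quad \Rightarrow \quad \alpha = \frac{7}{3}, \; \beta = \frac{8}{3}. \]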

7. Derive Jeffreys’ Priors for Poisson \(\lambda\), Bernoulli \(p\), and Geometric \(p\).#

Recall that Jeffreys’ prior for parameter \(\theta\) in the likelihood \(f(x | \theta)\) is defined as

\[\pi(\theta) \propto \left| \text{det}(I(\theta)) \right|^{1/2}\]

where, for univariate parameters,

\[ I(\theta) = E \left[ \left( \frac{d \log f(x | \theta)}{d\theta} \right)^2 \right] = -E \left[ \frac{d^2 \log f(x | \theta)}{d\theta^2} \right] \]

and expectation is taken with respect to the random variable \(X \sim f(x | \theta)\).

  1. Show that Jeffreys’ prior for Poisson distribution \(f(x | \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}\), \(\lambda \geq 0\), is \(\pi(\lambda) = \sqrt{\frac{1}{\lambda}}\).

  2. Show that Jeffreys’ prior for Bernoulli distribution \(f(x | p) = p^x (1 - p)^{1-x}\), \(0 \leq p \leq 1\), is \(\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}\), which is the beta \(\text{Be}(1/2, 1/2)\) distribution (or Arcsin distribution).

  3. Show that Jeffreys’ prior for Geometric distribution \(f(x | p) = (1 - p)^{x-1} p\), \(x = 1, 2, \ldots\) ; \(0 \leq p \leq 1\), is \(\pi(p) \propto \frac{1}{p \sqrt{1-p}}\).
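
A sketch of the Poisson case, as a template (the Bernoulli and Geometric parts follow the same recipe, using \(E[X] = p\) and \(E[X] = 1/p\), respectively): from \(\log f(x \mid \lambda) = x \log \lambda - \lambda - \log x!\),

\[ \frac{d^2 \log f(x \mid \lambda)}{d\lambda^2} = -\frac{x}{\lambda^2}, \qquad I(\lambda) = E\left[\frac{X}{\lambda^2}\right] = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}, \qquad \pi(\lambda) \propto \sqrt{I(\lambda)} = \frac{1}{\sqrt{\lambda}}. \]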

8. Two Scenarios for the Probability of Success#

An experiment may lead to success with probability \( p \), which is to be estimated. Two series of experiments were conducted:

  • In the first scenario, the experiment is repeated independently 10 times, and the number of successes realized was 1.

  • In the second scenario, the experiment was repeated until success, and the number of repetitions was 10.

  1. The two likelihoods are Binomial and Geometric, and the moment-matching estimate for the probability of success in both cases is \( \hat{p} = 0.1 \). However, classical inference for the two cases (confidence intervals, testing, etc.) is different. Is there any difference in the Bayesian inferences? Why or why not?

  2. For either of the two scenarios, find the Bayes estimator of \(p\) if the prior is \( \pi(p) = \frac{1}{p\sqrt{1-p}}\).
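
A quick numerical check of both parts: reading the prior kernel \(p^{-1}(1 - p)^{-1/2}\) as a formal \(\text{Be}(0, 1/2)\), both likelihoods contribute the same kernel \(p(1 - p)^9\), so the posterior is \(\text{Be}(1, 9.5)\) in either scenario, which is exactly why the Bayesian inferences coincide. A minimal sketch:

from scipy.stats import beta

# prior kernel p^(-1) (1-p)^(-1/2) ~ formal Be(0, 1/2);
# likelihood kernel in both scenarios is p (1-p)^9,
# so the posterior is Be(0 + 1, 1/2 + 9) = Be(1, 9.5)
post = beta(1, 9.5)

print(round(post.mean(), 4))                       # Bayes estimator, about 0.0952
print([round(v, 4) for v in post.interval(0.95)])  # 95% equal-tailed credible set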

9. Jeffreys’ Prior for Normal Precision#

The Jeffreys’ prior on the normal scale \(\sigma \) is \( \pi(\sigma) = \frac{1}{\sigma}\). Consider the precision parameter \(\tau = \frac{1}{\sigma^2}\).

Using the invariance property, show that Jeffreys’ prior for \(\tau\) is \(\pi(\tau) = \frac{1}{\tau}\).
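
A sketch of the substitution: with \(\sigma = \tau^{-1/2}\), the change-of-variables formula gives

\[ \pi(\tau) = \pi_\sigma\big(\sigma(\tau)\big) \left| \frac{d\sigma}{d\tau} \right| \propto \tau^{1/2} \cdot \frac{1}{2}\tau^{-3/2} \propto \frac{1}{\tau}. \]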

10. Derive Jeffreys’ Prior for Maxwell’s \(\theta\)#

  1. Show that Jeffreys’ prior for Maxwell’s rate parameter \( \theta \) is proportional to \( \frac{1}{\theta} \). Maxwell density is given by

    \[ f(x \mid \theta) = \sqrt{\frac{2}{\pi}} \theta^{3/2} x^2 \exp\left( -\frac{1}{2} \theta x^2 \right), \quad x \geq 0, \, \theta > 0 \]
  2. Show that the flat prior on \(\log \theta\) is equivalent to \(\frac{1}{\theta}\) prior on \(\theta\).

11. “Quasi” Jeffreys’ Priors#

Jeffreys himself often recommended priors different from Jeffreys’ priors. For example, for Poisson rate \( \lambda \) he recommended \( \pi(\lambda) \propto \frac{1}{\lambda} \) instead of \( \pi(\lambda) \propto \sqrt{\frac{1}{\lambda}} \).

For \( (\mu, \sigma^2) \), Jeffreys recommended \( \pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2} \). This prior is obtained as the product of separate one-dimensional Jeffreys’ priors for \( \mu \) and \( \sigma^2 \).

Show that the simultaneous Jeffreys’ prior for the two-dimensional parameter \( (\mu, \sigma^2) \) is \( \pi(\mu, \sigma^2) \propto \frac{1}{\sigma^3} \).

12. Haldane Prior for Binomial p#

Haldane [1932] suggested a fully noninformative prior for binomial \( p \) as \( \pi(p) \propto \frac{1}{p(1-p)} \) [beta \(\text{Be}(0, 0)\) distribution].

  1. Show that Haldane prior is equivalent to a flat prior on \(\text{logit}(p)\).

  2. Suppose \( X \sim \text{Bin}(n, p) \) is observed. What is the posterior? What is the Bayes estimator of \( p \)?

  3. What is the predictive distribution for a single future Bernoulli \( Y \)? What is the prediction for \( Y \)?

Note

While tracking down the Haldane citation, I found this paper (Etz and Wagenmakers [2016]) on some Bayesian history relating to Jeffreys and Haldane and this blog post commenting on it by one of my favorite Bayesian writers, Christian P. Robert.

Just leaving these links here if anyone is interested!

13. Eliciting a Normal Prior#

We are eliciting a normal prior \( N(\mu, \sigma^2) \) from an expert who can specify percentiles. If the 20th and 70th percentiles are specified as 2.7 and 4.8, respectively, how should \( \mu \) and \( \sigma \) be elicited?

Hide code cell content
from scipy.stats import norm

z_20 = norm.ppf(0.20)
z_70 = norm.ppf(0.70)

z_20, z_70
(-0.8416212335729142, 0.5244005127080407)
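
With the standard normal percentiles in hand, \(\mu\) and \(\sigma\) follow from solving the two linear equations \(\mu + z_{0.20}\sigma = 2.7\) and \(\mu + z_{0.70}\sigma = 4.8\). A short continuation, assuming z_20 and z_70 from the cell above are in scope:

# solve mu + z_20 * sigma = 2.7 and mu + z_70 * sigma = 4.8
sigma = (4.8 - 2.7) / (z_70 - z_20)
mu = 2.7 - z_20 * sigma

round(mu, 4), round(sigma, 4)  # approximately (3.9938, 1.5373)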

14. Jigsaw#

An experiment with a sample of 18 nursery-school children involved the elapsed time required to put together a small jigsaw puzzle. The times were:

\[\begin{split} \begin{array}{cccccc} 3.1 & 3.2 & 3.4 & 3.6 & 3.7 & 4.2 \\ 4.3 & 4.5 & 4.7 & 5.2 & 5.6 & 6.0 \\ 6.1 & 6.6 & 7.3 & 8.2 & 10.8 & 13.6 \\ \end{array} \end{split}\]

Assume that data are coming from normal \( N(\mu, \sigma^2) \) with \( \sigma^2 = 8 \). For parameter \( \mu \), set a normal prior with mean 5 and variance 6.

  1. Find the Bayes estimator and 95% credible set for the population mean \( \mu \).

  2. Find the posterior probability of hypothesis \( H_0: \mu \leq 5 \).

  3. What is your prediction for a single future observation?

Hide code cell content
# part 1
import numpy as np
import scipy.stats as ss

alpha = 0.05

# posterior from the conjugate normal-normal update with known sigma^2 = 8,
# prior N(5, 6), n = 18, and sample mean xbar = 5.7833:
#   1 / var_post = 1/6 + 18/8                    -> var_post  = 0.41379
#   mean_post = var_post * (5/6 + 18 * xbar / 8) -> mean_post = 5.72931
mean = 5.72931
var = 0.41379

post = ss.norm(loc=mean, scale=var**0.5)

round(post.ppf(alpha / 2), 4), round(post.ppf(1 - alpha / 2), 4)
(4.4685, 6.9901)
Hide code cell content
# part 1 HPD
from scipy.optimize import fsolve


def conditions(x, post, alpha):
    lwr, upr = x

    cond_1 = post.pdf(upr) - post.pdf(lwr)
    cond_2 = post.cdf(upr) - post.cdf(lwr) - (1 - alpha)

    return cond_1, cond_2


fsolve(conditions, (4.5, 7.0), args=(post, alpha))
array([4.46853355, 6.99008645])
Hide code cell content
round(post.cdf(5), 4)  # part 2: posterior probability of H0: mu <= 5
0.1284
Hide code cell content
# pymc solution for Q14
import pymc as pm
import numpy as np
import arviz as az

# puzzle completion times from the problem statement
data = np.array([3.1, 3.2, 3.4, 3.6, 3.7, 4.2,
                 4.3, 4.5, 4.7, 5.2, 5.6, 6.0,
                 6.1, 6.6, 7.3, 8.2, 10.8, 13.6])

with pm.Model() as m:
    mu_prior = pm.Normal("mu", 5, sigma=6**0.5)

    likelihood = pm.Normal("lik", mu_prior, sigma=8**0.5, observed=data)

    trace = pm.sample(5000)
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)


print(trace.posterior_predictive.to_array().mean())

az.summary(trace, hdi_prob=0.95, kind="stats")
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu]
100.00% [24000/24000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
Sampling: [lik]
100.00% [20000/20000 00:00<00:00]
<xarray.DataArray ()> Size: 8B
array(5.73871452)
| | mean | sd | hdi_2.5% | hdi_97.5% |
| --- | --- | --- | --- | --- |
| mu | 5.727 | 0.651 | 4.451 | 6.968 |

15. Jeremy and Poisson#

Jeremy believes that the normal model for his IQ test scores is not appropriate. After all, the scores are reported as integers, so he proposes that the scores be modeled as Poisson:

\[y \sim \text{Poisson}(\theta)\]

An expert versed in GT students’ intellectual abilities is asked to elicit a prior on \(\theta\). The expert elicits a gamma prior:

\[\theta \sim \text{Gamma}(30, 0.25)\]

Jeremy gets the test and scores \(y = 98\).

  1. What is the Bayes estimator of \(\theta\)? Find this estimator exactly.

  2. Using a PPL, confirm that simulations agree with the theoretical result in (1).
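
For part 1, the gamma prior is conjugate for the Poisson likelihood, so the exact posterior is available in closed form and can be compared with the simulation below:

\[ \theta \mid y \sim \text{Gamma}(30 + 98, \; 0.25 + 1) = \text{Gamma}(128, 1.25), \qquad E[\theta \mid y] = \frac{128}{1.25} = 102.4. \]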

Hide code cell content
# pymc solution for Q15
import pymc as pm
import arviz as az

with pm.Model() as m:
    theta_prior = pm.Gamma("theta", 30, 0.25)

    likelihood = pm.Poisson("lik", theta_prior, observed=[98])

    trace = pm.sample(4000)

az.summary(trace, hdi_prob=0.95, kind="stats")
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [theta]
100.00% [20000/20000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 4_000 draw iterations (4_000 + 16_000 draws total) took 1 seconds.
| | mean | sd | hdi_2.5% | hdi_97.5% |
| --- | --- | --- | --- | --- |
| theta | 102.301 | 9.06 | 85.352 | 120.737 |

16. NPEB for p in the Geometric Distribution#

A geometric random variable \(X\) counts the number of failures before the first success, when the probability of success is \(p\) (and of failure \(1 - p\)). The PMF of \(X\) is:

\[ P(X = x) = (1 - p)^x \times p, \quad x = 0, 1, 2, \ldots; \quad 0 \leq p \leq 1 \]

We simulated a sample of size 2400 from a geometric distribution with a probability of success 0.32. The following (summarized) sample was obtained:

| \(x\) | Frequency |
| --- | --- |
| 0 | 758 |
| 1 | 527 |
| 2 | 379 |
| 3 | 229 |
| 4 | 162 |
| 5 | 121 |
| 6 | 79 |
| 7 | 56 |
| 8 | 30 |
| 9 | 20 |
| 10 | 15 |
| 11 | 6 |
| 12 | 6 |
| 13 | 4 |
| 14 | 1 |
| 15 | 0 |
| 16 | 0 |
| 17 | 4 |
| 18 | 1 |
| 19 | 1 |
| 20 | 1 |
| 21+ | 0 |
| Total | 2400 |

  1. Develop a Nonparametric Empirical Bayes Estimator if the prior on \(p\) is \(g(p)\), \(0 \leq p \leq 1\).

  2. Compute the empirical Bayes estimator developed in 1. on the simulated sample for different values of \(x\).
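
One route to part 1 is a Robbins-style argument: the marginal PMF is \(m(x) = \int_0^1 (1 - p)^x p \, g(p) \, dp\), and \(m(x) - m(x + 1) = \int_0^1 (1 - p)^x p^2 g(p) \, dp\), so the Bayes estimator is \(E[p \mid X = x] = 1 - m(x + 1)/m(x)\). Replacing the marginals with the observed relative frequencies gives a nonparametric empirical Bayes estimator. A minimal sketch on the simulated sample, with the frequency table stored as a NumPy array:

import numpy as np

# observed frequencies for x = 0, 1, ..., 20 (21+ had zero counts)
freq = np.array([758, 527, 379, 229, 162, 121, 79, 56, 30, 20,
                 15, 6, 6, 4, 1, 0, 0, 4, 1, 1, 1])

# NPEB estimate: p_hat(x) = 1 - f(x + 1) / f(x), defined where f(x) > 0
with np.errstate(divide="ignore", invalid="ignore"):
    p_hat = 1 - freq[1:] / freq[:-1]

for x, p in enumerate(p_hat[:6]):
    print(f"x = {x}: p_hat = {p:.3f}")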

17. Lifetimes and Predictive Distribution#

Suppose that \(T_1, \ldots, T_n\) are exponential \(\text{Exp}(\theta)\) lifetimes, where \(\theta\) is the rate parameter. Let the prior on \(\theta\) be exponential \(\text{Exp}(\tau)\), where \(\tau\) is also a rate parameter.

Denote with \(T\) the total observed lifetime \(\sum_{i=1}^n T_i\). Then, \(T\) is gamma \(\text{Gamma}(n, \theta)\) distributed. Show:

  1. Marginal (prior predictive) for \(T\) is

\[ m_T(t) = \frac{n\tau t^{n-1}}{(\tau+t)^{n+1}}, \quad t > 0 \]
  2. Posterior for \(\theta\) given \(T = t\) is \(\text{Gamma}(n + 1, \tau + t)\).

\[ \pi(\theta \mid t) = \frac{\theta^n (\tau + t)^{n+1}}{\Gamma(n + 1)} \exp\{- (\tau + t) \theta\} \]
  3. Posterior predictive distribution for a new \(T^*\), given \(T = t\), is

\[ f(t^* \mid t) = \int_0^\infty \theta \exp \{- \theta t^* \} \pi(\theta \mid t) \, d\theta = \frac{(n + 1)(\tau + t)^{n+1}}{(\tau + t + t^*)^{n+2}} \]
  4. Expected value (with respect to the posterior predictive distribution) of \(T^*\) (that is, the prediction for a new \(T^*\)) is

\[ E(T^* \mid T = t) = \frac{\tau + t}{n} \]

18. Normal Likelihood with Improper Priors#

Let \(X_1, \ldots, X_n\) be iid normals \(N (\theta, \sigma^2)\), where

  1. \(\theta\) is the parameter of interest, and \(\sigma^2\) is known. Assume a flat prior on \(\theta\):

\[ \pi(\theta) = 1, \, -\infty < \theta < \infty \]

Show that the posterior is

\[ [\theta \mid X_1, \ldots, X_n] \sim N \left( \bar{X}, \frac{\sigma^2}{n} \right) \]

where \(\bar{X}\) is the mean of the observations.

  2. \(\sigma^2\) is the parameter of interest, and \(\theta\) is known. Let the prior on \(\sigma^2\) be

\[ \pi(\sigma^2) = \frac{1}{\sigma^2}, \quad \sigma^2 > 0. \]

Show that the posterior is inverse gamma

\[ \sigma^2 \mid X_1, \ldots, X_n \sim \text{IG} \left( \frac{n}{2}, \frac{\sum_{i=1}^n (X_i - \theta)^2}{2} \right) \]

where \(\text{IG}(a, b)\) stands for the distribution with a density

\[ f(y) = \frac{b^a}{\Gamma(a)} y^{-a-1} e^{-b/y}, \quad a, b > 0, \, y > 0. \]