Supplementary Exercises 4.3#

Warning

This page contains solutions! We recommend attempting each problem before peeking.

1. Expectation and Variance#

Find E(X) and Var(X) if X has a density:

  1. f(x)=3x2, 0x1;

  2. f(x)=sin(x), 0xπ/2.

2. Uniform on Unit Circle#

Let bivariate random variable (X,Y) have a density

f(x,y)=1π,x2+y21.
  1. Find fY|X(y|x).

  2. What is the conditional density f(y|x) for x=0?

3. Fungi Spores#

Fungi spores are elliptical in shape with minor and major axes used to describe their size. Let X and Y denote lengths of the minor and major axes (in micrometers):

XExp(1)
YX=xf(y|x)=e(yx)for yx

Find:

  1. joint distribution f(X,Y)(x,y);

  2. P(Y2|X=1);

  3. E(Y|X=1);

  4. constant C, such that P(Yc)=0.5;

  5. Are X and Y independent?

4. Joint, Marginals and Conditionals #1#

Let random variables X and Y have joint density:

f(X,Y)(x,y)=λ2eλy,0xy

Find:

  1. marginal densities fX(x) and fY(y);

  2. conditional densities of X given Y=y, fX|Y(x|y), and of Y given X=x, fY|X(y|x);

  3. the CDF of (X,Y), F(X,Y)(x,y)=P(Xx,Yy).

Hide code cell content
import matplotlib.pyplot as plt
import numpy as np
from myst_nb import glue

x = 2
y = 3

fig, ax = plt.subplots()

# region to fill
ax.fill_between([0, x], [0, x], y, color="lightcoral", alpha=0.3)

# x = y
x_vals = np.linspace(0, max(x, y), 400)
y_vals = x_vals
ax.plot(x_vals, y_vals, label="$x = y$", color="black")

# domain boundaries
ax.plot([x, x], [0, y], "g--")
ax.plot([0, x], [y, y], "g--")

ax.set_xlim(0, max(x, y) * 1.1)
ax.set_ylim(0, max(x, y) * 1.1)
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")
ax.set_title("Integration Domain")
ax.legend()
plt.grid(True)

plt.savefig("../images/q4_plot1.png")
../_images/9137b8ad4e5198b47cf8396b3bf30b1ff64165bbf083a02f5b6d872c531a6676.png
Hide code cell content
x = 4
y = 2

fig, ax = plt.subplots()

# region to fill
ax.fill_between([0, y], [0, y], y, color="lightcoral", alpha=0.3)

# x = y
x_vals = np.linspace(0, max(x, y), 400)
y_vals = x_vals
ax.plot(x_vals, y_vals, label="$x = y$", color="black")

# domain boundaries
ax.plot([x, x], [0, y], "g--")
ax.plot([0, x], [y, y], "g--")

ax.set_xlim(0, max(x, y) * 1.1)
ax.set_ylim(0, max(x, y) * 1.1)
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")
ax.set_title("Integration Domain")
ax.legend()
plt.grid(True)

plt.savefig("../images/q4_plot2.png")
../_images/6a83abf56e9aa9dd12402f7aa82ba6ef991e6a7c88f10ac05e82dafe1afd99a0.png

5. Joint, Marginals and Conditionals #2#

Let random variables X and Y have joint density

f(X,Y)(x,y)=Cxy2,x0,y0,x+y1
  1. Show that

C=60,fX(x)=20x(1x)3,0x1

and

fY|X(y|x)=3y2(1x)3,0y1x.
  1. Find fY(y) and fX|Y(x|y).

Hide code cell content
import numpy as np
import matplotlib.pyplot as plt
from myst_nb import glue


x = np.linspace(0, 1, 1000)
y = 1 - x

fig, ax = plt.subplots()

ax.fill_between(x, 0, y, color="lightblue", label="Domain of $(X, Y)$")
ax.plot(x, y, color="blue", label="$y = 1 - x$")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.savefig("../images/q5_plot.png")
../_images/e7e001e1a0cf2135b2b701fcf7ec41d51a0a1b55e25a7cf2b40e9d8bced28eb6.png

6. Joint, Marginals and Conditionals #3#

Let random variables X and Y have joint density

fX,Y(x,y)={1+xy4,1x1,1y10,else

Find:

  1. marginal densities fX(x) and fY(y);

  2. conditional densities of X given Y=y, fX|Y(x|y), and of Y given X=x, fY|X(y|x).

  3. Are X and Y independent?

7. Joint, Marginals and Conditionals #4#

Let random variables X and Y have joint density fX,Y(x,y) given by:

fX,Y(x,y)={C(x2y2)exfor 0x<,xyx0else

Find

  1. constant C.

1=0xxC(x2y2)exdydx=4C30x3exdx=4C3×Γ(4)=8C.

Thus, C=18. (Recall, Γ(n)=(n1)!)

  1. marginal densities fX(x) and fY(y).

fX(x)=16x3ex=x41Γ(4)e1x,x0

This is Gamma Ga(4,1) distribution. To find fY(y), when integrating out x, for y0, the integration is y, and for y<0 the integration is y. You will need to do integration by parts. When the smoke clears,

  1. conditional distributions fX|Y(x|y) and fY|X(y|x).

Hide code cell content
import matplotlib.pyplot as plt
import numpy as np

# x values (this could go to infinity but this is enough to get an idea)
x_vals = np.linspace(0, 5, 100)

# upper and lower bounds for y based on x
y_upper = x_vals
y_lower = -x_vals

plt.fill_between(x_vals, y_lower, y_upper, color="blue", alpha=0.5)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Domain of $f_{X,Y}(x, y)$")
plt.grid(True)

plt.savefig("../images/q7_plot.png")
../_images/9e19bc5eb6ae5e0b0749d4a3878e1cb8af086000fe46f7054f85fcf942a33331.png

8. A 2D PDF#

Let

f(x,y)={38(x2+2xy),0x1,0y20,else

be a bivariate PDF of a random vector (X,Y).

  1. Show that f(x,y) is a density.

  2. Show that marginal distributions are

fX(x)=32x+34x2,0x1

and

fY(y)=18+38y,0y2
  1. Show E(X)=1116 and E(Y)=54.

  2. Show that conditional distributions are

f(x|y)=3x(x+2y)1+3y,0x1, for any fixed y[0,2],
f(y|x)=2y+x4+2x,0y2, for any fixed x[0,1].

9. In a Circle#

Let random variables X and Y have joint density

fX,Y(x,y)={C1x2y2,x2+y210,else

Find:

  1. constant C, and

  2. marginal densities fX(x) and fY(y).

10. Weibull - Gamma#

Let X be distributed as Weibull Wei(r,λ), r>0 known, with a density

f(x|λ)=λrxr1exp{λxr},λ>0,x0.

Assume a Ga(α,β) prior on λ,

π(λ|α,β)=βαλα1Γ(α)exp{βλ},λ0,α,β>0.
  1. Is the problem conjugate?

  2. Find the posterior distribution.

  3. If X1=5, X2=3, and X3=1 are observed from Wei(2,λ) and π(λ) is gamma Ga(3,5), what is the posterior in this case? What are the mean and variance of λ according to this posterior distribution?

11. Normal - Inverse Gamma#

Let XN(μ,σ2), with μ known, and σ2 is inverse gamma IG(α,β), with a density

π(σ2|α,β)=βαΓ(α)(σ2)α1exp(βσ2),σ20,α,β>0.
  1. Is the problem conjugate?

  2. If the sample (X1,,Xn):

[4.3430, 0.5850, 4.4345, 6.2605, 3.9778, 4.4538, 2.3931, 3.5877, 1.4254, 5.0694]

comes from normal N(3,σ2) distribution, find the posterior distribution of σ2 when the prior is IG(0.1,0.1).

12. Uniform - Pareto#

Suppose X=(X1,,Xn) is a sample from U(0,θ). Let θ have Pareto Pa(θ0,α) distribution. Show that the posterior distribution is Pa(max{θ0,x1,,xn},α+n).

13. Gamma - Inverse Gamma#

Let XGa(n2,12θ), so that X/θ is χn2. Let θIG(α,β). Show that the posterior is IG(n2+α,x2+β).

14. Negative Binomial - Beta#

If X=(X1,,Xn) is a sample from NB(m,θ) and θBe(α,β), show that the posterior for θ is beta Be(α+mn,β+i=1nxi).

15. Horse-Kick Fatalities and Gamma Prior#

During the latter part of the nineteenth century, Prussian officials gathered information on the hazards that horses posed to cavalry soldiers. Fatal accidents for 10 cavalry corps were collected over a period of 20 years (Preussischen Statistik). The number of fatalities due to kicks, x, was recorded for each year and each corps. The table below shows the distribution of x for these 200 “corps-years.”

x (number of deaths)

Observed number of corps-years

0

109

1

65

2

22

3

3

4

1

Total

200

Altogether there were 122 fatalities (109(0)+65(1)+22(2)+3(3)+1(4)), meaning that the observed fatality rate was 122200, or 0.61 fatalities per corps-year. Von Bortkiewicz (1898) proposed a Poisson model for x with a rate λ.

If the prior on λ is gamma Ga(5,9), find the posterior for λ. What are the posterior mean and variance?

16. Counts of Alpha#

Rutherford and Geiger (Rutherford, Geiger and Bateman, Phil. Mag. 20, 698, 19) provided counts of α-particles in their experiment. The counts, given in the table below, can be well–modeled by the Poisson distribution:

Particle ct.

0

1

2

3

4

5

6

7

8

9

10

11

12

Interval ct.

57

203

383

525

532

408

273

139

45

27

10

4

2

Theoretical

54

210

407

525

508

394

254

140

68

29

11

4

6

  1. Find sample size n and sample mean X¯. In calculations for X¯ take ≥ 12 as 12.

  2. Elicit a gamma prior for λ with rate parameter β = 10 and shape parameter α selected in such a way that the prior mean is 7.

  3. Find the Bayes estimator of λ using the prior from (1). Is the problem conjugate? Use the fact that i=1nXiPoi(nλ).

  4. Write a script that simulates the Bayes estimator for λ and compare its output with the analytic solution from (3).

17. Estimating Chemotherapy Response Rates#

An oncologist believes that 90% of cancer patients will respond to a new chemotherapy treatment and that it is unlikely that this proportion will be below 80%. Elicit a beta prior that models the oncologist’s beliefs. During the trial, with 30 patients treated, 22 responded.

Hint: For elicitation of the prior use μ=0.9, μ2σ=0.8 and expressions for μ and σ for beta.

  1. What are the likelihood and posterior distributions?

  2. Plot the prior, likelihood, and posterior in a single figure.

  3. Using a PPL, find the Bayes estimator of the response rate and compare it to the posterior mean.

Hide code cell content
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta, binom

xx = np.linspace(0, 1, 1000)

prior_alpha, prior_beta = 31.5, 3.5
post_alpha, post_beta = 53.5, 11.5

prior_densities = beta.pdf(xx, prior_alpha, prior_beta)
posterior_densities = beta.pdf(xx, post_alpha, post_beta)

n = 30  # trials
k = 22  # successes
p = xx
likelihood = binom.pmf(k, n, p)

# Normalize the likelihood for comparison
likelihood /= np.max(likelihood)

# Plotting the prior, likelihood and posterior
fig, axs = plt.subplots(3, 1, figsize=(12, 8))
fig.suptitle("Chemotherapy Response Rate")

# Plot prior
prior_mean = prior_alpha / (prior_alpha + prior_beta)
axs[0].plot(xx, prior_densities, label="Prior")
axs[0].axvline(x=prior_mean, color="g", linestyle="--")
axs[0].text(
    prior_mean + 0.01,
    y=np.max(prior_densities) * 0.5,
    s=f"{prior_mean:.3f}",
    color="g",
    ha="left",
)
axs[0].text(
    x=0.02,
    y=np.max(prior_densities) * 0.85,
    s=f"Be({prior_alpha}, {prior_beta}) prior",
    color="black",
)
axs[0].set_ylabel("Density")
axs[0].set_xlim(0, 1)

# Plot likelihood
mle = k / n
axs[1].plot(xx, likelihood, label="Likelihood")
axs[1].axvline(x=mle, color="g", linestyle="--")
axs[1].text(
    x=mle + 0.01,
    y=np.max(likelihood) * 0.5,
    s=f"{mle:.3f}",
    color="g",
    ha="left",
)
axs[1].text(
    x=0.02,
    y=np.max(likelihood) * 0.85,
    s=f"Bin(n={n}, x={k}, p)",
    color="black",
)
axs[1].set_ylabel("Density")
axs[1].set_xlim(0, 1)

# Plot posterior
post_mean = post_alpha / (post_alpha + post_beta)
axs[2].plot(xx, posterior_densities, label="Posterior")
axs[2].axvline(x=post_mean, color="g", linestyle="--")
axs[2].text(
    x=post_mean + 0.01,
    y=np.max(posterior_densities) * 0.5,
    s=f"{post_mean:.3f}",
    color="g",
    ha="left",
)
axs[2].text(
    x=0.02,
    y=np.max(posterior_densities) * 0.85,
    s=f"Be({post_alpha}, {post_beta}) posterior",
    color="black",
)
axs[2].set_ylabel("Density")
axs[2].set_xlabel("Probability of Success")
axs[2].set_xlim(0, 1)


plt.savefig("../images/q17_plot.png")
../_images/0d86a03c63b4a31c9e9b511cab0c0d1e2a64f9d2b9ccfd81b7a547e84da9c403.png
Hide code cell content
import pymc as pm
import arviz as az

with pm.Model():
    prior_p = pm.Beta("p", 31.5, 3.5)
    pm.Binomial("likelihood", n=n, p=prior_p, observed=k)

    trace = pm.sample(3000)

az.summary(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p]
100.00% [16000/16000 00:00<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 1 seconds.
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
p 0.823 0.047 0.735 0.907 0.001 0.0 4950.0 7351.0 1.0

18. Jeremy and Variance from Single Observation#

Jeremy believes that his IQ test scores have a normal distribution with mean 110 and unknown variance σ2. He takes a test and scores X=98.

  1. Show that inverse gamma prior IG(α,β) is the conjugate for σ2 if the observation X is normal N(μ,σ2) with μ known. What is the posterior?

  2. Find a Bayes estimator of σ2 and its standard deviation in Jeremy’s model if the prior on σ2 is an inverse gamma IG(3,100).

    • Hint: Random variable Y is said to have an inverse gamma IG(α,β) distribution if its density is given by:

      f(y|α,β)=βαΓ(α)yα1exp(βy),α,β>0.

      The mean of Y is E(Y)=βα1, α>1 and the variance is Var(Y)=β2(α1)2(α2), α>2.

  3. Use WinBUGS to solve this problem and compare the MCMC approximations with exact values from (2).

    • Hint: Express the likelihood in terms of precision τ with gamma Ga(α,β) prior, but then calculate and monitor σ2=1τ.

Hide code cell content
import pymc as pm
import arviz as az

x = 98
mu = 110

with pm.Model() as m:
    sigma2 = pm.InverseGamma("sigma2", 3, 100)

    pm.Normal("likelihood", mu, sigma=pm.math.sqrt(sigma2), observed=x)

    trace = pm.sample(10000)

az.summary(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma2]
100.00% [44000/44000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 2 seconds.
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
sigma2 69.07 58.186 15.007 149.345 0.577 0.408 13099.0 14720.0 1.0

19. Easy 2D#

Joint distribution for X and Θ is

f(x,θ)=2x,0x1,0θ1.

Find:

  1. Marginal distributions mX(x) and mΘ(θ).

  2. Conditional distributions f(x|θ) and π(θ|x).

  3. P(Θ2XΘ).

  4. P(X2ΘX).

Hint: Because of the factorization theorem, Θ and X are independent and you should obtain f(x|θ)=mX(x) and π(θ|x)=mΘ(θ).

For (3) and (4) conditioning is needed. For example, for random variables X, Y and functions g1, g2:

P(g1(Y)Xg2(Y))=E[P(g1(Y)Xg2(Y))Y]=P(g1(y)Xg2(y))fY(y)dy=[FX(g2(y))FX(g1(y))]fY(y)dy.

20. Bayes and Bats#

By careful examination of sound and film records it is possible to measure the distance at which a bat first detects an insect. The measurements are modeled by a normal distribution N(θ,102), where θ is the unknown mean distance (in cm).

Experts believe that the prior suitably expressing uncertainty about θ is θN(50,102). Three measurements are obtained: 62, 52, and 68.

  1. Find the posterior distribution of θ given these observations.

  2. Test the hypothesis, H0, that θ>50 in a Bayesian fashion.

  3. What is the 95% credible set for θ?

For 2 and 3, use the posterior obtained in 1.

Hide code cell content
from scipy.optimize import fsolve
from scipy.stats import norm

posterior = norm(58, 5)
print(f"Probability of H_0: {1 - posterior.cdf(50):.4f}")  # 0.9452

alpha = 0.05
print(
    f"Equi-tailed credible set: ({posterior.ppf(alpha/2):.4f}, {posterior.ppf(1 - alpha/2):.4f})"  # 'Equi-tailed credible set: (48.2002, 67.7998)'
)

guess_lwr, guess_upr = 50, 70


def conditions(x, posterior, alpha):
    lwr, upr = x

    cond_1 = posterior.pdf(upr) - posterior.pdf(lwr)
    cond_2 = posterior.cdf(upr) - posterior.cdf(lwr) - (1 - alpha)

    return cond_1, cond_2


# since this distribution is symmetrical, HPD should be the same as EQT
hpd = fsolve(conditions, (guess_lwr, guess_upr), args=(posterior, alpha))
print(
    f"HPD credible set: ({hpd[0]:.4f}, {hpd[1]:.4f})"
)  # 'HPD credible set: (48.2002, 67.7998)'