Supplementary Exercises

Warning

This page contains solutions! We recommend attempting each problem before peeking.

1. Dukes’ C Colorectal Cancer and Diet Treatment

Colorectal cancer is a common cause of death. In many patients the disease is first diagnosed at an advanced stage, when surgery is the only treatment. Cytotoxic drugs given as an adjunct to surgery do not prevent relapse and do not increase survival in patients with advanced disease. Interest has been shown, at least by patients, in a nutritional approach to treatment, where diet plays a critical role in the disease management program. In a controlled clinical trial, McIllmurray and Turkie (1987) evaluated a diet treatment in patients with Dukes’ C colorectal cancer. In these patients, the residual tumor mass is small after operation, the relapse rate is high, and no other effective treatment is available. The diet treatment consisted of linolenic acid (an oil extract of the seed of the evening primrose plant, Oenothera biennis, family Onagraceae) and vitamin E. The data for the treatment and control patients are given below:

\[\begin{split} \begin{array}{|c|c|c|} \hline \text{Group} & \text{Sample Size} & \text{Survival Time (months)} \\ \hline \text{Treatment (Linoleic acid)} & n_1 = 25 & \begin{array}{c} 1+, 5+, 6, 6, 9+, 10, 10, 10+, 12, 12, 12, 12 \\ 12+, 13+, 15+, 16+, 20+, 24, 24+, 27+, 32 \\ 34+, 36+, 36+, 44+ \end{array} \\ \hline \text{Control} & n_2 = 24 & \begin{array}{c} 3+, 6, 6, 6, 6, 8, 8, 12, 12, 12+ \\ 15+, 16+, 18+, 18+, 20, 22+, 24, 28+ \\ 28+, 28+, 30, 30+, 33+, 42 \end{array} \\ \hline \end{array} \end{split}\]

This data is available as a csv here.

Fit a Weibull model to the data, using the group indicator (1 = treatment, 0 = control) as a covariate. Place noninformative priors on all parameters. Is the linoleic acid treatment beneficial? Comment.
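As a starting point, the table can be encoded as times plus event indicators, and the censored Weibull log-likelihood written out explicitly. This is a minimal sketch in plain Python, not a full Bayesian fit. The names `parse_times` and `weibull_loglik`, and the rate parameterization \( \lambda_i = \exp(\beta_0 + \beta_1 x_i) \), are assumptions made here for illustration; other Weibull parameterizations are equally valid.

```python
import math

# Survival times copied from the table above; a trailing "+" marks a censored time.
TREATMENT = ("1+, 5+, 6, 6, 9+, 10, 10, 10+, 12, 12, 12, 12, "
             "12+, 13+, 15+, 16+, 20+, 24, 24+, 27+, 32, "
             "34+, 36+, 36+, 44+")
CONTROL = ("3+, 6, 6, 6, 6, 8, 8, 12, 12, 12+, "
           "15+, 16+, 18+, 18+, 20, 22+, 24, 28+, "
           "28+, 28+, 30, 30+, 33+, 42")


def parse_times(text):
    """Return (times, observed); observed[i] is 1 for an event, 0 if censored."""
    times, observed = [], []
    for token in text.split(","):
        token = token.strip()
        observed.append(0 if token.endswith("+") else 1)
        times.append(float(token.rstrip("+")))
    return times, observed


def weibull_loglik(alpha, beta0, beta1, times, observed, group):
    """Censored Weibull log-likelihood, rate lambda_i = exp(beta0 + beta1 * group_i).

    Assumed parameterization:
      f(t) = lambda * alpha * t^(alpha - 1) * exp(-lambda * t^alpha)
      S(t) = exp(-lambda * t^alpha)
    """
    total = 0.0
    for t, d, x in zip(times, observed, group):
        log_lam = beta0 + beta1 * x
        # Every subject contributes log S(t) = -lambda * t^alpha.
        total -= math.exp(log_lam) * t ** alpha
        if d:
            # Observed events additionally contribute the log hazard.
            total += log_lam + math.log(alpha) + (alpha - 1) * math.log(t)
    return total


t1, d1 = parse_times(TREATMENT)
t0, d0 = parse_times(CONTROL)
times = t1 + t0
observed = d1 + d0
group = [1] * len(t1) + [0] * len(t0)  # 1 = treatment, 0 = control

print(len(t1), len(t0), sum(d1), sum(d0))
print(weibull_loglik(1.0, -3.0, 0.0, times, observed, group))
```

Under this assumed parameterization, a negative \( \beta_1 \) corresponds to a lower hazard in the treatment group, so the posterior of \( \beta_1 \) is the natural quantity to inspect when judging benefit.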

2. Censored Rayleigh

The lifetime (in hours) of a certain sensor has a Rayleigh distribution, with survival function

\[ S(t) = \exp\left(-\frac{t^2}{2\lambda}\right), \quad \lambda > 0 \]

Twelve sensors are placed under test for 100 hours, and the following failure times are recorded: 23, 40, 41, 67, 69, 72, 84, 84, 88, 100+, 100+. Here + denotes a censored time.

  1. If failure times \( t_1, \ldots, t_r \) are observed, and \( t_{r+1}, \ldots, t_n \) are censored, find the Bayes estimator of \( \lambda \). Use a noninformative gamma prior on \( \lambda \).

  2. Estimate \( S(t) \) at \( t = 60 \) and find its 95% credible set.

The MLE for \( \lambda \) is

\[ \hat{\lambda} = \frac{\sum_{i=1}^{r} t_i^2 + \sum_{i=r+1}^{n} t_i^2}{2r} \]

  3. Evaluate the MLE for the given data and comment on its closeness to the Bayes estimator from part 1.
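
The MLE formula above can be evaluated directly. The following is a minimal sketch in plain Python; it assumes the failure times exactly as listed in the problem statement (eleven values, two of them censored at 100 hours), and the variable names are illustrative only.

```python
import math

# Failure times as listed above; True marks a censored time.
data = [(23, False), (40, False), (41, False), (67, False), (69, False),
        (72, False), (84, False), (84, False), (88, False),
        (100, True), (100, True)]

r = sum(1 for _, censored in data if not censored)  # number of observed failures
sum_sq = sum(t ** 2 for t, _ in data)               # squared times, censored included

lambda_mle = sum_sq / (2 * r)
surv_60 = math.exp(-60 ** 2 / (2 * lambda_mle))     # plug-in estimate of S(60)

print(f"r = {r}, sum of squares = {sum_sq}")
print(f"lambda_mle = {lambda_mle:.2f}, S(60) = {surv_60:.4f}")
```

The plug-in value of \( S(60) \) is a frequentist point estimate only; the credible set in part 2 still requires the posterior of \( \lambda \).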

The Rayleigh distribution is not implemented in PyMC. You can try the zero-trick or create a custom distribution.
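
Either approach needs the log-likelihood written by hand. The sketch below shows one way to express it in plain Python, using the density \( f(t) = (t/\lambda)\, S(t) \) implied by the survival function above. The function name `rayleigh_censored_loglik` is hypothetical, and the data are the eleven values as listed. In PyMC the same expression could be supplied through `pm.Potential` or `pm.CustomDist`, but that wiring is not shown here.

```python
import math


def rayleigh_censored_loglik(lam, times, censored):
    """Log-likelihood of lam for S(t) = exp(-t^2 / (2 * lam)) with right censoring."""
    total = 0.0
    for t, c in zip(times, censored):
        # log S(t) is contributed by failures and censored times alike.
        total -= t ** 2 / (2 * lam)
        if not c:
            # Failures add the log hazard, log(t / lam), since f(t) = (t / lam) * S(t).
            total += math.log(t) - math.log(lam)
    return total


times = [23, 40, 41, 67, 69, 72, 84, 84, 88, 100, 100]
censored = [False] * 9 + [True] * 2

# Coarse grid search as a sanity check against the closed-form MLE.
grid = range(1000, 10001, 10)
best = max(grid, key=lambda lam: rayleigh_censored_loglik(lam, times, censored))
print(best)
```

The grid maximizer should land within one grid step of the closed-form MLE, which is a quick way to confirm the hand-written log-likelihood before using it inside a sampler.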

3. Stagnant Water with MAR Data

Carlin et al. [1992] analyzed data on the stagnation of water by piecing together linear parametric forms.

  • \( y_i \) is the log flow rate down the channel.

  • \( x_i \) is the log height of stagnant surface levels for different surfactants \( i \).

Data is available for download here.

The proposed model is:

\[\begin{align*} y_i &\sim N(\mu_i, \sigma^2) \\ \mu_i &= \alpha + \beta_1 \cdot x_i + \beta_2 \cdot (x_i - \theta)_{+} \end{align*}\]

Here, \( (a)_{+} \) is \( a \) if \( a \geq 0 \) and 0 if \( a < 0 \).

According to this model, the regression slope is \( \beta_1 \) for \( x < \theta \) and \( \beta_1 + \beta_2 \) for \( x \geq \theta \). The original exercise is modified so that two \( y \) values and two \( x \) values are missing at random.
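
The change in slope at \( \theta \) can be checked numerically. This is a minimal sketch in plain Python; the helper names `hinge` and `mean_log_flow` and the parameter values are hypothetical, chosen only to illustrate the mean function, and are not estimates from the data.

```python
def hinge(a):
    """(a)_+ : a if a >= 0, otherwise 0."""
    return a if a >= 0 else 0.0


def mean_log_flow(x, alpha, beta1, beta2, theta):
    """mu = alpha + beta1 * x + beta2 * (x - theta)_+"""
    return alpha + beta1 * x + beta2 * hinge(x - theta)


# Hypothetical parameter values, for illustration only.
params = dict(alpha=1.0, beta1=0.5, beta2=-0.8, theta=0.0)

# Unit-step differences on each side of theta recover the two slopes.
slope_below = mean_log_flow(-1.0, **params) - mean_log_flow(-2.0, **params)
slope_above = mean_log_flow(2.0, **params) - mean_log_flow(1.0, **params)

print(slope_below, slope_above)
```

Because the hinge term vanishes at \( x = \theta \), the two line segments meet there, so the fitted mean is continuous with a single change point.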