```python
import arviz as az
import pymc as pm

%load_ext lab_black
```

# 9. Type-1 Censoring*

## Bartholomew (1957)

This is our first introduction to censoring.

It is adapted from Unit 6: exponential1.odc.

Dataset and table from: A Problem in Life Testing (Bartholomew 1957).

### Results of a life test on ten pieces of equipment

| Item No. | Date of installation | Date of Failure | Life in Days | No. Days to End of Period |
|---|---|---|---|---|
| 1 | 11 June | 13 June | 2 | 81 |
| 2 | 21 June | - | (119) | 72 |
| 3 | 22 June | 12 Aug. | 51 | 70 |
| 4 | 2 July | - | (77) | 60 |
| 5 | 21 July | 23 Aug. | 33 | 41 |
| 6 | 31 July | 27 Aug. | 27 | 31 |
| 7 | 31 July | 14 Aug. | 14 | 31 |
| 8 | 1 Aug. | 25 Aug. | 24 | 30 |
| 9 | 2 Aug. | 6 Aug. | 4 | 29 |
| 10 | 10 Aug. | - | (37) | 21 |

The observations above are the result of a test on the lifespan of some piece of equipment. The test had to end on August 31, but items 2, 4, and 10 did not fail by that date (the values in parentheses are the eventual lifespans of those pieces of equipment, but they were not known on August 31). So all we know for the purposes of the experiment is that they worked for 72, 60, and 21 days, respectively, and that they will continue working for some unknown additional time period.

The goal of the experiment is to provide a lifespan estimate. We could:

1. Take the censored data as observed:

   - Divide the total observed working time (308 days) by the equipment count (10) for an estimated average lifetime of 30.8 days.
   - Problem: this underestimates the actual average lifespan, since the three censored items are counted as if they failed at the end of the test.

2. Ignore the equipment that didn't fail:

   - Take the mean lifetime of the seven pieces of equipment that broke within the experiment period for an estimate of 22.1 days.
   - Problems: selection bias; this underestimates even more.

3. Use the classical method:

   - Maximum likelihood estimation (MLE) under an exponential model: divide the total observed lifetime (308 days) by the number of observed failures (7) for an estimate of 44.0 days.
   - Problem: small sample size.

The actual mean lifespan of all pieces of equipment was 38.8 days.
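
For reference on the MLE in option 3: under an exponential model, each observed failure contributes density \(\lambda e^{-\lambda y_i}\) to the likelihood and each still-running item contributes survival probability \(e^{-\lambda c_j}\), so

\[L(\lambda) = \lambda^7 e^{-308\lambda},\]

which is maximized at \(\hat{\lambda} = 7/308\), i.e. a mean lifespan of \(308/7 \approx 44.0\) days. As a quick check, here is a small sketch (using only values from the table above) that reproduces all three estimates:

```python
# lifetimes observed during the test (items that actually failed)
failed = [2, 51, 33, 27, 14, 24, 4]
# running times of the three items still working on August 31
still_running = [72, 60, 21]

total_time = sum(failed) + sum(still_running)  # 308 days

print(total_time / 10)            # option 1: 30.8 days
print(sum(failed) / len(failed))  # option 2: ~22.1 days
print(total_time / len(failed))   # option 3 (MLE): 44.0 days
```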

## A Bayesian approach

We will right-censor the three machines that still hadn't failed by August 31, following this example. `pm.Censored` gives each observation equal to its upper bound the survival probability \(P(T \ge c) = e^{-\lambda c}\) as its likelihood contribution, rather than the exponential density.

```python
# gamma dist parameters
α = 0.01
β = 0.1

# max possible observed life for each piece of equipment (before end of experiment)
censored = (81, 72, 70, 60, 41, 31, 31, 30, 29, 21)

# observed life within experiment dates
y = (2, 72, 51, 60, 33, 27, 14, 24, 4, 21)


with pm.Model() as m:
    λ = pm.Gamma("λ", α, β)
    μ = pm.Deterministic("μ", 1 / λ)
    obs_latent = pm.Exponential.dist(λ)

    observed = pm.Censored(
        "observed", obs_latent, lower=None, upper=censored, observed=y, shape=len(y)
    )

    trace = pm.sample(5000)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [λ]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
Chain 0 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
```
```python
az.summary(trace, hdi_prob=0.95)
```

|   | mean | sd | hdi_2.5% | hdi_97.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| λ | 0.023 | 0.009 | 0.008 | 0.041 | 0.000 | 0.000 | 7913.0 | 9520.0 | 1.0 |
| μ | 50.812 | 22.938 | 19.194 | 94.858 | 0.271 | 0.192 | 7913.0 | 9520.0 | 1.0 |
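
Because the gamma prior is conjugate to this censored exponential likelihood, we can sanity-check the samplers in closed form: the posterior is

\[\lambda \mid y \sim \text{Gamma}(\alpha + 7,\ \beta + 308) = \text{Gamma}(7.01,\ 308.1),\]

with posterior mean \(7.01/308.1 \approx 0.0228\) for \(\lambda\) and \(308.1/6.01 \approx 51.3\) for \(\mu = 1/\lambda\), in line with both the PyMC summary above and the BUGS results below.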

BUGS results:

|   | mean | sd | MC_error | val2.5pc | median | val97.5pc | start | sample |
|---|---|---|---|---|---|---|---|---|
| lambda | 0.02281 | 0.0086 | 3.66E-05 | 0.009229 | 0.02169 | 0.04248 | 1001 | 100000 |
| mu | 51.07 | 22.61 | 0.1018 | 23.54 | 46.09 | 108.4 | 1001 | 100000 |
| observed[2] | 122.8 | 60.45 | 0.2442 | 73.12 | 103.5 | 284.5 | 1001 | 100000 |
| observed[4] | 110.8 | 59.93 | 0.2145 | 61.09 | 91.81 | 269.8 | 1001 | 100000 |
| observed[10] | 72.01 | 60.28 | 0.2155 | 22.13 | 52.89 | 232.6 | 1001 | 100000 |

Okay, these results are pretty close. We end up with \(1/0.023 \approx 43.5\) days, compared to \(1/0.02281 \approx 43.8\) days for BUGS. (Note that \(1/E[\lambda]\) is not the same as the posterior mean of \(\mu = 1/\lambda\), which is about 51 days in both runs.) But we're missing the imputed values for each censored observation. We could get those using the imputed censoring method in this example, sketched below.
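
Here is a minimal sketch of that imputed-censoring variant, assuming the same \(\alpha\) and \(\beta\) priors as above (the model and variable names are illustrative, not from the linked example): each censored lifetime becomes a latent exponential truncated below at its censoring time, so the sampler imputes it directly.

```python
with pm.Model() as m_imputed:
    λ = pm.Gamma("λ", α, β)
    μ = pm.Deterministic("μ", 1 / λ)

    # items that failed during the test enter the likelihood directly
    pm.Exponential("failed", λ, observed=[2, 51, 33, 27, 14, 24, 4])

    # latent lifetimes for the three censored items, truncated below at
    # their censoring times; the sampler draws (imputes) these values
    pm.Truncated("imputed", pm.Exponential.dist(λ), lower=[72, 60, 21], shape=3)

    trace_imputed = pm.sample(5000)
```

By the memoryless property of the exponential, the posterior means of `imputed` should land near the BUGS `observed[2]`, `observed[4]`, and `observed[10]` values above (roughly the censoring time plus \(E[\mu]\)).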

```python
%load_ext watermark
%watermark -n -u -v -iv -p pytensor
```

```
Last updated: Sat Mar 18 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.9.0

pytensor: 2.10.1

pymc : 5.1.1
arviz: 0.14.0
```