4. Time-to-event Models: Example 1

Consider an exponential survival model:

\[Y_1, Y_2, \ldots, Y_n \sim \text{Exp}(\lambda) \]

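Let \(\delta_i\) be a censoring indicator: \(\delta_i = 1\) if observation \(i\) is censored and \(\delta_i = 0\) otherwise. For a censored observation, the recorded time is only a lower bound on the true event time. There are \(k\) uncensored points and \(n-k\) censored points: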

\[\begin{split} \begin{align*} \delta_i & = 0, \quad i = 1, \ldots, k \\ \delta_i & = 1, \quad i = k + 1, \ldots, n \end{align*} \end{split}\]

What is the (frequentist) estimator of \(\lambda\)? If all data points were observed, the MLE of \(\lambda\) would be

\[\frac{n}{\sum_{i=1}^n Y_i} = \frac{1}{\bar{Y}} \]

Should we ignore censored points? The estimator would be

\[\hat{\lambda} = \frac{k}{\sum_{i=1}^k Y_i}\]

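which is wrong because it is biased toward the non-censored data: it discards the information that each censored subject survived at least until its censoring time, so it tends to overestimate \(\lambda\). So, should we treat the censored points as if they were observed events?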

\[\hat{\lambda} = \frac{n}{\sum_{i=1}^n Y_i} = \frac{1}{\bar{Y}}\]

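This is also incorrect, because the true event times of the censored points are at least as large as the times that were recorded. Counting them as events understates the survival times and again overestimates \(\lambda\).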

Now let’s consider the likelihood as

\[ L (y_1,...,y_n | \lambda) = \lambda^k e^{-\lambda \sum_{i=1}^{n} y_i}\]

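We arrive at this likelihood by taking into account both censored and uncensored points. Each uncensored point contributes the exponential PDF, \(\lambda e^{-\lambda y_i}\), while each censored point contributes the probability that the event occurs after the recorded time \(y_i\), which is the exponential’s complementary CDF (the survival function), \(e^{-\lambda y_i}\):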

\[ \begin{align*} L (y_1,...,y_n | \lambda) & = \prod_{i=1}^k\lambda e^{-\lambda y_{i}}\times \prod_{i=k+1}^n e^{-\lambda y_i} \end{align*} \]
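
The maximization step can be written out explicitly. What follows is the standard log-likelihood argument, sketched with \(\ell\) as notation introduced here for the log-likelihood (it is not used elsewhere in this section):

\[ \begin{align*} \ell(\lambda) & = \log L (y_1,...,y_n | \lambda) = k \log \lambda - \lambda \sum_{i=1}^n y_i \\ \frac{d\ell}{d\lambda} & = \frac{k}{\lambda} - \sum_{i=1}^n y_i \\ \frac{d^2\ell}{d\lambda^2} & = -\frac{k}{\lambda^2} < 0 \end{align*} \]

Setting the first derivative to zero and solving for \(\lambda\) gives the maximizer, and the negative second derivative confirms that it is a maximum.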

The MLE of \(\lambda\) is

\[\hat{\lambda} = \frac{k}{\sum_{i=1}^n Y_i} = \frac{k}{n \bar{Y}}\]

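which is correct! Only the \(k\) observed events count in the numerator, while all \(n\) recorded times, censored or not, contribute to the total time at risk in the denominator.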
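
A quick simulation can make the comparison between the three estimators concrete. The sketch below is illustrative only and rests on assumptions that are not part of the example above: every subject is censored at the same fixed time `censor_time`, the true rate is 0.5, and the helper names `simulate_censored` and `estimators` are hypothetical. Unlike the indexing above, the data are not sorted so that uncensored points come first; a boolean mask on `delta` plays that role.

```python
import numpy as np


def simulate_censored(lam, n, censor_time, seed=0):
    """Draw exponential event times and censor them at a fixed time (an assumption)."""
    rng = np.random.default_rng(seed)
    t = rng.exponential(scale=1.0 / lam, size=n)  # true event times, rate lam
    y = np.minimum(t, censor_time)                # recorded times
    delta = (t > censor_time).astype(int)         # 1 = censored, 0 = uncensored
    return y, delta


def estimators(y, delta):
    """Compute the three candidate estimators of lambda discussed in the text."""
    uncensored = delta == 0
    k = int(uncensored.sum())
    n = len(y)
    return {
        "ignore_censored": k / y[uncensored].sum(),  # drop censored points
        "treat_as_observed": n / y.sum(),            # pretend all are events
        "censored_mle": k / y.sum(),                 # MLE from the censored likelihood
    }


true_lam = 0.5
y, delta = simulate_censored(lam=true_lam, n=100_000, censor_time=2.0)
est = estimators(y, delta)

for name, value in est.items():
    print(f"{name:>18}: {value:.3f}  (true lambda = {true_lam})")
```

Under these assumptions, the two naive estimators should come out noticeably larger than the true rate, while the censored-likelihood MLE should land close to it.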