Revisiting UK Coal Mining Disasters*

3. Revisiting UK Coal Mining Disasters*#

Data can be found here.

Change Point Analysis, discussed previously in this Unit 5 example about Gibbs sampling.

The 112 data points represent the numbers of coal-mining disasters involving 10 or more men killed per year between 1851 and 1962.

Based on the observation that the there was a significant decrease around 1900, it is suitable to apply a change-point model to divide the whole dataset into two periods; each period with its own distribution of number of disasters.

The data set was compiled by Maguire, Pearson and Wynn in 1952 and updated by Jarrett (1978). This data have been used by a number of authors to illustrate various techniques that can be applied to point processes

Maguire, B. A., Pearson, E. S. and Wynn, A. H. A. (1952). The time intervals between industrial accidents. Biometrika, 39, 168†180.

Jarrett, R.G. (1979). A note on the intervals between coal-mining disasters. Biometrika, 66, 191-193.

Carlin, Gelfand, and Smith (1992) Heirarchical Bayesian Analysis of Changepoint Problems. Applied Statistics, 41, 389-405.

# X is the number of coal mine disasters per year
# fmt: off
X = np.array([4, 5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3, 1,
     4, 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3,
     0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0,
     0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2,
     0, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
# fmt: on

y = np.array([y for y in range(1851, 1963)])

Model 1#

α = 4
β = 1
γ = 0.5
δ = 1

with pm.Model() as m:
    year = pm.DiscreteUniform("year", 1851, 1963)
    λ = pm.Gamma("λ", α, β, initval=1)
    μ = pm.Gamma("μ", γ, δ, initval=1)

    diff = pm.Deterministic("diff", μ - λ)

    rate = λ + switch(ge(y - year, 0), 1, 0) * diff
    pm.Poisson("lik", mu=rate, observed=X)

    trace = pm.sample(2000)

az.summary(trace)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
year	1890.894	2.450	1887.000	1895.000	0.081	0.057	974.0	724.0	1.01
λ	3.142	0.291	2.585	3.680	0.004	0.003	4878.0	4824.0	1.00
μ	0.917	0.116	0.704	1.137	0.002	0.001	4930.0	4794.0	1.00
diff	-2.226	0.304	-2.790	-1.641	0.004	0.003	6293.0	5268.0	1.00

Model 2#

with pm.Model() as m:
    year = pm.DiscreteUniform("year", 1851, 1963)
    z0 = pm.Normal("z0", 0, sigma=10, initval=1)
    z1 = pm.Normal("z1", 0, sigma=10, initval=-1)

    λ = pm.Deterministic("λ", exp(z0))
    μ = pm.Deterministic("μ", exp(z0 + z1))

    diff = pm.Deterministic("diff", μ - λ)

    rate = pm.math.exp(z0 + switch(ge(y - year, 0), 1, 0) * z1)
    pm.Poisson("lik", mu=rate, observed=X)

    trace = pm.sample(2000)

az.summary(trace)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
year	1890.960	2.497	1886.000	1895.000	0.081	0.057	963.0	1148.0	1.0
z0	1.132	0.094	0.956	1.310	0.002	0.001	2595.0	3854.0	1.0
z1	-1.220	0.152	-1.494	-0.924	0.002	0.002	4050.0	4486.0	1.0
λ	3.117	0.293	2.579	3.679	0.006	0.004	2595.0	3854.0	1.0
μ	0.924	0.116	0.703	1.139	0.002	0.001	4328.0	5480.0	1.0
diff	-2.193	0.308	-2.736	-1.587	0.006	0.004	2910.0	3148.0	1.0

%load_ext watermark
%watermark -n -u -v -iv -p pytensor

Last updated: Tue Nov 19 2024

Python implementation: CPython
Python version       : 3.12.7
IPython version      : 8.29.0

pytensor: 2.26.0

arviz: 0.20.0
pymc : 5.18.0
numpy: 1.26.4

Revisiting UK Coal Mining Disasters*

Contents

3. Revisiting UK Coal Mining Disasters*#

Model 1#

Model 2#