{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "81c27040", "metadata": {}, "outputs": [], "source": [ "import pymc as pm\n", "import numpy as np\n", "import arviz as az\n", "from pymc.math import switch, ge, exp\n", "\n", "%load_ext lab_black" ] }, { "cell_type": "markdown", "id": "37138bca", "metadata": {}, "source": [ "# Revisiting UK Coal Mining Disasters*\n", "\n", "Adapted from [Unit 10: disasters.odc](https://raw.githubusercontent.com/areding/6420-pymc/main/original_examples/Codes4Unit10/disasters.odc).\n", "\n", "Data can be found [here](https://raw.githubusercontent.com/areding/6420-pymc/main/data/r.txt)." ] }, { "cell_type": "markdown", "id": "e79c8243", "metadata": {}, "source": [ "Change Point Analysis, discussed previously in [this Unit 5](../unit5/Unit5-Gibbs3.ipynb) example about Gibbs sampling.\n", "\n", "The 112 data points represent the numbers of coal-mining disasters involving 10 or more men killed per year between 1851 and 1962. \n", " \n", "Based on the observation that the there was a significant decrease around 1900, it is suitable to apply a change-point model to divide the whole dataset into two periods; each period with its own distribution of number of disasters.\n", " \n", "The data set was compiled by Maguire, Pearson and Wynn in 1952 and \n", "updated by Jarrett (1978). This data have been used by a number of authors to illustrate various techniques that can be applied to point processes\n", "\n", "\n", " Maguire, B. A., Pearson, E. S. and Wynn, A. H. A. (1952). The time intervals between industrial accidents. Biometrika, 39, 168\u0013†180.\n", "\n", " Jarrett, R.G. (1979). A note on the intervals between coal-mining disasters. Biometrika, 66, 191-193. \n", "\n", " Carlin, Gelfand, and Smith (1992) Heirarchical Bayesian Analysis of Changepoint Problems. Applied Statistics, 41, 389-405.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "45e6f391", "metadata": {}, "outputs": [], "source": [ "# X is the number of coal mine disasters per year\n", "# fmt: off\n", "X = np.array([4, 5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3, 1,\n", " 4, 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3,\n", " 0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1, 0, 1, 0, 1, 0,\n", " 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2,\n", " 0, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])\n", "# fmt: on\n", "\n", "y = np.array([y for y in range(1851, 1963)])" ] }, { "cell_type": "markdown", "id": "527ddd88", "metadata": {}, "source": [ "## Model 1" ] }, { "cell_type": "code", "execution_count": 3, "id": "482d95b6", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Auto-assigning NUTS sampler...\n", "Initializing NUTS using jitter+adapt_diag...\n", "Multiprocess sampling (4 chains in 4 jobs)\n", "NUTS: [year, λ, μ]\n" ] }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " \n", " 100.00% [12000/12000 03:26<00:00 Sampling 4 chains, 0 divergences]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 206 seconds.\n", "Chain 0 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n" ] } ], "source": [ "α = 4\n", "β = 1\n", "γ = 0.5\n", "δ = 1\n", "\n", "with pm.Model() as m:\n", " year = pm.Uniform(\"year\", 1851, 1963)\n", " λ = pm.Gamma(\"λ\", α, β)\n", " μ = pm.Gamma(\"μ\", γ, δ)\n", "\n", " diff = pm.Deterministic(\"diff\", μ - λ)\n", "\n", " rate = λ + switch(ge(y - year, 0), 1, 0) * diff\n", " pm.Poisson(\"lik\", mu=rate, observed=X)\n", "\n", " trace = pm.sample(2000)" ] }, { "cell_type": "code", "execution_count": 4, "id": "9bd529a2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meansdhdi_3%hdi_97%mcse_meanmcse_sdess_bulkess_tailr_hat
year1890.4052.4051886.0001894.4610.1080.076537.0500.01.0
λ3.1530.2842.6553.6980.0110.008658.01025.01.0
μ0.9160.1150.6981.1370.0040.003645.0826.01.0
diff-2.2370.301-2.805-1.6810.0120.008670.01185.01.0
\n", "
" ], "text/plain": [ " mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk \\\n", "year 1890.405 2.405 1886.000 1894.461 0.108 0.076 537.0 \n", "λ 3.153 0.284 2.655 3.698 0.011 0.008 658.0 \n", "μ 0.916 0.115 0.698 1.137 0.004 0.003 645.0 \n", "diff -2.237 0.301 -2.805 -1.681 0.012 0.008 670.0 \n", "\n", " ess_tail r_hat \n", "year 500.0 1.0 \n", "λ 1025.0 1.0 \n", "μ 826.0 1.0 \n", "diff 1185.0 1.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "az.summary(trace)" ] }, { "cell_type": "markdown", "id": "168fe0b2", "metadata": {}, "source": [ "## Model 2" ] }, { "cell_type": "code", "execution_count": 5, "id": "00853354", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Auto-assigning NUTS sampler...\n", "Initializing NUTS using jitter+adapt_diag...\n", "Multiprocess sampling (4 chains in 4 jobs)\n", "NUTS: [year, z0, z1]\n" ] }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " \n", " 100.00% [12000/12000 05:01<00:00 Sampling 4 chains, 0 divergences]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 302 seconds.\n", "The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n", "The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n", "Chain 0 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", "Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n" ] } ], "source": [ "with pm.Model() as m:\n", " year = pm.Uniform(\"year\", 1851, 1963)\n", " z0 = pm.Normal(\"z0\", 0, tau=0.00001)\n", " z1 = pm.Normal(\"z1\", 0, tau=0.00001)\n", "\n", " λ = pm.Deterministic(\"λ\", exp(z0))\n", " μ = pm.Deterministic(\"μ\", exp(z0 + z1))\n", "\n", " diff = pm.Deterministic(\"diff\", μ - λ)\n", "\n", " rate = pm.math.exp(z0 + switch(ge(y - year, 0), 1, 0) * z1)\n", " pm.Poisson(\"lik\", mu=rate, observed=X)\n", "\n", " trace = pm.sample(2000)" ] }, { "cell_type": "code", "execution_count": 6, "id": "c81e6b10", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meansdhdi_3%hdi_97%mcse_meanmcse_sdess_bulkess_tailr_hat
z01.1430.0890.9711.3060.0050.003378.0748.01.01
z1-1.2330.145-1.519-0.9720.0060.004529.01039.01.01
year1890.2582.3111886.0031893.9540.1170.083409.0478.01.01
λ3.1470.2792.6203.6620.0140.010378.0748.01.01
μ0.9210.1160.7031.1410.0040.003946.01579.01.00
diff-2.2270.289-2.766-1.6890.0150.011368.0683.01.01
\n", "
" ], "text/plain": [ " mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk \\\n", "z0 1.143 0.089 0.971 1.306 0.005 0.003 378.0 \n", "z1 -1.233 0.145 -1.519 -0.972 0.006 0.004 529.0 \n", "year 1890.258 2.311 1886.003 1893.954 0.117 0.083 409.0 \n", "λ 3.147 0.279 2.620 3.662 0.014 0.010 378.0 \n", "μ 0.921 0.116 0.703 1.141 0.004 0.003 946.0 \n", "diff -2.227 0.289 -2.766 -1.689 0.015 0.011 368.0 \n", "\n", " ess_tail r_hat \n", "z0 748.0 1.01 \n", "z1 1039.0 1.01 \n", "year 478.0 1.01 \n", "λ 748.0 1.01 \n", "μ 1579.0 1.00 \n", "diff 683.0 1.01 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "az.summary(trace)" ] }, { "cell_type": "code", "execution_count": 7, "id": "7d36b9b1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Last updated: Wed Mar 22 2023\n", "\n", "Python implementation: CPython\n", "Python version : 3.11.0\n", "IPython version : 8.9.0\n", "\n", "pytensor: 2.10.1\n", "\n", "pymc : 5.1.2\n", "arviz: 0.15.1\n", "numpy: 1.24.2\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -n -u -v -iv -p pytensor" ] }, { "cell_type": "code", "execution_count": null, "id": "bb38c890", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 5 }