import pymc as pm
import numpy as np
import arviz as az
import pandas as pd
import pytensor.tensor as pt

%load_ext lab_black

5. Wine Classification*

Adapted from Unit 10: italywines123.odc.

Data can be found here.

This popular data set was provided by Forina et al. (1988). The data below are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

| Column | Variable | Description |
|--------|----------|-------------|
| 1 | Y | Type (response: 1, 2, 3); coded as 1 = [1 0 0], 2 = [0 1 0], 3 = [0 0 1] |
| 2 | X1 | Alcohol |
| 3 | X2 | Malic acid |
| 4 | X3 | Ash |
| 5 | X4 | Alkalinity of ash |
| 6 | X5 | Magnesium |
| 7 | X6 | Total phenols |
| 8 | X7 | Flavanoids |
| 9 | X8 | Nonflavanoid phenols |
| 10 | X9 | Proanthocyanins |
| 11 | X10 | Color intensity |
| 12 | X11 | Hue |
| 13 | X12 | OD280/OD315 of diluted wines |
| 14 | X13 | Proline |

(a) Fit the multinomial regression that predicts the type of wine Y from the predictors X1-X13. What are the estimated coefficients? What is the deviance?

(b) What is your prediction for p = P(Y_new = 1) if a new case has the attributes in new_attributes (below)? How would you classify this wine: as type 1, 2, or 3?

Forina, M., Leardi, R., Armanino, C., and Lanteri, S. (1988). PARVUS: An extendable package of programs for data exploration, classification and correlation. Elsevier, Amsterdam. ISBN 0-444-43012-1.

Report at the Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.

data = pd.read_csv("../data/wine.data", header=None)
# one-hot encode the response (column 0) into a 178 x 3 indicator matrix
Y = pd.get_dummies(data[0]).to_numpy()
# the remaining 13 columns are the predictors; prepend a column of ones for the intercept
X = data.drop(0, axis=1).to_numpy()
X_aug = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
X_aug.shape, Y.shape
((178, 14), (178, 3))
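
As a quick sanity check (this cell is an addition, not part of the original solution), every row of the one-hot response should sum to one and the first column of the augmented design matrix should be the intercept column of ones:

# sanity checks on the encoded data (assumes Y and X_aug as constructed above)
assert np.all(Y.sum(axis=1) == 1)   # exactly one wine type per row
assert np.all(X_aug[:, 0] == 1.0)   # leading column of ones for the intercept
Y.sum(axis=0)                       # number of wines in each of the three classes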
with pm.Model() as m:
    X_data = pm.Data("X", X_aug, mutable=True)
    Y_data = pm.Data("y", Y, mutable=False)
    # coefficients for wine types 2 and 3; type 1 is the reference category
    _b = pm.Normal("_b", 0, tau=0.05, shape=(14, 2))
    # pin the reference category's coefficients at zero for identifiability
    b = pt.concatenate([pt.zeros((14, 1)), _b], axis=1)

    # 178, 14 x 14, 3 -> 178, 3
    phi = pm.math.exp(pm.math.dot(X_data, b))
    # probabilities must sum to 1
    P = phi / pm.math.sum(phi, axis=1)[:, None]

    # P is 178, 3. category count determined by last axis
    pm.Multinomial("likelihood", n=1, p=P, observed=Y_data)

    trace = pm.sample(2000, init="jitter+adapt_diag_grad", target_accept=0.95)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag_grad...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [_b]
100.00% [12000/12000 00:44<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 45 seconds.
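
A note on the parameterization: because the first column of b is fixed at zero, the two sampled columns are log-odds relative to wine type 1, and the exp/normalize step is just a softmax. As a sketch only (not run here, and assuming pm.math.softmax and pm.Categorical behave as in recent PyMC releases), the same model could be written with integer labels and a Categorical likelihood:

# Sketch: equivalent model using softmax + Categorical on 0-based labels.
# Assumes pm.math.softmax exists (it wraps pytensor's softmax).
y_idx = data[0].to_numpy() - 1  # labels 1..3 -> 0..2

with pm.Model() as m_alt:
    X_data = pm.Data("X", X_aug, mutable=True)
    _b = pm.Normal("_b", 0, tau=0.05, shape=(14, 2))
    b = pt.concatenate([pt.zeros((14, 1)), _b], axis=1)
    # softmax along the category axis gives the same probabilities as exp/sum
    P = pm.math.softmax(pt.dot(X_data, b), axis=1)
    pm.Categorical("likelihood", p=P, observed=y_idx)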
az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
_b[0, 0] 3.984 4.282 -4.306 11.886 0.044 0.038 9596.0 5813.0 1.0
_b[0, 1] -0.533 4.437 -8.621 8.179 0.042 0.055 11192.0 5624.0 1.0
_b[1, 0] 3.105 1.372 0.636 5.704 0.029 0.021 2274.0 3188.0 1.0
_b[1, 1] 0.287 1.871 -3.230 3.837 0.034 0.024 2980.0 3655.0 1.0
_b[2, 0] -3.655 1.625 -6.781 -0.782 0.026 0.019 3995.0 4906.0 1.0
_b[2, 1] 0.933 2.172 -3.076 5.048 0.031 0.023 5040.0 5290.0 1.0
_b[3, 0] -7.821 3.076 -13.558 -2.115 0.047 0.034 4273.0 5473.0 1.0
_b[3, 1] 0.622 4.037 -7.028 8.168 0.049 0.044 6687.0 5576.0 1.0
_b[4, 0] 1.521 0.498 0.594 2.462 0.009 0.006 3075.0 4042.0 1.0
_b[4, 1] 0.920 0.898 -0.695 2.654 0.015 0.011 3451.0 4495.0 1.0
_b[5, 0] 0.012 0.079 -0.143 0.155 0.001 0.001 3854.0 4781.0 1.0
_b[5, 1] 0.136 0.156 -0.153 0.434 0.003 0.002 3269.0 4101.0 1.0
_b[6, 0] 1.045 2.009 -2.591 4.951 0.025 0.021 6548.0 5857.0 1.0
_b[6, 1] -1.868 3.488 -8.585 4.529 0.038 0.033 8310.0 6345.0 1.0
_b[7, 0] -1.604 2.377 -6.074 2.799 0.036 0.026 4371.0 4923.0 1.0
_b[7, 1] -8.488 3.132 -14.206 -2.559 0.037 0.027 7287.0 6677.0 1.0
_b[8, 0] 1.749 4.148 -5.732 9.740 0.041 0.037 10043.0 6772.0 1.0
_b[8, 1] -1.681 4.376 -9.630 6.600 0.040 0.047 11914.0 6167.0 1.0
_b[9, 0] 2.464 2.414 -2.194 6.881 0.033 0.024 5378.0 6124.0 1.0
_b[9, 1] -2.750 3.454 -9.382 3.513 0.036 0.030 9179.0 6928.0 1.0
_b[10, 0] -2.990 1.674 -6.288 0.013 0.030 0.022 3078.0 4149.0 1.0
_b[10, 1] 3.240 1.826 0.070 6.695 0.028 0.020 4260.0 5428.0 1.0
_b[11, 0] 5.040 3.803 -2.111 12.227 0.040 0.032 9066.0 6680.0 1.0
_b[11, 1] -3.339 4.222 -11.266 4.586 0.037 0.038 13294.0 6124.0 1.0
_b[12, 0] -2.352 2.341 -6.562 2.127 0.034 0.026 4690.0 5113.0 1.0
_b[12, 1] -6.179 3.515 -12.886 0.402 0.039 0.030 8130.0 6207.0 1.0
_b[13, 0] -0.049 0.015 -0.077 -0.021 0.000 0.000 2533.0 3390.0 1.0
_b[13, 1] -0.023 0.017 -0.054 0.008 0.000 0.000 3578.0 4895.0 1.0

The coefficients are not the same as the BUGS results.
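
Part (a) also asks for the deviance, which BUGS reports automatically. A rough plug-in equivalent here (a sketch, assuming pm.compute_log_likelihood is available in this PyMC version) is to sum the pointwise log-likelihood over observations and average -2 times that over the posterior draws. Run this before swapping in the new covariates below, so the log-likelihood is evaluated on the original data:

# Sketch: approximate the deviance as the posterior mean of -2 * log-likelihood.
# Assumes pm.compute_log_likelihood adds a log_likelihood group to the trace.
pm.compute_log_likelihood(trace, model=m)

loglik = trace.log_likelihood["likelihood"]
obs_dims = [d for d in loglik.dims if d not in ("chain", "draw")]
deviance = float((-2 * loglik.sum(dim=obs_dims)).mean())
deviance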

# new case: the leading 1 is the intercept term, followed by X1-X13
new_attributes = np.array(
    [1, 12.9, 2, 2.4, 17, 100, 2.8, 2.1, 0.35, 1.6, 5, 1.05, 3, 750]
).reshape((1, 14))
pm.set_data({"X": new_attributes}, model=m)
ppc = pm.sample_posterior_predictive(trace, model=m)
Sampling: [likelihood]
100.00% [8000/8000 00:01<00:00]
ppc.posterior_predictive.mean(
    dim=["chain", "draw", "likelihood_dim_2"]
).likelihood.values
array([0.91875632, 0.07775492, 0.00348876])

We get the same predicted category as BUGS (type 1), but with slightly different probabilities.
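
The posterior predictive draws above add Multinomial sampling noise on top of the posterior uncertainty. As an alternative sketch (not part of the original solution, and using scipy.special.softmax, which is not imported above), the class probabilities can be computed directly from the posterior draws of the coefficients:

# Sketch: average softmax(X_new @ b) over posterior draws of the coefficients
# instead of sampling the Multinomial likelihood.
from scipy.special import softmax

# stack chains and draws; resulting shape is (14, 2, n_draws)
b_draws = trace.posterior["_b"].stack(sample=("chain", "draw")).values
n_draws = b_draws.shape[-1]
# prepend the zero column that pins the reference category (type 1)
b_full = np.concatenate([np.zeros((14, 1, n_draws)), b_draws], axis=1)

# linear predictor for the new case per draw, softmax over the 3 categories
eta = np.einsum("ij,jkd->ikd", new_attributes, b_full)   # (1, 3, n_draws)
p_new = softmax(eta, axis=1).mean(axis=-1)
p_new  # posterior-mean class probabilities; the argmax gives the classification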

%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Sun Apr 23 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.9.0

pytensor: 2.11.1

pytensor: 2.11.1
numpy   : 1.24.2
arviz   : 0.15.1
pandas  : 1.5.3
pymc    : 5.3.0