import pymc as pm
import numpy as np
import arviz as az
import pandas as pd
import pytensor.tensor as pt

%load_ext lab_black

5. Wine Classification*

Adapted from Unit 10: italywines123.odc.

Data can be found here.

This popular data set was provided by Forina et al. (1988). The data below are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

| Column | Variable | Description |
|--------|----------|-------------|
| 1 | Y | Type (response: 1, 2, 3); coded as 1 = [1 0 0], 2 = [0 1 0], 3 = [0 0 1] |
| 2 | X1 | Alcohol |
| 3 | X2 | Malic acid |
| 4 | X3 | Ash |
| 5 | X4 | Alkalinity of ash |
| 6 | X5 | Magnesium |
| 7 | X6 | Total phenols |
| 8 | X7 | Flavanoids |
| 9 | X8 | Nonflavanoid phenols |
| 10 | X9 | Proanthocyanins |
| 11 | X10 | Color intensity |
| 12 | X11 | Hue |
| 13 | X12 | OD280/OD315 of diluted wines |
| 14 | X13 | Proline |

(a) Fit the multinomial regression that predicts the type of wine Y from the predictors X1-X13. What are the estimated coefficients? What is the deviance?

(b) What is your prediction for p = P(Y_new = 1) if a new case has the attributes in new_attributes (below)? How would you classify this wine: as type 1, 2, or 3?

Forina, M., Leardi, R., Armanino, C., and Lanteri, S. (1988). PARVUS: An extendable package of programs for data exploration, classification and correlation. Elsevier, Amsterdam. ISBN 0-444-43012-1.

Report at the Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.

data = pd.read_csv("../data/wine.data", header=None)
# one-hot encode the response (column 0) into a 178 x 3 indicator matrix
Y = pd.get_dummies(data[0]).to_numpy()
# the remaining 13 columns are the predictors; prepend a column of ones for the intercept
X = data.drop(0, axis=1).to_numpy()
X_aug = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
X_aug.shape, Y.shape
((178, 14), (178, 3))
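
As a quick sanity check (this cell is an addition, not part of the original solution), every row of the one-hot response should sum to one and the first column of the augmented design matrix should be the intercept column of ones:

# sanity checks on the encoded data (assumes Y and X_aug as constructed above)
assert np.all(Y.sum(axis=1) == 1)   # exactly one wine type per row
assert np.all(X_aug[:, 0] == 1.0)   # leading column of ones for the intercept
Y.sum(axis=0)                       # number of wines in each of the three classes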
with pm.Model() as m:
    X_data = pm.Data("X", X_aug, mutable=True)
    Y_data = pm.Data("y", Y, mutable=False)
    # coefficients for wine types 2 and 3; type 1 is the reference category
    _b = pm.Normal("_b", 0, tau=0.05, shape=(14, 2))
    # pin the reference category's coefficients at zero for identifiability
    b = pt.concatenate([pt.zeros((14, 1)), _b], axis=1)

    # 178, 14 x 14, 3 -> 178, 3
    phi = pm.math.exp(pm.math.dot(X_data, b))
    # probabilities must sum to 1
    P = phi / pm.math.sum(phi, axis=1)[:, None]

    # P is 178, 3. category count determined by last axis
    pm.Multinomial("likelihood", n=1, p=P, observed=Y_data)

    trace = pm.sample(2000, init="jitter+adapt_diag_grad", target_accept=0.95)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag_grad...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [_b]
100.00% [12000/12000 00:44<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 45 seconds.
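
A note on the parameterization: because the first column of b is fixed at zero, the two sampled columns are log-odds relative to wine type 1, and the exp/normalize step is just a softmax. As a sketch only (not run here, and assuming pm.math.softmax and pm.Categorical behave as in recent PyMC releases), the same model could be written with integer labels and a Categorical likelihood:

# Sketch: equivalent model using softmax + Categorical on 0-based labels.
# Assumes pm.math.softmax exists (it wraps pytensor's softmax).
y_idx = data[0].to_numpy() - 1  # labels 1..3 -> 0..2

with pm.Model() as m_alt:
    X_data = pm.Data("X", X_aug, mutable=True)
    _b = pm.Normal("_b", 0, tau=0.05, shape=(14, 2))
    b = pt.concatenate([pt.zeros((14, 1)), _b], axis=1)
    # softmax along the category axis gives the same probabilities as exp/sum
    P = pm.math.softmax(pt.dot(X_data, b), axis=1)
    pm.Categorical("likelihood", p=P, observed=y_idx)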
az.summary(trace)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
_b[0, 0] 3.984 4.282 -4.306 11.886 0.044 0.038 9596.0 5813.0 1.0
_b[0, 1] -0.533 4.437 -8.621 8.179 0.042 0.055 11192.0 5624.0 1.0
_b[1, 0] 3.105 1.372 0.636 5.704 0.029 0.021 2274.0 3188.0 1.0
_b[1, 1] 0.287 1.871 -3.230 3.837 0.034 0.024 2980.0 3655.0 1.0
_b[2, 0] -3.655 1.625 -6.781 -0.782 0.026 0.019 3995.0 4906.0 1.0
_b[2, 1] 0.933 2.172 -3.076 5.048 0.031 0.023 5040.0 5290.0 1.0
_b[3, 0] -7.821 3.076 -13.558 -2.115 0.047 0.034 4273.0 5473.0 1.0
_b[3, 1] 0.622 4.037 -7.028 8.168 0.049 0.044 6687.0 5576.0 1.0
_b[4, 0] 1.521 0.498 0.594 2.462 0.009 0.006 3075.0 4042.0 1.0
_b[4, 1] 0.920 0.898 -0.695 2.654 0.015 0.011 3451.0 4495.0 1.0
_b[5, 0] 0.012 0.079 -0.143 0.155 0.001 0.001 3854.0 4781.0 1.0
_b[5, 1] 0.136 0.156 -0.153 0.434 0.003 0.002 3269.0 4101.0 1.0
_b[6, 0] 1.045 2.009 -2.591 4.951 0.025 0.021 6548.0 5857.0 1.0
_b[6, 1] -1.868 3.488 -8.585 4.529 0.038 0.033 8310.0 6345.0 1.0
_b[7, 0] -1.604 2.377 -6.074 2.799 0.036 0.026 4371.0 4923.0 1.0
_b[7, 1] -8.488 3.132 -14.206 -2.559 0.037 0.027 7287.0 6677.0 1.0
_b[8, 0] 1.749 4.148 -5.732 9.740 0.041 0.037 10043.0 6772.0 1.0
_b[8, 1] -1.681 4.376 -9.630 6.600 0.040 0.047 11914.0 6167.0 1.0
_b[9, 0] 2.464 2.414 -2.194 6.881 0.033 0.024 5378.0 6124.0 1.0
_b[9, 1] -2.750 3.454 -9.382 3.513 0.036 0.030 9179.0 6928.0 1.0
_b[10, 0] -2.990 1.674 -6.288 0.013 0.030 0.022 3078.0 4149.0 1.0
_b[10, 1] 3.240 1.826 0.070 6.695 0.028 0.020 4260.0 5428.0 1.0
_b[11, 0] 5.040 3.803 -2.111 12.227 0.040 0.032 9066.0 6680.0 1.0
_b[11, 1] -3.339 4.222 -11.266 4.586 0.037 0.038 13294.0 6124.0 1.0
_b[12, 0] -2.352 2.341 -6.562 2.127 0.034 0.026 4690.0 5113.0 1.0
_b[12, 1] -6.179 3.515 -12.886 0.402 0.039 0.030 8130.0 6207.0 1.0
_b[13, 0] -0.049 0.015 -0.077 -0.021 0.000 0.000 2533.0 3390.0 1.0
_b[13, 1] -0.023 0.017 -0.054 0.008 0.000 0.000 3578.0 4895.0 1.0

The coefficients are not the same as the BUGS results.
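
Part (a) also asks for the deviance, which BUGS reports automatically. A rough plug-in equivalent here (a sketch, assuming pm.compute_log_likelihood is available in this PyMC version) is to sum the pointwise log-likelihood over observations and average -2 times that over the posterior draws. Run this before swapping in the new covariates below, so the log-likelihood is evaluated on the original data:

# Sketch: approximate the deviance as the posterior mean of -2 * log-likelihood.
# Assumes pm.compute_log_likelihood adds a log_likelihood group to the trace.
pm.compute_log_likelihood(trace, model=m)

loglik = trace.log_likelihood["likelihood"]
obs_dims = [d for d in loglik.dims if d not in ("chain", "draw")]
deviance = float((-2 * loglik.sum(dim=obs_dims)).mean())
deviance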

# new case: the leading 1 is the intercept term, followed by X1-X13
new_attributes = np.array(
    [1, 12.9, 2, 2.4, 17, 100, 2.8, 2.1, 0.35, 1.6, 5, 1.05, 3, 750]
).reshape((1, 14))
pm.set_data({"X": new_attributes}, model=m)
ppc = pm.sample_posterior_predictive(trace, model=m)
Sampling: [likelihood]
100.00% [8000/8000 00:01<00:00]
ppc.posterior_predictive.mean(
    dim=["chain", "draw", "likelihood_dim_2"]
).likelihood.values
array([0.91875632, 0.07775492, 0.00348876])

We get the same predicted category as BUGS (type 1), but with slightly different probabilities.
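
The posterior predictive draws above add Multinomial sampling noise on top of the posterior uncertainty. As an alternative sketch (not part of the original solution, and using scipy.special.softmax, which is not imported above), the class probabilities can be computed directly from the posterior draws of the coefficients:

# Sketch: average softmax(X_new @ b) over posterior draws of the coefficients
# instead of sampling the Multinomial likelihood.
from scipy.special import softmax

# stack chains and draws; resulting shape is (14, 2, n_draws)
b_draws = trace.posterior["_b"].stack(sample=("chain", "draw")).values
n_draws = b_draws.shape[-1]
# prepend the zero column that pins the reference category (type 1)
b_full = np.concatenate([np.zeros((14, 1, n_draws)), b_draws], axis=1)

# linear predictor for the new case per draw, softmax over the 3 categories
eta = np.einsum("ij,jkd->ikd", new_attributes, b_full)   # (1, 3, n_draws)
p_new = softmax(eta, axis=1).mean(axis=-1)
p_new  # posterior-mean class probabilities; the argmax gives the classification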

%load_ext watermark
%watermark -n -u -v -iv -p pytensor
Last updated: Sun Apr 23 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.9.0

pytensor: 2.11.1

pytensor: 2.11.1
numpy   : 1.24.2
arviz   : 0.15.1
pandas  : 1.5.3
pymc    : 5.3.0