16. Multinomial Logit

Contributed by Jason Naramore.

Multinomial logit models generalize logistic regression to responses with more than two categories. If there are \(K\) categories and \(n\) observations, the multinomial logit model is:

\[\begin{split} \begin{align*} y_1, y_2, ... , y_n & \sim Mn(\textbf{p} , 1) \\ \textbf{p} & = (p_1, p_2, ... , p_K) \\ y_i & = (y_{i1}, y_{i2}, ... , y_{iK}), y_{ij} = 1, y_{i, \neq j} = 0, \space j \in \{1,...,K\} \\ \end{align*} \end{split}\]

The second parameter of the Multinomial distribution is the number of trials, which is fixed at 1 in the multinomial logit model, so each observation falls into exactly one category.

For example, the \(i\)th response could be \(y_i = (0,0,0,1,0)\), meaning the 4th category is true, and categories 1, 2, 3, and 5 are false. The notation would be:

\[\begin{split} \begin{align*} y_i & = (0,0,0,1,0) \\ K & = 5 \\ y_{i4} & = 1 \\ y_{i, \neq 4} & = 0 \\ \end{align*} \end{split}\]
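The one-hot notation above can be illustrated with a short NumPy sketch. The label values and the probability vector are hypothetical and chosen only for illustration; the first label reproduces the \(y_i = (0,0,0,1,0)\) example.

```python
import numpy as np

K = 5
# Hypothetical category labels, 1-based as in the text.
labels = np.array([4, 1, 3])

# Row (label - 1) of the K x K identity matrix is the one-hot vector for that label.
Y = np.eye(K, dtype=int)[labels - 1]

# A single-trial multinomial draw has the same form: exactly one entry equals 1.
rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.3, 0.1])  # illustrative probabilities that sum to 1
draw = rng.multinomial(1, p)
```

Here `Y[0]` corresponds to the 4th category being true, and `draw` is one realization of a Multinomial response with one trial.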

Similar to the probability \(p\) in logistic regression for predicting category 1 over category 0, the multinomial logit model produces a vector of probabilities \(p = (p_1, p_2, ... , p_K)\). To do this, a linear combination \(\eta\) of the \(\beta\) coefficients and \(x\) predictors is calculated for each category, and the \(\eta\)’s are normalized so that the \(p\)’s sum to 1:

\[\begin{split} \begin{align*} \eta_{ij} & = \beta_{0j} + \beta_{1j} x_{i1} + ... + \beta_{p-1,j} x_{i,p-1}\\ p_{ij} & = \frac{e^{\eta_{ij}}}{\sum_{k=1}^K e^{\eta_{ik}}} \\ \end{align*} \end{split}\]
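As a minimal sketch of this normalization, the following NumPy code computes \(\eta\) and \(p\) for a small example. The design matrix `X`, the coefficient matrix `B`, and the helper name `softmax` are assumptions made up for illustration; they do not come from a particular dataset or library.

```python
import numpy as np

def softmax(eta):
    """Row-wise softmax: p_ij = exp(eta_ij) / sum_k exp(eta_ik)."""
    # Subtracting each row's maximum leaves the result unchanged but avoids overflow.
    shifted = eta - eta.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

# Hypothetical design matrix: 3 observations, an intercept column plus 2 predictors.
X = np.array([[1.0, 0.5, -1.2],
              [1.0, 1.5, 0.3],
              [1.0, -0.7, 2.0]])

# Hypothetical coefficients: one row per coefficient, one column per category (K = 4).
B = np.array([[0.2, -0.1, 0.0, 0.4],
              [1.0, 0.5, -0.5, 0.0],
              [-0.3, 0.8, 0.1, 0.0]])

eta = X @ B       # eta[i, j] = beta_0j + beta_1j * x_i1 + beta_2j * x_i2
P = softmax(eta)  # each row is a probability vector over the K categories
```

Each row of `P` sums to 1. Subtracting the row maximum is a common numerical safeguard and does not change the probabilities, because the same constant cancels in the numerator and denominator.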

There is therefore one \(\beta\) coefficient for each category \(j\) and each predictor, plus an intercept \(\beta_{0j}\) for each category. To put it all together, the Bayesian model is:

\[\begin{split} \begin{align*} y_1, y_2, ... , y_n & \sim Mn(\textbf{p} , 1) && \text{likelihood}\\ \\ \eta_{ij} & = \beta_{0j} + \beta_{1j} x_{i1} + ... + \beta_{p-1,j} x_{i,p-1} && \text{deterministic relationship} \\ p_{ij} & = \frac{e^{\eta_{ij}}}{\sum_{k=1}^K e^{\eta_{ik}}} && \text{deterministic relationship} \\ \beta_{mj} & \sim N(0,\sigma_j^2), \quad m = 0, ..., p-1 && \text{prior: } \beta_{mj} \\ \end{align*}\end{split}\]
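To make the structure of the model concrete, the sketch below simulates data from it with NumPy, reading the model from the prior up to the likelihood. This is a forward simulation only, not a fitting procedure. The sizes, the prior standard deviations in `sigma`, and the simulated predictors are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: observations, coefficients per category (p), categories.
n, n_coef, K = 200, 3, 4
# Assumed prior standard deviations, one per category.
sigma = np.array([1.0, 1.0, 1.0, 1.0])

# Design matrix: an intercept column followed by simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, n_coef - 1))])

# Prior: beta[m, j] ~ N(0, sigma_j^2), row m = coefficient, column j = category.
beta = rng.normal(loc=0.0, scale=sigma, size=(n_coef, K))

# Deterministic relationships: linear predictor, then softmax normalization.
eta = X @ beta
eta = eta - eta.max(axis=1, keepdims=True)  # stabilizes exp without changing P
P = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

# Likelihood: one single-trial multinomial draw per observation.
Y = np.array([rng.multinomial(1, P[i]) for i in range(n)])
```

Each row of `Y` is a one-hot response vector whose category probabilities are given by the matching row of `P`.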