# 12. Multiple Linear Regression

Contributed by Jason Naramore.

Multiple linear regression extends simple linear regression to models with multiple predictors \(x_1, x_2, \dots, x_k\).

\[\begin{split}\begin{align*} y_i & = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \dots, n \\ \epsilon_i & \overset{iid}{\sim} N (0,\sigma^2) \end{align*} \end{split}\]

The model can be written compactly in matrix form, where \(p = k + 1\):

\[\begin{split} \bf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1} \quad X = \begin{bmatrix} 1 & X_{11} & \dots & X_{1k} \\ 1 & X_{21} & \dots & X_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \dots & X_{nk} \end{bmatrix}_{n \times p} \quad \boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix}_{p \times 1} \quad \boldsymbol\epsilon = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}_{n \times 1} \end{split}\]
\[\bf{y} = \bf{X} \boldsymbol\beta + \boldsymbol\epsilon\]

There’s an elegant closed-form solution for finding the \(\hat{\boldsymbol\beta}\) estimates in classical statistics:

\[ \hat{\boldsymbol\beta} = (\textbf{X}^T \textbf{X} ) ^{-1} \textbf{X}^T \textbf{y} \]

\(\hat{\bf y}\) estimates can then be found by \(\hat{\bf y} = \bf X \hat{\boldsymbol \beta}\).
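
As a quick illustration, here is a minimal NumPy sketch that simulates some data (the true coefficients and noise scale are arbitrary choices for illustration) and computes \(\hat{\boldsymbol\beta}\) from the closed-form expression:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate n observations with k = 2 predictors and arbitrary "true" coefficients
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x p design matrix, p = k + 1
beta_true = np.array([1.0, 2.0, -0.5])                      # [beta_0, beta_1, beta_2]
y = X @ beta_true + rng.normal(scale=0.5, size=n)           # add N(0, 0.5^2) noise

# Closed-form estimate: solve (X^T X) beta_hat = X^T y
# (solving the linear system is more stable than explicitly inverting X^T X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
print(beta_hat)  # should be close to beta_true
```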

In the Bayesian form of multiple linear regression, we place priors on all \(\beta\)’s and \(\sigma^2\).

\[\begin{split} \begin{align*} y_i & \sim N(\mu_i,\sigma^2) && \text{likelihood}\\ \\ \mu_i & = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} && \text{deterministic relationship}\\ \beta_j & \sim N(0,\sigma_j^2) && \text{prior: } \beta_j, \space j = 0 \text{ to } k \\ \tau & \sim Ga(a,b) && \text{prior: } \tau\\ \sigma^2 & = 1/\tau && \text{deterministic relationship}\\ \end{align*}\end{split}\]

We typically assume that the \(\beta\)'s are independent a priori. For non-informative priors, we might set \(\sigma_j^2\) to something large like \(10^3\) or \(10^4\), and the \(a\) and \(b\) parameters in \(\tau\)'s Gamma distribution to something small like \(10^{-3}\). An example using independent Normal priors on \(\beta\)'s with this setup can be found here: Taste of Cheese.
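
Below is a minimal PyMC sketch of this non-informative setup, reusing the simulated `X`, `y`, and `k` from the NumPy example above (the model and variable names are illustrative, not the exact code from the linked example):

```python
import pymc as pm

X_pred = X[:, 1:]  # predictor columns only; the intercept enters via beta0

with pm.Model() as model:
    # Vague Normal priors on the coefficients: variance 10^4, i.e. sigma = 100
    beta0 = pm.Normal("beta0", mu=0, sigma=100)
    beta = pm.Normal("beta", mu=0, sigma=100, shape=k)

    # Vague Gamma prior on the precision tau; sigma^2 = 1/tau
    tau = pm.Gamma("tau", alpha=0.001, beta=0.001)

    # Deterministic linear predictor and Normal likelihood (parameterized by precision)
    mu = beta0 + pm.math.dot(X_pred, beta)
    pm.Normal("likelihood", mu=mu, tau=tau, observed=y)

    trace = pm.sample()
```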

There are other methods for defining priors. Zellner's prior, for example, can help account for covariance between predictors:

\[\begin{split} \begin{align*} \boldsymbol\beta & \sim MVN(\mu,g \cdot \sigma^2\textbf{V}) && \text{prior: }\boldsymbol\beta \\ \sigma^2 & \sim IG(a,b) && \text{prior: } \sigma^2\\ \end{align*}\end{split}\]
\[\text{typical choices: } g = n, \space g = p^2, \space g = \max\{n,p^2\}\]

where \(\textbf{V} = (\textbf{X}^T \textbf{X})^{-1}\), so the prior covariance matrix \(g \cdot \sigma^2 \textbf{V}\) mirrors the covariance structure of the OLS estimator. An example using Zellner's prior can be found here: Brozek Index Prediction.
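
A sketch of Zellner's prior in PyMC, again reusing the simulated data from the examples above, might look as follows; the \(IG(a,b)\) hyperparameters \(a = 2\), \(b = 1\) and the choice \(g = n\) are illustrative assumptions, not values prescribed by the method:

```python
import numpy as np
import pymc as pm

g = n                        # one of the typical choices listed above
V = np.linalg.inv(X.T @ X)   # V = (X^T X)^{-1}; X includes the intercept column

with pm.Model() as zellner_model:
    # Inverse Gamma prior on sigma^2 (a = 2, b = 1 chosen arbitrarily here)
    sigma2 = pm.InverseGamma("sigma2", alpha=2, beta=1)

    # Zellner's g-prior: beta ~ MVN(0, g * sigma^2 * (X^T X)^{-1})
    beta = pm.MvNormal("beta", mu=np.zeros(k + 1), cov=g * sigma2 * V)

    # Normal likelihood with the same sigma^2 that scales the prior
    mu = pm.math.dot(X, beta)
    pm.Normal("likelihood", mu=mu, sigma=pm.math.sqrt(sigma2), observed=y)

    trace = pm.sample()
```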