10. Simple Linear Regression#

Contributed by Jason Naramore.

The simple linear regression model is:

\[\begin{split}\begin{align*} y_i & = \beta_0 + \beta_1 x_{i1} + \epsilon_i, i = 1 , ... , n \\ \epsilon_i & \overset{iid}{\sim} N (0,\sigma^2) \end{align*} \end{split}\]

where data is in the form of paired \(x_{i1}\) and \(y_i\) values from \(i = 1,2,...,n\). \(\beta_1\) and \(\beta_0\) are the slope and intercept, respectively. The goal is to find the best fit line for predicting \(y\) based on \(x\). The best fit line is almost never perfect, so one of the typical assumptions is that the error \(\epsilon\) about the \(\hat{y}\) fit is normally distributed.

Classical statistics has an elegant approach to solving this problem. We can find the estimate of \(\hat{\beta}_1 = \frac{SXY}{SXX}\), where

\[SXY = \sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y}) \]
\[SXX = \sum_{i=1}^n (x_i - \bar{x})^2 \]

and then find \(\hat{\beta}_0 = \bar{y} - \hat{\beta_1} \bar{x} \). Furthermore, we can assess the model fit using \(R^2\), which is found using SSE, SSR, and SST. \(R^2\) is the variance in \(y\) explained by the model.

\[\begin{split}\begin{align*} SSE & = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \\ SSR & = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \\ SST & = \sum_{i=1}^n (y_i - \bar{y})^2 \\ R^2 & = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \\ \end{align*}\end{split}\]

In the Bayesian model for simple linear regression, we set priors for \(\beta_0\), \(\beta_1\), and \(\sigma^2\):

\[\begin{split} \begin{align*} y_{ij} & \sim N(\mu,\sigma^2) && \text{likelihood}\\ \\ \mu & = \beta_0 + \beta_1 x && \text{deterministic relationship}\\ \beta_0 & \sim N(0,\sigma_0^2) && \text{prior: } \beta_0\\ \beta_1 & \sim N(0,\sigma_1^2) && \text{prior: } \beta_1\\ \tau & \sim Ga(a,b) && \text{prior: } \tau\\ \sigma^2 & = 1/\tau && \text{deterministic relationship}\\ \end{align*}\end{split}\]

Typical non-informative priors for \(\beta_0\) and \(\beta_1\) are centered at zero, with a wide standard deviation. A non-informative prior for \(\tau\) could be \(Ga(a = 0.001, b = 0.001)\). In PyMC we can define a Normal distribution with either the \(\tau\) or \(\sigma\) parameter, so we could define the \(\sigma^2\) distribution directly instead of using a deterministic relationship to \(\tau\). Other possible non-negative priors for \(\sigma^2\) might be Inverse-Gamma, Half-Normal, or Half-flat.