6. Analysis of Variance

This page was contributed by Jason Naramore!

One-Way ANOVA

In a one-way ANOVA model, we want to determine whether the means of several treatment groups are all equal or whether at least one of them differs. The null hypothesis is that the means of all treatment groups are equal:

\[\begin{split}\begin{align*} H_0 & : \mu_1 = \mu_2 = ... = \mu_a \\ H_1 & : \text{one or more groups have a different } \mu \end{align*}\end{split}\]

The ANOVA model is defined as

\[\begin{split}\begin{align*} y_{ij} & \sim N(\mu_i, \sigma^2) \\ \mu_i & = \mu + a_i, \space i = 1,...,a \space ; j = 1,...,n_i \end{align*}\end{split}\]

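where \(i\) indexes the treatment group and \(j\) indexes the observation within group \(i\), so there are \(a\) groups and group \(i\) has \(n_i\) observations. The variance \(\sigma^2\) is assumed to be common to all groups. \(\mu\) is the grand mean of all the data, and \(a_i\) is the treatment effect of the \(i\)th group. Note that \(a\) without a subscript counts the groups, while \(a_i\) with a subscript is an effect. Now we can restate the null hypothesis as: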

\[\begin{split}\begin{align*} H_0 & : a_1 = a_2 = ... = a_a = 0 \end{align*}\end{split}\]

That is, under \(H_0\) there is no difference between treatment groups. The model as written has \(a + 1\) mean parameters (\(\mu\) and the \(a_i\)’s) but only \(a\) group means, so we need one of two constraints to get the right degrees of freedom and make the model identifiable. Under the sum-to-zero (STZ) constraint, the \(a_i\)’s sum to zero. Alternatively, we can set one of the \(a_i\)’s, say \(a_1\), to zero.
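To make the two constraints concrete, here is a minimal sketch in plain Python. It assumes we already know the group means \(\mu_i\) (the numbers below are made up) and shows how each constraint splits them into \(\mu\) and the \(a_i\)’s. The function names are hypothetical, chosen only for this illustration.

```python
def stz_parameters(group_means):
    """Sum-to-zero constraint: mu is the unweighted average of the group
    means, so the effects a_i = mu_i - mu sum to zero."""
    mu = sum(group_means) / len(group_means)
    return mu, [m - mu for m in group_means]


def corner_parameters(group_means):
    """Constraint a_1 = 0: mu equals the first group's mean and every other
    effect is measured relative to group 1."""
    mu = group_means[0]
    return mu, [m - mu for m in group_means]


# Hypothetical values of mu_1, mu_2, mu_3 (not from any real data set).
group_means = [8.0, 10.5, 11.5]

mu_stz, a_stz = stz_parameters(group_means)
mu_corner, a_corner = corner_parameters(group_means)

print(mu_stz, a_stz)        # 10.0 [-2.0, 0.5, 1.5]
print(mu_corner, a_corner)  # 8.0 [0.0, 2.5, 3.5]
```

Both parameterizations reproduce the same group means \(\mu + a_i\); only the meaning of \(\mu\) changes. Under STZ, \(\mu\) is the average of the group means (which matches the grand mean of the data when all groups have the same size), while under \(a_1 = 0\) it is the mean of the first group.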

In classical statistics, an ANOVA table is built and an F-test is used to decide whether to reject \(H_0\). If \(H_0\) is rejected, we then need additional analyses for each comparison of treatment groups to find out which groups differ.
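For reference, the quantities in a classical one-way ANOVA table can be computed in a few lines. The sketch below uses plain Python and a tiny made-up data set; the function name `one_way_anova` is hypothetical. It stops at the F statistic, since the p-value requires the CDF of the F distribution, which the standard library does not provide.

```python
def one_way_anova(groups):
    """Return the entries of a classical one-way ANOVA table."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) sum of squares.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group (error) sum of squares.
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    df_between = len(groups) - 1       # a - 1
    df_within = n_total - len(groups)  # N - a
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": f_stat,
    }


# Made-up data: three treatment groups with three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
table = one_way_anova(groups)
print(table)
```

For this toy data the group means are 2, 3, and 6, which gives a between-group sum of squares of 26 on 2 degrees of freedom, a within-group sum of squares of 6 on 6 degrees of freedom, and \(F = 13\). If SciPy is available, the p-value could be obtained with `scipy.stats.f.sf(f_stat, df_between, df_within)`.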

The Bayesian approach to ANOVA is to set priors on \(\mu\), \(a_1,a_2,...,a_a\), and \(\sigma^2\). The Bayesian model is constructed as:

\[\begin{split} \begin{align*} y_{ij} & \sim N(\mu_i,\sigma^2) && \text{likelihood}\\ \\ \mu & \sim N(\mu_0,\sigma_0^2) && \text{prior: grand mean, with prior mean } \mu_0\\ \mu_i & = \mu + a_i && \text{deterministic relationship}\\ a_i & \sim N(0,\sigma_i^2) && \text{prior: } a_i\\ \tau &\sim Ga(0.001,0.001) && \text{prior: } \tau\\ \sigma^2 & = 1/\tau && \text{deterministic relationship}\\ \\ \text{Subject to: } & \sum a_i = 0 && \text{STZ constraint}\\ \end{align*}\end{split}\]

Here \(\sigma_0^2\) and \(\sigma_i^2\) are the prior variances of the grand mean and the treatment effects, respectively. A non-informative choice might be something like \(\sigma_0^2 = 1000\). To assess \(H_0\), we look at the posterior distributions of the \(a_i\)’s and check whether their credible intervals exclude zero. To compare two treatment groups, we can compute their difference inside the Bayesian model and then check whether the credible interval of that difference excludes zero.
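As a rough illustration of this workflow, here is a minimal Gibbs sampler in plain Python. It is a simplified stand-in, not the exact model above: for convenience it places independent \(N(0, 1000)\) priors directly on the group means \(\mu_i\) (instead of separate priors on \(\mu\) and the \(a_i\)’s), keeps the \(Ga(0.001, 0.001)\) prior on \(\tau\), and then recovers \(\mu\) and the \(a_i\)’s from each draw by centering, so the STZ constraint holds by construction. The data, function names, and sampler settings are all assumptions made for the example.

```python
import random


def gibbs_group_means(groups, n_draws=4000, burn_in=1000, prior_var=1000.0, seed=1):
    """Gibbs sampler for y_ij ~ N(mu_i, 1/tau) with independent priors
    mu_i ~ N(0, prior_var) and tau ~ Ga(0.001, 0.001) (shape, rate).

    Each kept draw is (mu, [a_1, ..., a_a], sigma2), where mu is the average
    of the sampled mu_i and a_i = mu_i - mu, so the a_i sum to zero.
    """
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    ybars = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)
    tau0 = 1.0 / prior_var  # prior precision of each mu_i
    tau = 1.0               # arbitrary starting value
    draws = []
    for it in range(burn_in + n_draws):
        # mu_i | tau, y: conjugate normal update (prior mean is 0).
        mu_i = []
        for n_i, ybar_i in zip(sizes, ybars):
            prec = tau * n_i + tau0
            mu_i.append(rng.gauss(tau * n_i * ybar_i / prec, prec ** -0.5))
        # tau | mu_i, y: conjugate gamma update.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        sse = sum((y - m) ** 2 for g, m in zip(groups, mu_i) for y in g)
        tau = rng.gammavariate(0.001 + n_total / 2, 1.0 / (0.001 + sse / 2))
        if it >= burn_in:
            mu = sum(mu_i) / len(mu_i)
            draws.append((mu, [m - mu for m in mu_i], 1.0 / tau))
    return draws


def credible_interval(values, level=0.95):
    """Equal-tailed credible interval read off the sorted draws."""
    s = sorted(values)
    lo = int((1 - level) / 2 * len(s))
    hi = int((1 + level) / 2 * len(s)) - 1
    return s[lo], s[hi]


# Made-up data: groups 1 and 2 are similar, group 3 is clearly higher.
groups = [
    [9.1, 10.2, 9.8, 10.5, 9.6],
    [10.0, 9.4, 10.8, 10.1, 9.9],
    [12.9, 13.4, 12.2, 13.8, 13.1],
]
draws = gibbs_group_means(groups)

for i in range(len(groups)):
    print(f"a_{i + 1}:", credible_interval([a[i] for _, a, _ in draws]))

# Pairwise comparisons: posterior of the difference between two effects.
print("a_3 - a_1:", credible_interval([a[2] - a[0] for _, a, _ in draws]))
print("a_2 - a_1:", credible_interval([a[1] - a[0] for _, a, _ in draws]))
```

With this toy data we would expect the interval for \(a_3 - a_1\) to sit well above zero and the interval for \(a_2 - a_1\) to cover zero. Note that no extra model is needed for the pairwise comparison: the difference is just computed from the same posterior draws. In practice the full model with priors on \(\mu\) and the \(a_i\)’s would usually be written in a probabilistic programming language rather than sampled by hand.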