12. Bayesian Testing

First, I recommend checking out 3blue1brown’s video on the Bayes factor to help with intuition. There is a similar example in Professor Vidakovic’s statistics book: Vidakovic [2017] page 103. Unlike the classical approach, Bayesian testing doesn’t prioritize the null hypothesis. Instead, we calculate the posterior probability of each hypothesis, then choose the one with the larger posterior probability.
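As a minimal sketch of that decision rule, the snippet below compares two simple hypotheses about a binomial success probability. The data (7 successes in 10 trials), the two candidate values of \(\theta\), the equal prior weights, and the function names are all assumptions made for illustration; they do not come from the lecture.

```python
from math import comb


def binomial_likelihood(x, n, theta):
    """P(X = x | theta) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def posterior_probs(x, n, theta0, theta1, prior0=0.5):
    """Posterior probabilities of H0: theta = theta0 and H1: theta = theta1."""
    prior1 = 1 - prior0
    w0 = prior0 * binomial_likelihood(x, n, theta0)  # prior times likelihood under H0
    w1 = prior1 * binomial_likelihood(x, n, theta1)  # prior times likelihood under H1
    total = w0 + w1  # marginal probability of the data
    return w0 / total, w1 / total


# Hypothetical data: 7 successes in 10 trials.
p0, p1 = posterior_probs(x=7, n=10, theta0=0.5, theta1=0.75)

# Neither hypothesis gets special treatment: pick the larger posterior.
chosen = "H0" if p0 > p1 else "H1"
print(f"P(H0 | x) = {p0:.3f}, P(H1 | x) = {p1:.3f}, choose {chosen}")
```

Changing `prior0` shows how the prior weights shift the decision, which has no direct counterpart in the classical setup.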

Lecture errata

During the precise null example, the lecture shows an incorrectly labeled equation for the posterior. Instead of

\[\begin{split} \begin{align*} \pi(\theta \mid x) &= \frac{f(x \mid \theta_0) \pi_0}{m(x)} = \frac{\pi_0 f(x \mid \theta_0)}{\pi_0 f(x \mid \theta_0) + \pi_1 m_1(x)} \\ &= \left(1 + \frac{\pi_1}{\pi_0} \cdot \frac{m_1(x)}{f(x \mid \theta_0)}\right)^{-1} \end{align*} \end{split}\]

the left-hand side should be \(\pi(\theta_0 \mid x)\), the posterior probability of the precise null, rather than \(\pi(\theta \mid x)\).
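To see the formula above in action, here is a small numerical sketch. It assumes binomial data, a precise null \(\theta_0 = 0.5\), prior weights \(\pi_0 = \pi_1 = 0.5\), and a Uniform(0, 1) spread prior under the alternative, for which the marginal is \(m_1(x) = 1/(n+1)\). None of these choices come from the lecture; they are picked only to make the arithmetic easy to check.

```python
from math import comb


def posterior_null(pi0, f0, m1):
    """pi(theta0 | x) = (1 + (pi1 / pi0) * m1(x) / f(x | theta0))^(-1)."""
    pi1 = 1 - pi0
    return 1 / (1 + (pi1 / pi0) * (m1 / f0))


# Hypothetical data: 15 successes in 20 trials, testing theta0 = 0.5.
n, x, theta0 = 20, 15, 0.5
pi0 = 0.5

# Likelihood at the null value.
f0 = comb(n, x) * theta0**x * (1 - theta0) ** (n - x)

# Marginal under the alternative, assuming theta ~ Uniform(0, 1):
# integrating the binomial likelihood over theta gives 1 / (n + 1).
m1 = 1 / (n + 1)

post = posterior_null(pi0, f0, m1)

# Cross-check against the first form of the equation.
direct = pi0 * f0 / (pi0 * f0 + (1 - pi0) * m1)

print(f"pi(theta0 | x) = {post:.4f}")
```

Both forms of the equation give the same number, which is a quick way to confirm the algebra in the second line.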

Point masses

The professor notes that your overall prior needs to contain the point mass representing your precise null hypothesis.

What the professor means here is that you can only test a hypothesis to which your prior assigns probability greater than 0. If your prior is purely continuous, the probability of any single point is 0, and so is its posterior probability, no matter what data you observe. At a high level, if your prior doesn’t contain your hypothesis, you are ruling out the hypothesis before you even build the rest of your model. Without mixing in the point mass, you’ve already predetermined that your hypothesis is impossible.

That’s why he mixes the point mass with the “spread” distribution: the prior needs a discrete component at the null value to give a specific point nonzero probability.
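In symbols, the mixed prior can be sketched as follows. The notation \(\delta_{\theta_0}\) for the point mass and \(\xi\) for the spread density is an assumption, chosen to line up with the \(\pi_0\), \(\pi_1\), and \(m_1(x)\) in the errata equation; the lecture may use different symbols.

\[\begin{split} \begin{align*} \pi(\theta) &= \pi_0 \, \delta_{\theta_0}(\theta) + \pi_1 \, \xi(\theta), \qquad \pi_0 + \pi_1 = 1, \\ m(x) &= \pi_0 f(x \mid \theta_0) + \pi_1 m_1(x), \qquad m_1(x) = \int f(x \mid \theta) \, \xi(\theta) \, d\theta. \end{align*} \end{split}\]

Setting \(\pi_0 = 0\) leaves only the continuous part, and then \(\pi(\theta_0 \mid x) = 0\) regardless of the data.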

Scales for the strength of evidence

The professor uses Jeffreys’ scale (Jeffreys [2003] Appendix B) in the lecture. Here \(K\) is the Bayes factor in favor of the null hypothesis, which Jeffreys denotes \(q\).

| Grade | \(K\) value | Interpretation |
| --- | --- | --- |
| 0 | \(K > 1\) | Null hypothesis supported. |
| 1 | \(1 > K > 10^{-1/2}\) | Evidence against q, but not worth more than a bare mention. |
| 2 | \(10^{-1/2} > K > 10^{-1}\) | Evidence against q substantial. |
| 3 | \(10^{-1} > K > 10^{-3/2}\) | Evidence against q strong. |
| 4 | \(10^{-3/2} > K > 10^{-2}\) | Evidence against q very strong. |
| 5 | \(10^{-2} > K\) | Evidence against q decisive. |
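The scale is easy to turn into a lookup. The helper below is a hypothetical sketch, not something from the lecture. The table uses strict inequalities and says nothing about values exactly on a boundary, so this version assumes that a boundary value falls into the higher-numbered grade.

```python
def jeffreys_grade(k):
    """Return (grade, interpretation) for a Bayes factor K in favor of the null."""
    if k <= 0:
        raise ValueError("K must be positive")

    # (lower bound of the band, grade, interpretation), from weakest to strongest
    # evidence against the null.
    bands = [
        (1.0, 0, "Null hypothesis supported."),
        (10 ** -0.5, 1, "Evidence against q, but not worth more than a bare mention."),
        (10 ** -1.0, 2, "Evidence against q substantial."),
        (10 ** -1.5, 3, "Evidence against q strong."),
        (10 ** -2.0, 4, "Evidence against q very strong."),
    ]
    for lower, grade, text in bands:
        if k > lower:
            return grade, text

    # Anything at or below 10^-2 (boundary assumption noted above).
    return 5, "Evidence against q decisive."


# Hypothetical Bayes factors spanning every grade.
for k in (2.0, 0.5, 0.2, 0.05, 0.02, 0.001):
    grade, text = jeffreys_grade(k)
    print(f"K = {k}: grade {grade}, {text}")
```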

This scale is subjective, though. There are other ones out there: the Kass & Raftery scale (Kass and Raftery [1995]) and the Lee and Wagenmakers scale (Lee and Wagenmakers [2013]) are two alternatives.