12. Bayesian Testing#

Lecture errata#

I’ve marked the lecture errors in red text in the rewritten slides below.

Rewriting the slides#

We’ve gotten a lot of questions about these slides for this lecture, so I’ve rewritten them here with some additional information.

First, I recommend checking out 3blue1brown’s video on Bayes’ factor to help with intuition, particularly the part about expressing Bayes’ rule in terms of the prior and posterior odds.

There is a similar example in Professor Vidakovic’s statistics book: Vidakovic [2017] page 103. Unlike the classical approach, Bayesian testing doesn’t prioritize the null hypothesis. Instead, we calculate the posterior probabilities for both hypotheses, then choose the one with the larger posterior probability.


1#

Assume that $\Theta_0$ and $\Theta_1$ are two non-overlapping sets of values of the parameter $\theta$. We want to test
$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_1: \theta \in \Theta_1$

The probabilities are:

$p_0 = \int_{\Theta_0} \pi(\theta \mid x)\, d\theta = P^{\theta \mid X}(H_0)$

$p_1 = \int_{\Theta_1} \pi(\theta \mid x)\, d\theta = P^{\theta \mid X}(H_1)$

See slides 5 and 6 for an example.


2#

Prior probabilities of hypotheses#

$\pi_0 = \int_{\Theta_0} \pi(\theta)\, d\theta, \quad \pi_1 = \int_{\Theta_1} \pi(\theta)\, d\theta$

$B_{01}$ – Bayes factor in favor of $H_0$: $B_{01} = \frac{p_0/p_1}{\pi_0/\pi_1}$ (posterior odds / prior odds)

$B_{10}$ – Bayes factor in favor of $H_1$ is the reciprocal: $B_{10} = \frac{1}{B_{01}}$

Precise null $H_0: \theta = \theta_0$ requires a prior with a point mass at $\theta_0$.#

See the example on slide 3.


3#

Precise Null#

The professor notes that, if you want to test a precise number, your overall prior needs to contain the point mass representing your precise null hypothesis.

What he means here is that you can only test a hypothesis where your prior allows for a probability greater than 0. So if your prior is a continuous distribution, the probability at any given point is 0. At a high level, if your prior doesn’t contain your hypothesis, you are essentially ruling out the hypothesis before you even build the rest of your model. Without mixing in the point mass, you’ve already predetermined that your hypothesis is impossible.

That’s why he mixes the point mass $\delta_{\theta_0}$ and the “spread” distribution $\xi(\theta)$ in the prior $\pi(\theta)$; you need a discrete component in the prior to put positive probability on a specific point.

$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0$

$\pi(\theta) = \pi_0 \delta_{\theta_0} + (1 - \pi_0)\,\xi(\theta), \quad \text{where } (1 - \pi_0) = \pi_1$

$m(x) = \pi_0 f(x \mid \theta_0) + \pi_1 m_1(x)$

$m_1(x) = \int_{\{\theta \neq \theta_0\}} f(x \mid \theta)\,\xi(\theta)\, d\theta$

$\pi(\theta_0 \mid x) = \frac{f(x \mid \theta_0)\,\pi_0}{m(x)} = \frac{\pi_0 f(x \mid \theta_0)}{\pi_0 f(x \mid \theta_0) + \pi_1 m_1(x)} = \frac{\pi_0 f(x \mid \theta_0)}{\pi_0 f(x \mid \theta_0)\left(1 + \frac{\pi_1 m_1(x)}{\pi_0 f(x \mid \theta_0)}\right)} = \left(1 + \frac{\pi_1 m_1(x)}{\pi_0 f(x \mid \theta_0)}\right)^{-1}$

so

$\pi(\theta_0 \mid x) = \left(1 + \frac{\pi_1}{\pi_0} \cdot \frac{m_1(x)}{f(x \mid \theta_0)}\right)^{-1}$

Remembering that Bayes’ factor is what updates the prior odds to posterior odds,

$\text{Odds}(H_i \mid x) = BF \times \text{Odds}(H_i)$

we see that

$B_{01} = \frac{f(x \mid \theta_0)}{m_1(x)} = \frac{f(x \mid \theta_0)}{\int_{\{\theta \neq \theta_0\}} f(x \mid \theta)\,\xi(\theta)\, d\theta}$

4#

Scales for the strength of evidence#

The professor uses Jeffreys scale (Jeffreys [2003] Appendix B) in the lecture.

| Grade | K Value | Interpretation |
| --- | --- | --- |
| 0 | $K > 1$ | Null hypothesis supported. |
| 1 | $1 > K > 10^{-1/2}$ | Evidence against $q$, but not worth more than a bare mention. |
| 2 | $10^{-1/2} > K > 10^{-1}$ | Evidence against $q$ substantial. |
| 3 | $10^{-1} > K > 10^{-3/2}$ | Evidence against $q$ strong. |
| 4 | $10^{-3/2} > K > 10^{-2}$ | Evidence against $q$ very strong. |
| 5 | $10^{-2} > K$ | Evidence against $q$ decisive. |

This scale is just one of many, though. The Kass & Raftery scale (Kass and Raftery [1995]) and the Lee and Wagenmakers scale (Lee and Wagenmakers [2013]) are two alternatives.
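As a small illustration (the helper function below is hypothetical, not from the lecture), the table can be encoded as a lookup on $\log_{10} K$, where $K = B_{01}$ is the Bayes factor in favor of $H_0$:

```python
import math

# Hypothetical helper: maps a Bayes factor K = B01 in favor of H0
# onto the grades of Jeffreys' scale from the table above.
def jeffreys_grade(K: float) -> int:
    """Return the Jeffreys grade (0-5) for a Bayes factor K in favor of H0."""
    if K > 1:
        return 0          # null hypothesis supported
    e = math.log10(K)     # K = 10**e with e <= 0
    if e > -1/2:
        return 1          # not worth more than a bare mention
    if e > -1:
        return 2          # substantial
    if e > -3/2:
        return 3          # strong
    if e > -2:
        return 4          # very strong
    return 5              # decisive

# From the examples below: B01 ~ 2.37 (IQ test) and B01 ~ 0.9564 (precise null)
print(jeffreys_grade(2.37), jeffreys_grade(0.9564))  # 0 1
```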


5#

Example: Jeremy’s IQ#

In the context of Jeremy’s IQ example, test the hypotheses

$H_0: \theta \leq 100 \quad \text{vs.} \quad H_1: \theta > 100$

$\theta \mid x \sim N(102.8, 48)$

- $p_0 = P^{\theta \mid X}(H_0) = \int_{-\infty}^{100} \frac{1}{\sqrt{2\pi \cdot 48}}\, e^{-\frac{(\theta - 102.8)^2}{2 \cdot 48}}\, d\theta$

  $= \text{normcdf}(100, 102.8, \sqrt{48})$
  $= 0.3431$

- $p_1 = P^{\theta \mid X}(H_1) = 1 - 0.3431 = 0.6569$
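These values can be checked in Python with `scipy.stats.norm` (scipy here is my assumption; the slides use MATLAB-style `normcdf`). The key detail is that the scale argument is a standard deviation, so the $N(102.8, 48)$ posterior, parameterized by its variance, needs $\sqrt{48}$:

```python
import math
from scipy.stats import norm

# Posterior theta | x ~ N(102.8, 48); 48 is a variance, so scale = sqrt(48)
p0 = norm.cdf(100, loc=102.8, scale=math.sqrt(48))
p1 = 1 - p0
print(round(p0, 4), round(p1, 4))  # 0.3431 0.6569
```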


6#

- $\pi_0 = P^{\theta}(H_0) = \int_{-\infty}^{100} \frac{1}{\sqrt{2\pi \cdot 120}}\, e^{-\frac{(\theta - 110)^2}{2 \cdot 120}}\, d\theta$

  $= \text{normcdf}(100, 110, \sqrt{120})$
  $= 0.1807$

  $\pi_1 = 1 - 0.1807 = 0.8193$

- $B_{10} = \frac{p_1/p_0}{\pi_1/\pi_0} = \frac{0.6569/0.3431}{0.8193/0.1807} = \frac{1.9146}{4.5340} = 0.4223$

  since $\frac{p_1}{p_0} = B_{10} \times \frac{\pi_1}{\pi_0}$

- $\log_{10} B_{01} = -\log_{10} B_{10} = 0.3744$ (poor evidence in favor of $H_0$)
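Putting the whole Bayes-factor calculation for this example together, a sketch assuming scipy is available (again passing standard deviations, not variances):

```python
import math
from scipy.stats import norm

# Posterior N(102.8, 48) and prior N(110, 120); both are given by their
# variances on the slides, so pass the standard deviations to scipy.
p0 = norm.cdf(100, 102.8, math.sqrt(48))    # ~0.3431
pi0 = norm.cdf(100, 110, math.sqrt(120))    # ~0.1807
p1, pi1 = 1 - p0, 1 - pi0

B10 = (p1 / p0) / (pi1 / pi0)               # posterior odds / prior odds
print(round(B10, 4))                        # ~0.4223
print(round(-math.log10(B10), 4))           # log10(B01) ~0.3744
```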


7#

Example: 10 flips of a coin revised#

$X \mid p \sim \text{Bin}(n, p), \quad p \sim \text{Be}(500, 500)$

Posterior: $p \mid X \sim \text{Be}(500, 510)$

- We already found the posterior mean in a previous example:

  $E(p \mid X) = \frac{500}{1010} = 0.4950495$

- The mode for $\text{Be}(\alpha, \beta)$ is given by:

  $\frac{\alpha - 1}{\alpha + \beta - 2}$; here the posterior mode is $\frac{499}{1008} = 0.4950397$

- The median (not explicit; it uses special functions):

  $\text{betainv}(0.5, 500, 510) = 0.4950462$

  Approximation:

  $\frac{\alpha - 1/3}{\alpha + \beta - 2/3} = \frac{499.6667}{1009.3333} = 0.4950462$
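A quick check of the three posterior summaries, assuming scipy is available (`beta.ppf` plays the role of MATLAB's `betainv`):

```python
from scipy.stats import beta

a, b = 500, 510                       # posterior Be(500, 510)
mean = a / (a + b)                    # 500/1010
mode = (a - 1) / (a + b - 2)          # 499/1008
median = beta.ppf(0.5, a, b)          # same as betainv(0.5, 500, 510)
approx = (a - 1/3) / (a + b - 2/3)    # median approximation
print(round(mean, 7), round(mode, 7), round(median, 7), round(approx, 7))
```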

8#

Test: $H_0: p \leq 0.5$ vs. $H_1: p > 0.5$#

$p_0 = \int_0^{0.5} \frac{1}{B(500, 510)}\, p^{500-1}(1-p)^{510-1}\, dp = \text{betacdf}(0.5, 500, 510) = 0.6235$

$p_1 = 1 - p_0 = 0.3765$

$\pi_0 = \int_0^{0.5} \frac{1}{B(500, 500)}\, p^{500-1}(1-p)^{500-1}\, dp = \text{betacdf}(0.5, 500, 500) = 0.5$

$\pi_1 = 1 - \pi_0 = 0.5$

$B_{01} = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{0.6235}{0.3765} = 1.656$

$\log_{10} B_{01} = 0.2191$ (poor evidence against $H_1$)
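The same test, sketched with `scipy.stats.beta.cdf` in place of `betacdf` (scipy is my assumption about tooling; note the symmetric prior makes $\pi_0$ exactly 0.5, so the prior odds drop out):

```python
import math
from scipy.stats import beta

p0 = beta.cdf(0.5, 500, 510)    # betacdf(0.5, 500, 510), ~0.6235
pi0 = beta.cdf(0.5, 500, 500)   # symmetric prior: 0.5
B01 = (p0 / (1 - p0)) / (pi0 / (1 - pi0))
print(round(B01, 3), round(math.log10(B01), 4))  # ~1.656 ~0.219
```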

9#

Precise Null Test#

$H_0: p = 0.5$ vs. $H_1: p \neq 0.5$

$\pi(p) = 0.8\,\delta_{0.5} + 0.2\,\text{Be}(500, 500)$

$\pi_0 = 0.8, \quad \pi_1 = 0.2$

$m_1(x)\big|_{x=0} = m_1(0) = \int_0^1 \binom{10}{0} p^0 (1-p)^{10}\, \frac{1}{B(500, 500)}\, p^{500-1}(1-p)^{500-1}\, dp = \frac{B(500, 510)}{B(500, 500)} = 0.001021$

$f(x \mid p)\big|_{x=0,\, p=0.5} = f(0 \mid 0.5) = \binom{10}{0}\, 0.5^0\, 0.5^{10} = \frac{1}{1024} = 0.0009765$

$B_{01} = \frac{f(0 \mid 0.5)}{m_1(0)} = \frac{0.0009765}{0.001021} = 0.9564$

$\log_{10} B_{10} = -\log_{10} B_{01} = -\log_{10} 0.9564 = 0.0194$

Very poor evidence against H0.
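The precise-null quantities can be reproduced on the log scale with `scipy.special.betaln`, which avoids evaluating the huge beta-function values in $B(500, 510)/B(500, 500)$ directly (scipy and the variable names are my own choices; the formulas are the ones from slides 3 and 9). The last quantity also evaluates the posterior probability of the null, $\pi(\theta_0 \mid x)$, from slide 3:

```python
import math
from scipy.special import betaln

n, x = 10, 0          # 10 flips, 0 heads observed
pi0, pi1 = 0.8, 0.2   # prior mass on the point null and the spread part

# m1(0): marginal of the data under the Be(500, 500) "spread" component,
# computed as exp(log B(500, 510) - log B(500, 500)) for stability
m1 = math.comb(n, x) * math.exp(betaln(500 + x, 500 + n - x) - betaln(500, 500))
f0 = math.comb(n, x) * 0.5**x * 0.5**(n - x)    # f(0 | 0.5) = 1/1024

B01 = f0 / m1                                   # ~0.9564
post_null = 1 / (1 + (pi1 / pi0) * (m1 / f0))   # pi(p = 0.5 | x), slide 3 formula
print(round(m1, 6), round(B01, 4), round(-math.log10(B01), 4))
print(round(post_null, 4))
```

With these priors the point null keeps roughly 0.79 posterior probability, consistent with the very weak Bayes factor.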