Quantifying Goodness of Fit: the \(\chi^2\)

Number of Observations in Categories

Consider the following data generating process:

Then add this error generating process:

Do our observed values fit our expectations?

\(H_0\): Observations = Expectations
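We are testing goodness of fit!
If there is no difference, the standardized deviations \((O_i-E_i)/\sqrt{E_i}\) should behave like approximately normal noise
Differences can be positive or negative, so we square them
The square of a standard normal is \(\chi^2\) distributed with 1 degree of freedom, and a sum of \(k\) independent squared standard normals is \(\chi^2\) with \(k\) degrees of freedom!
- For \(n\) categories, the \(\chi^2\) has degrees of freedom = n-1, because the counts must add up to the fixed total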
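A quick simulation can check the claim about squared normals. This is a sketch using only Python's standard library; the helper name `simulate_chi_square`, the seed, and the number of draws are arbitrary choices for illustration. If the claim holds, sums of \(k\) squared standard normals should have a mean close to \(k\) and a variance close to \(2k\), the known mean and variance of a \(\chi^2\) distribution with \(k\) degrees of freedom.

```python
import random
import statistics

def simulate_chi_square(k, n_draws=50_000, seed=1):
    """Return (mean, variance) of n_draws sums of k squared standard normals."""
    rng = random.Random(seed)  # local generator so results are reproducible
    draws = [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]
    return statistics.mean(draws), statistics.variance(draws)

for k in (1, 3, 6):
    mean, var = simulate_chi_square(k)
    # Theory for a chi-square with k degrees of freedom: mean = k, variance = 2k
    print(f"k = {k}: mean = {mean:.2f} (theory {k}), variance = {var:.2f} (theory {2 * k})")
```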

\[\chi^2 = \sum\frac{\displaystyle(O_i-E_i)^2}{E_i}\]
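The formula translates directly into code. Below is a minimal sketch in Python; the function name `chi_square_stat` is hypothetical, and the die-roll counts are made up purely for illustration (they are not the births data).

```python
def chi_square_stat(observed, expected):
    """Pearson's chi-square: sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a six-sided die (illustration only)
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6  # H0: every face is equally likely -> 10 per face

print(chi_square_stat(observed, expected))  # about 4.2, on 6 - 1 = 5 DF
```

Each category contributes its squared deviation scaled by its expectation, so a miss of 4 counts matters more when only 10 were expected than when 100 were.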

Are births evenly spread across the week?

\(\chi^2\) = 15.24 with 6 DF

p = 0.01847
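The daily birth counts are not listed here, so this sketch starts from the reported statistic and only reproduces the p-value. It relies on a standard closed form for the upper tail of a \(\chi^2\) distribution with an even number of degrees of freedom, \(P(X \ge x) = e^{-x/2}\sum_{j=0}^{df/2-1} (x/2)^j / j!\); the helper name `chi2_sf_even_df` is hypothetical. For odd DF you would need a general routine such as `scipy.stats.chi2.sf(x, df)`.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with an EVEN number of DF.

    Closed form: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    It is only valid when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# 7 days of the week -> 7 - 1 = 6 degrees of freedom
print(round(chi2_sf_even_df(15.24, 6), 5))  # 0.01847
```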

Eizaguirre lab

	Heavily Infected	Lightly Infected	Uninfected
Eaten by Birds	37	10	1
Not Eaten by Birds	9	35	49

Under independence: p(eaten AND uninfected) = p(eaten) x p(uninfected)

    Eaten by Birds Not Eaten by Birds 
                48                 93

Heavily Infected Lightly Infected       Uninfected 
              46               45               50

p(eaten AND uninfected) = 48/141 * 50/141

E(eaten AND uninfected) = 141 * 48/141 * 50/141 = 48 * 50 / 141 ≈ 17
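The same calculation applies to every cell: under independence, each expected count is (row total × column total) / grand total. A sketch in plain Python using the counts from the table above; the variable names are illustrative.

```python
rows = ["Eaten by Birds", "Not Eaten by Birds"]
cols = ["Heavily Infected", "Lightly Infected", "Uninfected"]
observed = [
    [37, 10, 1],
    [9, 35, 49],
]

row_totals = [sum(r) for r in observed]        # [48, 93]
col_totals = [sum(c) for c in zip(*observed)]  # [46, 45, 50]
n = sum(row_totals)                            # 141

# E_ij = n * p(row i) * p(col j) = row_total_i * col_total_j / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

for name, row in zip(rows, expected):
    print(name, [round(e, 1) for e in row])
# Eaten by Birds [15.7, 15.3, 17.0]
# Not Eaten by Birds [30.3, 29.7, 33.0]
```

Only one eaten fish was uninfected, against roughly 17 expected, which is where much of the final \(\chi^2\) comes from.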

\[\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{\displaystyle(O_{ij}-E_{ij})^2}{E_{ij}}\]

df = (r-1)(c-1), where r is the number of rows and c the number of columns


    Pearson's Chi-squared test

data:  ctab
X-squared = 69.756, df = 2, p-value = 7.124e-16
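Putting the pieces together, here is a sketch that recomputes this test for the bird-predation table. It assumes, as R's `chisq.test()` does for tables larger than 2 × 2, that no continuity correction is applied; the function name `pearson_chi_square` is hypothetical. Because df = 2 here, the upper tail of the \(\chi^2\) distribution reduces to \(e^{-x/2}\), so no statistics library is needed; for other DF, `scipy.stats.chi2.sf(stat, df)` would be the usual route.

```python
import math

observed = [
    [37, 10, 1],   # Eaten by Birds
    [9, 35, 49],   # Not Eaten by Birds
]

def pearson_chi_square(table):
    """Return (statistic, df) for Pearson's test of independence, no continuity correction."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            e = rt * ct / n  # expected count under independence
            stat += (table[i][j] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = pearson_chi_square(observed)
assert df == 2  # the closed form below is only valid for df = 2
p_value = math.exp(-stat / 2)
print(f"X-squared = {stat:.3f}, df = {df}, p-value = {p_value:.3e}")
# X-squared = 69.756, df = 2, p-value = 7.124e-16
```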

Because the test judges deviations from expectations against approximately normal error, it rests on a few assumptions:

If you violate assumptions: