Frequentist Tests of Statistical Models

1 / 70

Making P-Values with Models

  1. Linear Models

  2. Generalized Linear Models & Likelihood

  3. Mixed Models

2 / 70

Common Regression Test Statistics

  • Are my coefficients 0?
    • Null Hypothesis: Coefficients are 0
    • Test Statistic: T distribution (normal distribution modified for low sample size)
  • Does my model explain variability in the data?
    • Null Hypothesis: The ratio of variability from your predictors versus noise is 1
    • Test Statistic: F distribution (describes ratio of two variances)
3 / 70
4 / 70
5 / 70

T-Distributions are What You'd Expect When Sampling a Standard Normal Population with a Small Sample Size

  • t = mean/SE, DF = n-1
  • It assumes a normal population with mean of 0 and SD of 1

6 / 70

Assessing the Slope with a T-Test


$$t_b = \frac{b - \beta_0}{SE_b}$$

$$DF = n - 2$$

$$H_0: \beta_0 = 0$$, but we can test other hypotheses
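
A minimal sketch of this test in R, assuming a data frame puffer with predators and resemblance columns (the example used on the next slide):

puffer_lm <- lm(predators ~ resemblance, data = puffer)
summary(puffer_lm)$coefficients  # t = estimate / SE, df = n - 2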

7 / 70

Slope of Puffer Relationship (DF = n - 2 = 18 for Parameter Tests)

Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.925 1.506 1.278 0.218
resemblance 2.989 0.571 5.232 0.000


p is very small here so...
We reject the hypothesis of no slope for resemblance, but fail to reject it for the intercept.

8 / 70

So, what can we say in a null hypothesis testing framework?

  • We reject that there is no relationship between resemblance and predator visits in our experiment.

  • 60% of the variability in predator visits is associated with resemblance.

9 / 70

Does my model explain variability in the data?

\(H_0\): The model predicts no variation in the data.

\(H_A\): The model predicts variation in the data.

To evaluate these hypotheses, we need a measure of variation explained by the model versus error - the sums of squares!

This is an Analysis of Variance..... ANOVA!

$$SS_{Total} = SS_{Regression} + SS_{Error}$$

10 / 70

Sums of Squares of Error, Visually

11 / 70

Sums of Squares of Regression, Visually

Distance from \(\hat{y}\) to \(\bar{y}\)

12 / 70

Components of the Total Sums of Squares

$$SS_R = \sum (\hat{Y}_i - \bar{Y})^2$$, df = 1

$$SS_E = \sum (Y_i - \hat{Y}_i)^2$$, df = n - 2

To compare them, we need to correct for different DF. This is the Mean Square.

MS = SS/DF

e.g., \(MS_E = \frac{SS_E}{n-2}\)

13 / 70

The F Distribution and Ratios of Variances

$$F = \frac{MS_R}{MS_E}$$ with DF = 1, n - 2
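
In R, this F ratio comes from anova() on the fitted model; a sketch using the puffer_lm fit from above:

anova(puffer_lm)  # F = MS_regression / MS_error, df = 1 and n - 2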

14 / 70

F-Test and Pufferfish

Df Sum Sq Mean Sq F value Pr(>F)
resemblance 1 255.1532 255.153152 27.37094 5.64e-05
Residuals 18 167.7968 9.322047 NA NA



We reject the null hypothesis that resemblance does not explain variability in predator approaches

15 / 70

Testing the Coefficients

  • F-Tests evaluate whether elements of the model contribute to variability in the data
    • Are modeled predictors just noise?
    • What's the difference between a model with only an intercept and an intercept and slope?
  • T-tests evaluate whether coefficients are different from 0

  • Often, F and T agree - but not always

    • T can be more sensitive with multiple predictors
16 / 70

What About Models with Categorical Variables?

  • T-tests for Coefficients with Treatment Contrasts

  • F Tests for Variability Explained by Including Categorical Predictor

    • ANOVA
  • More T-Tests for Post-Hoc Evaluation

17 / 70

What Explains This Data? Same as Regression (because it's all a linear model)

18 / 70

Variability due to Model (between groups)

19 / 70

Variability due to Error (Within Groups)

20 / 70

F-Test to Compare



$$SS_{Total} = SS_{Model} + SS_{Error}$$

(Classic ANOVA: \(SS_{Total} = SS_{Between} + SS_{Within}\))

Yes, these are the same!

21 / 70

F-Test to Compare

$$SS_{Model} = \sum_{i}\sum_{j} (\bar{Y}_i - \bar{Y})^2$$, df = k - 1

$$SS_{Error} = \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_i)^2$$, df = n - k

To compare them, we need to correct for different DF. This is the Mean Square.

MS = SS/DF, e.g., \(MS_W = \frac{SS_W}{n-k}\)
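
A minimal sketch of the one-way case in R, assuming a hypothetical data frame dat with a response y and a factor group:

group_lm <- lm(y ~ group, data = dat)
anova(group_lm)  # F = MS_model / MS_error, df = k - 1 and n - k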

22 / 70

ANOVA

Df Sum Sq Mean Sq F value Pr(>F)
group 2 0.5402533 0.2701267 7.823136 0.0012943
Residuals 42 1.4502267 0.0345292 NA NA

We have strong confidence that we can reject the null hypothesis

23 / 70

This Works the Same for Multiple Categorical

Treatment \(H_0\): \(\mu_{i1} = \mu_{i2} = \mu_{i3} = \ldots\)

Block \(H_0\): \(\mu_{j1} = \mu_{j2} = \mu_{j3} = \ldots\)

i.e., the variance due to each treatment type is no different from noise

24 / 70

We Decompose Sums of Squares for Multiple Predictors

$$SS_{Total} = SS_A + SS_B + SS_{Error}$$

  • Factors are orthogonal and balanced, so the model SS can be split
    • F-Test using Mean Squares as Before
25 / 70

What About Unbalanced Data or Mixing in Continuous Predictors?

  • Let's Assume Y ~ A + B where A is categorical and B is continuous

  • F-Tests are really model comparisons

  • The SS for A is the Residual SS of Y ~ B minus the Residual SS of Y ~ A + B

    • This type of SS is called marginal SS or type II SS
  • Proceed as normal

  • This also works for interactions, where each interaction is tested against a model containing all additive and lower-level interaction components

    • e.g., the SS for AB is the RSS for A+B minus the RSS for A+B+AB
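
In R, marginal (type II) SS are available from car::Anova(); a sketch with hypothetical predictors A and B in a data frame dat:

library(car)

# each term is tested against the model holding all other terms
# at or below its level (type II / marginal SS)
mod <- lm(y ~ A * B, data = dat)
Anova(mod, type = "II")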
26 / 70

Warning for Working with SAS Product Users (e.g., JMP)

  • You will sometimes see Type III SS, which can be nonsensical
    • e.g., the SS for A is the RSS for B+AB minus the RSS for A+B+AB
  • Always question the default settings!
27 / 70

F-Tests tell you if you can reject the null that predictors do not explain anything

28 / 70

Post-Hoc Comparisons of Groups

  • Only compare groups if you reject a Null Hypothesis via F-Test
    • Otherwise, any results are spurious
    • This is why they are called post-hocs
  • We've been here before with SEs and CIs

  • In truth, we are using T-tests

  • BUT - we now correct p-values for Family-Wise Error (if at all)
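
A sketch of post-hoc pairwise comparisons with a family-wise correction, using the emmeans package on the hypothetical one-way group_lm fit from before:

library(emmeans)

# estimated marginal means and Tukey-adjusted pairwise t-tests
emmeans(group_lm, pairwise ~ group, adjust = "tukey")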

29 / 70

Making P-Values with Models

  1. Linear Models

  2. Generalized Linear Models & Likelihood

  3. Mixed Models

30 / 70

To Get to Testing, We Need to Understand Likelihood and Deviance


31 / 70

Likelihood: how well data support a given hypothesis.

Note: Each and every parameter choice IS a hypothesis

32 / 70

Likelihood Defined



$$L(H|D) = p(D|H)$$

where D is the data and H is the hypothesis (model), including both a data-generating process with some choice of parameters (often called \(\theta\)). The error-generating process is inherent in the choice of probability distribution used for calculation.

33 / 70

The Maximum Likelihood Estimate is the value at which p(D|θ) - our likelihood function - is highest.

To find it, we search across various values of θ

34 / 70

MLE for Multiple Data Points

Let's say this is our data:

[1] 3.37697212 3.30154837 1.90197683 1.86959410 0.20346568 3.72057350
[7] 3.93912102 2.77062225 4.75913135 3.11736679 2.14687718 3.90925918
[13] 4.19637296 2.62841610 2.87673977 4.80004312 4.70399588 -0.03876461
[19] 0.71102505 3.05830349

We know that the data comes from a normal population with a σ of 1.... but we want to get the MLE of the mean.

$$p(D|\theta) = \prod p(D_i|\theta)$$

$$= \prod \textrm{dnorm}(D_i, \mu, \sigma = 1)$$
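
A minimal sketch of this in R, assuming the 20 values above are stored in a vector x:

# log-likelihood at each candidate mean, with sigma fixed at 1
mu_grid <- seq(0, 5, length.out = 501)
ll <- sapply(mu_grid, function(m) sum(dnorm(x, mean = m, sd = 1, log = TRUE)))

mu_grid[which.max(ll)]  # the MLE of the mean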

35 / 70

Likelihood At Different Choices of Mean, Visually

36 / 70

The Likelihood Surface

MLE = 2.896

37 / 70

The Log-Likelihood Surface

We use the log-likelihood because it avoids the numerical underflow of multiplying many small probabilities, and it is approximately \(\chi^2\) distributed.

38 / 70

The χ2 Distribution

  • Distribution of sums of squares of k data points drawn from N(0,1)

  • k = Degrees of Freedom

  • Measures goodness of fit

  • A large probability density indicates a close match between observations and expectations (small squared differences)

39 / 70

The χ2 Distribution, Visually

40 / 70

Hey, Look, it's the Standard Error!

The 68% CI of a \(\chi^2_1\) distribution corresponds to a drop of 0.49 log-likelihood units from the maximum, so....

41 / 70

Hey, Look, it's the 95% CI!

The 95% CI of a \(\chi^2_1\) distribution corresponds to a drop of 1.92 log-likelihood units from the maximum, so....
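
Both cutoffs come straight from the \(\chi^2_1\) distribution; a quick check in R:

qchisq(0.68, df = 1) / 2  # ~0.49 drop in log-likelihood: 68% CI (~1 SE)
qchisq(0.95, df = 1) / 2  # ~1.92 drop in log-likelihood: 95% CI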

42 / 70

The Deviance: -2 * Log-Likelihood

  • Measure of fit. Smaller deviance = closer to perfect fit
  • We are minimizing now, just like minimizing sums of squares
  • Point deviance residuals have meaning
  • Point deviance of a linear regression = the squared residual (summing to the SSE)!
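
A quick check of the Gaussian case in R, using the puffer_lm fit from earlier (note that R's deviance() for lm returns the residual sum of squares, not \(-2\log L\)):

deviance(puffer_lm)                 # residual sum of squares
sum(residuals(puffer_lm)^2)         # identical
-2 * as.numeric(logLik(puffer_lm))  # -2 log-likelihood (a monotone function of the RSS)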

43 / 70

Putting MLE Into Practice with Pufferfish

  • Pufferfish are toxic/harmful to predators

  • Batesian mimics gain protection from predation - why?

  • Evolved response to appearance?

  • Researchers tested with mimics varying in toxic pufferfish resemblance

44 / 70

This is our fit relationship

45 / 70

Likelihood Function for Linear Regression




Will often see:

$$L(\theta|D) = \prod_{i=1}^{n} p(y_i \mid x_i;\ \beta_0, \beta_1, \sigma)$$
46 / 70

Likelihood Function for Linear Regression: What it Means



$$L(\theta|Data) = \prod_{i=1}^{n} N(Visits_i \mid \beta_0 + \beta_1 Resemblance_i,\ \sigma)$$

where \(\beta_0, \beta_1, \sigma\) are elements of \(\theta\)
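
A hand-rolled version of this likelihood function in R; a sketch assuming the puffer data frame:

# log-likelihood of the regression at one choice of theta
reg_loglik <- function(b0, b1, sigma) {
  mu <- b0 + b1 * puffer$resemblance
  sum(dnorm(puffer$predators, mean = mu, sd = sigma, log = TRUE))
}

reg_loglik(b0 = 1.9, b1 = 3, sigma = 3)  # evaluate one candidate theta

Grid-sampling this function over many values of \(\beta_0, \beta_1, \sigma\) gives the surface on the next slide.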

47 / 70

The Log-Likelihood Surface from Grid Sampling

48 / 70

Let's Look at Profiles to get CIs

49 / 70

Evaluate Coefficients

term estimate std.error statistic p.value
(Intercept) 1.925 1.506 1.278 0.218
resemblance 2.989 0.571 5.232 0.000


Test statistic is a Wald z-test, assuming a well-behaved quadratic confidence interval
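
A coefficient table in this format is what broom::tidy() returns on a fitted model; a sketch:

library(broom)

puffer_glm <- glm(predators ~ resemblance, data = puffer)
tidy(puffer_glm)  # term, estimate, std.error, statistic, p.value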

50 / 70

What about comparing models, or evaluating if variables (continuous or categorical) should be included?

51 / 70

Can Compare p(data | H) for alternate Parameter Values - and it could be 0!

Compare \(p(D|\theta_1)\) versus \(p(D|\theta_2)\)

52 / 70

Likelihood Ratios


$$G = \frac{L(H_1|D)}{L(H_2|D)}$$

  • G is the ratio of Maximum Likelihoods from each model

  • Used to compare goodness of fit of different models/hypotheses

  • Most often, θ = MLE versus θ = 0

  • \(-2\log(G)\) is \(\chi^2\) distributed

53 / 70

Likelihood Ratio Test

  • A new test statistic: \(D = -2\log(G)\)

  • \(= 2\big[\log L(H_2|D) - \log L(H_1|D)\big]\)

  • We then scale by dispersion parameter (e.g., variance, etc.)

  • It's χ2 distributed!

    • DF = Difference in # of Parameters
  • If \(H_1\) is the null model and D is large, we have support for our alternate model
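
A minimal sketch of the test by hand in R, using the puffer_glm fit from earlier and the intercept-only model from the next slide (for Gaussian models, the ANODEV on the following slides scales by the dispersion instead):

# D = 2 * [logLik(alternate) - logLik(null)], compared to chi-squared
D <- as.numeric(2 * (logLik(puffer_glm) - logLik(int_only)))
pchisq(D, df = 1, lower.tail = FALSE)  # df = difference in # of parameters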
54 / 70

Likelihood Ratio Test for Regression

  • We compare our slope + intercept to a model fit with only an intercept!

  • Note, models must have the SAME response variable

int_only <- glm(predators ~ 1, data = puffer)
  • We then use Analysis of Deviance (ANODEV)
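
A sketch of the comparison, with the full model fit as puffer_glm (this call produces the table on the next slide):

puffer_glm <- glm(predators ~ resemblance, data = puffer)

# analysis of deviance: reduced versus full model
anova(int_only, puffer_glm, test = "LRT")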
55 / 70

Our First ANODEV

Analysis of Deviance Table
Model 1: predators ~ 1
Model 2: predators ~ resemblance
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 19 422.95
2 18 167.80 1 255.15 1.679e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: the LRT statistic is the difference in deviance divided by the dispersion (here, the variance)

56 / 70

Or, R has Tools to Automate Doing This Piece by Piece

Analysis of Deviance Table (Type II tests)
Response: predators
LR Chisq Df Pr(>Chisq)
resemblance 27.371 1 1.679e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, the LR Chisq is again the difference in deviance divided by the dispersion (the variance)
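
A table like the one above comes from car::Anova() on the fitted model; a sketch:

library(car)

Anova(puffer_glm)  # type II likelihood-ratio chi-square tests for each term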

57 / 70

Making P-Values with Models

  1. Linear Models

  2. Generalized Linear Models & Likelihood

  3. Mixed Models

58 / 70

Let's take this to the beach with Tide Height: RIKZ

59 / 70

How is Tidal Height of Measurement Associated With Species Richness?

60 / 70

Before we go any further - keep up to date at

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-hypotheses

61 / 70

The Big Question: What are your degrees of freedom in the presence of a random effect?

62 / 70

Approaches to approximating DF

  • Satterthwaite approximation - Based on sample sizes and variances within groups

    • lmerTest (which is kinda broken at the moment)
  • Kenward-Roger approximation

    • Based on an estimate of the variance-covariance matrix of the fixed effects and a scaling factor
    • More conservative - in car::Anova() and pbkrtest (see the sketch below)
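
A sketch of both approaches in R, assuming the RIKZ data are in a data frame rikz with log_richness, NAP, and Beach (the model object name matches the RANOVA slide below):

library(lmerTest)  # loads lme4; adds Satterthwaite df to anova()

# random intercept of Beach; NAP as fixed effect
rikz_varint <- lmer(log_richness ~ NAP + (1 | Beach), data = rikz)

anova(rikz_varint)                             # Satterthwaite
car::Anova(rikz_varint, test.statistic = "F")  # Kenward-Roger (via pbkrtest)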
63 / 70

Compare!

Baseline - only for balanced LMMs!

Analysis of Variance Table
npar Sum Sq Mean Sq F value
NAP 1 10.467 10.467 63.356

Satterthwaite

Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
NAP 10.467 10.467 1 37.203 63.356 1.495e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kenward-Roger

Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
Response: log_richness
F Df Df.res Pr(>F)
NAP 62.154 1 37.203 1.877e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
64 / 70

The Smaller Questions and Answers

  1. With REML (for LMMs), we are often conservative in our estimation of fixed effects. Should we use it?
    • Use ML for FE tests if using Chisq
  2. Should we use Chi-Square?
    • For GLMMs, yes.
    • Can be unreliable in some scenarios.
    • Use F tests for LMMs
65 / 70

F and Chisq

F-test

# A tibble: 1 × 5
term statistic df Df.res p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 NAP 62.2 1 37.2 0.00000000188

LR Chisq

# A tibble: 1 × 4
term statistic df p.value
<chr> <dbl> <dbl> <dbl>
1 NAP 63.4 1 1.72e-15

LR Chisq where REML = FALSE

# A tibble: 1 × 4
term statistic df p.value
<chr> <dbl> <dbl> <dbl>
1 NAP 65.6 1 5.54e-16
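
Refitting with ML before the chi-square test is a one-liner; a sketch using the rikz_varint fit from before:

# refit with maximum likelihood (REML = FALSE) for LR chi-square tests
rikz_ml <- update(rikz_varint, REML = FALSE)
car::Anova(rikz_ml)  # LR Chisq on the ML fit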
66 / 70

What about Random Effects

  • For LMMs, make sure you have fit with REML = TRUE

  • One school of thought is to leave them in place

    • Your RE structure should be determined by sampling design
    • Do you know enough to change your RE structure?
  • But sometimes you need to test!
    • Can sequentially drop RE effects with lmerTest::ranova()
    • Can simulate models with 0 variance in RE with RLRsim, but it gets tricky
67 / 70

RANOVA

ranova(rikz_varint)
ANOVA-like table for random-effects: Single term deletions
Model:
log_richness ~ NAP + (1 | Beach)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 4 -32.588 73.175
(1 | Beach) 3 -38.230 82.460 11.285 1 0.0007815 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
68 / 70

Testing Mixed Models

  1. Yes you can!

  2. Some of the fun issues of denominator DF raise their heads

  3. Keep up to date on the literature/FAQ when you are using them!

69 / 70

70 / 70
