Evaluating Fit in Linear Models


http://www.smbc-comics.com/?id=2469/

Etherpad



https://etherpad.wikimedia.org/p/607-lm-2018

Outline

  1. Putting your model to the test

  2. Evaluating fit

  3. How you can get it wrong

  4. Power and Linear Regression

The Steps of Statistical Modeling

  1. What is your question?
  2. What model of the world matches your question?
  3. Build a test
  4. Evaluate test assumptions
  5. Evaluate test results
  6. Visualize

Our Puffer Example

  • Pufferfish are toxic/harmful to predators

  • Batesian mimics gain protection from predation - why?

  • Evolved response to appearance?

  • Researchers tested with mimics varying in toxic pufferfish resemblance

Question: Does Resembling a Pufferfish Reduce Predator Visits?

You have Fit a Valid Model. Now…

  1. Does your model explain variation in the data?

  2. Are your coefficients different from 0?

  3. How much variation is retained by the model?

  4. How confident can you be in model predictions?

Testing the Model

\(H_0\): The model predicts no variation in the data.
\(H_A\): The model predicts variation in the data.



To evaluate these hypotheses, we need a measure of the variation explained by the model versus the variation left in the error - the sums of squares!

\[SS_{Total} = SS_{Regression} + SS_{Error}\]

Sums of Squares of Error, Visually

Sums of Squares of Regression, Visually


Distance from \(\hat{y}\) to \(\bar{y}\)

Components of the Total Sums of Squares

\(SS_{R} = \sum(\hat{Y}_{i} - \bar{Y})^{2}\), df=1

\(SS_{E} = \sum(Y_{i} - \hat{Y}_{i})^2\), df=n-2

To compare them, we need to correct for their different degrees of freedom by dividing each SS by its DF. This is the Mean Square.

MS=SS/DF

e.g., \(MS_{E} = \frac{SS_{E}}{n-2}\)

The F Distribution and Ratios of Variances

\(F = \frac{MS_{R}}{MS_{E}}\) with DF=1,n-2

1-Tailed Test
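The test is one-tailed because only an excess of \(MS_{R}\) over \(MS_{E}\) counts as evidence against the null. As a sketch in R, the upper-tail p-value for the pufferfish F reported below:

pf(27.37094, df1 = 1, df2 = 18, lower.tail = FALSE)
#> 5.64e-05 - matches Pr(>F) in the table below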

F-Test and Pufferfish

             Df   Sum Sq    Mean Sq  F value   Pr(>F)
resemblance   1 255.1532 255.153152 27.37094 5.64e-05
Residuals    18 167.7968   9.322047       NA       NA


We reject the null hypothesis that resemblance does not explain variability in predator approaches.
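As a sketch, this table comes from lm() and anova() in R; the data frame name `puffer` and its column names are assumptions here, not the original code:

# Assumed data: `puffer` with columns `predators` (predator approaches)
# and `resemblance` (degree of resemblance to a toxic pufferfish)
puffer_lm <- lm(predators ~ resemblance, data = puffer)

# Partition the SS into regression and residual components
anova(puffer_lm)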

Testing the Coefficients

  • F-Tests evaluate whether elements of the model contribute to variability in the data
    • Are modeled predictors just noise?
    • What’s the difference between a model with only an intercept and a model with an intercept and slope?

  • T-tests evaluate whether coefficients are different from 0

  • Often, F and T agree - but not always
    • T can be more sensitive with multiple predictors

Error in the Slope Estimate


\(\Large SE_{b} = \sqrt{\frac{MS_{E}}{SS_{X}}}\)


95% CI = \(b \pm t_{\alpha,df}SE_{b}\)


(\(t \approx 1.96\) when N is large)

Assessing the Slope with a T-Test


\[\Large t_{b} = \frac{b - \beta_{0}}{SE_{b}}\]

DF=n-2



Typically \(H_0: \beta_{0} = 0\), but we can test other hypothesized values

Slope of Puffer Relationship

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept) 1.924694  1.5064163 1.277664 0.2176012
resemblance 2.989492  0.5714163 5.231724 0.0000564


We reject the hypothesis of no slope for resemblance, but fail to reject it for the intercept.
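A sketch of how this table and the slope CI come out of the fit (assuming the hypothetical `puffer_lm` from above):

# Estimates, SEs, t values, and p values for each coefficient
summary(puffer_lm)$coefficients

# 95% CIs: b plus or minus t * SE_b
confint(puffer_lm)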

The Steps of Statistical Modeling

  1. What is your question?
  2. What model of the world matches your question?
  3. Build a test
  4. Evaluate test assumptions
  5. Evaluate test results
  6. Visualize

Visualize Your Fit
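One way to draw it, as a sketch with ggplot2 (again assuming the hypothetical `puffer` data frame):

library(ggplot2)

ggplot(puffer, aes(x = resemblance, y = predators)) +
  geom_point() +
  stat_smooth(method = "lm")  # fit line with its 95% fit CI ribbon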

Outline

  1. Putting your model to the test

  2. Evaluating fit

  3. How you can get it wrong

  4. Power and Linear Regression

Evaluating Goodness of Fit


  • You can have a strong slope, reject the null in an F-test, and still have a meaningless result

  • How much variation does your model explain in the data?

  • What is the SD of your residuals relative to the predicted values?

Coefficient of Determination

\(R^2\) = the proportion of variation in Y explained by X.

\[R^2 = \frac{SS_{regression}}{SS_{total}}\]

\[= 1 - \frac{SS_{error}}{SS_{total}}\]

Coefficient of Determination



https://xkcd.com/1725/

How well does our line fit?


\(R^2\) = 0.603
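In R, this is the r.squared component of the model summary (hypothetical `puffer_lm` again):

summary(puffer_lm)$r.squared
#> 0.6032703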

Confidence Intervals Around Fit

Accommodates uncertainty (SE) in slope & intercept

Or, Fit Confidence Interval via Simulation using Coefficient SEs

  • Values close to the mean of X and Y are more certain.
  • Uncertainty increases at the edges.
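A sketch of that simulation approach, drawing plausible coefficient pairs from their estimated sampling distribution (assumes the hypothetical `puffer_lm`):

library(MASS)

set.seed(607)
# 1000 plausible (intercept, slope) pairs from the coefficient
# estimates and their covariance matrix
coef_sims <- mvrnorm(1000, mu = coef(puffer_lm), Sigma = vcov(puffer_lm))

# Fitted value at resemblance = 3 under each simulated pair
sim_fits <- coef_sims[, 1] + coef_sims[, 2] * 3

# Simulated 95% CI of the fit at resemblance = 3
quantile(sim_fits, probs = c(0.025, 0.975))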

Confidence Intervals Around Fit

\(\hat{y}\pm t_{n-2}s_{y}\sqrt{\frac{1}{n}+\frac{(x^{*}-\bar{x})^2}{(n-1)s_{x}^2}}\)


\(s_{y} = \sqrt{\frac{SS_{E}}{n-2}}\) 

  • Incorporates variability in residuals, distance from the center of the regression, and sample size

  • t value for desired CI of fit. Note: for a 95% CI, as n gets large, the multiplier converges to 1.96
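predict() applies this formula directly; a sketch with the hypothetical `puffer_lm`:

# Fit CI along a grid of resemblance values
new_x <- data.frame(resemblance = seq(1, 4, length.out = 100))
fit_ci <- predict(puffer_lm, newdata = new_x, interval = "confidence")
head(fit_ci)  # columns: fit, lwr, upr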
Confidence Intervals Around Prediction


  • Fit CIs show the results of imprecise estimation of parameters

  • Remember, our model has an \(\epsilon_i\) in it, though!

  • This means that there are other sources of variability that influence our response variable

  • If we want to make a new prediction, we need to incorporate the SD around the line - 3.0532028 in the puffer model

  • Note: Extrapolation beyond the range of the data is bad practice

Fit Intervals v. Prediction Intervals

  • Fit Intervals are about uncertainty in parameter estimates

  • Fit Intervals do not show uncertainty due to factors other than the predictor

  • Fit Intervals tell you what the world could look like if X=x and there are no other influences on Y

  • Prediction Intervals accommodate the full range of uncertainty

  • Prediction Intervals tell you what the world could look like if X=x

Extrapolation



https://xkcd.com/605/

Confidence Intervals Around Prediction of a New Value

\(\hat{y}\pm t_{n-2}s_{y}\sqrt{1+\frac{1}{n}+\frac{(x^{*}-\bar{x})^2}{(n-1)s_{x}^2}}\)



Gives the interval in which a new observed value of y at \(x^{*}\) should lie.
Note the leading 1: even at large n, the interval converges to \(\hat{y} \pm t s_{y}\), not to zero width - prediction error never shrinks below the residual SD.

Prediction Confidence Interval
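In R, the same predict() call with interval = "prediction" adds the residual variance, so the interval is wider than the fit CI at every x (a sketch, assuming `puffer_lm` and `new_x` from above):

pred_ci <- predict(puffer_lm, newdata = new_x, interval = "prediction")
head(pred_ci)  # always wider than fit_ci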

Outline

  1. Putting your model to the test

  2. Evaluating fit

  3. How you can get it wrong

  4. Power and Linear Regression

The “Obese N”

A high sample size can yield a low p-value even when the underlying effect is trivially small
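A sketch of the problem with simulated (hypothetical) numbers - a slope near zero still rejects when n is enormous:

set.seed(607)
n <- 1e5
x <- runif(n, 0, 10)
y <- 0.01 * x + rnorm(n, sd = 3)  # trivially small effect, noisy data

# The slope's p-value is nonetheless "significant"
summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]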

Sample Size and \(R^2\)

High sample size can decrease \(R^2\) if residual SD is high relative to slope

Little Variation in a Predictor = Low Power

But is this all there is to X?

But is that all there is?

How Should this Population be Sampled?

Same N, Higher Variation in X = More Power
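A sketch of why: \(SE_{b} = \sqrt{MS_{E}/SS_{X}}\), so spreading X over a wider range inflates \(SS_{X}\) and shrinks the SE of the slope (hypothetical numbers):

set.seed(607)
n <- 20
slope_se <- function(x) {
  y <- 3 * x + rnorm(n, sd = 3)
  summary(lm(y ~ x))$coefficients["x", "Std. Error"]
}

slope_se(runif(n, 0, 1))   # narrow range of X: large SE, low power
slope_se(runif(n, 0, 10))  # wide range of X: small SE, high power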

Outline

  1. Putting your model to the test

  2. Evaluating fit

  3. How you can get it wrong

  4. Power and Linear Regression

What Influences Power of a Regression?

  1. Sample Size

  2. Effect Size (slope)

  3. Residual Variability

  4. Range of X Values

Power Analysis of Regression

  • Yes, we can do this with equations

  • But, come on, it’s better with simulation!

Relevant Info for Pufferfish

term        estimate std.error statistic   p.value
(Intercept) 1.924694 1.5064163  1.277664 0.2176012
resemblance 2.989492 0.5714163  5.231724 0.0000564

r.squared    sigma df.residual
0.6032703 3.053203          18


(DF Residual = n-2)

Let’s try different effect sizes: 0.5 to 5.5

Let’s try different sigmas: 1 to 10

Let’s try different sample sizes per treatment: 2 to 10
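Putting that grid together, a sketch of simulation-based power analysis (the intercept, slope, and sigma values follow the puffer fit above; simulating mimics at resemblance levels 1-4 is an assumption):

# Power = fraction of simulated fits that reject H0: slope = 0
sim_power <- function(slope, sigma, n_per, nsims = 1000, alpha = 0.05) {
  p_vals <- replicate(nsims, {
    x <- rep(1:4, times = n_per)  # n_per mimics per resemblance level
    y <- 1.925 + slope * x + rnorm(length(x), sd = sigma)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  mean(p_vals < alpha)
}

# e.g., power at the observed slope and sigma with 4 mimics per level
sim_power(slope = 2.99, sigma = 3.05, n_per = 4)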