
Evaluating Fit Linear Models


1 / 72

Etherpad



https://etherpad.wikimedia.org/p/607-lm-2020

2 / 72

Putting Linear Regression Into Practice with Pufferfish

  • Pufferfish are toxic/harmful to predators

  • Batesian mimics gain protection from predation - why?

  • Evolved response to appearance?

  • Researchers tested with mimics varying in toxic pufferfish resemblance

3 / 72

Question of the day: Does Resembling a Pufferfish Reduce Predator Visits?

4 / 72

Digging Deeper into Regression

  1. Assumptions: Is our fit valid?

  2. How did we fit this model?

  3. How do we draw inference from this model?

5 / 72

You are now a Statistical Wizard. Be Careful. Your Model is a Golem.

(sensu Richard McElreath)

6 / 72

A Case of "Great" versus "Not as Great" Fits...

7 / 72

The Two Fits





8 / 72

Assumptions (in rough descending order of importance)

  1. Validity

  2. Representativeness

  3. Model captures features in the data

  4. Additivity and Linearity

  5. Independence of Errors

  6. Equal Variance of Errors

  7. Normality of Errors

  8. Minimal Outlier Influence

9 / 72

Validity: Do X and Y Reflect Concepts I'm Interested In?

What if predator approaches is not a good measure of recognition? Or mimics just don't look like fish?

10 / 72

Solution to lack of validity:

Reframe your question! Change your framing! Question your life choices!

11 / 72

Representativeness: Does Your Data Represent the Population?

For example, say this is your result...

12 / 72

But is that all there is to X in nature?

13 / 72

Representativeness: Does Your Data Represent the Population?

What if you are looking at only a piece of the variation in X in your population?

14 / 72

Representativeness: Does Your Data Represent the Population?

How should you have sampled this population for a representative result?

15 / 72

Representativeness: Does Your Data Represent the Population?

It's better to have more variation in X than just a bigger N
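One way to see why spread in X can matter more than raw sample size: the standard error of a least squares slope is sqrt(MSE / SS_X), so it shrinks as the sum of squares of X grows. A minimal Python sketch, assuming a fixed residual variance and two made-up sampling designs (every number here is hypothetical):

```python
import math

def slope_se(x, mse):
    """Standard error of a least squares slope: sqrt(MSE / SS_X)."""
    x_bar = sum(x) / len(x)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(mse / ss_x)

mse = 9.0  # hypothetical residual variance, held constant across designs

# Design 1: many samples, but only a narrow slice of X (values 2.0 to 2.9)
narrow_big_n = [2.0 + 0.1 * (i % 10) for i in range(100)]

# Design 2: few samples spread over a wider range of X (values 1 to 4)
wide_small_n = [1.0, 2.0, 3.0, 4.0] * 5

se_narrow = slope_se(narrow_big_n, mse)
se_wide = slope_se(wide_small_n, mse)
print(f"narrow X, n=100: SE = {se_narrow:.2f}")
print(f"wide X,   n=20:  SE = {se_wide:.2f}")
```

In this sketch the 20-point design that spans more of X gives the smaller standard error, even though the narrow design has five times as many points.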

16 / 72

Representativeness: Does Your Data Represent the Population?

  • Always question if you did a good job sampling

  • Use natural history and the literature to get the bounds of values

  • If experimenting, make sure your treatment levels are representative

  • If you realize post-hoc they are not, qualify your conclusions

17 / 72

Model captures features in the data

Does the model seem to fit the data? Are there any deviations? Can be hard to see...

18 / 72

Simulating implications from the model to see if we match features in the data

Is anything off?
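A rough Python sketch of this kind of check, assuming the pufferfish coefficients and residual mean square reported in this deck, and a hypothetical design of 5 replicates at each of 4 resemblance levels. It simulates many data sets from the fitted model and asks how often the model implies something the real data cannot show (a negative number of predator visits):

```python
import random

random.seed(607)  # arbitrary seed so the sketch is repeatable

# Assumed fitted values: rounded puffer coefficients, sigma = sqrt(residual mean square)
intercept, slope, sigma = 1.92, 2.99, 3.05

# Hypothetical design: 5 replicates at each of 4 resemblance levels
resemblance = [1, 2, 3, 4] * 5

def simulate_once():
    """Draw one fake data set of predator visits from the fitted model."""
    return [random.gauss(intercept + slope * x, sigma) for x in resemblance]

# Simulate many data sets and record one feature to compare with the real data:
# the smallest simulated response in each data set
sim_minima = [min(simulate_once()) for _ in range(1000)]
share_negative = sum(m < 0 for m in sim_minima) / len(sim_minima)
print(f"Share of simulated data sets with a negative visit count: {share_negative:.2f}")
```

If a noticeable share of simulated data sets contain impossible values, that is a hint the normal error model misses a feature of the data, even when the fitted line looks fine.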

19 / 72

But what do wolves say to you?

20 / 72

Additivity and Linearity: The model should account for all systematic variation, leaving no pattern between residuals and fitted values - this is what you want

21 / 72

Additivity and Linearity: Wolf Problems?

Solutions: Nonlinear transformations or a better model!

22 / 72

Independence of Errors

  • Are all replicates TRULY independent?

  • Did they come from the same space, time, etc.?

  • Non-independence can introduce BIAS

    • SEs too small (at the least)
    • Causal inference invalid
  • Incorporate non-independence into models (many methods)

23 / 72

Equal Variance of Errors: No Pattern to Residuals and Fitted Values

24 / 72

Equal Variance of Errors: What is up with intermediate Wolf Values?

25 / 72

Equal Variance of Errors: Problems and Solutions

  • Shapes (cones, footballs, etc.) with no bias in fitted v. residual relationship

  • A linear relationship indicates an additivity problem

  • Can solve with a better model (more predictors)

  • Can solve with weighting by X values, if source of heteroskedasticity known

    • This actually means we model the variance as a function of X
    • $\epsilon_i \sim N(0, f(x_i))$
  • Minor problem for coefficient estimates

  • Major problem for doing inference and prediction as it changes error
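The weighting idea can be sketched as a weighted least squares fit for a single predictor. This is a minimal Python illustration, not the method used for the models in this deck: the data are made up, and the variance function f(x) = x^2 (so weights of 1 / x^2) is an assumption chosen only for the example.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x, minimizing sum(w * residual^2)."""
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data where scatter grows with x
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.8, 7.6, 8.1, 12.9, 11.2]

# Assumed variance function f(x) = x^2, so each weight is 1 / x^2
weights = [1 / xi ** 2 for xi in x]
intercept, slope = weighted_least_squares(x, y, weights)
print(intercept, slope)
```

Points with larger assumed variance get smaller weights, so the noisy high-x observations pull on the line less than they would in an unweighted fit.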

26 / 72

Normality of errors: Did we fit the error generating process that we observed?

  • We assumed $\epsilon_i \sim N(0, \sigma)$ - but is that right?

  • Can assess with a QQ-plot

    • Do quantiles of the residuals match quantiles of a normal distribution?
  • Again, minor problem for coefficient estimates

  • Major problem for doing inference and prediction, as it changes error
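The quantile comparison behind a QQ-plot can be sketched in a few lines of Python. The residuals below are made up for illustration, and the plotting positions (i - 0.5) / n are one common convention among several:

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each sorted residual with the matching normal quantile."""
    n = len(residuals)
    fitted_normal = NormalDist(mean(residuals), stdev(residuals))
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [fitted_normal.inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical residuals, made up for illustration
residuals = [-3.1, -1.8, -1.2, -0.6, -0.1, 0.3, 0.8, 1.1, 1.9, 2.7]
for theo, obs in qq_points(residuals):
    print(f"{theo:6.2f}  {obs:6.2f}")
```

If the errors are close to normal, the two columns track each other and the plotted points fall near a straight line; systematic departures at the ends suggest tails heavier or lighter than normal.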

27 / 72

Normality of Errors: Puffers

28 / 72

Normality of Errors: Wolves Underpredict at High Levels

Shapiro-Wilk normality test
data: residuals(wolf_mod)
W = 0.9067, p-value = 0.02992
29 / 72

Outliers: Cook's D

Want no values > 1
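For a regression with one predictor, Cook's D can be computed directly from the residuals and leverages. A minimal Python sketch on made-up data (not the wolf or puffer data), using the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2 with p = 2 coefficients:

```python
def cooks_distance(x, y):
    """Cook's D for each point in a simple linear regression (2 coefficients)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # number of estimated coefficients
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage of each point: how far its x value sits from the mean of x
    lev = [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]
    return [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2) for e, h in zip(resid, lev)]

# Hypothetical data: the last point sits far from the others in x and off the trend
x = [1, 2, 3, 4, 5, 6, 12]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 10.0]
d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 1]
print(flagged)
```

A point needs both a sizeable residual and high leverage to get a large D, which is why the isolated last point is the only one flagged here.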

30 / 72

Outliers: Cook's D - wolves OK

31 / 72

Everyone worries about outliers, but...

  • Are they real?

  • Do they indicate a problem or a nonlinearity?

  • Remove only as a dead last resort

  • If from a nonlinearity, consider transformation

32 / 72

Assumptions (in rough descending order of importance)

  1. Validity: only you know!

  2. Representativeness: look at nature

  3. Model captures features in the data: compare model v. data!

  4. Additivity and Linearity: compare model v. data!

  5. Independence of Errors: consider sampling design

  6. Equal Variance of Errors: evaluate residuals v. fitted values

  7. Normality of Errors: evaluate QQ-plot and Shapiro-Wilk test

  8. Minimal Outlier Influence: evaluate Cook's D

33 / 72

Digging Deeper into Regression

  1. Assumptions: Is our fit valid?

  2. How did we fit this model?

  3. How do we draw inference from this model?

34 / 72

So, uh.... How would you fit a line here?

35 / 72

Lots of Possible Lines - How would you decide?

36 / 72

Method of Model Fitting

  1. Least Squares

    • Conceptually Simple
    • Minimizes the sum of squared distances between the fit and the data (the residuals)
    • Approximations of quantities based on frequentist logic
  2. Likelihood

    • Flexible to many error distributions and other problems
    • Produces likelihood surface of different parameter values
    • Equivalent to least squares for Gaussian likelihood
    • Approximations of quantities based on frequentist logic
  3. Bayesian

    • Incorporates prior knowledge
    • Posterior probability for any parameter value is proportional to likelihood * prior
    • Superior for quantifying uncertainty
    • With "flat" priors, equivalent to least squares/likelihood
    • Analytic or simulated calculation of quantities
37 / 72

Basic Principles of Least Squares Regression

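$Y = \beta_0 + \beta_1 X + \epsilon$, with fitted values $\hat{Y} = \beta_0 + \beta_1 X$, where $\beta_0$ = intercept, $\beta_1$ = slope

Minimize residuals, defined as $SS_{residuals} = \sum (Y_i - \hat{Y}_i)^2$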
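The quantity being minimized can be written as a small function. A Python sketch with made-up data and two arbitrary candidate lines, just to show how least squares ranks them:

```python
def ss_residuals(x, y, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 5.8, 8.3, 9.9]

# Two candidate lines: least squares prefers the one with the smaller sum
ss_a = ss_residuals(x, y, intercept=0.0, slope=2.0)
ss_b = ss_residuals(x, y, intercept=1.0, slope=1.5)
print(ss_a, ss_b)
```

The first candidate has the smaller sum of squares, so it is the better fit of the two by this criterion; the least squares line is the one that beats every other candidate.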

38 / 72

Let's try it out!

39 / 72

Analytic Solution: Solving for Slope



$b = \frac{s_{xy}}{s_x^2} = \frac{cov(x,y)}{var(x)}$

$= r_{xy}\frac{s_y}{s_x}$

40 / 72

Analytic Solution: Solving for Intercept



Least squares regression line always goes through the mean of X and Y

$\bar{Y} = \beta_0 + \beta_1 \bar{X}$



$\beta_0 = \bar{Y} - \beta_1 \bar{X}$
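The analytic solution can be checked numerically. A short Python sketch on made-up data, computing the slope both as cov(x, y) / var(x) and as r * s_y / s_x, then the intercept from the means:

```python
import math

# Hypothetical data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 5.8, 8.3, 9.9]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Slope two ways: cov(x, y) / var(x), and r * s_y / s_x
slope = cov_xy / var_x
r_xy = cov_xy / math.sqrt(var_x * var_y)
slope_from_r = r_xy * math.sqrt(var_y) / math.sqrt(var_x)

# Intercept from the fact that the line passes through (x_bar, y_bar)
intercept = y_bar - slope * x_bar
print(slope, slope_from_r, intercept)
```

Both routes give the same slope, and plugging x_bar into the fitted line returns y_bar, which is the "goes through the mean of X and Y" property.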

41 / 72

Digging Deeper into Regression

  1. Assumptions: Is our fit valid?

  2. How did we fit this model?

  3. How do we draw inference from this model?

42 / 72

Inductive v. Deductive Reasoning



Deductive Inference: A larger theory is used to devise many small tests.

Inductive Inference: Small pieces of evidence are used to shape a larger theory and degree of belief.

43 / 72

Applying Different Styles of Inference

  • Null Hypothesis Testing: What's the probability that things are not influencing our data?

    • Deductive
  • Cross-Validation: How good are you at predicting new data?

    • Deductive
  • Model Comparison: Comparison of alternate hypotheses

    • Deductive or Inductive
  • Probabilistic Inference: What's our degree of belief in a hypothesis, given the data?

    • Inductive
44 / 72

Null Hypothesis Testing is a Form of Deductive Inference

Falsification of hypotheses is key!

A theory should be considered scientific if, and only if, it is falsifiable.

Look at a whole research program and falsify auxiliary hypotheses

45 / 72

A Bigger View of Deductive Inference

https://plato.stanford.edu/entries/lakatos/#ImprPoppScie

46 / 72

Reifying Refutation - What is the probability something is false?

What if our hypothesis was that the resemblance-predator relationship was 2:1? We know the SE of our estimate is 0.57, so we have a distribution of what we could observe if that hypothesis were true.

47 / 72

Reifying Refutation - What is the probability something is false?

BUT - our estimated slope is 3.

48 / 72

To falsify the 2:1 hypothesis, we need to know the probability of observing 3, or something GREATER than 3.

We want to know if we did this experiment again and again, what's the probability of observing what we saw or worse (frequentist!)

Probability = 0.04
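This probability can be reproduced with a short Python sketch. It assumes the sampling distribution under the 2:1 hypothesis is normal with mean 2 and SD equal to the SE of 0.57; a t distribution would have somewhat heavier tails and give a slightly larger value.

```python
from statistics import NormalDist

null_slope = 2.0      # the hypothesized 2:1 relationship
se = 0.57             # standard error of the slope estimate
observed_slope = 3.0  # estimated slope

# Sampling distribution of the slope estimate if the 2:1 hypothesis were true,
# approximated here as normal (an assumption of this sketch)
null_dist = NormalDist(mu=null_slope, sigma=se)

# Probability of an estimate of 3 or greater under that distribution
p_greater = 1 - null_dist.cdf(observed_slope)
print(round(p_greater, 2))
```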

49 / 72

Null hypothesis testing is asking what is the probability of our observation or more extreme observation given that some null expectation is true.

(it is NOT the probability of any particular alternate hypothesis being true)

50 / 72

R.A. Fisher and The P-Value For Null Hypotheses

P-value: The Probability of making an observation or more extreme observation given that the null hypothesis is true.

51 / 72

Applying Fisher: Evaluation of a Test Statistic

We use our data to calculate a test statistic that maps to a value of the null distribution.

We can then calculate the probability of observing our data, or of observing data even more extreme, given that the null hypothesis is true.

$P(X \geq Data \mid H_0)$

52 / 72

Problems with P

  • Most people don't understand it.

    • See the American Statistical Association's recent statements
  • Like SE, it gets smaller with sample size!

  • Neyman-Pearson Null Hypothesis Significance Testing

    • For industrial quality control, NHST was introduced to establish cutoffs of reasonable p, called an α
    • This corresponds to confidence intervals: $1 - \alpha$ = CI of interest
    • This has become weaponized so that α = 0.05 has become a norm, and often determines if something is worthy of being published
    • Chilling effect on science
  • We don't know how to talk about it

53 / 72

How do you talk about results from a p-value?

  • Based on your experimental design, what is a reasonable range of p-values to expect if the null is false?

  • Smaller p values indicate stronger support for rejection, larger ones weaker. Use that language.

  • Accumulate multiple lines of evidence so that the entire edifice of your research does not rest on a single p-value!!!!

54 / 72

For example, what does p = 0.061 mean?

  • There is a 6.1% chance of obtaining the observed data or more extreme data given that the null hypothesis is true.

  • If the null hypothesis is true, data this extreme or more extreme would turn up ~ 1 time in 16; rejecting the null at this threshold means accepting that false-rejection rate

  • Are you comfortable with that?

  • OR - What other evidence would you need to make you more or less comfortable?

55 / 72

Common Regression Test Statistics

  • Does my model explain variability in the data?

    • Null Hypothesis: The ratio of variability from your predictors versus noise is 1
    • Test Statistic: F distribution (describes ratio of two variances)
  • Are my coefficients not 0?

    • Null Hypothesis: Coefficients are 0
    • Test Statistic: T distribution (normal distribution modified for low sample size)
56 / 72

Does my model explain variability in the data?

Ho = The model predicts no variation in the data.

Ha = The model predicts variation in the data.

To evaluate these hypotheses, we need to have a measure of variation explained by the model versus error - the sums of squares! $SS_{Total} = SS_{Regression} + SS_{Error}$

57 / 72

Sums of Squares of Error, Visually

58 / 72

Sums of Squares of Regression, Visually

Distance from $\hat{y}$ to $\bar{y}$

60 / 72

Components of the Total Sums of Squares

$SSR = \sum (\hat{Y}_i - \bar{Y})^2$, df=1

$SSE = \sum (Y_i - \hat{Y}_i)^2$, df=n-2

To compare them, we need to correct for different DF. This is the Mean Square.

$MS = SS/DF$

e.g., $MSE = \frac{SSE}{n-2}$

61 / 72

The F Distribution and Ratios of Variances

$F = \frac{MSR}{MSE}$ with DF=1,n-2

62 / 72

F-Test and Pufferfish

            Df   Sum Sq    Mean Sq  F value   Pr(>F)
resemblance  1 255.1532 255.153152 27.37094 5.64e-05
Residuals   18 167.7968   9.322047       NA       NA



We reject the null hypothesis that resemblance does not explain variability in predator approaches
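The F value in the table follows directly from the sums of squares and their degrees of freedom. A short Python sketch that redoes the arithmetic from the table values (the p-value is not recomputed here, since the standard library has no F distribution):

```python
# Sums of squares and degrees of freedom from the F-test table
ss_regression, df_regression = 255.1532, 1
ss_error, df_error = 167.7968, 18

# Mean squares correct each sum of squares for its degrees of freedom
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F is the ratio of the two mean squares
f_value = ms_regression / ms_error
print(round(f_value, 2))
```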

63 / 72

Testing the Coefficients

  • F-Tests evaluate whether elements of the model contribute to variability in the data
    • Are modeled predictors just noise?
    • What's the difference between a model with only an intercept and an intercept and slope?
  • T-tests evaluate whether coefficients are different from 0

  • Often, F and T agree - but not always

    • T can be more sensitive with multiple predictors
64 / 72

xkcd

65 / 72

T-Distributions are What You'd Expect Sampling a Standard Normal Population with a Small Sample Size

  • t = mean/SE, DF = n-1
  • It assumes a normal population with mean of 0 and SD of 1
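A small simulation makes the point about small samples concrete. This Python sketch repeatedly draws n = 5 values from a standard normal population, computes t = mean/SE each time, and checks how often |t| exceeds 1.96; the seed, sample size, and number of draws are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, stdev

random.seed(42)  # arbitrary seed so the sketch is repeatable

def t_from_sample(n):
    """Sample n values from a standard normal and return mean / SE."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    return mean(sample) / (stdev(sample) / math.sqrt(n))

n = 5  # small sample, so DF = n - 1 = 4
t_values = [t_from_sample(n) for _ in range(10000)]

# Under a normal distribution, about 5% of values fall beyond +/- 1.96
share_extreme = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(f"Share of |t| > 1.96: {share_extreme:.3f}")
```

The share comes out well above 5%, because estimating the SD from only a few points fattens the tails relative to a normal distribution.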

68 / 72

Error in the Slope Estimate


$SE_b = \sqrt{\frac{MSE}{SS_X}}$

95% CI = $b \pm t_{\alpha, df} SE_b$

($t_{\alpha, df}$ ~ 1.96 when N is large)

69 / 72

Assessing the Slope with a T-Test


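$t_b = \frac{b - \beta_0}{SE_b}$

DF=n-2

$H_0: \beta_0 = 0$, but we can test other hypotheses (here $\beta_0$ is the slope value under the null, not the intercept)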

70 / 72

Slope of Puffer Relationship (DF = 1 for Parameter Tests)

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept) 1.924694  1.5064163 1.277664 0.2176012
resemblance 2.989492  0.5714163 5.231724 0.0000564


We reject the hypothesis that the slope for resemblance is 0, but fail to reject the hypothesis that the intercept is 0.
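The t value and a 95% confidence interval for the slope can be rebuilt from the estimate and its standard error. A Python sketch using the table values; the critical value of about 2.101 for 18 DF is taken from a standard t table rather than computed, since the standard library has no t distribution.

```python
estimate = 2.989492    # slope for resemblance, from the coefficient table
std_error = 0.5714163  # its standard error

# t statistic for H0: slope = 0
t_value = (estimate - 0) / std_error

# With a single predictor, t squared equals the F value from the F-test
t_squared = t_value ** 2

# 95% CI using the two-tailed critical t for 18 DF (about 2.101, from a t table)
t_crit = 2.101
ci_low = estimate - t_crit * std_error
ci_high = estimate + t_crit * std_error
print(round(t_value, 3), round(t_squared, 2), round(ci_low, 2), round(ci_high, 2))
```

The interval excludes 0, which matches the decision to reject a slope of 0, and t squared reproduces the F value of about 27.37.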

71 / 72

So, what can we say?

  • We reject that there is no relationship between resemblance and predator visits in our experiment.
  • About 60% of the variability in predator visits ($R^2$ = 0.6) is associated with resemblance.
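The 0.6 figure is the share of the total sum of squares attributed to the regression. A brief Python check using the sums of squares from the F-test table:

```python
ss_regression = 255.1532  # Sum Sq for resemblance in the F-test table
ss_error = 167.7968       # Sum Sq for residuals

# Total variability is the sum of the two components
ss_total = ss_regression + ss_error
r_squared = ss_regression / ss_total
print(round(r_squared, 2))
```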

72 / 72
