
Linear Regression


1 / 48

The Steps of Statistical Modeling

  1. What is your question?

  2. What model of the world matches your question?

  3. Is your model valid?

  4. Query your model to answer your question.

2 / 48

Our question of the day: What is the relationship between inbreeding coefficient and litter size in wolves?



3 / 48

Roll that beautiful linear regression with 95% CI footage

4 / 48

Regression to Be Mean

  1. What is regression?

  2. What do regression coefficients mean?

  3. What do the error coefficients of a regression mean?

  4. Correlation and Regression

  5. Transformation and Model Structure for More Sensible Coefficients

5 / 48

What is a regression?

y = a + bx + error

This is 90% of the modeling you will ever do because...

Everything is a linear model!

  • multiple predictors (x1, x2, etc.)

  • nonlinear transformations of y or x

  • multiplicative (interaction) terms (b x1 x2), which still enter the model additively

  • generalized linear models with non-normal error

  • and so much more...

6 / 48

EVERYTHING IS A LINEAR MODEL

7 / 48

Linear Regression


$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

$\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma)$
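To make the model concrete, here is a minimal Python sketch (standard library only) that simulates data from it. The parameter values and the grid of x values are made up for illustration and are not estimates from any data in these slides. Note that `random.gauss` takes the standard deviation, matching the $\mathcal{N}(0, \sigma)$ notation above.

```python
import random
import statistics

def simulate_linear(beta0, beta1, sigma, xs, seed=1):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma), i.i.d."""
    rng = random.Random(seed)  # seeded generator so the draws are reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameters, chosen only for illustration
beta0, beta1, sigma = 2.0, 3.0, 1.0
xs = [i / 1000 for i in range(10_000)]
ys = simulate_linear(beta0, beta1, sigma, xs)

# Removing the deterministic part leaves the simulated errors,
# whose mean should be near 0 and whose SD should be near sigma
errors = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
print(round(statistics.mean(errors), 3), round(statistics.stdev(errors), 3))
```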



Then it’s code in the data, give the keyboard a punch
Then cross-correlate and break for some lunch
Correlate, tabulate, process and screen
Program, printout, regress to the mean

- White Collar Holler, by Nigel Russell

8 / 48

Regressions You Have Seen

Classic style:

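$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$


Prediction as Part of Error:

$\hat{y}_i = \beta_0 + \beta_1 x_i \qquad y_i \sim \mathcal{N}(\hat{y}_i, \sigma)$


Matrix Style: $Y = X\beta + \epsilon$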
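As a sketch of the matrix form: the least-squares estimate solves the normal equations $(X^\top X)\hat{\beta} = X^\top y$. With a single predictor, $X$ has a column of 1s (intercept) and a column of $x$ values, so the system is 2-by-2 and can be solved by hand. The toy data below are made up so the answer is known in advance ($y = 2 + 3x$ exactly).

```python
def fit_simple_ols(x, y):
    """Solve (X'X) beta = X'y for X = [1, x] using Cramer's rule."""
    n = len(x)
    # Entries of X'X
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # Entries of X'y
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X; zero if all x are equal
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Made-up toy data lying exactly on y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```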

9 / 48

These All Are Equation-Forms of This Relationship

10 / 48

Regression to Be Mean

  1. What is regression?

  2. What do regression coefficients mean?

  3. What do the error coefficients of a regression mean?

  4. Correlation and Regression

  5. Transformation and Model Structure for More Sensible Coefficients

11 / 48

What are we doing with regression?

Goals:

  1. Association

    • What is the strength of the relationship between two quantities?
    • Not causal
  2. Prediction

    • If we have two groups that differ in their X value by 1 unit, what is the average difference in their Y value?
    • Not causal
  3. Counterfactual

    • What would happen to an individual if their value of X increased by one unit?
    • Causal reasoning!
12 / 48

What Can We Say About This?

13 / 48

Model Coefficients: Slope

term                    estimate  std.error
(Intercept)                6.567      0.791
inbreeding.coefficient   -11.447      3.189

  1. Association: A one-unit increase in inbreeding coefficient is associated with ~11 fewer pups, on average.

  2. Prediction: A new wolf with an inbreeding coefficient 1 unit greater than a second new wolf will have ~11 fewer pups, on average.

  3. Counterfactual: If an individual wolf had had its inbreeding coefficient 1 unit higher, it would have ~11 fewer pups.

14 / 48

Which of these is the correct thing to say? When?

  1. Association: A one-unit increase in inbreeding coefficient is associated with ~11 fewer pups, on average.

  2. Prediction: A new wolf with an inbreeding coefficient 1 unit greater than a second new wolf will have ~11 fewer pups, on average.

  3. Counterfactual: If an individual wolf had had its inbreeding coefficient 1 unit higher, it would have ~11 fewer pups.

15 / 48

11 Fewer Pups? What would be, then, a Better Way to Talk About this Slope?
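One possible answer, sketched here: inbreeding coefficients lie between 0 and 1, so a 1-unit change spans the entire possible range and is not a comparison anyone would make between two wolves. Rescaling the slope to a smaller step gives a more sensible number; the step of 0.1 below is an arbitrary illustrative choice, not one taken from the study.

```python
# Slope estimate from the model table (pups per 1-unit change in inbreeding coefficient)
slope_per_unit = -11.447

def rescale_slope(slope, step):
    """Expected change in Y for a change of `step` units in X (linear model)."""
    return slope * step

# 0.1 is an illustrative step size
slope_per_tenth = rescale_slope(slope_per_unit, 0.1)
print(f"{slope_per_tenth:.2f} pups per 0.1 increase in inbreeding coefficient")
```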

16 / 48

Model Coefficients: Intercept

term                    estimate  std.error
(Intercept)                6.567      0.791
inbreeding.coefficient   -11.447      3.189



When the inbreeding coefficient is 0, wolves will have ~6.6 pups, on average.
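A quick way to check this reading is to plug values into the fitted line $\hat{y} = 6.567 - 11.447x$, using the estimates from the table above. At $x = 0$ the prediction is exactly the intercept; the second input value (0.2) is arbitrary and chosen only for illustration.

```python
b0, b1 = 6.567, -11.447  # intercept and slope from the model table

def predicted_pups(inbreeding_coefficient):
    """Average litter size predicted by the fitted line."""
    return b0 + b1 * inbreeding_coefficient

print(predicted_pups(0.0))            # the intercept: 6.567
print(round(predicted_pups(0.2), 2))  # an arbitrary example value of x
```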

17 / 48

Intercept Has Direct Interpretation on the Visualization

18 / 48

Regression to Be Mean

  1. What is regression?

  2. What do regression coefficients mean?

  3. What do the error coefficients of a regression mean?

  4. Correlation and Regression

  5. Transformation and Model Structure for More Sensible Coefficients

19 / 48

Two kinds of error

  1. Fit error: error due to lack of precision in our estimates

    • Coefficient SE
    • Precision of estimates
  2. Residual error: error due to variability in Y not explained by X

    • Residual SD (σ, from $\epsilon_i$)
20 / 48

Precision: coefficient SEs

term                    estimate  std.error
(Intercept)                6.567      0.791
inbreeding.coefficient   -11.447      3.189


  • Shows how precisely we can estimate each coefficient

  • Gets smaller with bigger sample size!

  • Remember, estimate ± ~2 SE covers the 95% CI

  • Comes from the likelihood surface...but we'll get there
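The ~2 SE rule of thumb can be sketched directly, using the slope and SE from the table above. This is an approximation: an exact interval would use a t quantile with the model's residual degrees of freedom instead of the fixed multiplier of 2.

```python
def approx_ci(estimate, se, multiplier=2.0):
    """Rough 95% CI: estimate +/- ~2 standard errors."""
    return estimate - multiplier * se, estimate + multiplier * se

# Slope estimate and SE from the model table
lower, upper = approx_ci(-11.447, 3.189)
print(f"slope ~95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval excludes 0, which is another way of reading the slope as clearly negative given its precision.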

21 / 48

Visualizing Precision: 95% CI (~2 SE)

22 / 48

Visualizing Precision with Simulation from your Model
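One way to sketch this idea: treat the slope's sampling distribution as normal, with mean equal to the estimate and SD equal to its SE, draw many plausible slopes, and summarize them. This is a simplification: drawing whole regression lines would require sampling intercept and slope jointly using their covariance, which the table does not report. The seed and number of draws are arbitrary.

```python
import random
import statistics

rng = random.Random(2024)       # seeded for reproducibility
estimate, se = -11.447, 3.189   # slope and SE from the model table

# Draw plausible slopes from N(estimate, se)
draws = [rng.gauss(estimate, se) for _ in range(10_000)]

# Share of simulated slopes below zero, and the middle 95% of the draws
frac_negative = sum(d < 0 for d in draws) / len(draws)
cuts = statistics.quantiles(draws, n=40)  # 39 cut points in 2.5% steps
lower, upper = cuts[0], cuts[-1]          # ~2.5th and ~97.5th percentiles
print(f"P(slope < 0) ~ {frac_negative:.3f}; middle 95%: ({lower:.1f}, {upper:.1f})")
```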

23 / 48

Residual Error

r.squared  sigma
    0.369  1.523

  • Sigma (σ) is the SD of the residuals

$\epsilon_i \sim \mathcal{N}(0, \sigma)$

  • How much does the # of pups vary beyond its relationship with inbreeding coefficient?

  • For any predicted average number of pups, ~68% of observed litter sizes will fall within ~1.5 pups (±1σ) of that prediction
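Under the model's normal-error assumption, the ~68% figure is just the share of a normal distribution lying within 1 SD of its mean. This sketch computes that share and the ±1σ band around one predicted value; using the intercept (the prediction at $x = 0$) is an arbitrary choice for illustration.

```python
from statistics import NormalDist

sigma = 1.523  # residual SD from the model summary above

# Share of a normal distribution within +/- 1 SD of its mean
share_within_1sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

# Band around one predicted mean: the intercept (x = 0), 6.567 pups
predicted = 6.567
band = (predicted - sigma, predicted + sigma)
print(f"~{share_within_1sd:.3f} of litters expected between {band[0]:.2f} and {band[1]:.2f} pups")
```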

24 / 48

Visualizing Residual Error's Implications

25 / 48

Residual Error -> Variance Explained

r.squared  sigma
    0.369  1.523

  • $R^2 = 1 - \frac{\sigma^2_{residual}}{\sigma^2_y}$

    • Fraction of the variation in Y related to X.

    • Here, 36.9% of the variation in pups is related to variation in inbreeding coefficient

    • Relates to r, the Pearson correlation coefficient (with a single predictor, $R^2 = r^2$)
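A small sketch of both points, on made-up toy data: $R^2$ computed as 1 minus residual variation over total variation in y, and the check that it equals $r^2$ for a single predictor. It uses sums of squares (so both terms share the same denominator); plugging in lm's residual sigma, which divides by n - 2, would not give exactly the same number.

```python
def r_squared_and_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    # Least-squares fit
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_resid = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - ss_resid / syy       # 1 - residual variation / total variation in y
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation
    return r2, r

# Made-up toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r = r_squared_and_r(x, y)
print(round(r2, 3), round(r ** 2, 3))  # 0.6 0.6
```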

26 / 48

Regression to Be Mean

  1. What is regression?

  2. What do regression coefficients mean?

  3. What do the error coefficients of a regression mean?

  4. Correlation and Regression

  5. Transformation and Model Structure for More Sensible Coefficients

27 / 48

What is Correlation?

  • The change in standard deviations of variable x per change in 1 SD of variable y
    • Clear, right?
  • Assesses the degree of association between two variables
  • But, unitless (sort of)
    • Between -1 and 1
28 / 48

Calculating Correlation: Start with Covariance

Describes the relationship between two variables. Not scaled.

$\sigma_{xy}$ = population-level covariance
$s_{xy}$ = covariance in your sample




$s_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
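The sample covariance formula translates almost line for line into code. A minimal sketch; the inputs in the usage lines are made-up toy values.

```python
def sample_cov(x, y):
    """Sample covariance s_xy: sum of cross-products of deviations over n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

print(sample_cov([1, 2, 3], [2, 4, 6]))  # 2.0: y rises with x
print(sample_cov([1, 2, 3], [6, 4, 2]))  # -2.0: y falls as x rises
```

The sign tells you the direction, but the magnitude depends on the units of x and y, which is why it is "not scaled".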

29 / 48

Pearson Correlation

Describes the relationship between two variables.
Scaled between -1 and 1.


$\rho_{xy}$ = population-level correlation, $r_{xy}$ = correlation in your sample




$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
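The sample version, $r_{xy} = s_{xy} / (s_x s_y)$, sketched with Python's `statistics` module. Dividing the covariance by both SDs is what removes the units and bounds the result between -1 and 1. The inputs below are made-up toy values.

```python
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation: s_xy / (s_x * s_y)."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    # statistics.stdev also uses an n - 1 denominator, so the n - 1 terms cancel
    return s_xy / (statistics.stdev(x) * statistics.stdev(y))

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly increasing: 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly decreasing: -1.0
```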
30 / 48

Assumptions of Pearson Correlation

  • Observations are from a random sample
  • Each observation is independent
  • X and Y are from a Normal Distribution
    • Weaker assumption

31 / 48

The meaning of r

Y is perfectly predicted by X if r = -1 or 1.


$R^2$ = the proportion of variation in y explained by x

32 / 48

Get r in your bones...




http://guessthecorrelation.com/
33 / 48

Example: Wolf Inbreeding and Litter Size



34 / 48

Example: Wolf Inbreeding and Litter Size

Covariance Matrix:

                        inbreeding.coefficient   pups
inbreeding.coefficient                    0.01  -0.11
pups                                     -0.11   3.52

Correlation Matrix:

                        inbreeding.coefficient   pups
inbreeding.coefficient                    1.00  -0.61
pups                                     -0.61   1.00

Yes, you can estimate an SE (cor.test() or bootstrapping)

35 / 48

Wait, so, how does Correlation relate to Regression? Slope versus r...

$b = \frac{s_{xy}}{s_x^2} = \frac{\mathrm{cov}(x,y)}{\mathrm{var}(x)}$

$= r_{xy} \frac{s_y}{s_x}$
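Both forms of the slope can be checked numerically. This sketch computes $b$ as $s_{xy}/s_x^2$ and as $r_{xy} s_y / s_x$ on made-up toy data and confirms they agree.

```python
import statistics

# Made-up toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xbar, ybar = statistics.mean(x), statistics.mean(y)
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

slope_from_cov = s_xy / s_x ** 2  # cov(x, y) / var(x)
r = s_xy / (s_x * s_y)            # Pearson correlation
slope_from_r = r * s_y / s_x      # r * (s_y / s_x)
print(round(slope_from_cov, 6), round(slope_from_r, 6))  # both 0.6
```

So r and the slope differ only by the ratio of the two SDs; they coincide when $s_x = s_y$.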

36 / 48

Correlation v. Regression Coefficients

37 / 48

Or really, r is just the slope of an lm fit after z-transforming both variables

$z_i = \frac{x_i - \bar{x}}{\sigma_x}$

  • When we z-transform variables, we put them on the same scale

  • The covariance between two z-transformed variables is their correlation!
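A sketch of both claims on made-up toy data: the covariance of the z-scores equals r, and a least-squares fit of z(y) on z(x) has slope r and intercept 0. It uses the sample SD (n - 1) for the z-transform; the SD and the covariance must use the same denominator for the covariance of z-scores to equal r exactly.

```python
import statistics

def z_scores(values):
    """z_i = (x_i - mean) / SD, using the sample SD (n - 1 denominator)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def sample_cov(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

# Made-up toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
zx, zy = z_scores(x), z_scores(y)

r = sample_cov(x, y) / (statistics.stdev(x) * statistics.stdev(y))
cov_z = sample_cov(zx, zy)                  # covariance of the z-scores
slope_z = cov_z / statistics.variance(zx)   # OLS slope of z(y) on z(x); var(zx) = 1
intercept_z = statistics.mean(zy) - slope_z * statistics.mean(zx)  # both means are ~0
print(round(r, 4), round(cov_z, 4), round(slope_z, 4), round(intercept_z, 4))
```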

38 / 48

Correlation versus Standardized Regression: It's the Same Picture

$z(y_i) = \beta_0 + \beta_1 z(x_i) + \epsilon_i$

term                    estimate  std.error
(Intercept)                0.000      0.166
inbreeding_std            -0.608      0.169

versus correlation: -0.608

39 / 48

EVERYTHING IS A LINEAR MODEL

40 / 48

Regression to Be Mean

  1. What is regression?

  2. What do regression coefficients mean?

  3. What do the error coefficients of a regression mean?

  4. Correlation and Regression

  5. Transformation and Model Structure for More Sensible Coefficients

41 / 48

Modifying (Transforming) Your Regression: Centering Your X

  • Many times X = 0 is silly

  • E.g., if X is year, do you really care what happens back in year 0?

  • Centering X allows you to evaluate a meaningful intercept

    • What is Y at the mean of X?
42 / 48

Centering X to generate a meaningful intercept

$x_{i,\text{centered}} = x_i - \mathrm{mean}(x)$

term                    estimate  std.error
(Intercept)                3.958      0.311
inbreeding.centered      -11.447      3.189

Intercept implies wolves with the average level of inbreeding in this study have ~4 pups. Wolves with higher inbreeding have fewer pups, wolves with lower inbreeding have more.
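Why the slope stays at -11.447 while the intercept moves: centering only shifts x, so the line keeps its slope, and the new intercept is the prediction at $\bar{x}$, which for least squares equals $\bar{y}$ (equivalently $\beta_0 + \beta_1\bar{x}$). A sketch on made-up toy data, not the wolf data:

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
xbar = statistics.mean(x)
x_centered = [a - xbar for a in x]  # x_i - mean(x)
b0_c, b1_c = fit_line(x_centered, y)

print(b0, b1)      # intercept at x = 0, slope
print(b0_c, b1_c)  # intercept at mean(x), which equals mean(y); same slope
```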

43 / 48

Centering X to generate a meaningful intercept

44 / 48

Modifying (Transforming) Your Regression: Log Transform of Y

  • Often, Y cannot be negative

  • And/or the process generating Y is multiplicative

  • Log(Y) can fix this and other sins.

  • VERY common, but what do the coefficients mean?

    • $e^{\beta_1} - 1$ = proportional change in Y for a change of 1 unit in X (multiply by 100 for a percent change)
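The back-transformation is one line of arithmetic. This sketch applies it to the log-scale slope reported in the deck's log-model table (-2.994), for a 1-unit change and for an illustrative 0.1-unit change (the 0.1 step is an arbitrary choice). With normal errors on the log scale, this multiplicative change describes the geometric mean (median) of Y rather than its arithmetic mean.

```python
import math

def proportional_change(log_slope, delta_x=1.0):
    """exp(b1 * delta_x) - 1: proportional change in Y when log(Y) is modeled linearly."""
    return math.exp(log_slope * delta_x) - 1

b1_log = -2.994  # log-scale slope from the deck's log-model table
print(f"{proportional_change(b1_log):.1%} change in pups per 1-unit increase")
print(f"{proportional_change(b1_log, 0.1):.1%} change in pups per 0.1-unit increase")
```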
45 / 48

Other Ways of Looking at This Relationship: Log Transformation of Y

$\log(y_i) = \beta_0 + \beta_1 x_i + \epsilon_i$

  • relationship is now curved
  • cannot have negative pups (yay!)

46 / 48

Model Coefficients: Log Slope

term                    estimate  std.error
(Intercept)                1.944      0.215
inbreeding.coefficient    -2.994      0.869

To understand the coefficient, remember $y_i = e^{\beta_0 + \beta_1 x_i + \epsilon_i}$

exp(-2.994) - 1 = -0.95, so a 1-unit increase in x multiplies y by ~0.05, i.e., y loses ~95% of its value, so...

Association: A one-unit increase in inbreeding coefficient is associated with having ~95% fewer pups, on average.

47 / 48

You are now a Statistical Wizard. Be Careful. Your Model is a Golem.

(sensu Richard McElreath)

48 / 48
