\[\Large \boldsymbol{Y} = \boldsymbol{\beta X} + \boldsymbol{\epsilon} \]
This equation is huge: X can be anything, categorical, continuous, squared, sine, etc.
Predictors can enter additively, or they can interact.
So far, the only model we’ve used with >1 predictor is ANOVA
ANOVA
ANCOVA
Multiple Linear Regression
MLR with Interactions
\[\Large \boldsymbol{Y} = \boldsymbol{\beta X} + \boldsymbol{\epsilon}\]
\[Y = \beta_0 + \beta_{1}x + \sum_{i=1}^{j}\beta_{i} + \epsilon\]
Who had a bigger brain: Neanderthals or us?
ANCOVA: Evaluate a categorical effect(s), controlling for a covariate (parallel lines)
Groups modify the intercept.
Independence of data points
Normality and homoscedasticity of residuals within groups
No relationship between fitted and residual values
Additivity of Treatment and Covariate (Parallel Slopes)
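As a sketch, an ANCOVA is just a linear model with one categorical and one continuous predictor. The data below are simulated (the variable names mirror the brain-size example, but the numbers are made up); the interaction fit at the end is how you'd check the parallel-slopes assumption.

```r
# Simulated stand-in for the brain-size data (values are assumptions)
set.seed(42)
dat <- data.frame(
  species = rep(c("neanderthal", "recent"), each = 20),
  lnmass  = rnorm(40, mean = 4, sd = 0.3)
)
dat$lnbrain <- 2 + 0.25 * dat$lnmass +
  0.07 * (dat$species == "recent") + rnorm(40, sd = 0.1)

# Additive model: parallel slopes, group-specific intercepts
fit_ancova <- lm(lnbrain ~ species + lnmass, data = dat)

# Check the parallel-slopes assumption: is the interaction needed?
anova(lm(lnbrain ~ species * lnmass, data = dat))
```

If the species:lnmass interaction is not significant, the additive (parallel-slopes) model is defensible.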
| | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| species | 0.0275528 | 1 | 6.220266 | 0.0175024 |
| lnmass | 0.1300183 | 1 | 29.352684 | 0.0000045 |
| species:lnmass | 0.0048452 | 1 | 1.093849 | 0.3027897 |
| Residuals | 0.1550332 | 35 | NA | NA |
| | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| species | 0.0275528 | 1 | 6.204092 | 0.0174947 |
| lnmass | 0.1300183 | 1 | 29.276363 | 0.0000043 |
| Residuals | 0.1598784 | 36 | NA | NA |
What type of Sums of Squares?
II!
contrast estimate SE df t.ratio p.value
neanderthal - recent -0.0703 0.0282 36 -2.49 0.0175
Evaluated at the mean of the covariate
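A contrast table like the one above can come from the emmeans package, which evaluates group means at the mean of the covariate before comparing them. This is a sketch on simulated data (the fitted values here are made up, not the lecture's results):

```r
library(emmeans)

# Simulated stand-in for the brain-size data (values are assumptions)
set.seed(1)
dat <- data.frame(
  species = rep(c("neanderthal", "recent"), each = 20),
  lnmass  = rnorm(40, mean = 4, sd = 0.3)
)
dat$lnbrain <- 2 + 0.25 * dat$lnmass +
  0.07 * (dat$species == "recent") + rnorm(40, sd = 0.1)

brain_lm <- lm(lnbrain ~ species + lnmass, data = dat)

# Group means at the mean of lnmass, then their pairwise difference
contrast(emmeans(brain_lm, ~ species), method = "pairwise")
```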
ANOVA
ANCOVA
Multiple Linear Regression
MLR with Interactions
Curved double-headed arrow indicates COVARIANCE between predictors that we must account for.
MLR controls for this correlation, estimating the unique contribution of each predictor.
\[\boldsymbol{Y} = \boldsymbol{b X} + \boldsymbol{\epsilon}\]
klm <- lm(rich ~ cover + firesev + hetero, data=keeley)
cover firesev hetero
cover 1.00000 -0.437135 -0.168378
firesev -0.43713 1.000000 -0.052355
hetero -0.16838 -0.052355 1.000000
\[VIF_1 = \frac{1}{1-R^2_{1}}\]
cover firesev hetero
1.2949 1.2617 1.0504
VIF \(>\) 5 or 10 can be problematic and indicate an unstable solution.
Solution: evaluate correlation and drop a predictor
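The VIF formula above can be computed by hand, or all at once with `car::vif()`. A sketch, assuming the keeley data are the Keeley fire dataset shipped with the piecewiseSEM package (an assumption about the lecture's data source):

```r
library(car)                            # for vif()
data(keeley, package = "piecewiseSEM")  # assumed data source

# Manual VIF for cover: regress it on the other predictors,
# then apply VIF = 1 / (1 - R^2)
r2_cover  <- summary(lm(cover ~ firesev + hetero, data = keeley))$r.squared
vif_cover <- 1 / (1 - r2_cover)

# car::vif() repeats this calculation for every predictor
vif(lm(rich ~ cover + firesev + hetero, data = keeley))
```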
| | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| cover | 1674.18 | 1 | 12.01 | 0.00 |
| firesev | 635.65 | 1 | 4.56 | 0.04 |
| hetero | 4864.52 | 1 | 34.91 | 0.00 |
| Residuals | 11984.57 | 86 | NA | NA |
If order of entry matters, you can use Type I. Remember: what models are you comparing?
| | Type I | Type II |
|---|---|---|
| Test for A | A v. 1 | A + B v. B |
| Test for B | A + B v. A | A + B v. A |
- Type II more conservative for A
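In R, base `anova()` gives Type I (sequential) tests, while `car::Anova()` gives Type II. A sketch, again assuming keeley comes from piecewiseSEM:

```r
library(car)
data(keeley, package = "piecewiseSEM")  # assumed data source

klm <- lm(rich ~ cover + firesev + hetero, data = keeley)

anova(klm)               # Type I: sequential, order of entry matters
Anova(klm, type = "II")  # Type II: each term tested given all others
```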
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 1.68 | 10.67 | 0.16 | 0.88 |
| cover | 15.56 | 4.49 | 3.47 | 0.00 |
| firesev | -1.82 | 0.85 | -2.14 | 0.04 |
| hetero | 65.99 | 11.17 | 5.91 | 0.00 |
R2 = 0.40986
\[r_{xy} = b_{xy}\frac{sd_{x}}{sd_{y}}\]
cover firesev hetero
0.32673 -0.19872 0.50160
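The rescaling above is a one-liner in R: multiply each slope by the ratio of predictor to response standard deviations. A sketch, assuming keeley from piecewiseSEM:

```r
data(keeley, package = "piecewiseSEM")  # assumed data source
klm <- lm(rich ~ cover + firesev + hetero, data = keeley)

# b * sd(x) / sd(y) puts each slope on the correlation scale
b        <- coef(klm)[-1]                 # drop the intercept
sdx      <- sapply(keeley[names(b)], sd)  # sd of each predictor
std_coef <- b * sdx / sd(keeley$rich)
std_coef
```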
These plots take the effect of one predictor plus the residuals of the response.
ANOVA
ANCOVA
Multiple Linear Regression
MLR with Interactions
\[y = \beta_0 + \beta_{1}x_{1} + \beta_{2}x_{2}+ \beta_{3}x_{1}x_{2}\]
keeley_lm_int <- lm(firesev ~ age*elev, data=keeley)
age elev age:elev
3.2001 5.5175 8.2871
This isn’t that bad. But it can be.
Often, interactions or nonlinear derived predictors are collinear with one or more of their predictors.
To remove this, we center the predictors, i.e., use \(X_i - mean(X)\)
\[\huge X_i - \bar{X}\]
\[y = \beta_0 + \beta_{1}(x_{1}-\bar{x_{1}}) + \beta_{2}(x_{2}-\bar{x_{2}})+ \beta_{3}(x_{1}-\bar{x_{1}})(x_{2}-\bar{x_{2}})\]
Variance Inflation Factors for Centered Model:
age_c elev_c age_c:elev_c
1.0167 1.0418 1.0379
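Centering and refitting takes two lines. A sketch, assuming keeley from piecewiseSEM:

```r
data(keeley, package = "piecewiseSEM")  # assumed data source

# Center each predictor at its mean
keeley$age_c  <- keeley$age  - mean(keeley$age)
keeley$elev_c <- keeley$elev - mean(keeley$elev)

keeley_lm_int_c <- lm(firesev ~ age_c * elev_c, data = keeley)
car::vif(keeley_lm_int_c)  # interaction VIFs drop to near 1
```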
What type of Sums of Squares??
| | Sum Sq | Df | F value | Pr(>F) |
|---|---|---|---|---|
| age | 52.9632 | 1 | 27.7092 | 0.00000 |
| elev | 6.2531 | 1 | 3.2715 | 0.07399 |
| age:elev | 22.3045 | 1 | 11.6693 | 0.00097 |
| Residuals | 164.3797 | 86 | NA | NA |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 1.81322 | 0.61561 | 2.9454 | 0.00415 |
| age | 0.12063 | 0.02086 | 5.7823 | 0.00000 |
| elev | 0.00309 | 0.00133 | 2.3146 | 0.02302 |
| age:elev | -0.00015 | 0.00004 | -3.4160 | 0.00097 |
R2 = 0.32352
Note that the additive coefficients signify the effect of one predictor when all others are at zero, i.e., absent.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.60913 | 0.14630 | 31.5040 | 0.00000 |
| age_c | 0.05811 | 0.01176 | 4.9419 | 0.00000 |
| elev_c | -0.00068 | 0.00058 | -1.1716 | 0.24460 |
| age_c:elev_c | -0.00015 | 0.00004 | -3.4160 | 0.00097 |
R2 = 0.32352
Note that additive coefficients signify the effect of one predictor at the average level of all others.
We can do a lot with the general linear model!
You are only limited by the biological models you can imagine.