What is multiple regression?
Fitting multiple regression to data
Multicollinearity
Inference with fit models
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$$

Adding a second predictor gives multiple regression:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$$
Exogeneity: the predictors $X$ are not correlated with the error $\epsilon$.
Low collinearity: $|r_{x_1 x_2}|$ less than ~0.7 or 0.8.
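To make this data generating process concrete, here is a minimal simulation sketch in R (all values and variable names are illustrative, not from the source):

```r
# Simulate y = beta0 + beta1*x1 + beta2*x2 + error, then recover the betas
set.seed(607)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n, mean = 0, sd = 3)
coef(lm(y ~ x1 + x2))  # estimates should land near 2, 0.5, and -1.2
```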
We can always calculate correlations:

$$r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Now we can look at a correlation matrix, $\rho_{X_i, Y}$:

 | Y | X1 | X2 |
---|---|---|---|
Y | 1 | 0.2 | 0.5 |
X1 | | 1 | -0.3 |
X2 | | | 1 |
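As a quick check of the formula, the covariance scaled by both standard deviations matches R's built-in cor() (toy data, for illustration only):

```r
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
cov(x, y) / (sd(x) * sd(y))  # r_xy from the formula above
cor(x, y)                    # identical
```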
From this, we can calculate partial correlations.

A partial correlation answers: what is the correlation between $X_1$ and $Y$ if we remove the portion of $X_1$ that is correlated with $X_2$?

$$r_{yx_1 \cdot x_2} = \frac{r_{yx_1} - r_{yx_2} r_{x_1 x_2}}{\sqrt{(1 - r_{x_1 x_2}^2)(1 - r_{yx_2}^2)}}$$

Subtract out the correlation of $X_2$ on $Y$, controlling for the correlation between $X_1$ and $X_2$ (the numerator).

Scale by the variability left over after accounting for the same (the denominator).

From a partial correlation we can get a partial regression coefficient:

$$\beta_{yx_1 \cdot x_2} = \rho_{yx_1 \cdot x_2} \frac{\sigma_y}{\sigma_{x_1}}$$
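Plugging the correlations from the matrix above into the partial correlation formula gives a worked example (the 0.42 follows purely from those example values):

```r
# Partial correlation of Y and X1, controlling for X2
r_yx1  <- 0.2
r_yx2  <- 0.5
r_x1x2 <- -0.3
(r_yx1 - r_yx2 * r_x1x2) / sqrt((1 - r_x1x2^2) * (1 - r_yx2^2))
# 0.4237: stronger than the raw r_yx1 = 0.2 once X2 is accounted for
```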
$$SS_{total} = SS_{model} + SS_{error}$$

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

We are trying to minimize $SS_{error}$ and partition $SS_{model}$ between $X_1$ and $X_2$.

Picture this as a Venn diagram: each area is a sum of squares (i.e., an amount of variability), and the overlap between $X_2$ and $Y$ is the variation in $X_2$ associated with $Y$.

How can we partition this? What is unique?
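The decomposition itself can be verified directly on any fitted model; a minimal sketch with simulated data (names hypothetical):

```r
set.seed(2)
x1  <- rnorm(50)
x2  <- rnorm(50)
y   <- 1 + x1 + 2 * x2 + rnorm(50)
fit <- lm(y ~ x1 + x2)

ss_total <- sum((y - mean(y))^2)
ss_model <- sum((fitted(fit) - mean(y))^2)
ss_error <- sum(residuals(fit)^2)
all.equal(ss_total, ss_model + ss_error)  # TRUE
```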
How do we represent models that go beyond 2 predictors?

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$$

More generally, for $K$ predictors:

$$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$$
For a simple linear regression:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \\ 1 & x_6 \\ 1 & x_7 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{bmatrix}$$
In general, with $n$ observations and $p-1$ predictors:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \times \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

In compact form:

$$\hat{Y} = X\beta, \qquad Y \sim \mathcal{N}(\hat{Y}, \Sigma)$$
This equation is huge. $X$ can be anything: categorical, continuous, squared, sine, etc.
Terms can enter purely additively, or as interactions.
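To see the design matrix $X$ that R actually builds, model.matrix() exposes it (toy data frame assumed for illustration):

```r
d <- data.frame(x1 = 1:3, x2 = c(2, 5, 1))
model.matrix(~ x1 + x2, data = d)
#   (Intercept) x1 x2
# 1           1  1  2
# 2           1  2  5
# 3           1  3  1
```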
Fitting multiple regression to data
Five-year study of wildfires & recovery in Southern California shrublands, starting in 1993. 90 plots (20 × 50 m).
(data from Jon Keeley et al.)
$$Richness_i = \beta_0 + \beta_1 cover_i + \beta_2 firesev_i + \beta_3 hetero_i + \epsilon_i$$

$$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$$

In R code:

klm <- lm(rich ~ cover + firesev + hetero, data = keeley)
Data Generating Process: Linearity
Error Generating Process: Normality & homoscedasticity of residuals
Data: Outliers not influencing residuals
Predictors: Minimal multicollinearity
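One way to check these assumptions, sketched with base R's diagnostic plots (assuming the klm fit from above):

```r
par(mfrow = c(2, 2))
plot(klm)   # residuals vs fitted (linearity, homoscedasticity),
            # Q-Q plot (normality), leverage (outliers/influence)
par(mfrow = c(1, 1))
```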
Multicollinearity
Adding more predictors decreases the precision of estimates.
If predictors are too collinear, the model can be difficult to fit.
If predictors are too collinear, the SEs of estimates inflate further.
If predictors are too collinear, are we really getting unique information?
             cover     firesev      hetero
cover    1.0000000 -0.43713460 -0.16837784
firesev -0.4371346  1.00000000 -0.05235518
hetero  -0.1683778 -0.05235518  1.00000000
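A matrix like this can be produced directly with cor() (assuming the keeley data frame used in the lm() call above):

```r
cor(keeley[, c("cover", "firesev", "hetero")])
```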
Correlations over 0.4 can be problematic, but, meh, they may be OK even as high as 0.8.
To be sure, we should look at how badly collinearity inflates the SEs of our estimates.
Consider our model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

We can also model one predictor as a function of the others:

$$x_1 = \alpha_0 + \alpha_2 x_2 + \epsilon_j$$

The variance of $X_1$ associated with the other predictors is $R_1^2$.

In MLR, the variance around our parameter estimate (the square of its SE) is:

$$var(\beta_1) = \frac{\sigma^2}{(n-1)\sigma_{X_1}^2} \cdot \frac{1}{1 - R_1^2}$$

The second term in that equation is the Variance Inflation Factor:

$$VIF_1 = \frac{1}{1 - R_1^2}$$
VIF > 5 or 10 can be problematic and indicate an unstable solution.
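VIFs are easy to check with the car package (assuming the klm fit from above):

```r
library(car)
vif(klm)  # values near 1 are fine; > 5-10 flag an unstable fit
```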
So what do you do? Cry. Then evaluate why:
- Drop a predictor if its information is redundant
- Combine predictors into an index (add them? or some other combination)
- Use PCA for orthogonal axes (see the sketch below)
- Use factor analysis to compress several predictors into one variable
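A minimal sketch of the PCA route, assuming the keeley columns used above (which predictors to combine is a judgment call, not a prescription from the source):

```r
# Replace two correlated predictors with their first principal component
pcs <- prcomp(keeley[, c("cover", "firesev")], center = TRUE, scale. = TRUE)
keeley$pc1 <- pcs$x[, 1]
klm_pca <- lm(rich ~ pc1 + hetero, data = keeley)
```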
Inference with fit models
$$Richness_i = \beta_0 + \beta_1 cover_i + \beta_2 firesev_i + \beta_3 hetero_i + \epsilon_i$$

term | estimate | std.error |
---|---|---|
(Intercept) | 1.68 | 10.67 |
cover | 15.56 | 4.49 |
firesev | -1.82 | 0.85 |
hetero | 65.99 | 11.17 |
$\beta_0$, the intercept, is the # of species when all other predictors are 0.
All other $\beta$s are the effect of a 1-unit increase in that predictor on the # of species.
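A coefficient table like the one above can be pulled from the fit with broom (an assumption about how the table was generated, not stated in the source):

```r
library(broom)
tidy(klm)  # term, estimate, std.error, statistic, p.value
```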
To compare predictors on a common scale, we can standardize coefficients, converting slopes to correlations:

$$r_{xy} = b_{xy} \frac{sd_x}{sd_y}$$

# Standardization method: basic

Parameter   | Std. Coef. |        95% CI
-----------------------------------------
(Intercept) |       0.00 | [ 0.00,  0.00]
cover       |       0.33 | [ 0.14,  0.51]
firesev     |      -0.20 | [-0.38, -0.01]
hetero      |       0.50 | [ 0.33,  0.67]
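Output in that format matches what effectsize::standardize_parameters() prints; presumably it came from something like:

```r
library(effectsize)
standardize_parameters(klm, method = "basic")
```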
r.squared | adj.r.squared |
---|---|
0.41 | 0.39 |

41% of the variation in # of species is associated with the predictors.
Note that this is the whole model, not individual predictors.
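Model-level fit statistics like these can be pulled with, e.g., broom::glance() or summary():

```r
library(broom)
glance(klm)[, c("r.squared", "adj.r.squared")]
# or: summary(klm)$r.squared ; summary(klm)$adj.r.squared
```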