class: center, middle

![](images/23/wonka_mult_regression.jpg)

## Multiple Predictor Variables in Linear Models

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. Fitting multiple regression to data.

3. Multicollinearity.

4. Inference with fit models
]

---
## Our Model for Simple Linear Regression

`$$\Large{y_i = \beta_0 + \beta_1 x_i + \epsilon_i}$$`

`$$\Large{\epsilon_i \sim N(0, \sigma)}$$`

--

This corresponds to

<img src="mlr_files/figure-html/unnamed-chunk-1-1.jpeg" style="display: block; margin: auto;" />

---
## But what if...

<img src="mlr_files/figure-html/mlr_dag-1.jpeg" style="display: block; margin: auto;" />

--

.large[
`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`
]

---
## A Small Problem: We don't know how the X's relate to one another

<img src="mlr_files/figure-html/unnamed-chunk-2-1.jpeg" style="display: block; margin: auto;" />

---
## Assumption of Exogeneity and Low Collinearity

<img src="mlr_files/figure-html/mlr_dag_cov-1.jpeg" style="display: block; margin: auto;" />

--

- **Exogeneity**: the Xs are not correlated with `\(\epsilon\)`.

- **Low collinearity**: `\(\mid r_{x_1x_2} \mid\)` less than ~0.7 or 0.8

---
## The Mechanics: Correlation and Partial Correlation

We can always calculate correlations.

`$$r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$`

--

Now we can look at a correlation matrix

`$$\rho_{X_i, Y} = \begin{matrix} & Y & X_1 & X_2 \\ Y & 1 & 0.2 & 0.5 \\ X_1 & & 1 & -0.3 \\ X_2 & & & 1 \\ \end{matrix}$$`

--

From this, we can calculate partial correlations

---
## Partial Correlations

Answers the question: what is the correlation between `\(X_1\)` and `\(Y\)` if we remove the portion of `\(X_1\)` correlated with `\(X_2\)`?

--

`$$\Large{ r_{yx_1, x_2} = \frac{r_{yx_1} - r_{yx_2}r_{x_1x_2}}{\sqrt{(1-r^2_{x_1x_2})(1-r^2_{yx_2})}} }$$`

--

- Subtract out the correlation of `\(X_2\)` with `\(Y\)`, weighted by the correlation between `\(X_1\)` and `\(X_2\)`

--

- Scale by the variability left over after accounting for the same

--

`$$\Large{ \beta_{yx_1, x_2} = \rho_{yx_1, x_2} \frac{\sigma_{y}}{\sigma_{x_1}} }$$`

---
## Sums of Squares and Partitioning

`$$SS_{total} = SS_{model} + SS_{error}$$`

--

$$\sum{(Y_i - \bar{Y})^2} = \sum{(\hat{Y_i} - \bar{Y})^2} + \sum{(Y_i -\hat{Y_i})^2}$$

--

We are trying to minimize `\(SS_{error}\)` and partition `\(SS_{model}\)` between `\(X_1\)` and `\(X_2\)`

---
## Partitioning Variation

<img src="mlr_files/figure-html/venn_mlr-1.jpeg" style="display: block; margin: auto;" />

.center[Each area is a sum of squares (i.e., an amount of variability)]

---
## Partitioning Variation

<img src="mlr_files/figure-html/venn_mlr_x_y-1.jpeg" style="display: block; margin: auto;" />

.center[The variation in X<sub>2</sub> associated with Y]

---
## You can see how collinearity would be a problem

<img src="mlr_files/figure-html/venn_mlr_coll-1.jpeg" style="display: block; margin: auto;" />

.center[
How can we partition this? What is unique?
]
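
---
## Partial Correlations by Hand

A minimal sketch of the partial correlation formula above in R, using the illustrative values from the example correlation matrix a few slides back (0.2, 0.5, and -0.3 are made-up teaching values, not real data):


```r
# correlations from the example matrix
r_yx1  <- 0.2   # cor(Y, X1)
r_yx2  <- 0.5   # cor(Y, X2)
r_x1x2 <- -0.3  # cor(X1, X2)

# partial correlation of Y and X1, controlling for X2
(r_yx1 - r_yx2 * r_x1x2) /
  sqrt((1 - r_x1x2^2) * (1 - r_yx2^2))
# ~ 0.42: removing the piece shared with X2 strengthens
# the apparent association between Y and X1
```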

---
## Generalizing 2 Predictors to N Predictors

How do we represent our models that go beyond 2 predictors...

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`

--

### With K predictors

`$$y_i = \beta_0 + \sum_{j = 1}^{K} \beta_j x_{ij} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`

---
## Translating to Matrices: The General Linear Model

For a simple linear regression:

`$$\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix}1 & x_1 \\1 & x_2 \\1 & x_3 \\1 & x_4 \\1 & x_5 \\1 & x_6 \\ 1 & x_7 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{bmatrix}$$`

---
## Multiple Regression in Matrix Form

$$
`\begin{equation*}
\begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \times
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\end{equation*}`
$$

--

<br><br><br>

`$$\widehat {\textbf{Y} } = \textbf{X}\beta$$`

`$$\textbf{Y} \sim \mathcal{N}(\widehat {\textbf{Y} }, \Sigma)$$`

---
## The Expansiveness of the General Linear Model

.Large[
`$$\widehat {\textbf{Y} } = \textbf{X}\beta$$`

`$$\textbf{Y} \sim \mathcal{N}(\widehat {\textbf{Y} }, \Sigma)$$`
]

- This equation is huge. X can be anything: categorical, continuous, squared, sine, etc.

- There can be straight additivity, or interactions

---
## Why Multiple Predictors?

<img src="mlr_files/figure-html/unnamed-chunk-3-1.jpeg" style="display: block; margin: auto;" />

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. .red[Fitting multiple regression to data]

3. Multicollinearity

4. Inference with fit models
]

---
background-image: url("images/23/fires.jpg")
class: center, inverse
background-size: cover

.bottom[
.left[
.small[
Five-year study of wildfires & recovery in Southern California shrublands in 1993. 90 plots (20 x 50m)

(data from Jon Keeley et al.)
]]]

---
## What causes species richness?

- Distance from fire patch
- Elevation
- Abiotic index
- Patch age
- Patch heterogeneity
- Severity of last fire
- Plant cover

---
## Many Things may Influence Species Richness

<img src="mlr_files/figure-html/keeley_pairs-1.jpeg" style="display: block; margin: auto;" />

---
## Our Model

`$$Richness_i = \beta_{0} + \beta_{1} cover_i + \beta_{2} firesev_i + \beta_{3} hetero_i + \epsilon_i$$`

`$$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$$`

--

**In R code:**

.large[

```r
klm <- lm(rich ~ cover + firesev + hetero, data = keeley)
```
]

---
## Testing Assumptions

- Data Generating Process: Linearity

--

- Error Generating Process: Normality & homoscedasticity of residuals

--

- Data: Outliers not influencing residuals

--

- Predictors: **Minimal multicollinearity**

---
## Did We Match our Data?

<img src="mlr_files/figure-html/unnamed-chunk-4-1.jpeg" style="display: block; margin: auto;" />

---
## How About That Linearity?

<img src="mlr_files/figure-html/unnamed-chunk-5-1.jpeg" style="display: block; margin: auto;" />

---
## OK, Normality of Residuals?

<img src="mlr_files/figure-html/unnamed-chunk-6-1.jpeg" style="display: block; margin: auto;" />

---
## OK, Normality of Quantile Residuals?

<img src="mlr_files/figure-html/unnamed-chunk-7-1.jpeg" style="display: block; margin: auto;" />
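
---
## Generating These Diagnostics Yourself

A minimal sketch of one way to produce similar residual checks, assuming the DHARMa package is installed (this is not necessarily how the figures in these slides were made):


```r
# base R diagnostics: residuals vs. fitted, QQ plot,
# scale-location, and leverage
plot(klm)

# simulated quantile residuals from the fitted model
library(DHARMa)
plot(simulateResiduals(klm))
```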
<img src="mlr_files/figure-html/unnamed-chunk-8-1.jpeg" style="display: block; margin: auto;" /> --- ## Outliers? <img src="mlr_files/figure-html/unnamed-chunk-9-1.jpeg" style="display: block; margin: auto;" /> --- class: center, middle ## ![](./images/23/gosling_multicollinearity.jpg) --- ## Models with Multiple Predictors .large[ 1. What is multiple regression? 2. Fitting multiple regression to data. 3. .red[Multicollinearity] 4. Inference with fit models ] --- ## Why Worry about Multicollinearity? - Adding more predictors decreases precision of estimates -- - If predictors are too collinear, can lead to difficulty fitting model -- - If predictors are too collinear, can inflate SE of estimates further -- - If predictors are too collinear, are we *really* getting **unique information** --- ## Checking for Multicollinearity: Correlation Matrices ``` cover firesev hetero cover 1.0000000 -0.43713460 -0.16837784 firesev -0.4371346 1.00000000 -0.05235518 hetero -0.1683778 -0.05235518 1.00000000 ``` - Correlations over 0.4 can be problematic, but, meh, they may be OK even as high as 0.8. - To be sure, we should look at how badly they change the SE around predictors. - How much do they **inflate variance** and harm our precision --- ## Checking for Multicollinearity: Variance Inflation Factor Consider our model: `$$y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \epsilon$$` -- We can also model: `$$X_{1} = \alpha_{0} + \alpha_{2}x_{2} + \epsilon_j$$` The variance of `\(X_1\)` associated with other predictors is `\(R^{2}_1\)` -- In MLR, the variance around our parameter estimate (square of SE) is: `$$var(\beta_{1}) = \frac{\sigma^2}{(n-1)\sigma^2_{X_1}}\frac{1}{1-R^{2}_1}$$` -- The second term in that equation is the **Variance Inflation Parameter** `$$VIF = \frac{1}{1-R^2_{1}}$$` --- ## Checking for Multicollinearity: Variance Inflation Factor `$$VIF_1 = \frac{1}{1-R^2_{1}}$$` <img src="mlr_files/figure-html/klm_vif-1.jpeg" style="display: block; margin: auto;" /> VIF `\(>\)` 5 or 10 can be problematic and indicate an unstable solution. --- ## What Do We Do with High Collinearity? - Cry. -- - Evaluate **why** -- - Can drop a predictor if information is redundant -- - Can combine predictors into an index - Add them? Or other combination. - PCA for orthogonal axes - Factor analysis to compress into one variable --- ## Models with Multiple Predictors .large[ 1. What is multiple regression? 2. Fitting multiple regression to data. 3. Multicollinearity. 4. 

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. Fitting multiple regression to data.

3. Multicollinearity.

4. .red[Inference with fit models]
]

---
## What does it all mean: the coefficients

`$$Richness_i = \beta_{0} + \beta_{1} cover_i + \beta_{2} firesev_i + \beta_{3} hetero_i + \epsilon_i$$`

<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 10.67 </td> </tr>
<tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 15.56 </td> <td style="text-align:right;"> 4.49 </td> </tr>
<tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> -1.82 </td> <td style="text-align:right;"> 0.85 </td> </tr>
<tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 65.99 </td> <td style="text-align:right;"> 11.17 </td> </tr>
</tbody>
</table>

- `\(\beta_0\)`, the intercept, is the # of species when **all other predictors are 0**
      - Note the very large SE

--

- All other `\(\beta\)`s are the effect of a 1 unit increase on the # of species
      - They are **not** on the same scale
      - They are each in the scale of species per unit of the individual x

---
## Comparing Coefficients on the Same Scale

`$$r_{xy} = b_{xy}\frac{sd_{x}}{sd_{y}}$$`

```
# Standardization method: basic

Parameter   | Std. Coef. |         95% CI
-----------------------------------------
(Intercept) |       0.00 | [ 0.00,  0.00]
cover       |       0.33 | [ 0.14,  0.51]
firesev     |      -0.20 | [-0.38, -0.01]
hetero      |       0.50 | [ 0.33,  0.67]
```

--

- For a linear model, it makes intuitive sense to compare strengths of association this way

- Note, this is Pearson's correlation, so it's in units of `\(sd_y/sd_x\)`

---
## How Much Variation is Associated with the Predictors?

<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> r.squared </th> <th style="text-align:right;"> adj.r.squared </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 0.41 </td> <td style="text-align:right;"> 0.39 </td> </tr>
</tbody>
</table>

- 41% of the variation in the # of species is associated with the predictors

--

- Note that this is the **whole model**, not individual predictors

--

- `\(R_{adj}^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\)`
      - Scales fit by model complexity
      - If we add more terms but don't increase `\(R^2\)`, it can go down

---
## So, Uh, How Do We Visualize This?

<img src="mlr_files/figure-html/klm_see_effects-1.jpeg" style="display: block; margin: auto;" />

---
## Visualization Strategies for Multivariate Models

- Plot the effect of each variable holding the other variables constant
      - Mean, median, 0
      - Or your choice!

- Plot **counterfactual scenarios** from the model
      - Can match data (and be shown as such)
      - Can explore the response surface

---
## Added Variable Plot to Show Unique Contributions when Holding Others at 0

<img src="mlr_files/figure-html/klm_avplot-1.jpeg" style="display: block; margin: auto;" />

---
## Plots at Median of Other Variables

<img src="mlr_files/figure-html/klm_visreg-1.jpeg" style="display: block; margin: auto;" />

---
## Counterfactual Predictions Overlaid on Data

<img src="mlr_files/figure-html/crazy-1.jpeg" style="display: block; margin: auto;" />

---
## Counterfactual Surfaces at Means of Other Variables

<img src="mlr_files/figure-html/unnamed-chunk-11-1.jpeg" style="display: block; margin: auto;" />

---
class: center, middle

![](images/23/matrix_regression.jpg)
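
---
## Sketch: Counterfactual Predictions in Code

One way to build predictions like those above, holding the other predictors at their medians. This is a sketch using base R's predict(); packages such as visreg or emmeans wrap this kind of workflow up for you.


```r
# counterfactual scenario: vary heterogeneity, hold cover and
# fire severity at their medians
newdat <- data.frame(
  hetero  = seq(min(keeley$hetero), max(keeley$hetero), length.out = 100),
  cover   = median(keeley$cover),
  firesev = median(keeley$firesev)
)

# predicted richness with a confidence interval around the fit
pred <- predict(klm, newdata = newdat, interval = "confidence")
head(cbind(newdat, pred))
```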