class: center, middle

![](images/23/wonka_mult_regression.jpg)

## Multiple Predictor Variables in Linear Models

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. Fitting multiple regression to data.

3. Multicollinearity.

4. Inference with fit models
]

---
## Our Model for Simple Linear Regression

`$$\Large{y_i = \beta_0 + \beta_1 x_i + \epsilon_i}$$`

`$$\Large{\epsilon_i \sim N(0, \sigma)}$$`

--

This corresponds to

<img src="mlr_files/figure-html/unnamed-chunk-1-1.jpeg" style="display: block; margin: auto;" />

---
## But what if...

<img src="mlr_files/figure-html/mlr_dag-1.jpeg" style="display: block; margin: auto;" />

--

.large[
`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`
]

---
## A Small Problem: We don't know how the X's relate to one another

<img src="mlr_files/figure-html/unnamed-chunk-2-1.jpeg" style="display: block; margin: auto;" />

---
## Assumption of Exogeneity and Low Collinearity

<img src="mlr_files/figure-html/mlr_dag_cov-1.jpeg" style="display: block; margin: auto;" />

--

- **Exogeneity**: the Xs are not correlated with `\(\epsilon\)`.

- **Low collinearity**: `\(\mid r_{x_1x_2} \mid\)` less than ~0.7 or 0.8

---
## The Mechanics: Correlation and Partial Correlation

We can always calculate correlations.

`$$r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$`

--

Now we can look at a correlation matrix

`$$\rho_{X_i, Y} = \begin{matrix} & Y & X_1 & X_2 \\ Y & 1 & 0.2 & 0.5 \\ X_1 & & 1 & -0.3 \\ X_2 & & & 1 \\ \end{matrix}$$`

--

From this, we can calculate partial correlations

---
## Partial Correlations

Answers the question: what is the correlation between `\(X_1\)` and `\(Y\)` if we remove the portion of `\(X_1\)` correlated with `\(X_2\)`?

--

`$$\Large{ r_{yx_1, x_2} = \frac{r_{yx_1} - r_{yx_2}r_{x_1x_2}}{\sqrt{(1-r^2_{x_1x_2})(1-r^2_{yx_2})}} }$$`

--

- Subtract out the correlation of `\(X_2\)` with `\(Y\)`, weighted by the correlation between `\(X_1\)` and `\(X_2\)`

--

- Scale by the variability left over after accounting for the same

--

`$$\Large{ \beta_{yx_1, x_2} = \rho_{yx_1, x_2} \frac{\sigma_{y}}{\sigma_{x_1}} }$$`

---
## Sums of Squares and Partitioning

`$$SS_{total} = SS_{model} + SS_{error}$$`

--

$$\sum{(Y_i - \bar{Y})^2} = \sum{(\hat{Y_i} - \bar{Y})^2} + \sum{(Y_i -\hat{Y_i})^2}$$

--

We are trying to minimize `\(SS_{error}\)` and partition `\(SS_{model}\)` between `\(X_1\)` and `\(X_2\)`

---
## Partitioning Variation

<img src="mlr_files/figure-html/venn_mlr-1.jpeg" style="display: block; margin: auto;" />

.center[Each area is a sum of squares (i.e., an amount of variability)]

---
## Partitioning Variation

<img src="mlr_files/figure-html/venn_mlr_x_y-1.jpeg" style="display: block; margin: auto;" />

.center[The variation in X<sub>2</sub> associated with Y]

---
## You can see how collinearity would be a problem

<img src="mlr_files/figure-html/venn_mlr_coll-1.jpeg" style="display: block; margin: auto;" />

.center[
How can we partition this? What is unique?
]
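
---
## Partial Correlations by Hand

A minimal sketch of the partial correlation formula above in R, using the illustrative values from the example correlation matrix a few slides back (0.2, 0.5, and -0.3 are made-up teaching values, not real data):


```r
# correlations from the example matrix
r_yx1  <- 0.2   # cor(Y, X1)
r_yx2  <- 0.5   # cor(Y, X2)
r_x1x2 <- -0.3  # cor(X1, X2)

# partial correlation of Y and X1, controlling for X2
(r_yx1 - r_yx2 * r_x1x2) /
  sqrt((1 - r_x1x2^2) * (1 - r_yx2^2))
# ~ 0.42: removing the piece shared with X2 strengthens
# the apparent association between Y and X1
```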

---
## Generalizing 2 Predictors to N Predictors

How do we represent our models that go beyond 2 predictors...

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`

--

### With K predictors

`$$y_i = \beta_0 + \sum_{j = 1}^{K} \beta_j x_{ij} + \epsilon_i$$`

`$$\epsilon_i \sim N(0, \sigma)$$`

---
## Translating to Matrices: The General Linear Model

For a simple linear regression:

`$$\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix}1 & x_1 \\1 & x_2 \\1 & x_3 \\1 & x_4 \\1 & x_5 \\1 & x_6 \\ 1 & x_7 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{bmatrix}$$`

---
## Multiple Regression in Matrix Form

$$
`\begin{equation*}
\begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \times
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\end{equation*}`
$$

--

<br><br><br>

`$$\widehat {\textbf{Y} } = \textbf{X}\beta$$`

`$$\textbf{Y} \sim \mathcal{N}(\widehat {\textbf{Y} }, \Sigma)$$`

---
## The Expansiveness of the General Linear Model

.Large[
`$$\widehat {\textbf{Y} } = \textbf{X}\beta$$`

`$$\textbf{Y} \sim \mathcal{N}(\widehat {\textbf{Y} }, \Sigma)$$`
]

- This equation is huge. X can be anything: categorical, continuous, squared, sine, etc.

- There can be straight additivity, or interactions

---
## Why Multiple Predictors?

<img src="mlr_files/figure-html/unnamed-chunk-3-1.jpeg" style="display: block; margin: auto;" />

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. .red[Fitting multiple regression to data]

3. Multicollinearity

4. Inference with fit models
]

---
background-image: url("images/23/fires.jpg")
class: center, inverse
background-size: cover

.bottom[
.left[
.small[
Five-year study of wildfires & recovery in Southern California shrublands in 1993. 90 plots (20 x 50m)

(data from Jon Keeley et al.)
]]]

---
## What causes species richness?

- Distance from fire patch
- Elevation
- Abiotic index
- Patch age
- Patch heterogeneity
- Severity of last fire
- Plant cover

---
## Many Things may Influence Species Richness

<img src="mlr_files/figure-html/keeley_pairs-1.jpeg" style="display: block; margin: auto;" />

---
## Our Model

`$$Richness_i = \beta_{0} + \beta_{1} cover_i + \beta_{2} firesev_i + \beta_{3} hetero_i + \epsilon_i$$`

`$$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$$`

--

**In R code:**

.large[

```r
klm <- lm(rich ~ cover + firesev + hetero, data = keeley)
```
]

---
## Testing Assumptions

- Data Generating Process: Linearity

--

- Error Generating Process: Normality & homoscedasticity of residuals

--

- Data: Outliers not influencing residuals

--

- Predictors: **Minimal multicollinearity**

---
## Did We Match our Data?

<img src="mlr_files/figure-html/unnamed-chunk-4-1.jpeg" style="display: block; margin: auto;" />

---
## How About That Linearity?

<img src="mlr_files/figure-html/unnamed-chunk-5-1.jpeg" style="display: block; margin: auto;" />

---
## OK, Normality of Residuals?

<img src="mlr_files/figure-html/unnamed-chunk-6-1.jpeg" style="display: block; margin: auto;" />

---
## OK, Normality of Quantile Residuals?

<img src="mlr_files/figure-html/unnamed-chunk-7-1.jpeg" style="display: block; margin: auto;" />
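
---
## Generating These Diagnostics Yourself

A minimal sketch of one way to produce similar residual checks, assuming the DHARMa package is installed (this is not necessarily how the figures in these slides were made):


```r
# base R diagnostics: residuals vs. fitted, QQ plot,
# scale-location, and leverage
plot(klm)

# simulated quantile residuals from the fitted model
library(DHARMa)
plot(simulateResiduals(klm))
```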
<img src="mlr_files/figure-html/unnamed-chunk-8-1.jpeg" style="display: block; margin: auto;" /> --- ## Outliers? <img src="mlr_files/figure-html/unnamed-chunk-9-1.jpeg" style="display: block; margin: auto;" /> --- class: center, middle ## ![](./images/23/gosling_multicollinearity.jpg) --- ## Models with Multiple Predictors .large[ 1. What is multiple regression? 2. Fitting multiple regression to data. 3. .red[Multicollinearity] 4. Inference with fit models ] --- ## Why Worry about Multicollinearity? - Adding more predictors decreases precision of estimates -- - If predictors are too collinear, can lead to difficulty fitting model -- - If predictors are too collinear, can inflate SE of estimates further -- - If predictors are too collinear, are we *really* getting **unique information** --- ## Checking for Multicollinearity: Correlation Matrices ``` cover firesev hetero cover 1.0000000 -0.43713460 -0.16837784 firesev -0.4371346 1.00000000 -0.05235518 hetero -0.1683778 -0.05235518 1.00000000 ``` - Correlations over 0.4 can be problematic, but, meh, they may be OK even as high as 0.8. - To be sure, we should look at how badly they change the SE around predictors. - How much do they **inflate variance** and harm our precision --- ## Checking for Multicollinearity: Variance Inflation Factor Consider our model: `$$y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \epsilon$$` -- We can also model: `$$X_{1} = \alpha_{0} + \alpha_{2}x_{2} + \epsilon_j$$` The variance of `\(X_1\)` associated with other predictors is `\(R^{2}_1\)` -- In MLR, the variance around our parameter estimate (square of SE) is: `$$var(\beta_{1}) = \frac{\sigma^2}{(n-1)\sigma^2_{X_1}}\frac{1}{1-R^{2}_1}$$` -- The second term in that equation is the **Variance Inflation Parameter** `$$VIF = \frac{1}{1-R^2_{1}}$$` --- ## Checking for Multicollinearity: Variance Inflation Factor `$$VIF_1 = \frac{1}{1-R^2_{1}}$$` <img src="mlr_files/figure-html/klm_vif-1.jpeg" style="display: block; margin: auto;" /> VIF `\(>\)` 5 or 10 can be problematic and indicate an unstable solution. --- ## What Do We Do with High Collinearity? - Cry. -- - Evaluate **why** -- - Can drop a predictor if information is redundant -- - Can combine predictors into an index - Add them? Or other combination. - PCA for orthogonal axes - Factor analysis to compress into one variable --- ## Models with Multiple Predictors .large[ 1. What is multiple regression? 2. Fitting multiple regression to data. 3. Multicollinearity. 4. 

---
## Models with Multiple Predictors

.large[
1. What is multiple regression?

2. Fitting multiple regression to data.

3. Multicollinearity.

4. .red[Inference with fit models]
]

---
## What does it all mean: the coefficients

`$$Richness_i = \beta_{0} + \beta_{1} cover_i + \beta_{2} firesev_i + \beta_{3} hetero_i + \epsilon_i$$`

<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.68 </td> <td style="text-align:right;"> 10.67 </td> </tr>
<tr> <td style="text-align:left;"> cover </td> <td style="text-align:right;"> 15.56 </td> <td style="text-align:right;"> 4.49 </td> </tr>
<tr> <td style="text-align:left;"> firesev </td> <td style="text-align:right;"> -1.82 </td> <td style="text-align:right;"> 0.85 </td> </tr>
<tr> <td style="text-align:left;"> hetero </td> <td style="text-align:right;"> 65.99 </td> <td style="text-align:right;"> 11.17 </td> </tr>
</tbody>
</table>

- `\(\beta_0\)`, the intercept, is the # of species when **all other predictors are 0**
      - Note the very large SE

--

- All other `\(\beta\)`s are the effect of a 1 unit increase on the # of species
      - They are **not** on the same scale
      - They are each in the scale of species per unit of the individual x

---
## Comparing Coefficients on the Same Scale

`$$r_{xy} = b_{xy}\frac{sd_{x}}{sd_{y}}$$`

```
# Standardization method: basic

Parameter   | Std. Coef. |         95% CI
-----------------------------------------
(Intercept) |       0.00 | [ 0.00,  0.00]
cover       |       0.33 | [ 0.14,  0.51]
firesev     |      -0.20 | [-0.38, -0.01]
hetero      |       0.50 | [ 0.33,  0.67]
```

--

- For a linear model, it makes intuitive sense to compare strengths of association this way

- Note, this is Pearson's correlation, so it's in units of `\(sd_y/sd_x\)`

---
## How Much Variation is Associated with the Predictors?

<table class="table" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> r.squared </th> <th style="text-align:right;"> adj.r.squared </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 0.41 </td> <td style="text-align:right;"> 0.39 </td> </tr>
</tbody>
</table>

- 41% of the variation in the # of species is associated with the predictors

--

- Note that this is the **whole model**, not individual predictors

--

- `\(R_{adj}^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\)`
      - Scales fit by model complexity
      - If we add more terms but don't increase `\(R^2\)`, it can go down

---
## So, Uh, How Do We Visualize This?

<img src="mlr_files/figure-html/klm_see_effects-1.jpeg" style="display: block; margin: auto;" />

---
## Visualization Strategies for Multivariate Models

- Plot the effect of each variable holding the other variables constant
      - Mean, median, 0
      - Or your choice!

- Plot **counterfactual scenarios** from the model
      - Can match data (and be shown as such)
      - Can explore the response surface

---
## Added Variable Plot to Show Unique Contributions when Holding Others at 0

<img src="mlr_files/figure-html/klm_avplot-1.jpeg" style="display: block; margin: auto;" />

---
## Plots at Median of Other Variables

<img src="mlr_files/figure-html/klm_visreg-1.jpeg" style="display: block; margin: auto;" />

---
## Counterfactual Predictions Overlaid on Data

<img src="mlr_files/figure-html/crazy-1.jpeg" style="display: block; margin: auto;" />

---
## Counterfactual Surfaces at Means of Other Variables

<img src="mlr_files/figure-html/unnamed-chunk-11-1.jpeg" style="display: block; margin: auto;" />

---
class: center, middle

![](images/23/matrix_regression.jpg)
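
---
## Sketch: Counterfactual Predictions in Code

One way to build predictions like those above, holding the other predictors at their medians. This is a sketch using base R's predict(); packages such as visreg or emmeans wrap this kind of workflow up for you.


```r
# counterfactual scenario: vary heterogeneity, hold cover and
# fire severity at their medians
newdat <- data.frame(
  hetero  = seq(min(keeley$hetero), max(keeley$hetero), length.out = 100),
  cover   = median(keeley$cover),
  firesev = median(keeley$firesev)
)

# predicted richness with a confidence interval around the fit
pred <- predict(klm, newdata = newdat, interval = "confidence")
head(cbind(newdat, pred))
```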