Complex Linear Models

Homework adapted from https://statistics4ecologistsexercises.netlify.app/multiple-regression-and-design-matrices

1. Complex Linear Models and their Design Matrices

  1. For this exercise, we will consider the penguins data set from the palmerpenguins package. The data set contains various measures of penguin size, species, and sex, along with the year and island when/where the observations were made. Begin by loading the data and dropping observations that have missing values for some of the predictors. For this data, we can filter on not being NA for sex:
library(dplyr)
library(palmerpenguins)
data(penguins)

#some incomplete cases to filter out
penguins <- penguins |> filter(!is.na(sex))

1A. First, consider a linear model in which you attempt to predict a penguin’s body mass (body_mass_g) from its flipper length (flipper_length-mm) and the sex (sex) of the penguin.

1B. Write down the entries in the design matrix, \(X\) for the first 3 observations in the data set. Don’t forget the intercept! Verify you are correct with the model.matrix() function. Note, to see how to write a table in markdown for a quarto document, check out https://quarto.org/docs/authoring/tables.html

penguins[1:3,c("body_mass_g", "flipper_length_mm", "sex")]
# A tibble: 3 × 3
  body_mass_g flipper_length_mm sex   
        <int>             <int> <fct> 
1        3750               181 male  
2        3800               186 female
3        3250               195 female

1C. Now, consider adding species to the model in addition to flipper length (flipper_length-mm) and the sex (sex) of the penguin. Fit the model using effects coding.

Write down the entries in the design matrix for the following observations:

penguins[c(1, 2, 200, 201, 300, 301),
         c("body_mass_g", "flipper_length_mm", "sex", "species")]
# A tibble: 6 × 4
  body_mass_g flipper_length_mm sex    species  
        <int>             <int> <fct>  <fct>    
1        3750               181 male   Adelie   
2        3800               186 female Adelie   
3        4900               217 female Gentoo   
4        5050               220 male   Gentoo   
5        3600               195 male   Chinstrap
6        3900               199 female Chinstrap

1D. Lastly, let’s allow the effect of flipper length to be sex-specific. This can be accomplished by adding an interaction between sex and flipper_length_mm. Again, write down the entries in the design matrix for 6 observations selected just above. Check yourself with model.matrix()

2. Three variations on a theme

For this exercise, we will use the leaftemp data set in the DAAG package. The data set contains measurements of vapor pressure (vapPress) and differences between leaf and air temperatures (tempDiff) in an experiment conducted at three different levels of carbon dioxide (CO2level).

pacman::p_load(DAAG)
data(leaftemp)

2A. Fit the following three models to these data:

  • simple linear regression: lm(tempDiff ~ vapPress, data = leaftemp)
  • Analysis of covariance: lm(tempDiff ~ vapPress + CO2level, data= leaftemp)
  • Interaction model: lm(tempDiff ~ vapPress*CO2level, data= leaftemp)
lm1 <- lm(tempDiff ~ vapPress, data = leaftemp)
lm2 <- lm(tempDiff ~ vapPress + CO2level, data= leaftemp)
lm3 <- lm(tempDiff ~ vapPress*CO2level, data= leaftemp)

2B. Do all of these models pass checks of assumptions?

2C. For the Analysis of covariance model, write down the equation Corresponding to the model. In quarto, you can use LaTeX to write equations fairly simply. Check out this quick primer here and/or just use this LaTeX equation generator here. I promise, it’s super useful!

2D. Plot the predicted mean temperature difference as a function of vapor pressure (and when appropriate, CO\(_2\) level) for each of the 3 models.

3. Interactions with Continuous Variables

Scientists wanted to simulate how different biological interactions might influence the carbon burial potential of sinking algae in the deep ocean. Let’s use this simulated data which features sinking rate, microbial abundance, and detritovore abundance as predictors of net carbon sequestration.

3A Load the data, inspect it, and fit a model with a 3-way interaction, Do you meet assumptions?

3B Now the fun part - inference. What do the coefficients tell you?

3C OK - that’s a lot. Use your skills of visualization do tease out what the data is telling us. You can use visreg() or augment() with data_grid() or whatever you would like. Make this model make sense so that you can tell your audience how these three parameters work together to influence carbon burial!


Meta 1.

Where do you think we will go next with models like there?

Meta 2.

In particular, what do you find most interesting about intereaction effects? What do you find most intimidating?

Meta 3.

How do you think you will use complex linear models like these in your own work?

Meta 3.

Now that we have fully explored purely “linear” models, what one question or concern do you still have?

Meta 4.

How much time did this take you, roughly? Again, I’m trying to keep track that these assignments aren’t killer, more than anything.

Meta 5.

Please give yourself a weak/sufficient/strong assessment on this assigment. Feel free to comment on why.