library(readr)
library(dplyr)
library(performance)
library(car)
library(broom)
library(modelr)
library(emmeans)
library(ggplot2)
library(GGally)
library(visreg)
library(piecewiseSEM)
theme_set(theme_bw(base_size = 14))
Complex Linear Models
For today, let’s start with some standard libraries for these analyses.
1. Many Categorical Varibles
We’ll work with the zooplankton depredation dataset for two-way linear model with categorical predictors. This is a blocked experiment, so, each treatment is in each block just once.
<- read_csv("./data/18e2ZooplanktonDepredation.csv") zooplankton
Rows: 15 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): treatment
dbl (2): zooplankton, block
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(data = zooplankton,
aes(x = treatment, y = zooplankton)) +
geom_boxplot()
ggplot(data = zooplankton,
aes(x = block, y = zooplankton)) +
geom_boxplot()
Warning: Continuous x aesthetic
ℹ did you forget `aes(group = ...)`?
Oh. That’s odd. What is up with block? AH HA! It’s continuous. We need to make it discrete to work with it.
<- zooplankton |>
zooplankton mutate(block = as.character(block))
ggplot(data = zooplankton,
aes(x = block, y = zooplankton)) +
geom_boxplot()
There we go. Always check!
1.1 Fit and Assumption Evaluation
Fit is quite easy. We just add one more factor to an lm model!
<- lm(zooplankton ~ treatment + block,
zooplankton_lm data = zooplankton)
We then evaluate residuals almost as usual…
library(performance)
check_model(zooplankton_lm)
We want to look more deeply by fitted versus residuals to inspect for nonadditivity. Let’s check it out.
check_model(zooplankton_lm, check = "linearity") |> plot()
Looks good!
1.2 Coefficients
Let’s look at the coefs!
tidy(zooplankton_lm)
# A tibble: 7 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.42e+ 0 0.313 1.09e+ 1 0.00000433
2 treatmenthigh -1.64e+ 0 0.289 -5.67e+ 0 0.000473
3 treatmentlow -1.02e+ 0 0.289 -3.52e+ 0 0.00781
4 block2 9.57e-16 0.374 2.56e-15 1.00
5 block3 -7.00e- 1 0.374 -1.87e+ 0 0.0979
6 block4 -1.00e+ 0 0.374 -2.68e+ 0 0.0281
7 block5 -3.00e- 1 0.374 -8.03e- 1 0.445
This is still a little odd, though, as our treatments are evaluated in block 1 control. To truly get just the treatment effect, we need to look at the estimated marginal means - the emmeans
! The big thing about emmeans
is that it creates a reference grid based on the blocks. It then calculates the treatment effect averaged over all blocks, rather than just in block 1.
library(emmeans)
<- emmeans(zooplankton_lm, ~treatment)
zoop_em
zoop_em
treatment emmean SE df lower.CL upper.CL
control 3.02 0.205 8 2.548 3.49
high 1.38 0.205 8 0.908 1.85
low 2.00 0.205 8 1.528 2.47
Results are averaged over the levels of: block
Confidence level used: 0.95
1.3 Evaluating treatment differences
Here, emmeans
gets interesting.
contrast(zoop_em, method = "pairwise") |>
confint()
contrast estimate SE df lower.CL upper.CL
control - high 1.64 0.289 8 0.813 2.467
control - low 1.02 0.289 8 0.193 1.847
high - low -0.62 0.289 8 -1.447 0.207
Results are averaged over the levels of: block
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
Note the message that we’ve averaged over the levels of block. You can do any sort of posthoc evaluation (query your model!) as you wanted before. And, you could have done the same workflow for block as well.
1.4 Faded Examples
Given then similarity with models from last week, let’s just jump right into two examples, noting a key difference or two here and there.
To start with, let’s look at gene expression by different types of bees.
<- read.csv("./data/18q07BeeGeneExpression.csv")
bees
#Visualize
________(data=____,
aes(x = type, y = Expression)) +
_________()
#fit
<- __(______ ~ ______ + _____, data=________)
bee_lm
#assumptions
________(______)
#pairwise comparison
emmeans(__________, spec = ~ ________) |>
contrast(method = "pairwise") |>
confint()
Wow, not that different, save adding one more term and the residualPlots.
OK, one more…. repeating an experiment in the intertidal?
<- read.csv("./data/18e3IntertidalAlgae.csv")
intertidal
#Visualize
ggplot(data = _____________,
_____ = ______(x = herbivores, y = sqrtarea)) +
_____________()
#fit
<- __(______ ~ ______ + _____, data=________)
intertidal_lm
#assumptions
________(______)
#pairwise comparison
emmeans(__________, spec = ~ ________) |>
contrast(method = "___________") |>
________()
Did that last one pass the test of non-additivity?
2. Mixing Categorical and Continuous Variables
Combining categorical and continuous variables is not that different from what we have done above. To start with, let’s look at the neanderthal data from class.
<- read.csv("./data/18q09NeanderthalBrainSize.csv")
neand head(neand)
lnmass lnbrain species
1 4.00 7.19 recent
2 3.96 7.23 recent
3 3.95 7.26 recent
4 4.04 7.23 recent
5 4.11 7.22 recent
6 4.10 7.24 recent
We can clearly see both the categorical variable we’re interested in, and the covariate.
To get a preliminary sense of what’s going on here, do some exploratory visualization with ggplot2
why doncha!
ggplot(data = neand,
mapping = aes(x = lnmass, y = lnbrain,
color = species)) +
geom_point() +
stat_smooth(method = "lm")
Now, the CIs are going to be off as this wasn’t all tested in the same model, but you begin to get a sense of whether things are parallel or not, and whether this covariate is important.
What other plots might you produce?
As this is a general linear model, good olde lm()
is still there for us.
<- lm(lnbrain ~ species + lnmass, data=neand) neand_lm
2.1 Testing Assumptions
In addition to the ususal tests, we need to make sure that the slopes really are parallel. We do that by fitting a model with an interaction and testing it (which, well, if there was and interaction, might that be interesting).
First, the usual
check_model(neand_lm)
#And now look at residuals by group/predictors
library(car)
residualPlots(neand_lm, tests=FALSE)
To test the parallel presumption
<- lm(lnbrain ~ species*lnmass, data=neand)
neand_int
tidy(neand_int)
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.25 0.977 4.35 0.000111
2 speciesrecent 1.18 1.06 1.11 0.274
3 lnmass 0.714 0.227 3.14 0.00340
4 speciesrecent:lnmass -0.259 0.248 -1.05 0.303
We can see that there is no interaction, and hence we are a-ok to proceed with the parallel lines model.
2.2 Assessing results
Now, let’s look at our coefficients
tidy(neand_lm)
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 5.19 0.395 13.1 2.74e-15
2 speciesrecent 0.0703 0.0282 2.49 1.75e- 2
3 lnmass 0.496 0.0917 5.41 4.26e- 6
Everything looks like it matters. If you had wanted to do this with a centered covariate, how would things differ? Try it out.
Second, does our model explain much?
r2(neand_lm)
# R2 for Linear Regression
R2: 0.449
adj. R2: 0.418
Not bad - and both types agree fairly well.
Next, we might want to compare covariate adjusted groups and/or look at covariate adjusted means.
library(emmeans)
<- emmeans(neand_lm, specs = ~species)
adj_means
#adjusted means
adj_means
species emmean SE df lower.CL upper.CL
neanderthal 7.272 0.02419 36 7.223 7.321
recent 7.342 0.01250 36 7.317 7.367
Confidence level used: 0.95
Nice! But, what if we wanted to know what value of the covariate we were using? To get those, we need to use say that we are using lnmass
as a reference grid - i.e. we’re marginalizing over it.
<- emmeans(neand_lm, ~species|lnmass)
avg_sp_means
avg_sp_means
lnmass = 4.198:
species emmean SE df lower.CL upper.CL
neanderthal 7.272 0.02419 36 7.223 7.321
recent 7.342 0.01250 36 7.317 7.367
Confidence level used: 0.95
We can then use this for pairwise comparisons, etc.
#comparisons
contrast(avg_sp_means, method="pairwise") |>
confint()
lnmass = 4.2:
contrast estimate SE df lower.CL upper.CL
neanderthal - recent -0.0703 0.0282 36 -0.128 -0.0131
Confidence level used: 0.95
You can also specify other levels of mass, if you are more interested in those for biological relevance. You can even specify multiple levels, or even a smooth curve and plot the results!
We can also see the estimates of the slopes for each species (which is more useful when there is an interaction - as summary
is just fine without.)
#no interaction
emtrends(neand_lm, ~species, var = "lnmass")
species lnmass.trend SE df lower.CL upper.CL
neanderthal 0.496 0.0917 36 0.31 0.682
recent 0.496 0.0917 36 0.31 0.682
Confidence level used: 0.95
2.3 Visualization
Visualization is funny, as you want to make parallel lines and also get the CIs right. Rather than rely on ggplot2
to do this natively, we need to futz around a bit with generating predictions
<- augment(neand_lm, interval="confidence")
neand_predict
head(neand_predict)
# A tibble: 6 × 11
lnbrain species lnmass .fitted .lower .upper .resid .hat .sigma .cooksd
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7.19 recent 4 7.24 7.20 7.28 -0.0536 0.0860 0.0669 0.0222
2 7.23 recent 3.96 7.22 7.18 7.27 0.00624 0.114 0.0676 0.000425
3 7.26 recent 3.95 7.22 7.17 7.27 0.0412 0.122 0.0672 0.0202
4 7.23 recent 4.04 7.26 7.23 7.30 -0.0335 0.0637 0.0673 0.00611
5 7.22 recent 4.11 7.30 7.27 7.33 -0.0782 0.0394 0.0662 0.0196
6 7.24 recent 4.1 7.29 7.27 7.32 -0.0532 0.0418 0.0670 0.00968
# ℹ 1 more variable: .std.resid <dbl>
So, here we have fit values, lower confidence interval, and upper confidence intervals. As we have not fed augment()
any new data, these values line up with our neand
data frame, so can use them immediately.
ggplot(data = neand_predict,
aes(x = lnmass, color = species)) +
geom_point(mapping=aes(y=lnbrain)) +
geom_line(mapping = aes(y=.fitted)) +
geom_ribbon(aes(ymin=.lower,
ymax=.upper,
group = species),
fill="lightgrey",
alpha=0.5,
color = NA) +
theme_bw()
And there we have nice parallel lines with model predicted confidence intervals!
2.4 Examples
I’ve provided two data sets:
1) 18e4MoleRatLayabouts.csv
looking at how caste and mass affect the energy mole rates expend.
18q11ExploitedLarvalFish.csv
looking at how status of a marine area - protected or not - influences the CV around age of maturity of a number of different fish (so, age is a predictor)
Using the following workflow, analyze these data sets.
# Load the data
# Perform a preliminary visualization
# Fit a model
# Test Assumptions and modify model if needed
# Evaluate results
# Post-hocs if you can
# Visualize results
3. Interacting Categorical Variables
Going with that previous intertidal example, if you really looked, it was a factorial design, with multiple treatments and conditions.
<- read.csv("./data/18e3IntertidalAlgae.csv")
intertidal
ggplot(intertidal,
mapping = aes(x = herbivores, y = sqrtarea,
fill = height)) +
geom_boxplot()
3.1 Fit and Assumption Evaluation
We fit factorial models using one of two different notations - both expand to the same thing
<- lm(sqrtarea ~ herbivores + height + herbivores:height, data=intertidal)
intertidal_lm
<- lm(sqrtarea ~ herbivores*height, data=intertidal) intertidal_lm
Both mean the same thing as :
is the interaction. *
just means, expand all the interactions.
But, after that’s done…all of the assumption tests are the same. Try them out.
check_model(intertidal_lm)
3.2 Post-Hocs
Post-hocs are a bit funnier. But not by much. As we have an interaction, let’s look at the simple effects. There are a few ways we can do this
emmeans(intertidal_lm, ~ herbivores + height)
herbivores height emmean SE df lower.CL upper.CL
minus low 32.9 3.86 60 25.20 40.6
plus low 10.4 3.86 60 2.69 18.1
minus mid 22.5 3.86 60 14.77 30.2
plus mid 25.6 3.86 60 17.84 33.3
Confidence level used: 0.95
emmeans(intertidal_lm, ~ herbivores | height)
height = low:
herbivores emmean SE df lower.CL upper.CL
minus 32.9 3.86 60 25.20 40.6
plus 10.4 3.86 60 2.69 18.1
height = mid:
herbivores emmean SE df lower.CL upper.CL
minus 22.5 3.86 60 14.77 30.2
plus 25.6 3.86 60 17.84 33.3
Confidence level used: 0.95
emmeans(intertidal_lm, ~ height | herbivores)
herbivores = minus:
height emmean SE df lower.CL upper.CL
low 32.9 3.86 60 25.20 40.6
mid 22.5 3.86 60 14.77 30.2
herbivores = plus:
height emmean SE df lower.CL upper.CL
low 10.4 3.86 60 2.69 18.1
mid 25.6 3.86 60 17.84 33.3
Confidence level used: 0.95
Notice how each presents the information in a different way. The numbers are not different, they just show you information in different ways. The contrasts each reference grid implies do make s difference, though, in how p-value corrections for FWER is handled. Consider the first case.
emmeans(intertidal_lm, ~ herbivores + height) |>
contrast(method = "pairwise")
contrast estimate SE df t.ratio p.value
minus low - plus low 22.51 5.45 60 4.128 0.0006
minus low - minus mid 10.43 5.45 60 1.913 0.2336
minus low - plus mid 7.36 5.45 60 1.350 0.5350
plus low - minus mid -12.08 5.45 60 -2.215 0.1307
plus low - plus mid -15.15 5.45 60 -2.778 0.0357
minus mid - plus mid -3.07 5.45 60 -0.563 0.9427
P value adjustment: tukey method for comparing a family of 4 estimates
OK, cool. Every comparison is made. But what if we don’t want to do that? What if we just want to see if the herbivore effect differs by height?
emmeans(intertidal_lm, ~ herbivores | height) |>
contrast(method = "pairwise")
height = low:
contrast estimate SE df t.ratio p.value
minus - plus 22.51 5.45 60 4.128 0.0001
height = mid:
contrast estimate SE df t.ratio p.value
minus - plus -3.07 5.45 60 -0.563 0.5758
Notice that, because we’re only doing two contrasts, the correction is not as extreme. This method of contrasts might be more what you are interested in given your question. We can also see how this works visuall.
<- emmeans(intertidal_lm, ~ herbivores | height) |>
cont contrast(method = "pairwise")
plot(cont) +
geom_vline(xintercept = 0, color = "red", lty = 2)
1.3 A Kelpy example
Let’s just jump right in with an example, as you should have all of this well in your bones by now. This was from a kelp, predator-diversity experiment I ran ages ago. Note, some things that you want to be factors might be loaded as
<- read_csv("lab/data/kelp_pred_div_byrnesetal2006.csv")
kelp
## Check and correct for non-character
## variables - (TRIAL!)
## (this is some dplyr fun)
____________
_________
#Visualize
________(data = _____,
aes(Treatment, Porp_Change, fill=Trial)) +
geom______()
#fit
<- __(______ ~ ______ * _____, data=________)
kelp_lm
#assumptions
::_______(__________)
performance
# look at the model
::____________(__________)
broom
# how well did we explain the data?
________(_____________)
#Pairwise Comparison of simple effects
emmeans(_______, specs = ~ __________ |___________) |>
contrast(method = "________")
3.4 Replicated Regression and mixing Categorical and Continuous Variables
So…. the above was actually a replicated regression design. This is a design where you have continuous treatments, but you replicate at each treatment level. There are a few ways to deal with this. Note the column Predator_Diversity
Try this whole thing as a model with diversity as a continuous predictor an interaction by trial. What do you see?
Note - this is a great time to try out emmeans::emtrends()
.
Make a new column that is Predator_Diversity
as a factor. Refit the factorial ANOVA with this as your treatment. Now try a pairwise test. What do you see? Visualize both. How do they give you different information? What would you conclude?
4. Interaction Effects in MLR
Let’s use the keeley fire severity plant richness data to see it Multiple Linear Regression with interaction effects action.
data(keeley)
ggpairs(keeley)
For our purposes, we’ll focus on fire severity and plot age as predictors of richness.
4.1 Explore and Model
What if there was an interaction here? One reasonable hypothesis is that fires affect stands of different ages differently. So, let’s look at richness as a function of age and fire severity. First, we’ll visualize, then fit a model.
ggplot(keeley,
aes(x = age, y = rich, color = firesev)) +
geom_point(size = 2) +
scale_color_viridis_c(option = "B") +
facet_wrap(~cut_number(firesev, 4))
Huh. Kinda looks like age doesn’t have any effect until high fire severity.
<- lm(rich ~ age * firesev, data=keeley) keeley_mlr_int
4.2 Assumptions
The assumptions here are the same as for MLR - normality of residuals, homoscedasticity, the relationship should not be driven by outliers alone, and, of course, lack of collinearity. Let’s explore the first few.
check_model(keeley_mlr_int)
Also, excellent. Now, the vif…
vif(keeley_mlr_int)
there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif
age firesev age:firesev
7.967244 4.879394 15.800768
Oh hey! It tells us there’s a problem with interactions. I like performance
’s solution even more.
check_collinearity(keeley_mlr_int)
Model has interaction terms. VIFs might be inflated.
You may check multicollinearity among predictors of a model without
interaction terms.
# Check for Multicollinearity
Low Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
firesev 4.88 [ 3.52, 6.98] 2.21 0.20 [0.14, 0.28]
Moderate Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
age 7.97 [ 5.62, 11.51] 2.82 0.13 [0.09, 0.18]
High Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
age:firesev 15.80 [10.95, 23.01] 3.98 0.06 [0.04, 0.09]
Let’s look at the correlation between the predictors to see if there really is a problem.
%>% select(firesev, age) %>%
keeley mutate(int = firesev*age) %>%
cor()
firesev age int
firesev 1.0000000 0.4538654 0.7743624
age 0.4538654 1.0000000 0.8687952
int 0.7743624 0.8687952 1.0000000
OK, the interaction is pretty highly correlated with age. But, well, again, not really a problem unless it is for the algorithm you are working with. We can instead look at the age-firesev correlation, which is only 0.45. We’re fine.
4.3 Centering?
Well, if we did want to worry about this - or were using an algorithm that choked - one way to break that correlation is to center our predictors before using them in the relationship. It changes the meaning of the coefficients, but, the model results are the same. Let’s try that here.
<- keeley %>%
keeley mutate(firesev_c = firesev - mean(firesev),
age_c = age - mean(age))
<- lm(rich ~ age_c * firesev_c, data=keeley)
keeley_mlr_int_c
vif(keeley_mlr_int_c)
there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif
age_c firesev_c age_c:firesev_c
1.261883 1.261126 1.005983
We see the vif is now small. Moreover…
%>% select(firesev_c, age_c) %>%
keeley mutate(int = firesev_c*age_c) %>%
cor()
firesev_c age_c int
firesev_c 1.00000000 0.45386536 -0.06336909
age_c 0.45386536 1.00000000 -0.06792114
int -0.06336909 -0.06792114 1.00000000
Look at that correlation melt away. We’re still explaining variation in the same way. Let’s see how the coefficients differ.
coef(keeley_mlr_int)
(Intercept) age firesev age:firesev
42.9352358 0.8268119 3.1056379 -0.2302444
coef(keeley_mlr_int_c)
(Intercept) age_c firesev_c age_c:firesev_c
51.3790453 -0.2242540 -2.7809451 -0.2302444
So, in the no interaction model, the intercept is the richness when age and firesev are 0. The age and firesev effects are their effect when the other is 0, respectively. And the interaction is how they modify one another. Note that in the centered model, the interaction is the same. But, now the intercept is the richness at the average level of both predictors, and the additive predictors are the effects of each predictor at the average level of the other.
4.4 Evaluate
Given the lack of difference between the two models, we’re going to work with the original one. We’ve seen the coefficients and talked about their interpretation. The rest of the output can be interpreted as usual for a linear model.
tidy(keeley_mlr_int)
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 42.9 7.85 5.47 0.000000437
2 age 0.827 0.313 2.64 0.00981
3 firesev 3.11 1.86 1.67 0.0992
4 age:firesev -0.230 0.0641 -3.59 0.000548
Not that it doesn’t look like fire severity has an effect when age is 0. But what does age = 0 really mean? This is one reason by often the centered model is more interpretable.
So, how do we understand the interaction?
We can do things like calculate out the effect of one predictor at different levels of the other…
tibble(age = 0:10) %>%
mutate(fire_effect = coef(keeley_mlr_int)[3] + coef(keeley_mlr_int)[4]*age)
# A tibble: 11 × 2
age fire_effect
<int> <dbl>
1 0 3.11
2 1 2.88
3 2 2.65
4 3 2.41
5 4 2.18
6 5 1.95
7 6 1.72
8 7 1.49
9 8 1.26
10 9 1.03
11 10 0.803
But this can be unsatisfying, and we have to propogate error and - ECH. Why not try and create something more intuitive…
4.5 Visualize Results
Fundamentally, we have to create visualizations that look at different levels of both predictors somehow. There are a number of strategies.
4.5.1 Visreg
We can begin by looking at the effect of one variable at different levels of the other.
visreg(keeley_mlr_int, "age", "firesev", gg=TRUE)
Here we see a plot where the points are adjusted for different levels of fireseverity - an even split between three levels - and then we see the resulting fit lines. We can of course reverse this, if we’re ore interested in the other effect.
visreg(keeley_mlr_int, "firesev", "age", gg=TRUE)
Both tell a compelling story of changing slopes that are easily understood. For example, in the later, fire has a stronger effect on older stands.
4.5.2 Fitted Model under Different Conditions
We might want to roll our own, however, and see the plot with different slices of the data. Fortunately, we can use the identical strategy as our MLR above - only now the slope changes. So, using the same strategy as above for making predicted data frames:
<- data_grid(keeley,
k_int_explore firesev = seq_range(firesev, 100),
age = seq_range(age, 4)) |>
augment(keeley_mlr_int, newdata = _, interval = "confidence") |>
rename(rich = .fitted)
Notice the lack of difference. We can now play with how we want to plot this. Here’s one, but you can come up with others easily.
ggplot(keeley,
aes(x = firesev, y = rich, color = age)) +
geom_point() +
geom_line(data = k_int_explore, aes(group = age)) +
geom_ribbon(data = k_int_explore, aes(ymin = .lower, ymax = .upper), alpha = 0.2, color = NA) +
facet_wrap(~cut_interval(age,4)) +
scale_color_viridis_c(option = "D")
4.5.3 Surface Plot of Model
Finally, a surface plot can be very helpful - particularly under a condition of many interactions. To make a surface plot, we can look at the model and exract values at the intersection of multiple predictors and then visualize a grid where color or some other property reflects predicted values. It lets us see more nuance about an interaction. Let’s start with getting our surface values for a 100x100 grid of predictors.
<- data_grid(keeley,
keeley_surface age = seq_range(age, 100),
firesev = seq_range(firesev, 100)) |>
augment(keeley_mlr_int, newdata = _) |>
rename(rich = .fitted)
Now, we can plot this grid using geom_raster
or geom_tile
(raster will smooth things a bit).
ggplot(keeley_surface,
aes(x = age, y = firesev, fill = rich)) +
geom_raster() +
scale_fill_viridis_c(option = "B")
Now we can see the landscape more clearly. old stands with high fire severity have low richness. However, either youth or low fire severity means higher richness. This kind of technique can be great with higher order interactions. For example, you can facet by up to two variables (or more) to examine higher order interactions, etc.
Also note, if you want to be lazy, you can do this with visreg for two variables
visreg2d(keeley_mlr_int, "age", "firesev")
Last, what about a REAL surface? We can use plotly
to help us here.
library(plotly)
<- plot_ly(x=~age, y=~firesev, z=~rich,
surf_plotly type="scatter3d", mode="surface",
data = keeley_surface, color = ~firesev)
surf_plotly
And if we want to add data points to it…
|>
surf_plotly add_markers(data = keeley,
x=~age, y=~firesev, z=~rich,
color = I("black"))
4.6 Examples
OK, here are two datasets to work with. Choose one, and play around with possible interactions using ggplot. Then fit and evaluate a model!
planktonSummary.csv
showing plankton from Lake Biakal (thanks, Stephanie Hampton). Evluate how Chlorophyll (CHLFa
) is affected by other predictors.Data on plant species richness in grazed salt marsh meadows in Finland. Meadows are grazed or ungrazed. see
?piecewiseSEM::meadows
to learn more. To load:
data(meadows, package = "piecewiseSEM")
- If you are tired of biology, see
data(tips, package = "reshape")
which looks at factors which might influence the tips given to one waiter over a few months. See?reshape:tips
.
# Load the data
# Perform a preliminary visualization. Play with this and choose two predictors
# Fit a MLR model with an interaction
# Test Asssumptions and modify model if needed
# Evaluate results
# Visualize results