Combining Multiple Categorical Variables
Analyzing a Multi-way Model
Replicating Categorical Variable Combinations: Factorial Models
The Implications of Interaction Effects
Units placed across a lake so that 1 set of each treatment was ’blocked’ together
yijk=β0+∑βixi+∑βjxj+ϵijk
ϵijk∼N(0,σ2)
xi=0,1 xj=0,1
i = treatment type 1, j = treatment type 2, k = replicate β0 = reference treatment combination (i.e, control block 1)
yijk=β0+∑βixi+∑βjxj+ϵijk
ϵijk∼N(0,σ2)
xi=0,1 xj=0,1
i = treatment type 1, j = treatment type 2, k = replicate β0 = reference treatment combination (i.e, control block 1)
Or, with matrices, a general linear model...
\boldsymbol{Y} = \boldsymbol{\beta X} + \boldsymbol{\epsilon}
Data
treatment | zooplankton | block |
---|---|---|
control | 4.1 | 1 |
low | 2.2 | 1 |
high | 1.3 | 1 |
control | 3.2 | 2 |
low | 2.4 | 2 |
high | 2.0 | 2 |
Data Prepped for Model
(Intercept) | treatmenthigh | treatmentlow | block2 | block3 | block4 | block5 |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 0 | 0 |
1 | 1 | 0 | 1 | 0 | 0 | 0 |
Least Squares
zoop_lm <- lm(zooplankton ~ treatment + block, data=zoop)
Likelihood
zoop_glm <- glm(zooplankton ~ treatment + block, data=zoop, family = gaussian(link = "identity"))
Bayesian
zoop_brm <- brm(zooplankton ~ treatment + block, data=zoop, family = gaussian(link = "identity"), chains = 2)
Independence of data points
Normality within groups (of residuals)
No relationship between fitted and residual values
Homoscedasticity (homogeneity of variance) of groups
Additivity of Treatments
We now want to look at both sets of categories to evaluate HOV
Levene Test for Treatment
term | df | statistic | p.value |
---|---|---|---|
group | 2 | 0.3315789 | 0.7241614 |
12 |
Levene Test for Block
term | df | statistic | p.value |
---|---|---|---|
group | 4 | 0.4241486 | 0.7880527 |
10 |
Our model is y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}
But, if A and B are non-additive, results are incorrect.
Our model is y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}
But, if A and B are non-additive, results are incorrect.
Our model is y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}
But, if A and B are non-additive, results are incorrect.
We don't have the DF with n=1 per treatment combination to calculate an interaction, so...
Assume a model of y_{ij} = \mu + \alpha_i + \beta_j + \lambda\alpha_i\beta_j
Our model is y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}
But, if A and B are non-additive, results are incorrect.
We don't have the DF with n=1 per treatment combination to calculate an interaction, so...
Assume a model of y_{ij} = \mu + \alpha_i + \beta_j + \lambda\alpha_i\beta_j
We can then test for SS_{AB} using \lambda\alpha_i\beta_j
Test stat Pr(>|Test stat|)block Tukey test 0.4742 0.6354
Combining Multiple Categorical Variables
Analyzing a Multi-way Model
Replicating Categorical Variable Combinations: Factorial Models
The Implications of Interaction Effects
Which categories are associated with variability in our response?
How much variability are we explaining?
Do we want to compare to simpler models for predictive ability?
Comparison of Means
TreatmentHo: \mu_{i1} = \mu{i2} = \mu{i3} = ...
Block Ho: \mu_{j1} = \mu{j2} = \mu{j3} = ...
i.e., The variane due to each treatment type is no different than noise
SS_{Total} = SS_{Between A} + SS_{Between B} + SS_{Within}
Factors are Orthogonal and Balanced, so, Model SS can be split
OR we compare y ~ a versus y ~ b to see if y ~ a + b has a higher likelihood
term | df | sumsq | meansq | F | p.value |
---|---|---|---|---|---|
treatment | 2 | 6.857333 | 3.428667 | 16.365951 | 0.0014881 |
block | 4 | 2.340000 | 0.585000 | 2.792363 | 0.1010308 |
Residuals | 8 | 1.676000 | 0.209500 |
R^2 = 0.846
term | df | sumsq | meansq | F | p.value |
---|---|---|---|---|---|
treatment | 2 | 6.857333 | 3.428667 | 16.365951 | 0.0014881 |
block | 4 | 2.340000 | 0.585000 | 2.792363 | 0.1010308 |
Residuals | 8 | 1.676000 | 0.209500 |
R^2 = 0.846
term | LR Chisq | df | p.value |
---|---|---|---|
treatment | 32.73190 | 2 | 0.0000001 |
block | 11.16945 | 4 | 0.0247242 |
zoop_only_trt <- lm(zooplankton ~ treatment, data = zoop)zoop_only_block <- lm(zooplankton ~ block, data = zoop)zoop_nuts_matter <- lm(zooplankton ~ I(treatment !="control")+ block, data = zoop)zoop_null <- lm(zooplankton ~ 1, data = zoop)
Modnames | K | AICc | Delta_AICc | AICcWt | |
---|---|---|---|---|---|
2 | Only trt | 4 | 34.80170 | 0.000000 | 0.9781236 |
5 | Null | 2 | 42.74210 | 7.940404 | 0.0184568 |
4 | Nuts or Control | 7 | 46.49203 | 11.690333 | 0.0028305 |
1 | All | 8 | 49.69355 | 14.891854 | 0.0005710 |
3 | Only Block | 6 | 56.60710 | 21.805405 | 0.0000180 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
treatmentcontrol | 3.42 | 0.3126766 | 10.9378182 | 0.0000043 |
treatmenthigh | 1.78 | 0.3126766 | 5.6927826 | 0.0004582 |
treatmentlow | 2.40 | 0.3126766 | 7.6756619 | 0.0000588 |
block2 | 0.00 | 0.3737200 | 0.0000000 | 1.0000000 |
block3 | -0.70 | 0.3737200 | -1.8730599 | 0.0979452 |
block4 | -1.00 | 0.3737200 | -2.6757998 | 0.0281084 |
block5 | -0.30 | 0.3737200 | -0.8027399 | 0.4453163 |
Component-Residual Plots take examine unique effect of one treatment after removing influence of the other.
contrast | estimate | lower.HPD | upper.HPD |
---|---|---|---|
control - high | 1.6222387 | 0.9075445 | 2.312269 |
control - low | 1.0123920 | 0.3131533 | 1.743134 |
high - low | -0.6112915 | -1.3635940 | 0.095609 |
Can use whatever contrast style you'd like, and can do this for any set of categories
Combining Multiple Categorical Variables
Analyzing a Multi-way Model
Replicating Categorical Variable Combinations: Factorial Models
The Implications of Interaction Effects
Until now, we have assumed factors combine additively - the effect of one is not dependent on the effect of the other
BUT - what if the effect of one factor depends on another?
Until now, we have assumed factors combine additively - the effect of one is not dependent on the effect of the other
BUT - what if the effect of one factor depends on another?
This is an INTERACTION and is quite common
Until now, we have assumed factors combine additively - the effect of one is not dependent on the effect of the other
BUT - what if the effect of one factor depends on another?
This is an INTERACTION and is quite common
Biology: The science of "It depends..."
Until now, we have assumed factors combine additively - the effect of one is not dependent on the effect of the other
BUT - what if the effect of one factor depends on another?
This is an INTERACTION and is quite common
Biology: The science of "It depends..."
This is challenging to think about and visualize, but if you can master it, you will go far!
Note: You can have as many treatment types or observed category combinations as you want (and then 3-way, 4-way, etc. interactions)
A Tukey Non-Additivity Test would Scream at us
y_{ijk} = \beta_{0} + \sum \beta_{i}x_{i} + \sum \beta_{j}x_{j} + \sum \beta_{ij}x_{ij} + \epsilon_{ijk}
\epsilon_{ijk} \sim N(0, \sigma^{2} ) x_{i} = 0,1, x_{j} = 0,1, x_{ij} = 0,1
Note the new last term
Deviation due to treatment combination
y_{ijk} = \beta_{0} + \sum \beta_{i}x_{i} + \sum \beta_{j}x_{j} + \sum \beta_{ij}x_{ij} + \epsilon_{ijk}
\epsilon_{ijk} \sim N(0, \sigma^{2} ) x_{i} = 0,1, x_{j} = 0,1, x_{ij} = 0,1
Note the new last term
Deviation due to treatment combination
\boldsymbol{Y} = \boldsymbol{\beta X} + \boldsymbol{\epsilon}
(Intercept) | heightmid | herbivoresplus | heightmid:herbivoresplus |
---|---|---|---|
1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 |
1 | 1 | 0 | 0 |
1 | 1 | 1 | 1 |
Least Squares
graze_int <- lm(sqrtarea ~ height + herbivores + height:herbivores, data=algae)## ORgraze_int <- lm(sqrtarea ~ height*herbivores, data=algae)
Least Squares
graze_int <- lm(sqrtarea ~ height + herbivores + height:herbivores, data=algae)## ORgraze_int <- lm(sqrtarea ~ height*herbivores, data=algae)
Likelihood
graze_int_glm <- glm(sqrtarea ~ height*herbivores, data=algae, family = gaussian(link = "identity"))
Bayes
graze_int_brm <- brm(sqrtarea ~ height*herbivores, data=algae, family = gaussian(link = "identity"), chains = 2)
Combining Multiple Categorical Variables
Analyzing a Multi-way Model
Replicating Categorical Variable Combinations: Factorial Models
The Implications of Interaction Effects
SS_{Total} = SS_{A} + SS_{B} + SS_{AB} +SS_{Error}
SS_{AB} = n\sum_{i}\sum_{j}(\bar{Y_{ij}} - \bar{Y_{i}}- \bar{Y_{j}} - \bar{Y})^{2} df=(i-1)(j-1)
SS_{Total} = SS_{A} + SS_{B} + SS_{AB} +SS_{Error}
SS_{AB} = n\sum_{i}\sum_{j}(\bar{Y_{ij}} - \bar{Y_{i}}- \bar{Y_{j}} - \bar{Y})^{2} df=(i-1)(j-1)
SS_{Total} = SS_{A} + SS_{B} + SS_{AB} +SS_{Error}
SS_{AB} = n\sum_{i}\sum_{j}(\bar{Y_{ij}} - \bar{Y_{i}}- \bar{Y_{j}} - \bar{Y})^{2} df=(i-1)(j-1)
SS_{Total} = SS_{A} + SS_{B} + SS_{AB} +SS_{Error}
SS_{AB} = n\sum_{i}\sum_{j}(\bar{Y_{ij}} - \bar{Y_{i}}- \bar{Y_{j}} - \bar{Y})^{2} df=(i-1)(j-1)
LR Chisq | Df | Pr(>Chisq) | |
---|---|---|---|
height | 0.3740858 | 1 | 0.5407855 |
herbivores | 6.3579319 | 1 | 0.0116858 |
height:herbivores | 11.0029142 | 1 | 0.0009097 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 32.91450 | 3.855532 | 8.536955 | 0.0000000 |
heightmid | -10.43090 | 5.452546 | -1.913034 | 0.0605194 |
herbivoresplus | -22.51075 | 5.452546 | -4.128484 | 0.0001146 |
heightmid:herbivoresplus | 25.57809 | 7.711064 | 3.317064 | 0.0015486 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 32.91450 | 3.855532 | 8.536955 | 0.0000000 |
heightmid | -10.43090 | 5.452546 | -1.913034 | 0.0605194 |
herbivoresplus | -22.51075 | 5.452546 | -4.128484 | 0.0001146 |
heightmid:herbivoresplus | 25.57809 | 7.711064 | 3.317064 | 0.0015486 |
Intercept chosen as basal condition (low, herbivores -)
Changing height to high is associated with a loss of 10 units of algae relative to low/-
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 32.91450 | 3.855532 | 8.536955 | 0.0000000 |
heightmid | -10.43090 | 5.452546 | -1.913034 | 0.0605194 |
herbivoresplus | -22.51075 | 5.452546 | -4.128484 | 0.0001146 |
heightmid:herbivoresplus | 25.57809 | 7.711064 | 3.317064 | 0.0015486 |
Intercept chosen as basal condition (low, herbivores -)
Changing height to high is associated with a loss of 10 units of algae relative to low/-
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 32.91450 | 3.855532 | 8.536955 | 0.0000000 |
heightmid | -10.43090 | 5.452546 | -1.913034 | 0.0605194 |
herbivoresplus | -22.51075 | 5.452546 | -4.128484 | 0.0001146 |
heightmid:herbivoresplus | 25.57809 | 7.711064 | 3.317064 | 0.0015486 |
Intercept chosen as basal condition (low, herbivores -)
Changing height to high is associated with a loss of 10 units of algae relative to low/-
Adding herbivores is associated with a loss of 22 units of algae relative to low/-
BUT - if you add herbivores and mid, that's also associated with an increase of 25 units of algae relative to low/-
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 32.91450 | 3.855532 | 8.536955 | 0.0000000 |
heightmid | -10.43090 | 5.452546 | -1.913034 | 0.0605194 |
herbivoresplus | -22.51075 | 5.452546 | -4.128484 | 0.0001146 |
heightmid:herbivoresplus | 25.57809 | 7.711064 | 3.317064 | 0.0015486 |
Intercept chosen as basal condition (low, herbivores -)
Changing height to high is associated with a loss of 10 units of algae relative to low/-
Adding herbivores is associated with a loss of 22 units of algae relative to low/-
BUT - if you add herbivores and mid, that's also associated with an increase of 25 units of algae relative to low/-
NEVER TRY AND INTERPRET ADDITIVE EFFECTS ALONE WHEN AN INTERACTION IS PRESENT
that way lies madness
contrast estimate SE df t.ratio p.value minus - plus 9.72 3.86 60 2.521 0.0144 Results are averaged over the levels of: height
contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|
low minus - mid minus | 10.430905 | 5.452546 | 60 | 1.913034 | 0.0605194 |
low minus - low plus | 22.510748 | 5.452546 | 60 | 4.128484 | 0.0001146 |
low minus - mid plus | 7.363559 | 5.452546 | 60 | 1.350481 | 0.1819337 |
mid minus - low plus | 12.079843 | 5.452546 | 60 | 2.215450 | 0.0305355 |
mid minus - mid plus | -3.067346 | 5.452546 | 60 | -0.562553 | 0.5758352 |
low plus - mid plus | -15.147189 | 5.452546 | 60 | -2.778003 | 0.0072896 |
contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|
low minus - mid minus | 10.430905 | 5.452546 | 60 | 1.913034 | 0.0605194 |
low minus - low plus | 22.510748 | 5.452546 | 60 | 4.128484 | 0.0001146 |
low minus - mid plus | 7.363559 | 5.452546 | 60 | 1.350481 | 0.1819337 |
mid minus - low plus | 12.079843 | 5.452546 | 60 | 2.215450 | 0.0305355 |
mid minus - mid plus | -3.067346 | 5.452546 | 60 | -0.562553 | 0.5758352 |
low plus - mid plus | -15.147189 | 5.452546 | 60 | -2.778003 | 0.0072896 |
It Depends is a rule in biology
Context dependent interactions everywhere
Using categorical predictors in a factorial design is an elegant way to see interactions without worrying about shapes of relationships
BUT - it all comes down to a general linear model! And the same inferential frameworks we have been dealing with since day 1
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |