ANOVA

For this lab, see the etherpad at https://etherpad.wikimedia.org/p/607-anova

1.One-Way ANOVA Model

We’ll start today with the dataset 15e1KneesWhoSayNight.csv about an experiment to help resolve jetlag by having people shine lights at different parts of themselves to try and shift their internal clocks.

knees <- read.csv("./data/10/15e1KneesWhoSayNight.csv")

We can see the outcomes with ggplot2

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.3.2

ggplot(knees, mapping=aes(x=treatment, y=shift)) +
  stat_summary(color="red", size=1.3) +
    geom_point(alpha=0.7) +
  theme_bw(base_size=17)

1.1 LM, AOV, and Factors

As the underlying model of ANOVA is a linear one, we fit ANOVAs using lm() just as with linear regression.

knees <- read.csv("./data/10/15e1KneesWhoSayNight.csv")

knees_lm <- lm(shift ~ treatment, data=knees)

Now, there are two things to notice here. One, note that treatment is a factor. Well, it could be a factor or a character (for those of you that used read_csv() from readr). If it is not, because we are using lm(), it will be fit like a linear regression. So, beware!

There is an ANOVA-specific model fitting function - aov.

knees_aov <- aov(shift ~ treatment, data=knees)

It’s ok, I guess, and works with a few functions that lm() objects do not. But, in general, I find it too limiting. You can’t see coefficients, etc. Boooooring.

1.2 Assumption Evaluation

Because this is an lm, we can check our assumptions as before - with one new one. First, some oldies but goodies.

#The whole par thing lets me make a multi-panel plot
par(mfrow=c(2,2))
plot(knees_lm, which=c(1,2,5))
par(mfrow=c(1,1))

Now, the residuals v. fitted lets us see how the residuals are distributed by treatment, but I often find it insufficient. I could roll my own plot of resudials versus treatment, but, there’s a wonderful package called car - which is from the book Companion to Applied Regression by John Fox. I recommend it highly! It has a function in it called residualPlots() which is useful here.

library(car)
residualPlots(knees_lm)

## Warning in residualPlots.default(model, ...): No possible lack-of-fit tests

Note how it both does fitted v. residuals but also a boxplot by treatment. Handy, no?

1.3 Assumption Evaluation

OK, so, let’s see the ANOVA table! With the function….anova()!

anova(knees_lm)

## Analysis of Variance Table
## 
## Response: shift
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## treatment  2 7.2245  3.6122  7.2894 0.004472 **
## Residuals 19 9.4153  0.4955                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now….this is a type I sums of squares test. Which is fine for a 1-way ANOVA. If you want to start getting into the practice of using type II, car provides a function Anova() - note the capital A - which defaults to type II and I use instead. In fact, I use it all the time, as it handles a wide set of different models.

Anova(knees_lm)

## Anova Table (Type II tests)
## 
## Response: shift
##           Sum Sq Df F value   Pr(>F)   
## treatment 7.2245  2  7.2894 0.004472 **
## Residuals 9.4153 19                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here it matters not a whit as you get the same table.

Note, if you had done this using likelihood, you could have done all of this with a LR Chisq also using Anova()

knees_glm <- glm(shift ~ treatment, data=knees,
                 family=gaussian())

Anova(knees_glm)

## Analysis of Deviance Table (Type II tests)
## 
## Response: shift
##           LR Chisq Df Pr(>Chisq)    
## treatment   14.579  2  0.0006827 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1.4 Post-hoc Tests

So, there are a lot of things we can do with a fit model

1.4.0 Summary Output

summary(knees_lm)

## 
## Call:
## lm(formula = shift ~ treatment, data = knees)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27857 -0.36125  0.03857  0.61147  1.06571 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -0.30875    0.24888  -1.241  0.22988   
## treatmenteyes -1.24268    0.36433  -3.411  0.00293 **
## treatmentknee -0.02696    0.36433  -0.074  0.94178   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7039 on 19 degrees of freedom
## Multiple R-squared:  0.4342, Adjusted R-squared:  0.3746 
## F-statistic: 7.289 on 2 and 19 DF,  p-value: 0.004472

Ew. That’s the treatment contrasts. Not super useful. To not get an intercept term, you need to refit the model without the intercept. You can fit a whole new model with -1 in the model formulation. Or, as I like to do to ensure I don’t frak anything up, you can update() your model. Just use . to signify what was there before.

knees_lm_no_int <- update(knees_lm, formula = . ~ . -1)

summary(knees_lm_no_int)

## 
## Call:
## lm(formula = shift ~ treatment - 1, data = knees)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27857 -0.36125  0.03857  0.61147  1.06571 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## treatmentcontrol  -0.3087     0.2489  -1.241    0.230    
## treatmenteyes     -1.5514     0.2661  -5.831 1.29e-05 ***
## treatmentknee     -0.3357     0.2661  -1.262    0.222    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7039 on 19 degrees of freedom
## Multiple R-squared:  0.6615, Adjusted R-squared:  0.6081 
## F-statistic: 12.38 on 3 and 19 DF,  p-value: 0.0001021

OK - that makes more sense. We see means and if they are different from 0. But….what about post-hoc tests

1.4.1 A Priori Contrasts

If you have a priori contrasts, you can use the constrat library to test them. You give contrast an a list and a b list. Then we get all comparisons of a v. b, in order. It’s not great syntactically, but, it lets you do some pretty creative things.

library(contrast)
contrast(knees_lm, 
         a = list(treatment = "control"), 
         b = list(treatment = "eyes"))

## lm model parameter contrast
## 
##   Contrast      S.E.     Lower    Upper    t df Pr(>|t|)
## 1 1.242679 0.3643283 0.4801306 2.005227 3.41 19   0.0029

1.4.2 Tukey’s HSD

Meh. 9 times out of 10 we want to do something more like a Tukey Test. There is a TukeyHSD function that works on aov objects, but, if you’re doing anything with an lm, it borks on you. There’s a wonderful package called lsmeans, which stands for Least Square Means. It is wonderful as it’s designed to work with ANOVA and ANCOVA models with complicated structures such that, for post-hocs, it adjusts to the mean or median level of all other factors. Very handy. One merely tells it what

library(lsmeans)
contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "tukey")

##  contrast          estimate        SE df t.ratio p.value
##  control - eyes  1.24267857 0.3643283 19   3.411  0.0079
##  control - knee  0.02696429 0.3643283 19   0.074  0.9970
##  eyes - knee    -1.21571429 0.3762767 19  -3.231  0.0117
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

We don’t need to worry about many of the fancier things that lsmeans does for the moment - those will become more useful with other models. But for now, Tukey test!

1.4.2 Dunnet’s Test

We can similarly use this to look at a Dunnett’s test, which compares against the control

contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "dunnett")

##  contrast          estimate        SE df t.ratio p.value
##  eyes - control -1.24267857 0.3643283 19  -3.411  0.0057
##  knee - control -0.02696429 0.3643283 19  -0.074  0.9927
## 
## P value adjustment: dunnettx method for 2 tests

Note, if the “control” had not been the first treatment, you can either re-order the factor using forcats or just specify which of the levels is the control. For example, eyes is the second treatment. Let’s make it our new reference.

contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "dunnett", ref=2)

##  contrast       estimate        SE df t.ratio p.value
##  control - eyes 1.242679 0.3643283 19   3.411  0.0057
##  knee - eyes    1.215714 0.3762767 19   3.231  0.0085
## 
## P value adjustment: dunnettx method for 2 tests

1.4.2 Bonferroni Correction and FDR

Let’s say you wanted to do all pairwise tests, but, compare using a Bonferroni correction or FDR. Or none! No problem! There’s an adjust argument

contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "tukey", adjust="bonferroni")

##  contrast          estimate        SE df t.ratio p.value
##  control - eyes  1.24267857 0.3643283 19   3.411  0.0088
##  control - knee  0.02696429 0.3643283 19   0.074  1.0000
##  eyes - knee    -1.21571429 0.3762767 19  -3.231  0.0132
## 
## P value adjustment: bonferroni method for 3 tests

contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "tukey", adjust="fdr")

##  contrast          estimate        SE df t.ratio p.value
##  control - eyes  1.24267857 0.3643283 19   3.411  0.0066
##  control - knee  0.02696429 0.3643283 19   0.074  0.9418
##  eyes - knee    -1.21571429 0.3762767 19  -3.231  0.0066
## 
## P value adjustment: fdr method for 3 tests

contrast(lsmeans(knees_lm, specs ="treatment"),
        method = "tukey", adjust="none")

##  contrast          estimate        SE df t.ratio p.value
##  control - eyes  1.24267857 0.3643283 19   3.411  0.0029
##  control - knee  0.02696429 0.3643283 19   0.074  0.9418
##  eyes - knee    -1.21571429 0.3762767 19  -3.231  0.0044

1.4.4 Bayesian Post-Hocs

So you want ot be fancy and go Bayesian? Cool. BANOVA is a great way to do post-hocs without worrying about type S error. We can fit our BANOVA model simply using stan_glm. Since we’re going to want to work with the treatment means, it behooves us to remove the intercept for easier manipulation later. But you can do what you’d like.

library(rstanarm)
knees_banova <- stan_glm(shift ~ treatment -1,
                         data = knees,
                         family=gaussian())

knees_chains <- as.data.frame(knees_banova)

We can now do two things. First, see the SD between treatment relative to the residual SD. Let’s use dplyr to help us out

library(dplyr)

#The first three columns are are treatments
head(knees_chains)

##   treatmentcontrol treatmenteyes treatmentknee     sigma
## 1      -0.10916165     -1.871153   -0.63523132 0.6762152
## 2       0.01663867     -1.327950   -0.07396708 1.0970584
## 3      -0.37744111     -1.621097    0.08668951 0.6594176
## 4      -0.25766107     -1.448610   -0.56602960 0.6400642
## 5      -0.11669737     -1.676854   -0.23338487 0.7713831
## 6      -0.43990681     -1.346885   -0.54422339 0.8428178

#now get a column that is between treatment variability
knees_chains <- knees_chains %>% 
  rowwise() %>%
  mutate(trt_sigma = sd(c(treatmentcontrol, treatmenteyes, treatmentknee))) %>%
  ungroup()

#Between Treatment variability
quantile(knees_chains$trt_sigma)

##         0%        25%        50%        75%       100% 
## 0.07864555 0.60236715 0.72361976 0.84910819 1.43273017

#Now, the residual SD
quantile(knees_chains$sigma)

##        0%       25%       50%       75%      100% 
## 0.4476528 0.6605394 0.7359900 0.8223701 1.5042217

You can plot this and do much more here as well.

For posthocs, simply look at the difference btween columns

#A Dunnett's test
quantile(knees_chains$treatmentcontrol - knees_chains$treatmenteyes, prob=c(0.1, 0.9))

##       10%       90% 
## 0.7410063 1.6896455

quantile(knees_chains$treatmentcontrol - knees_chains$treatmentknee, prob=c(0.1, 0.9))

##        10%        90% 
## -0.4497099  0.4759009

We can see that eyes might be different from the control, but not so much knees.

Bayesians, get on it. The rest - enjoy.

1.5 Faded Examples

Let’s try three ANOVAs! First - do landscape characteristics affect the number of generations plant species can exist before local extinction?

plants <- read.csv("./data/10/15q01PlantPopulationPersistence.csv")

#Visualize
qplot(treatment, generations, data=plants, geom="boxplot")

#fit
plant_lm <- lm(generations ~ treatment, data=plants)

#assumptions
plot(plant_lm, which=c(1,2,4,5))

#ANOVA
anova(plant_lm)

#Tukey's HSD
contrast(lsmeans(plant_lm, spec = "treatment"), method = "tukey")

Second, how do different host types affect nematode longevity?

worms <- read.csv("./data/10/15q19NematodeLifespan.csv")

#Visualize
qplot(treatment, lifespan, data=____, geom="____")

#fit
worm_lm <- lm(______ ~ ______, data=worms)

#assumptions
plot(______, which=c(1,2,4,5))

#ANOVA
anova(______)

#Tukey's HSD
contrast(lsmeans(______, spec = "______"), method = "tukey")

And last, how about how number of genotypes affect eelgrass productivity. Note, THERE IS A TRAP HERE. Look at your dataset before you do ANYTHING.

eelgrass <- read.csv("./data/10/15q05EelgrassGenotypes.csv")

#Visualize
________(treatment.genotypes, shoots, data=____, geom="____")

#fit
eelgrass_lm <- __(______ ~ ______, data=________)

#assumptions
________(______, which=c(1,2,4,5))

#ANOVA
________(______)

#Tukey's HSD
contrast(________(______, spec = "______"), method = "________")

2. Two-Way ANOVA

We’ll work with the zooplankton depdredation dataset for two-way ANOVA. This is a blocked experiment, so, each treatment is in each block just once.

zooplankton <- read.csv("./data/10/18e2ZooplanktonDepredation.csv")

qplot(treatment, zooplankton, data=zooplankton, geom="boxplot")

qplot(block, zooplankton, data=zooplankton, geom="boxplot")

## Warning: Continuous x aesthetic -- did you forget aes(group=...)?

Oh. That’s odd. What is up with block? AH HA! It’s continuous. We need to make it discrete to work with it.

zooplankton$block  <- factor(zooplankton$block)
qplot(block, zooplankton, data=zooplankton, geom="boxplot")

There we go. Always check!

2.1 Fit and Assumption Evaluation

Fit is quite easy. We just add one more factor to an lm model!

zooplankton_lm <- lm(zooplankton ~ treatment + block,
                     data = zooplankton)

We then evaluate residuals almost as usual…

par(mfrow=c(2,2))
plot(zooplankton_lm, which=c(1,2,5))

We want to look more deeply by treatment and block. For which we use car’s residualPlots()

residualPlots(zooplankton_lm)

##            Test stat Pr(>|t|)
## treatment         NA       NA
## block             NA       NA
## Tukey test     0.474    0.635

Notice that this pops out a Tukey test, and we are looking…GOOD!

2.2 Type II Sums of Squares

Given that we now have multiple factors, in case of unbalance, we should use type II sums of squares.

Anova(zooplankton_lm)

## Anova Table (Type II tests)
## 
## Response: zooplankton
##           Sum Sq Df F value   Pr(>F)   
## treatment 6.8573  2 16.3660 0.001488 **
## block     2.3400  4  2.7924 0.101031   
## Residuals 1.6760  8                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.3 Post-Hocs

Here, lsmeans gets interesting.

contrast(lsmeans(zooplankton_lm, spec="treatment"), "tukey")

##  contrast       estimate        SE df t.ratio p.value
##  control - high     1.64 0.2894823  8   5.665  0.0012
##  control - low      1.02 0.2894823  8   3.524  0.0190
##  high - low        -0.62 0.2894823  8  -2.142  0.1424
## 
## Results are averaged over the levels of: block 
## P value adjustment: tukey method for comparing a family of 3 estimates

Note the message that we’ve averaged over the levels of block. Now, because this is a balanced design, this should produce nothing untoward. Let’s look at the explicit comparison of the lsmeans results and the raw results from summary

lsmeans(zooplankton_lm, spec="treatment")

##  treatment lsmean        SE df  lower.CL upper.CL
##  control     3.02 0.2046949  8 2.5479727 3.492027
##  high        1.38 0.2046949  8 0.9079727 1.852027
##  low         2.00 0.2046949  8 1.5279727 2.472027
## 
## Results are averaged over the levels of: block 
## Confidence level used: 0.95

coef(summary(update(zooplankton_lm, .~.-1)))

##                       Estimate Std. Error       t value     Pr(>|t|)
## treatmentcontrol  3.420000e+00  0.3126766  1.093782e+01 4.330286e-06
## treatmenthigh     1.780000e+00  0.3126766  5.692783e+00 4.581786e-04
## treatmentlow      2.400000e+00  0.3126766  7.675662e+00 5.875004e-05
## block2            7.166459e-16  0.3737200  1.917601e-15 1.000000e+00
## block3           -7.000000e-01  0.3737200 -1.873060e+00 9.794523e-02
## block4           -1.000000e+00  0.3737200 -2.675800e+00 2.810839e-02
## block5           -3.000000e-01  0.3737200 -8.027399e-01 4.453163e-01

Note that the treatment results here are for within block 1 with coef. So lsmeans lets us get the whole block thing out of the equation.

2.4 Faded Examples

Given then similarity with 1-way ANOVA, let’s just jump right into two examples, noting a key difference or two here and there.

To start with, let’s look at gene expression by different types of bees.

bees <- read.csv("./data/10/18q07BeeGeneExpression.csv")

#Visualize
________(type, Expression, data=____, geom="____")

#fit
bee_lm <- __(______ ~ ______ + _____, data=________)

#assumptions
________(______, which=c(1,2,4,5))

residualPlots(bee_lm)

#ANOVA
________(______)

#Tukey's HSD
contrast(________(______, spec = "______"), method = "________")

Wow, not that different, save adding one more term and the residualPlots.

OK, one more…. repeating an experiment in the intertidal?

intertidal <- read.csv("./data/10/18e3IntertidalAlgae.csv")

#Visualize
________(herbivores, sqrtarea, data=____, geom="____")

#fit
intertidal_lm <- __(______ ~ ______ + _____, data=________)

#assumptions
________(______, which=c(1,2,4,5))

residualPlots(intertidal_lm)

#ANOVA
________(______)

#Tukey's HSD
________(________(______, spec = "______"), method = "________")

Did that last one pass the test of non-additivity?

3. Factorial ANOVA

Going with that last mouse example, if you really looked, it was a factorial design, with multiple treatments and conditions.

qplot(herbivores, sqrtarea, data=intertidal, fill=height, geom="boxplot")

3.1 Fit and Assumption Evaluation

We fit factorial models using one of two different notations - both expand to the same thing

intertidal_lm <- lm(sqrtarea ~ herbivores + height + herbivores:height, data=intertidal)

intertidal_lm <- lm(sqrtarea ~ herbivores*height, data=intertidal)

Both mean the same thing as : is the interaction. * just means, expand all the interactions.

But, after that’s done…all of the assumption tests are the same. Try them out.

3.2 Type II and III Sums of Squares

Now, we can choose type II or III SS once we have >n=1 for simple effects. Let’s see the difference. Both are from Anova() from the car package.

Anova(intertidal_lm)

## Anova Table (Type II tests)
## 
## Response: sqrtarea
##                    Sum Sq Df F value   Pr(>F)   
## herbivores         1512.2  1  6.3579 0.014360 * 
## height               89.0  1  0.3741 0.543096   
## herbivores:height  2617.0  1 11.0029 0.001549 **
## Residuals         14270.5 60                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(intertidal_lm, method="III")

## Anova Table (Type II tests)
## 
## Response: sqrtarea
##                    Sum Sq Df F value   Pr(>F)   
## herbivores         1512.2  1  6.3579 0.014360 * 
## height               89.0  1  0.3741 0.543096   
## herbivores:height  2617.0  1 11.0029 0.001549 **
## Residuals         14270.5 60                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3 Post-Hocs

Post-hocs are a bit funnier. But not by much. As we have an interaction, let’s look at the simple effects:

contrast(lsmeans(intertidal_lm, spec=c("herbivores", "height")), "tukey")

##  contrast                estimate       SE df t.ratio p.value
##  minus,low - plus,low   22.510748 5.452546 60   4.128  0.0006
##  minus,low - minus,mid  10.430905 5.452546 60   1.913  0.2336
##  minus,low - plus,mid    7.363559 5.452546 60   1.350  0.5350
##  plus,low - minus,mid  -12.079843 5.452546 60  -2.215  0.1307
##  plus,low - plus,mid   -15.147189 5.452546 60  -2.778  0.0357
##  minus,mid - plus,mid   -3.067346 5.452546 60  -0.563  0.9427
## 
## P value adjustment: tukey method for comparing a family of 4 estimates

Well, that’s it, actually. You could begin to look at one of the main effects or the other, but as we know, that’s not going to get you very far. You can of course come up with other contrast structures so that you don’t lose as much power with respect to your p-values, but, there you go.

And, yeah, it works the same in BANOVA as all the way back in 1-way ANOVA. Only you have to hand-code it.

3.3 A Kelpy example

Let’s just jump right in with an example, as you should have all of this well in your bones by now. This was from a kelp, predator-diversity experiment I ran ages ago. Note, some things that you want to be factors might be loaded as

kelp <- read.csv("./data/10/kelp_pred_div_byrnesetal2006.csv")

## Check and correct for non-factors
____________
_________

#Visualize
________(Treatment, Porp_Change, data=____, geom="____", fill=Trial)

#fit
kelp_lm <- __(______ ~ ______ * _____, data=________)

#assumptions
________(______, which=c(1,2,4,5))

residualPlots(_________)

#ANOVA
________(______)

#Tukey's HSD
________(________(______, spec = "______"), method = "________")

3.3.1 The Cost of Tukey

So, the kelp example is an interesting one, as this standard workflow is not what I wanted when I ran this experiment. I was not interested in a Tukey test of all possible treatments. Run it with no adjustement - what do you see?

#Pariwise Comparison without P-Value adjustment - The LSD test
________(________(______, spec = "______"), method = "________", adjust="_____")

Instead, I was interested in asking whether predator diversity - having a mixture versus only one species of predator - led to less kelp loss than any of the other treatments. There are a few ways to assess the answer to that question.

First, a Dunnet’s test with the Predator Mixture as the control. Try that out. Note, the default “control” is Dungeness crabs, so, you might want to revisit that.

#Dunnet's Test
________(________(______, spec = "______"), method = "________", ref=____)

What did you learn?

3.3.2 Replicated Regression

So…. this was actually a replicated regression design. There are a few ways to deal with this. Note the column Predator_Diversity

Try this whole thing as a regression. What do you see?

Make a new column that is Predator_Diversity as a factor. Refit the factorial ANOVA with this as your treatment. NOW try a Tukey test. What do you see?

3.3.3 A Priori Contrast F tests

OK, one more way to look at this. What we’re actually asking in comparing monocultures and polycultures is, do we explain more variation with a monoculture v. poyculture split than if not?

kelp_contr <- lm(Change_g ~ C(factor(Predator_Diversity), c(0,1,-1))*Trial, data=kelp)

Anova(kelp_contr)

## Anova Table (Type II tests)
## 
## Response: Change_g
##                                                  Sum Sq Df F value
## C(factor(Predator_Diversity), c(0, 1, -1))        87424  2  5.0689
## Trial                                            109084  1 12.6494
## C(factor(Predator_Diversity), c(0, 1, -1)):Trial   4440  2  0.2574
## Residuals                                        353570 41        
##                                                     Pr(>F)    
## C(factor(Predator_Diversity), c(0, 1, -1))       0.0107863 *  
## Trial                                            0.0009646 ***
## C(factor(Predator_Diversity), c(0, 1, -1)):Trial 0.7742620    
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we see that, yes, we explain variation when we partition things into monoculture versus polyculture than when we do not.

Setting up a priori ways of partitioning your sums of squares (that must be orthogonal) is a powerful way to test grouping hypotheses and worth keeping in your back pocket for future explorations.