Comparing means are among the most frequently used tests in data analysis. Traditionally, they were done as a T-test, but, really, t-tests are just subsets of linear models. So, why not fit the appropriate linear model, and go from there! They’re delightfully simple, and provide a robust example of how to examine the entire workflow of a data analysis. These are steps you’ll take with any analysis you do in the future, no matter how complex the model!
When we have one set of values we want to estimate the mean for and look at whether it is equivalent to a pre-specified hypothesis, we’re talking about fitting an intercept only model. We might be interested in a single variable - like range shifts. Or we might be interested in a difference between pairs of values. These would traditionally all fall under the rubric of one-sample t-tests or paired t-tests (e.g., a one sample t-test where the one-sample is the difference between pairs).
As a linear model, we want to look at
\[y_i = \beta_0 + N(0, \sigma)\] and see if \(\beta_0\) is different from a hypothesized value (NHST), if this model predicts out of sample variability (CV), or establish the probability distribution of \(\beta_0\) (Bayes).
Let’s look at the W&S data on blackbird immunoresponse before and after testosterone implants. So, first, load the data and visualize the change in immunoresponse.
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
blackbird <- read_csv("data/05/12e2BlackbirdTestosterone.csv") %>%
mutate(Bird = 1:n())
b_tidy <- blackbird %>%
pivot_longer(-Bird,
names_to = "When",
values_to = "Antibody") %>%
filter((When %in% c("Before", "After"))) %>%
mutate(When = forcats::fct_rev(When))
ggplot(data=b_tidy, aes(x=When, y=Antibody, group=Bird)) +
geom_point(color="red") +
geom_line() +
theme_bw(base_size=18)
The distribution of differences
ggplot(data = blackbird, mapping=aes(x=dif)) +
geom_histogram(binwidth = 10) +
geom_vline(xintercept=0, lty=2)
So, right away we can see the problem with the distribution of the data.
Let’s proceed and ignore it for the moment. Or see if it shows up after fitting a model. We fit this with an intercept only model, so we estimate a mean, SE of the mean, and can then look at the coefficients to see if the intercept (estimated mean) is different from zero. Let’s also fit this using a Bayesian model, just to see the difference in inference.
diff_mod <- lm(dif ~ 1, data = blackbird)
library(brms)
diff_brm <- brm(dif ~ 1, data = blackbird, chains = 1)
Great! Let’s look at the coefficient output
summary(diff_mod)
##
## Call:
## lm(formula = dif ~ 1, data = blackbird)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.462 -12.461 -2.462 10.539 26.538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.538 4.336 -1.508 0.157
##
## Residual standard error: 15.63 on 12 degrees of freedom
From this, we can see that there is a fiarly low chance that the average difference is not null. We can look at the Bayesian fit to see the probability distribution of the fit value.
fixef(diff_brm, probs = c(0.025, 0.1, 0.9, 0.975))
## Estimate Est.Error Q2.5 Q10 Q90 Q97.5
## Intercept -6.826094 4.493277 -15.97363 -12.52304 -1.348547 1.873803
plot(diff_brm)
We can see that while a lot of the weight is less than 0, a good bit of the distribution is >0.
This is all well and good, but, what about model assumptions? We’re still assuming a single mean, and a normal error distribution. Here are two classic diagnostics -
plot(diff_mod, which = 1)
Right! Everything is on the same fitted value. It looks ok - doesn’t look like things are biased towards one side or the other. What about the residual distribution.
hist(residuals(diff_mod), breaks = 10)
Oof! That is not good looking.
shapiro.test(residuals(diff_mod))
##
## Shapiro-Wilk normality test
##
## data: residuals(diff_mod)
## W = 0.93148, p-value = 0.3563
Eh. Our sample size is low enough that this looks a-ok. If we were worried about it, we could log transform the difference, or look at difference in logs.
For two means, we want to fit a model where our categorical variable is translated into 0s and 1s. That corresponds to the following model:
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\] Here, \(\beta_0\) corresponds to the mean for the group that is coded as 0 and \(\beta_1\) is the difference between the two groups. If \(\beta_1\) is different than 0, then the two groups are different, under a frequentist framework.
To see this, let’s look at a data set examining the survival of juvenile chinook salmon (yum) in rivers with versus without brook trout.
First, we’ll load and plot the data.
chinook <- read_csv("data/05/12e4BrookTrout.csv")
ggplot(data=chinook, mapping=aes(x=`brook trout?`, y=`Mean chinook survival`)) +
geom_boxplot()
OK, let’ fit the corresponding model and look at the coefficients.
surv_mod <- lm(`Mean chinook survival` ~ `brook trout?`,
data = chinook)
broom::tidy(surv_mod)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.234 0.0269 8.70 0.00000563
## 2 `brook trout?`+ -0.0712 0.0381 -1.87 0.0907
Great! There is a 9% chance of obtaining this or a more extreme difference under a null hypothesis.
So, here is our one new twist. Let’s look at our residuals by group. We assume in our model that the residual variance is the same for each group.
plot(surv_mod, which = 1)
Oof. This is not good. We can evaluate that a bit more formally with a Levene Test.
library(car)
leveneTest(surv_mod)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 33.187 0.0001825 ***
## 10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Yes. They are different. Now, we have options here. We can move to a non-parametric test using ranks. Or, we can fit model where variance is allowed to vary by group. We can do this with the gls()
- generalized least squares - function in {nlme}. It doesn’t like spaces, so, we have to make a few changes
library(nlme)
chinook <- chinook %>%
mutate(trout = `brook trout?`,
survival = `Mean chinook survival`)
surv_gls <- gls(survival ~ trout,
data = chinook,
weights = varIdent(form = ~ 1|trout),
method = "ML")
broom.mixed::tidy(surv_gls)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.234 0.0367 6.37 0.0000814
## 2 trout+ -0.0712 0.0381 -1.87 0.0907
Note how similar the results are? This shows the robustness, in many ways, of the linear model. Granted, with the difference in variances, if we used the model without the different variances, our prediction intervals would be inaccurate. But, NHST tests are fairly robust here.
We need to expand the formula a bit to fit this using Bayes. What I like is that {brms} lets us easily model the sigma
term as a function of predictors.
surv_brm_mod <- brmsformula(survival ~ trout,
sigma ~ trout)
surv_brm <- brm(surv_brm_mod,
data = chinook)
summary(surv_brm)
Note that we use a similar formula here.
\[y_i = \beta_0 + \beta_1 x_i + N(0, \sigma_i)\] \[\sigma_i = \gamma_0 + \gamma_1 x_i\]
Our coefficients so far have shown us the mean of only one group. While we can add them together, to get the SE right, we can use extra formulae, but, doing all of this by hand is tedious. Fortunatelu, the {emmeans} package, which stands for expected means, can take a fit model and get run the calculations for you.
library(emmeans)
emmeans(surv_gls, ~trout)
## trout emmean SE df lower.CL upper.CL
## - 0.234 0.03675 8.31 0.150 0.318
## + 0.163 0.00993 6.00 0.139 0.187
##
## Degrees-of-freedom method: satterthwaite
## Confidence level used: 0.95
#or if we want to go bayesian
emmeans(surv_brm, ~trout)
## trout emmean lower.HPD upper.HPD
## - 0.233 0.147 0.325
## + 0.163 0.138 0.190
##
## Point estimate displayed: median
## HPD interval probability: 0.95
We can use the emmeans object as a data frame for plotting, or, it even has a plot method itself that uses ggplot2.
plot(emmeans(surv_gls, ~trout))