In an experiment, we want to isolate effects between pairs of variables.
Experimental manipulation (done right) severs the link between a driver and its causes. We can then test the causal effect of changing this driver on a response variable.
Properly designed experiments will have a distribution of other variables affecting our response variable. We want to reduce BIAS due to biological processes.
How is your population defined?
What is the scale of your inference?
What might influence the inclusion of environmental variability?
How important are external factors you know about?
How important are external factors you cannot assess?
AND - the error term also includes observer error. We must minimize OBSERVER BIAS as well.
(Hurlbert 1984)
CONTROL
A treatment against which others are compared
Separate causal effects from effects of the experimental procedure itself
Techniques to remove spurious effects of time, space, gradients, etc.
REPLICATION
How many points to fit a probability distribution?
Ensure that your effect is not a fluke (see the simulation sketch below)
i.e., \(\sim\) 5-10 samples per parameter (1 treatment = 1 parameter, but this is the total # of samples)
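As a quick illustration of why replication matters, a minimal simulation sketch (made-up values: true mean 0, SD 1, not from any real study) shows how much an estimated mean wobbles at low replication:

```r
# Spread of estimated means across 1000 simulated experiments,
# at 3 vs. 30 replicates per group (true mean = 0, sd = 1).
set.seed(42)
sd(replicate(1000, mean(rnorm(3))))   # ~0.58: small samples wander widely
sd(replicate(1000, mean(rnorm(30))))  # ~0.18: more replicates, tighter estimates
```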
Treatments can be continuous - or grouped into discrete categories
What is the variance between groups v. within groups?
Underlying linear model with control = intercept, dummy variable for bipolar
Underlying linear model with control = intercept, dummy variable for schizo
\[y_{ij} = \bar{y} + (\bar{y}_{i} - \bar{y}) + (y_{ij} - \bar{y}_{i}) \tag{1}\]
\[y_{ij} = \mu + \alpha_{i} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^{2}) \tag{2}\]
\[y_{j} = \beta_{0} + \sum \beta_{i}x_{i} + \epsilon_{j}, \qquad x_{i} = 0,1 \tag{3}\]
\[\large y_{ij} = \beta_{0} + \sum \beta_{i}x_{i} + \epsilon_{ij}, \qquad x_{i} = 0,1\]
\[\epsilon_{ij} \sim N(0, \sigma^{2})\]
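A minimal sketch of this dummy coding in R; the data frame, group labels, and values are invented for illustration, with control set as the reference level so it becomes the intercept:

```r
# Hypothetical data: 'control' listed first so it is the reference level.
set.seed(1)
dat <- data.frame(
  group = factor(rep(c("control", "bipolar", "schizo"), each = 5),
                 levels = c("control", "bipolar", "schizo")),
  y = rnorm(15)
)
model.matrix(y ~ group, data = dat)  # one 0/1 dummy column per non-control group
coef(lm(y ~ group, data = dat))      # beta_0 = control mean; each beta_i = offset from control
```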
Does your model explain variation in the data?
Are your coefficients different from 0?
How much variation is retained by the model?
How confident can you be in model predictions?
\(H_{0}\): The model predicts no variation in the data.
\(H_{A}\): The model predicts variation in the data.
\[H_{0}: \mu_{1} = \mu_{2} = \mu_{3} = \dots\]
OR
\[\beta_{0} = \mu, \qquad \beta_{i} = 0\]
Data Generating Process: \[\beta_{0} + \sum \beta_{i}x_{i}\]
VERSUS
Error Generating Process: \[\epsilon_{ij} \sim N(0, \sigma^{2})\]
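A simulation makes the two pieces concrete; every number here (\(\beta_0 = 5\), the group effects, \(\sigma = 0.2\)) is an assumed value, not an estimate from real data:

```r
set.seed(1)
alpha <- c(control = 0, bipolar = 0.3, schizo = 0.5)  # hypothetical treatment effects
group <- rep(names(alpha), each = 15)
y_hat <- 5 + alpha[group]                    # data-generating process: beta_0 + sum(beta_i * x_i)
y <- y_hat + rnorm(length(y_hat), sd = 0.2)  # error-generating process: epsilon ~ N(0, sigma^2)
```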
\(SS_{Total} = SS_{Between} + SS_{Within}\)
(Regression: \(SS_{Total} = SS_{Model} + SS_{Error}\) )
\(SS_{Between} = \sum_{i}\sum_{j}(\bar{Y_{i}} - \bar{Y})^{2}\), df=k-1
\(SS_{Within} = \sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{i}})^2\), df=n-k
To compare them, we need to correct for different DF. This is the Mean Square.
MS = SS/DF, e.g., \(MS_{W} = \frac{SS_{W}}{n-k}\)
\(F = \frac{MS_{B}}{MS_{W}}\) with DF=k-1,n-k
(note similarities to \(SS_{R}\) and \(SS_{E}\) notation of regression)
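The whole decomposition fits in a few lines of R. A sketch on simulated data (the groups, means, and SD are all invented):

```r
set.seed(1)
mu <- c(control = 5, bipolar = 5.3, schizo = 5.5)  # hypothetical group means
group <- rep(names(mu), each = 15)
y <- rnorm(length(group), mean = mu[group], sd = 0.2)

means <- tapply(y, group, mean)           # group means, Ybar_i
k <- length(means); n <- length(y)
ss_b <- sum((means[group] - mean(y))^2)   # SS_Between, summed over every observation
ss_w <- sum((y - means[group])^2)         # SS_Within
f <- (ss_b / (k - 1)) / (ss_w / (n - k))  # F = MS_B / MS_W
pf(f, k - 1, n - k, lower.tail = FALSE)   # p-value with df = k-1, n-k
```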
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| group | 2 | 0.5402533 | 0.2701267 | 7.823136 | 0.0012943 |
| Residuals | 42 | 1.4502267 | 0.0345292 | NA | NA |
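A table in this shape is standard `anova()` output in R. A sketch, assuming a placeholder data frame `dat` with columns `y` and `group` (not the original dataset):

```r
set.seed(1)
dat <- data.frame(group = factor(rep(c("control", "bipolar", "schizo"), each = 15)),
                  y = rnorm(45))
fit <- lm(y ~ group, data = dat)  # equivalently: aov(y ~ group, data = dat)
anova(fit)                        # Df, Sum Sq, Mean Sq, F value, Pr(>F)
```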
Is using ANOVA valid? (see the diagnostics sketch after this checklist)
Independence of data points
Normality within groups (of residuals)
No relationship between fitted and residual values
Homoscedasticity (homogeneity of variance) of groups
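These checks can be sketched with base R diagnostics; the model and data below are placeholders:

```r
set.seed(1)
dat <- data.frame(group = factor(rep(c("a", "b", "c"), each = 15)), y = rnorm(45))
fit <- lm(y ~ group, data = dat)                # placeholder model
plot(fitted(fit), residuals(fit))               # want: no pattern in fitted vs. residuals
qqnorm(residuals(fit)); qqline(residuals(fit))  # want: residuals near the line
shapiro.test(residuals(fit))                    # formal test of residual normality
```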
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 2 | 1.006688 | 0.3740735 |
| | 42 | NA | NA |
Levene’s test robust to departures from normality
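A table like the one above can come from `leveneTest()` in the car package (assumed to be installed; the data are placeholders). Its default centers each group on its median, which is the robust Brown-Forsythe variant:

```r
library(car)  # assumes the car package is available
set.seed(1)
dat <- data.frame(group = factor(rep(c("a", "b", "c"), each = 15)), y = rnorm(45))
leveneTest(y ~ group, data = dat)  # center = median by default
```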
Nonparametric Kruskal-Wallis (uses ranks)
log(x+1) or otherwise transform
GLM with ANODEV (two weeks!)
| statistic | p.value | parameter | method |
|---|---|---|---|
| 13.1985 | 0.0014 | 2 | Kruskal-Wallis rank sum test |
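A one-row table with these column names is what `broom::tidy()` gives for a `kruskal.test()` result; a minimal sketch on placeholder data (broom assumed installed):

```r
library(broom)  # assumed installed; only needed for tidy()
set.seed(1)
dat <- data.frame(group = factor(rep(c("a", "b", "c"), each = 15)), y = rnorm(45))
tidy(kruskal.test(y ~ group, data = dat))  # statistic, p.value, parameter (df = k-1), method
```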