Note: Datasets are available at http://whitlockschluter.zoology.ubc.ca/data so you don’t have to type anything in (and have to load it!)
Data at https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q15LanguageGreyMatter.csv
Data at https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q19LiverPreparation.csv
Consider the following dataset:
cats | happiness_score |
---|---|
-0.30 | -0.57 |
0.42 | -0.10 |
0.85 | -0.04 |
-0.45 | -0.29 |
0.22 | 0.42 |
-0.12 | -0.92 |
1.46 | 0.99 |
-0.79 | -0.62 |
0.40 | 1.14 |
-0.07 | 0.33 |
Are these two variables correlated? What is the output of cor()
here. What does a test show you?
What is the SE of the correlation based on the info from cor.test()
Now, what is the SE via simulation? To do this, you’ll need to use cor()
and get the relevant parameter from the output (remember - you get a matrix back, so, what’s the right index!), replicate()
, and sample()
or dplyr::sample_n()
with replace=TRUE
to get, let’s say, 1000 correlations. How does this compare to your value above?
## [1] 0.1608964
https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter17/chap17q25BeetleWingsAndHorns.csv
https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter17/chap17q30NuclearTeeth.csv
predict()
and geom_ribbon()
in ggplot2, reproduce the above plot showing data, fit, fit interval, and prediction interval.Fit the deet and bites model from lab.
Now, look at vcov()
applied to your fit. For example:
deet_mod <- lm(bites ~ dose, data = deet)
vcov(deet_mod)
## (Intercept) dose
## (Intercept) 0.09929780 -0.025986850
## dose -0.02598685 0.007437073
What you have here is the variance-covariance matrix of the parameters of the model. In essence, every time you larger slopes in this case will have smaller intercepts, and vice-verse. This maintains the best fit possible, despite deviations in the slope and intercept. BUT - what’s cool about this is that it also allows us to produce simulations (posterior simulations for anyone interested) of the fit. We can use a package like mnormt
that let’s us draw from a multivariate normal distribution when provided with a vcov matrix. For example…
library(mnormt)
rmnorm(4, mean = coef(deet_mod), varcov = vcov(deet_mod))
## (Intercept) dose
## [1,] 3.541639 -0.3524774
## [2,] 4.061716 -0.5296708
## [3,] 4.010105 -0.4772817
## [4,] 4.112800 -0.4790570
produces a number of draws of the variance and the covariance!
Using geom_abline()
make a plot that has the following layers and shows that these simulated lines match up well with the fit CI. 1) the data, 2) the lm fit with a CI, and 3) simulated lines. You might have to much around to make it look as good as possible.
That’s all well and good, but what about the prediction intervals? To each line, we can add some error drawn from the residual standard deviation. That residual can either be extracted from summary()
or you can get the sd
of residuals
.
Now, visualize the simulated prediction interval around the fit versus the calculated prediction interval around the fit via predict
. +1 extra credit for a clever visualization of all elements on one figure - however you would like