Correlation And Regression Homework

Note: Datasets are available at http://whitlockschluter.zoology.ubc.ca/data so you don’t have to type anything in (and have to load it!)

1. Correlation - W&S Chapter 16

Data at https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q15LanguageGreyMatter.csv

2. Correlation - W&S Chapter 16

Data at https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16q19LiverPreparation.csv

3. Correlation SE

Consider the following dataset:

cats	happiness_score
-0.30	-0.57
0.42	-0.10
0.85	-0.04
-0.45	-0.29
0.22	0.42
-0.12	-0.92
1.46	0.99
-0.79	-0.62
0.40	1.14
-0.07	0.33

3a.

Are these two variables correlated? What is the output of cor() here. What does a test show you?

3b.

What is the SE of the correlation based on the info from cor.test()

3c.

Now, what is the SE via simulation? To do this, you’ll need to use cor() and get the relevant parameter from the output (remember - you get a matrix back, so, what’s the right index!), replicate(), and sample() or dplyr::sample_n() with replace=TRUE to get, let’s say, 1000 correlations. How does this compare to your value above?

## [1] 0.1608964

4. W&S Chapter 17

Data at https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter17/chap17q19GrasslandNutrientsPlantSpecies.csv

5. W&S Chapter 17-25

https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter17/chap17q25BeetleWingsAndHorns.csv

Do any other diagnostics misbehave?

6. W&S Chapter 17-30

https://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter17/chap17q30NuclearTeeth.csv

Using predict() and geom_ribbon() in ggplot2, reproduce the above plot showing data, fit, fit interval, and prediction interval.

EC. Intervals and simulation

Fit the deet and bites model from lab.

Now, look at vcov() applied to your fit. For example:

deet_mod <- lm(bites ~ dose, data = deet)

vcov(deet_mod)

##             (Intercept)         dose
## (Intercept)  0.09929780 -0.025986850
## dose        -0.02598685  0.007437073

What you have here is the variance-covariance matrix of the parameters of the model. In essence, every time you larger slopes in this case will have smaller intercepts, and vice-verse. This maintains the best fit possible, despite deviations in the slope and intercept. BUT - what’s cool about this is that it also allows us to produce simulations (posterior simulations for anyone interested) of the fit. We can use a package like mnormt that let’s us draw from a multivariate normal distribution when provided with a vcov matrix. For example…

library(mnormt)

rmnorm(4, mean = coef(deet_mod), varcov = vcov(deet_mod))

##      (Intercept)       dose
## [1,]    3.541639 -0.3524774
## [2,]    4.061716 -0.5296708
## [3,]    4.010105 -0.4772817
## [4,]    4.112800 -0.4790570

produces a number of draws of the variance and the covariance!

ECa. Fit simulations!

Using geom_abline() make a plot that has the following layers and shows that these simulated lines match up well with the fit CI. 1) the data, 2) the lm fit with a CI, and 3) simulated lines. You might have to much around to make it look as good as possible.

ECb. Prediction simulations!

That’s all well and good, but what about the prediction intervals? To each line, we can add some error drawn from the residual standard deviation. That residual can either be extracted from summary() or you can get the sd of residuals.

Now, visualize the simulated prediction interval around the fit versus the calculated prediction interval around the fit via predict. +1 extra credit for a clever visualization of all elements on one figure - however you would like