Consider the following vasopressin levels in voles.
vole_vaso <- c(98,96,94,88,86,82,77,74,70,60,
59,52,50,47,40,35,29,13,6,5)
1a. Say “Vole vasopressin” 10 times as fast as you can. How many times did you trip up?
1b. What is the mean, median, sd, and interquartile range of the sample?
1c. What is the standard error of the mean (do this with a formula!)?
1d. What does the standard error of the mean tell you about our estimate of the mean values of the population of vole vassopressin?
We can get the upper quartile value of vole vassopressin with
quantile(vole_vaso, probs = 0.75)
Let’s assume the sample is representative of the popultion.
2a. Use sample()
to get just one resample with a sample size of 10. What is its upper quartile?
2b. Build an initial data frame for simulations with the sample sizes 5 through 20.
2c. Use this data frame to get simulated upper quartiles at each sample size 1,000 times (i.e., for 1,000 simulations).
2d. With a ggplot, make a guesstimate as to the best sample size for estimating the upper quartile of the population. Use whatever geom you feel makes things most easy to see. E.C. Add a red dashed line using geom_vline()
or geom_hline()
to show where that should be, perhaps.
2e. Plot the SE of the estimate of the upper quantile by sample size. Again, what it the best way to see this? Does it level off? Is there a level you feel acceptable? Justify your answer. Does this match with what you put in 3d?
A little while back, Dave Curran using some of the code I’d posted from a previous 607 lab made a wonderful animation of change in arctic sea ice.
He used data from
ftp://sidads.colorado.edu/DATASETS/NOAA/G02135/north/daily/data/NH_seaice_extent_final_v2.csv ftp://sidads.colorado.edu/DATASETS/NOAA/G02135/north/daily/data/NH_seaice_extent_nrt_v2.csv
I’m providing you with a cleaned form of his data (his code is here) for you to work with in a few plots. The data file is called NH_seaice_extent_monthly_1978_2016.csv
3a. Some setup. Run the code below. For extra credit, look up the packages and functions used and explain what is going on here. But, that’s EC.
#libraries
library(dplyr)
library(readr)
library(ggplot2)
library(forcats)
theme_set(theme_bw(base_size=12))
ice <- read_csv("http://biol607.github.io/homework/data/NH_seaice_extent_monthly_1978_2016.csv") %>%
mutate(Month_Name = factor(Month_Name),
Month_Name = fct_reorder(Month_Name, Month))
3b. Make a boxplot showing the variability in sea ice extent every month.
*3c. Use dplyr
to get the annual minimum sea ice extent. Plot minimum ice by year. What do you observe?
3d. One thing that’s really cool about faceting is that you can use cut_*()
functions on continuos variables to make facets by groups of continuous variables. To see what I mean, try cut_interval(1:10, n = 5)
See how it makes five bins of even width? We use cut_interval()
or other cut functions with faceting like so facet_wrap(~cut_interval(some_variable))
.
With the original data, plot sea ice by year, with different lines (oh! What geom will you need for that?) for different months. Then, use facet_wrap
and cut_interval(Month, n=4)
to split the plot into seasons.
3e. Last, make a line plot of sea ice by month with different lines as different years. Gussy it up with colors by year, a different theme, critical values, and whatever other annotations, changes to axes, etc., you think best show the story of this data. For ideas, see the lab, and look at various palettes around. Extra credit for using colorfindr to make a palette.
3f. Extra Credit. Make it animated with gganimate. Just like above.
3g. Extra Credit. Use the data and make something wholly new and awesome. Even extra extra credit for something amazing animated.
4. Extra Credit x 3 Participate in this week or last week’s tidy tuesday (and see that link to learn waht it is). See here for data and schedule. Not only include what you do and your code here, but also include a link to where you tweet our your entry with a link to the code and the #tidytuesday
hashtag.