https://etherpad.wikimedia.org/p/607-intro-2018
This class will use collaborative note-taking
Research shows that this enhances learning!
It’s also a way to ask me a question during class
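#------------
# Load Packages
# (assumes the keeley data frame is already loaded,
#  e.g., with data(keeley) from the piecewiseSEM package)
#------------
library(randomForest)
#------------
# Split Data into Train/Test
# (first 80 rows train, last 10 rows test)
#------------
keeley_train <- keeley[1:80,]
keeley_test <- keeley[81:90,]
#------------
# Random Forest Models
#------------
# Richness as a function of cover, fire severity, and heterogeneity
rf1 <- randomForest(rich ~ cover + firesev +
                      hetero, data = keeley_train)
# Cover as a function of fire severity, stand age, abiotic conditions, and elevation
rf2 <- randomForest(cover ~ firesev + age +
                      abiotic + elev, data = keeley_train)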
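The split above holds out the last 10 rows. If the rows happen to be ordered (by site or date, say), a random split is often safer. A minimal base-R sketch follows; the helper `split_train_test()`, the seed value, and the stand-in data frame `demo` are hypothetical names used for illustration, not part of the course code.

```r
# Hypothetical helper: randomly hold out n_test rows of a data frame.
split_train_test <- function(df, n_test, seed = 607) {
  set.seed(seed)                          # make the random split repeatable
  test_rows <- sample(nrow(df), n_test)   # row indices held out for testing
  list(train = df[-test_rows, , drop = FALSE],
       test  = df[test_rows, , drop = FALSE])
}

# Stand-in data frame with 90 rows (NOT the real keeley data)
demo <- data.frame(id = 1:90, rich = rnorm(90))

parts <- split_train_test(demo, n_test = 10)
nrow(parts$train)  # 80
nrow(parts$test)   # 10
```

With the real data, `split_train_test(keeley, n_test = 10)` would give an 80/10 split of the same sizes as the one above, with rows chosen at random.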
Code Forces You to Be Explicit About Theory
Coding is power
Repeatable Research
Data (acquisition)
How do I get good data here?
##
## Call:
## lm(formula = shoots ~ treatment.genotypes, data = eelgrass)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -27.473 -10.723  -1.299   8.955  35.701
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)           30.664      5.324   5.760 2.73e-06 ***
## treatment.genotypes    4.635      1.401   3.308  0.00245 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.39 on 30 degrees of freedom
## Multiple R-squared: 0.2672, Adjusted R-squared: 0.2428
## F-statistic: 10.94 on 1 and 30 DF, p-value: 0.002449
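To see how the pieces of this summary fit together, the sketch below recomputes a few of the reported statistics from the rounded numbers printed above. The variable names are illustrative, and because the inputs are rounded, the results match the printed output only approximately.

```r
# Values copied from the printed summary (rounded)
est      <- 4.635    # slope estimate for treatment.genotypes
se       <- 1.401    # its standard error
df_resid <- 30       # residual degrees of freedom
r2       <- 0.2672   # multiple R-squared

# t value is the estimate divided by its standard error
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# With a single predictor, the F-statistic is the squared t value
f_val <- t_val^2

# Adjusted R-squared; n = residual df + 2 estimated coefficients
n      <- df_resid + 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

round(c(t = t_val, p = p_val, F = f_val, adj_r2 = adj_r2), 4)
```

The slope's p-value and the overall F-test p-value agree here because the model has only one predictor.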
What are your model(s)?
THEN decide on statistical approach
Can you get data to parameterize that model?
How does biology inform your modeled results?
Name
Lab
Brief research description
Why are you here?
Whitlock, M.C. and Schluter, D. (2014) The Analysis of Biological Data, 2nd Edition.
http://whitlockschluter.zoology.ubc.ca/
Chapter 1 this week!
Grolemund, G. and Wickham, H. (2016) R for Data Science.
http://r4ds.had.co.nz
Before and After Class
Measures understanding - and attendance!
Will drop lowest two
10% of your grade
Advanced problem set
Due Nov 2nd
20% of your grade
Does seagrass genetic diversity increase productivity?
Literature
Observation
Disciplinary History
Fit model(s) chosen to suit the data and the error-generating process!
Many Methods of Sharing Data, Methods, and Results Beyond Publication
GitHub - public code repository
FigShare - share key figures, get a DOI
Blog - open ‘notebook’
Dryad or Other Repository - post-publication data sharing