class: center, middle
background-image: url(images/01/Sonoma_coast.jpg)
background-position: center
background-size: cover

# .large[Introduction to Computational Data Analysis for Biology]
## .center[2025 Spring Edition]
### .center[Jarrett Byrnes]
UMass Boston
https://biol607.github.io/

---
class: center

# Why are we here?

data:image/s3,"s3://crabby-images/5ab53/5ab53da31ba96cde54ab867d5880a094eec18462" alt=":scale 50%"

---

# Who are You?

1. Name
2. Lab
3. Brief research description
4. Why are you here?

--

.center[Write it here: https://etherpad.wikimedia.org/p/607-intro-2025]

---

# Course Goals

1. Learn how to think about your research in a systematic way to design efficient observational & experimental studies.
<br><br>

--

2. Understand how to get the most bang for your buck from your data.
<br><br>

--

3. Make you effective collaborators with statisticians.
<br><br>

--

4. Learn how to program to expand your scientific toolkit.
<br><br>

--

5. Make you comfortable enough to learn and grow beyond this class.

---

# What are we doing here?

## Course divided into blocks

--

1. Introduction to computation and reproducibility

--

2. Linear Models for Data Analysis

--

3. Experimental and Observational Study Design and Causal Inference

--

4. Drawing Inference from Studies

---

# Block 1: Computation

``` r
# Load the library ####
library(ggplot2)

# Load the data ####
eelgrass <- read.csv("./data/15q05EelgrassGenotypes.csv")

# Plot ####
ggplot(eelgrass,
       aes(y = shoots, x = treatment.genotypes)) +
  geom_point() +
  stat_smooth(method = "lm") +
  theme_classic(base_size = 17) +
  labs(x = "No. of Genotypes",
       y = "No. of Shoots per sq. m.")
```

--

.center[.large[.red[Coding is power!]]]

--

.center[.large[.red[Code Forces You to Be Explicit About Biology]]]

---
class:center

# Block 1: Reproducibility

data:image/s3,"s3://crabby-images/54372/543729c547f9b389b138fbc98e6a58375e8b4576" alt=""

---
class:center

# Furthering Open Science

data:image/s3,"s3://crabby-images/2000f/2000f92fcc18aaaa84419ffd613a246e7561935f" alt="https://www.4open-sciences.org/component/content/article/11-news/276-four-pillars-of-open-science-open-code"

---

# Block 2: Linear Models for Data Analysis

data:image/s3,"s3://crabby-images/f881b/f881bb230062b8294260e2dc31bea8d140d57598" alt=""<!-- -->

---

# Block 2: Linear Models for Data Analysis

data:image/s3,"s3://crabby-images/73cb0/73cb060c1cc2121c23de1b5c87a0a732099d96d5" alt=""<!-- -->

--

.large[.center[Yes, this is also a line!]]

---

# Block 2: Linear Models for Data Analysis

data:image/s3,"s3://crabby-images/c2a8e/c2a8e43632b1a78805caff1d772e6993f002c8ba" alt=""

--

.large[.center[Also a line!]]

---
class:center

# Block 3: Causal Inference & Study Design

data:image/s3,"s3://crabby-images/b36cb/b36cb100e56f2fd0a19b1b9b9fb0ec14d903db44" alt=""

---

# Block 3: Causal Inference & Study Design

data:image/s3,"s3://crabby-images/fc152/fc152f549f96fb24e3f71110306e73b2e4d97779" alt=""

---

# Block 4: Inference

.center[data:image/s3,"s3://crabby-images/b67cf/b67cf95d1f378be5532c2cd9effa834582d1cb50" alt=":scale 50%"]

- What is the probability of a hypothesis? Or data given a hypothesis?
- What's the predictive power of your model?
- How can we generalize from our models to the world?

---

# Lecture and Lab

- T/Th Lecture on Concepts
    - Also Paper Discussion, Shiny Apps, etc.
    - Please bring your most interactive self!
    - I will try and make it easy for folk on Zoom
- F Lab
    - Live coding!
    - I will screw up - don't take me as gospel!
    - Be generous with feedback/pace comments
    - Invite your friends!

---

# Yes, Lectures are Coded

R Markdown sometimes with Reveal.js or Xaringan or Quarto

.center[<img src="images/01/lecture_code.jpg">]

http://github.com/biol607/biol607.github.io

---

# Some Old Technology

.center[
data:image/s3,"s3://crabby-images/750cc/750cc3900a0b4ebd029ff82d6c05ef1bb4eeee8e" alt=""
]

- Green: Party on, Wayne

--

- Red: I fell off the understanding wagon

--

- Blue: Write a question/Other

---
class: center

# Readings for Class: Fieberg

data:image/s3,"s3://crabby-images/54a33/54a33777a70a29b286736732d8b6d6d99c915a24" alt=":scale 45%"

.left[Fieberg, J. 2022. Statistics for Ecologists.]

### https://statistics4ecologists-v3.netlify.app

---
class: center

# Help John Out! Annotate His Book!

data:image/s3,"s3://crabby-images/54a33/54a33777a70a29b286736732d8b6d6d99c915a24" alt=":scale 50%"

## https://hypothes.is/signup

---
class: center

# Readings for Class:<br>Wickham & Grolemund

data:image/s3,"s3://crabby-images/bae81/bae81f3c2de206518c60df7e43eed954e3478957" alt=":scale 35%"

.left[Wickham, H., Çetinkaya-Rundel, M., and Grolemund, G. 2023. R for Data Science.]

https://r4ds.hadley.nz/

---
class:center

# There will be memes

data:image/s3,"s3://crabby-images/8a485/8a485946960d741f4472a682ca87dbf656596041" alt=":scale 50%"

--

.large[please feed my #statsmeme addiction]

---

# And Now, A Pop Quiz! (I kid! I kid!)

<br><br><center>
<div style="font-size: 2em;font-weight: bold;">http://tinyurl.com/firstPopQuiz</div>
</center><br><br>

---

# My Actual Policy on Grading

.center[
data:image/s3,"s3://crabby-images/ad165/ad16582096fc2348a73c9cb6afc7e75406892631" alt=""
]

---

# Problem Sets

- THE MOST IMPORTANT THING YOU DO
- Adapted from many sources
- Will often require R
- Complete them using Quarto/Rmarkdown
- Submit via Canvas

---

# Midterm

- Advanced problem set
- After Regression. Probably.

---

# Final Project

- Topic of your choosing
    - Your data, public data, any data!
    - Make it dissertation relevant!
    - If part of submitted manuscript, I will retroactively raise your grade
- Dates
    - Proposal Due March 14
    - Presentations on May 16th
    - Paper due May 22nd (but earlier fine!)

---

# Impress Yourself: Use Github

.center[data:image/s3,"s3://crabby-images/1f13c/1f13c6c688f8d0ca4b6b88a56afd301b4fa36b8e" alt=":scale 50%"]

- This whole class is a github repo
- Having a github presence is becoming a real advantage
- So.... create a class repository!
    - folder for homework, folder for exams, folder for labs
- If you submit a link to your homework in a repo, +1 per homework!
- I am happy to hold a github tutorial outside of class hours

---

# Life La Vida Data Science

- Check out http://www.r-bloggers.com/ and https://rweekly.org/
- Listen to podcasts like https://itunes.apple.com/us/podcast/not-so-standard-deviations/
- Start going to local R User Groups like https://www.meetup.com/Boston-useR/
- Follow data science greats on Bluesky (see the [#rstats feed](https://bsky.app/profile/andrew.heiss.phd/feed/aaaeckvqc3gzg))
- Bring up cool things in the UMBRug slack

---

# Help your fellow students

data:image/s3,"s3://crabby-images/2773f/2773fb8bbcadce6fea7b5a99063cf541d44b84a1" alt=""

- Having a problem during homework/exam/etc?
- First, try and solve it yourself (google, stackoverflow, etc.)
- Post a REPRODUCIBLE EXAMPLE to our slack channel
- I notice if you post before I do!

---

# Become Part of the Conversation

~~Stats and R on Twitter: https://bit.ly/stats_r_twitter~~

Stats and R on Bluesky: https://bit.ly/bsky_rstats

.center[data:image/s3,"s3://crabby-images/f6bb2/f6bb2b78a891861543102cbfb304f6c255dad473" alt=":scale 50%"]

---

# Welcome!

<br><br>

.center[.middle[data:image/s3,"s3://crabby-images/a0665/a066508bc980e88e93b55a76d27d6889bc3e3637" alt=""]]