class: middle, center
background-color: #a95c68

# Sampling Estimates, Precision, and Simulation

### Biol 607

---

# Estimation and Precision

.large[
1. Probability Distributions and Population Parameter Estimates
2. Simulation, Precision, and Sample Size Estimation
3. Bootstrapping our Way to Confidence
]

---

# Last Time: Sample Versus Population

<img src="04_simulation_estimation_x_files/figure-html/samp_pop_plot-1.png" style="display: block; margin: auto;" />

---

# Sample Properties: **Mean**

`$$\bar{x} = \frac{ \displaystyle \sum_{i=1}^{n}{x_{i}} }{n}$$`

`\(\large \bar{x}\)` - The average value of a sample

`\(x_{i}\)` - The value of a measurement for a single individual

n - The number of individuals in a sample

`\(\mu\)` - The average value of a population

(Greek = population, Latin = Sample)

---
class: center, middle

# Our goal is to get samples representative of a population, and estimate population parameters. We assume the values in the population follow a **distribution**.

---

# Probability Distributions

<img src="04_simulation_estimation_x_files/figure-html/normplot-1.png" style="display: block; margin: auto;" />

---

# Probability Distributions Come in Many Shapes

<img src="04_simulation_estimation_x_files/figure-html/dists-1.png" style="display: block; margin: auto;" />

---

# The Normal (Gaussian) Distribution

<img src="04_simulation_estimation_x_files/figure-html/normplot-1.png" style="display: block; margin: auto;" />

- Arises from some deterministic value and many small additive deviations
- VERY common

---
class: center

# Understanding Gaussian Distributions with a Galton Board (Quincunx)

<video controls loop><source src="04_simulation_estimation_x_files/figure-html/quincunx.webm" /></video>

---

# We see this pattern everywhere - the Random or Drunkard's Walk

```r
one_path <- function(steps){
  # start at 0, then draw `steps` random steps between -1 and 1
  each_step <- c(0, runif(steps, min = -1, max = 1))

  # the position after each step is the running total of the steps
  path <- cumsum(each_step)

  return(path)
}
```

--

1. Input some number of steps to take
2. Make a vector of 0 and a bunch of random numbers from -1 to 1
3. Take the cumulative sum across the vector to represent the path
4. Return the path vector

---

# 1000 Simulated Random Walks to Normal Homes

<img src="04_simulation_estimation_x_files/figure-html/all_walks-1.png" style="display: block; margin: auto;" />

---

# A Normal Result for Final Position

<img src="04_simulation_estimation_x_files/figure-html/final_walk-1.png" style="display: block; margin: auto;" />

---

# Normal distributions

<img src="04_simulation_estimation_x_files/figure-html/normplot-1.png" style="display: block; margin: auto;" />

- Results from additive accumulation of many small errors
- Defined by a mean and standard deviation: `\(N(\mu, \sigma)\)`
- About 2/3 (~68%) of the data are within 1 SD of the mean
- Values are peaked without **skew** (skewness = 0)
- Tails are neither too thin nor too fat (excess **kurtosis** = 0)

---

# Estimation and Precision

.large[
1. Probability Distributions and Population Parameter Estimates
2. .red[Simulation, Precision, and Sample Size Estimation]
3. Bootstrapping our Way to Confidence
]

---
class: middle, center

# The Eternal Question: What should my sample size be?

---

# Let's find out

.large[
1. Get in groups of 3 <br><br>
2. Ask each other your age. Report the mean to me.<br><br>
3. Now get into another group of five, and do the same.<br><br>
4. Now get into another group of ten, and do the same.<br><br>
]

---
class: center, middle

# We simulated sampling from our class population!

---

# What if We Could Pretend to Sample?
.large[
- Assume the distribution of a population
- Draw simulated 'samples' from the population at different sample sizes
- Examine when an estimated property levels off or precision is sufficient
- Here we define precision as 1/variance of the estimate at a given sample size
]

---
background-image: url(images/04/is-this-a-simulation.jpg)
background-position: center
background-size: cover

---
background-image: url(images/04/firefly-ship.jpg)
background-position: center
background-size: cover
class: center

# .inverse[Let's talk Firefly]

---
background-image: url(images/04/fireflies-1500x1000.jpg)
background-position: center
background-size: cover

---

# Start With a Population...

Mean of Firefly flashing times: 95.9428571

SD of Firefly flashing times: 10.9944982

--

So assuming a normal distribution...

--

<img src="04_simulation_estimation_x_files/figure-html/fireflydist-1.png" style="display: block; margin: auto;" />

---

# Choose a Random Sample - n=5?

Mean of Firefly flashing times: 95.9428571

SD of Firefly flashing times: 10.9944982

So assuming a normal distribution...

<img src="04_simulation_estimation_x_files/figure-html/fireflydistPoints-1.png" style="display: block; margin: auto;" />

---

# Calculate Sample Mean

Mean of Firefly flashing times: 95.9428571

SD of Firefly flashing times: 10.9944982

So assuming a normal distribution...

<img src="04_simulation_estimation_x_files/figure-html/fireflydistMean-1.png" style="display: block; margin: auto;" />

--

Rinse and repeat...

---

# How Good is our Sample Size for Estimating a Mean?

<img src="04_simulation_estimation_x_files/figure-html/plot_dist_sim-1.png" style="display: block; margin: auto;" />

---

# Where does the variability level off?
<img src="04_simulation_estimation_x_files/figure-html/dist_sim_stop-1.png" style="display: block; margin: auto;" />

---

# Many Ways to Visualize

<img src="04_simulation_estimation_x_files/figure-html/bin2d-1.png" style="display: block; margin: auto;" />

---

# Note the Decline in SE with Increasing Sample Size

<img src="04_simulation_estimation_x_files/figure-html/se_and_n-1.png" style="display: block; margin: auto;" />

---

# Where does the variability level off?

<table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
<thead>
<tr> <th style="text-align:right;"> sampSize </th> <th style="text-align:right;"> mean_sim_sd </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 7.936883 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 6.327210 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5.474186 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 4.840796 </td> </tr>
<tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 4.507274 </td> </tr>
<tr> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 4.160736 </td> </tr>
<tr> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 3.851500 </td> </tr>
<tr> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 3.686278 </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 3.586899 </td> </tr>
<tr> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 3.208227 </td> </tr>
<tr> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 3.291734 </td> </tr>
<tr> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 3.021824 </td> </tr>
<tr> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 3.106807 </td> </tr>
<tr> <td
style="text-align:right;"> 15 </td> <td style="text-align:right;"> 2.809582 </td> </tr> <tr> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 2.629675 </td> </tr> <tr> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 2.687936 </td> </tr> <tr> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 2.689179 </td> </tr> <tr> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 2.491364 </td> </tr> <tr> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 2.445601 </td> </tr> <tr> <td style="text-align:right;"> 21 </td> <td style="text-align:right;"> 2.387049 </td> </tr> <tr> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 2.311978 </td> </tr> <tr> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 2.329286 </td> </tr> <tr> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> 2.288746 </td> </tr> <tr> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> 2.150137 </td> </tr> <tr> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 2.120694 </td> </tr> <tr> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 2.115545 </td> </tr> <tr> <td style="text-align:right;"> 28 </td> <td style="text-align:right;"> 2.019858 </td> </tr> <tr> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 2.026664 </td> </tr> <tr> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 2.020684 </td> </tr> <tr> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> 1.981103 </td> </tr> <tr> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 1.948123 </td> </tr> <tr> <td style="text-align:right;"> 33 </td> <td style="text-align:right;"> 1.916272 </td> </tr> <tr> <td style="text-align:right;"> 34 </td> <td style="text-align:right;"> 1.827955 </td> </tr> <tr> <td style="text-align:right;"> 35 </td> <td 
style="text-align:right;"> 1.865422 </td> </tr> <tr> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 1.865438 </td> </tr> <tr> <td style="text-align:right;"> 37 </td> <td style="text-align:right;"> 1.783530 </td> </tr> <tr> <td style="text-align:right;"> 38 </td> <td style="text-align:right;"> 1.877738 </td> </tr> <tr> <td style="text-align:right;"> 39 </td> <td style="text-align:right;"> 1.774608 </td> </tr> <tr> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 1.719316 </td> </tr> <tr> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 1.662770 </td> </tr> <tr> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 1.764975 </td> </tr> <tr> <td style="text-align:right;"> 43 </td> <td style="text-align:right;"> 1.635123 </td> </tr> <tr> <td style="text-align:right;"> 44 </td> <td style="text-align:right;"> 1.655396 </td> </tr> <tr> <td style="text-align:right;"> 45 </td> <td style="text-align:right;"> 1.631216 </td> </tr> <tr> <td style="text-align:right;"> 46 </td> <td style="text-align:right;"> 1.701323 </td> </tr> <tr> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 1.591383 </td> </tr> <tr> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 1.608923 </td> </tr> <tr> <td style="text-align:right;"> 49 </td> <td style="text-align:right;"> 1.525732 </td> </tr> <tr> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 1.529716 </td> </tr> <tr> <td style="text-align:right;"> 51 </td> <td style="text-align:right;"> 1.544192 </td> </tr> <tr> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> 1.525335 </td> </tr> <tr> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 1.502529 </td> </tr> <tr> <td style="text-align:right;"> 54 </td> <td style="text-align:right;"> 1.518598 </td> </tr> <tr> <td style="text-align:right;"> 55 </td> <td style="text-align:right;"> 1.456144 </td> </tr> <tr> <td 
style="text-align:right;"> 56 </td> <td style="text-align:right;"> 1.457945 </td> </tr> <tr> <td style="text-align:right;"> 57 </td> <td style="text-align:right;"> 1.486405 </td> </tr> <tr> <td style="text-align:right;"> 58 </td> <td style="text-align:right;"> 1.436291 </td> </tr> <tr> <td style="text-align:right;"> 59 </td> <td style="text-align:right;"> 1.447924 </td> </tr> <tr> <td style="text-align:right;"> 60 </td> <td style="text-align:right;"> 1.376146 </td> </tr> <tr> <td style="text-align:right;"> 61 </td> <td style="text-align:right;"> 1.395309 </td> </tr> <tr> <td style="text-align:right;"> 62 </td> <td style="text-align:right;"> 1.405929 </td> </tr> <tr> <td style="text-align:right;"> 63 </td> <td style="text-align:right;"> 1.381348 </td> </tr> <tr> <td style="text-align:right;"> 64 </td> <td style="text-align:right;"> 1.391418 </td> </tr> <tr> <td style="text-align:right;"> 65 </td> <td style="text-align:right;"> 1.388627 </td> </tr> <tr> <td style="text-align:right;"> 66 </td> <td style="text-align:right;"> 1.371785 </td> </tr> <tr> <td style="text-align:right;"> 67 </td> <td style="text-align:right;"> 1.295773 </td> </tr> <tr> <td style="text-align:right;"> 68 </td> <td style="text-align:right;"> 1.351971 </td> </tr> <tr> <td style="text-align:right;"> 69 </td> <td style="text-align:right;"> 1.333114 </td> </tr> <tr> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 1.259326 </td> </tr> <tr> <td style="text-align:right;"> 71 </td> <td style="text-align:right;"> 1.305291 </td> </tr> <tr> <td style="text-align:right;"> 72 </td> <td style="text-align:right;"> 1.259287 </td> </tr> <tr> <td style="text-align:right;"> 73 </td> <td style="text-align:right;"> 1.293420 </td> </tr> <tr> <td style="text-align:right;"> 74 </td> <td style="text-align:right;"> 1.285016 </td> </tr> <tr> <td style="text-align:right;"> 75 </td> <td style="text-align:right;"> 1.309955 </td> </tr> <tr> <td style="text-align:right;"> 76 </td> <td 
style="text-align:right;"> 1.254518 </td> </tr> <tr> <td style="text-align:right;"> 77 </td> <td style="text-align:right;"> 1.197130 </td> </tr> <tr> <td style="text-align:right;"> 78 </td> <td style="text-align:right;"> 1.274403 </td> </tr> <tr> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> 1.248012 </td> </tr> <tr> <td style="text-align:right;"> 80 </td> <td style="text-align:right;"> 1.245567 </td> </tr> <tr> <td style="text-align:right;"> 81 </td> <td style="text-align:right;"> 1.226677 </td> </tr> <tr> <td style="text-align:right;"> 82 </td> <td style="text-align:right;"> 1.233833 </td> </tr> <tr> <td style="text-align:right;"> 83 </td> <td style="text-align:right;"> 1.211915 </td> </tr> <tr> <td style="text-align:right;"> 84 </td> <td style="text-align:right;"> 1.212577 </td> </tr> <tr> <td style="text-align:right;"> 85 </td> <td style="text-align:right;"> 1.171858 </td> </tr> <tr> <td style="text-align:right;"> 86 </td> <td style="text-align:right;"> 1.157911 </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 1.175309 </td> </tr> <tr> <td style="text-align:right;"> 88 </td> <td style="text-align:right;"> 1.175240 </td> </tr> <tr> <td style="text-align:right;"> 89 </td> <td style="text-align:right;"> 1.136571 </td> </tr> <tr> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> 1.119858 </td> </tr> <tr> <td style="text-align:right;"> 91 </td> <td style="text-align:right;"> 1.069883 </td> </tr> <tr> <td style="text-align:right;"> 92 </td> <td style="text-align:right;"> 1.168005 </td> </tr> <tr> <td style="text-align:right;"> 93 </td> <td style="text-align:right;"> 1.131141 </td> </tr> <tr> <td style="text-align:right;"> 94 </td> <td style="text-align:right;"> 1.131014 </td> </tr> <tr> <td style="text-align:right;"> 95 </td> <td style="text-align:right;"> 1.119265 </td> </tr> <tr> <td style="text-align:right;"> 96 </td> <td style="text-align:right;"> 1.079255 </td> </tr> <tr> <td 
style="text-align:right;"> 97 </td> <td style="text-align:right;"> 1.109223 </td> </tr>
<tr> <td style="text-align:right;"> 98 </td> <td style="text-align:right;"> 1.111187 </td> </tr>
<tr> <td style="text-align:right;"> 99 </td> <td style="text-align:right;"> 1.075744 </td> </tr>
<tr> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 1.071547 </td> </tr>
</tbody>
</table>

---

# Visualize Variability in Estimate of Mean

<img src="04_simulation_estimation_x_files/figure-html/sim_prec-1.png" style="display: block; margin: auto;" />

--

.large[What level of variability is acceptable to you, in absolute terms and/or relative to the mean?]

---
class: center, middle

# **Central Limit Theorem** The distribution of sample means will be approximately normal when the sample size is sufficiently large

https://istats.shinyapps.io/sampdist_cont/

https://istats.shinyapps.io/SampDist_discrete/

---
class: center, middle

# The Standard Error

<br><br>
.large[
A standard error is the standard deviation of the estimates of a parameter that we would get if we were able to sample repeatedly.
]

---

# But, I Only Have One Sample? How can I know the SE for my mean and other parameters?

<img src="04_simulation_estimation_x_files/figure-html/firehist-1.png" style="display: block; margin: auto;" />

---

# Estimation and Precision

.large[
1. Probability Distributions and Population Parameter Estimates
2. Simulation, Precision, and Sample Size Estimation
3. .red[Bootstrapping our Way to Confidence]
]

---

# The Bootstrap

.large[
- We can resample our sample some number of times with replacement
- This resampling with replacement is called **bootstrapping**
- One replicate simulation is one **bootstrap**
]

---

# One Bootstrap Sample in R

```r
# One bootstrap
sample(firefly$flash.ms,
       size = nrow(firefly),
       replace = TRUE)
```

```
 [1]  86  94  82 113 101  89 112  96  94 103 101  98  85  98  86 119 116  96  98
[20]  85  86 101  86  86 103 118 116  82  90  95  87 109  89  96  88
```

---

# Bootstrapped Estimate of a SE

- We can calculate the Standard Deviation of a sample statistic from replicate bootstrapped samples
- This is called the bootstrapped **Standard Error** of the estimate

```r
# one bootstrap sample: resample the data with replacement
one_boot <- function(){
  sample(firefly$flash.ms, size = nrow(firefly), replace = TRUE)
}

# the mean of each of 1000 bootstrap samples
boot_means <- replicate(1000, mean(one_boot()))

# the SD of those means is the bootstrapped SE
sd(boot_means)
```

```
[1] 1.770777
```

---
class: large

# So I always have to bootstrap Standard Errors?

--

## .center[**No**]

--

Many common estimates have formulae, e.g.:

`$$SE_{mean} = \frac{s}{\sqrt{n}}$$`

--

Boot SEM: 1.771, Est. SEM: 2

--

.center[(but for medians, etc., yes)<br><br>https://istats.shinyapps.io/Boot1samp/]

---

# Bootstrapping to Estimate Precision of a Non-Standard Metric

<img src="04_simulation_estimation_x_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

SE of the SD = 1.107

---
class: large

# Standard Error as a Measure of Confidence

--

.center[.red[**Warning: this gets weird**]]

--

- We have calculated our SE from a **sample** - not the **population**

--

- Our estimate ± 1 SE contains about 2/3 of the *means* we could get by **resampling this sample**

--

- This is **not** 2/3 of the possible **true parameter values**

--

- BUT, if we *were* to sample the population many times, about 2/3 of the time the interval of the sample estimate ± 1 SE would contain the "true" value

---

# Confidence Intervals

.large[
- So, ± 1 SE = the ~68% Confidence Interval (about 2/3)
- ± ~2 SE = the 95% Confidence Interval
- Best to view these as a measure of precision of your estimate
- And remember, if you were able to do the sampling again and again and again, some fraction of your intervals would *not* contain the true value
]

---
class: large, middle

# Let's see this in action

### .center[.middle[https://istats.shinyapps.io/ExploreCoverage/]]

---

# Frequentist Philosophy

.large[The idea of drawing conclusions from data based on properties derived from theoretical resampling is fundamentally **frequentist** - i.e., it assumes that we can derive truth by observing a result with some frequency in the long run.]

<img src="04_simulation_estimation_x_files/figure-html/CI_sim-1.png" style="display: block; margin: auto;" />

---

# To Address some Confusion: SE (sample), CI (sample), and SD (population)....

.bottom[.small[.left[[Cumming et al. 2007 Table 1](http://byrneslab.net/classes/biol-607/readings/Cumming_2007_error.pdf)]]]

---

# SE, SD, CIs....

.bottom[.small[.left[[Cumming et al. 2007 Table 1](http://byrneslab.net/classes/biol-607/readings/Cumming_2007_error.pdf)]]]
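
---

# A Sketch: SD, SE, and CI from One Sample

To make the SD / SE / CI distinction concrete, here is a minimal sketch in R. It assumes a simulated stand-in for the firefly data (35 normal draws using the mean and SD reported earlier), so `flash_ms`, the seed, and any values it prints are illustrative only and will not match the real dataset.

```r
# ASSUMPTION: simulated stand-in for the firefly flash durations (ms);
# the real data are not reproduced here
set.seed(607)
flash_ms <- rnorm(35, mean = 95.94, sd = 10.99)

# SD: the spread of individual observations in the sample
samp_sd <- sd(flash_ms)

# SE of the mean from the formula s / sqrt(n)
se_formula <- samp_sd / sqrt(length(flash_ms))

# SE of the mean from the bootstrap: the SD of the means of
# 1000 resamples drawn with replacement
boot_means <- replicate(1000, mean(sample(flash_ms, replace = TRUE)))
se_boot <- sd(boot_means)

# approximate 95% CI: estimate +/- 2 SE
ci_95 <- mean(flash_ms) + c(-2, 2) * se_formula

c(sd = samp_sd, se_formula = se_formula, se_boot = se_boot)
ci_95
```

The SD describes how much individual flashes vary, while the two SE values describe the precision of the estimated mean and should land close to one another.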