class: middle, center
background-image: url(images/03/mice_men.jpg)
background-position: center
background-size: cover

<br><br>
<div style="align:center; background:black; color:white; font-size: 3em;font-weight: bold;">Sampling</div>

---

# Outline

1. Sampling Nature
2. Describing a Sample
3. Using a Sample to Describe a Population

---

# Today's quiz:

<br><br><br>
.center[.large[http://tinyurl.com/sampling-pre]]

---

# Today's Etherpad

<br><br><br>.center[.large[https://etherpad.wikimedia.org/p/sampling-2020]]

---
class:center

# What is a population?

<img src="03_sampling_lecture_x_files/figure-html/population-1.png" style="display: block; margin: auto;" />

--

**Population** = All Individuals

---
class: center, middle

# Population

.pull-left[
<img src="03_sampling_lecture_x_files/figure-html/population-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="03_sampling_lecture_x_files/figure-html/normplot-1.png" style="display: block; margin: auto;" />
]

---
class: center

# What is a sample?

<img src="03_sampling_lecture_x_files/figure-html/sample-1.png" style="display: block; margin: auto;" />

--

A **sample** of individuals in a randomly distributed population.

---
class: center, middle

# Population

.pull-left[
<img src="03_sampling_lecture_x_files/figure-html/sample-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="03_sampling_lecture_x_files/figure-html/normplotsamp-1.png" style="display: block; margin: auto;" />
]

---

# Properties of a good sample

1. Validity
    - Yes, this is a measure of what I am interested in

2. Reliability
    - If I sample again, I'll get something similar

3. Representative
    - Sample reflects the population
    - Unbiased

---

# Validity: Is it measuring what I think it's measuring?

.center[![:scale 75%](./images/03/23andme_health_ancestry_kit.jpg)]

---

# Reliability: Is my sample/measure repeatable?
.center[![](./images/03/painscale.jpg)]

---

# Bias from Unequal Representation

<img src="03_sampling_lecture_x_files/figure-html/colorSize-1.png" style="display: block; margin: auto;" />

If you only chose one color, you would only get one range of sizes.

---

# Unrepresentative Bias from Unequal Chance of Sampling

<img src="03_sampling_lecture_x_files/figure-html/spatialBias-1.png" style="display: block; margin: auto;" />

--

Spatial gradient in size

---

# Unrepresentative Bias from Unequal Choice of Sampling

<img src="03_sampling_lecture_x_files/figure-html/spatialSample-1.png" style="display: block; margin: auto;" />

Oh, I'll just grab those individuals closest to me...

---

# Solutions

1. Validity: Can you connect your measurement and question?
2. Reliability: Sample a standard or known population
3. Representative: Sampling design!

---

# A more representative sample from Random Sampling

<img src="03_sampling_lecture_x_files/figure-html/spatialSample2-1.png" style="display: block; margin: auto;" />

--

Two sampling schemes

--

- Random - samples chosen using random numbers

--

- Haphazard - samples chosen without any system (careful!)

---

# A more representative sample from **Stratified** Sampling

<img src="03_sampling_lecture_x_files/figure-html/stratified-1.png" style="display: block; margin: auto;" />

Sample over a known gradient (not the same as **cluster sampling**, which samples whole groups)

Can incorporate multiple gradients

---

# Stratified or Random?

- How is your population defined?
- What is the scale of your inference?
- What might influence the inclusion of a replicate?
- How important are external factors you know about?
- How important are external factors you cannot assess?

---

# Stratified Random Sampling

.center[.middle[![](./images/03/stratified_random_sampling_meme.jpg)]]

---
class:

# Exercise:

### 1. What is a population you sample?

### 2. How do you ensure validity, reliability, and representativeness of a sample?

---

# Outline

1. Sampling Nature
2. .red[Describing a Sample]
3. Using a Sample to Describe a Population

---

# Taking a Descriptive

<img src="03_sampling_lecture_x_files/figure-html/sample-1.png" style="display: block; margin: auto;" />

<center>How big are individuals in this population?</center>

---

# Our 'Sample'

<table class="table" style="margin-left: auto; margin-right: auto;">
<tbody>
  <tr>
   <td style="text-align:right;"> 41.11041 </td>
   <td style="text-align:right;"> 41.85113 </td>
   <td style="text-align:right;"> 46.56152 </td>
   <td style="text-align:right;"> 60.69390 </td>
   <td style="text-align:right;"> 47.13263 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 42.11062 </td>
   <td style="text-align:right;"> 43.02022 </td>
   <td style="text-align:right;"> 51.24369 </td>
   <td style="text-align:right;"> 40.91309 </td>
   <td style="text-align:right;"> 47.22189 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 48.31628 </td>
   <td style="text-align:right;"> 46.86495 </td>
   <td style="text-align:right;"> 40.79575 </td>
   <td style="text-align:right;"> 34.11458 </td>
   <td style="text-align:right;"> 48.15706 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 45.44011 </td>
   <td style="text-align:right;"> 47.03553 </td>
   <td style="text-align:right;"> 46.86209 </td>
   <td style="text-align:right;"> 55.74212 </td>
   <td style="text-align:right;"> 44.90765 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 51.28539 </td>
   <td style="text-align:right;"> 38.87441 </td>
   <td style="text-align:right;"> 31.41510 </td>
   <td style="text-align:right;"> 49.46008 </td>
   <td style="text-align:right;"> 34.67087 </td>
  </tr>
</tbody>
</table>

---

# Visualizing Our Sample as Counts with 20 Bins

<img src="03_sampling_lecture_x_files/figure-html/sampPlot-1.png" style="display: block; margin: auto;" />

---

# Visualizing Our Sample as Frequencies

<img src="03_sampling_lecture_x_files/figure-html/sampPlotFreq-1.png" style="display: block; margin: auto;" />

--

Frequency = % of Sample with that Value

---

# Visualizing Our Sample as Frequencies

<img src="03_sampling_lecture_x_files/figure-html/sampPlotFreq-1.png" style="display: block; margin: auto;" />

Frequency = Probability of Drawing that Value from a Sample

---

# Visualizing Our Sample as Frequencies

<img src="03_sampling_lecture_x_files/figure-html/sampPlotFreq-1.png" style="display: block; margin: auto;" />

Frequency = Probability Density, Sums to One.

---

# Ways to Describe Our Sample

- Describing it as a unique thing
    - What is the value smack in the middle?
    - What are large and small values like?
    - Is our sample clustered, or waaaay spread out?

- Describing it with reference to a population
    - If I were to draw from this sample, or its like, what value would I most likely get?
    - If I assume a normal distribution, what's the range of 68% or 95% of the variability?
    - Is this sample peaked, flat, shifted one way or another?

---

# The Empirical Cumulative Distribution Plot of our Sample

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

# The Box Plot (or Box-and-Whisker Plot)

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

# Two ways of seeing the same data

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# Median: Middle of the Sample

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Quartiles: Upper and Lower Quarter

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

1<sup>st</sup> and 3<sup>rd</sup> **Quartile** = 25<sup>th</sup> and 75<sup>th</sup> **Percentile**

---

# Reasonable Range of Data (Beyond which are Outliers)

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

# Where's the action in R?
```r
# Quantiles
quantile(samp)
```

```
      0%      25%      50%      75%     100% 
31.41510 41.11041 46.56152 48.15706 60.69390 
```

```r
# Interquartile range
IQR(samp)
```

```
[1] 7.046652
```

---

# Outline

1. Sampling Nature
2. Describing a Sample
3. .red[Using a Sample to Describe a Population]

---

# This is just a sample

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

--
class: center, middle

# We chose it to be representative of the population.

---

# But - how does my sample compare to a population?

<img src="03_sampling_lecture_x_files/figure-html/samp_pop_plot-1.png" style="display: block; margin: auto;" />

---

# Expected Value: the Mean

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# Sample Properties: **Mean**

`$$\bar{x} = \frac{ \displaystyle \sum_{i=1}^{n}{x_{i}} }{n}$$`

`\(\large \bar{x}\)` - The average value of a sample

`\(x_{i}\)` - The value of a measurement for a single individual

n - The number of individuals in a sample

`\(\mu\)` - The average value of a population

(Greek = population, Latin = Sample)

---

# Sample versus Population

- Latin characters (e.g., `\(\bar{x}\)`) for **sample**
- Greek characters (e.g., `\(\mu\)`) for **population**

.center[https://istats.shinyapps.io/sampdist_cont/]

---

# Mean versus Median

.center[https://istats.shinyapps.io/MeanvsMedian/]

---

# How Variable is the Population

<img src="03_sampling_lecture_x_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

# Sample Properties: **Variance**

How variable was that population?

`$$\large s^2= \frac{\displaystyle \sum_{i=1}^{n}{(x_i - \bar{x})^2}} {n-1}$$`

* Sums of Squares over n-1
* n-1 corrects for sample size and for the bias from using `\(\bar{x}\)` in place of `\(\mu\)`
* `\(\sigma^2\)` if describing the population
* Units in square of measurement...
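---

# Sample Properties: Variance by Hand in R

A minimal sketch of the mean and variance formulas from the preceding slides, worked step by step and checked against R's built-in `mean()` and `var()`. It assumes `samp` holds the 25 values shown in the 'Our Sample' table, retyped here by hand; the names `x_bar`, `sum_sq`, and `s2` are illustrative and not from the original code.

```r
# The 25 values from the "Our 'Sample'" table, entered by hand
# (assumption: these are the values the slides store in `samp`)
samp <- c(41.11041, 41.85113, 46.56152, 60.69390, 47.13263,
          42.11062, 43.02022, 51.24369, 40.91309, 47.22189,
          48.31628, 46.86495, 40.79575, 34.11458, 48.15706,
          45.44011, 47.03553, 46.86209, 55.74212, 44.90765,
          51.28539, 38.87441, 31.41510, 49.46008, 34.67087)

# Number of individuals in the sample
n <- length(samp)

# Sample mean: sum of the values divided by n
x_bar <- sum(samp) / n

# Sums of squares: squared deviations from the sample mean
sum_sq <- sum((samp - x_bar)^2)

# Sample variance: sums of squares over n - 1
s2 <- sum_sq / (n - 1)

# Compare the hand calculations with R's built-ins
all.equal(x_bar, mean(samp))
all.equal(s2, var(samp))
```

Dividing `sum_sq` by `n` instead of `n - 1` would give a slightly smaller number, which is the downward bias the n-1 correction removes.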
---

# Sample Properties: Standard Deviation

$$ \large s = \sqrt{s^2}$$

* Units the same as the measurement
* If distribution is normal, 68% of data within 1 SD
* 95% within 2 SD
* `\(\sigma\)` if describing the population

---
class: middle, center

# Post-Quiz

http://tinyurl.com/sampling-post