Often in R, what we kludge together different pieces of code to generate a sequence of events. For example, let’s say I wanted to take a vector , find out it’s length, and get the square root. Then get the log of that. There are a few ways I could do this.
vec <- 1:10
len <- length(vec)
sq <- sqrt(len)
log(sq)
## [1] 1.151293
This makes sense, and is very logical, but I keep saving new variables as I go along. I could write some nested piece of code that does the same thing with no variables.
log(sqrt(length(1:10)))
## [1] 1.151293
Which is efficient, but, man, I hope I can keep track of parentheses. It’s also ugly as hell. And super difficult for future-me to understand.
This idea of doing one thing, then the next, then the next is often how we treat our data. We take a piece of data, sift it, transform it, summarize it, etc. And this step-by-step style of coding has a name - functional programming. It lies at the core of much of what we do in R.
So, is there a better way? Yes! What if we could take the output of one function, and pass it to the next? This is known as piping. And it is implemented in R by the magrittr
library. (Pause. Get the joke. Laugh at it. Move on.) magrittr
introduces a funny new operator %>%
which says, take the output of the thing on the right hand side, and pass it to the next function as its first argument.
As a simple example, let’s sum one through ten using pipes.
#If you haven't, install.packages("magrittr") first
library(magrittr)
1:10 %>% sum()
## [1] 55
Notice that we supplied no argument to sum. Now, what if wanted to take the square root of the result?
1:10 %>%
sum() %>%
sqrt()
## [1] 7.416198
Notice how I separated different steps on different lines? This is a more readable code. Future me can track what’s going on, line by line, instead of having to read a looooong single line of code.
Let’s try a few exercises with pipes.
Use pipes to sum the log of 100:200.
Use pipes to take the square root of the mean of 100 random uniform numbers.
Let’s dive into the guts of R. Using the mtcars
data frame, get it’s summary and str that summary. What do you get back?
The last basic thing we’ll need today is plotting, as we’ll want to see the results of our simulations. Next week we’ll learn ggplot2
and get fancy. For the moment, I just want you to know a few things in baseplot.
First, here’s how to make a histogram.
#just some random numbers
vals <- runif(50, 5, 10)
#the histogram
hist(vals)
Easy, no?
Now, what if I wanted to plot x by y? A scatterplot. R uses an equation-like syntax for this, where I plot y ~ x
. We’ll use this again and again as we fit linear models. For now, here’s an example scatterplot.
#first, a data frame
my_df <- data.frame(x=1:10, y=10:1)
#note how data is used
#as well as column names in the equation
plot(y ~ x, data=my_df)
Now, you can supply all sorts of arguments on plot - look at the help file. We’re not really going to worry about those today.
Let’s try some example exercises. And, uh, then this is the last you will ever see of baseplot again in this class.
Using mtcars
, plot mpg by cyl. What happens if you set pch to 19 or 5?
boxplot()
works like plot. Use it for mpg ~ vs
. Now try the col
argument. Give it a vector of two colors, like c("red", "blue")
.
dplyr
will change your life. It is the data munging tool you didn’t know you needed until now. To learn our way around it, we’ll use it with the mtcars
data frame. dplyr
is a collection of functions that manipulate data frames (or data tables, or tibbles, or other objects that work like data frames - more on those later). It’s functions largely take the data frame object as the first argument, and then return data frames as the output, so pipes are a natural fit for dplyr
. Indeed, if you load dplyr
, magrittr
is loaded by default.
mutate()
image from Allison Horst https://github.com/allisonhorst/
Often, we want to make a column that uses information from other columns. For this, we have mutate. For example, what if we wanted to make a log transformed column of mpg
? Note, I’m going to wrap the whole thing in head, so we only get the first 6 lines of the result
library(dplyr)
mtcars2 <- mutate(mtcars, log_mpg = log(mpg))
head(mtcars2)
## mpg cyl disp hp drat wt qsec vs am gear carb log_mpg
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.063391
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.928524
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2.895912
OK, the first argument was the data frame. Then, we just specified a new column from whole cloth, but used one of the columns from mtcars to make it. How easy! It’s even easier with pipes.
mtcars2 <- mtcars %>%
mutate(log_mpg = log(mpg))
head(mtcars2)
## mpg cyl disp hp drat wt qsec vs am gear carb log_mpg
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.063391
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.928524
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2.895912
group_by()
One of the ways we use dplyr
the most is group_by()
. This lets us group by different unique values of a column - think treatments or categories - and do something to those sub-data-frames. For example, let’s say we wanted to group by number of gears, and then use mutate to make a new column that’s the average of the mpg values.
mtcars_group <- mtcars %>%
group_by(gear) %>%
mutate(avg_mpg = mean(mpg)) %>%
ungroup()
head(mtcars_group)
## # A tibble: 6 x 12
## mpg cyl disp hp drat wt qsec vs am gear carb avg_mpg
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 24.5
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 24.5
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 24.5
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 16.1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 16.1
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 16.1
Notice the ungroup()
at the end. That’s provided as some functions bork if you give them a grouped data frame. We want to remove that structure.
Now see that final column - each row with the same number of gears has the same avg_mpg value.
summarize()
Often, you don’t want to have a data frame with summarized information repeated. You just want to reduce your data frame down to summarized information. In Excel, this is pivot tables, basically. For that, we use a combination of group_by()
and summarize()
Let’s do the same thing as the previous example, only let’s add the standard deviation, and return just the summarized information.
mtcars %>%
group_by(gear) %>%
summarize(avg_mpg = mean(mpg),
sd_mpg = sd(mpg)) %>%
ungroup()
## # A tibble: 3 x 3
## gear avg_mpg sd_mpg
## <dbl> <dbl> <dbl>
## 1 3 16.1 3.37
## 2 4 24.5 5.28
## 3 5 21.4 6.66
Whoah! Cool!
You try it, but group by gear and vs, and look at how that alters weight. Yes, you can group by multiple things in the same group_by
statement - just separate them with a comma!
select()
Often when working with a data frame, you end up with some column you no longer want. Maybe it was intermediate to some calculation. Or maybe you just want only a few columns to save memory. Either way, select is a way to include or exclude columns.
head(select(mtcars2, -mpg))
## cyl disp hp drat wt qsec vs am gear carb log_mpg
## 1 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522
## 2 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522
## 3 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761
## 4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.063391
## 5 8 360 175 3.15 3.440 17.02 0 0 3 2 2.928524
## 6 6 225 105 2.76 3.460 20.22 1 0 3 1 2.895912
Uh oh! Where did mpg go? Gone!
Or
mtcars3 <- mtcars %>%
select(wt, am, gear)
head(mtcars3)
## wt am gear
## Mazda RX4 2.620 1 4
## Mazda RX4 Wag 2.875 1 4
## Datsun 710 2.320 1 4
## Hornet 4 Drive 3.215 0 3
## Hornet Sportabout 3.440 0 3
## Valiant 3.460 0 3
One can even use some more imprecise matches if need be.
head(select(mtcars2, contains("mpg")))
## mpg log_mpg
## 1 21.0 3.044522
## 2 21.0 3.044522
## 3 22.8 3.126761
## 4 21.4 3.063391
## 5 18.7 2.928524
## 6 18.1 2.895912
filter()
image from Allison Horst https://github.com/allisonhorst/
We can also exclude by row values using filter. This function takes all manner of comparisons, and returns only those rows for which the comparison is true. For example, to get rid of 3 cylinder cars:
mtcars_filter <- mtcars %>%
filter(cyl != 3)
head(mtcars_filter)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I’ll often use this to remove rows with NA
values for some column, such as
mtcars %>%
filter(!is.na(cyl))
Exercises
1. Add some columns to mtcars
to plot the log of mpg
by the square root of hp
.
Get the average hp
per gear
and plot them against each other.
Make a data fame for only 6 cylinder engines with only the disp
and carb
columns. Create a boxplot. of how carb influences disp
.
To simulate things, we want to repeat an action again and again. There are a few ways to do this in R. The first is the function replicate()
replicate(n = 5, sum(1:10))
## [1] 55 55 55 55 55
Well, that’s a hair boring, but, you can see it summed something 5 times. We can of course create fancier things to do, but, in general, replicate()
is a solid workhorse.
There is also a beautiful library in R called purrr. This library takes the dplyr logic of mapping a function to many inputs and puts it on a different level. We’ll talk more about purrr at another time, but, for now, there are a few functions worth knowing.
image from Allison Horst https://github.com/allisonhorst/
The core purrr functions are the map_*()
functions. Basically, a whole family of functions where you can change what is in the * to specify the output type. For example, if we want to replicate the replicate()
action of above, we’d be returning a double and would use map_dbl()
.
library(purrr)
map_dbl(1:5, ~sum(1:10))
## [1] 55 55 55 55 55
Note our use of ~
to say “Hey, we’re about to call some stuff.” There are other ways to use map, but they involve writing your own functions, and, we’ll leave that for another time.
Other useful versions of map include map_chr()
for character returns, map_df()
for a data frame, map_lgl
for true/false, and just plain olde map()
which returns a list of whatever you want.
Let’s see map_df()
in action, as we’re going to use it today. Let’s say we want to take some input, and return a data frame with the input and the sum of all numbers up to that input. To do so, you need to know that after the ~
, map functions call the individual argument being sent to the function .x
.
map_df(1:10, ~data.frame(x = .x, y = sum(1:.x)))
## x y
## 1 1 1
## 2 2 3
## 3 3 6
## 4 4 10
## 5 5 15
## 6 6 21
## 7 7 28
## 8 8 36
## 9 9 45
## 10 10 55
Nifty! What if we wanted to return not only the final sum, but all numbers up until that point?
map_df(1:10, ~data.frame(x = .x, y = cumsum(1:.x)))
## x y
## 1 1 1
## 2 2 1
## 3 2 3
## 4 3 1
## 5 3 3
## 6 3 6
## 7 4 1
## 8 4 3
## 9 4 6
## 10 4 10
## 11 5 1
## 12 5 3
## 13 5 6
## 14 5 10
## 15 5 15
## 16 6 1
## 17 6 3
## 18 6 6
## 19 6 10
## 20 6 15
## 21 6 21
## 22 7 1
## 23 7 3
## 24 7 6
## 25 7 10
## 26 7 15
## 27 7 21
## 28 7 28
## 29 8 1
## 30 8 3
## 31 8 6
## 32 8 10
## 33 8 15
## 34 8 21
## 35 8 28
## 36 8 36
## 37 9 1
## 38 9 3
## 39 9 6
## 40 9 10
## 41 9 15
## 42 9 21
## 43 9 28
## 44 9 36
## 45 9 45
## 46 10 1
## 47 10 3
## 48 10 6
## 49 10 10
## 50 10 15
## 51 10 21
## 52 10 28
## 53 10 36
## 54 10 45
## 55 10 55
Exercises
Use replicate()
to repeatedly average the numbers 1:10 seven times.
Do the same thing with map_dbl()
- also what happens if you use other map functions?
Start with a vector:
my_vec <- c(11, 10, 10, 9, 10, 11, 10, 9, 10,
12, 9, 11, 8, 11, 11, 10, 11, 10,
11, 9)
Use map_df()
to make a data frame that, for the numbers 3 through 15, returns two columns. One is the the average of the element of the vector 1 through the chosen number, the second is the standard deviation.
e.g. 10.2 for a mean - but the 10 will be .x
for you!
There are a number of ways to get random numbers in R from a variety of distributions. For our simulations, let’s start with a sample of 40 individuals that’s from a population with a mean value of 10, a SD of 3.
set.seed(323)
samp <- rnorm(n = 40, mean = 10, sd = 3)
Note the set.seed()
function. Random number generators on computers aren’t REALLY random - they use some nugget of information that changes constantly to be pseudo-random. By setting that nugget - the seed - we make sure that all random numbers generated from this point on are the same.
Unsettling, no?
There are a wide variety of different random number generators in R. rnorm()
is exceedingly comming, but, we also have the following. Let’s flip a coin 30 times and see how many heads we get if there’s a 50:50 change of getting heads. And let’s do it 10 times!
rbinom(10, size = 30, prob = 0.5)
## [1] 10 18 16 11 16 17 14 14 15 17
We can also draw random numbers from a uniform distribution (all numbers are equally likely to be drawn) with a minimum and maximum bound.
runif(10, min = -1, max = 1)
## [1] 0.06789064 -0.41526715 -0.16290617 0.87500096 0.82789850 0.86382277
## [7] -0.88316031 0.81005091 0.99624116 0.41923706
So from this, we can draw a random sample from a population with a distribution of our choice!
rowwise()
and map_dbl()
for simulationOK, let’s begin our journey towards figuring out what sample size we want. We now know how to generate random measurements from an assumed population. The other tool we’ll need to simulate sampling a population is some output. Let’s go through this step by step. First, let’s say we are going to try out sample sizes from 3 through 50 for a population with a mean of 45 and a standard deviation of 20. To start, we’ll need to setup a data frame with the different sample sizes we want and some parameters we can use.
# our assumptions
mean_pop <- 45
sd_pop <- 20
# use rep to repeat the sample sizes 3 through 50
samp_sim <- data.frame(samp_size = 3:50)
So, how do we create those replicate populations? Let’s say we wanted to just make one simulated draw for each sample size. For that, we need to 1) look at each sample size, and then 2) draw a random sample. For the latter, we already have a function. For the former, we need a function to go row by row. dplyr::rowwise()
. We feed it samp_size
as an argument so that the variable is preserved in the resulting data frame. We could also have done dplyr::group_by(samp_size)
as well. But I often find rowwise()
more intuitive in this case. For the later, we actually use summarize()
, as it lets us create resulting outputs that are not the same size as the input. Note, this is new in dplyr 1.0, and I could not be more excited.
Let’s do this and visualize the results.
samp_sim_1 <- samp_sim %>%
rowwise(samp_size) %>%
summarize(samp = rnorm(samp_size, mean_pop, sd_pop))
plot(samp ~ samp_size, data = samp_sim_1)
OK, that’s nice. I have a lot of samples at each sample size. You can see as our sample size increases, there are, indeed, more points. However, we want to simulate a lot of sampling evens and calculate sample statistics. This is where map_dbl()
or replicate()
becomes very useful. Let’s say, instead of just drawing from a normal population, we want to modify the above to calculate a mean. And do it, oh, let’s say twice.
samp_sim_means_2 <- samp_sim %>%
rowwise(samp_size) %>%
summarize(samp_mean =
replicate(2, rnorm(samp_size, mean_pop, sd_pop) %>% mean()))
samp_sim_means_2
## # A tibble: 96 x 2
## # Groups: samp_size [48]
## samp_size samp_mean
## <int> <dbl>
## 1 3 12.9
## 2 3 71.2
## 3 4 45.6
## 4 4 54.8
## 5 5 55.1
## 6 5 54.4
## 7 6 55.0
## 8 6 52.4
## 9 7 45.0
## 10 7 32.7
## # … with 86 more rows
Oh! Now we’re getting somewhere. Before we fully sally forth, let’s outline what we’re doing. You’ll see it’s a common structure for simulations.
# Take a data frame with a bunch of sample sizes
# For each row
# Replicate calculating the mean from a random draw some number of times
Reasonable, no? You could have written this out. Now, let’s operationalize this for 1000 simulations.
# Take a data frame with a bunch of sample sizes
sim_results <- samp_sim %>%
# For each row
rowwise(samp_size) %>%
# Replicate calculating the mean from a random draw some number of times
summarize(samp_mean = map_dbl(1:1000,
~ rnorm(samp_size, mean_pop, sd_pop) %>%
mean()))
What’s great about this (and using a map function) is that we can also add additional information - say, the sd - using map_df()
Note how we don’t even use = in summarize()
# Take a data frame with a bunch of sample sizes
sim_results <- samp_sim %>%
# For each row
rowwise(samp_size) %>%
# Replicate calculating the mean
# and Sdfrom a random draw some number of times
summarize(map_df(1:1000,
~data.frame(sim = .x,
samp_mean = rnorm(samp_size,
mean_pop,
sd_pop) %>%
mean(),
samp_sd = rnorm(samp_size,
mean_pop,
sd_pop) %>%
sd())))
sim_results
## # A tibble: 48,000 x 4
## # Groups: samp_size [48]
## samp_size sim samp_mean samp_sd
## <int> <int> <dbl> <dbl>
## 1 3 1 30.7 5.32
## 2 3 2 59.1 18.4
## 3 3 3 57.8 18.1
## 4 3 4 64.3 40.9
## 5 3 5 35.4 32.5
## 6 3 6 38.6 14.9
## 7 3 7 40.2 11.7
## 8 3 8 39.4 10.9
## 9 3 9 40.0 25.8
## 10 3 10 49.9 13.6
## # … with 47,990 more rows
Let’s try this out, and have you fill in what’s missing in these faded examples.
#Some preperatory material
set.seed(42)
mean_pop <- 10
sd_pop <- 3
nsim <- 100
sampSim <- data.frame(samp_size = 3:50)
#Mean simulations
sampSim %>%
rowwise(samp_size) %>%
summarize(samp_mean =
replicate(nsim,
rnorm(samp_size, mean_pop, sd_pop) %>% mean()))
#Now the faded examples! Fill in the ___
#Median simulations
samp_sim %>%
rowwise(samp_size) %>%
summarize(samp_median =
____(nsim,
rnorm(samp_size, mean_pop, sd_pop) %>% median()))
#SD simulations
samp_sim %>%
rowwise(samp_size) %>%
____(samp_sd =
____(nsim,
____(samp_size, mean_pop, ____) %>% sd()))
#IQR simulations
#function for interquartile range is IQR
samp_sim %>%
____(____) %>%
____(samp_iqr =
____(nsim,
____(____, mean_pop, ____) %>% IQR()))
plot
and summarize
Great, so we’ve done the simulation! How do we determine sample size? The first way is a plot.
plot(samp_mean ~ samp_size, data=sim_results)
We can eyeball this result and see a leveling out > 20 or so. OK, that’s great, but…totally imprecise.
Better might be to see where the SD of the mean levels off. Let’s pseudocode that out in comments to see what we might need to do.
# Take our sampSim data
# Group it by sample size
# Take the SD of the sample size
# ungroup the resulting data
# Sort it by SD
# Look at the top 5 entries.
A few new things here. First, we’re grouping by sample size, not sim number. Second, we’re summarizing. We are reducing our data down - boiling it to its essence. For this, dplyr
has a function called - wait for it - summarize()
. Second, we’re going to have to do some arranging. With a dplyr
function called - and this is a shocker - arrange
. sort
was already taken by the base package. OK, let’s walk through the resulting pipeline in two pieces. First, the summarization
# Take our sampSim data
sampSim_summary <- sim_results %>%
# Group it by sample size
group_by(samp_size) %>%
# Take the SD of the sample size
summarize(pop_mean_sd = sd(samp_mean))
Neat - this gives us a much reduced piece of data to work with. But - what about the arranging?
sampSim_summary <- sampSim_summary %>%
# Sort it by SD
arrange(pop_mean_sd)
sampSim_summary
## # A tibble: 48 x 2
## samp_size pop_mean_sd
## <int> <dbl>
## 1 49 2.73
## 2 47 2.86
## 3 50 2.88
## 4 45 2.92
## 5 48 2.99
## 6 46 3.02
## 7 43 3.06
## 8 44 3.13
## 9 42 3.15
## 10 41 3.17
## # … with 38 more rows
Now, notice that I don’t have to use head
or chop of the top few rows (there’s a dplyr function for that - slice()
). That’s because dplyr creates a data frame-like object called a tibble
. We’ll talk about this in more detail later. For now, tibbles work just like data frames, but they print out a little differently. If we REALLY wanted a data frame, we could just use as.data.frame()
.
As for the result, we can see that, eh, something in the 40s has the lowest SD, but things bounce around. We could also plot this.
plot(pop_mean_sd ~ samp_size, data = sampSim_summary)
So, what is acceptable? At what point does adding another sample begin to have diminishing returns.
Look at the resulting simulations using the sample. Would you chose a different optimal sample size?
Repeat this whole process, but for the sample median. What is the optimal sample size? Feel free to reuse code or modify code in place (for the simulations).
So, as we launch into things like standard errors, I wanted to pause and hit a few key functions we need to describe a sample. Let’s use the samp
vector from above as adata source. First, mean, sd, and median.
mean(samp)
## [1] 10.28201
sd(samp)
## [1] 2.881881
median(samp)
## [1] 10.0977
This is pretty great. There are a ton of functions like these. Such as
IQR(samp)
## [1] 4.15341
Although there is no standard error of the mean or confidence interval function. There are functions for skew, kurtosis, and more in other packages. Last, for arbitrary percentiles, there’s the ‘quantile()’ function.
#quartiles by defeault
quantile(samp)
## 0% 25% 50% 75% 100%
## 5.648247 8.072854 10.097696 12.226264 18.916196
#or, get what you want
quantile(samp, probs = c(0.05, 0.1, 0.5, 0.9, 0.95))
## 5% 10% 50% 90% 95%
## 6.264685 7.177671 10.097696 13.676815 14.484173
You can get individual values by specifying the percentile you want with probs.
OK, but what about bootstrapped standard errors? Or, you know, bootstrapping in general. Let’s start by generating a ‘sample’ with a sample size of 40.
set.seed(2020)
samp <- rnorm(n = 40, mean = 10, sd = 3)
OK, how do we get one bootstrap sample from this? Why, with sample()
of course. We give sample()
a few arguments about how big of a sample we want as well as if we want to sample with or without replacement.
one_boot <- sample(samp, size = length(samp), replace = TRUE)
Let’s look at the bootstrapped standard error of a median from our samp vector. We’re going to do it using the same techniques of simulation - and in particular with replicate()
. So, let’s replicate taking the median of a sample with replacement of the samp vector. Then we can get the standard deviation of that.
median_sims <- replicate(n = 100,
median(sample(samp,length(samp), replace=TRUE)))
sd(median_sims)
## [1] 0.3784669
Simple, no? Now…. can you think about how you might use this in the context of trying different sample sizes?
OK, you’ve done simulations from populations. Now, let’s mesh those bootstraps with our concepts of simulation at different sample sizes and…. plot out how to look at change in bootstrapped SE for the IQR at different sample sizes.
First, make sure you feel comfortable calculating the bootstrapped SE of the IQR from samp. Repeat what we did above with IQR instead of median.
Now, write out in comments what you will do to end up with a data frame that has a column of sample sizes and a column of IQRs calculated from sampling our samp
vector.
Code it!
Now, write out in comments how you would go from that data frame to one that has the SE for the IQR at each sample size.