Writing Reusable Code with Functions

Let’s say you want to do this…

  1. Make a data frame with 4 columns of random numbers.

  2. Rescale each variable to the 0-1 range: subtract its minimum, then divide by its range (maximum minus minimum).

You could…

df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

What’s wrong with this?

  1. Copy-pasting is error prone (the line for df$b still subtracts min(df$a) in the denominator)

  2. Repeated code gets long

  3. It doesn’t scale to more columns

  4. Each line does 4 things at once, which makes it hard to read

A Functional Outline

  1. Intro to Functions

  2. Let’s Write Some Functions!

  3. More than One Argument

  4. How to Build a Function

When to write a function



“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice”

- H. Wickham

Ugh…

df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

What is a function?

my_cool_function <- function(arguments){
  
  something_to_give_back <- do_things(arguments)
  
  return(something_to_give_back)
}

What is a function?

  1. A name: the function is stored as a new object
  2. The function keyword, identifying this as a function
  3. Zero or more arguments
  4. The body of the function, inside { }
  5. A return statement that gives something back to the user (if omitted, R returns the last value evaluated)

What is a function?

add_one <- function(x){
  
  ret_value <- x + 1
  
  return(ret_value)
  
}

add_one(3)
[1] 4

Simpler ways of writing functions

add_one <- function(x){
  return(x+1)
}

Super short!

add_one <- function(x) x+1
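
Why do the shorter versions still work? When a function has no explicit return(), R gives back the value of the last expression it evaluates. A minimal sketch comparing the forms follows. The names add_one_long, add_one_short, and add_one_lambda are made up here for illustration, and the \(x) shorthand assumes R 4.1 or later.

```r
# Explicit intermediate value and explicit return()
add_one_long <- function(x){
  ret_value <- x + 1
  return(ret_value)
}

# No return(): the last evaluated expression is the return value
add_one_short <- function(x) x + 1

# Anonymous-function shorthand (assumes R >= 4.1)
add_one_lambda <- \(x) x + 1

add_one_long(3)
add_one_short(3)
add_one_lambda(3)
# each call returns 4
```

Which form to use is mostly a readability choice: the long form leaves room for comments and intermediate steps, the short forms suit one-liners.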

A Functional Outline

  1. Intro to Functions

  2. Let’s Write Some Functions!

  3. More than One Argument

  4. How to Build a Function

Let’s write some functions

add_one <- function(x){
  
  ret_value <- x + 1
  
  return(ret_value)
  
}

add_one(3)
[1] 4

Let’s write some functions

square_root <- function(___){
  
  ret_value <- sqrt(___)
  
  return(___)
  
}

square_root(16)

Should return 4

Let’s write some functions

max_minus_min <- _______(___){
  
  ret_value <- ___(___) - ___(___)
  
  ___(___)
  
}

max_minus_min(c(4,7,1,6,8))
[1] 7

A Functional Outline

  1. Intro to Functions

  2. Let’s Write Some Functions!

  3. More than One Argument

  4. How to Build a Function

On arguments

Functions can take many arguments:

my_function <- function(x, y, z, q){}

These can be of any object type
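
"Any object type" includes other functions. As a small sketch, the hypothetical apply_to_vector() below (not part of the original slides) takes a vector and a function, then calls that function on the vector.

```r
# Hypothetical example: `values` is a vector, `fn` is itself a function
apply_to_vector <- function(values, fn){
  return(fn(values))
}

apply_to_vector(c(4, 7, 1), max)    # returns 7
apply_to_vector(c(4, 7, 1), range)  # returns 1 7
```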

Arguments can have default values

add_values <- function(x, y=0){

  return(x+y)

}

add_values(3)
[1] 3
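
A default is only used when the caller leaves that argument out. The sketch below reuses add_values() from above to show a default being overridden, first by position and then by name.

```r
add_values <- function(x, y = 0){
  return(x + y)
}

add_values(3)             # y falls back to its default of 0, returns 3
add_values(3, 2)          # y supplied by position, returns 5
add_values(y = 2, x = 3)  # arguments supplied by name, order is free, returns 5
```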

You can use ... to pass extra arguments along to another function

make_mean <- function(a_vector, ...){
  
  sum_vector <- sum(a_vector, ...)
  
  n <- length(a_vector)
  
  return(sum_vector/n)
  
}

make_mean(c(4,5,6), na.rm=TRUE)
[1] 5
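
One thing to watch with ...: the extra arguments only reach the function you hand them to. In make_mean(), na.rm = TRUE is passed to sum(), but length() still counts the missing values. The sketch below shows the effect, plus one possible fix. make_mean_narm() is a hypothetical variant written for this illustration, not part of the original slides.

```r
make_mean <- function(a_vector, ...){
  sum_vector <- sum(a_vector, ...)
  n <- length(a_vector)
  return(sum_vector/n)
}

# sum() drops the NA (4 + 5 = 9), but length() still counts 3 elements
make_mean(c(4, 5, NA), na.rm = TRUE)       # returns 3, not 4.5

# Hypothetical variant: handle na.rm explicitly so both steps agree
make_mean_narm <- function(a_vector, na.rm = FALSE){
  if (na.rm) a_vector <- a_vector[!is.na(a_vector)]
  return(sum(a_vector) / length(a_vector))
}

make_mean_narm(c(4, 5, NA), na.rm = TRUE)  # returns 4.5
```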

Exercises

Write a function for each of the following and paste it into the etherpad. Each function:

  1. Takes a vector, squares each element, and sums the results
    • use c(4,5,6) to test (= 77)
  2. Takes a number and combines it into a string with the word "elephants" using paste().
    • 1 elephants, 2 elephants, 15 elephants
  3. Takes a number, a string, and a separator and combines them
    • my_function(3, "hello", "-") makes "3 - hello"

Why are we doing this to ourselves?

A Functional Outline

  1. Intro to Functions

  2. Let’s Write Some Functions!

  3. More than One Argument

  4. How to Build a Function

How did I build that function?

  1. OK, what’s normally going to change? (These become arguments.)

  2. AND - what could change under some circumstances? (These become arguments with default values.)

  3. Write some test code outside of a function (START WITH COMMENTS!)

Test Code

# start with a vector

# subtract the min

# calculate the max - min

# divide the subtracted value by the (max-min)

# return

Test Code

x <- rnorm(10)

# subtract the min
smol_value <- x - min(x, na.rm = TRUE)

# calculate the max - min

# divide the subtracted value by the (max-min)

# return

Test Code

x <- rnorm(10)

# subtract the min
smol_value <- x - min(x, na.rm = TRUE)

# calculate the max - min
vec_range <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)

# divide the subtracted value by the (max-min)

# return

Test Code

x <- rnorm(10)

# subtract the min
smol_value <- x - min(x, na.rm = TRUE)

# calculate the max - min
vec_range <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)

# divide the subtracted value by the (max-min)
value <- smol_value / vec_range

# return

Turn test code into a function

std_values <- function(x){
  # subtract the min
  smol_value <- x - min(x, na.rm = TRUE)

  # calculate the max - min
  vec_range <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)

  # divide the subtracted value by the (max-min)
  value <- smol_value / vec_range

  # return
  return(value)
}
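
Before using the function on real data, it helps to try it on small inputs where you already know the answer. The checks below are a sketch of that habit. The test vectors are made up for illustration.

```r
std_values <- function(x){
  # subtract the min
  smol_value <- x - min(x, na.rm = TRUE)

  # calculate the max - min
  vec_range <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)

  # divide the subtracted value by the (max-min)
  value <- smol_value / vec_range

  return(value)
}

std_values(c(1, 2, 3))     # returns 0.0 0.5 1.0
std_values(c(10, NA, 20))  # returns 0 NA 1: the NA is skipped by min/max but kept in place
std_values(c(5, 5, 5))     # returns NaN NaN NaN: a constant vector means dividing 0 by 0
```

The last case shows a limit of the function as written: if every value is the same, the range is zero and the result is NaN.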

Our Final Workflow - So Readable!

df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

library(dplyr)

df <- df |>
  mutate(a = std_values(a),
         b = std_values(b),
         c = std_values(c),
         d = std_values(d))

dplyr says No Moar Copy Paste

df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

library(dplyr)

df <- df |>
  mutate(
    across(.cols = everything(),
           .fns = std_values)
  )
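
For comparison, the same "apply one function to every column" idea can be sketched in base R with lapply(). This is an alternative shown for illustration, not the approach used in these slides. It uses a plain data.frame and a fixed seed so the result is reproducible.

```r
std_values <- function(x){
  smol_value <- x - min(x, na.rm = TRUE)
  vec_range <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  return(smol_value / vec_range)
}

set.seed(1)  # assumption: fixed seed, only to make the example reproducible
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# lapply() calls std_values() once per column;
# assigning into df[] keeps the data frame structure
df[] <- lapply(df, std_values)

# every column should now run from 0 to 1
sapply(df, range)
```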

When to write a function



“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice”

- H. Wickham