class: center, middle, inverse

# Power Analysis and Null Hypothesis Testing

![](images/power/heman-i-have-the-power--80s-heman-tshirt-large_1.jpg)

---

# We've Talked About P-Values and Null Hypothesis Testing as a Means of Inference

- For industrial quality control, NHST was introduced to establish a cutoff for a reasonable p, called an `\(\alpha\)`

- This corresponds to confidence intervals: 1 - `\(\alpha\)` = the CI of interest

- Results with p `\(\le\)` `\(\alpha\)` are deemed **statistically significant**

---

# Alpha is Important as It Prevents us From Making Misguided Statements

![](images/power/why_alpha_meme.jpg)

---

# Although if P is Continuous, You Avoid This - Mostly

![](images/nht/muff_et_al_2022_pvalue.png)

Muff et al. 2022 TREE

---
class: middle

# Even So, You Can Still Make Mistakes

.pull-left[![](images/power/feeling_fine.jpg)]

--

.pull-right[![](images/power/positive_covid.jpg)]

--

.bottom[You reject the null that everything is OK - but you're wrong! False Positive! AKA **TYPE I ERROR**]

---
class: center, middle

# `\(\Large \alpha\)` is the probability of committing a Type I error - getting a False Positive

---

# You Could Also Have a False Negative!

.pull-left[![](images/power/covid_symptoms.png)]

--

.pull-right[![](images/power/positive_covid.jpg)]

--

.bottom[
.center[You incorrectly fail to reject your null - **Type II Error**]
]

---
class: center, middle

# `\(\Large \beta\)` is the probability of committing a Type II error - getting a False Negative

---

# Null Hypothesis: This is Not a Hotdog

.center[.middle[![](images/power/hotdog/hotdog_1.png)]]

---

# Null Hypothesis: This is Not a Hotdog

.center[.middle[![](images/power/hotdog/hotdog_2.png)]]

---

# Null Hypothesis: This is Not a Hotdog

.center[.middle[![](images/power/hotdog/hotdog_3.png)]]

---

# Null Hypothesis: This is Not a Hotdog

.center[.middle[![](images/power/hotdog/hotdog_4.png)]]

---

# Null Hypothesis: This is Not a Hotdog

.center[.middle[![](images/power/hotdog/hotdog_5.png)]]

---

# Types of Errors in a NHST framework

|            |Fail to Reject Ho |Reject Ho    |
|:-----------|:-----------------|:------------|
|Ho is True  |-                 |Type I Error |
|Ho is False |Type II Error     |-            |

- Probability of Type I error regulated by choice of `\(\alpha\)`

- Probability of Type II error regulated by choice of `\(\beta\)`

---

# There Can Be Dire Consequences for Type II Error

.center[
![:scale 70%](images/power/wolf-in-sheeps-clothing.jpg)
]

--

You have left real information on the table! You did not have the **power** to reject your null.

---
class: center, middle

# Power = 1 - `\(\beta\)`

---

# Power of a Test

- If `\(\beta\)` is the probability of committing a Type II error, 1 - `\(\beta\)` is the power of a test.

- The higher the power, the lower the chance of committing a Type II error.

- We often want a power of 0.8 or higher (i.e., a 20% chance of failing to reject a false null).

---
class: center, middle

# `\(\alpha = 0.05\)` & `\(\beta = 0.20\)`

--

5% Chance of Falsely Rejecting the Null, 20% Chance of Falsely Failing to Reject the Null

--

Are you comfortable with this? Why or why not?

---

# What is Power, Anyway?

Given that we often begin by setting our acceptable `\(\alpha\)`, how do we then determine `\(\beta\)` given our sample design?

- Use a formula for a specific test, based on sample size, effect size, etc. (see the sketch on the next slide)

- Simulate many samples, and see how often we get the wrong answer assuming a given `\(\alpha\)`!
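---

# Sketch: The Formula Approach

Before we simulate, a minimal sketch of the formula route using base R's `power.t.test()` - illustrative values only, assuming a simple two-sample design rather than the regression we use below:


```r
# power for a two-sample t-test, given n per group and
# an assumed effect size (delta) of 3 with an assumed sd of 5
power.t.test(n = 10, delta = 3, sd = 5, sig.level = 0.05)

# or flip it around: solve for the n per group needed to reach power = 0.8
power.t.test(delta = 3, sd = 5, sig.level = 0.05, power = 0.8)
```

Leave exactly one argument unset and `power.t.test()` solves for it - handy, but only for designs with a canned formula. For anything more complex, we simulate.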
---

# Power to the Puffers

- Let's assume a slope of 3 (as we saw in the experiment)

--

- Let's assume a SD of 5

--

- We can now simulate results from experiments with, say, different levels of replication

---

# Puffer Simulation


```r
make_puffers <- function(n, slope = 3, predator_sd = 5){
  
  # make some simulated data
  tibble(resemblance = rep(1:4, n),
         predators = rnorm(n*4,
                           mean = slope*resemblance,
                           sd = predator_sd))
}

set.seed(31415)
make_puffers(2)
```

```
# A tibble: 8 × 2
  resemblance predators
        <int>     <dbl>
1           1    11.2  
2           2     0.440
3           3     1.26 
4           4     8.26 
5           1    10.6  
6           2     0.310
7           3    12.8  
8           4    16.5  
```

---

# Lots of Sample Sizes and Simulations of Data


```r
puffer_sims <- tibble(n = 2:10) |>
  
  # for each sample size
  group_by(n) |>
  
  # simulate some data
  summarize(map_dfr(1:1000, ~make_puffers(n), .id = "sim"))

puffer_sims
```

```
# A tibble: 216,000 × 4
# Groups:   n [9]
       n sim   resemblance predators
   <int> <chr>       <int>     <dbl>
 1     2 1               1     -2.63
 2     2 1               2     12.1 
 3     2 1               3      5.40
 4     2 1               4     11.6 
 5     2 1               1      9.68
 6     2 1               2     16.9 
 7     2 1               3      5.90
 8     2 1               4      5.63
 9     2 2               1     -5.86
10     2 2               2     13.0 
# … with 215,990 more rows
```

---

# Fit Models, Get Coefficients, Get P....


```r
puffer_results <- puffer_sims |>
  
  # for each simulation
  group_by(n, sim) |>
  nest() |>
  
  # fit a model, get its coefficients and p-values
  summarize(fit = map(data, ~lm(predators ~ resemblance, data = .)),
            coefs = map(fit, broom::tidy)) |>
  unnest(coefs) |>
  
  # filter to just our resemblance term
  filter(term == "resemblance")
```

---

# We Can See What Estimates Would Cause Us to Reject the Null at `\(\alpha\)`

<img src="power_analysis_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

.center[Useful in seeing if you have a bias problem]

---

# Get the Power!


```r
puffer_power <- puffer_results |>
  group_by(n) |>
  summarize(false_neg_rate = sum(p.value > 0.05) / n(),
            power = 1 - false_neg_rate)
```

---

# Get the Power!

<img src="power_analysis_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
class: middle, center

![](images/power/sample_size_power_wahlberg.jpg)

---

# Common Things to Tweak in a Power Analysis

- Sample Size

- Effect Size

- Residual Standard Deviation

- Omitted Variable Bias (OH NO! YOU SHOULD HAVE PLANNED FOR THIS!)

- And more, depending on complexity of the model

--

- Different levels of `\(\alpha\)`

---

# How `\(\alpha\)` Influences Power

<img src="power_analysis_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# There is a Tradeoff Between `\(\alpha\)` and Power

<img src="power_analysis_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

# There is a Tradeoff Between `\(\alpha\)` and `\(\beta\)`

<img src="power_analysis_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

# This Can Lead to Some Sleepless Nights

.center[
![](images/power/type_I_II_spongebob.jpg)
]
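---

# Sketch: Recomputing Power Across `\(\alpha\)`

One way curves like those a few slides back could be tallied - a sketch, reusing the simulated p-values in `puffer_results` with an illustrative (assumed) set of `\(\alpha\)` values:


```r
# illustrative alpha levels - an assumption, not the figures' exact values
alphas <- c(0.001, 0.01, 0.05, 0.1)

alpha_power <- map_dfr(alphas, function(a) {
  puffer_results |>
    group_by(n) |>
    summarize(alpha = a,
              power = sum(p.value <= a) / n())  # fraction of sims rejecting
})
```

Because the simulations are already done, changing `\(\alpha\)` is just re-counting which p-values fall below the cutoff.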
---

# Solutions to `\(\alpha\)` and `\(\beta\)` Tradeoff?

- Optimize for `\(\alpha\)` or `\(\beta\)`
     - Which would be worse to get wrong?

--

- Choose a value that you can live with for both

--

- Mudge's optimal `\(\alpha\)` (see the sketch at the end of the deck)
     - Calculate `\(\omega = (\alpha + \beta)/2\)`
     - Use the `\(\alpha\)` at the minimum value of `\(\omega\)`
     - Can incorporate a cost for `\(\alpha\)` or `\(\beta\)` in the calculation

---

# Mudge's Optimal Alpha for N = 6

<img src="power_analysis_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

|     alpha| false_neg_rate| power|
|---------:|--------------:|-----:|
| 0.0735367|          0.068| 0.932|

---

# Don't Think That's the Only Way You Can Mess Up!

|            |Fail to Reject Ho |Reject Ho                       |
|:-----------|:-----------------|:-------------------------------|
|Ho is True  |-                 |Type I Error                    |
|Ho is False |Type II Error     |Correct or Type III or IV error |

- Type III error: Correctly reject the **wrong null**
     - Use of the wrong-tailed test

- Type S error: Reject the null, but your estimate has the wrong *sign*

- Type M error: Reject the null, but your estimate has the wrong *magnitude*
     - Bad theory leading to a bad null from a bad model

- Type IV error: Correctly reject the null, but for the wrong reason
     - Wrong test for the data
     - Collinearity among predictors
     - Assuming what is true for the group (in this test) is true for the individual
     - Omitted variable bias colors interpretation

---

# NHST is Great, but Think of All The Ways You Can Misuse it!

![](images/power/error_types_xkcd.png)

.bottom[xkcd]

---
class: center, middle

# But you have the power to plan around them!
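---

# Sketch: Finding Mudge's Optimal `\(\alpha\)`

A minimal sketch of the idea, not the code behind the earlier figure: reusing the simulated p-values in `puffer_results` for n = 6 (as in that figure), compute `\(\omega = (\alpha + \beta)/2\)` over an assumed grid of candidate `\(\alpha\)` values and take the minimum.


```r
# an illustrative grid of candidate alphas - an assumption
candidate_alphas <- seq(0.001, 0.3, by = 0.001)

mudge <- map_dfr(candidate_alphas, function(a) {
  puffer_results |>
    filter(n == 6) |>                         # one sample size, as in the figure
    summarize(alpha = a,
              beta = sum(p.value > a) / n(),  # false negative rate at this alpha
              omega = (alpha + beta) / 2)
})

# the alpha that minimizes omega
mudge |> filter(omega == min(omega))
```

Weighting `\(\alpha\)` and `\(\beta\)` differently in the `omega` line is where a cost for one error type or the other would enter.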