
Models with Categorical Variables

1 / 53

So, We've Done this Linear Deliciousness

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma)$

2 / 53

And We've Seen Many Predictors

$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \epsilon_i$

3 / 53

What if our X Variable Is Categorical?

Comparing Two Means

$\text{mass}_i = \beta_0 + \beta_1 \text{sex}_i + \epsilon_i$

4 / 53

What if it Has Many Levels

Comparing Many Means

$\text{mass}_i = \beta_1 \text{adelie}_i + \beta_2 \text{chinstrap}_i + \beta_3 \text{gentoo}_i + \epsilon_i$

5 / 53

Different Types of Categories?

Comparing Many Means in Many Categories

$\text{mass}_i = \beta_1 \text{adelie}_i + \beta_2 \text{chinstrap}_i + \beta_3 \text{gentoo}_i + \beta_4 \text{male}_i + \epsilon_i$

6 / 53


Dummy Coding for Dummy Models

  1. The Categorical as Continuous

  2. Many Levels of One Category

  3. Interpretation of Categorical Results

  4. Querying Your Model to Compare Groups

8 / 53

We Know the Linear Model

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

$\epsilon_i \sim N(0, \sigma)$

But, what if $x_i$ was just 0 or 1?

9 / 53
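A quick sketch of that idea in R (simulated data, not from the deck): with a 0/1 predictor, the fitted slope is just the difference between the two group means.

set.seed(607)
x <- rep(c(0, 1), each = 20)           # dummy-coded predictor
y <- 10 + 3 * x + rnorm(40, sd = 2)    # true group difference of 3

coef(lm(y ~ x))                        # (Intercept) = mean at x = 0; x = difference in means
mean(y[x == 1]) - mean(y[x == 0])      # matches the slope above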

Consider Comparing Two Means

Consider the Horned Lizard

Horns prevent these lizards from being eaten by birds. Are horn lengths different between living and dead lizards, indicating selection pressure?

10 / 53

The Data

12 / 53

Looking at Means and SE

13 / 53

What is Really Going On?

What if we think of Dead = 0, Living = 1

14 / 53

Let's look at it a different way

15 / 53

First, Recode the Data with Dummy Variables

Squamosal horn length   Survive   Status
13.1                    1         Living
15.2                    0         Dead
15.5                    0         Dead
15.7                    1         Living
17.2                    0         Dead
17.7                    1         Living
16 / 53

First, Recode the Data with Dummy Variables

Squamosal horn length   Survive   Status   StatusDead   StatusLiving
13.1                    1         Living   0            1
15.2                    0         Dead     1            0
15.5                    0         Dead     1            0
15.7                    1         Living   0            1
17.2                    0         Dead     1            0
17.7                    1         Living   0            1
17 / 53

But with an Intercept, we don't need Two Dummy Variables

Squamosal horn length   Survive   Status   (Intercept)   StatusLiving
13.1                    1         Living   1             1
15.2                    0         Dead     1             0
15.5                    0         Dead     1             0
15.7                    1         Living   1             1
17.2                    0         Dead     1             0
17.7                    1         Living   1             1

This is known as a Treatment Contrast structure

18 / 53
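This recoding is exactly what R's model.matrix() builds from a formula. A minimal sketch, assuming a small data frame with a Status factor:

lizards <- data.frame(Status = factor(c("Living", "Dead", "Dead", "Living")))

model.matrix(~ Status, data = lizards)      # treatment contrasts: (Intercept) + StatusLiving
model.matrix(~ Status - 1, data = lizards)  # no intercept: StatusDead + StatusLiving dummies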

This is Just a Linear Regression

$\text{Length}_i = \beta_0 + \beta_1 \text{Status}_i + \epsilon_i$

19 / 53

You're Not a Dummy, Even If You Code a Dummy Variable

$\text{Length}_i = \beta_0 + \beta_1 \text{Status}_i + \epsilon_i$

  • Setting $\text{Status}_i$ to 0 or 1 (Dead or Living) is called Dummy Coding
    • Or One-Hot Encoding in the ML world.
  • We can always turn groups into "dummy" 0 or 1 variables

  • We could even fit a model with no $\beta_0$ and code Dead = 0 or 1 and Living = 0 or 1 (see the sketch below)

  • This approach works for any unordered categorical (nominal) variable

20 / 53
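In lm(), the two codings above look like this — a sketch assuming a data frame lizards with columns horn_length and Status (hypothetical names):

lm(horn_length ~ Status, data = lizards)      # beta_0 = Dead mean, beta_1 = Living - Dead
lm(horn_length ~ Status - 1, data = lizards)  # no beta_0: one mean per group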

Homogeneity of Variance Important for CI Estimation

22 / 53

Dummy Coding for Dummy Models

  1. The Categorical as Continuous

  2. Many Levels of One Category

  3. Interpretation of Categorical Results

  4. Querying Your Model to Compare Groups

23 / 53

Categorical Predictors: Gene Expression and Mental Disorders

24 / 53

The Data

25 / 53

Traditional Way to Think About Categories

What is the variance between groups v. within groups?

26 / 53

If We Only Had Control v. Bipolar...

Underlying linear model with control = intercept, dummy variable for bipolar

27 / 53

If We Only Had Control v. Schizo...

Underlying linear model with control = intercept, dummy variable for schizo

30 / 53

Linear Dummy Variable (Fixed Effect) Model

$y_{ij} = \beta_0 + \beta_j x_{ij} + \epsilon_{ij}, \quad x_{ij} = 0,1$

  • i = replicate, j = group

  • $x_{ij}$ indicates presence/absence (1/0) of level j for individual i

    • A Dummy variable
  • This is the multiple predictor extension of a two-category model

  • All categories are orthogonal

  • One category is set to $\beta_0$ for ease of fitting, and the other $\beta$s are differences from it

32 / 53

A Simpler Way to Write: The Means Model

$y_{ij} = \alpha_j + \epsilon_{ij}$
$\epsilon_{ij} \sim N(0, \sigma^2)$

  • i = replicate, j = group
  • Different mean for each group
  • Focus is on specificity of a categorical predictor
33 / 53
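In R, the means model is the no-intercept fit: each group gets its own coefficient. A sketch using the brainGene data fit on a later slide:

means_lm <- lm(PLP1.expression ~ group - 1, data = brainGene)
coef(means_lm)   # one alpha_j (group mean) per group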

Partitioning Model to See What Varies

$y_{ij} = \bar{y} + (\bar{y}_j - \bar{y}) + (y_{ij} - \bar{y}_j)$

  • i = replicate, j = group
  • Shows partitioning of variation
    • Between-group v. within-group variation
  • Consider $\bar{y}$ an intercept, deviations from the intercept by treatment, and residuals
  • Can calculate this with a fitted model to answer questions - it's a relic of a bygone era

    • That bygone era has some good papers, so you should recognize this
34 / 53
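The partition can be computed directly from the data. A sketch with dplyr, again assuming the brainGene data:

library(dplyr)

brainGene |>
  mutate(grand_mean = mean(PLP1.expression)) |>
  group_by(group) |>
  mutate(between = mean(PLP1.expression) - grand_mean,          # group deviation from intercept
         within  = PLP1.expression - mean(PLP1.expression)) |>  # residual
  ungroup()
# each observation = grand_mean + between + within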

Let's Fit that Model

Using Least Squares

library(broom)   # for tidy()
library(dplyr)   # for select()

# Fit the categorical model with least squares
brain_lm <- lm(PLP1.expression ~ group, data = brainGene)

# Tidy the coefficients, drop the statistic and p-value columns, format as a table
tidy(brain_lm) |>
  select(-c(4:5)) |>
  knitr::kable(digits = 3) |>
  kableExtra::kable_styling()

term           estimate   std.error
(Intercept)      -0.004       0.048
groupschizo      -0.191       0.068
groupbipolar     -0.259       0.068
35 / 53

Dummy Coding for Dummy Models

  1. The Categorical as Continuous

  2. Many Levels of One Category

  3. Interpretation of Categorical Results

  4. Querying Your Model to Compare Groups

36 / 53

R Fits with Treatment Contrasts

$y_{ij} = \beta_0 + \beta_j x_{ij} + \epsilon_{ij}$

term           estimate   std.error
(Intercept)      -0.004       0.048
groupschizo      -0.191       0.068
groupbipolar     -0.259       0.068

What does this mean?

  • Intercept ( $\beta_0$ ) = the average value associated with being in the control group

  • Others = the average difference between control and each other group

  • Note: R sets the reference level by factor order (alphabetical by default)

37 / 53
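To see this, the group means can be recovered by adding coefficients from the fit on slide 35:

b <- coef(brain_lm)
b["(Intercept)"]                      # control mean: -0.004
b["(Intercept)"] + b["groupschizo"]   # schizo mean:  -0.004 + -0.191 = -0.195
b["(Intercept)"] + b["groupbipolar"]  # bipolar mean: -0.004 + -0.259 = -0.263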

Actual Group Means

$y_{ij} = \alpha_j + \epsilon_{ij}$

group      estimate     std.error
control    -0.0040000   0.0479786
schizo     -0.1953333   0.0479786
bipolar    -0.2626667   0.0479786

What does this mean?

Being in group j is associated with an average outcome of $\alpha_j$.

38 / 53
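One way to query these means from the fitted model is with the emmeans package (a sketch, not necessarily how the table above was made):

library(emmeans)
emmeans(brain_lm, ~ group)   # estimated mean and SE for each group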

What's the best way to see this?

39 / 53

Many Ways to Visualize

40 / 53

How Well Do Groups Explain Variation in Response Data?

We can look at fit to data - even with categorical predictors!

# R2 for Linear Regression
R2: 0.271
adj. R2: 0.237

But, remember, this is based on the sample at hand.

Adjusted R2 adjusts for sample size and model complexity (k = # of predictors; here, # of groups - 1):

$R^2_{adj} = 1 - \frac{(1 - R^2)(n-1)}{n - k - 1}$

43 / 53
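That output is in the style of performance::r2(brain_lm). A manual check of the formula, assuming n = 45 (15 subjects per group, consistent with the identical standard errors on slide 38) and k = 2 dummy predictors:

r2 <- 0.271
n  <- 45   # assumed: 15 subjects in each of 3 groups
k  <- 2    # two dummy variables; the third group is the intercept
1 - (1 - r2) * (n - 1) / (n - k - 1)   # ~0.236, matching adj. R2 above up to rounding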

Dummy Coding for Dummy Models

  1. The Categorical as Continuous

  2. Many Levels of One Category

  3. Interpretation of Categorical Results

  4. Querying Your Model to Compare Groups

44 / 53

Which groups are different from each other?

Many mini linear models with two means... multiple comparisons!

45 / 53

Post-Hoc Means Comparisons: Which groups are different from one another?

  • Each group has a mean and SE

  • We can calculate a comparison for each

  • BUT, we lose precision as we keep resampling the model

  • Remember, every time we look at a system, there is some % chance that our CI does not overlap the true value

  • Each time we compare means, we have a chance of our CI not covering the true value

  • To minimize this possibility, we correct (widen) our CIs for this Family-Wise Error Rate

46 / 53
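The arithmetic behind that worry: with m independent comparisons, each with a 95% CI, the chance of at least one CI missing its true value grows quickly.

alpha <- 0.05
m <- 3               # e.g., three pairwise comparisons among our three groups
1 - (1 - alpha)^m    # ~0.14: a 14% family-wise chance of at least one miss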

Solutions to Multiple Comparisons and Family-wise Error Rate?

  1. Ignore it -

    • Just a bunch of independent linear models
  2. Increase your CI given m = # of comparisons

    • If 1 - CI of interest = $\alpha$
    • Bonferroni Correction: $\alpha' = \alpha/m$
    • False Discovery Rate: $\alpha' = k\alpha/m$ where k is the rank of the test
  3. Other multiple comparison corrections

    • Tukey's Honestly Significant Difference
    • Dunnett's Compare to Control
47 / 53
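The same corrections are often applied to p-values rather than CIs; a sketch with base R's p.adjust() and hypothetical raw p-values:

p <- c(0.009, 0.001, 0.34)          # hypothetical raw p-values
p.adjust(p, method = "bonferroni")  # multiplies each by m, capped at 1
p.adjust(p, method = "fdr")         # Benjamini-Hochberg false discovery rate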

No Correction: Least Significant Differences

contrast            estimate    conf.low     conf.high
control - schizo    0.1913333   0.0544024    0.3282642
control - bipolar   0.2586667   0.1217358    0.3955976
schizo - bipolar    0.0673333   -0.0695976   0.2042642
48 / 53
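Tables like this one (and the corrected versions on the next slides) can be produced by querying the model; a sketch with the emmeans package, assuming brain_lm from slide 35:

library(emmeans)
em <- emmeans(brain_lm, ~ group)

confint(contrast(em, method = "pairwise", adjust = "none"))       # no correction (this table)
confint(contrast(em, method = "pairwise", adjust = "bonferroni")) # Bonferroni
confint(contrast(em, method = "pairwise", adjust = "tukey"))      # Tukey's HSD
confint(contrast(em, method = "dunnett"))                         # compare to reference level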

Bonferroni Corrections

contrast            estimate    conf.low     conf.high
control - schizo    0.1913333   0.0221330    0.3605337
control - bipolar   0.2586667   0.0894663    0.4278670
schizo - bipolar    0.0673333   -0.1018670   0.2365337
49 / 53

Tukey's Honestly Significant Difference

contrast            estimate    conf.low     conf.high
control - schizo    0.1913333   0.0264873    0.3561793
control - bipolar   0.2586667   0.0938207    0.4235127
schizo - bipolar    0.0673333   -0.0975127   0.2321793
50 / 53

Visualizing Comparisons (Tukey)

51 / 53
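One way to draw such a comparison plot, assuming em from the sketch after slide 48 (emmeans returns a ggplot):

plot(contrast(em, method = "pairwise", adjust = "tukey"))   # estimates with CIs per contrast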

Dunnett's Comparison to Controls

contrast            estimate     conf.low     conf.high
schizo - control    -0.1913333   -0.3474491   -0.0352176
bipolar - control   -0.2586667   -0.4147824   -0.1025509

52 / 53

So, Categorical Models...

  • At the end of the day, they are just another linear model

  • We can understand a lot about groups, though

  • We can begin to see the value of queries/counterfactuals

$\hat{Y} = X\beta$, $Y \sim N(\hat{Y}, \Sigma)$

53 / 53
