Believe it or not, despite all of the complexity under the hood, fitting a linear model in R with least squares is quite simple with a straightfoward workflow.
Let’s go through each step with an example of seals. Are older seals larger?
## ## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats': ## ## filter, lag
## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union
library(ggplot2) seals <- read.csv("./data/06/17e8ShrinkingSeals Trites 1996.csv") seal_base <- ggplot(seals, aes(x=age.days, y=length.cm)) + geom_point() + theme_grey(base_size=14) seal_base
Neat data set, no?
Now, looking at this from the get-go, we can see it’s likely nonlinear. Maybe even non-normal! Let’s ignore that for now, as it will make the results great fodder for our diagnostics!
OK, so, you have the data. And in your model, you want to see how age is a predictor of length. How do you fit it?
Nothing could be simpler. R has a function called
lm() which stands for linear model. It works just like base plot, with the
y ~ x syntax. And we create a fit model object.
seal_lm <- lm(length.cm ~ age.days, data=seals)
Now, if we want to just peak at the fit, before going forward, we can use
coef() which is pretty standard across all R fit model functions.
## (Intercept) age.days ## 1.157668e+02 2.370559e-03
But that’s getting ahead of ourselves…
R also provides a 1-stop shop for evaluating functions. Fit model objects can typically be plotted. Now, it uses base plot, so, we’ll use the
par function to setup a 2x2 plotting area.
par(mfrow = c(2,2)) #2 rows, 2 columns plot(seal_lm)