At its core, with likelihood we are searching parameter space, attempting to maximize the likelihood of a model given the data. Let’s begin with a simple one-parameter model of how the world works and a single data point. Sure, it’s trivial, but it demonstrates the core properties of likelihood.
Let’s assume we are sampling a field of flowers. As this is count data, we have assumed a Poisson distribution. We throw our plot down, and find 10 flowers. What is the MLE of \(\lambda\), the mean (and variance) of our Poisson distribution?
As we’ve assumed our data is Poisson distributed, our **likelihood function** is that of a Poisson distribution. Now, we could write out the formula for the probability of a data point given a Poisson distribution (note \(L(H|D) = p(D|H)\)), but, hey, these are just the probability density functions of each distribution! So, in this case, `dpois(x, lambda)`.
Now, how do we search parameter space? We’ve got a Poisson distribution and an observation of 10. A reasonable set of values could be from 0 to 20. Why not!
At this point, we can simply vectorize our search for a maximum likelihood.
dpois(x = 10, lambda = 0:20)
## [1] 0.000000e+00 1.013777e-07 3.818985e-05 8.101512e-04 5.292477e-03
## [6] 1.813279e-02 4.130309e-02 7.098327e-02 9.926153e-02 1.185801e-01
## [11] 1.251100e-01 1.193781e-01 1.048373e-01 8.587015e-02 6.628184e-02
## [16] 4.861075e-02 3.409770e-02 2.299958e-02 1.498516e-02 9.466247e-03
## [21] 5.816307e-03
Well, great. We can squint at that output and find our Maximum Likelihood. But that’s a pain in the butt. We can instead use R intelligently to grab estimates for us.
lik_pois_10 <- dpois(x = 10, lambda = 0:20)
#maximum likelihood
max(lik_pois_10)
## [1] 0.12511
Yay! We have a likelihood. But…what’s the estimate? Now, we can use `which()` to get the index of the MLE in our vector of lambda values. But this gets…clunky. Here’s an example, just for your delectation.
#Index of the MLE
idx_mle <- which(lik_pois_10 == max(lik_pois_10))
idx_mle
## [1] 11
#OK, what's the MLE?
c(0:20)[idx_mle]
## [1] 10
And, of course, it would have been better practice to have a vector of lambda values to work on, etc., but, woof. This gets ugly and code-y fast.
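(As an aside, base R’s `which.max()` collapses that whole index-then-subset dance into one step. A minimal sketch:)

#which.max() returns the index of the (first) maximum,
#so we can grab the MLE in one line
lambdas <- 0:20
lambdas[which.max(dpois(x = 10, lambda = lambdas))]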
## dplyr
Why not keep all of our information in one nice tidy place and use `dplyr` to get easy answers? We can make a data frame with a set of lambda values, calculate the likelihood, and then filter down to the MLE for our answer.
library(dplyr)
pois_10 <- data.frame(lambda_vals = 0:20) %>%
  mutate(likelihood = dpois(x = 10, lambda = lambda_vals))

#Get the MLE
pois_10 %>%
  filter(likelihood == max(likelihood))
## lambda_vals likelihood
## 1 10 0.12511
We can even use this to make a nice plot.
library(ggplot2)

ggplot(pois_10, aes(x=lambda_vals, y=likelihood)) +
  geom_point()
So, notice that we have a lot of values close to 0? Lots of room for rounding error and weird peaks. And while this distribution was nicely shaped, due to these issues, the log-likelihood often looks much nicer (and leads to good test statistic performance).
How do we add that? We simply add the `log = TRUE` argument to any probability density function.
pois_10 <- pois_10 %>%
  mutate(log_likelihood = dpois(x = 10, lambda = lambda_vals, log=TRUE))

ggplot(pois_10, aes(x=lambda_vals, y=log_likelihood)) +
  geom_point()
Note, I’m not logging after the fact. That’s because we’ll get different answers if we take the log inside the density function versus logging a really, really small value after it’s been spit out - floating point precision gets lost.
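To see why, here’s a quick demonstration with a deliberately extreme observation. The density underflows to zero, so logging after the fact gives `-Inf`:

#the density is smaller than floating point can represent,
#so it underflows to 0
dpois(x = 500, lambda = 10)

#and log(0) is -Inf
log(dpois(x = 500, lambda = 10))

#logging inside the function keeps everything finite
dpois(x = 500, lambda = 10, log = TRUE)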
This is all well and good, but, as we move into more complex models and more data, we’re not going to be able to include a simple `dpois()` statement or otherwise in our mutate. Instead, we need to think of the model as a whole, and write out a likelihood generating function. This will have other advantages, too, as we do things like profile, etc. The general structure of a likelihood function should be as follows:
my_likelihood_function <- function(observations, predictors, parameters){
  #a data generating process that makes
  #fitted values of your response variable

  #a likelihood function that compares observations
  #to predicted values
}
For example, in the case of our Poisson data, we can write something like so:
pois_likelihood <- function(y, lambda){
  # Data Generating Process
  y_hat <- lambda

  #likelihood function
  sum(dpois(y, lambda = y_hat, log = TRUE))
}
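As a quick sanity check, for our single observation of 10 flowers at a lambda of 10, this should just give us the log of the 0.12511 we found before:

#should be log(0.12511), give or take rounding
pois_likelihood(y = 10, lambda = 10)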
Why include the sum? Well, in this case, it makes the function scalable to look at many data points, as we will see below! If we were looking at the straight likelihood, we’d use `prod()` instead. But, as we’ve talked about, go with ’dat log-likelihood!
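If you want to convince yourself the two are equivalent, here’s a sketch with a few made-up observations. The log of the product matches the sum of the logs - at least while nothing underflows:

y <- c(8, 10, 12)

#the straight likelihood
prod(dpois(y, lambda = 10))

#log of the product and sum of the logs give the same answer
log(prod(dpois(y, lambda = 10)))
sum(dpois(y, lambda = 10, log = TRUE))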
Here’s the beauty of a data set. The only two differences between the workflow for one point and many are, first, that you use either `prod()` (for likelihood) or `sum()` (for log-likelihood) to get the total value. Second, as the density functions don’t take kindly to a vector of data and a vector of parameters, we’ll use `rowwise()` to iterate over rows, but `ungroup()` after for other operations. As we’ve written our likelihood function, we’re really in good shape to just nail this.
Let’s try this for a random dataset that we generate from a Poisson distribution with a lambda of 10.
Note the tiny tiny changes in the calculation of the likelihood and log-likelihood.
set.seed(607)
pois_data <- rpois(10, lambda=10)
pois_mle <- data.frame(lambda_vals = 0:20) %>%
  rowwise() %>%
  mutate(log_likelihood = pois_likelihood(y = pois_data, lambda = lambda_vals)) %>%
  ungroup()
#Plot the surface
qplot(lambda_vals, log_likelihood, data=pois_mle)
#Get the MLE
pois_mle %>%
  filter(log_likelihood == max(log_likelihood))
## # A tibble: 1 x 2
## lambda_vals log_likelihood
## <int> <dbl>
## 1 9 -23.3
OK, now for the CI. Remember our assumption that the 95% CI of each parameter includes all values within 1.92 of the maximum log-likelihood. Again, pretty straightforward to get with filtering.
pois_mle %>%
  filter(log_likelihood > (max(log_likelihood) - 1.92))
## # A tibble: 3 x 2
## lambda_vals log_likelihood
## <int> <dbl>
## 1 8 -23.7
## 2 9 -23.3
## 3 10 -24.1
So, our CI is between 8 and 10.
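(If you’re wondering where that 1.92 comes from, it’s half the 95% critical value of a \(\chi^2\) distribution with one degree of freedom:)

#half the chi-square critical value - roughly 1.92
qchisq(0.95, df = 1) / 2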
**Exercise:** Try the above, but using `dbinom()` here for the binomial distribution.

The great thing about multiple parameters is that, heck, it’s no different than one parameter. Let’s use the seals example from last week and look at estimating the mean and SD of the seal age distribution. To start with, what’s the range of values we should choose to test? Let’s look at the data!
seals <- read.csv("data/17e8ShrinkingSeals Trites 1996.csv")
hist(seals$age.days)
Eyeballing this, I’d say the mean is between 3720 and 3740 and the SD is…oh, between 1280 and 1310.
OK, we’re going to need a likelihood function for this - one that takes a mean and a SD. Let’s do this!
norm_likelihood <- function(obs, mean_est, sd_est){
  #data generating process
  est <- mean_est

  #log likelihood
  sum(dnorm(obs, mean = est, sd = sd_est, log = TRUE))
}
Now, we can use `crossing()` from `tidyr` to generate a set of parameters to test, and then literally nothing is different from the workflow before. Let’s just look at the log-likelihood.
library(tidyr)

seal_dist <- crossing(m = seq(3720, 3740, by = 0.1),
                      s = seq(1280, 1310, by = 0.1)) %>%
  rowwise() %>%
  mutate(log_lik = norm_likelihood(obs = seals$age.days, mean_est = m, sd_est = s)) %>%
  ungroup()
Notice that took a bit. That’s because we searched 60,501 combinations of values. And we didn’t even do that with high precision! And I gave you a really narrow band of possible values! If anything can convince you that we need an algorithmic solution rather than brute force, I hope this does!
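As a preview of that algorithmic approach, here’s a minimal sketch using base R’s `optim()`. Note that `optim()` minimizes, so we hand it the negative log-likelihood - the `neg_norm_ll` wrapper below is just an illustrative helper, not something we’ve built before:

#optim() minimizes, so wrap our likelihood function
#to return the negative log-likelihood
neg_norm_ll <- function(par, obs){
  -norm_likelihood(obs = obs, mean_est = par[1], sd_est = par[2])
}

#give it rough starting values and let it do the searching
optim(par = c(3000, 1000), fn = neg_norm_ll, obs = seals$age.days)$par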
What’s the MLE?
seal_dist %>%
  filter(log_lik == max(log_lik))
## # A tibble: 1 x 3
## m s log_lik
## <dbl> <dbl> <dbl>
## 1 3730. 1293. -82964.
And can we plot a super-sexy contour plot of the surface?
ggplot(seal_dist, aes(x=m, y=s, z=log_lik)) +
  geom_contour(aes(color = ..level..))
Note that odd color thing enables you to see the contours. Yeah, I don’t like it, either.
So, how do we get profile likelihoods from this mess?
Simply put, for each parameter we want the profile of, we find the MLE at each of its values, maximizing over all values of the other parameter. So, group by the parameter you’re profiling and filter out the MLE within each group. So, for the SD -
seal_sd_profile <- seal_dist %>%
  group_by(s) %>%
  filter(log_lik == max(log_lik)) %>%
  ungroup()
qplot(s, log_lik, data=seal_sd_profile, geom = "line")
OK, nice profile.
So, what about the CI? Unfortunately, even after filtering, we have a lot of values. We want to just look at the first and last. Weirdly, `filter()` lets us use `row_number()` as something to work on. So, let’s take advantage of that here!
##See the CI
seal_sd_profile %>%
  filter(log_lik > max(log_lik) - 1.92) %>%
  filter(row_number() == 1 | row_number() == n())
## # A tibble: 2 x 3
## m s log_lik
## <dbl> <dbl> <dbl>
## 1 3730. 1280 -82965.
## 2 3730. 1310 -82966.
So, now we see the CI of our SD is from 1280 to 1310 - at least from this grid.
**Exercise:** Try this for the mean as well. Use `mean()` and `sd()` to derive reasonable parameter ranges for your likelihood surface so you don’t take forever.

## glm
So, you want to fit a model with multiple parameters using likelihood? We’ve seen that brute force is not the way. We need algorithms to search parameter space. R has multiple optimizer functions that seek to find the minimum value of a function (a negative log-likelihood or otherwise). Many require you to write your own likelihood function in order to fit, and these are incredibly flexible. But, for many workaday needs, we will use the function `glm()`, which stands for Generalized Linear Model (more on that at the end of the course) and uses the iteratively reweighted least squares algorithm.
The great thing about `glm()` is that it is incredibly similar to `lm()` - only now you have to specify what distribution you’re drawing from, in order to get the likelihood correct. You should also specify the shape of the curve, as many different curves are possible now. Let’s take a look.
seals <- read.csv("./data/17e8ShrinkingSeals Trites 1996.csv")
seal_mle <- glm(length.cm ~ age.days,
                family = gaussian(link = "identity"),
                data = seals)
This looks pretty similar! Only, notice now we have this `family` argument for a distribution. We’re going to go all `gaussian` all the time for now. We also specify what’s called a `link` function - basically, how do we translate linear predictors into the shape of a curve? For now, we can just use `"identity"`, which means a 1:1 translation between our linear predictors and the shape of the curve.
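One way to see what `"identity"` means in practice: predictions on the link scale and on the response scale are exactly the same thing for this model.

#with an identity link, the linear predictor IS the response scale
all.equal(predict(seal_mle, type = "link"),
          predict(seal_mle, type = "response"))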
So now that we have this fit object, how do we test our assumptions? Let’s remember what they are: no pattern in the residuals versus the fitted values, normally distributed residuals, and well-behaved likelihood profiles for each parameter.

Let’s break these down one by one. There is no nice `plot()` method here, so we’ll have to do it by hand.
First, fitted and residual values. We can extract both of these from the model object and plot them using `predict()` and `residuals()`. Note, using `predict()` with no new data merely gets the fitted values.
fitted <- predict(seal_mle)
res <- residuals(seal_mle)
qplot(fitted, res)
If we wanted to add a spline, we could have used `stat_smooth()` with no method argument.
Second, qq! Well, we have our residuals, so we can use `qqnorm()` and `qqline()` as we did with t-tests. Or plot a histogram.
qqnorm(res)
qqline(res)
hist(res)
Again, looks good!
Now, what about looking at the surface? We don’t have a nice way of visualizing a 4-D surface, so, instead, we look at the profile likelihoods of each parameter. For that, we use the `profile()` function, which profiles any fit whose log-likelihood can be accessed with the `logLik()` function. Note, this might take some time.
library(MASS)
plot(profile(seal_mle))
Well, that’s not what we expect. What is going on here? In essence, `profile.glm()` takes the profile of a parameter, subtracts the optimized deviance (thus centering it at 0), and takes a signed square root. This produces a straight line from which it’s much easier to see deviations. Which is great in terms of diagnostics!
It’s not great if you want to see CIs and such. For that, we can use the `profileModel` library. Let’s take a look.
library(profileModel)
prof <- profileModel(seal_mle,
                     objective = "ordinaryDeviance")
## Profiling for parameter (Intercept) ... Done
## Profiling for parameter age.days ... Done
Here we see that we can request how we want the deviance transformed. In this case, not at all. Let’s see how this looks.
plot(prof)
Much nicer. To see the sampling grid used:
plot(prof, print.grid.points = TRUE)
We can also look at the CIs here naturally.
prof <- profileModel(seal_mle,
                     objective = "ordinaryDeviance",
                     quantile = qchisq(0.95, 1))
## Preliminary iteration .. Done
##
## Profiling for parameter (Intercept) ... Done
## Profiling for parameter age.days ... Done
plot(prof)
OK, now that we’ve done this and feel comfortable with our fit, what are our coefficients?
summary(seal_mle)
##
## Call:
## glm(formula = length.cm ~ age.days, family = gaussian(link = "identity"),
## data = seals)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.8700 -3.8279 0.0304 3.7541 21.6874
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.158e+02 1.764e-01 656.37 <2e-16 ***
## age.days 2.371e-03 4.467e-05 53.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 32.26809)
##
## Null deviance: 402668 on 9664 degrees of freedom
## Residual deviance: 311807 on 9663 degrees of freedom
## AIC: 61009
##
## Number of Fisher Scoring iterations: 2
#or
broom::tidy(seal_mle)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 116. 0.176 656. 0
## 2 age.days 0.00237 0.0000447 53.1 0
First, let’s see how that compares to an `lm()` fit.
broom::tidy(lm(length.cm ~ age.days, data=seals))
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 116. 0.176 656. 0
## 2 age.days 0.00237 0.0000447 53.1 0
Not bad. Pretty close. Note that `glm()` uses something called a dispersion parameter, which is a general name for additional factors related to spread in the data. For a Gaussian model, it’s just the residual variance.
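We can check that ourselves - the dispersion reported by `summary()` should be just the residual deviance divided by the residual degrees of freedom:

#residual variance by hand - compare to the dispersion
#parameter reported by summary(seal_mle)
deviance(seal_mle) / df.residual(seal_mle)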
Note, the SEs are based on the assumption that the profiles are well behaved. If we want to get into more detail and change that assumption, we use `confint()`.
#based off of profile
confint(seal_mle)
## 2.5 % 97.5 %
## (Intercept) 1.154211e+02 1.161124e+02
## age.days 2.283001e-03 2.458117e-03
So, how do we compare models? Unfortunately, there’s not an automated way. This is likelihood, so you have to specify your null or alternate models. You have to think. In this case, a null model would only have an intercept - a mean - term.
seal_mle_null <- glm(length.cm ~ 1,
                     family = gaussian(link = "identity"),
                     data = seals)
We can now compare these models using a LRT. Note that we use the `anova()` function, and now have to specify what kind of test we want.
anova(seal_mle_null, seal_mle, test = "LRT")
## Analysis of Deviance Table
##
## Model 1: length.cm ~ 1
## Model 2: length.cm ~ age.days
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 9664 402668
## 2 9663 311807 1 90862 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Yes! They are different!
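If you want to peek under the hood, here’s a sketch of roughly what that test does for a gaussian model: take the drop in deviance, scale it by the dispersion, and compare it to a \(\chi^2\) distribution with 1 degree of freedom.

#drop in deviance between the null and alternate models
dev_diff <- deviance(seal_mle_null) - deviance(seal_mle)

#scale by the dispersion and compare to a chi-square with 1 df
pchisq(dev_diff / summary(seal_mle)$dispersion, df = 1, lower.tail = FALSE)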
To plot, we can use the same strategy as before - only now we need to specify two things to `stat_smooth()`. First, our method is now `glm`. Second, we need to tell it about the family and link we’re using. `stat_smooth()` takes a list of arguments that are passed to the method. So, let’s try it out.
ggplot(seals,
       mapping = aes(x = age.days, y = length.cm)) +
  geom_point() +
  stat_smooth(method = "glm", method.args = list(family = gaussian(link="identity")))
Works!
Let’s revisit the examples from our lm lab the other day.
First, the relationship between how lean you are and how quickly you lose fat. Implement this to get a sense of the general workflow for analysis.
fat <- read.csv("../data/17q04BodyFatHeatLoss Sloan and Keatinge 1973 replica.csv")

#initial visualization to determine if lm is appropriate
fat_plot <- ggplot(data=fat, aes(x=leanness, y=lossrate)) +
  geom_point()
fat_plot

#fit that model
fat_mod <- glm(lossrate ~ leanness,
               family = gaussian(link = "identity"),
               data=fat)

#assumptions
fat_fit <- predict(fat_mod)
fat_res <- residuals(fat_mod)

qplot(fat_fit, fat_res)

qqnorm(fat_res)
qqline(fat_res)

plot(profile(fat_mod))

#LRT test of model
fat_mod_null <- glm(lossrate ~ 1,
                    family = gaussian(link = "identity"),
                    data=fat)

anova(fat_mod_null, fat_mod, test = "LRT")

#t-tests of parameters
summary(fat_mod)
For your first faded example, let’s look at the relationship between DEET and mosquito bites.
deet <- read.csv("../data/17q24DEETMosquiteBites.csv")

deet_plot <- ggplot(data=___, aes(x=dose, y=bites)) +
  geom_point()
deet_plot

#fit that model
deet_mod <- glm(bites ~ dose,
                family = gaussian(link = "______"),
                data = ___)

#assumptions
deet_fit <- predict(____)
deet_res <- residuals(____)

qplot(deet_fit, deet_res)

qqnorm(_____)
qqline(_____)

plot(profile(____))

#LRT test of model
deet_mod_null <- glm(bites ~ ___,
                     family = gaussian(link = "______"),
                     data = ___)

anova(___, _____, test = "LRT")

#t-tests of parameters
summary(___)
Do longer lived species also have larger home ranges? Let’s test this!
zoo <- read.csv("../data/17q02ZooMortality Clubb and Mason 2003 replica.csv")

zoo_plot <- ggplot(data=___, aes(x=mortality, y=homerange)) +
  ___()

#fit that model
zoo_mod <- glm(___ ~ ___,
               family = ____(____ = _____),
               data = ____)

#assumptions
zoo_fit <- ____(____)
zoo_res <- ____(____)

qplot(____, ____)

qqnorm(____)
qqline(____)

plot(_____(____))

#LRT-tests of model
zoo_mod_null <- glm(___ ~ ___,
                    family = ____(____ = _____),
                    data = ____)

anova(___, _____, ________)

#t-tests of parameters
_____(___)
For this last one, I want you to also try something. Try changing `link = "identity"` to `link = "log"`. What do you see? How does this compare to the log transformed model? Try also to `ggplot` the results with the two different links!
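As a hint for that last part, here’s a rough sketch of overlaying the two links on one plot, assuming you’ve loaded the zoo data as above. If the log link complains about starting values, you may need to pass a `start` argument along in `method.args`:

ggplot(zoo, aes(x = mortality, y = homerange)) +
  geom_point() +
  stat_smooth(method = "glm",
              method.args = list(family = gaussian(link = "identity")),
              color = "blue", fill = "blue") +
  stat_smooth(method = "glm",
              method.args = list(family = gaussian(link = "log")),
              color = "red", fill = "red")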