Null Hypothesis Testing: What's the probability that things are not influencing our data?
Model Comparison: Comparison of alternate hypotheses
Cross-Validation: How good are you at predicting new data?
Probabilistic Inference: What's our degree of belief in a hypothesis given the data?
Introduction to Likelihood
Maximum Likelihood Estimation
Maximum Likelihood and Linear Regression
Comparing Hypotheses with Likelihood
$$L(H|D) = p(D|H)$$

where $D$ is the data and $H$ is the hypothesis (model), including both a data generating process and some choice of parameters (often called $\theta$). The error generating process is inherent in the choice of probability distribution used for calculation.
First we have a Data Generating Process

This is our hypothesis about how the world works

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
Then we have a likelihood of the data given this hypothesis
This allows us to calculate the likelihood of observing our data given the hypothesis
Called the Likelihood Function
$$y_i \sim \mathcal{N}(\hat{y}_i, \sigma)$$
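A quick simulation sketch of this two-part structure in R (the parameter values here are made up for illustration):

# data generating process: a line with beta0 = 2, beta1 = 0.5
x <- runif(20, 0, 10)
y_hat <- 2 + 0.5 * x

# error generating process: Normal noise around the predicted values
y <- rnorm(20, mean = y_hat, sd = 1)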
Probability density functions are the most common

But, hey, $\sum(y_i - \hat{y}_i)^2$ is one as well

Extremely flexible

The key is a function that can find a minimum or maximum value, depending on your parameters
What is the likelihood of a value of 1.5 given a hypothesized Normal distribution where the mean is 0 and the SD is 1?

$$L(\mu = 0, \sigma = 1 | Data = 1.5) = dnorm(1.5, \mu = 0, \sigma = 1)$$
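In R this is a one-line calculation, since dnorm() returns the probability density:

# density of 1.5 under a Normal with mean 0 and SD 1
dnorm(1.5, mean = 0, sd = 1)
# [1] 0.1295176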
Introduction to Likelihood
Maximum Likelihood Estimation
Maximum Likelihood and Linear Regression
Comparing Hypotheses with Likelihood
Let's say this is our data:

 [1]  3.37697212  3.30154837  1.90197683  1.86959410  0.20346568  3.72057350
 [7]  3.93912102  2.77062225  4.75913135  3.11736679  2.14687718  3.90925918
[13]  4.19637296  2.62841610  2.87673977  4.80004312  4.70399588 -0.03876461
[19]  0.71102505  3.05830349

We know that the data comes from a normal population with a $\sigma$ of 1... but we want to get the MLE of the mean.

$$p(D|\theta) = \prod p(D_i|\theta) = \prod dnorm(D_i, \mu, \sigma = 1)$$

MLE = 2.896
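A minimal brute-force sketch of this, assuming the data above is stored in a vector named dat (a hypothetical name):

# candidate values for the mean
mu_grid <- seq(-1, 6, by = 0.001)

# log-likelihood of the data at each candidate mean (sigma fixed at 1)
log_lik <- sapply(mu_grid,
                  function(mu) sum(dnorm(dat, mean = mu, sd = 1, log = TRUE)))

# the candidate with the highest log-likelihood is the MLE
mu_grid[which.max(log_lik)]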
We use Log-Likelihood because products of many small probabilities underflow while sums of logs do not, and because $-2$ times a log-likelihood ratio is approximately $\chi^2$ distributed.
Distribution of sums of squares of k data points drawn from N(0,1)
k = Degrees of Freedom
Measures goodness of fit
A large probability density indicates a close match between observation and expectation (a small squared difference)
The 68% CI of a $\chi^2_1$ distribution corresponds to a drop of 0.49 log-likelihood units, so any parameter value within 0.49 units of the maximum is inside the 68% CI

The 95% CI corresponds to a drop of 1.92 log-likelihood units ($\chi^2_{1, 0.95}/2$), so any parameter value within 1.92 units of the maximum is inside the 95% CI
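Continuing the grid-search sketch above (again assuming mu_grid and log_lik from before):

# parameter values whose log-likelihood is within the cutoff of the maximum
ci_68 <- range(mu_grid[log_lik >= max(log_lik) - 0.49])
ci_95 <- range(mu_grid[log_lik >= max(log_lik) - 1.92])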
Introduction to Likelihood
Maximum Likelihood Estimation
Maximum Likelihood and Linear Regression
Comparing Hypotheses with Likelihood
$$L(\theta | Data) = \prod_{i=1}^{n} \mathcal{N}(Visits_i \,|\, \beta_0 + \beta_1 Resemblance_i, \sigma)$$

where $\beta_0, \beta_1, \sigma$ are elements of $\theta$
Algorithms for optimization:

Newton-Raphson (nlm, and optim with the BFGS method) uses derivatives

Simplex (Nelder-Mead, optim's default) does not
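As a sketch, we could fit this regression by direct minimization with optim, assuming a data frame puffer with columns predators and resemblance (as in the glm call below); sigma is optimized on the log scale to keep it positive:

# negative log-likelihood of the regression
neg_ll <- function(par) {
  yhat <- par[1] + par[2] * puffer$resemblance          # data generating process
  -sum(dnorm(puffer$predators, mean = yhat,
             sd = exp(par[3]), log = TRUE))             # error generating process
}

# minimize with a derivative-based method
fit <- optim(par = c(0, 1, 0), fn = neg_ll, method = "BFGS")
fit$par   # MLEs of beta0, beta1, and log(sigma)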
Likelihood:

$$Visits_i \sim \mathcal{N}(\widehat{Visits_i}, \sigma)$$

Data Generating Process:

$$\widehat{Visits_i} = \beta_0 + \beta_1 Resemblance_i$$
puffer_glm <- glm(predators ~ resemblance,
                  data = puffer,
                  family = gaussian(link = "identity"))

GLM stands for Generalized Linear Model

We specify the error distribution and a 1:1 link between our data generating process and the value plugged into the error generating process
If we had specified "log", it would be akin to a log transformation.... sort of (the link transforms the fitted values, not the data itself)
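A hypothetical variant for illustration (a log-link Gaussian model can need starting values to converge):

# same error distribution, but fitted values modeled on the log scale
puffer_log_link <- glm(predators ~ resemblance,
                       data = puffer,
                       family = gaussian(link = "log"))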
To get a profile for a single parameter, we calculate the MLE of all other parameters at different fixed values of our parameter of interest
This should produce a nice quadratic curve, as we saw before
This is how we get our CI and SE (although we usually assume a quadratic shape for speed)
BUT - with more complex models, we can get weird valleys, multiple optima, etc.
Common sign of a poorly fitting model - other diagnostics likely to fail as well
tau = the signed square root of the change in deviance from its minimum
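In R, profiling a glm is built in via MASS; a quick sketch using the model above:

library(MASS)

prof <- profile(puffer_glm)   # profile each parameter
plot(prof)                    # tau vs. parameter value; straight lines = quadratic
confint(puffer_glm)           # profile-based confidence intervals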
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.925 | 1.506 | 1.278 | 0.218 |
resemblance | 2.989 | 0.571 | 5.232 | 0.000 |
The test statistic is a Wald Z-test, assuming a well-behaved quadratic profile (and hence a symmetric confidence interval)
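The coefficient table above matches the shape of broom's tidy() output; a one-line sketch, assuming the broom package:

library(broom)
tidy(puffer_glm)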
Introduction to Likelihood
Maximum Likelihood Estimation
Maximum Likelihood and Linear Regression
Comparing Hypotheses with Likelihood
Compare $p(D|\theta_1)$ versus $p(D|\theta_2)$

$$G = \frac{L(H_1|D)}{L(H_2|D)}$$
Most often, $\theta = \text{MLE}$ versus $\theta = 0$

$-2 \log(G)$ is $\chi^2$ distributed
A new test statistic: $D = -2 \log(G)$

$$= 2\left[\log L(H_2|D) - \log L(H_1|D)\right]$$

It's $\chi^2$ distributed!
We compare our slope + intercept model to a model fit with only an intercept!

Note, models must have the SAME response variable

int_only <- glm(predators ~ 1, data = puffer)
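The test itself is one call, an analysis of deviance; this sketch is what produces the table below:

# likelihood ratio test of intercept-only vs. slope + intercept
anova(int_only, puffer_glm, test = "Chisq")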
Analysis of Deviance Table

Model 1: predators ~ 1
Model 2: predators ~ resemblance
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1        19     422.95
2        18     167.80  1   255.15 1.679e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
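Equivalently, D can be assembled by hand from the fitted log-likelihoods (a sketch; note that for a Gaussian glm, anova() scales the deviance by the estimated dispersion, so these numbers will not match the table exactly):

# D = 2 * (logLik of full model - logLik of intercept-only model)
D <- 2 * as.numeric(logLik(puffer_glm) - logLik(int_only))
pchisq(D, df = 1, lower.tail = FALSE)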
Great for complex models (beyond lm)
Great for anything with an objective function you can minimize
AND, even lm has a likelihood!
Ideal for model comparison
As we will see, Deviance has many uses...