Fitting Models with Likelihood!

Outline

Review of Likelihood
Comparing Models with Likelihood
Linear Regression with Likelihood

Deriving Truth from Data

Frequentist Inference: Conclusions justified by their long-run performance over repeated experiments
- Uses p-values and CIs as its inferential engine
Likelihoodist Inference: Evaluate the weight of evidence for different hypotheses
- Derived from the frequentist mode of thinking
- Uses model comparison (sometimes with p-values…)
Bayesian Inference: Probability as a degree of belief that is updated as new data arrive
- Uses explicit statements of probability and degree of belief for inferences

Likelihood: how well data support a given hypothesis.

Note: Each and every parameter choice IS a hypothesis

Likelihood Defined

\[\Large L(H | D) = p(D | H)\]

Where D is the data and H is the hypothesis (model), which includes both a data generating process and some choice of parameters (often called $\theta$). The error generating process is inherent in the choice of probability distribution used for the calculation.

Example of Maximum Likelihood Fit

Let’s say we have counted 10 individuals in a plot. Given that the population is Poisson distributed, what is the value of $\lambda$?

$$p(x) = \frac{\lambda^{x}e^{-\lambda}}{x!}$$
where we search over all possible values of $\lambda$
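A minimal grid-search sketch in R of the search described above, using the single count of 10 from the example. The grid range and step size are arbitrary illustrative choices, not values from the slides.

```r
# Observed count from the example: 10 individuals in one plot
x <- 10

# Candidate values of lambda to search over (range and step are arbitrary)
lambda_grid <- seq(0.1, 30, by = 0.1)

# Likelihood of each candidate: Poisson probability of observing x given lambda
lik <- dpois(x, lambda = lambda_grid)

# Maximum likelihood estimate: the candidate with the highest likelihood
lambda_mle <- lambda_grid[which.max(lik)]
lambda_mle
```

With a single Poisson count the analytical MLE is the count itself, so the grid search should land on $\lambda = 10$.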

Likelihood Function

\[\Large p(x) = \frac{\lambda^{x}e^{-\lambda}}{x!}\]

This is a likelihood function for one sample
- It is the Poisson probability mass function (dpois() in R), viewed as a function of $\lambda$ with $x$ fixed:
$\text{dpois}(x, \lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}$

What is the probability of the data given the parameter?

For independent observations, p(a and b) = p(a)p(b), so the likelihood of the whole dataset is a product:

$$p(D | \theta) = \prod_{i=1}^n p(x_{i} | \theta)$$

$$ = \prod_{i=1}^n \frac{\theta^{x_i}e^{-\theta}}{x_i!}$$
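A minimal R sketch of the product likelihood above, assuming a small set of hypothetical counts (not data from these slides). It also shows the common practice of summing log-probabilities instead of multiplying probabilities, which avoids numerical underflow as n grows.

```r
# Hypothetical counts from several plots (illustrative values only)
counts <- c(8, 12, 10, 15, 9)

# Likelihood of one candidate lambda: product of per-observation Poisson probabilities
poisson_lik <- function(lambda, x) prod(dpois(x, lambda))

# Log-likelihood: sum of log-probabilities (numerically safer than the product)
poisson_loglik <- function(lambda, x) sum(dpois(x, lambda, log = TRUE))

# The log of the product equals the sum of the logs
all.equal(log(poisson_lik(10, counts)), poisson_loglik(10, counts))

# Maximize the log-likelihood over lambda within a bounded interval
fit <- optimize(poisson_loglik, interval = c(0.01, 50), x = counts, maximum = TRUE)
fit$maximum  # should be close to mean(counts), the analytical Poisson MLE
```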

Can Compare p(data | H) for alternate Parameter Values

Compare $p(D|\theta_{1})$ versus $p(D|\theta_{2})$

Likelihood Ratios

\[\LARGE G = \frac{L(H_1 | D)}{L(H_2 | D)}\]

G is the ratio of Maximum Likelihoods from each model
Used to compare goodness of fit of different models/hypotheses
Most often, $\theta$ = MLE versus $\theta$ = 0
$-2\log(G)$ is approximately $\chi^2$ distributed (for nested models, under the null)

Likelihood Ratio Test

A new test statistic: $D = -2\log(G)$
$= 2 [\log(L(H_2 | D)) - \log(L(H_1 | D))]$
It's (approximately) $\chi^2$ distributed!
- DF = Difference in # of Parameters
If $H_1$ is the null model, a large $D$ (small p-value) gives support for our alternate model, $H_2$
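A minimal R sketch of the likelihood ratio test for a Poisson mean. The counts and the null value $\lambda = 5$ are hypothetical; $H_2$ uses the MLE, so it has one more free parameter than $H_1$.

```r
# Hypothetical counts (illustrative values only)
counts <- c(8, 12, 10, 15, 9)

# Log-likelihood of the data under a given lambda
loglik <- function(lambda, x) sum(dpois(x, lambda, log = TRUE))

# H1 (null): lambda fixed at a hypothetical value
lambda_null <- 5
# H2 (alternate): lambda at its MLE, which for the Poisson is the sample mean
lambda_mle <- mean(counts)

# D = -2 log(G) = 2 * [log L(H2 | D) - log L(H1 | D)]
D_stat <- 2 * (loglik(lambda_mle, counts) - loglik(lambda_null, counts))

# One extra free parameter in H2, so compare to chi-squared with 1 df
p_value <- pchisq(D_stat, df = 1, lower.tail = FALSE)
```

The name D_stat is used only to avoid clashing with D for the data.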

Likelihood Ratio at Work

\[G = \frac{L(\lambda = 14 | D)}{L(\lambda = 17 | D)}\]
=0.0494634

Likelihood Ratio Test at Work

\[D = 2 [\log(L(\lambda = 17 | D)) - \log(L(\lambda = 14 | D))]\] = 6.0130449 with 1 DF
p ≈ 0.014
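The arithmetic on this slide can be checked in R from the reported value of G alone; the underlying counts are not needed for this step.

```r
# Likelihood ratio reported above: G = L(lambda = 14 | D) / L(lambda = 17 | D)
G <- 0.0494634

# D = -2 log(G) = 2 [log L(lambda = 17 | D) - log L(lambda = 14 | D)]
D_stat <- -2 * log(G)

# Upper-tail chi-squared probability with 1 degree of freedom
p_value <- pchisq(D_stat, df = 1, lower.tail = FALSE)
round(c(D = D_stat, p = p_value), 4)
```

The statistic comes out at about 6.013 and the p-value near 0.014.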

Putting Likelihood Into Practice with Pufferfish

Pufferfish are toxic/harmful to predators
Batesian mimics gain protection from predation
Evolved response to appearance?
Researchers tested this using mimics that varied in their resemblance to a toxic pufferfish

Does Resembling a Pufferfish Reduce Predator Visits?

The Steps of Statistical Modeling

What is your question?
What model of the world matches your question?
Build a test
Evaluate test assumptions
Evaluate test results
Visualize

The World of Pufferfish

Data Generating Process:

\[Visits \sim Resemblance\]
Assume: Linearity (reasonable first approximation)

Error Generating Process:

Variation in Predator Behavior
Assume: Normally distributed error (also reasonable)

Quantitative Model of Process Using Likelihood

Likelihood:
$Visits_i \sim \mathcal{N}(\hat{Visits_i}, \sigma)$

Data Generating Process:
$\hat{Visits_i} = \beta_{0} + \beta_{1} Resemblance_i$
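A minimal R sketch of this two-part model: a linear data generating process plus normally distributed error. All parameter values, the sample size, and the resemblance scores are hypothetical, since the study's values are not given here.

```r
set.seed(42)

# Hypothetical parameter values and sample size, for illustration only
beta0 <- 2
beta1 <- 3
sigma <- 3
n <- 20

# Hypothetical resemblance scores
resemblance <- runif(n, min = 1, max = 4)

# Data generating process: the deterministic linear prediction
visits_hat <- beta0 + beta1 * resemblance

# Error generating process: normal variation around the prediction
visits <- rnorm(n, mean = visits_hat, sd = sigma)
```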

Likelihood Function for Linear Regression

Will often see:

$\large L(\theta | D) = \prod_{i=1}^n p(y_i\; | \; x_i;\ \beta_0, \beta_1, \sigma)$

Likelihood Function for Linear Regression

\[L(\theta | Data) = \prod_{i=1}^n \mathcal{N}(Visits_i\; |\; \beta_{0} + \beta_{1} Resemblance_i, \sigma)\]

where $\beta_{0}, \beta_{1}, \sigma$ are elements of $\theta$
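A minimal R sketch that maximizes this likelihood directly with a general-purpose optimizer, on simulated stand-in data (the pufferfish data are not reproduced here, so the data-generating values are hypothetical). Optimizing $\log\sigma$ rather than $\sigma$ is a design choice that keeps $\sigma$ positive without constraints.

```r
set.seed(42)

# Hypothetical stand-in data
n <- 20
resemblance <- runif(n, min = 1, max = 4)
visits <- rnorm(n, mean = 2 + 3 * resemblance, sd = 3)

# Negative log-likelihood of the normal linear model.
# theta = (beta0, beta1, log(sigma)); the log scale keeps sigma positive.
neg_loglik <- function(theta, x, y) {
  mu <- theta[1] + theta[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(theta[3]), log = TRUE))
}

# Minimize from rough starting values
start <- c(mean(visits), 0, log(sd(visits)))
fit <- optim(start, neg_loglik, x = resemblance, y = visits,
             method = "BFGS", control = list(reltol = 1e-10))

mle <- c(beta0 = fit$par[1], beta1 = fit$par[2], sigma = exp(fit$par[3]))

# Least-squares coefficients for comparison
ls_coef <- coef(lm(visits ~ resemblance))
```

The $\beta$ estimates should match least squares to optimizer tolerance. The MLE of $\sigma$ is $\sqrt{RSS/n}$, slightly smaller than the residual standard error $\sqrt{RSS/(n-2)}$ that lm() reports.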

Fit Your Model!

puffer_glm <- glm(predators ~ resemblance, 
                  data = puffer,
                  family = gaussian(link = "identity"))

The Same Diagnostics

But What Do the Likelihood Profiles Look Like?

Are these nice symmetric slices?

Sometimes Easier to See with a Straight Line

$\tau$ = signed square root of the difference from the minimum deviance
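A minimal R sketch of a $\tau$ profile for the slope, on simulated stand-in data with hypothetical values. It assumes the convention of scaling the deviance difference by the estimated dispersion. For fixed slope $b$, the best intercept is mean(y - b * x), so the profile needs no refitting.

```r
set.seed(42)

# Hypothetical stand-in data
n <- 20
resemblance <- runif(n, min = 1, max = 4)
visits <- rnorm(n, mean = 2 + 3 * resemblance, sd = 3)

fit <- glm(visits ~ resemblance, family = gaussian(link = "identity"))
b_hat <- coef(fit)[["resemblance"]]
se_b <- summary(fit)$coefficients["resemblance", "Std. Error"]
dev_min <- deviance(fit)               # residual sum of squares at the best fit
dispersion <- summary(fit)$dispersion  # estimated sigma^2 = RSS / (n - 2)

# Profile deviance: fix the slope at b, use the best intercept for that b
profile_dev <- function(b) {
  r <- visits - b * resemblance
  sum((r - mean(r))^2)
}

b_grid <- seq(b_hat - 2, b_hat + 2, length.out = 21)
dev_grid <- sapply(b_grid, profile_dev)

# tau: signed square root of the dispersion-scaled increase in deviance
# (pmax guards against tiny negative values from rounding at b_hat)
tau <- sign(b_grid - b_hat) * sqrt(pmax(dev_grid - dev_min, 0) / dispersion)
```

For a Gaussian model, $\tau$ is exactly a straight line in $b$ with slope $1/SE(\hat{b})$. Plotting tau against b_grid gives the straight line.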

Evaluate Coefficients

term	estimate	std.error	statistic	p.value
(Intercept)	1.925	1.506	1.278	0.218
resemblance	2.989	0.571	5.232	0.000

Test statistic is a Wald test, assuming a well-behaved, quadratic likelihood surface (reported as a t statistic, since $\sigma$ is estimated)
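A small R check of the Wald statistics, using only the rounded estimates and standard errors from the coefficient table. The 18 residual degrees of freedom come from the deviance table later in this section. The intercept's reported p-value of 0.218 matches a t distribution with 18 df rather than the standard normal.

```r
# Estimates and standard errors copied from the coefficient table
estimate  <- c(intercept = 1.925, resemblance = 2.989)
std_error <- c(intercept = 1.506, resemblance = 0.571)

# Wald statistic: estimate divided by its standard error
wald <- estimate / std_error

# Two-sided p-values from a t distribution with 18 residual df
p_t <- 2 * pt(-abs(wald), df = 18)

# The same statistic referred to a standard normal (a z-test), for comparison
p_z <- 2 * pnorm(-abs(wald))
```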

Confidence Intervals

Quadratic Assumption

	2.5 %	97.5 %
(Intercept)	-1.028	4.877
resemblance	1.870	4.109
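The quadratic-assumption intervals are Wald intervals, estimate ± $z_{0.975}$ × SE. A short R sketch reproduces them, to rounding, from the table values.

```r
# Estimates and standard errors copied from the coefficient table
estimate  <- c(intercept = 1.925, resemblance = 2.989)
std_error <- c(intercept = 1.506, resemblance = 0.571)

# Normal quantile for a 95% interval (about 1.96)
z <- qnorm(0.975)

ci <- cbind(lower = estimate - z * std_error, upper = estimate + z * std_error)
round(ci, 3)
```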

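Spline Fit to Likelihood Surface

Identical to the quadratic intervals: for a Gaussian model the likelihood profile is exactly quadratic.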

	2.5 %	97.5 %
(Intercept)	-1.028	4.877
resemblance	1.870	4.109

To test the model, we need an alternate hypothesis to compare it to (here, the intercept-only model predators ~ 1)

Put it to the Likelihood Ratio Test!

Analysis of Deviance Table

Model 1: predators ~ 1
Model 2: predators ~ resemblance
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1        19     422.95                          
2        18     167.80  1   255.15 1.679e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
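A sketch reproducing the Pr(>Chi) value from the deviances in this table. It follows what R's anova() does for a Gaussian glm, where the dispersion is estimated: the deviance difference is divided by the dispersion from the larger model before being compared to $\chi^2$. The inputs are copied from the table; only the arithmetic is new.

```r
# Residual deviances and df copied from the analysis of deviance table
dev_null <- 422.95; df_null <- 19
dev_full <- 167.80; df_full <- 18

dev_diff <- dev_null - dev_full     # about 255.15
dispersion <- dev_full / df_full    # estimated sigma^2 from the larger model

# Scale the deviance difference by the dispersion, then compare to chi-squared
# with df = difference in number of parameters
chisq_stat <- dev_diff / dispersion
p_value <- pchisq(chisq_stat, df = df_null - df_full, lower.tail = FALSE)
```

The result is about 1.68e-07, matching the table.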

Compare to Linear Regression

Likelihood:

term	estimate	std.error	statistic	p.value
(Intercept)	1.925	1.506	1.278	0.218
resemblance	2.989	0.571	5.232	0.000

Least Squares

term	estimate	std.error	statistic	p.value
(Intercept)	1.925	1.506	1.278	0.218
resemblance	2.989	0.571	5.232	0.000
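A quick R sketch on simulated stand-in data (hypothetical values) showing why the two tables agree. A Gaussian glm with identity link and lm() give the same coefficients and standard errors, because maximizing a normal likelihood is the same as minimizing the sum of squared residuals.

```r
set.seed(42)

# Hypothetical stand-in data
n <- 20
resemblance <- runif(n, min = 1, max = 4)
visits <- rnorm(n, mean = 2 + 3 * resemblance, sd = 3)

# Likelihood fit (Gaussian glm) and least-squares fit (lm)
fit_glm <- glm(visits ~ resemblance, family = gaussian(link = "identity"))
fit_lm  <- lm(visits ~ resemblance)

cbind(glm = coef(fit_glm), lm = coef(fit_lm))
```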

Compare to Linear Regression: F and Chisq

Likelihood:

Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
19	422.950
18	167.797	1	255.153	1.68e-07

Least Squares

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
resemblance	1	255.1532	255.153152	27.37094	5.64e-05
Residuals	18	167.7968	9.322047
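A small R sketch using only the numbers printed in the two tables above. Here the least-squares F statistic equals the dispersion-scaled deviance difference, and its square root is the slope's t statistic. The two p-values differ because the F test uses an F(1, 18) reference distribution, which accounts for estimating $\sigma^2$. The likelihood ratio test uses the asymptotic $\chi^2$ with 1 df.

```r
# Values copied from the two tables above
ss_model <- 255.1532; df_model <- 1
ss_resid <- 167.7968; df_resid <- 18

mean_sq_resid <- ss_resid / df_resid  # 9.322, also the glm dispersion
f_value <- (ss_model / df_model) / mean_sq_resid

# Least-squares F test
p_f <- pf(f_value, df_model, df_resid, lower.tail = FALSE)

# Likelihood ratio chi-squared test on the dispersion-scaled deviance difference
p_chisq <- pchisq(f_value * df_model, df_model, lower.tail = FALSE)

sqrt(f_value)  # the slope's t statistic, about 5.23
```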