Assumptions: Is our fit valid?
How did we fit this model?
How do we draw inference from this model?
(sensu Richard McElreath)
Validity
Representativeness
Model captures features in the data
Additivity and Linearity
Independence of Errors
Equal Variance of Errors
Normality of Errors
Minimal Outlier Influence
What if predator approaches is not a good measure of recognition? Or mimics just don't look like fish?
Always question if you did a good job sampling
Use natural history and the literature to get the bounds of values
If experimenting, make sure your treatment levels are representative
If you realize post-hoc they are not, qualify your conclusions
Does the model seem to fit the data? Are there any deviations? Can be hard to see...
Is anything off?
Solutions: Nonlinear transformations or a better model!
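One quick check is to overlay the fitted line on the raw data. A minimal sketch; the variable names approaches and resemblance and the data frame wolves are placeholders, with wolf_mod the fitted lm object used on later slides:

```r
# Sketch: eyeball model fit by plotting data with the fitted line
# (approaches, resemblance, and wolves are assumed names)
plot(approaches ~ resemblance, data = wolves,
     xlab = "Resemblance", ylab = "Predator approaches")
abline(wolf_mod)  # overlay fitted line to spot systematic deviations
```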
Are all replicates TRULY independent?
Did they come from the same space, time, etc.?
Non-independence can introduce BIAS
Incorporate non-independence into models (many methods)
Shapes (cones, footballs, etc.) with no bias in fitted v. residual relationship
A linear relationship indicates an additivity problem
Can solve with a better model (more predictors)
Can solve with weighting by X values, if source of heteroskedasticity known
Minor problem for coefficient estimates
Major problem for doing inference and prediction as it changes error
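A quick way to check is to plot residuals against fitted values; a sketch, assuming wolf_mod is the fitted lm object from earlier:

```r
# Residual vs. fitted plot: look for cones, footballs, or other patterns
plot(fitted(wolf_mod), residuals(wolf_mod),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # residuals should scatter evenly around zero
```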
We assumed $\epsilon_i \sim N(0, \sigma)$ - but is that right?
Can assess with a QQ-plot
Again, minor problem for coefficient estimates
Major problem for doing inference and prediction, as it changes error
Shapiro-Wilk normality test
data: residuals(wolf_mod)
W = 0.9067, p-value = 0.02992
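The QQ-plot and the Shapiro-Wilk output above can be produced along these lines (a sketch; wolf_mod is the fitted model from earlier):

```r
# QQ-plot of residuals plus Shapiro-Wilk test
qqnorm(residuals(wolf_mod))
qqline(residuals(wolf_mod))        # points should fall near this line
shapiro.test(residuals(wolf_mod))  # W = 0.9067, p = 0.03 above
```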
Cook's D: want no values > 1
Are they real?
Do they indicate a problem or a nonlinearity?
Remove only as a dead last resort
If from a nonlinearity, consider transformation
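A sketch of computing Cook's D and screening for values above 1, again assuming wolf_mod is the fitted lm object:

```r
# Cook's distance for each observation; values > 1 flag influential points
cooks.distance(wolf_mod)
plot(cooks.distance(wolf_mod), type = "h", ylab = "Cook's D")
abline(h = 1, lty = 2)  # rule-of-thumb threshold
```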
Validity: only you know!
Representativeness: look at nature
Model captures features in the data: compare model v. data!
Additivity and Linearity: compare model v. data!
Independence of Errors: consider sampling design
Equal Variance of Errors: evaluate the residual vs. fitted plot
Normality of Errors: evaluate the QQ plot and a Shapiro-Wilk test
Minimal Outlier Influence: evaluate Cook's D
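A quick way to run most of these graphical checks at once is base R's built-in lm diagnostics (a sketch):

```r
# Standard diagnostic plots: residual-fitted, QQ, scale-location,
# residual-leverage, all in one call
par(mfrow = c(2, 2))
plot(wolf_mod)
par(mfrow = c(1, 1))
```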
Assumptions: Is our fit valid?
How did we fit this model?
How do we draw inference from this model?
Least Squares
Likelihood
Bayesian
$\hat{Y} = \beta_0 + \beta_1 X + \epsilon$ where $\beta_0$ = intercept, $\beta_1$ = slope
Minimize residuals, defined as $SS_{residuals} = \sum (Y_i - \hat{Y})^2$
$b = \frac{s_{xy}}{s_x^2} = \frac{cov(x,y)}{var(x)} = r_{xy}\frac{s_y}{s_x}$
Least squares regression line always goes through the mean of X and Y
$\bar{Y} = \beta_0 + \beta_1 \bar{X}$
$\beta_0 = \bar{Y} - \beta_1 \bar{X}$
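A sketch of these formulas in R, using placeholder vectors x and y, and checking against lm():

```r
# Hand-computed least squares slope and intercept (x and y are placeholders)
b1 <- cov(x, y) / var(x)      # slope: s_xy / s_x^2
b0 <- mean(y) - b1 * mean(x)  # intercept: Ybar - b1 * Xbar
coef(lm(y ~ x))               # should match b0 and b1
```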
Assumptions: Is our fit valid?
How did we fit this model?
How do we draw inference from this model?
Deductive Inference: A larger theory is used to devise many small tests.
Inductive Inference: Small pieces of evidence are used to shape a larger theory and degree of belief.
Null Hypothesis Testing: What's the probability that things are not influencing our data?
Cross-Validation: How good are you at predicting new data?
Model Comparison: Comparison of alternate hypotheses
Probabilistic Inference: What's our degree of belief in a hypothesis, given the data?
Falsification of hypotheses is key!
A theory should be considered scientific if, and only if, it is falsifiable.
Look at a whole research program and falsify auxiliary hypotheses
https://plato.stanford.edu/entries/lakatos/#ImprPoppScie
What if our hypothesis was that the resemblance-predator relationship was 2:1? We know the SE of our estimate is 0.57, so we have a distribution of what we could observe.
BUT - our estimated slope is 3.
We want to know: if we did this experiment again and again, what's the probability of observing what we saw or worse (frequentist!)?
Probability = 0.04
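A rough sketch of where that 0.04 could come from, assuming a normal sampling distribution for the slope and a one-tailed test of the 2:1 hypothesis against the observed slope of 3 with SE = 0.57:

```r
# Standardize the observed slope against the hypothesized value of 2
z <- (3 - 2) / 0.57
1 - pnorm(z)  # probability of a value this large or larger, ~0.04
```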
P-value: the probability of making an observation, or a more extreme one, given that the null hypothesis is true.
We use our data to calculate a test statistic that maps to a value of the null distribution.
We can then calculate the probability of observing our data, or of observing data even more extreme, given that the null hypothesis is true.
$P(X \leq Data | H_0)$
Most people don't understand it.
Like SE, it gets smaller with sample size!
Neyman-Pearson Null Hypothesis Significance Testing
We don't know how to talk about it
Based on your experimental design, what is a reasonable range of p-values to expect if the null is false?
Smaller p values indicate stronger support for rejection, larger ones weaker. Use that language.
Accumulate multiple lines of evidence so that the entire edifice of your research does not rest on a single p-value!!!!
There is a 6.1% chance of obtaining the observed data or more extreme data given that the null hypothesis is true.
If you choose to reject the null, you have a ~ 1 in 16 chance of being wrong
Are you comfortable with that?
OR - What other evidence would you need to make you more or less comfortable?
Does my model explain variability in the data?
Are my coefficients not 0?
$H_0$: The model predicts no variation in the data.
$H_A$: The model predicts variation in the data.
To evaluate these hypotheses, we need a measure of variation explained by the model versus error - the sums of squares!
$SS_{Total} = SS_{Regression} + SS_{Error}$
Distance from $\hat{y}$ to $\bar{y}$
$SS_R = \sum (\hat{Y}_i - \bar{Y})^2$, df = 1
$SS_E = \sum (Y_i - \hat{Y}_i)^2$, df = n-2
To compare them, we need to correct for different DF. This is the Mean Square.
$MS = SS/DF$
e.g., $MS_E = \frac{SS_E}{n-2}$
$F = \frac{MS_R}{MS_E}$ with DF = 1, n-2
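That F and its p-value can be computed by hand from the sums of squares in the ANOVA table below (a sketch; calling anova() on the fitted model gives the same table directly):

```r
# Worked example using the sums of squares from the table below
MSR <- 255.1532 / 1    # mean square for regression, df = 1
MSE <- 167.7968 / 18   # mean square error, df = n - 2 = 18
F_val <- MSR / MSE     # ~27.37
pf(F_val, df1 = 1, df2 = 18, lower.tail = FALSE)  # ~5.6e-05
```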
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| resemblance | 1 | 255.1532 | 255.153152 | 27.37094 | 5.64e-05 |
| Residuals | 18 | 167.7968 | 9.322047 | NA | NA |
We reject the null hypothesis that resemblance does not explain variability in predator approaches
T-tests evaluate whether coefficients are different from 0
Often, F and T agree - but not always
[figure: xkcd]
$SE_b = \sqrt{\frac{MS_E}{SS_X}}$
(~ 1.96 when N is large)
$t_b = \frac{b - \beta_0}{SE_b}$
$H_0: \beta_0 = 0$, but we can test other hypotheses
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 1.924694 | 1.5064163 | 1.277664 | 0.2176012 |
| resemblance | 2.989492 | 0.5714163 | 5.231724 | 0.0000564 |
We reject the hypothesis of no slope for resemblance, but fail to reject it for the intercept.
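A sketch of where these values come from: the table above is presumably something like summary(wolf_mod)$coefficients, and testing a hypothesized slope other than 0 (e.g., the 2:1 value from earlier) just changes the numerator of t:

```r
# t statistic for H0: slope = 2, using the estimate and SE from the table
t_2to1 <- (2.989492 - 2) / 0.5714163
2 * pt(abs(t_2to1), df = 18, lower.tail = FALSE)  # two-tailed p-value
```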