For this lab, see the etherpad at https://etherpad.wikimedia.org/p/607-cv_bayes-2022
Last week, we fit parameters with likelihood. To give you a hands-on feel for Bayesian data analysis let’s do the same thing with Bayes. We’re going to fit a single parameter - lambda from a poisson distribution - with Bayes instead of likelihood (although as you’ll see likelihood is a part of it!).
Let’s say you start with this data:
pois_data <- c(6, 6, 13, 7, 11, 10, 9, 7, 7, 12)
Now, how do you get the Bayesian estimate and credible intervals for this?
Now, in Bayesian data analysis, according to Bayes theorem
\[p(\lambda | data) = \frac{p(data | \lambda)p(\lambda)}{p(data)}\]
To operationalize this, we can see three things we need to either provide or calculate
The likelihood of each choice of lambda.
The prior probability of each choice of lambda.
The marginal distribution - i.e., the sum of the product of p(D|H)p(H), as this is a discrete distribution.
What’s great is that we can fold 1 and 2 into a function! We already
wrote a log likelihood function for this very scenario last week, and
now all we need to do is include a prior. In essence, any function for
Baysian grid sampling includes
1) A Data Generating Process
2) A Set of Priors
3) A Likelihood
Then we either multiple the likelihood and prior - or sum the log likelihood and log prior.
bayes_pois <- function(y, lambda_est){
#our DGP
lambda <- lambda_est
#Our Prior
prior <- dunif(lambda_est, 6, 12)
#Our Likelihood
lik <-prod(dpois(y, lambda))
#The numerator of our posterior
return(lik * prior)
}
Great, so, what range of lambdas should we test? Well, since our range of data is from 6-12, we can reasonably assume lambda must be between 6 and 12. Could be outside of that, but it’s a reasonable suggestion. We can then sample using our function and dplyr.
library(dplyr)
library(tidyr)
bayes_analysis <- data.frame(lambda = 6:13) %>%
rowwise() %>%
mutate(posterior_numerator = bayes_pois(pois_data, lambda)) %>%
ungroup()
OK, now let’s get our posterior! To do that, we just divide the numerator of our posterior by the marginal probability - which is just the sum of that numerator!
bayes_analysis <- bayes_analysis %>%
mutate(posterior = posterior_numerator/sum(posterior_numerator))
Let’s take a look!
library(ggplot2)
ggplot(data=bayes_analysis, mapping = aes(x = lambda, y = posterior)) +
geom_bar(stat="identity")
Or we can look at a table
lambda | posterior_numerator | posterior |
---|---|---|
6 | 0 | 0.0014172 |
7 | 0 | 0.0500958 |
8 | 0 | 0.2885012 |
9 | 0 | 0.4155515 |
10 | 0 | 0.2006038 |
11 | 0 | 0.0399899 |
12 | 0 | 0.0038406 |
13 | 0 | 0.0000000 |
From this table, we can see that the 90%CI is wide - ranges from 6-13. That’s because we have a weak prior and not much data. Now, what if we’d had a stronger prior? Maybe a normal distribution centered on 10 with a SD of 1
bayes_pois_strong_prior <- function(y, lambda_est){
#our DGP
lambda <- lambda_est
#Our Prior
prior <- dnorm(lambda_est, 10,1)
#Our Likelihood
lik <-prod(dpois(y, lambda))
#The numerator of our posterior
return(lik * prior)
}
bayes_analysis <- bayes_analysis %>%
rowwise() %>%
mutate(posterior_numerator_strong = bayes_pois_strong_prior(pois_data, lambda)) %>%
ungroup() %>%
mutate( posterior_strong = posterior_numerator_strong / sum(posterior_numerator_strong))
ggplot(data=bayes_analysis) +
geom_area(alpha=0.5, fill="red", mapping=aes(x=lambda, y=posterior)) +
geom_area(alpha=0.5, fill="blue", mapping=aes(x=lambda, y=posterior_strong)) +
ggtitle("Red = Flat Priot, Blue = Informative Prior")
knitr::kable(bayes_analysis %>% select(lambda, posterior, posterior_strong))
lambda | posterior | posterior_strong |
---|---|---|
6 | 0.0014172 | 0.0000009 |
7 | 0.0500958 | 0.0010764 |
8 | 0.2885012 | 0.0755171 |
9 | 0.4155515 | 0.4874885 |
10 | 0.2006038 | 0.3879948 |
11 | 0.0399899 | 0.0469127 |
12 | 0.0038406 | 0.0010053 |
13 | 0.0000000 | 0.0000043 |
A noticable difference. The 90% CI is now from 8-11, and the 80% is even narrower.
What’s super near about this is that you can simulate samples from your posterior density. Say, draw 100 sampled lambdas, then, for each lambda, draw a sample of 10 random numbers (as in our initial distribution). We can then see how these posterior predictive distributions compare to the original.
nsims <- 10
posterior_sims <- data.frame(sampled_lambda = sample(6:13, size = nsims,
replace=TRUE,
prob = bayes_analysis$posterior),
sim = 1:nsims) %>%
group_by(sim) %>%
nest() %>%
mutate(predicted_values = purrr::map(data, ~rpois(10, .$sampled_lambda))) %>%
unnest(predicted_values) %>%
ungroup()
ggplot() +
geom_density(data=posterior_sims, mapping=aes(group = sim, x=predicted_values), color="black") +
geom_density(mapping=aes(x=pois_data), fill="lightblue", color = "red", alpha = 0.7) +
theme_bw()
Today we’re going to go through fitting and evaluating a linear
regression fit using Bayesian techiniques. For that, we’re going to use
the rstanarm
library which uses STAN to perform the MCMC simulations.
We’ll use the seal linear regression as an example.
library(brms)
library(tidybayes)
library(ggdist)
seals <- read.csv("http://biol607.github.io/lab/data/17e8ShrinkingSeals%20Trites%201996.csv")
head(seals)
## age.days length.cm
## 1 5337 131
## 2 2081 123
## 3 2085 122
## 4 4299 136
## 5 2861 122
## 6 5052 131
Note that when you loaded brms
it gave you some warnings
bout wanting to use more cores. This is great - MCMC is one of those
places where using all of your computer’s cores (most these days have at
least two) can really speed things along. And the
parallelization is done for you!
options(mc.cores = parallel::detectCores())
The basic steps of fitting a linear regression using Bayesian techniques (presuming you’ve already settled on a linear data generating process and a normal error generating process) are as follows.
1. Fit the model
2. Assess convergence of chains
3. Evaluate posterior distributions
4. Check for model misspecification (fit v. residual, qq plot)
5. Evaluate simulated residual distributions
6. Evaluate simulated fit versus observed values
7. Compare posterior predictive simulations to observed values
8. Visualize fit and uncertainty
To begin, let’s define our model. By default, though, it sets relatively flat priors for slopes and an intercept prior ~ N(0,10)). No intercept is set for the SD, as the SD results from your choice of slope and intercept.
To specify that we’re using a gaussian error, simply set the
family
argument to gaussian()
- evertyhing
else is handled for you.
set.seed(607)
seal_lm_bayes <- brm(length.cm ~ age.days,
data = seals,
family=gaussian(), file = "lab/brms_fits/seal_lm_bayes")
Note the output - you can see you’re doing something! And you get a sense of speed. And you can see the multiple cores going!
Before we diagnose whether we have a good model or not, we want to
make sure that our MCMC chains have converged properly so that we can
feel confident in our ability to assess the model. Now,
rstanarm
usually runs until you have reached convergence,
as the models it works with are pretty straightforward. But, good to
check.
We’re going to check a few diagnostics:
Diagnostic | Fix |
---|---|
Did your chains converge? | More iterations, check model |
Are your posteriors well-behaved? | Longer burning, more interations, check model & priors |
Are samples in your chains uncorrelated? | Change your thinning interval |
So, first, did your model converge? The easiest way to see this is to plot the trace of your four chains. We can use `plot’ to see both how well our model converged and the distribution of each parameter.
plot(seal_lm_bayes)
Note, there’s a variable
argument, so, you can look at
just one chain if you want.
get_variables(seal_lm_bayes)
## [1] "b_Intercept" "b_age.days" "sigma" "lprior"
## [5] "lp__" "accept_stat__" "stepsize__" "treedepth__"
## [9] "n_leapfrog__" "divergent__" "energy__"
plot(seal_lm_bayes, variable = "b_Intercept")
This looks pretty good, and those posteriors look pretty normal!
You can also look at just the chains by turning our fit model into a
posterior data frame using tidybayes::gather_draws()
or
tidybayes::spread_draws()
. Or we can use
brms::as_draws()
to get an mcmc object that other packages
use.
#this is for a certain kind of object
seal_posterior_mcmc <- as_draws(seal_lm_bayes, add_chain = T)
seal_posterior <- gather_draws(seal_lm_bayes, b_age.days, b_Intercept)
seal_posterior_wide <- spread_draws(seal_lm_bayes, b_age.days, b_Intercept)
head(seal_posterior_mcmc)
## # A draws_list: 1000 iterations, 4 chains, and 5 variables
##
## [chain = 1]
## $b_Intercept
## [1] 115 116 116 115 116 115 116 116 116 116
##
## $b_age.days
## [1] 0.0024 0.0023 0.0023 0.0024 0.0024 0.0024 0.0023 0.0023 0.0024 0.0024
##
## $sigma
## [1] 5.7 5.6 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.7
##
## $lprior
## [1] -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4
##
##
## [chain = 2]
## $b_Intercept
## [1] 115 116 116 116 116 116 116 116 116 116
##
## $b_age.days
## [1] 0.0025 0.0024 0.0024 0.0024 0.0023 0.0023 0.0024 0.0024 0.0024 0.0024
##
## $sigma
## [1] 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.7
##
## $lprior
## [1] -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4 -5.4
##
## # ... with 990 more iterations, and 2 more chains, and 1 more variables
head(seal_posterior)
## # A tibble: 6 × 5
## # Groups: .variable [1]
## .chain .iteration .draw .variable .value
## <int> <int> <int> <chr> <dbl>
## 1 1 1 1 b_age.days 0.00244
## 2 1 2 2 b_age.days 0.00231
## 3 1 3 3 b_age.days 0.00234
## 4 1 4 4 b_age.days 0.00243
## 5 1 5 5 b_age.days 0.00239
## 6 1 6 6 b_age.days 0.00243
head(seal_posterior_wide)
## # A tibble: 6 × 5
## .chain .iteration .draw b_age.days b_Intercept
## <int> <int> <int> <dbl> <dbl>
## 1 1 1 1 0.00244 115.
## 2 1 2 2 0.00231 116.
## 3 1 3 3 0.00234 116.
## 4 1 4 4 0.00243 115.
## 5 1 5 5 0.00239 116.
## 6 1 6 6 0.00243 115.
Then, you can make whatever plot you want!
ggplot(seal_posterior,
aes(x = .iteration, y = .value, color = as.factor(.chain))) +
geom_line(alpha = 0.4) +
facet_wrap(vars(.variable), scale = "free_y") +
scale_color_manual(values = c("red", "blue", "orange", "yellow"))
You can assess convergence by examining the Rhat values in
rhat(seal_lm_bayes)
. These values are something called the
Gelman-Rubin statistic, and it should be at or very close to 1. You can
also do fun things to visualize them with bayesplot
library(bayesplot)
rhat(seal_lm_bayes)
## b_Intercept b_age.days sigma lprior lp__
## 0.9997210 0.9998069 1.0031777 1.0034689 1.0002511
mcmc_rhat(rhat(seal_lm_bayes))
Last, we want to look at autocorrelation within our chains. We want values in the thinned chain to be uncorelated with each other. So, while they’d have an autocorrelation that might be high at a distance of 1, this should drop to near zero very quickly. If not, we need a different thinning interval.
mcmc_acf(seal_lm_bayes)
What do you do if you have funky errors? 1) Check your model for
errors/bad assumptions. Just visualize the data! 2) Check your priors to
see if they are of and try something different. Maybe a uniform prior
was a bad choice, and a flat normal is a better idea. (I have made this
error) 3) Try different algorithms found in ?stan_glm
but
make sure you read the documentation to know what you are doing. 4) Dig
deeper into the docs and muck with some of the guts of how the algoright
works. Common fixes include the following, but there are so many more -
iter
to up from 2000 to more iterations -
warmpup
to change the burnin period - thin
to
step up the thinning - adapt_delta
to change the acceptance
probability of your MCMC chains.
Also, as a final note, using shinystan
you can create a
nice interactive environment to do posterior checks.
So, your MCMC is ok. What now? Well, we have our usual suite of model diagnostics - but they’re just a wee bit different given that we have simulated chains we’re working with. So, here’s what we’re going to look at:
Diagnostic | Probable Error |
---|---|
Fitted v. residual | Check linearity, error |
QQ Plot | Check error distribution |
Simulated residual histograms | Check error distribution |
Simulated Fit v. Observed | Check linearity |
Reproduction of Sample Properties | Respecify Model |
Outlier Analysis | Re-screen data |
It’s all there with performance
!
library(performance)
check_model(seal_lm_bayes)
As with all diagnostics, failures indicate a need to consider respecifying a model and/or poor assumptions in the error generating process.
Last, we can look for outliers using loo
- leave one out
- and their influence on our HMC process We use a particular form of
this function as there are a few.
plot(loo(seal_lm_bayes), label_points = TRUE, cex=1.1)
Points with a score >0.5 are a bit worrisoome, and scores >1.0 have a large deal of leverage and should be examined.
OK, phew we have gotten through the model checking stage. Now, what does our model tell us?
#adding extra digits as some of these are quite small
summary(seal_lm_bayes, digits=5)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: length.cm ~ age.days
## Data: seals (Number of observations: 9665)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 115.77 0.17 115.43 116.11 1.00 3744 2856
## age.days 0.00 0.00 0.00 0.00 1.00 4144 2880
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 5.68 0.04 5.59 5.76 1.00 1498 1251
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Wow. No p-values, no nothing. We have, in the first block, the mean, SD (assuming a gaussian distribution of posteriors), and some of the quantiles of the parameters.
We can also visualize coefficients. Here we’ll use
tidybayes
which has some really cool plotting tools.
library(tidybayes)
library(ggdist)
post_ggplot <- seal_lm_bayes |>
gather_draws(b_Intercept, b_age.days, sigma) |>
ggplot(aes(x = .value, y = .variable)) +
stat_halfeye( .width = c(0.8, 0.95)) +
ylab("") +
ggtitle("Posterior Medians with 80% and 95% Credible Intervals")
post_ggplot
## Warning: Using the `size` aesthietic with geom_segment was deprecated in ggplot2 3.4.0.
## ℹ Please use the `linewidth` aesthetic instead.
Woof - things are so different, we can’t get a good look in there. Let’s try looking at just one parameter.
seal_lm_bayes |>
gather_draws(b_age.days) |>
ggplot(aes(x = .value, y = .variable)) +
stat_halfeye( .width = c(0.8, 0.95)) +
ylab("") +
ggtitle("Posterior Medians with 80% and 95% Credible Intervals")
That’s an interesting look! We might also want to see how things
differ by chain. To do that (and we can incorporate this for looking at
multiple parameters), let’s try ggridges
.
library(ggridges)
seal_lm_bayes |>
gather_draws(b_age.days) |>
ggplot(aes(x = .value, y = .chain, fill = factor(.chain))) +
geom_density_ridges(alpha = 0.5)
That looks pretty good, and we can see how similar our posteriors are
to one another. FYI, I think ggridges
and
geom_halfeyeh
are two pretty awesome ways of looking at
Bayesian Posteriors. But that just might be me.
If we want to know more about the posteriors, we have to begin to explore the probability densities from the chains themselves. To get the chains, we convert the object into a data frame.
seal_chains <- as.data.frame(seal_lm_bayes)
head(seal_chains)
## b_Intercept b_age.days sigma lprior lp__
## 1 115.4480 0.002441275 5.676816 -5.400236 -30506.90
## 2 116.0561 0.002306863 5.633530 -5.391457 -30507.28
## 3 115.9654 0.002336190 5.686337 -5.399981 -30506.24
## 4 115.4900 0.002429601 5.698174 -5.403815 -30506.63
## 5 115.6376 0.002390062 5.698978 -5.403945 -30505.87
## 6 115.4837 0.002425535 5.707371 -5.405721 -30507.07
We can now do really interesting things, like, say, as what is the weight of the age coefficient that is less than 0? To get this, we need to know the number of entries in the chain that are <0, and then divide that total by the total length of the chains.
sum(seal_chains$b_age.days<0)/nrow(seal_chains)
## [1] 0
Oh, that’s 0. Let’s test something more interesting. How much of the PPD of the slope is between 0.00229 and 0.00235?
sum(seal_chains$b_age.days>0.00229 &
seal_chains$b_age.days<0.00235) / nrow(seal_chains)
## [1] 0.2965
28.9% - nice chunk. We can also look at some other properties of that chain:
mean(seal_chains$b_age.days)
## [1] 0.002369251
median(seal_chains$b_age.days)
## [1] 0.002368962
To get the Highest Posteriod Density Credible Intervals (often called the HPD intervals)
posterior_interval(seal_lm_bayes)
## 2.5% 97.5%
## b_Intercept 1.154308e+02 1.161129e+02
## b_age.days 2.283042e-03 2.457137e-03
## sigma 5.594510e+00 5.756990e+00
## lprior -5.412882e+00 -5.385790e+00
## lp__ -3.050986e+04 -3.050529e+04
Yeah, zero was never in the picture.
This is all well and good, but, how does our model fit? Cn we actually see how well the model fits the data, and how well the data generating process fits the data relative to the overall uncertainty.
To visualize, we have coefficient estimates. We can use good olde
ggplot2
along with the geom_abline()
function
to overlay a fit onto our data.
library(ggplot2)
#the data
seal_plot <- ggplot(data = seals,
mapping=aes(x = age.days, y = length.cm)) +
geom_point(size=2)
#add the fit
seal_plot +
geom_abline(intercept = fixef(seal_lm_bayes)[1], slope = fixef(seal_lm_bayes)[2],
color="red")
This is great, but what if we want to see the CI of the fit? Rather
than use an area plot, we can actually use the output of the chains to
visualize uncertainty. seal_chains
contains simulated
slopes and intercepts. Let’s use that.
seal_plot +
geom_abline(intercept = seal_chains[,1], slope = seal_chains[,2], color="grey", alpha=0.6) +
geom_abline(intercept = fixef(seal_lm_bayes)[1], slope = fixef(seal_lm_bayes)[2], color="red")
We can see the tightness of the fit, and that we have high confidence in the output of our model.
If you didn’t want to work with the raw chains, there’s also a great
tool in tidybayes::add_linpred_draws()
.
library(modelr)
seal_fit <- data_grid(seals, age.days = seq_range(age.days, 100)) |>
add_linpred_draws(seal_lm_bayes)
head(seal_fit)
## # A tibble: 6 × 6
## # Groups: age.days, .row [1]
## age.days .row .chain .iteration .draw .linpred
## <dbl> <int> <int> <int> <int> <dbl>
## 1 958 1 NA NA 1 118.
## 2 958 1 NA NA 2 118.
## 3 958 1 NA NA 3 118.
## 4 958 1 NA NA 4 118.
## 5 958 1 NA NA 5 118.
## 6 958 1 NA NA 6 118.
seal_plot +
geom_line(data = seal_fit,
aes(y = .linpred, group = .draw), color = "grey") +
geom_abline(intercept = fixef(seal_lm_bayes)[1], slope = fixef(seal_lm_bayes)[2], color="red")
So how to we visualize uncertainty given our large SD of our fit? We
can add additional simulated values from posterior_predict
at upper and lower values of our x-axis, and put lines through them.
seal_predict <- posterior_predict(seal_lm_bayes, newdata=data.frame(age.days=c(1000, 8500)))
This produces a 4000 x 2 matrix, each row is one simulation, each column is for one of the new values.
seal_predict <- as.data.frame(seal_predict)
seal_predict$x <- 1000
seal_predict$xend <- 8500
#The full viz
seal_plot +
geom_segment(data = seal_predict,
mapping=aes(x=x, xend=xend, y=V1, yend=V2),
color="lightblue", alpha=0.1)+
geom_abline(intercept = seal_chains[,1], slope = seal_chains[,2], color="darkgrey", alpha=0.6) +
geom_abline(intercept = fixef(seal_lm_bayes)[1], slope = fixef(seal_lm_bayes)[2], color="red")
We can now see how much of the range of the data is specified by both our data and error generating process. There’s still some data that falls outside of the range, although that’s not surprising given our large sample size.
We can also again do this with tidybayes and
add_predicted_draws()
and
ggdist::stat_lineribbon()
to great effect!
seal_pred <- data_grid(seals, age.days = seq_range(age.days, 100)) |>
add_predicted_draws(seal_lm_bayes)
seal_plot +
stat_lineribbon(data = seal_pred, aes(y = .prediction),
alpha = 0.5)+
scale_fill_viridis_d(option = "F")
## Warning: Using the `size` aesthietic with geom_ribbon was deprecated in ggplot2 3.4.0.
## ℹ Please use the `linewidth` aesthetic instead.
## Warning: Unknown or uninitialised column: `linewidth`.
## Warning: Using the `size` aesthietic with geom_line was deprecated in ggplot2 3.4.0.
## ℹ Please use the `linewidth` aesthetic instead.
## Warning: Unknown or uninitialised column: `linewidth`.
## Unknown or uninitialised column: `linewidth`.
What if you wanted to try different priors, and assess the influence of your choice? First, let’s see how our current prior relates to our posterior.
prior_summary(seal_lm_bayes)
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b age.days
## student_t(3, 125, 5.9) Intercept
## student_t(3, 0, 5.9) sigma 0
## source
## default
## (vectorized)
## default
## default
Eh, not much, most likely. Let’ see if we had a different prior on
the slope. Maybe a strong prior of a slope of 110. A very strong prior.
Our brms
uses lists to create priors
seal_lm_bayes_prior <- brm(length.cm ~ age.days,
data = seals,
family=gaussian(),
prior = c(prior(normal(110, 5), class = Intercept),
prior(normal(1, 1), class = b),
prior(uniform(3, 10), class = sigma)),
file = "lab/brms_fits/seal_lm_bayes_prior")
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## <lower=0> sigma ~ uniform(3, 10)
## Warning: There were 3867 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
## https://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 4.34, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
Note that this was faster due to the tighter priors.
Fist, the relationship between how lean you are and how quickly you lose fat. Implement this to get a sense ot the general workflow for analysis
fat <- read.csv("./data/17q04BodyFatHeatLoss Sloan and Keatinge 1973 replica.csv")
#initial visualization to determine if lm is appropriate
fat_plot <- ggplot(data=fat, aes(x=leanness, y=lossrate)) +
geom_point()
fat_plot
#fit the model!
fat_mod <- brm(lossrate ~ leanness,
data = fat,
family=gaussian())
# Inspect chains and posteriors
plot(fat_mod)
#Inspect rhat
mcmc_rhat(rhat(fat_mod))
#Inspect Autocorrelation
mcmc_acf(as.data.frame(fat_mod))
#model assumptions
fat_fit <- predict(fat_mod) %>% as_tibble
fat_res <- residuals(fat_mod)%>% as_tibble
qplot(fat_res$Estimate, fat_fit$Estimate)
#fit
pp_check(fat_mod, type="scatter")
#normality
qqnorm(fat_res$Estimate)
qqline(fat_res$Estimate)
pp_check(fat_mod, type="error_hist", bins=8)
##match to posterior
pp_check(fat_mod, type="stat_2d", test=c("mean", "sd"))
pp_check(fat_mod)
#coefficients
summary(fat_mod, digits=5)
#confidence intervals
posterior_interval(fat_mod)
#visualize
fat_chains <- as.data.frame(fat_mod)
fat_plot +
geom_abline(intercept=fat_chains[,1], slope = fat_chains[,2], alpha=0.1, color="lightgrey") +
geom_abline(intercept=fixef(fat_mod)[1], slope = fixef(fat_mod)[2], color="red") +
geom_point()
For your first faded example, let’s look at the relationship between DEET and mosquito bites.
deet <- read.csv("./data/17q24DEETMosquiteBites.csv")
deet_plot <- ggplot(data=___, aes(x=dose, y=bites)) +
geom_point()
deet_plot
#fit the model!
deet_mod <- brm(___ ~ dose,
data = ____,
family=gaussian())
# Inspect chains and posteriors
plot(deet_mod)
#Inspect rhat
mcmc_rhat(rhat(deet_mod))
#Inspect Autocorrelation
mcmc_acf(as.data.frame(deet_mod))
#model assumptions
deet_fit <- predict(____) %>% as_tibble
deet_res <- residuals(____)%>% as_tibble
qplot(____$Estimate, ____$Estimate)
#fit
pp_check(____, type="____")
#normality
qqnorm(____$Estimate)
qqline(____$Estimate)
pp_check(____, type="error_hist", bins=8)
##match to posterior
pp_check(____, type="stat_2d", test=c("mean", "sd"))
pp_check(____)
#coefficients
summary(___, digits=5)
#confidence intervals
posterior_interval(___)
#visualize
deet_chains <- as.data.frame(___)
deet_plot +
geom_abline(intercept=deet_chains[,1], slope = deet_chains[,2], alpha=0.1, color="lightgrey") +
geom_abline(intercept=fixef(___)[1], slope = fixef(___)[2], color="red") +
geom_point()
Do longer lived species also have larger home ranges? Let’s test this!
zoo <- read.csv("../data/17q02ZooMortality Clubb and Mason 2003 replica.csv")
zoo_plot <- ggplot(data=___, aes(x=mortality, y=homerange)) +
___()
___
#fit the model!
zoo_mod <- brm(___ ~ ___,
data = ____,
family=___,
file = "zoo_mod.Rds")
#model assumptions
deet_fit <- predict(____) %>% as_tibble
deet_res <- residuals(____)%>% as_tibble
qplot(____$____, ____$____)
#fit
pp_check(____, type="____")
#normality
qqnorm(____$Estimate)
qqline(____$Estimate)
pp_check(____, type="____", bins=____)
##match to posterior
pp_check(____, type="stat_2d", test=c("____", "____"))
pp_check(____)
#coefficients
summary(___, digits=5)
#confidence intervals
___(___)
#visualize
zoo_chains <- as.data.frame(___)
zoot_plot +
___(___=___[,1], ___ = ___[,2], alpha=0.1, color="lightgrey") +
___(___=fixef(___)[1], ___ = ____(___)[2], color="red") +
geom_point()