--- title: "Estimate growth using MCMC" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Estimate growth using MCMC} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, warning=FALSE, message=FALSE} library(BayesGrowth) ``` `BayesGrowth` combines length-at-age modelling with MCMC implemented using [Stan](https://mc-stan.org/) and the [rstan](http://mc-stan.org/rstan/) package. Growth modelling using models such as the von Bertalanffy growth function involves three parameters: $L_{\infty}$, $k$ and either $L_{0}$ or $t_{0}$. Two of these parameters: $L_{0}$ and $L_{\infty}$ have direct biological meaning as the length-at-birth and maximum length, respectively. This package provides the tools to run an MCMC model with these two parameters treated as size-at-birth and maximum length using a Stan model. This MCMC model is pre-specified and built into wrapper functions. The user can therefore run an MCMC growth model using knowledge of species length-at-birth and maximum size as priors. ## Installation This package provides a series of wrapper functions to the `rstan` package which will run MCMC models in Stan using No U-turn sampling (NUTS). `rstan` models are automatically run by the package. However, you'll probably want to install the `rstan` package anyway in order to work with the resulting model outputs. ```{r, eval = FALSE} install.packages("rstan") ``` You can install the released version of `BayesGrowth` from [Github](https://github.com/jonathansmart/BayesGrowth) with: ``` {r, eval = FALSE} devtools::install_github("jonathansmart/BayesGrowth") ``` ## Usage The main `BayesGrowth` function is `Estimate_MCMC_Growth()` which is the wrapper function around an `rstan` model. It requires a data input that includes columns that can be identified "Age" and "Length", the model needs to be specified (several options are available) and the priors must be specified. Priors include the max size with an error, length-at-birth with an error and upper limits for $k$ and $\sigma$. These latter two parameters have no informative priors and only require sensible upper bounds. Many fish species (including this example) have a size at birth of zero. Therefore, this can value can be used as a prior along with a very small error to indicate high certainty of this prior. The `L0.se` argument cannot be zero, but the model is specified to truncate $L_{0}$ at zero and keep growth positive. Note that `rstan` estimates parameter precision ($\tau$) rather than parameter standard error ($\theta{\sigma}$). Therefore, standard error is converted to precision as $\tau = (1/\theta{\sigma})^2$. This is handled by the `BayesGrowth` package so the user does not need to do this conversion themselves. Here is an example of `Estimate_MCMC_Growth()` in action: ```{r model,message=FALSE, warning=FALSE, eval=FALSE} library(BayesGrowth) # built in dataset for running examples data("example_data") ## Biological info - lengths in mm max_size <- 440 max_size_se <- 5 birth_size <- 0 birth_size_se <- 0.001 # cannot be zero # Use the function to estimate the rstan model MCMC_example_results <- Estimate_MCMC_Growth(example_data, Model = "VB" , iter = 10000, n.chains = 4, # minimum of 3 chains recommended BurnIn = 1000, # default is 10% of iterations thin = 10, # a thinning rate of 10 is applied to overcome autocorrelation Linf = max_size, Linf.se = max_size_se, L0 = birth_size, L0.se = birth_size_se, sigma.max = 100, verbose = TRUE, k.max = 1) ``` ```{r, echo = FALSE} data("MCMC_example_results") ``` # Outputs and compatibility with `rstan` and `bayesplot` packages `Estimate_MCMC_Growth` returns the `rstan` outputs which is an object of class `stanfit`. Therefore, all accompanying R packages to `rstan` can be used to interrogate the results. A useful package for analysing the posteriors is `bayesplot` which has several functions to explore chain convergence, posterior densities, autocorrelation, etc. ## Summaries As an `rstan` model object is returned from the function, all of the diagnostics from the `rstan` library can be used. For example, a summary and structure of results can be produced. ```{r summary,message=FALSE} library(rstan) summary(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma"))$summary str(MCMC_example_results,max.level = 2) ``` ## Chain convergence The chains of the MCMC model can be examined using a trace plot. The right hand panels show the chains for each parameter as they progress through the MCMC iterations. The left hand panels show the posterior distributions of each parameter. As this model is specified to have a normal distribution, all parameter posteriors should be normally distributed once convergence has been reached. $L_{0}$ is an exception as this parameter is truncated to remain above zero. Therefore, as $L_{0}$ approaches zero it will be right skewed. This plot shows good model convergence. All of the chains overlap and each of the parameters has the correct posterior distribution with a single mode. ```{r Diagnostics, warning=FALSE, message=FALSE, fig.height = 8, fig.width = 8} library(bayesplot) mcmc_combo(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma")) ``` Convergence can also be checked using a Gelman Rubin test. If all point estimates of the Gelman-Rubin diagnostic ($\hat{R}$) for each parameter are less than 1.2 then convergence has been reached. Ideally these should be close to one. If not, then the number of iterations should be increased. This result is returned as part of the `Estimate_MCMC_Growth` summary and can also be accessed using the `rhat` function. These can also be plotted using `mcmc_rhat`. ```{r GelmabRubin, warning=FALSE, message=FALSE, fig.height = 6, fig.width = 8} rhats <- rhat(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma")) mcmc_rhat(rhats) ``` ## Autocorrelation Autocorrelation can be examined using the `mcmc_acf` function in the `bayesplot` package. This plot shows the autocorrelation lag that occurs in the MCMC model for each parameter. If each parameter has an autocorrelation value of zero at the end of the lag series, then no autocorrelation has occurred. ```{r Autocorr, warning=FALSE, message=FALSE, fig.height = 8, fig.width = 8} mcmc_acf(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma")) ``` # Other useful `BayesGrowth` functions Additional `BayesGrowth` functions are available that help the user manipulate the returned `Estimate_MCMC_Growth` object. For example `Get_MCMC_parameters` will return a tibble of parameter results and statistics as a more manageable object than `summary()`. ```{r pars, warning=FALSE, message=FALSE} Get_MCMC_parameters(MCMC_example_results) ``` The `Calculate_MCMC_growth_curve` function will provide confidence intervals around the growth curve based on MCMC parameter percentiles. This is essentially a wrapper around the `tidybayes::mean_qi()` function which means it can be passed straight into a ggplot with the `tidybayes::geom_line_ribbon` function. ```{r plot, warning=FALSE, message=FALSE, fig.height = 6, fig.width = 8, warning=FALSE} # Return a growth curve with 50th and 95th percentiles growth_curve <- Calculate_MCMC_growth_curve(obj = MCMC_example_results, Model = "VB", max.age = max(example_data$Age), probs = c(.5,.95)) library(tidybayes) library(ggplot2) ggplot(growth_curve, aes(Age, LAA))+ geom_point(data = example_data, aes(Age, Length), alpha = .3)+ geom_lineribbon(aes( ymin = .lower, ymax = .upper, fill = factor(.width)), size = .8) + labs(y = "Total Length (mm)", x = "Age (yrs)")+ scale_fill_brewer(palette="BuPu", direction=1,name =NULL)+ scale_y_continuous(expand = c(0,0))+ scale_x_continuous(expand = c(0,0), breaks = seq(0,13,1))+ theme_bw()+ theme(text = element_text(size = 14)) ``` This represents a much improved fit over a standard non-linear estimated model, even if the length-at-birth were fixed at zero. Here the fit is compared using an nls model fit using the `AquaticLifeHistory` package (https://jonathansmart.github.io/AquaticLifeHistory/). ```{r compare_plot, warning=FALSE, message=FALSE, fig.height = 6, fig.width = 8, echo=FALSE} library(AquaticLifeHistory) fixed_mod <- Estimate_Growth(example_data,n.bootstraps = 1, Birth.Len = 0, models = "VB", plots = FALSE) free_mod <- Estimate_Growth(example_data,n.bootstraps = 1, models = "VB", plots = FALSE) growth_curve <- Calculate_MCMC_growth_curve(MCMC_example_results, Model = "VB", max.age = max(example_data$Age), probs = c(.95)) ggplot(growth_curve, aes(Age, LAA))+ geom_line(data = free_mod$Estimates,inherit.aes = FALSE, aes(Age, AVG, col = "nls model - free L0"), size = 1.5)+ geom_line(data = fixed_mod$Estimates,inherit.aes = FALSE, aes(Age, AVG, col = "nls model - fixed L0"), size = 1.5)+ geom_point(data = example_data, aes(Age, Length), alpha = .3)+ geom_lineribbon( aes(ymin = .lower, ymax =.upper,fill = "MCMC model", col = "MCMC model"),size = 1.5, alpha = .5) + geom_line(aes(Age, LAA, col = "MCMC model"), size = 1.5)+ labs(y = "Total Length (mm)", x = "Age (yrs)")+ scale_color_viridis_d(name = "Model", begin = .3, end = .8)+ scale_fill_viridis_d(begin = .5, end = .6)+ guides(fill = FALSE)+ scale_y_continuous(expand = c(0,0))+ scale_x_continuous(expand = c(0,0), breaks = seq(0,13,1))+ theme_bw()+ theme(text = element_text(size = 14)) ``` # Model comparison and selection The Leave One Out Information Criterion (LOOIC) and/or Widely Applicable Information Critereon (WAIC) should be calculated for different candidate models to determine the best fitting growth model. The `Compare_Growth_Models` function can calculate both for all three models in a single function call. The user should specify the same arguments as the original `Estimate_MCMC_Growth` function call to ensure consistency. If the arguments of `Estimate_MCMC_Growth` are updated (for example priors are changed, more iterations are run, thinning is used, etc.) then `Compare_Growth_Models` should be re-run accordingly. This function has a longer runtime than the initial model run as it will fit all three candidate Bayesian growth models. This is necessary as all models will be run with identical priors and model options which is important for determining the best model. The model with the highest LOOIC or WAIC weights provides the best fit to the data and should be used as the final growth model. LOOIC is considered more robust than WAIC but comparing both can be useful. The `stats` argument in the function call lets the user decide if they would like to return results for LooIC, WAIC or both. This model comparison analysis should be the first step in the model fitting process so that subsequent diagnostics and results are only produced for the best fitting model. From these results we can see that the best fitting model was the Von Bertalanffy growth model. Note that this function does not check model convergence (a large number of warning messages suggests that one or more models has struggled to converge). Therefore, full diagnostics need to be run on the best fitting model. If the model configurations are updated to deal with issues such as autocorrelation then model comparisons and selection should be undertaken again. ```{r DIC, warning=FALSE, message=FALSE, eval=FALSE} Looic_example_results <- Compare_Growth_Models(data = example_data, stats = "LooIC", iter = 10000, n_cores = 3, n.chains = 4, BurnIn = 1000, thin = 1, Linf = max_size, Linf.se = max_size_se, L0 = birth_size, L0.se = birth_size_se, verbose = TRUE, sigma.max = 100, k.max = 1) ``` ```{r, warning=FALSE, message=FALSE,echo=FALSE} data("Looic_example_results") Looic_example_results ```