This vignette provides an overview of the R package LatentBMA, which implements Bayesian model averaging (BMA) algorithms for univariate link latent Gaussian models (ULLGMs). For detailed information, refer to “Steel M.F.J. & Zens G. (2024). Model Uncertainty in Latent Gaussian Models with Univariate Link Function”. The package supports various g-priors and a beta-binomial prior on the model space. It also includes auxiliary functions for visualizing and tabulating BMA results. Currently, it offers an easy ‘out-of-the-box’ solution for model averaging of Poisson log-normal (PLN) and binomial logistic-normal (BiL) models. The codebase is designed to be easily extendable to other likelihoods, priors, and link functions.
Consider a Poisson log-normal regression model of the form \(y_i \sim \mathcal{P}(e^{z_i})\) where \(z_i = \alpha + x_i'\beta + \epsilon_i\) and \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\). We simulate data with \(n=100\) observations and \(p=20\) covariates, where the first two covariates are relevant to the outcome, setting \(\alpha=2\) and \(\sigma^2=0.25\).
set.seed(123) # Ensure reproducibility
X <- matrix(rnorm(100*20), 100, 20)
z <- 2 + X %*% c(0.5, -0.5, rep(0, 18)) + rnorm(100, 0, sqrt(0.25))
y <- rpois(100, exp(z))
The following code loads the package and runs a BMA MCMC algorithm with the default prior setup, which is a BRIC prior with \(m=p/2\), see the package manual for more details. Alternative choices of priors are also documented in the package manual.
To estimate a binomial logistic-normal (BiL) model of the form \(y_i \sim \mathcal{Bin}(N_i, 1/(1+e^{-z_i}))\), where \(z_i = \alpha + x_i'\beta + \epsilon_i\) and \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\), and \(N_i\) is the number of trials for each observation, one can use a similar syntax. The following code simulates data with \(n=100\) observations and \(p=20\) covariates, assuming \(N_i = 50\) trials for each observation, with the first two covariates relevant to the outcome, setting \(\alpha=1\) and \(\sigma^2=0.25\):
set.seed(123) # Ensure reproducibility
X <- matrix(rnorm(100*20), 100, 20)
Ni <- rep(50, 100)
z <- 1 + X %*% c(0.5, -0.5, rep(0, 18)) + rnorm(100, 0, sqrt(0.25))
y <- rbinom(100, Ni, 1/(1+exp(-z)))
The corresponding ULLGM-BiL model in LatentBMA can
be called using a similar command as before, changing only the
model
parameter and specifying the number of trials \(N_i\):
To summarize the posterior output in a table, one can use
LatentBMA::summarizeBMA()
. Note that all functions in
LatentBMA that generate tables support LaTeX and HTML
output. summarizeBMA()
outputs a knitr::kable
object which can be fully customized. The algorithm correctly identifies
the first two predictors as the most relevant, as can be seen from the
column with posterior inclusion probabilities.
Variable | Posterior Mean | Posterior SD | PIP |
---|---|---|---|
Intercept | 1.066 | 0.052 | - |
x1 | 0.410 | 0.057 | 1.000 |
x2 | -0.523 | 0.052 | 1.000 |
x3 | 0.002 | 0.015 | 0.028 |
x4 | -0.002 | 0.014 | 0.033 |
x5 | 0.001 | 0.010 | 0.019 |
x6 | -0.001 | 0.010 | 0.017 |
x7 | 0.000 | 0.008 | 0.017 |
x8 | -0.003 | 0.020 | 0.039 |
x9 | -0.001 | 0.008 | 0.021 |
x10 | 0.000 | 0.007 | 0.017 |
x11 | 0.003 | 0.019 | 0.037 |
x12 | 0.000 | 0.007 | 0.019 |
x13 | -0.003 | 0.018 | 0.048 |
x14 | 0.000 | 0.007 | 0.010 |
x15 | -0.001 | 0.011 | 0.024 |
x16 | -0.002 | 0.014 | 0.025 |
x17 | 0.000 | 0.007 | 0.014 |
x18 | 0.000 | 0.005 | 0.009 |
x19 | 0.006 | 0.024 | 0.068 |
x20 | -0.001 | 0.010 | 0.023 |
sigma^2 | 0.138 | 0.038 | - |
g | 400.000 | 0.000 | - |
Model Size | 2.469 | 0.702 | - |
To extract the top models and the corresponding posterior model
probabilities (PMPs) from the regression output,
LatentBMA::topModels()
can be used. In this simple setting,
the algorithm strongly concentrates on the true model with two included
predictors.
model | Model #1 | Model #2 | Model #3 | Model #4 | Model #5 |
x1 | x | x | x | x | x |
x2 | x | x | x | x | x |
x4 | x | ||||
x11 | x | ||||
x13 | x | ||||
x19 | x | ||||
PMP | 0.635 | 0.046 | 0.027 | 0.020 | 0.020 |
Several commands are available to visually summarize the results. To
view the posterior distribution of model size, one can use
LatentBMA::plotModelSize()
. All plotting functions in
LatentBMA output a ggplot2::ggplot
object,
which can be fully customized.
The estimated posterior inclusion probabilities and posterior means
using LatentBMA::plotBeta()
and
LatentBMA::plotPIP()
can be visualized as follows:
In order to assess the convergence of the algorithm, it can be useful
to examine posterior traceplots. The function
LatentBMA::tracePlot()
provides functionality to generate
traceplots for the parameters and the size of the visited models. For
example, to look at the traceplots of \(\alpha\) and \(\sigma^2\), one can use the following
code.
For advanced customization of the algorithm, experienced users can
utilize the internal function ULLGM_BMA_MCMC
from the
package. Although this function is not exported, it can be accessed
using the triple colon operator (:::
) or through the code
repository. This function allows for deeper customization and
fine-tuning of the algorithm’s parameters, offering more control over
its execution.
For example, one could implement user-specified functions for likelihoods, gradients, and g-priors. Templates for these modifications can be found in the code repository. Please note that this functionality has not been thoroughly tested, so special care is advised when working with these modifications.