--- title: "Getting Started with AIPW" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with AIPW} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.width = 6, comment = "#>"#, # cache=TRUE ) ``` Contents: * [Installation](#Installation) * [One-line version](#one_line) * [Longer version](#details) + [Create an AIPW object](#constructor) + [Fit the object](#fit) + [Calculate average treatment effects](#ate) + [Calculate average treatment effects among the treated](#att) * [Parallelization](#par) * [Using tmle/tmle3 as input](#tmle_input) + [tmle](#tmle) ## Installation 1. Install AIPW from [GitHub](https://github.com/yqzhong7/AIPW) ```{r, eval = FALSE} install.packages("remotes") remotes::install_github("yqzhong7/AIPW") ``` __* CRAN version only supports SuperLearner and tmle. Please install the Github version (master branch) to use sl3 and tmle3.__ 2. Install [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) or [sl3](https://tlverse.org/sl3/articles/intro_sl3.html) ```{r, eval = FALSE} #SuperLearner install.packages("SuperLearner") #sl3 remotes::install_github("tlverse/sl3") install.packages("Rsolnp") ``` ## Input data for analyses ```{r example data} library(AIPW) library(SuperLearner) library(ggplot2) set.seed(123) data("eager_sim_obs") cov = c("eligibility","loss_num","age", "time_try_pregnant","BMI","meanAP") ``` ## Using AIPW to estimate the average treatment effect ### One line version (Method chaining from R6class) Using native AIPW class allows users to define different covariate sets for the exposure and the outcome models, respectively. ```{r one_line} AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = c("SL.mean","SL.glm"), g.SL.library = c("SL.mean","SL.glm"), k_split = 10, verbose=FALSE)$ fit()$ #Default truncation is set to 0.025; using 0.25 here is for illustrative purposes and not recommended summary(g.bound = c(0.25,0.75))$ plot.p_score()$ plot.ip_weights() ``` ### A more detailed tutorial #### 1. Create an AIPW object * ##### Use [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) libraries ```{r SuperLearner, message=FALSE,eval=F} library(AIPW) library(SuperLearner) #SuperLearner libraries for outcome (Q) and exposure models (g) sl.lib <- c("SL.mean","SL.glm") #construct an aipw object for later estimations AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = sl.lib, g.SL.library = sl.lib, k_split = 10, verbose=FALSE) ``` If outcome is missing, analysis assumes missing at random (MAR) by estimating propensity scores with I(A=a, observed=1). Missing exposure is not supported. #### 2. Fit the AIPW object This step will fit the data stored in the AIPW object to obtain estimates for later average treatment effect calculations. ```{r} #fit the AIPW_SL object AIPW_SL$fit() # or you can use stratified_fit # AIPW_SL$stratified_fit() ``` #### 3. Calculate average treatment effects * ##### Estimate the ATE with propensity scores truncation ```{r} #estimate the average causal effects from the fitted AIPW_SL object AIPW_SL$summary(g.bound = 0.25) #propensity score truncation ``` * ##### Check the balance of propensity scores and inverse probability weights by exposure status after truncation ```{r ps_trunc} library(ggplot2) AIPW_SL$plot.p_score() AIPW_SL$plot.ip_weights() ``` #### 4. Calculate average treatment effects among the treated/controls * ##### `stratified_fit()` fits the outcome model by exposure status while `fit()` does not. Hence, `stratified_fit()` must be used to compute ATT/ATC [(Kennedy et al. 2015)](http://www.ehkennedy.com/uploads/5/8/4/5/58450265/2015_kennedy_et_al_-_semiparametric_causal_inference_in_matched_cohort_studies.pdf) ```{r} suppressWarnings({ AIPW_SL$stratified_fit()$summary() }) ``` ## Parallelization with future.apply In default setting, the `AIPW$fit()` method will be run sequentially. The current version of AIPW package supports parallel processing implemented by [future.apply](https://github.com/HenrikBengtsson/future.apply) package under the [future](https://github.com/HenrikBengtsson/future) framework. Before creating a `AIPW` object, simply use `future::plan()` to enable parallelization and `set.seed()` to take care of the random number generation (RNG) problem: ```{r parallel, eval=FALSE} # install.packages("future.apply") library(future.apply) plan(multiprocess, workers=2, gc=T) set.seed(888) AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = sl3.lib, g.SL.library = sl3.lib, k_split = 10, verbose=FALSE)$fit()$summary() ``` ## Use `tmle` fitted object as input AIPW shares similar intermediate estimates (nuisance functions) with the Targeted Maximum Likelihood / Minimum Loss-Based Estimation (TMLE). Therefore, `AIPW_tmle` class is designed for using `tmle` fitted object as input. Details about these two packages can be found [here](https://www.jstatsoft.org/article/view/v051i13) and [here](https://tlverse.org/tlverse-handbook/). This feature is designed for debugging and easy comparisons across these three packages because cross-fitting procedures are different in `tmle`. In addition, this feature does not support ATT outputs. #### `tmle` As shown in the message, [tmle](https://CRAN.R-project.org/package=tmle) only support cross-fitting in the outcome model. ```{r tmle, eval=F} # install.packages("tmle") library(tmle) library(SuperLearner) tmle_fit <- tmle(Y=eager_sim_obs$sim_Y, A=eager_sim_obs$sim_A, W=eager_sim_obs[,-1:-2], Q.SL.library=c("SL.mean","SL.glm"), g.SL.library=c("SL.mean","SL.glm"), family="binomial", cvQinit = TRUE) cat("\nEstimates from TMLE\n") unlist(tmle_fit$estimates$ATE) unlist(tmle_fit$estimates$RR) unlist(tmle_fit$estimates$OR) cat("\nEstimates from AIPW\n") a_tmle <- AIPW_tmle$ new(A=eager_sim_obs$sim_A,Y=eager_sim_obs$sim_Y,tmle_fit = tmle_fit,verbose = TRUE)$ summary(g.bound=0.025) ```