In this example, we illustrate how to handle a binary treatment when using CausalEGM.
library(RcausalEGM)
if (!(reticulate::py_module_available('CausalEGM'))){
  cat("## Please install the CausalEGM package using the function: install_causalegm()")
  knitr::knit_exit()
}
Let’s first generate a simulated dataset with a binary treatment.
n <- 500
p <- 20
v <- matrix(rnorm(n * p), n, p)
x <- rbinom(n, 1, 0.3 + 0.2 * (v[, 1] > 0))
y <- pmax(v[, 1], 0) * x + v[, 2] + pmin(v[, 3], 0) + rnorm(n)
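Because the data are simulated, the true effects are known, which is useful for judging the estimates later on. In the outcome equation above, the treatment enters only through `pmax(v[, 1], 0) * x`, so the true individual effect is `pmax(v[, 1], 0)`, and the population ATE is E[max(V1, 0)] = 1/sqrt(2*pi), roughly 0.399, for a standard normal V1. The following is a minimal sketch of that calculation; the names `n_sim`, `p_sim`, `v_sim`, `ite_true`, `ate_sample`, and `ate_pop`, as well as the seed, are illustrative and not part of RcausalEGM.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n_sim <- 500
p_sim <- 20
v_sim <- matrix(rnorm(n_sim * p_sim), n_sim, p_sim)

# True individual treatment effect implied by the outcome equation
ite_true <- pmax(v_sim[, 1], 0)

# Sample ATE (average over the simulated units) and population ATE
ate_sample <- mean(ite_true)
ate_pop <- 1 / sqrt(2 * pi)

round(c(sample = ate_sample, population = ate_pop), 3)
```

The sample value varies with the random draw, while the population value is fixed by the simulation design.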
Let’s take a look at the simulated data.
oldpar <- par(mfrow=c(1,3))
slices <- c(sum(x==1), sum(x==0))
lbls <- c(paste("T group:",round(sum(x==1)*100/length(x), 2), "%", sep=""), paste("C group:",round(sum(x==0)*100/length(x), 2), "%", sep=""))
pie(slices, labels = lbls, main="Treatment Variables")
hist(y, breaks=12, col="red",xlab="y values")
boxplot(v[,1:5],main="First five covariates", xlab="Covariate index", ylab="v values")
par(oldpar)
Next, train a CausalEGM model. Users can run help(causalegm) for detailed usage of the core API “causalegm”.
Note that the arguments x, y, and v are required. Users can also specify z_dims as an integer list with four elements.
#help(causalegm)
model <- causalegm(x=x,y=y,v=v,n_iter=2000)
After training, users can find the individual treatment effect (ITE) estimates saved as a .txt file in the directory given by the “output_dir” parameter of “causalegm”.
Alternatively, several key estimates, including the average treatment effect (ATE) and the individual treatment effects, can be obtained directly from the trained model.
ATE <- mean(model$causal_pre)
paste("The average treatment effect (ATE):", round(ATE, 3))
#> [1] "The average treatment effect (ATE): 0.483"
ITE <- model$causal_pre
boxplot(ITE, main="ITE distribution", ylab="Values")
Besides ATE and ITE estimation, we also provide APIs for predicting the counterfactual outcome directly.
x_cf <- 1-x
y_cf <- get_est(model, v, x_cf)
boxplot(y_cf, main="Counterfactual outcome", ylab="Values")
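For the simulated data, the noise-free counterfactual outcome is known, so the predictions can be sanity-checked against it. The sketch below assumes the data-generating process used earlier in this example; `mu_true`, `v_sim`, `x_sim`, and `mu_cf` are hypothetical names introduced here and are not RcausalEGM functions or objects.

```r
# Noise-free outcome under treatment x, following the simulation's outcome equation
mu_true <- function(v, x) {
  pmax(v[, 1], 0) * x + v[, 2] + pmin(v[, 3], 0)
}

set.seed(1)  # arbitrary seed for this sketch
v_sim <- matrix(rnorm(500 * 20), 500, 20)
x_sim <- rbinom(500, 1, 0.3 + 0.2 * (v_sim[, 1] > 0))

# Expected outcome had each unit received the opposite treatment
mu_cf <- mu_true(v_sim, 1 - x_sim)
summary(mu_cf)
```

With the objects from this example, plotting `mu_true(v, x_cf)` against `as.numeric(y_cf)` would give a quick visual check of the predicted counterfactuals.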
Next, we demonstrate how to estimate the conditional average treatment effect (CATE) on external data after the model has been trained.
n_test <- 100
v_test <- matrix(rnorm(n_test * p), n_test, p)
CATE <- get_est(model, v_test)
boxplot(CATE, main="CATE", ylab="Values")
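Since the external covariates come from the same simulation design, the true CATE for each test unit is `pmax(v[, 1], 0)`, and the estimates can be scored against it. The sketch below is one possible way to do so using a root-mean-square error; `rmse`, `v_new`, `cate_true`, and `cate_stub` are illustrative names, and the constant prediction merely stands in for a model estimate.

```r
# Root-mean-square error between estimated and true effects
rmse <- function(est, truth) {
  sqrt(mean((as.numeric(est) - truth)^2))
}

set.seed(2)  # arbitrary seed for this sketch
n_new <- 100
v_new <- matrix(rnorm(n_new * 20), n_new, 20)
cate_true <- pmax(v_new[, 1], 0)  # true CATE under the simulation design

# Stand-in estimate: a constant prediction equal to the population ATE
cate_stub <- rep(1 / sqrt(2 * pi), n_new)
rmse(cate_stub, cate_true)
```

Replacing `cate_stub` with the `CATE` values returned by `get_est(model, v_test)`, and `v_new` with `v_test`, would score the trained model; an informative model should beat the constant baseline.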