CausalModels aims to collect fundamental causal inference models in a single package. Many packages implement individual models of this kind; CausalModels brings them together behind one simple interface, in a format familiar to users of well-known statistical and machine learning models. Inferring a causal effect correctly requires careful consideration of variables, so CausalModels keeps the code simple while still requiring the user to make these considerations. This enables efficient and thoughtful research using causal inference. As of May 30, 2022, the package has been published on CRAN.
You can install the development version of CausalModels from GitHub with:
# install.packages("devtools")
devtools::install_github("ander428/CausalModels")
Since the package has been published on CRAN, the production version can be installed with:
install.packages("CausalModels")
This basic example uses standardization to estimate the average treatment effect of qsmk on wt82_71 in the nhefs data set from the causaldata package:
library(CausalModels)
library(causaldata)
#> Warning: package 'causaldata' was built under R version 4.1.3
data(nhefs)
nhefs.nmv <- nhefs[which(!is.na(nhefs$wt82)),]
nhefs.nmv$qsmk <- as.factor(nhefs.nmv$qsmk)
confounders <- c("sex", "race", "age", "education", "smokeintensity",
                 "smokeyrs", "exercise", "active", "wt71")
# initialize package
?init_params
#> starting httpd help server ... done
init_params(wt82_71, qsmk,
            covariates = confounders,
            data = nhefs.nmv, simple = FALSE)
#> Successfully initialized!
#>
#> Summary:
#>
#> Outcome - wt82_71
#> Treatment - qsmk
#> Covariates - [ sex, race, age, education, smokeintensity, smokeyrs, exercise, active, wt71 ]
#>
#> Size - 1566 x 67
#>
#> Default formula for outcome models:
#> wt82_71 ~ qsmk + sex + race + education + exercise + active + age + (qsmk * age) + I(age * age) + smokeintensity + (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) + smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 + (qsmk * wt71) + I(wt71 * wt71)
#>
#> Default formula for propensity models:
#> qsmk ~ sex + race + education + exercise + active + age + I(age * age) + smokeintensity + I(smokeintensity * smokeintensity) + smokeyrs + I(smokeyrs * smokeyrs) + wt71 + I(wt71 * wt71)
# model the causal effect of qsmk on wt82_71
model <- standardization(nhefs.nmv)
print(model)
#>
#> Call: glm(formula = wt82_71 ~ qsmk + sex + race + education + exercise +
#> active + age + (qsmk * age) + I(age * age) + smokeintensity +
#> (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) +
#> smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 +
#> (qsmk * wt71) + I(wt71 * wt71), family = family, data = data)
#>
#> Coefficients:
#>                        (Intercept)                               qsmk1
#>                         -0.9699812                           0.5509460
#>                               sex1                               race1
#>                         -1.4371844                           0.5868376
#>                         education2                          education3
#>                          0.8174769                           0.5824119
#>                         education4                          education5
#>                          1.5240890                          -0.1792422
#>                          exercise1                           exercise2
#>                          0.3063727                           0.3550789
#>                            active1                             active2
#>                         -0.9460683                          -0.2707615
#>                                age                        I(age * age)
#>                          0.3495673                          -0.0060652
#>                     smokeintensity  I(smokeintensity * smokeintensity)
#>                          0.0482197                          -0.0009597
#>                           smokeyrs              I(smokeyrs * smokeyrs)
#>                          0.1418662                          -0.0018076
#>                               wt71                      I(wt71 * wt71)
#>                          0.0393011                          -0.0009787
#>                          qsmk1:age                qsmk1:smokeintensity
#>                          0.0123138                           0.0448028
#>                     qsmk1:smokeyrs                          qsmk1:wt71
#>                         -0.0235529                           0.0291350
#>
#> Degrees of Freedom: 1565 Total (i.e. Null); 1542 Residual
#> (3132 observations deleted due to missingness)
#> Null Deviance: 97180
#> Residual Deviance: 82690 AIC: 10710
#>
#> Average treatment effect of qsmk:
#> Estimate - 3.4927
#> SE - 0.4723109
#> 95% CI - ( 2.566988 , 4.418413 )
summary(model)
#>
#> Call:
#> glm(formula = wt82_71 ~ qsmk + sex + race + education + exercise +
#> active + age + (qsmk * age) + I(age * age) + smokeintensity +
#> (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) +
#> smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 +
#> (qsmk * wt71) + I(wt71 * wt71), family = family, data = data)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -41.913   -4.168   -0.314    3.869   44.573
#>
#> Coefficients:
#>                                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                        -0.9699812  4.3673208  -0.222 0.824266
#> qsmk1                               0.5509460  2.8229123   0.195 0.845286
#> sex1                               -1.4371844  0.4693195  -3.062 0.002235 **
#> race1                               0.5868376  0.5828368   1.007 0.314158
#> education2                          0.8174769  0.6085125   1.343 0.179339
#> education3                          0.5824119  0.5575569   1.045 0.296382
#> education4                          1.5240890  0.8351981   1.825 0.068221 .
#> education5                         -0.1792422  0.7462118  -0.240 0.810205
#> exercise1                           0.3063727  0.5360193   0.572 0.567697
#> exercise2                           0.3550789  0.5592886   0.635 0.525603
#> active1                            -0.9460683  0.4105673  -2.304 0.021338 *
#> active2                            -0.2707615  0.6851128  -0.395 0.692745
#> age                                 0.3495673  0.1648157   2.121 0.034084 *
#> I(age * age)                       -0.0060652  0.0017347  -3.496 0.000485 ***
#> smokeintensity                      0.0482197  0.0518339   0.930 0.352375
#> I(smokeintensity * smokeintensity) -0.0009597  0.0009409  -1.020 0.307878
#> smokeyrs                            0.1418662  0.0943836   1.503 0.133023
#> I(smokeyrs * smokeyrs)             -0.0018076  0.0015458  -1.169 0.242437
#> wt71                                0.0393011  0.0836422   0.470 0.638514
#> I(wt71 * wt71)                     -0.0009787  0.0005255  -1.862 0.062751 .
#> qsmk1:age                           0.0123138  0.0670159   0.184 0.854238
#> qsmk1:smokeintensity                0.0448028  0.0360169   1.244 0.213712
#> qsmk1:smokeyrs                     -0.0235529  0.0654333  -0.360 0.718931
#> qsmk1:wt71                          0.0291350  0.0276439   1.054 0.292075
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 53.62833)
#>
#> Null deviance: 97176 on 1565 degrees of freedom
#> Residual deviance: 82695 on 1542 degrees of freedom
#> (3132 observations deleted due to missingness)
#> AIC: 10706
#>
#> Number of Fisher Scoring iterations: 2
#>
#> Average treatment effect of qsmk:
#>   Beta        SE    2.5 %   97.5 %
#> 3.4927 0.4723109 2.566988 4.418413
#>
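For intuition about what the average treatment effect above represents, standardization can be sketched by hand in base R: fit an outcome model, predict every subject's outcome with the treatment set to each level in turn, and average the difference. The sketch below runs on simulated data; the variable names (age, treat, y, dat) and the simulated effect size are assumptions made for illustration, and this is not necessarily how CausalModels computes its estimate or its standard error.

```r
# Minimal sketch of standardization in base R on simulated data.
# All names and values below are illustrative assumptions,
# not part of the CausalModels API.
set.seed(1)
n <- 2000
age <- rnorm(n, mean = 45, sd = 10)             # a single confounder
treat <- rbinom(n, 1, plogis(-2 + 0.04 * age))  # treatment depends on age
y <- 1 + 3 * treat + 0.1 * age + rnorm(n)       # simulated effect of treat is 3
dat <- data.frame(y = y, treat = factor(treat), age = age)

# 1. Fit an outcome model with the treatment, the confounder,
#    their interaction, and a quadratic term for the confounder.
fit <- glm(y ~ treat * age + I(age * age), data = dat)

# 2. Copy the data twice, setting everyone to treated and to untreated.
dat1 <- dat
dat1$treat <- factor(rep("1", n), levels = levels(dat$treat))
dat0 <- dat
dat0$treat <- factor(rep("0", n), levels = levels(dat$treat))

# 3. Average the difference in predicted outcomes over the whole sample.
ate <- mean(predict(fit, newdata = dat1) - predict(fit, newdata = dat0))
ate
```

Because the data are simulated with an effect of 3, the estimate should land close to that value. The sketch gives a point estimate only and does not compute a standard error or confidence interval.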