---
title: "Getting Started"
output:
  rmarkdown::html_vignette:
    md_extensions: [
      "-autolink_bare_uris"
    ]
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

``` r
library(CausalQueries)
library(dplyr)
library(knitr)
```

# Make a model

**Generating**: To make a model you need to provide a DAG statement to `make_model`. For instance

* `"X->Y"`
* `"X -> M -> Y <- X"` or
* `"Z -> X -> Y <-> X"`.

``` r
# examples of models
xy_model <- make_model("X -> Y")
iv_model <- make_model("Z -> X -> Y <-> X")
```

**Graphing**: Once you have made a model you can inspect the DAG:

``` r
plot(xy_model)
```

![plot of chunk unnamed-chunk-3](unnamed-chunk-3-1.png)

**Simple summaries:** You can access a simple summary using `summary()`

``` r
summary(xy_model)
#>
#> Causal statement:
#> X -> Y
#>
#> Nodal types:
#> $X
#> 0  1
#>
#>   node position display interpretation
#> 1    X       NA      X0          X = 0
#> 2    X       NA      X1          X = 1
#>
#> $Y
#> 00  10  01  11
#>
#>   node position display interpretation
#> 1    Y        1   Y[*]*      Y | X = 0
#> 2    Y        2   Y*[*]      Y | X = 1
#>
#> Number of types by node:
#> X Y
#> 2 4
#>
#> Number of causal types:  8
#>
#> Note: Model does not contain: posterior_distribution, stan_objects;
#> to include these objects use update_model()
#>
#> Note: To pose causal queries of this model use query_model()
```

or you can examine model details using `inspect()`.

**Inspecting**: The model has a set of parameters and a default distribution over these.

``` r
xy_model |> inspect("parameters_df")
#>
#> parameters_df
#> Mapping of model parameters to nodal types:
#>
#>   param_names: name of parameter
#>   node:        name of endogeneous node associated
#>                with the parameter
#>   gen:         partial causal ordering of the
#>                parameter's node
#>   param_set:   parameter groupings forming a simplex
#>   given:       if model has confounding gives
#>                conditioning nodal type
#>   param_value: parameter values
#>   priors:      hyperparameters of the prior
#>                Dirichlet distribution
#>
#>   param_names node gen param_set nodal_type given param_value priors
#> 1         X.0    X   1         X          0              0.50      1
#> 2         X.1    X   1         X          1              0.50      1
#> 3        Y.00    Y   2         Y         00              0.25      1
#> 4        Y.10    Y   2         Y         10              0.25      1
#> 5        Y.01    Y   2         Y         01              0.25      1
#> 6        Y.11    Y   2         Y         11              0.25      1
```

**Tailoring**: These features can be edited using `set_restrictions`, `set_priors` and `set_parameters`. Here is an example of setting a monotonicity restriction (see `?set_restrictions` for more):

``` r
iv_model <-
  iv_model |> set_restrictions(decreasing('Z', 'X'))
```

Here is an example of setting priors (see `?set_priors` for more):

``` r
iv_model <-
  iv_model |> set_priors(distribution = "jeffreys")
#> Altering all parameters.
```

**Simulation**: Data can be drawn from a model like this:

``` r
data <- make_data(iv_model, n = 4)

data |> kable()
```

|  Z|  X|  Y|
|--:|--:|--:|
|  0|  0|  1|
|  1|  0|  1|
|  1|  1|  0|
|  1|  1|  0|

# Update the model

**Updating**: Update using `update_model`. You can pass all `rstan` arguments to `update_model`.

``` r
df <-
  data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X*.5))

xy_model <-
  xy_model |>
  update_model(df, refresh = 0)
```

**Inspecting**: You can access the posterior distribution over the model parameters directly:

``` r
xy_model |> grab("posterior_distribution") |>
  head() |> kable()
```

|       X.0|       X.1|      Y.00|      Y.10|      Y.01|      Y.11|
|---------:|---------:|---------:|---------:|---------:|---------:|
| 0.5515163| 0.4484837| 0.2847141| 0.1177726| 0.5741150| 0.0233983|
| 0.5265501| 0.4734499| 0.1473281| 0.2074736| 0.6224344| 0.0227639|
| 0.4597457| 0.5402543| 0.2233002| 0.1362462| 0.3923527| 0.2481008|
| 0.5754801| 0.4245199| 0.1631742| 0.1542270| 0.6013710| 0.0812279|
| 0.6779082| 0.3220918| 0.1429733| 0.1443515| 0.6242878| 0.0883874|
| 0.5075180| 0.4924820| 0.0409841| 0.2349819| 0.6799476| 0.0440865|

where each row is a draw of parameters.

# Query the model

## Arbitrary queries

**Querying**: You can ask arbitrary causal queries of the model.

An example of an *unconditional* query:

``` r
xy_model |>
  query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label           |using      |  mean|    sd| cred.low| cred.high|
#> |:---------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |priors     | 0.249| 0.195|    0.008|     0.722|
#> |Y[X=1] > Y[X=0] |posteriors | 0.552| 0.104|    0.327|     0.725|
```

This query asks for the probability that $Y(1) > Y(0)$.

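
In this simple model the query has a direct reading in terms of the parameters: a unit has $Y(1) > Y(0)$ exactly when its nodal type for `Y` is `01`, so the query equals the share of `Y.01` types. As a rough check on the `priors` row, the sketch below uses base R only and makes no CausalQueries calls. It assumes the flat Dirichlet prior shown in the `priors` column of `parameters_df`, and it assumes that `cred.low` and `cred.high` are the 2.5% and 97.5% quantiles; the object names are illustrative.

``` r
# Hand-rolled sketch of the prior on "Y[X=1] > Y[X=0]" in the X -> Y model.
# Assumption: Dirichlet(1, 1, 1, 1) prior over the four nodal types of Y.
set.seed(1)
n_draws <- 4000

# Draw from a Dirichlet by normalising independent Gamma(shape = 1) draws
g <- matrix(rgamma(n_draws * 4, shape = 1), ncol = 4)
lambda_Y <- g / rowSums(g)
colnames(lambda_Y) <- c("Y.00", "Y.10", "Y.01", "Y.11")

# Y.01 is the type with Y = 0 when X = 0 and Y = 1 when X = 1
positive_effect <- lambda_Y[, "Y.01"]

# Assumption: credibility bounds are the 2.5% and 97.5% quantiles
prior_summary <- c(
  mean      = mean(positive_effect),
  sd        = sd(positive_effect),
  cred.low  = unname(quantile(positive_effect, 0.025)),
  cred.high = unname(quantile(positive_effect, 0.975))
)
round(prior_summary, 3)
```

Up to simulation error, the mean should sit near 0.25, in line with the `priors` row above.
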
An example of a *conditional* query:

``` r
xy_model |>
  query_model("Y[X=1] > Y[X=0] :|: X == 1 & Y == 1", using = c("priors", "posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label                                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-------------------------------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] given X == 1 & Y == 1 |priors     | 0.498| 0.288|    0.027|     0.971|
#> |Y[X=1] > Y[X=0] given X == 1 & Y == 1 |posteriors | 0.779| 0.126|    0.514|     0.983|
```

This query asks for the probability that $Y(1) > Y(0)$ *given* $X=1$ and $Y=1$; it is a type of "causes of effects" query. Note that ":|:" separates the main query from the conditioning statement, since "|" is reserved for the "or" operator.

Queries can even be conditional on counterfactual quantities. Here is the probability of a positive effect given *some* effect:

``` r
xy_model |>
  query_model("Y[X=1] > Y[X=0] :|: Y[X=1] != Y[X=0]", using = c("priors", "posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |label                                  |using      |  mean|    sd| cred.low| cred.high|
#> |:--------------------------------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] given Y[X=1] != Y[X=0] |priors     | 0.494| 0.291|    0.020|     0.977|
#> |Y[X=1] > Y[X=0] given Y[X=1] != Y[X=0] |posteriors | 0.829| 0.085|    0.673|     0.991|
```

As in the previous example, ":|:" (not ":" or "|") separates the base query from the condition.

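
Because $X$ is unconfounded in the `X -> Y` model, both conditional queries above reduce to ratios of nodal-type shares. The notation here is introduced for illustration only: $\lambda^Y_{ab}$ denotes the share of units with $Y(0)=a$ and $Y(1)=b$ (matching the `Y.ab` parameter names) and $\lambda^X_1$ the share with $X=1$. A sketch of the derivation:

$$
\Pr\big(Y(1) > Y(0) \mid X = 1, Y = 1\big)
= \frac{\lambda^X_1 \, \lambda^Y_{01}}{\lambda^X_1 \left(\lambda^Y_{01} + \lambda^Y_{11}\right)}
= \frac{\lambda^Y_{01}}{\lambda^Y_{01} + \lambda^Y_{11}}
$$

$$
\Pr\big(Y(1) > Y(0) \mid Y(1) \neq Y(0)\big)
= \frac{\lambda^Y_{01}}{\lambda^Y_{01} + \lambda^Y_{10}}
$$

The reported means are averages of these ratios over parameter draws, so they need not equal the ratio of the average shares.
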
## Output

Query output is ready for printing as tables, but can also be plotted, which is especially useful with batch requests:

``` r
batch_queries <- xy_model |>
  query_model(queries = list(ATE = "Y[X=1] - Y[X=0]",
                             `Positive effect given any effect` = "Y[X=1] > Y[X=0] :|: Y[X=1] != Y[X=0]"),
              using = c("priors", "posteriors"),
              expand_grid = TRUE)

batch_queries |> kable(digits = 2, caption = "tabular output")
```

Table: tabular output

|label                            |query           |given            |using      |case_level |  mean|   sd| cred.low| cred.high|
|:--------------------------------|:---------------|:----------------|:----------|:----------|-----:|----:|--------:|---------:|
|ATE                              |Y[X=1] - Y[X=0] |-                |priors     |FALSE      | -0.01| 0.32|    -0.64|      0.62|
|ATE                              |Y[X=1] - Y[X=0] |-                |posteriors |FALSE      |  0.43| 0.09|     0.24|      0.59|
|Positive effect given any effect |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |priors     |FALSE      |  0.50| 0.29|     0.03|      0.98|
|Positive effect given any effect |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |posteriors |FALSE      |  0.83| 0.09|     0.67|      0.99|

``` r
batch_queries |> plot()
```

![plot of chunk unnamed-chunk-14](unnamed-chunk-14-1.png)
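
The `priors` rows of the batch table can be roughly reproduced by hand, which may help in reading the output. The sketch below uses base R only and is not how `query_model` works internally. It assumes the flat Dirichlet prior over the nodal types of `Y` from `parameters_df`, and it relies on two identities that hold for the `X -> Y` model: the ATE is the share of `Y.01` types minus the share of `Y.10` types, and the probability of a positive effect given any effect is the `Y.01` share divided by the combined `Y.01` and `Y.10` shares. The object names are illustrative.

``` r
# Base-R sketch of the two batch queries under the prior (illustrative only).
# Assumption: Dirichlet(1, 1, 1, 1) prior over the four nodal types of Y.
set.seed(2)
n_draws <- 4000

# Dirichlet draws via normalised Gamma(shape = 1) variates
g <- matrix(rgamma(n_draws * 4, shape = 1), ncol = 4,
            dimnames = list(NULL, c("Y.00", "Y.10", "Y.01", "Y.11")))
lambda_Y <- g / rowSums(g)

# ATE: share with a positive effect minus share with a negative effect
ate <- lambda_Y[, "Y.01"] - lambda_Y[, "Y.10"]

# Positive effect given any effect: Y.01 share among the Y.01 and Y.10 types
pos_given_any <- lambda_Y[, "Y.01"] / (lambda_Y[, "Y.01"] + lambda_Y[, "Y.10"])

# Collect means and standard deviations in a small table
hand_table <- data.frame(
  label = c("ATE", "Positive effect given any effect"),
  mean  = c(mean(ate), mean(pos_given_any)),
  sd    = c(sd(ate), sd(pos_given_any))
)
hand_table
```

Up to simulation error, the means and standard deviations should be close to the `priors` rows of the table above.
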