---
title: "policy_eval"
output:
  rmarkdown::html_vignette:
    fig_caption: true
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{policy_eval}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ref.bib
---

```{r lib}
library(polle)
```

This vignette is a guide to `policy_eval()` and some of the associated S3 methods. The purpose of `policy_eval()` is to estimate (evaluate) the value of a user-defined policy or a policy learning algorithm. For details on the methodology, see the associated paper [@nordland2023policy].

We consider a fixed two-stage problem as a general setup, simulate data using `sim_two_stage()`, and create a `policy_data` object using `policy_data()`:

```{r simdata}
d <- sim_two_stage(n = 2e3, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
```

## Evaluating a user-defined policy

User-defined policies are created using `policy_def()`. In this case we define a simple static policy always selecting action `'1'`:

```{r poldef}
p1 <- policy_def(policy_functions = '1',
                 reuse = TRUE,
                 name = "(A=1)")
```

Because we want to apply the same policy function at both stages, we set `reuse = TRUE`.

`policy_eval()` implements three types of policy evaluation: inverse probability weighting estimation, outcome regression estimation, and doubly robust (DR) estimation. As doubly robust estimation is a combination of the two other types, we focus on this approach. For details on the implementation, see Algorithm 1 in [@nordland2023policy].

```{r polevalp1}
(pe1 <- policy_eval(policy_data = pd,
                    policy = p1,
                    type = "dr"))
```

`policy_eval()` returns an object of class `policy_eval`, which prints like a `lava::estimate` object. The policy value estimate and variance are available via `coef()` and `vcov()`:

```{r coef}
coef(pe1)
vcov(pe1)
```

### Working with `policy_eval` objects

The `policy_eval` object behaves like a `lava::estimate` object, which can also be accessed directly using `estimate()`. `estimate` objects make it easy to work with estimates that have an iid decomposition given by the influence curve/function; see the [estimate vignette](https://CRAN.R-project.org/package=lava/vignettes/influencefunction.html). The influence curve is available via `IC()`:

```{r ic}
IC(pe1) |> head()
```

Merging `estimate` objects allows the user to get inference for transformations of the estimates via the delta method. Here we get inference for the average treatment effect, both as a difference and as a ratio:

```{r estimate_merge}
p0 <- policy_def(policy_functions = 0,
                 reuse = TRUE,
                 name = "(A=0)")
pe0 <- policy_eval(policy_data = pd,
                   policy = p0,
                   type = "dr")

(est <- merge(pe0, pe1))
estimate(est, function(x) x[2] - x[1], labels = "ATE-difference")
estimate(est, function(x) x[2] / x[1], labels = "ATE-ratio")
```

### Nuisance models

So far we have relied on the default generalized linear models for the nuisance g-models and Q-models. By default, a single g-model is trained across all stages using the state/Markov type history; see the `policy_data` vignette.
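As a point of reference, these defaults can also be stated explicitly. The sketch below assumes that the default nuisance models correspond to the GLM-based constructors `g_glm()` and `q_glm()`; it is not evaluated here:

```{r default_nuisance, eval = FALSE}
# Sketch: spelling out the (assumed) default nuisance models explicitly.
policy_eval(policy_data = pd,
            policy = p1,
            type = "dr",
            g_models = g_glm(), # assumed default g-model constructor
            q_models = q_glm()) # assumed default Q-model constructor
```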
Use `get_g_functions()` to access the fitted model:

```{r get_nuisance}
(gf <- get_g_functions(pe1))
```

The g-functions can be used as input to a new policy evaluation:

```{r pe_pol}
policy_eval(policy_data = pd,
            g_functions = gf,
            policy = p0,
            type = "dr")
```

or we can get the associated predicted values:

```{r pred_gfun}
predict(gf, new_policy_data = pd) |> head(6)
```

Similarly, we can inspect the Q-functions using `get_q_functions()`:

```{r get_qfun}
get_q_functions(pe1)
```

Note that a model is trained for each stage. Again, we can predict from the Q-models using `predict()`.

Usually, we want to specify the nuisance models ourselves using the `g_models` and `q_models` arguments:

```{r gq_models, cache = TRUE}
pe1 <- policy_eval(pd,
                   policy = p1,
                   g_models = list(
                     g_sl(formula = ~ BB + L_1,
                          SL.library = c("SL.glm", "SL.ranger")),
                     g_sl(formula = ~ BB + L_1 + C_2,
                          SL.library = c("SL.glm", "SL.ranger"))
                   ),
                   g_full_history = TRUE,
                   q_models = list(
                     q_glm(formula = ~ A * (B + C_1)),      # including action interactions
                     q_glm(formula = ~ A * (B + C_1 + C_2)) # including action interactions
                   ),
                   q_full_history = TRUE)
```

Here we train a super learner g-model for each stage using the full available history and a generalized linear model for each Q-model. The `formula` argument is used to construct the model frame passed to the model for training (and prediction). The valid formula terms, which depend on `g_full_history` and `q_full_history`, are available via `get_history_names()`:

```{r history_names}
get_history_names(pd)            # state/Markov history
get_history_names(pd, stage = 1) # full history
get_history_names(pd, stage = 2) # full history
```

Remember that the action variable at the current stage is always named `A`. Some models like `glm` require interactions to be specified via the model frame. Thus, for some models, it is important to include action interaction terms in the Q-model formulas.

## Evaluating a policy learning algorithm

The value of a learned policy is an important performance measure, and `policy_eval()` allows for direct evaluation of a given policy learning algorithm. For details, see Algorithm 4 in [@nordland2023policy]. In `polle`, policy learning algorithms are specified using `policy_learn()`; see the associated vignette. These functions can be evaluated directly in `policy_eval()`:

```{r pe_pl}
policy_eval(pd, policy_learn = policy_learn(type = "ql"))
```

In the above example we evaluate the policy estimated via Q-learning. Alternatively, we can first learn the policy and then pass it to `policy_eval()`:

```{r pe_manual_pol_eval}
p_ql <- policy_learn(type = "ql")(pd, q_models = q_glm())
policy_eval(pd, policy = get_policy(p_ql))
```

## Cross-fitting

A key feature of `policy_eval()` is that it allows for easy cross-fitting of the nuisance models as well as the learned policy. Here we specify two-fold cross-fitting via the `M` argument:

```{r crossfits}
pe_cf <- policy_eval(pd,
                     policy_learn = policy_learn(type = "ql"),
                     M = 2)
```

Specifically, both the nuisance models and the optimal policy are fitted on each training fold. Subsequently, the doubly robust value score is calculated on the validation folds. The `policy_eval` object now consists of a list of `policy_eval` objects associated with each fold:

```{r pe_crossfits}
pe_cf$folds$fold_1 |> head()
pe_cf$cross_fits$fold_1
```

In order to save memory, particularly when cross-fitting, it is possible not to save the nuisance models via the `save_g_functions` and `save_q_functions` arguments.
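As a minimal sketch of this option (the `FALSE` values are an assumption based on the argument names; the chunk is not evaluated here):

```{r no_save_nuisance, eval = FALSE}
# Sketch: cross-fitted evaluation without storing the fitted nuisance models.
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            M = 2,
            save_g_functions = FALSE, # assumed: discard fitted g-models
            save_q_functions = FALSE) # assumed: discard fitted Q-models
```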
## Parallel processing via `future.apply`

It is easy to parallelize the cross-fitting procedure via the `future.apply` package:

```{r demo_future, eval = FALSE}
library(future.apply)
plan("multisession") # local parallel processing

library("progressr") # progress bar
handlers(global = TRUE)

policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            q_models = q_rf(),
            M = 20)

plan("sequential") # resetting to sequential processing
```

# SessionInfo

```{r sessionInfo}
sessionInfo()
```

# References