--- title: "Explain" author: "Roland Krasser" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Explain} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The explore package offers a simplified way to use machine learning to understand and explain patterns in the data. * `explain_tree()` creates a decision tree. The target can be binary, categorical or numerical * `explain_forest()` creates a random forest. The target can be binary, categorical or numerical * `explain_xgboost()` creates a random forest. The target must be binary (0/1, FALSE/TRUE) * `explain_logreg()` creates a logistic regression. The target must be binary * `balance_target()` to balance a target * `weight_target()` to create weights for the decision tree We use synthetic data in this example ```{r message=FALSE, warning=FALSE} library(dplyr) library(explore) data <- create_data_buy(obs = 1000) glimpse(data) ``` ### Explain / Model #### Decision Tree ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% explain_tree(target = buy) ``` ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% explain_tree(target = mobiledata_prd) ``` ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% explain_tree(target = age) ``` #### Random Forest ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% explain_forest(target = buy, ntree = 100) ``` To get the model itself as output you can use the parameter `out = "model` or `out = all` to get all (feature importance as plot and table, trained model). To use the model for a prediction, you can use `predict_target()` #### XGBoost As XGBoost only accepts numeric variables, we use `drop_var_not_numeric()` to drop `mobile_data_prd` as it is not a numeric variable. An alternative would be to convert the non numeric variables into numeric. ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% drop_var_not_numeric() |> explain_xgboost(target = buy) ``` Use parameter `out = "all"` to get more details about the training ```{r message=FALSE, warning=FALSE} train <- data %>% drop_var_not_numeric() |> explain_xgboost(target = buy, out = "all") ``` ```{r message=FALSE, warning=FALSE} train$importance ``` ```{r message=FALSE, warning=FALSE} train$tune_plot ``` ```{r message=FALSE, warning=FALSE} train$tune_data ``` To use the model for a prediction, you can use `predict_target()` #### Logistic Regression ```{r message=FALSE, warning=FALSE} data %>% explain_logreg(target = buy) ``` ### Balance Target If you have a data set with a very unbalanced target (in this case only 5% of all observations have `buy == 1`) it may be difficult to create a decision tree. ```{r message=FALSE, warning=FALSE} data <- create_data_buy(obs = 2000, target1_prob = 0.05) data %>% describe(buy) ``` It may help to balance the target before growing the decision tree (or use weighs as alternative). In this example we down sample the data so buy has 10% of `target == 1`. ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} data %>% balance_target(target = buy, min_prop = 0.10) %>% explain_tree(target = buy) ```