--- title: "Predict" author: "Roland Krasser" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Predict} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The explore package offers a simplified way to use machine learning and make a prediction. * `explain_tree()` creates a decision tree * `explain_forest()` creates a random forest * `explain_xgboost()` creates a xgboost model * `explain_logreg()` creates a logistic regression * `predict_target()` uses a model to make a prediction We use synthetic data in this example ```{r message=FALSE, warning=FALSE} library(dplyr) library(explore) train <- create_data_buy(obs = 1000, seed = 1) glimpse(train) ``` ### Train model First we create a decision tree model, using `buy` as target (`buy` contains only 0 and 1 values) ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} train %>% explain_tree(target = buy) ``` We see some clear patterns. Now we create a random forest model (as it is more accurate). To get the model itself, use parameter `out = "model"` ```{r fig.height=4, fig.width=6, message=FALSE, warning=FALSE} model <- train %>% explain_forest(target = buy, out = "model") ``` ### Predict Now we create test data and use the model for a prediction. We use a different seed so we get different data. ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} test <- create_data_buy(obs = 1000, seed = 2) glimpse(test) ``` ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} test <- test %>% predict_target(model = model) glimpse(test) ``` Now we got 2 new variables `prediction_0` (the probability of `buy == 0`) and `prediction_1` (the probability of `buy == 1`). We can check the predictions by comparing `prediction_1` with real values of buy. ```{r message=FALSE, warning=FALSE, fig.width=6, fig.height=4} test %>% explore(prediction_1, target = buy) ``` There is a clear difference between `buy == 0` and `buy == 1`. So the prediction works.