---
title: "Main Steps"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Main Steps}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, eval=FALSE}
library(PheCAP)
```
Load data into R. The last argument of `PhecapData`, 0.4, refers to the percentage of labels reserved as test set.
```{r, eval=FALSE}
data(ehr_data)
data <- PhecapData(ehr_data, "healthcare_utilization", "label", 0.4)
data
```
Specify the surrogate used for surrogate-assisted feature extraction (SAFE). The typical way is to specify a main ICD code, a main NLP CUI, as well as their combination. In some cases one may want to define surrogate through lab test. The default lower_cutoff is 1, and the default upper_cutoff is 10. Feel free to change the cutoffs based on domain knowledge.
```{r, eval=FALSE}
surrogates <- list(
PhecapSurrogate(
variable_names = "main_ICD",
lower_cutoff = 1, upper_cutoff = 10),
PhecapSurrogate(
variable_names = "main_NLP",
lower_cutoff = 1, upper_cutoff = 10),
PhecapSurrogate(
variable_names = c("main_ICD", "main_NLP"),
lower_cutoff = 1, upper_cutoff = 10))
```
Run surrogate-assisted feature extraction (SAFE) and show result.
```{r, eval=FALSE}
feature_selected <- phecap_run_feature_extraction(data, surrogates)
feature_selected
```
Train phenotyping model and show the fitted model, with the AUC on the training set as well as random splits.
```{r, eval=FALSE}
model <- phecap_train_phenotyping_model(data, surrogates, feature_selected)
model
```
Validate phenotyping model using validation label, and show the AUC and ROC.
```{r, eval=FALSE}
validation <- phecap_validate_phenotyping_model(data, model)
validation
phecap_plot_roc_curves(validation)
```
Apply the model to all the patients to obtain predicted phenotype.
```{r, eval=FALSE}
phenotype <- phecap_predict_phenotype(data, model)
```