The ODT R package implements the Optimal Decision Tree (ODT) algorithm (1), a novel approach designed for the field of personalized medicine. This algorithm employs tree-based methods to recommend the most suitable treatment for each patient by considering their unique genomic and mutational data.
Optimal Decision Trees iteratively refine drug recommendations along each branch until a predefined group size is achieved, ensuring that treatment suggestions are both personalized and statistically robust. This approach enhances decision-making in therapeutic contexts, allowing healthcare professionals to tailor interventions based on individual patient profiles.
The ODT package can be easily installed from the Comprehensive R Archive Network (CRAN) repository. To install the package, you can use the following command in your R console:
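Assuming the package is published on CRAN under the name ODT, as stated above, the standard installation call would be:

```r
# Install ODT from CRAN (needs to be run only once per R library)
install.packages("ODT")
```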
Unlike other personalized medicine algorithms that use classification or regression trees, ODT works by solving optimization problems. It takes into account how each patient responds to different drugs (sensitivity data) and their genomic or mutational information.
The algorithm selects a splitting variable, which could be a gene or a type of mutation, depending on the data being studied. For each split, ODT determines the best treatments and optimizes the measure of sensitivity for both branches based on these treatments (for example, using IC50 data). In other words, the algorithm assigns the best treatment to each patient by optimizing sensitivity data while creating an optimal decision tree.
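To make the idea of a single optimized split concrete, here is a minimal sketch on toy data. It is not the package's implementation: the function find_best_split, the matrices, and the choice to minimize summed IC50 (assuming lower IC50 means higher sensitivity) are all illustrative assumptions.

```r
# Toy inputs (hypothetical): 6 patients, 3 binary biomarkers, 3 drugs.
# Assumption: lower IC50 means higher sensitivity, so the sketch minimizes IC50.
patients <- paste0("P", 1:6)
biomarkers <- matrix(
  c(1, 0, 0,
    1, 0, 1,
    1, 1, 0,
    0, 1, 0,
    0, 0, 1,
    0, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(patients, c("geneA", "geneB", "geneC"))
)
ic50 <- matrix(
  c(0.1, 0.9, 0.8,
    0.2, 0.8, 0.9,
    0.1, 0.7, 0.9,
    0.9, 0.2, 0.8,
    0.8, 0.1, 0.7,
    0.9, 0.2, 0.9),
  nrow = 6, byrow = TRUE,
  dimnames = list(patients, c("drugX", "drugY", "drugZ"))
)

# Hypothetical helper: try every biomarker, give each branch its best single
# drug, and keep the biomarker with the lowest total IC50 over both branches.
find_best_split <- function(biomarkers, ic50, minbucket = 1) {
  best <- NULL
  for (g in colnames(biomarkers)) {
    with_mut <- biomarkers[, g] == 1
    # Skip splits that would leave a branch with too few patients
    if (sum(with_mut) < minbucket || sum(!with_mut) < minbucket) next
    cost_with <- colSums(ic50[with_mut, , drop = FALSE])
    cost_without <- colSums(ic50[!with_mut, , drop = FALSE])
    total <- min(cost_with) + min(cost_without)
    if (is.null(best) || total < best$total) {
      best <- list(
        biomarker = g,
        drug_with = names(which.min(cost_with)),
        drug_without = names(which.min(cost_without)),
        total = total
      )
    }
  }
  best
}

root_split <- find_best_split(biomarkers, ic50)
root_split$biomarker     # "geneA"
root_split$drug_with     # "drugX" for patients carrying the mutation
root_split$drug_without  # "drugY" for patients without it
```

In this toy data, splitting on geneA separates the patients who respond to drugX from those who respond to drugY, so it yields the lowest combined IC50.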
The package consists of three main functions:
- trainTree(): This function allows users to train the decision tree using the patients’ genomic or mutational data (biomarker matrix) and the drug responses (sensitivity matrix).
- predictTree(): After training the tree, this function predicts the optimal treatment for each patient based on their expression and/or mutational data.
- niceTree(): This function generates a graphical representation of the decision tree splits. Users can also save this plot in various formats to a specified directory.
Figure 1. ODT Model Workflow.
As shown in Figure 1, the ODT model operates using two key inputs: the sensitivity matrix and the biomarker matrix. Initially, the model takes the biomarker data - which may consist of a binary matrix indicating the presence or absence of mutations, or a matrix reflecting gene expression levels - to train the decision tree.
At each step, the trained tree splits patients into two groups based
on the presence or absence of specific biomarkers. This split is
optimized to ensure that the assigned treatment has the highest
sensitivity for each group. The algorithm continues to recursively
divide the branches until a predefined minimum group size is reached, at
which point further splits are no longer possible.
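The recursive growth and the minimum group size stopping rule can be sketched as follows. This is a simplified illustration on toy data, not the code used by trainTree: the helpers best_drug and grow, the data, and the rule of minimizing summed IC50 are assumptions made for the example.

```r
# Toy inputs (hypothetical): 8 patients, 2 binary biomarkers, 3 drugs.
patients <- paste0("P", 1:8)
biomarkers <- matrix(
  c(1, 1,  1, 1,  1, 0,  1, 0,
    0, 1,  0, 1,  0, 0,  0, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(patients, c("geneA", "geneB"))
)
ic50 <- matrix(
  c(0.1, 0.9, 0.9,
    0.1, 0.9, 0.9,
    0.2, 0.9, 0.1,
    0.2, 0.9, 0.1,
    0.9, 0.1, 0.9,
    0.9, 0.1, 0.9,
    0.9, 0.3, 0.1,
    0.9, 0.3, 0.1),
  ncol = 3, byrow = TRUE,
  dimnames = list(patients, c("drugX", "drugY", "drugZ"))
)

# Best single drug for a group of patients (assumes lower IC50 is better)
best_drug <- function(ic50, rows) {
  sums <- colSums(ic50[rows, , drop = FALSE])
  list(drug = names(which.min(sums)), cost = min(sums))
}

# Recursively split `rows`; a split is admissible only if both children
# contain at least `minbucket` patients.
grow <- function(biomarkers, ic50, rows, minbucket) {
  stopifnot(minbucket >= 1)
  best <- NULL
  for (g in colnames(biomarkers)) {
    yes <- rows[biomarkers[rows, g] == 1]
    no <- rows[biomarkers[rows, g] == 0]
    if (length(yes) < minbucket || length(no) < minbucket) next
    cost <- best_drug(ic50, yes)$cost + best_drug(ic50, no)$cost
    if (is.null(best) || cost < best$cost) {
      best <- list(g = g, yes = yes, no = no, cost = cost)
    }
  }
  # No admissible split left: this node becomes a leaf with one treatment
  if (is.null(best)) {
    return(list(n = length(rows), drug = best_drug(ic50, rows)$drug))
  }
  list(
    split = best$g,
    mutated = grow(biomarkers, ic50, best$yes, minbucket),
    wild_type = grow(biomarkers, ic50, best$no, minbucket)
  )
}

all_rows <- seq_len(nrow(biomarkers))
tree_small <- grow(biomarkers, ic50, all_rows, minbucket = 2)  # two levels of splits
tree_large <- grow(biomarkers, ic50, all_rows, minbucket = 5)  # root cannot be split
str(tree_small)
```

Raising minbucket gives a shallower tree with larger groups per treatment. Lowering it allows finer personalization, but each recommendation then rests on fewer patients.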
In this example, we will use a binary matrix called mut_small, which contains mutation information, along with a drug response matrix named drug_small for selected patients. We will work with a small dataset that has IC50 values.
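First, we need to train the decision tree using the selected data. We will use the trainTree function, which requires the following inputs:
- PatientData: the biomarker matrix (here mut_small), with patients in rows and genes in columns.
- PatientSensitivity: the drug response matrix (here drug_small), with patients in rows and drugs in columns.
- minbucket: the minimum number of patients allowed in a group; a branch stops splitting once this size is reached.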
The output of the trainTree
function will be a decision
tree that reflects the splits made by the ODT algorithm based
on the provided mutational and sensitivity data, along with the
treatments assigned at each split. To visualize the optimized tree, we
will use the niceTree
function. This function displays the
mutations selected at each node and the treatment assigned to each
branch (both for branches with and without the mutation).
The necessary inputs for the niceTree function are:
- The trained tree obtained from the trainTree function.
Additionally, users can customize several parameters related to the plot’s appearance. For more information regarding plot customization options, please refer to the niceTree function documentation.
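To determine the treatment selected for each specific patient, we will use the predictTree function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:
- tree: the trained tree obtained from the trainTree function.
- PatientSensitivityTrain: the drug response matrix used to train the tree (here drug_small).
- PatientData: the biomarker matrix of the patients who will be assigned a treatment (here mut_small).
The following code snippet demonstrates how to use the predictTree function: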
# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("drug_response_w34")
# Select a subset of the mutation and drug response data
mut_small <- mutations_w34[1:100, 1:50] # Select first 100 patients and 50 genes
drug_small <- drug_response_w34[1:100, 1:15] # Select first 100 patients and 15 drugs
# Train the decision tree using the selected patient data
ODT_MUT <- trainTree(PatientData = mut_small, PatientSensitivity = drug_small, minbucket = 2)
# Visualize the trained decision tree
niceTree(ODT_MUT)
## $Tree
## [1] root
## | [2] NPM1 <= 1
## | | [3] KRAS <= 1
## | | | [4] NRAS <= 1
## | | | | [5] WT1 <= 1
## | | | | | [6] SF3B1 <= 1
## | | | | | | [7] FLT3 <= 1
## | | | | | | | [8] KIAA0907 <= 1: Dasatinib
## | | | | | | | [9] KIAA0907 > 1: Sorafenib
## | | | | | | [10] FLT3 > 1
## | | | | | | | [11] TET2 <= 1: Sorafenib
## | | | | | | | [12] TET2 > 1: Lapatinib
## | | | | | [13] SF3B1 > 1: Ruxolitinib (INCB018424)
## | | | | [14] WT1 > 1: Crenolanib
## | | | [15] NRAS > 1: Nilotinib
## | | [16] KRAS > 1: Pazopanib (GW786034)
## | [17] NPM1 > 1
## | | [18] SRSF2 <= 1
## | | | [19] IDH1 <= 1: Ibrutinib (PCI-32765)
## | | | [20] IDH1 > 1: Crenolanib
## | | [21] SRSF2 > 1: Quizartinib (AC220)
##
## $Plot
# Predict the optimal treatment for each patient
ODT_MUTpred <- predictTree(tree = ODT_MUT, PatientSensitivityTrain = drug_small, PatientData = mut_small)
# Retrieve and display the names of the selected treatments
names_drug <- colnames(drug_small)
selected_treatments <- names_drug[ODT_MUTpred]
selected_treatments[1:3] # Treatment selected for first 3 patients
## [1] "Crenolanib" "Ruxolitinib (INCB018424)"
## [3] "Dasatinib"
Figure 2. Trained Decision Tree Output from the niceTree Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on mutational data and the corresponding treatments assigned at each node.
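The example above turns the output of predictTree into drug names by indexing the column names of the sensitivity matrix, which suggests the predictions are column indices. A minimal sketch of that lookup, using a made-up index vector rather than real package output:

```r
# Hypothetical stand-ins: three drug columns and four predicted column indices
drug_names <- c("Dasatinib", "Sorafenib", "Crenolanib")
predicted_idx <- c(3L, 1L, 1L, 2L)

# Index lookup: one drug name per patient
selected <- drug_names[predicted_idx]
selected

# Count how many patients were assigned to each treatment
counts <- table(selected)
counts
```

Tabulating the assignments this way gives a quick view of how patients are distributed across the recommended treatments.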
In this example, we will use a matrix called gene_small, which contains gene expression information, along with a drug response matrix named drug_small for selected patients.
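First, we will train the decision tree using the selected data with the trainTree function. The required inputs for this function are:
- PatientData: the biomarker matrix (here gene_small), with patients in rows and genes in columns.
- PatientSensitivity: the drug response matrix (here drug_small), with patients in rows and drugs in columns.
- minbucket: the minimum number of patients allowed in a group; a branch stops splitting once this size is reached.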
The output of the trainTree
function will be a decision
tree that reflects the splits made by the ODT algorithm based
on the provided genomic and sensitivity data, along with the treatments
assigned at each split. To visualize the optimized tree, we will use the
niceTree
function. This function displays the biomarker
selected at each node and the treatment assigned to each branch.
The necessary inputs for the niceTree function are:
- The trained tree obtained from the trainTree function.
Additionally, users can customize several parameters related to the plot’s appearance. For more information regarding plot customization options, please refer to the niceTree function documentation.
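To determine the treatment selected for each specific patient, we will use the predictTree function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:
- tree: the trained tree obtained from the trainTree function.
- PatientSensitivityTrain: the drug response matrix used to train the tree (here drug_small).
- PatientData: the biomarker matrix of the patients who will be assigned a treatment (here gene_small).
The following code snippet demonstrates how to use the predictTree function: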
# Load the necessary library and datasets
library(ODT)
# Load the gene expression and drug response data
data("expression_w34")
data("drug_response_w34")
# Select a subset of the gene expression and drug response data
gene_small <- expression_w34[1:3, 1:3]
drug_small <- drug_response_w34[1:3, 1:3]
# Train the decision tree using the selected patient data
ODT_EXP <- trainTree(PatientData = gene_small, PatientSensitivity = drug_small, minbucket = 1)
# Visualize the trained decision tree
niceTree(ODT_EXP)
## $Tree
## [1] root
## | [2] TSPAN6 <= -0.86591: Crizotinib (PF-2341066)
## | [3] TSPAN6 > -0.86591: Axitinib (AG-013736)
##
## $Plot
# Predict the optimal treatment for each patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_small, PatientData = gene_small)
# Retrieve and display the names of the selected treatments
selected_treatments <- colnames(drug_small)[ODT_EXPpred]
selected_treatments
## [1] "Crizotinib (PF-2341066)" "Axitinib (AG-013736)"
## [3] "Axitinib (AG-013736)"
Figure 3. Trained Decision Tree Output from the niceTree Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on expression data and the corresponding treatments assigned at each node.
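With expression data, each split is a threshold on a continuous value rather than the presence or absence of a mutation. The sketch below applies the rule printed in the tree above to made-up expression values. The patient values are hypothetical, and the threshold is copied as printed, so it may be rounded.

```r
# Threshold and treatments copied from the printed tree above
threshold <- -0.86591
drug_low <- "Crizotinib (PF-2341066)"   # branch TSPAN6 <= threshold
drug_high <- "Axitinib (AG-013736)"     # branch TSPAN6 > threshold

# Hypothetical TSPAN6 expression values for three patients
tspan6 <- c(P1 = -1.20, P2 = 0.35, P3 = -0.86591)

# A value exactly equal to the threshold follows the "<=" branch
assigned <- ifelse(tspan6 <= threshold, drug_low, drug_high)
assigned
```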
In this example, we will use a binary matrix containing mutation information along with a drug response matrix from existing patients. We will train a model to later predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.
# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("mutations_w12")
data("drug_response_w12")
data("drug_response_w34")
# Define a binary matrix for new patients (using the first patient as an example)
mut_newpatients <- mutations_w34[1, , drop = FALSE]
# Train the decision tree model using known patient data
ODT_MUT <- trainTree(PatientData = mutations_w12, PatientSensitivity = drug_response_w12, minbucket = 10)
# Visualize the trained decision tree
niceTree(ODT_MUT, folder = NULL)
## $Tree
## [1] root
## | [2] NRAS <= 1
## | | [3] KRAS <= 1
## | | | [4] BCOR <= 1
## | | | | [5] PTPN11 <= 1
## | | | | | [6] TP53 <= 1
## | | | | | | [7] CBFB-MYH11 <= 1
## | | | | | | | [8] CEBPA <= 1: Quizartinib (AC220)
## | | | | | | | [9] CEBPA > 1: AZD1480
## | | | | | | [10] CBFB-MYH11 > 1: JNJ-28312141
## | | | | | [11] TP53 > 1: XAV-939
## | | | | [12] PTPN11 > 1: Panobinostat
## | | | [13] BCOR > 1: RAF265 (CHIR-265)
## | | [14] KRAS > 1: Selumetinib (AZD6244)
## | [15] NRAS > 1: Trametinib (GSK1120212)
##
## $Plot
# Predict the optimal treatment for the new patient
ODT_MUTpred <- predictTree(tree = ODT_MUT, PatientSensitivityTrain = drug_response_w12, PatientData = mut_newpatients)
# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_MUTpred]
selected_treatment
## [1] "Quizartinib (AC220)"
Figure 4. Trained Decision Tree for New Patients Using Mutational Data: This figure illustrates the output of the niceTree function, showcasing the decision tree trained on existing patient data. It highlights the splits based on mutation information and the treatment recommendations for new patients.
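The new-patient example extracts a single row with drop = FALSE. The sketch below, on a toy matrix, shows why: without it, R returns a plain vector instead of a one-row matrix. The column realignment step is a defensive assumption, since this text does not state whether predictTree matches biomarkers by name or by position.

```r
# Toy training matrix (hypothetical): 2 patients, 3 binary biomarkers
train_biomarkers <- matrix(
  c(0, 1, 1, 0, 0, 1),
  nrow = 2,
  dimnames = list(c("P1", "P2"), c("NRAS", "KRAS", "TP53"))
)

# Selecting one row drops the matrix structure by default...
new_vec <- train_biomarkers[1, ]
is.matrix(new_vec)   # FALSE

# ...while drop = FALSE keeps a 1 x 3 matrix with row and column names
new_mat <- train_biomarkers[1, , drop = FALSE]
dim(new_mat)         # 1 3

# Defensive step: if a new patient's columns arrive in a different order,
# realign them to the training columns before predicting
shuffled <- new_mat[, c("TP53", "NRAS", "KRAS"), drop = FALSE]
stopifnot(setequal(colnames(shuffled), colnames(train_biomarkers)))
aligned <- shuffled[, colnames(train_biomarkers), drop = FALSE]
```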
In this example, we will use a matrix containing gene expression information along with a drug response matrix from existing patients. We will train a model to predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.
# Load the necessary library and datasets
library(ODT)
# Load gene expression and drug response data
data("expression_w34")
data("expression_w12")
data("drug_response_w12")
data("drug_response_w34")
# Define a matrix for new patients (using the first patient as an example)
exp_newpatients <- expression_w34[1, , drop = FALSE]
# Train the decision tree model using known patient data
ODT_EXP <- trainTree(PatientData = expression_w12, PatientSensitivity = drug_response_w12, minbucket = 10)
# Visualize the trained decision tree
niceTree(ODT_EXP, folder = NULL)
## $Tree
## [1] root
## | [2] VCAN <= 6.53
## | | [3] VAMP3 <= 7.55
## | | | [4] LUC7L <= 5.44: ABT-737
## | | | [5] LUC7L > 5.44: Venetoclax
## | | [6] VAMP3 > 7.55: CHIR-99021
## | [7] VCAN > 6.53
## | | [8] TEAD3 <= -1.41: Trametinib (GSK1120212)
## | | [9] TEAD3 > -1.41
## | | | [10] LRP6 <= -1.74: JNJ-28312141
## | | | [11] LRP6 > -1.74: Panobinostat
##
## $Plot
# Predict the optimal treatment for the new patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_response_w12, PatientData = exp_newpatients)
# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_EXPpred]
selected_treatment
## [1] "Panobinostat"
Figure 5. Trained Decision Tree for New Patients Using Genomic Expression Data: This figure illustrates the output of the niceTree function, showcasing the decision tree trained on existing patient data. It highlights the splits based on gene expression information and the treatment recommendations for new patients.
More information can be found at:
## R version 4.4.1 Patched (2024-09-30 r87211)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Madrid
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ODT_1.0.0 data.tree_1.1.0 partykit_1.2-22 mvtnorm_1.3-1
## [5] libcoin_1.0-10 matrixStats_1.4.1 rmarkdown_2.28 knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-0 jsonlite_1.8.9 dplyr_1.1.4 compiler_4.4.1
## [5] tidyselect_1.2.1 rpart_4.1.23 Rcpp_1.0.13 stringr_1.5.1
## [9] rsvg_2.6.1 magick_2.8.5 DiagrammeR_1.0.11 jquerylib_0.1.4
## [13] splines_4.4.1 yaml_2.3.10 fastmap_1.2.0 lattice_0.22-6
## [17] R6_2.5.1 generics_0.1.3 Formula_1.2-5 htmlwidgets_1.6.4
## [21] visNetwork_2.1.2 tibble_3.2.1 inum_1.0-5 pillar_1.9.0
## [25] bslib_0.8.0 RColorBrewer_1.1-3 rlang_1.1.4 utf8_1.2.4
## [29] cachem_1.1.0 stringi_1.8.4 xfun_0.48 sass_0.4.9
## [33] cli_3.6.3 withr_3.0.1 magrittr_2.0.3 digest_0.6.37
## [37] rstudioapi_0.16.0 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.0
## [41] glue_1.8.0 survival_3.7-0 fansi_1.0.6 purrr_1.0.2
## [45] pkgconfig_2.0.3 tools_4.4.1 htmltools_0.5.8.1