---
title: "ODT"
author: "Maddi Eceiza, Lucia Ruiz, Angel Rubio, Katyna Sada Del Real"
date: "September 2024"
vignette: >
%\VignetteIndexEntry{ODT}
%\VignettePackage{ODT}
%\usepackage[UTF-8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
output:
rmarkdown: html_vignette
email:
- ksada@unav.es
---
```{r LoadFunctions, echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
library(knitr)
library(rmarkdown)
opts_chunk$set(error = FALSE)
```
---
## Abstract
The *ODT* R package implements the Optimal Decision Tree (ODT) algorithm [(1)](#references), a novel approach designed for the field of personalized medicine. This algorithm employs tree-based methods to recommend the most suitable treatment for each patient by considering their unique genomic and mutational data.
Optimal Decision Trees iteratively refine drug recommendations along each branch until a predefined group size is achieved, ensuring that treatment suggestions are both personalized and statistically robust. This approach enhances decision-making in therapeutic contexts, allowing healthcare professionals to tailor interventions based on individual patient profiles.
# Installation
The *ODT* package can be easily installed from the Comprehensive R Archive Network (CRAN) repository. To install the package, you can use the following command in your R console:
```{r, eval=FALSE}
install.packages("ODT")
```
# Introduction
Unlike other personalized medicine algorithms that use classification or regression trees, *ODT* works by solving optimization problems. It takes into account how each patient responds to different drugs (sensitivity data) and their genomic or mutational information.
The algorithm selects a splitting variable, which could be a gene or a type of mutation, depending on the data being studied. For each split, *ODT* determines the best treatments and optimizes the measure of sensitivity for both branches based on these treatments (for example, using IC50 data). In other words, the algorithm assigns the best treatment to each patient by optimizing sensitivity data while creating an optimal decision tree.
The package consists of three main functions:
- **Model Training** (**`trainTree`**): This function allows users to train the decision tree using the patients' genomic or mutational data (biomarker matrix) and the drug responses (sensitivity matrix).
- **Optimal Treatment Assignment** (**`predictTree`**): After training the tree, this function predicts the optimal treatment for each patient based on their expression and/or mutational data.
- **Decision Tree Visualization** (**`niceTree`**): This function generates a graphical representation of the decision tree splits. Users can also download this plot in various formats to a specified directory.
Figure 1. ODT Model Workflow.
As shown in Figure 1, the *ODT* model operates using two key inputs: the sensitivity matrix and the biomarker matrix. Initially, the model takes the biomarker data - which may consist of a binary matrix indicating the presence or absence of mutations, or a matrix reflecting gene expression levels - to train the decision tree.
At each step, the trained tree splits patients into two groups based on the presence or absence of specific biomarkers. This split is optimized to ensure that the assigned treatment has the highest sensitivity for each group. The algorithm continues to recursively divide the branches until a predefined minimum group size is reached, at which point further splits are no longer possible.\
---
# Example Using Mutational Data
In this example, we will use a binary matrix called `mut_small`, which contains mutation information, along with a drug response matrix named `drug_small` for selected patients. We will work with a small dataset that has IC50 values.
First, we need to train the decision tree using the selected data. We will use the `trainTree` function, which requires the following inputs:
- **PatientData**: The binary matrix containing mutation information, where rows correspond to patients/samples and columns correspond to genes/features.
- **PatientSensitivity**: The matrix that provides drug response information, where rows correspond to patients/samples and columns correspond to drugs.
- **minbucket**: A fixed parameter that specifies the minimum number of patients required in a branch of the tree to allow a split.
```{r, eval=FALSE}
ODT_MUT <- trainTree(PatientResponse = mut_small, PatientSensitivity = drug_small, minbucket = 1)
```
---
The output of the `trainTree` function will be a decision tree that reflects the splits made by the *ODT* algorithm based on the provided mutational and sensitivity data, along with the treatments assigned at each split. To visualize the optimized tree, we will use the `niceTree` function. This function displays the mutations selected at each node and the treatment assigned to each branch (both for branches with and without the mutation).
The necessary inputs for the `niceTree` function are:
- **tree**: The trained decision tree obtained from the `trainTree` function.
- **folder**: The directory where the output image will be saved.
Additionally, users can customize several fixed parameters related to the plot's appearance:
- **colors**
- **fontname**
- **fontstyle**
- **shape**
- **output_format**
For more information regarding plot customization options, please refer to the `niceTree` function documentation.
```{r, eval=FALSE}
niceTree(tree = ODT_MUT, folder = NULL)
```
---
To determine the treatment selected for each specific patient, we will use the `predictTree` function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:
- **tree**: The trained decision tree obtained from the `trainTree` function.
- **PatientData**: The binary matrix containing mutation information, where rows correspond to patients/samples and columns correspond to genes/features.
- **PatientSensitivityTrain**: A matrix containing the drug response values of the **training dataset**. In this matrix, rows correspond to patients, and columns correspond to drugs. It is only for extracting treatment names and is not used in the prediction process itself.
The following code snippet demonstrates how to use the `predictTree` function:
```{r eval=TRUE, message=FALSE}
# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("drug_response_w34")
# Select a subset of the mutation and drug response data
mut_small <- mutations_w34[1:100, 1:50] # Select first 100 patients and 50 genes
drug_small <- drug_response_w34[1:100, 1:15] # Select first 100 patients and 15 drugs
# Train the decision tree using the selected patient data
ODT_MUT <- trainTree(PatientData = mut_small, PatientSensitivity = drug_small, minbucket = 2)
# Visualize the trained decision tree
niceTree(ODT_MUT)
# Predict the optimal treatment for each patient
ODT_MUTpred <- predictTree(tree = ODT_MUT, PatientSensitivityTrain = drug_small, PatientData = mut_small)
# Retrieve and display the names of the selected treatments
names_drug <- colnames(drug_small)
selected_treatments <- names_drug[ODT_MUTpred]
selected_treatments[1:3] # Treatment selected for first 3 patients
```
Figure 2. Trained Decision Tree Output from the *niceTree* Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on mutational data and the corresponding treatments assigned at each node.
---
# Example Using Gene Expression Data
In this example, we will use a matrix called `gene_small`, which contains gene expression information, along with a drug response matrix named `drug_small` for selected patients.
First, we will train the decision tree using the selected data with the `trainTree` function. The required inputs for this function are:
- **PatientData**: The numeric matrix containing gene expression information, where rows correspond to patients/samples and columns correspond to genes/features.
- **PatientSensitivity**: The matrix that provides drug response information, where rows correspond to patients/samples and columns correspond to drugs.
- **minbucket**: A fixed parameter that specifies the minimum number of patients required in a branch of the tree to allow a split.
```{r, eval=FALSE}
ODT_EXP <- trainTree(PatientData = gene_small, PatientSensitivity = drug_small, minbucket = 1)
```
---
The output of the `trainTree` function will be a decision tree that reflects the splits made by the *ODT* algorithm based on the provided genomic and sensitivity data, along with the treatments assigned at each split. To visualize the optimized tree, we will use the `niceTree` function. This function displays the biomarker selected at each node and the treatment assigned to each branch.
The necessary inputs for the `niceTree` function are:
- **tree**: The trained decision tree obtained from the `trainTree` function.
- **folder**: The directory where the output image will be saved.
Additionally, users can customize several fixed parameters related to the plot's appearance:
- **colors**
- **fontname**
- **fontstyle**
- **shape**
- **output_format**
For more information regarding plot customization options, please refer to the `niceTree` function documentation.
```{r, eval=FALSE}
niceTree(tree = ODT_EXP, folder = NULL)
```
---
To determine the treatment selected for each specific patient, we will use the `predictTree` function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:
- **tree**: The trained decision tree obtained from the `trainTree` function.
- **PatientData**: The numeric matrix containing gene expression information, where rows correspond to patients/samples and columns correspond to genes/features.
- **PatientSensitivityTrain**: A matrix containing the drug response values of the **training dataset**. In this matrix, rows correspond to patients, and columns correspond to drugs. It is only for extracting treatment names and is not used in the prediction process itself.
The following code snippet demonstrates how to use the `predictTree` function:
```{r, eval=TRUE}
# Load the necessary library and datasets
library(ODT)
# Load the gene expression and drug response data
data("expression_w34")
data("drug_response_w34")
# Select a subset of the gene expression and drug response data
gene_small <- expression_w34[1:3, 1:3]
drug_small <- drug_response_w34[1:3, 1:3]
# Train the decision tree using the selected patient data
ODT_EXP <- trainTree(PatientData = gene_small, PatientSensitivity = drug_small, minbucket = 1)
# Visualize the trained decision tree
niceTree(ODT_EXP)
# Predict the optimal treatment for each patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_small, PatientData = gene_small)
# Retrieve and display the names of the selected treatments
selected_treatments <- colnames(drug_small)[ODT_EXPpred]
selected_treatments
```
Figure 3. Trained Decision Tree Output from the *niceTree* Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on expression data and the corresponding treatments assigned at each node.
---
# Example: Assigning Optimal Treatment to New Patients (Mutational Data)
In this example, we will use a binary matrix containing mutation information along with a drug response matrix from existing patients. We will train a model to later predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.
```{r, eval=TRUE}
# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("mutations_w12")
data("drug_response_w12")
data("drug_response_w34")
# Define a binary matrix for new patients (using the first patient as an example)
mut_newpatients<-mutations_w34[1, ,drop=FALSE]
# Train the decision tree model using known patient data
ODT_MUT<-trainTree(PatientData = mutations_w12, PatientSensitivity=drug_response_w12, minbucket =10)
# Visualize the trained decision tree
niceTree(ODT_MUT,folder=NULL)
# Predict the optimal treatment for the new patient
ODT_MUTpred<-predictTree(tree=ODT_MUT, PatientSensitivityTrain=drug_response_w12, PatientData=mut_newpatients)
# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_MUTpred]
selected_treatment
```
Figure 4. Trained Decision Tree for New Patients Using Mutational Data: This figure illustrates the output of the *niceTree* function, showcasing the decision tree trained on existing patient data. It highlights the splits based on mutation information and the treatment recommendations for new patients.
---
# Example: Assigning Optimal Treatment to New Patients (Gene Expression Data)
In this example, we will use a matrix containing gene expression information along with a drug response matrix from existing patients. We will train a model to predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.
```{r, eval=TRUE}
# Load the necessary library and datasets
library(ODT)
# Load gene expression and drug response data
data("expression_w34")
data("expression_w12")
data("drug_response_w12")
data("drug_response_w34")
# Define a matrix for new patients (using the first patient as an example)
exp_newpatients <- expression_w34[1, , drop = FALSE]
# Train the decision tree model using known patient data
ODT_EXP <- trainTree(PatientData = expression_w12, PatientSensitivity = drug_response_w12, minbucket = 10)
# Visualize the trained decision tree
niceTree(ODT_EXP, folder = NULL)
# Predict the optimal treatment for the new patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_response_w12, PatientData = exp_newpatients)
# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_EXPpred]
selected_treatment
```
Figure 5. Trained Decision Tree for New Patients Using Genomic Expression Data: This figure illustrates the output of the *niceTree* function, showcasing the decision tree trained on existing patient data. It highlights the splits based on gene expression information and the treatment recommendations for new patients.
# References
More information can be found at:
1. Gimeno, M., Sada del Real, K., & Rubio, A. (2023). Precision oncology: a review to assess interpretability in several explainable methods. *Briefings in Bioinformatics*, 24(4), bbad200. https://doi.org/10.1093/bib/bbad200
# Session Information
```{r}
sessionInfo()
```