--- title: "2.D: PANTHER & rbioapi" author: "Moosa Rezwani" description: > Connect to PANTHER in R with rbioapi package. date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true anchor_sections: true number_sections: true self_contained: true dev: png encoding: 'UTF-8' vignette: > %\VignetteIndexEntry{2.D: PANTHER & rbioapi} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r message=FALSE, include=FALSE, results="hide", setup, echo=FALSE} knitr::opts_chunk$set( echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE, collapse = TRUE, tidy = FALSE, cache = FALSE, dev = "png", comment = "#>" ) library(rbioapi) rba_options(timeout = 30, skip_error = TRUE) ``` # Introduction {#introduction} Directly quoting the the paper published by [PANTHER](https://www.pantherdb.org "Protein Analysis THrough Evolutionary Relationships (PANTHER)") (Protein Analysis THrough Evolutionary Relationships) authors: > The PANTHER classification system () a comprehensive system that combines genomes, gene function , pathways and statistical analysis tools to enable to analyze large-scale genome-wide experimental data. The system (PANTHER v.14.0) covers 131 complete genomes organized gene families and subfamilies; evolutionary relationships between are represented in phylogenetic trees, multiple sequence and statistical models (hidden Markov models (HMMs)). The families and subfamilies are annotated with Gene Ontology (GO) terms, sequences are assigned to PANTHER pathways. A suite of tools has built to allow users to browse and query gene functions and analyze-scale experimental data with a number of statistical tests. is widely used by bench scientists, bioinformaticians, computer and systems biologists. > > (source: Mi, Huaiyu, et al. "Protocol Update for large-scale genome and gene function analysis with the PANTHER classification system (v. 14.0)." *Nature protocols* 14.3 (2019): 703-721) The available tools in PANTHER's **RESTful API services** can be divided into 3 broad categories: Mapping genes, retrieving information, and research tools. Herein, we provide a very short introduction; you can always check functions' manuals for detailed guides and examples. ------------------------------------------------------------------------ # Map genes {#map-genes} - `rba_panther_mapping()`: map your gene-set to PANTHER database and retrieve attributes and annotations associated with your genes - `rba_panther_ortholog()`: Retrieve Orthologs of your genes - `rba_panther_homolog()`: Retrieve Homologs of your genes ------------------------------------------------------------------------ # Get information {#get-information} - `rba_panther_info`(): Retrieve a list of PANTHER's supported organisms, datasets, families, or pathways - `rba_panther_family`(): Retrieve Orthologs, MSA, or Tree topology of a given PANTHER family. ------------------------------------------------------------------------ # Gene List Analysis {#gene-list-analysis} `rba_panther_enrich()` is equivalent to [Gene List analysis tool's webpage](https://www.pantherdb.org/index.jsp "PANTHER Gene List Analysis"). Depending on the provided input's class, PANTHER will perform either over-representation analysis or statistical enrichment analysis. Below we demonstrate how to perform such analyses. ## Get the available annotation datasets {#analysis-get-available-annotation-datasets} First, we need to select an annotation dataset to conduct the analysis based on it. Each annotation dataset contains a collection of terms, where each term is associated with a group of genes. To retrieve the list of available annotation datasets in PANTHER, use the following command: ```{r enrich_available_annotations} annots <- rba_panther_info(what = "datasets") ``` ```{r enrich_available_annotations_results, echo=FALSE} if (is.data.frame(annots)) { DT::datatable( data = annots, options = list( scrollX = TRUE, paging = TRUE, fixedHeader = TRUE, keys = TRUE ) ) } else { print("Vignette building failed. It is probably because the web service was down during the building.") } ``` Please note that you should use the ID of the desired annotation dataset, not its label. For example, using `"biological_process"` is incorrect; you should rather use `"GO:0008150"`. ## Submit the analysis request {#submit-the-analysis-request} Depending on the provided input, PANTHER will conduct two types of analysis: 1. If a character vector is supplied, over-representation analysis will be performed using either Fisher's exact or binomial test. 2. If a data frame with gene identifiers and their corresponding expression values is supplied, statistical enrichment test is performed using Mann-Whitney U (Wilcoxon Rank-Sum) test. rbioapi determines the proper analysis based on the class of the `genes` parameter. Please refer to the details section of `rba_panther_enrich()` function manual for more information. ### Over-representation analysis {#over-representation-analysis} Now, suppose we want to perform an over-representation analysis against the 'GO biological process' annotation dataset. In this example, we only provide the gene names, thus over-representation analysis will be conducted: ```{r rba_panther_overrep, message=TRUE} # Create a variable to store the genes vector my_genes_vec <- c( "p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1","KIF23","PLK1", "RAC2","RACGAP1","RHOA", "RHOB", "PHF14", "RBM3", "MSL1" ) # Submit the analysis request. enriched <- rba_panther_enrich( genes = my_genes_vec, organism = 9606, annot_dataset = "GO:0008150", cutoff = 0.05 ) # Note that we didn't supply the `test_type` parameter. # In this case, the function will default to using Fisher's exact test # (i.e. `test_type = "FISHER"`). # You may also use binomial test for the over-representation analysis # (i.e. `test_type = "BINOMIAL"`). ``` ```{r rba_panther_overrep_results, echo=FALSE} if (utils::hasName(enriched, "result") && is.data.frame(enriched$result)) { DT::datatable( data = enriched$result, options = list( scrollX = TRUE, paging = TRUE, fixedHeader = TRUE, keys = TRUE, pageLength = 10 ) ) } else { print("Vignette building failed. It is probably because the web service was down during the building.") } ``` ### Statistical enrichment analysis {#statistical-enrichment-analysis} As you can see in the above example, only a vector of gene names was used. We can also use the corresponding expression values of the genes. In this case, PANTHER will perform a statistical enrichment analysis. To do so, the only change will be to supply a data frame to the `genes` parameter. Note that in this case, Mann-Whitney U Test will be performed. The data frame should have two columns: the first column should contain the gene identifiers as a character vector; the second column should contain the corresponding expression values as a numeric vector. ```{r rba_panther_enrich, eval=FALSE} # Create a variable to store the data frame my_genes_df <- data.frame( genes = c( "p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1","KIF23","PLK1", "RAC2","RACGAP1","RHOA", "RHOB", "PHF14", "RBM3", "MSL1" ), ## generate random expression values expression = runif(15, 0, 10) ) # Submit the analysis request. enriched <- rba_panther_enrich( genes = my_genes_df, organism = 9606, annot_dataset = "GO:0008150", cutoff = 0.05 ) # Note that we didn't supply the `test_type` parameter. # In this case, the function will default to Mann-Whitney U Test # (i.e. `test_type = "Mann-Whitney"`). # This is the only valid value for the statistical enrichment analysis test, # thus ommiting or supplying it will not make a difference. ``` **Please Note:** Other services supported by rbioapi also provide Over-representation analysis tools. Please see the vignette article [Do with rbioapi: Over-Representation (Enrichment) Analysis in R](rbioapi_do_enrich.html) ([link to the documentation site](https://rbioapi.moosa-r.com/articles/rbioapi_do_enrich.html)) for an in-depth review. ------------------------------------------------------------------------ # Tree grafter {#tree-grafter} `rba_panther_tree_grafter()` is an equivalent to the "[Graft sequence into PANTHER library of trees](https://www.pantherdb.org/tools/sequenceSearchForm.jsp)" tool. ------------------------------------------------------------------------ # How to Cite? {#citations} To cite PANTHER (Please see ): - Huaiyu Mi, Dustin Ebert, Anushya Muruganujan, Caitlin Mills, Laurent-Philippe Albou, Tremayne Mushayamaha, Paul D Thomas, PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API, *Nucleic Acids Research*, Volume 49, Issue D1, 8 January 2021, Pages D394--D403, To cite rbioapi: - Moosa Rezwani, Ali Akbar Pourfathollah, Farshid Noorbakhsh, rbioapi: user-friendly R interface to biologic web services' API, Bioinformatics, Volume 38, Issue 10, 15 May 2022, Pages 2952--2953, ------------------------------------------------------------------------ # Links {#links} - [This article in rbioapi documentation site](https://rbioapi.moosa-r.com/articles/rbioapi_panther.html "2.C: PANTHER & rbioapi") - [Functions references in rbioapi documentation site](https://rbioapi.moosa-r.com/reference/index.html#section-enrichr-rba-enrichr- "rbioapi reference") - [rbioapi vignette index](rbioapi.html "rbioapi: User-Friendly R Interface to Biologic Web Services' API") ------------------------------------------------------------------------ # Session info {#session-info} ```{r sessionInfo, echo=FALSE} sessionInfo() ```