--- title: "CoNI" author: "José Manuel Monroy Kuhn" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{CoNI} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(CoNI) ``` ## Introduction This vignette is an introduction to `CoNI`. Following this vignette, one can analyze the data in Valentina et al. (2021). `CoNI` is run using gene expression and metabolic data for the livers of two mice cohorts, one on a high-fat diet (HFD) and the other on a chow diet (Chow). These data is already pre-processed, that is, normalized and low-count filtered. For more information see Valentina et al. (2021) #### Installation requirements `CoNI` requires a few R packages to function. Make sure they are installed before installing CoNI: ```{r install_dependencies} # dependencies<-c("igraph","doParallel","cocor","tidyverse","foreach","ggrepel","gplots","gridExtra","plyr","ppcor","tidyr","Hmisc") # # `%notin%`<-Negate(`%in%`) # for(package in dependencies){ # if(package %notin% rownames(installed.packages())){ # install.packages(package,dependencies = TRUE) # } # } ``` ```{r, eval=FALSE, echo=TRUE} # if (!requireNamespace("BiocManager", quietly = TRUE)) # install.packages("BiocManager") # BiocManager::install("genefilter") ``` Python3 is also required to run `CoNI`. Make sure it is installed and in your path. ## Correlation guided Network integration (CoNI) ### Run CoNI #### Load data ```{r Download_data} #Chow Data data(Chow_MetaboliteData) #Metabolite data data(Chow_GeneExpData) #Gene expression data #HFD data data(HFD_MetaboliteData) #Metabolite data data(HFD_GeneExpData) #Gene expression data ``` Before running `CoNI`, one has to make sure the sample names between omics datasets match (as both omic data sets should come from the same samples). If they do not match, `CoNI` will throw an error. ```{r matchRows} #Match rownames both omics data rownames(Chow_MetaboliteData)<-rownames(Chow_GeneExpData) rownames(HFD_MetaboliteData)<-rownames(HFD_GeneExpData) #Shorten names Chow_metabo<-Chow_MetaboliteData Chow_gene<-Chow_GeneExpData HFD_metabo<-HFD_MetaboliteData HFD_gene<-HFD_GeneExpData ``` To run `CoNI`, one can decide to keep metabolites (vertex-features) based on the significance of their pairwise correlations (padjustvertexD=FALSE) or do more stringent filtering by keeping only metabolites that significantly correlate after multiple-testing adjustment (padjustvertexD=TRUE). The user can also pre-filter the genes by keeping only those that significantly correlate with at least one metabolite (correlateDFs=TRUE). Note: This run takes around 5 hours (12 cores, 32Gb of RAM). Adjust according to the available resources. ```{r CoNI_chow,eval=FALSE, echo=TRUE} #Run for Chow # CoNIResults_Chow<-CoNI(edgeD = Chow_gene,vertexD = Chow_metabo, # saveRaw = FALSE,filter_highVarianceEdge = TRUE,correlateDFs=TRUE, # padjustvertexD = FALSE, split_number = 200, # outputDir = "./Chow/",outputName = "CoNIChow", # splitedgeD = TRUE,numCores = 2,onlySgRes = TRUE) ``` To speed up this vignette, load significant Chow results: ```{r load_CoNIResults_chow} #Load chow results data(CoNIResults_Chow) ``` Note: This run takes around 5 hours (12 cores, 32Gb of RAM). Adjust according to the available resources. ```{r CoNI_hfd,eval=FALSE, echo=TRUE} #Run for HFD # CoNIResults_HFD<-CoNI(edgeD = HFD_gene,vertexD = HFD_metabo, # saveRaw = FALSE,filter_highVarianceEdge = TRUE, # padjustvertexD = FALSE, split_number = 200,correlateDFs=TRUE, # outputDir = "./HFD/",outputName = "CoNIHFD", # splitedgeD = TRUE,numCores = 2,onlySgRes = TRUE) ``` To speed up this vignette, load significant HFD results: ```{r load_CoNIResults_hfd} #Load HFD results data(CoNIResults_HFD) ``` To speed up the computation, `CoNI` splits the edge Data into chunks, and it runs in parallel using the data of these chunks. Sometimes is necessary to tune 'split_number' to avoid running out of memory. For parallelization, `CoNI` uses the R package doParallel (Microsoft Corporation and Steve Weston, 2020). One can specify the number of cores with 'numCores'. Internally in `CoNI`, the partial correlations are calculated with the R function ‘pcor.test’ of the package ppcor (Kim, 2015) and the Steiger tests with the function ‘cocor.dep.groups.overlap’ of the R package cocor (Diedenhofen and Musch, 2015). ### Create networks The user can create a network/graph and assign colors to the vertexes with the function 'generate_network'. To assign colors one has to provide a table matching vertexes and colors. The vertex names in the table have to be in the first column. ```{r read_metabolite_annotation} #Read Annotation table data(MetaboliteAnnotation) MetaboliteAnnotation<-assign_colorsAnnotation(MetaboliteAnnotation,col="Class") ``` ```{r network_chow} #Generate Network ChowNetwork<-generate_network(ResultsCoNI = CoNIResults_Chow, colorVertexTable = MetaboliteAnnotation, outputDir = "./", outputFileName = "Chow") ``` One can add "Class" information to the vertexes of the network, providing a data frame to the argument "Class". The first column of the data frame corresponds to the vertexes names and another column to class. This extra information is necessary for comparing treatments, where features are summarized based on the class they belong to. ```{r network_chow_class} #Generate Network Chow ChowNetworkWithClass<-generate_network(ResultsCoNI = CoNIResults_Chow, colorVertexTable = MetaboliteAnnotation, outputDir = "./", outputFileName = "Chow", Class = MetaboliteAnnotation) ``` ```{r network_hfd_class} #Generate Network HFD HFDNetworkWithClass<-generate_network(ResultsCoNI = CoNIResults_HFD, colorVertexTable = MetaboliteAnnotation, outputDir = "./", outputFileName = "HFD", Class = MetaboliteAnnotation) ``` The networks are saved automatically (graphml format) and can be uploaded to Cytoscape for further exploration and visualization. ### Basic network statistics We can obtain some basic network statistics with the function NetStats. ```{r network_stats_chow} library(knitr) library(kableExtra) kable(NetStats(Network = ChowNetworkWithClass),caption="Network statistics Chow") %>% kable_styling(full_width = FALSE) ``` ```{r network_stats_hfd} kable(NetStats(Network = HFDNetworkWithClass),caption="Network statistics HFD") %>% kable_styling(full_width = FALSE) ``` ### Clustering options in igraph One can further make use of the networks by applying different clustering techniques available in the igraph package (Csardi & Nepusz 2006). NOTE:For this vignette, as is small size, the networks are not displayed. Change output in the YAML metadata to html default to get a better visualization when knitting the Rmd file. Or run the code in RStudio. ```{r} library(igraph) coordinates = layout_with_fr(ChowNetworkWithClass) #define layout ``` ###### Community structure detecting based on the leading eigenvector (Spectral community) ```{r spectral} Spectral = cluster_leading_eigen(ChowNetworkWithClass) #See membership for the nodes Spectral$membership ``` ```{r spectral_image,fig.show='hide', out.width = '100%', out.height = '100%', fig.width = 7, fig.heigt = 7} #Plot the network plot(ChowNetworkWithClass, vertex.color=membership(Spectral), layout=coordinates) ``` ###### Community structure via greedy optimization of modularity ```{r greedy} greedy = cluster_fast_greedy(ChowNetworkWithClass) #See membership for the nodes greedy$membership ``` ```{r greedy_image,fig.show='hide',out.width = '100%', out.height = '100%', fig.width = 7, fig.heigt = 7} #Plot the network plot(ChowNetworkWithClass, vertex.color=membership(greedy), layout=coordinates) ``` ###### Community structure via greedy optimization of modularity ```{r betweenness} betweenness = cluster_edge_betweenness(ChowNetworkWithClass,weights=NULL) #See membership for the nodes betweenness$membership ``` ```{r betweenness_image,fig.show='hide',out.width = '100%', out.height = '100%', fig.width = 7, fig.heigt = 7} #Plot the network plot(ChowNetworkWithClass, vertex.color=membership(betweenness), layout=coordinates) ``` See igraph package for more options. ### Local controlling genes (local controlling edge features) Local controlling genes (LCGs); are genes that appear in a network region more often than expected by chance and have a potential regulatory role. To test for an LCG, CoNI counts for every node the number of times a particular gene appears in the edges located within a two-step distance. It then applies a binomial test with a probability of 1/Dnet, where Dnet is the number of genes in the network. ```{r local_controlling_genes_Chow} #Chow #Get results binomial test LCGenes_ResultsBinomialTable_Chow<- find_localControllingFeatures(ResultsCoNI = CoNIResults_Chow,network = ChowNetworkWithClass) #Get list local controlling genes LCGenes_Chow<-as.character(unique(LCGenes_ResultsBinomialTable_Chow$edgeFeatures)) ``` ```{r local_controlling_genes_HFD} #HFD #Get results binomial test LCGenes_ResultsBinomialTable_HFD<- find_localControllingFeatures(ResultsCoNI = CoNIResults_HFD,network = HFDNetworkWithClass) #Get list local controlling genes LCGenes_HFD<-as.character(unique(LCGenes_ResultsBinomialTable_HFD$edgeFeatures)) ``` The user can obtain a table of the local controlling genes matching the paired metabolites and unique metabolites. ```{r local_controlling_tables} #Chow #Generate table local controlling genes TableLCFsChow<-tableLCFs_VFs(CoNIResults = CoNIResults_Chow,LCFs = LCGenes_Chow) #Show first two rows TableLCFChowk<-kable(TableLCFsChow[1:2,],caption="Table Local Controlling Genes") %>% kable_styling(full_width = FALSE) #HFD #Generate table local controlling genes TableLCFsHFD<-tableLCFs_VFs(CoNIResults = CoNIResults_HFD,LCFs = LCGenes_HFD) #Show first two rows TableLCFHFDk<-kable(TableLCFsHFD[1:2,],caption="Table Local Controlling Genes") %>% kable_styling(full_width = FALSE) TableLCFHFDk ``` ### Genes by confounding magnitude In addition to the LCGs, the user can obtain critical genes by ranking the network genes based on the difference of the correlation and partial correlation coefficients. ```{r gene_Magnitude,warning=FALSE,message=FALSE} Top10Chow<-top_n_LF_byMagnitude(CoNIResults_Chow,topn = 10) Top10HFD<-top_n_LF_byMagnitude(CoNIResults_HFD,topn = 10) head(Top10HFD[,c(1:3,ncol(Top10HFD))]) ``` In the results of HFD, we can see that the largest difference between coefficients is for the triplet "PC.ae.C42.2", "PC.ae.C42.0" and "Lpin2". One can visualize the effect of "Lpin2" on the metabolites by fitting two linear models on standardize data and plotting the results. One model we use the metabolite expression and in the other the residuals from the linear models between the gene and each of the metabolites. In one linear model, the slope is the correlation coefficient between metabolites, and in the other, the partial correlation coefficient. ```{r CorvsPcor_oneCombination,fig.show='hold',warning=FALSE,message=FALSE,out.width="50%",fig.align='center'} plotPcorvsCor(ResultsCoNI = CoNIResults_HFD,edgeFeature = "Lpin2", vertexFeatures = c("PC.ae.C42.2","PC.ae.C42.0"), vertexD = HFD_metabo,edgeD = HFD_gene, label_edgeFeature = "Gene",plot_to_screen = TRUE, outputDir = "./") ``` The user can also explore all the metabolite pairs that form triplets with the gene in question. ```{r CorvsPcor_AllCombinations,fig.show='hold',warning=FALSE,message=FALSE,out.width="40%",fig.show='hide'} plotPcorvsCor(ResultsCoNI = CoNIResults_HFD,edgeFeature = "Lpin2", vertexD = HFD_metabo,edgeD = HFD_gene, label_edgeFeature = "Gene",plot_to_screen = TRUE, outputDir = "./") ``` ### Bipartite graphs An alternative network representation is a bipartite graph. In this type of graph, there are two types of nodes. For the example in this vignette, one node type are genes (linker-features, inside the edges in the previous network) and the other metabolites (vertex_features, nodes in the previous network) ```{r bipartite_graphs} #Chow ChowBipartiteGraph<-createBipartiteGraph("./TableForNetwork_Chow.csv",MetaboliteAnnotation) #Save bipartite graph write.graph(ChowBipartiteGraph,file="./Chow_bipartite.graphml",format="graphml") #HFD HFDBipartiteGraph<-createBipartiteGraph("./TableForNetwork_HFD.csv",MetaboliteAnnotation) #Save bipartite graph write.graph(HFDBipartiteGraph,file="./HFD_bipartite.graphml",format="graphml") ``` The resulting bipartite graph can be uploaded in Cytoscape for further visualization and exploration. It is also possible to obtain a hypergraph incidence matrix instead of a bipartite graph ```{r} Chow_HypergraphIncidenceM<-createBipartiteGraph("./TableForNetwork_Chow.csv",MetaboliteAnnotation,incidenceMatrix = TRUE) ``` ## Comparison between treatments ### Triplet comparison Exact triplet matching (metabolite pair - gene) can be done as follows: ```{r triplet_comparison} Compare_Triplets(Treat1_path = "./TableForNetwork_Chow.csv", Treat2_path = "./TableForNetwork_HFD.csv", OutputName = "Shared_triplets_HFDvsChow.csv") ``` No triplets were shared between treatments, reflecting the strong metabolic differences between HFD and Chow. ### Compare metabolite-pair classes Some genes are shared between the resulting CoNI networks of Chow and HFD. These genes do not share the same metabolite pairs but they might share similar metabolite classes. ```{r} (LCP_sharedGene_HFDvsChow<-Compare_VertexClasses_sharedEdgeFeatures( Treat1_path = "./TableForNetwork_HFD.csv", Treat2_path = "./TableForNetwork_Chow.csv", OutputName = "Comparison_LipidClassesPerGene_HFDvsChow.csv", Treat1Name = "HFD", Treat2Name = "Chow")) ``` The results show that the shared genes also show a shift in the metabolite classes they putatively interact with. One can create a barplot of one of the shared genes to compare the metabolite-pair profile between treatments. ```{r gene_metabolitePairProfile,fig.align='center',fig.align='center',fig.width = 6, fig.height = 5} create_edgeFBarplot(CompTreatTable = LCP_sharedGene_HFDvsChow, edgeF = "Gm4553", treat1 = "HFD", treat2 = "Chow", factorOrder = c("HFD","Chow"), col1="#E76BF3", col2 = "#F8766D",EdgeFeatureType = "Gene") ``` If the user wants to explore the global metabolite-pair profile of the shared genes, it can use the function create_GlobalBarplot as follows. ```{r global_metabolitePairProfile,fig.align='center',fig.width = 6, fig.height = 5} (HFDvsChow_GlobalLipidProfile<-create_GlobalBarplot(CompTreatTable = LCP_sharedGene_HFDvsChow, treat1 = "HFD", treat2 = "Chow", factorOrder = c("HFD","Chow"), col1="#E76BF3", col2 = "#F8766D", maxpairs = 1, szggrepel = 6, szaxisTxt = 15, szaxisTitle = 15, xlb = "Metabolite-pair classes")) ``` The user can also explore the global metabolite-pair profile of the shared genes per treatment but showing how every gene contributes to the profile. ```{r stacked_metabolitePairProfile,fig.align='center',fig.show='hold',fig.width = 6, fig.height = 5} create_stackedGlobalBarplot_perTreatment(CompTreatTable = LCP_sharedGene_HFDvsChow, treat = "HFD", max_pairsLegend = 1, xlb = "Metabolite-class-pairs", szTitle = 20, szggrepel = 6, szaxisTxt = 15, szaxisTitle = 15) create_stackedGlobalBarplot_perTreatment(CompTreatTable = LCP_sharedGene_HFDvsChow, treat = "Chow", max_pairsLegend = 1, ylim = 3, xlb = "Metabolite-pair classes", szTitle = 20, szggrepel = 4, szaxisTxt = 15, szaxisTitle = 15) ``` It is possible to obtain the previous plots as a single figure and with the same y-axis range. ```{r stacked_metabolitePairProfile_grid,fig.align='center',fig.width = 7, fig.height = 5} HFDvsChow_StackedLipidProfile<-getstackedGlobalBarplot_and_Grid( CompTreatTable = LCP_sharedGene_HFDvsChow, xlb = "Metabolite-pair classes", Treat1 = "HFD", Treat2 = "Chow", szTitle = 20, szggrepel = 6, szaxisTxt = 15, szaxisTitle = 15) plot(HFDvsChow_StackedLipidProfile) ``` ### Compare metabolite classes per gene The user can explore the results from the perspective of the genes and counting the metabolites per class. ```{r metabolite_classes_per_gene,fig.align='center',fig.width = 7, fig.height = 5} HFD_vs_Chow_LCP_Gene<-getVertexsPerEdgeFeature_and_Grid(LCP_sharedGene_HFDvsChow,"HFD","Chow", Annotation=MetaboliteAnnotation, ggrep=FALSE, small = TRUE, szTitle = 20, szaxisTxt = 15, szaxisTitle = 15) plot(HFD_vs_Chow_LCP_Gene) ```