---
title: "MEGENA_pipeline"
author: "Won-Min Song"
date: "January 04, 2017"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MEGENA pipeline overview}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## MEGENA pipeline as of 10/25/2016 

This is a routine MEGENA pipeline description encompassing from data correlation analysis to module plotting, and is based on version 1.3.6 <https://CRAN.R-project.org/package=MEGENA>.Please cite the paper below when MEGENA is applied as part of your analysis: 

Song W.-M., Zhang B. (2015) Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol 11(11): e1004574. doi: 10.1371/journal.pcbi.1004574.

For statistical mechanics aspects involved in MEGENA, please check: 

Song W.-M., Di Matteo T.and Aste T., Building Complex Networks with Platonic Solids, Physical Reivew E, 2012 Apr;85(4 Pt 2):046115.

# calculate correlation

```{r correlation,message=FALSE}
rm(list = ls()) # rm R working space

library(MEGENA)

# input parameters
n.cores <- 2; # number of cores/threads to call for PCP
doPar <-TRUE; # do we want to parallelize?
method = "pearson" # method for correlation. either pearson or spearman. 
FDR.cutoff = 0.05 # FDR threshold to define significant correlations upon shuffling samples. 
module.pval = 0.05 # module significance p-value. Recommended is 0.05. 
hub.pval = 0.05 # connectivity significance p-value based random tetrahedral networks
cor.perm = 10; # number of permutations for calculating FDRs for all correlation pairs. 
hub.perm = 100; # number of permutations for calculating connectivity significance p-value. 

# annotation to be done on the downstream
annot.table=NULL
id.col = 1
symbol.col= 2
###########

data(Sample_Expression) # load toy example data

ijw <- calculate.correlation(datExpr,doPerm = cor.perm,output.corTable = FALSE,output.permFDR = FALSE)
```

# calculate PFN
In this step, Planar Filtered Network (PFN) is calculated by taking significant correlation pairs, ijw. In the case of utilizing a different similarity measure, one can independently format the results into 3-column data frame with column names c("row","col","weight"), and make sure the weight column ranges within 0 to 1. Using this as an input to calculate.PFN() will work just as fine. 

```{r PFN}
#### register multiple cores if needed: note that set.parallel.backend() is deprecated. 
run.par = doPar & (getDoParWorkers() == 1) 
if (run.par)
{
  cl <- parallel::makeCluster(n.cores)
  registerDoParallel(cl)
  # check how many workers are there
  cat(paste("number of cores to use:",getDoParWorkers(),"\n",sep = ""))
}

##### calculate PFN
el <- calculate.PFN(ijw[,1:3],doPar = doPar,num.cores = n.cores,keep.track = FALSE)
g <- graph.data.frame(el,directed = FALSE)
```

# perform clustering
MCA clustering is performed to identify multiscale clustering analysis. "MEGENA.output"" is the core output to be used in the down-stream analyses for summarization and plotting.

```{r MCA,results="hide",warning=FALSE}

##### perform MCA clustering.
MEGENA.output <- do.MEGENA(g,
 mod.pval = module.pval,hub.pval = hub.pval,remove.unsig = TRUE,
 min.size = 10,max.size = vcount(g)/2,
 doPar = doPar,num.cores = n.cores,n.perm = hub.perm,
 save.output = FALSE)

###### unregister cores as these are not needed anymore.
if (getDoParWorkers() > 1)
{
  env <- foreach:::.foreachGlobals
  rm(list=ls(name=env), pos=env)
}


```

# summarize results
```{r summarize}
summary.output <- MEGENA.ModuleSummary(MEGENA.output,
	mod.pvalue = module.pval,hub.pvalue = hub.pval,
	min.size = 10,max.size = vcount(g)/2,
	annot.table = annot.table,id.col = id.col,symbol.col = symbol.col,
	output.sig = TRUE)

if (!is.null(annot.table))
{
  # update annotation to map to gene symbols
  V(g)$name <- paste(annot.table[[symbol.col]][match(V(g)$name,annot.table[[id.col]])],V(g)$name,sep = "|")
  summary.output <- output[c("mapped.modules","module.table")]
  names(summary.output)[1] <- "modules"
}

print(head(summary.output$modules,2))
print(summary.output$module.table)
```

# Plot some modules

You can generate refined module network plots: 

```{r modulePlot}
pnet.obj <- plot_module(output.summary = summary.output,PFN = g,subset.module = "c1_3",
	layout = "kamada.kawai",label.hubs.only = TRUE,
	gene.set = NULL,color.code =  "grey",
	output.plot = FALSE,out.dir = "modulePlot",col.names = c("magenta","green","cyan"),label.scaleFactor = 20,
	hubLabel.col = "black",hubLabel.sizeProp = 1,show.topn.hubs = Inf,show.legend = TRUE)

#X11();
print(pnet.obj[[1]])
```

# plot module hierarchy

```{r module hierarchy}
module.table <- summary.output$module.table
colnames(module.table)[1] <- "id" # first column of module table must be labelled as "id".

hierarchy.obj <- plot_module_hierarchy(module.table = module.table,label.scaleFactor = 0.15,
                                    arrow.size = 0.03,node.label.color = "blue")
#X11();
print(hierarchy.obj[[1]])
```