%\VignetteIndexEntry{Synthetic Lethality: A User Guide} \documentclass[a4paper]{article} \title{Using BiSEp to nominate candidate Synthetic Lethal gene pairs} \author{Mark Wappett} \usepackage{indentfirst} \begin{document} \setkeys{Gin}{width=1\textwidth} \maketitle \section{Introduction} BiSEp (Bimodality subsetting expression) is a set of computational tools that enable the user to nominate candidate synthetic lethal (SL) gene pairs. The toolkit is based on the assumption that a clear on / off gene expression profile is indicitive of tumour loss, and is detectable as bimodality or non-normality. This vignette demonstrates how the toolkit can be used to nominate, assess and visualise candidate SL pairs nominated from gene expression and mutation datasets. \section{Importing gene expression data} Processed gene expression data from most platforms can be input. We recommend that values are all distributed above zero and are in the log2 scale. Example input data format is shown below: <<>>= require(BiSEp) data(INPUT_data) INPUT_data[1:2,1:6] @ All input data should be read in a gene by sample format. Our dataset is gene expression data from the Cancer Cell Line Encyclopedia (CCLE) [1], normalised using fRMA [2] and scaled. \section{Identifying bimodal genes in expression data} We next take the processed data matrix and run the bimodal detection tool across it. This generates a list object containing three matrices, the third of which is called DATA and is simply a capture of the input data matrix. <>= BISEP_data <- BISEP(INPUT_data) biIndex <- BISEP_data$BI bisepIndex <- BISEP_data$BISEP @ The output data frames called biIndex and bisepIndex are the output from the bimodal index function [3] and the novel BISEP function. The output is displayed below: <<>>= biIndex[1:10,] bisepIndex[1:10,] @ The biIndex matrix contains all the bimodal scoring information provided to us by the bimodal index function. This includes the delta (distance between two expression modes), pi (proportion of samples in each expression mode) and BI. When combined, these give us an optimal assessment of bimodality in expression data. The bisepIndex function provides a p-value score for non-normality (column 2) and accurately pin-points the mid-point between the two expression modes for a gene (column 1). TUSC3 scores the highest in the biIndex table - the density distribution below highlights this: \begin{center} <>= plot(density(INPUT_data["TUSC3",]), main="TUSC3 Density Distribution") @ <>= plot(density(INPUT_data["MLH1",]), main="MLH1 Density Distribution") @ \end{center} <>= plot(density(INPUT_data["MLH1",]), main="MLH1 Density Distribution") @ By comparison, MLH1 does not score high for bimodality - but has the lowest p value for non-normality. The density plot demonstrates the unbalanced nature of, and distance between the two populations in a typical non-normal distribution. \section{BIGEE: Bimodality in Gene Expression Exclusivity} Here we take the bimodal / non-normal output from the BISEP tool, and use it as input to the first of the two candidate synthetic lethal detection tools. There are four sample input options to this tool based on the sample type and sample numbers \textbf{cell line, cell line low, patient and patient low}. When sample numbers are below ~200 we recommend using the input parameters with the low suffix in order to prevent a high false positive rate. <<>>= BIGEE_out <- BIGEE(BISEP_data, sampleType="cell_line") @ The percent completion graphic displays the progress of the SL detection component of the tool. This will typically take longer the larger the dataset is, and the more bimodal genes that there are. The output from this tool is a matrix containing gene pairs that look potentially synthetic lethal in the dataset, along with a score. <<>>= BIGEE_out[1:4,] @ It is possible to visualise any candidate relationships using the expressionPlot function: \begin{center} <>= expressionPlot(BISEP_data, gene1="SMARCA4", gene2="SMARCA1") @ \end{center} and look for those gene pairings that ideally are never expressed at low levels together - the signature that we propose could be indicative of synthetic lethality. \begin{center} <>= expressionPlot(BISEP_data, gene1="MTAP", gene2="MLH1") @ \end{center} \section{BEEM: Bimodal Expression Exclusive with Mutation} Here we take the bimodal / non-normal output from the BISEP tool, and use it as input to a tool that detects mutual exclusive loss between bimodally expressed genes and mutated genes. Again, there are four sample input options to this tool based on the sample type and sample numbers \textbf{cell line, cell line low, patient and patient low}. Additionally we also require a second input matrix containing discreet mutation call information. This matrix must be in the rownames = genes, colnames = samples format and there must be overlap between sample names in this mutation matrix, and sample names in the INPUT data matrix seen earlier. The calls in this matrix must be either WT or MUT as shown below: <<>>= data(MUT_data) MUT_data[1:4,1:10] @ Now we can run the function by doing the following: <<>>= BEEMout <- BEEM(BISEP_data, mutData=MUT_data, sampleType="cell_line", minMut=40) @ As with the BIGEE tool, the percent completion graphic displays the progress of the SL detection component of the tool. The output from the tool is a matrix containing the gene pairs that look potentially synthetic lethal, along with a number of other columns of metadata including size of high and low expression population, numbers of those populations that are mutant. <<>>= BEEMout @ Gene pairs where the mutant gene2 is exclusively mutated, or significantly enriched for mutation in the high expression mode of expression gene1 are those that we propose as candidate SL pairs. It is another manifestation of the never-low-together relationship we were looking for in the expression data above. We can visualise these gene pairs using the waterfall plotting function built into the package \SweaveOpts{width=10, height=5} \begin{center} <>= waterfallPlot(BISEP_data, MUT_data, expressionGene="MICB", mutationGene="PBRM1") @ <>= waterfallPlot(BISEP_data, MUT_data, expressionGene="BOK", mutationGene="BRCA2") @ \end{center} The left panel is the density distribution of the bimodal / non-normal expression gene. The right hand panel is a bimodal-mid-point-centered barplot coloured by the mutation status of the mutation gene. \section{FURE: Functional redundancy between synthetic lethal genes} It is assumed that either gene in a synthetic lethal pair is able to functionally compensate for the loss of the other. We developed this tool to enable the user to prioritise gene pairs that have some sort of biological redundancy and score these according to gene ontology[4,5]. The tool takes as input either the output from the BIGEE or the BEEM tools. The following example is run on the first couple of results from the BIGEE output <>= fOut <- FURE(BIGEE_out[1,], inputType="BIGEE") frPairs <- fOut$funcRedundantPairs allPairs <- fOut$allPairs @ <<>>= allPairs[1,] @ \section{References} 1. Berretina J \emph{et al}. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 283:603-607. 2. McCall MN, Bolstad BM, and Irizarry RA. (2010) Frozen Robust Multi-array Analysis. Biostatistics, 11(2):242-253. 3. Wang J \emph{et al}. (2009) The Bimodality Index: A criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data. Cancer Informatics, 7:199-216. 4. Carlson, M. (2013) Org.Hs.eg.db: Genome wide annotation for human. R package version 2.8.0. 5. Guangchuang, Y \emph{et al}. (2010) An R packahe for measuring semanic similarity among GO terms and gene products. Bioinformatics, 26(7), 976-978. \end{document}