--- title: "Combining partial covariance matrices using CovCombR package" author: "Deniz Akdemir, Mohamed Somo, Julio Isidro Sanchez" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Combining partial genomic relationship matrices using CovCombR package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE} knitr::opts_chunk$set(fig.width=7, fig.height=6) ``` The package is focused on combining partially overlapping relationship / covariance matrices to obtain a combined relationship / covariance matrix. At the moment the only function available to the users is \code{CovComb} the usage of which will be illustrated with an example from phenomics. ### Illustration: Combining data from independent phenotypic experimets. Lets load the package and datasets. ```{r, cache=TRUE} library(CovCombR) data("BarleyPheno") ``` The datasets are very heterogenous, each trial only involves a few traits. We can see this from the following image. Mean number of traits per trait is about 6. ```{r, cache=TRUE} library(plyr) dataall<-rbind.fill(BarleyPheno) image(as.matrix(dataall[,-c(1,2)]), xlab="trials x genotypes", ylab="traits", axes=FALSE) ``` To use the \code{CovComb} function, we first calculate the covariance matrices from all the trials. Then we turn these into positive definite correlation matrices to likelihood convergence failures due to differences in the scaling of variables. The lis to covariance / correlation matrices become the input to the function \code{CovComb}. ```{r, cache=TRUE} covlist<-lapply(BarleyPheno, function(x){cov(x,use="pairwise.complete.obs")}) covlist<-lapply(covlist,function(x){cov2cor(as.matrix(Matrix::nearPD(x)$mat))}) mean(c(unlist(lapply(covlist,function(x){nrow(x)})))) BigK<-CovComb(Klist=covlist, maxiter=1000, loglik = TRUE, plotll = TRUE) ``` Combined covariance matrix has 67 traits. Some of the relationships in this estimated relationship matrix have never been observed but wer infered by the Wishart EM algorithm. ```{r} dim(BigK[[1]]) ``` We can visualise the estimated phenomic covariance matrix. ```{r} heatmap(as.matrix(BigK[[1]]), cexRow = .2,cexCol = .2) ``` ### Adding sparsity to the estimated covariance Once the covariance matrix is obtained sparsity can be introduced using many of the modern sparse covariance estimation methods, for example, the packages \code{spcov}, \code{ggb}, \code{qgraph}, etc,..., can be used to fit sparse covariance estimators. ```{r, cache=TRUE} Graph_lasso <- qgraph::qgraph(BigK[[1]], graph = "cor", directed=FALSE,details = FALSE,esize = 10,sampleSize=3000, layout="spring",nodeNames = rownames(BigK[[1]]),threshold = "hochberg", legend.cex=.15) ``` ## The Wishart EM-Algorithm In this section, we briefly describe the estimation algorithm. Let $A=\left\{a_1, a_2, \ldots, a_m \right\}$ be the set of not necessarily disjoint subsets of genotypes covering a set of $K$ (i.e., $K= \cup_{i=1}^m a_i$) with total $n$ genotypes. Let $G_{a_1}, G_{a_2},\ldots, G_{a_m}$ be the corresponding sample of covariance matrices. Starting from an initial estimate $\Sigma^{(0)}=\nu\Psi^{(0)},$ the Wishart EM-Algorithm repeats updating the estimate of the covariance matrix until convergence: \begin{equation}\label{eq:covar1} \begin{split} \Psi^{(t+1)} & =\frac{1}{\nu m}\sum_{a\in A}P_a\left[ \begin{matrix} G_{aa} & G_{aa}(B^{(t)}_{b|a})' \\ B^{(t)}_{b|a}G_{aa} & \nu \Psi^{(t)}_{bb|a}+ B^{(t)}_{b|a}G_{aa}(B^{(t)}_{b|a})' \end{matrix}\right]P'_a \end{split} \end{equation} where $B^{(t)}_{b|a}=\Psi^{(t)}_{ab}(\Psi^{(t)}_{aa})^{-1},$ $\Psi^{(t)}_{bb|a}=\Psi^{(t)}_{bb}-\Psi^{(t)}_{ab}(\Psi^{(t)}_{aa})^{-1}\Psi^{(t)}_{ba},$ $a$ is the set of genotypes in the given partial covariance matrix and $b$ is the set difference of $K$ and $a.$ The matrices $P_a$ are permutation matrices that put each matrix in the sum in the same order. The initial value, $\Sigma^{(0)}$ is usually assumed to be an identity matrix of dimesion $n.$ The estimate $\Psi^{(T)}$ at the last iteration converts to the estimated covariance with $\Sigma^{(T)}=\nu\Psi^{(T)}.$ A weighted version of this algorithm can be obtained replacing $G_{aa}$ in Equation~\ref{eq:covar1} with $G^{(w_a)}_{aa}=w_aG_{aa}+(1-w_a)\nu\Psi^{(T)}$ for a vector of weights $(w_1,w_2,\ldots, w_m)'.$ ## References Adventures in Multi-Omics I: Combining heterogeneous data sets via relationships matrices Deniz Akdemir, Julio Isidro Sanchez, November 2019; . qgraph: Network Visualizations of Relationships in Psychometric Data. Sacha Epskamp, Angelique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann, Denny Borsboom (2012). Journal of Statistical Software, 48(4), 1-18. .