In this vignette, we consider approximating a data matrix as a product of multiple low-rank matrices (a.k.a., factor matrices).
Test data is available from toyModel
.
library("iTensor")
<- iTensor::toyModel("ICA_Type1")
data1 <- iTensor::toyModel("ICA_Type2")
data2 <- iTensor::toyModel("ICA_Type3")
data3 <- iTensor::toyModel("ICA_Type4")
data4 <- iTensor::toyModel("ICA_Type5")
data5 dim(data1$X_observed)
## [1] 500 3
dim(data2$X_observed)
## [1] 500 3
dim(data3$X_observed)
## [1] 500 3
dim(data4$X_observed)
## [1] 480 3
dim(data5$gene) # N < P systems
## [1] 64 3116
Summary of these data is as follows:
You will see that the stuctures of these data as follows.
pairs(data1$X_observed, main="data1")
pairs(data2$X_observed, main="data2")
pairs(data3$X_observed, main="data3")
pairs(data4$X_observed, main="data4")
dim(data5$gene)
## [1] 64 3116
To perform and visualize all the ICA at once, here we write some functions as follows.
<- function(X, J){
allICA # Classic ICA Methods
<- ICA(X=X, J=J, algorithm="FastICA")
out.FastICA <- ICA(X=X, J=J, algorithm="InfoMax")
out.InfoMax <- ICA(X=X, J=J, algorithm="ExtInfoMax")
out.ExtInfoMax # Modern ICA Method
<- ICA2(X=X, J=J, algorithm="JADE")
out.JADE # Output
list(out.FastICA=out.FastICA, out.InfoMax=out.InfoMax,
out.ExtInfoMax=out.ExtInfoMax, out.JADE=out.JADE)
}
<- function(S, out){
CorrIndexAllICA <- lapply(out, function(x, S){CorrIndex(cor(S, Re(x$S)))}, S=S)
tmp <- gsub("out.", "", names(tmp))
Name <- round(unlist(tmp), 3)
Value names(Value) <- NULL
cbind(Name, Value)
<- as.data.frame(cbind(Name, Value))
tmp colnames(tmp) <- c("Method", "CorrIndex")
::kable(tmp)
knitr }
Note that we select only four ICA algorithms for the demonstration
but in iTensor
12 ICA algorithms are available. For the
details, check ?ICA
and ?ICA2
.
In this data, according to the independence between estimated source signal, an upright square means that the calculation is successful and other cases such as a rhombus rotated 45 degrees from a square means that the calculation is fail.
set.seed(123456)
<- allICA(X=data1$X_observed, J=3) out1
Here we use CorrIndex, which is a performance index to evaluate ICA results (Sobhani 2022). CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well (the closer to 0, the better the performance).
CorrIndexAllICA(data1$S_true, out1)
Method | CorrIndex |
---|---|
FastICA | 0.263 |
InfoMax | 7.439 |
ExtInfoMax | 0.263 |
JADE | 0.221 |
We can see the details of source signals as follows.
pairs(out1$out.FastICA$S, main="FastICA")
pairs(out1$out.InfoMax$S, main="InfoMax")
pairs(out1$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out1$out.JADE$S), main="JADE")
In this data, if the outliers are distributed along the lines of x = 0 and y = 0, the calculation is successful and if they are of the cross-shape, it is a failure.
set.seed(123456)
<- allICA(X=data2$X_observed, J=3) out2
CorrIndex imply that all the algorithms performed well.
CorrIndexAllICA(data2$S_true, out2)
Method | CorrIndex |
---|---|
FastICA | 0.268 |
InfoMax | 0.322 |
ExtInfoMax | 0.268 |
JADE | 0.374 |
We can see the details of source signals as follows.
pairs(out2$out.FastICA$S, main="FastICA")
pairs(out2$out.InfoMax$S, main="InfoMax")
pairs(out2$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out2$out.JADE$S), main="JADE")
In this data, the uniform distribution should be extracted in such a way that it is not rhombic, and furthermore the normal distribution and the t-distribution with 1 degree of freedom should be extracted independently. Sometimes, it is difficult ot determine visually though.
set.seed(123456)
<- allICA(X=data3$X_observed, J=3) out3
CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.
CorrIndexAllICA(data3$S_true, out3)
Method | CorrIndex |
---|---|
FastICA | 0.23 |
InfoMax | 3.799 |
ExtInfoMax | 0.213 |
JADE | 0.408 |
We can see the details of source signals as follows.
pairs(out3$out.FastICA$S, main="FastICA")
pairs(out3$out.InfoMax$S, main="InfoMax")
pairs(out3$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out3$out.JADE$S), main="JADE")
In this data, if the mesh pattern is reproduced, the calculation is successful.
set.seed(123456)
<- allICA(X=data4$X_observed, J=3) out4
CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.
CorrIndexAllICA(data4$S_true, out4)
Method | CorrIndex |
---|---|
FastICA | 0.081 |
InfoMax | 8.026 |
ExtInfoMax | 0.081 |
JADE | 0.08 |
We can see the details of source signals as follows.
pairs(out4$out.FastICA$S, main="FastICA")
pairs(out4$out.InfoMax$S, main="InfoMax")
pairs(out4$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out4$out.JADE$S), main="JADE")
This kind of data is an thin and long matrix, which is a very common
data structure in omics data. In iTensor
, we re-implemented
the IPCA function of mixOmics
package.
library("mixOmics")
## Loading required package: MASS
## Loading required package: lattice
## Loading required package: ggplot2
##
## Loaded mixOmics 6.24.0
## Thank you for using mixOmics!
## Tutorials: http://mixomics.org
## Bookdown vignette: https://mixomicsteam.github.io/Bookdown
## Questions, issues: Follow the prompts at http://mixomics.org/contact-us
## Cite us: citation('mixOmics')
set.seed(123456)
# mixOmics's IPCA
<- ipca(data5$gene, ncomp=3, mode="deflation")
res.ipca
# iTensor's IPCA
<- ICA2(X=as.matrix(data5$gene), J=3, algorithm="IPCA") out5
We can see the details of source signals as follows. In this data, we
can confirm that the IPCA of mixOmics
and
iTensor
have the same results, except for the order and
constant-fold relationships.
pairs(res.ipca$x, main="IPCA (mixOmics)", col=data5$treatment[,"Treatment.Group"], pch=16)
pairs(out5$S, main="IPCA (iTensor)", col=data5$treatment[,"Treatment.Group"], pch=16)
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux bookworm/sid
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; LAPACK version 3.11.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mixOmics_6.24.0 ggplot2_3.4.2 lattice_0.21-8 MASS_7.3-59
## [5] iTensor_1.0.2
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.3.0 sass_0.4.5 utf8_1.2.3
## [4] generics_0.1.3 stringi_1.7.12 digest_0.6.31
## [7] magrittr_2.0.3 evaluate_0.20 grid_4.3.0
## [10] RColorBrewer_1.1-3 fastmap_1.1.1 plyr_1.8.8
## [13] jsonlite_1.8.4 Matrix_1.5-4 ggrepel_0.9.3
## [16] RSpectra_0.16-1 gridExtra_2.3 mgcv_1.8-42
## [19] purrr_1.0.1 fansi_1.0.4 scales_1.2.1
## [22] codetools_0.2-19 jquerylib_0.1.4 cli_3.6.1
## [25] rlang_1.1.0 splines_4.3.0 munsell_0.5.0
## [28] withr_2.5.0 cachem_1.0.7 yaml_2.3.7
## [31] ellipse_0.4.5 tools_4.3.0 parallel_4.3.0
## [34] reshape2_1.4.4 BiocParallel_1.34.0 einsum_0.1.0
## [37] dplyr_1.1.2 colorspace_2.1-0 corpcor_1.6.10
## [40] groupICA_0.1.1 vctrs_0.6.2 R6_2.5.1
## [43] matrixStats_0.63.0 lifecycle_1.0.3 stringr_1.5.0
## [46] pkgconfig_2.0.3 rTensor_1.4.8 geigen_2.3
## [49] pillar_1.9.0 bslib_0.4.2 gtable_0.3.3
## [52] jointDiag_0.4 glue_1.6.2 rARPACK_0.11-0
## [55] Rcpp_1.0.10 highr_0.10 xfun_0.39
## [58] tibble_3.2.1 tidyselect_1.2.0 knitr_1.42
## [61] nlme_3.1-162 htmltools_0.5.5 igraph_1.4.2
## [64] rmarkdown_2.21 compiler_4.3.0