--- title: "Stability Measures" output: rmarkdown::html_vignette: toc: true description: > Evaluating robustness of cluster solutions through resampling methods. vignette: > %\VignetteIndexEntry{Stability Measures} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r echo = FALSE} options(crayon.enabled = FALSE, cli.num_colors = 0) ``` Download a copy of the vignette to follow along here: [stability_measures.Rmd](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/stability_measures.Rmd) In this vignette, we will highlight the main stability measure options in the `metasnf` package. ## Data set-up ```{r eval = FALSE} library(metasnf) my_dl <- data_list( list(cort_t, "cortical_thickness", "neuroimaging", "continuous"), list(cort_sa, "cortical_area", "neuroimaging", "continuous"), list(subc_v, "subcortical_volume", "neuroimaging", "continuous"), list(income, "household_income", "demographics", "continuous"), list(pubertal, "pubertal_status", "demographics", "continuous"), uid = "unique_id" ) set.seed(42) sc <- snf_config( my_dl, n_solutions = 4, max_k = 40 ) sol_df <- batch_snf(my_dl, sc) ``` To begin start calculating resampling-based stability measures, we'll build subsamples of the data list using the `subsample_dl` function. ```{r eval = FALSE} my_dl_subsamples <- subsample_dl( my_dl, n_subsamples = 100, subsample_fraction = 0.85 ) ``` `my_dl_subsamples` contains a list of 100 subsamples of the full data list. Each variation only has a random 85% of the original observations. ```{r eval = FALSE} batch_subsample_results <- batch_snf_subsamples( my_dl_subsamples, sc, verbose = TRUE ) ``` By default, the function returns a one-element list: `cluster_solutions`, which is itself a list of cluster solution data frames corresponding to each of the provided data list subsamples. Setting the parameters `return_sim_mats` and `return_solutions` to `TRUE` will turn the result of the function to a three-element list containing the corresponding solutions data frames and final fused similarity matrices of those cluster solutions, should you require these objects for your own stability calculations. The function `subsample_pairwise_aris` can then be used to calculate the ARIs between cluster solutions across the subsamples. ```{r eval = FALSE} pairwise_aris <- subsample_pairwise_aris( batch_subsample_results, verbose = TRUE ) ``` `pairwise_aris` is a list containing a summary data frame of the ARIs between subsamples for each row of the original settings data frame as well as another list of all the generated inter-subsample ARIs as a result of setting `return_raw_aris` to `TRUE`. The raw inter-subsample ARIs corresponding to a particualr settings data frame row can be visualized with a heatmap: ```{r eval = FALSE} inter_ss_ari_hm <- ComplexHeatmap::Heatmap( pairwise_aris$"raw_aris"$"s1", heatmap_legend_param = list( color_bar = "continuous", title = "Inter-Subsample\nARI", at = c(0, 0.5, 1) ), show_column_names = FALSE, show_row_names = FALSE ) ``` ```{r eval = FALSE, echo = FALSE} save_heatmap( inter_ss_ari_hm, "vignettes/inter_ss_ari_hm.png", width = 400, height = 300, res = 70 ) ```