--- title: "Using with spatialTIME" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using with spatialTIME} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Background More than likely, you are going to want to use another package to analyze the simulated data produced by `{scSpatialSIM}`. Here, we will show how the output from `{scSpatialSIM}` works seemlessly with the spatial analysis package, `{spatialTIME}` ### Simulating Data First thing we need to do is simulate data. The `vignette("Introduction)` showed us how we can do this in a univariate way where only a single cells positivity was simulated and bivaraiate way where we get positivity for multiple cell types. We can use this data with `{spatialTIME}` to derive spatial statistics like Ripley's *K*. See the spatialTIME package for more information about everything the package can do. ```{r} library(scSpatialSIM) ``` In this vignette, we will stick with a single cell type and look at univariate clustering with and without holes to display the caution that should be taken when using the observed Ripley's *K* values. ```{r, creating_spatial_object} #set seed set.seed(333) #create the new object sim_object = CreateSimulationObject(sims = 5, cell_types = 1) %>% #produce the point pattern GenerateSpatialPattern() #make tissues sim_object = GenerateTissue(sim_object, density_heatmap = F) %>% #create holes GenerateHoles(hole_prob = c(0.3, 0.5), density_heatmap = F) %>% #Create positive/negative cells GenerateCellPositivity(probs = c(0, 0.9)) ``` With the simulation object filled in with a single cell type, it can be split into a spatial list as well as a summary using `CreateSpatialList()` and `SummariseSpatial()`. ```{r sim_object_export} #creating the spatial list spatial_list = CreateSpatialList(sim_object, single_df = F) #summarise the spatial list summary_df = SummariseSpatial(spatial_list, markers = "Cell 1 Assignment") head(summary_df) ``` ### Using with `spatialTIME` ```{r} library(spatialTIME) ``` The `{spatialTIME}` package requires the input of 3 things. 1) spatial data frames in list format, 2) a summary of those spatial data frames, and 3) a clinical data frames The summary data frame acts as a linker between the spatial data frame IDs and the entries in the clinical data frame. Our spatial data frames need 1 more column: their name. This is pretty easy to do. ```{r adding_spatial_df_name} #loop over all spatial data frames and add their names sf_names = names(spatial_list) spatial_list = lapply(setNames(sf_names, sf_names), function(nam){ spatial_list[[nam]] %>% dplyr::mutate(`Sample Tag` = nam, .before = 1) }) ``` Our summary data frame already contains the names of the spatial data frames but needs a patient ID added. ```{r} summary_df$`Patient ID` = 1:5 ``` The number 3 data we needed was the clinical data frame. For the sake of this example we will just make a single columned data frame with 1 through 5 to match the "Patient ID" in the summary data. ```{r} clinical = data.frame(`Patient ID` = 1:5, check.names = F) ``` These column names will be used when creating the `mIF` object along with the 3 data objects. ```{r} mif = create_mif(clinical_data = clinical, sample_data = summary_df, spatial_list = spatial_list, patient_id = "Patient ID", sample_id = "Sample Tag") mif ``` Since our window is from 0 to 10 (10 units), we should keep our search radius max less than 5. Lets do Univariate Ripley's K from 0 to 3 units and visualize the results. ```{r} mif = ripleys_k(mif = mif, mnames = "Cell 1 Assignment", r_range = seq(0, 3, 0.1), permute = FALSE, edge_correction = "translation", workers = 1, xloc = "x", yloc = "y") ``` ```{r} library(ggplot2) mif$derived$univariate_Count %>% ggplot() + geom_line(aes(x = r, y = `Degree of Clustering Exact`, color = `Sample Tag`)) + labs(title = "Univariate Clustering - Simulation") ``` Up until now, we have been using the all simulated cells. We can filter the spatial_list down to create another mIF object that doesn't have the cells belonging to the class of "hole" (`Hole Assignment` == "Keep"). For now, sub-setting to only Tissue 1 or Tissue 2 will be left alone. Let's create this mIF and run Ripley's K on it. ```{r} mif_holes = create_mif(clinical_data = clinical, sample_data = summary_df, spatial_list = lapply(spatial_list, function(spat){ spat %>% dplyr::filter(`Hole Assignment` == "Keep") }), patient_id = "Patient ID", sample_id = "Sample Tag") mif_holes = ripleys_k(mif = mif_holes, mnames = "Cell 1 Assignment", r_range = seq(0, 3, 0.1), permute = FALSE, edge_correction = "translation", workers = 1, xloc = "x", yloc = "y") ``` To see the importance of using the exact CSR approach (or permutation) when there are holes on samples, first look at the spatial plots to see how much the holes are impacting the distribution of the points. ```{r} mif_holes$spatial %>% do.call(dplyr::bind_rows, .) %>% ggplot() + geom_point(aes(x = x, y = y, color = as.factor(`Cell 1 Assignment`))) + facet_wrap(~`Sample Tag`) ``` The regions with holes are rather small so wouldn't ```{r} dat = do.call(dplyr::bind_rows, mif_holes$derived$univariate_Count) dat %>% dplyr::mutate(`Exact-Theo` = `Exact CSR` - `Theoretical CSR`) %>% ggplot() + geom_density(aes(x = `Exact-Theo`, fill = `Sample Tag`), alpha = 0.2, adjust = 0.2) ``` This clearly shows that the exact *typically* has larger values than those of the theoretical (positive values). These can be visualized by calculating the frequency of *r*s that have an exact CSR measurement greater than the theoretical CSR estimate, indicating the the theoretical is underestimating the CSR of the cores. ```{r} dat %>% dplyr::mutate(`Exact-Theo` = `Exact CSR` - `Theoretical CSR`) %>% dplyr::group_by(`Sample Tag`) %>% dplyr::mutate(prop = ifelse(`Exact-Theo` > 0, 1/dplyr::n(), 0)) %>% dplyr::select(`Sample Tag`, r, `Exact-Theo`, prop) %>% dplyr::summarise(`Total Fraction` = sum(prop)) ```