This repository contains an R package for generating synthetic alpha shapes by either (i) empirical sampling based on an existing dataset with reference shapes, or (ii) probabilistic sampling from a known distribution function on shapes.
Understanding morphological variation is an important task in many applications. Recent studies in computational biology have focused on developing tools for sub-image selection, which aims to identify the structural features that best describe the variation between classes of shapes. A major part of assessing the utility of these approaches is demonstrating their performance on both simulated and real datasets. However, real data can be difficult to access when building a model for shape statistics, and sample sizes are often small because the data are expensive to collect. Meanwhile, current shape simulation methods have mostly been limited to approaches that use black-box inference, which makes it difficult to systematically assess the power and calibration of sub-image models.
In this R package, we introduce the \(\alpha\)-shape sampler: a probabilistic framework for simulating realistic 2D and 3D biological shapes and images based on probability distributions which can be learned from real data or explicitly stated by the user.
The ashapesampler package supports two mechanisms for sampling shapes in two and three dimensions, which we outline below. The first strategy empirically samples new shapes based on an existing dataset; this is highlighted in the main text of Winn-Nuñez et al. The second strategy probabilistically samples new shapes from a known distribution; this approach is also implemented in this software package, with the corresponding theory derived in the Supporting Information of Winn-Nuñez et al.
The \(\alpha\)-shape sampler consists of four key steps:
1. Input aligned reference shapes as simplicial complexes. A simplicial complex object in this case is a list containing (a) the Euclidean coordinates of the vertices and (b) a list of all vertices, edges, faces, and tetrahedra. Functions are available to read OFF files into R in the correct format and to extract the simplicial complex information from a generated alpha complex. A method to convert a binary mask to a 2D simplicial complex for use in the algorithm can be found in the vignettes.
2. Calculate the reach for each shape in the dataset. The reach is estimated from the boundary points of the simplicial complex. Users can choose whether the estimated reach of a reference shape is summarized as the mean, median, or minimum across points; the default is the mean. Once the reach of each shape is known, users take a summary statistic (usually the minimum) over a subset of J randomly selected reference shapes to produce new shapes.
3. Sample new points using the combined point cloud of the J randomly selected shapes and the estimated reach tau derived from those J reference shapes. Parameters for rejection sampling can be adjusted by the user and are discussed further in the vignettes. Note that this step is generally the most computationally expensive; if it becomes a bottleneck, check the value of tau relative to the area/volume of the combined point cloud. Parallelizing also speeds up the algorithm.
4. Output the newly generated shape as an alpha shape object.
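Steps 1 and 2 can be illustrated with a minimal base-R sketch. It does not call the package: the field names `vertices` and `simplices`, the helper `summarise_reach`, and the per-point reach values are all hypothetical and serve only to show the shape of the data and the two-level summary (per shape, then across the J selected shapes).

```r
# Step 1 (sketch): a toy 2D simplicial complex, the unit square cut into two
# triangles. The field names `vertices` and `simplices` are illustrative only.
shape <- list(
  vertices = matrix(c(0, 0,  1, 0,  1, 1,  0, 1), ncol = 2, byrow = TRUE),
  simplices = c(
    as.list(1:4),                                       # 4 vertices
    list(c(1, 2), c(2, 3), c(3, 4), c(1, 4), c(1, 3)),  # 5 edges
    list(c(1, 2, 3), c(1, 3, 4))                        # 2 faces
  )
)

# Simplex dimension = number of vertex indices minus one.
simplex_dims <- lengths(shape$simplices) - 1
counts <- table(simplex_dims)

# Step 2 (sketch): summarise per-point reach estimates within each shape
# (mean by default), then summarise across the J shapes (minimum by default).
summarise_reach <- function(point_reach, per_shape = mean, across = min) {
  shape_reach <- vapply(point_reach, per_shape, numeric(1))
  across(shape_reach)
}

# Made-up per-boundary-point reach estimates for J = 3 reference shapes.
point_reach <- list(
  c(0.42, 0.55, 0.61),
  c(0.38, 0.47, 0.52),
  c(0.50, 0.58, 0.66)
)
tau <- summarise_reach(point_reach)
```

Passing `per_shape = median` or `per_shape = min` switches the within-shape statistic, mirroring the three choices described in step 2.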
Users should note that it is critical to align shapes to maximize the pipeline’s success and that there may be some manual parameter tuning for the best results.
Demonstrations for pipeline implementation are in the vignettes. Functions are broken into parts instead of integrated altogether so that users can troubleshoot the pipeline at different stages.
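Because the stages are separate, the point-sampling stage can be prototyped on its own. The following base-R sketch shows one simple form of rejection sampling around a combined point cloud. It is an assumption-laden illustration, not the package's implementation: the helper `sample_near_cloud`, the toy circular shapes, and the choice of tau itself as the acceptance radius are all hypothetical.

```r
# Hypothetical helper, not part of ashapesampler: keep uniform proposals that
# fall within `radius` of at least one point in `cloud`.
sample_near_cloud <- function(cloud, radius, n, max_iter = 1000) {
  d  <- ncol(cloud)
  lo <- apply(cloud, 2, min) - radius   # padded bounding box, lower corner
  hi <- apply(cloud, 2, max) + radius   # padded bounding box, upper corner
  accepted <- matrix(numeric(0), ncol = d)
  iter <- 0
  while (nrow(accepted) < n && iter < max_iter) {
    iter <- iter + 1
    # Propose n points uniformly over the padded bounding box.
    batch <- matrix(runif(n * d, rep(lo, each = n), rep(hi, each = n)), ncol = d)
    # Distance from each proposal to its nearest cloud point.
    nearest <- apply(batch, 1, function(p) {
      sqrt(min(rowSums(sweep(cloud, 2, p)^2)))
    })
    # Accept only proposals close enough to the cloud.
    accepted <- rbind(accepted, batch[nearest <= radius, , drop = FALSE])
  }
  accepted[seq_len(min(n, nrow(accepted))), , drop = FALSE]
}

# Toy "reference shapes": two concentric circles, combined into one cloud.
set.seed(1)
theta   <- seq(0, 2 * pi, length.out = 41)[-41]
shape_a <- cbind(cos(theta), sin(theta))
shape_b <- 1.1 * shape_a
cloud   <- rbind(shape_a, shape_b)

tau <- 0.3  # assumed reach, used here as the acceptance radius
new_points <- sample_near_cloud(cloud, radius = tau, n = 50)
```

The sketch also shows why a small tau relative to the extent of the point cloud slows this stage: the accepted region shrinks while the bounding box does not, so more proposals are rejected.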
Users can also use the ashapesampler package to generate shapes in two and three dimensions from probability distributions. This approach can prove particularly useful for simulating shapes and benchmarking the performance of different statistical methods. Here, we list the parameters for generating new shapes in two and three dimensions. Options for user-adjusted parameters and defaults can be found in the vignettes. Users should keep a few key points in mind when generating shapes this way:
* The bound parameter is the manifold from which points are sampled. At this time, the package only supports a square, a circle (i.e., a disk; the function assumes it is filled in), and an annulus in two dimensions. In three dimensions, it supports a cube, a sphere (i.e., a ball; the function assumes it is filled in), and a torus. The size of these manifolds can be specified using the rmax and rmin parameters, where applicable. Adjusting the size may affect computational time if the reach tau is not adjusted with it.
* The reach tau needs to be specified as a finite value in advance, as this hyperparameter affects the choice of alpha. The default value of tau is 1, but it can be any finite value. Keep in mind that the smaller tau is relative to the area or volume of the manifold, the more detail the generated shapes will have and the more time it will take to generate each new shape.
* By default, alpha will be as large as theoretically allowed. The smaller alpha is relative to tau, the more points will need to be sampled and the more time it will take to generate a new shape. This is particularly true when the goal is for shapes to have full connectivity (no isolated points) while also preserving the homology.
* At this time, the package only supports the truncated normal distribution for randomly selecting alpha. The bounds of this truncated normal can be adjusted by the user up to what is theoretically allowed. Keep in mind that the bounds of this distribution should generally keep alpha as large as possible for best computational performance.
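Two of the ingredients above, sampling points from a bounding manifold and drawing alpha from a truncated normal, can be sketched in base R. This is only an illustration under stated assumptions: the helpers `sample_annulus` and `rtruncnorm_sketch` are hypothetical names, and `alpha_upper` is a placeholder value rather than the theoretical bound used by the package.

```r
# Hypothetical helper: n points distributed uniformly (by area) on an annulus
# with inner radius rmin and outer radius rmax.
sample_annulus <- function(n, rmin, rmax) {
  r     <- sqrt(runif(n, rmin^2, rmax^2))  # sqrt gives uniform density in area
  theta <- runif(n, 0, 2 * pi)
  cbind(x = r * cos(theta), y = r * sin(theta))
}

# Hypothetical helper: truncated normal draws on [lower, upper] by
# inverse-CDF sampling.
rtruncnorm_sketch <- function(n, mean, sd, lower, upper) {
  u <- runif(n, pnorm(lower, mean, sd), pnorm(upper, mean, sd))
  qnorm(u, mean, sd)
}

set.seed(42)
pts <- sample_annulus(200, rmin = 0.5, rmax = 1)

tau         <- 1     # default reach stated in the text
alpha_upper <- 0.5   # placeholder upper bound, NOT the package's theoretical bound
alpha_lower <- 0.25  # placeholder lower bound
# Centre the distribution near the upper bound so that alpha stays large.
alpha <- rtruncnorm_sketch(10, mean = alpha_upper, sd = 0.1,
                           lower = alpha_lower, upper = alpha_upper)
```

Setting `rmin = 0` in `sample_annulus` reduces it to uniform sampling on a filled disk.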
The ashapesampler software requires the installation of the following R libraries:
Unless stated otherwise, the easiest way to install many of these packages is with the following example command entered in an R shell:
install.packages("alphahull", dependencies = TRUE)
Alternatively, one can also install R packages from the command line.
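Before installing anything, it can help to see which dependencies are already present. The sketch below uses only base R. The helper `missing_packages` is a hypothetical name, and the vector `required` contains only the package names mentioned in this README (alphahull, alphashape3d, TDA); the full dependency list above takes precedence.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file() returns "" for a package that cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Package names mentioned in this README; extend with the full list as needed.
required   <- c("alphahull", "alphashape3d", "TDA")
to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# install.packages(to_install, dependencies = TRUE)
```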
The code in this repository assumes that a basic C++ toolchain is already set up on the personal computer or cluster where it runs. If not, some of the packages (e.g., TDA and alphashape3d) needed to build alpha complexes and alpha shapes in three dimensions will not work properly. A simple option is to use gcc. macOS users may obtain this compiler collection by installing the Homebrew package manager and then typing the following into the terminal:
brew install gcc
For macOS users, the Xcode Command Line Tools include a GCC compiler. Instructions on how to install Xcode may be found here. Additional installs for macOS users are automake, curl, glfw3, glew, xquartz, and qpdf. For extra tips on how to run C++ on macOS, please visit here. For tips on how to avoid errors dealing with “-lgfortran” or “-lquadmath”, please visit here.
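A quick way to check from within R whether build tools are visible on the search path is sketched below. It only reports whether executables named gcc, g++, and make can be found; it does not verify that R is configured to use them, and the choice of these three tool names is an assumption.

```r
# Look up common build tools on the PATH; Sys.which() returns "" when a
# tool is not found.
tools     <- c("gcc", "g++", "make")
paths     <- Sys.which(tools)
available <- nzchar(paths)
names(available) <- tools
print(available)
```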
The package will eventually appear on CRAN, at which point it can be installed from there.
To install the package from GitHub, we recommend using the remotes package by running the command:
remotes::install_github('lcrawlab/ashapesampler')
To then load the package in R, use the command
library(ashapesampler)
Other common installation procedures may apply.
The vignettes folder contains the following demonstrations for running and analyzing results in the ashapesampler:
Additional vignettes and source code can be found in the corresponding results repository.
The auto3dgm paradigm for assigning landmarks via unsupervised learning can be found here.
Primate mandibular molar data and neutrophil binary masks can be accessed and downloaded here.
E.T. Winn-Nuñez, H. Witt, D. Bhaskar, R.Y. Huang, I.Y. Wong, J.S. Reichner, and L. Crawford. Generative modeling of biological shapes and images using a probabilistic \(\alpha\)-shape sampler. bioRxiv.
Please send any questions or feedback to the corresponding authors Emily Winn-Nuñez or Lorin Crawford.
We appreciate any feedback you may have with our repository and instructions.