---
title: "Multivariate O-statistics"
author: "Quentin D. Read, Isadora E. Fluck"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{multivariate-Ostats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png", 
  dev.args = list(type = "cairo-png")
)
```

```{r setup}
library(Ostats)
data(pitcher_traits)
```

# Overview

This vignette on multi-variable community-level trait overlap statistics is intended for users who are already familiar with the single-variable O-statistics we introduce in the [first package vignette](Ostats-introduction.html).

We decided to include functions in our package to estimate community-level niche overlap statistics (O-statistics) in multivariate space because of ecologists' long-standing interest in how species' niches take shape across multiple environmental dimensions. Since the niche of a species can be determined by several abiotic and biotic variables, Hutchinson (1957) introduced the mathematical formulation of a niche as an *n*-dimensional hypervolume. Since then, many approaches have been developed to measure the multivariate niche of a species (Green 1971; Swanson et al. 2015). Quantifying the niche in multiple dimensions can help understand how life strategy trade-offs and interactions among multiple traits shape organisms' structure and function.

Blonder and colleagues rekindled interest in using hypervolumes to describe species’ niches in multiple dimensions. They worked out the underlying theory (Blonder et al. 2014; Lamanna et al. 2014; Blonder 2018) and released the R package [hypervolume][hypervolume]. Their package uses stochastic geometry to estimate multivariate kernel densities and . The multivariate O-statistic estimation in the `Ostats` package relies on functions imported from [hypervolume][hypervolume].

In this vignette, we demonstrate how you can use functions in the `Ostats` package to study trait overlap in multiple dimensions.

# Example dataset

The dataset we use in this vignette includes traits collected on carnivorous pitchers of the northern pitcher plant, *Sarracenia purpurea*, at 5 sites in North America and represent a subset of data from Freedman et al. (2021). Pitcher traits measured include orthogonal rosette diameters, pitcher length, pitcher width, keel width, mouth diameter, and lip thickness. See Ellison and Gotelli (2002) for a diagram illustrating traits measured, or see the dataset documentation page for more details by typing `?pitcher_traits` in your console. Site locations and identifying information have been removed for this example.

Here we show a few randomly selected rows from the pitcher traits dataset. Here, all individuals are of the same species. Therefore we will use populations (sites) as the `sp` variable in our analysis, with a single value for `plots`. 

```{r}
set.seed(1)
idx <- sort(sample(nrow(pitcher_traits), 10, replace = FALSE))

knitr::kable(pitcher_traits[idx,])
```


# Calculating hypervolume O-statistics

Here we calculate the univariate O-statistics for each of the pitcher traits separately, then a single multivariate O-statistic representing overlap of the trait hypervolumes. 

When estimating hypervolumes, as with any other multivariate analysis, it is critical to ensure that the hypervolume axes are not too highly correlated with one another. If traits are too highly correlated, the calculated hypervolume will not accurately represent the distribution in trait space. In this case, we resolve the issue by excluding a few of the traits with high correlations. You could also choose to use a variance maximization procedure like PCA instead.

First ensure that the data are in the appropriate form for `Ostats()` and `Ostats_multivariate()`. For this example we will remove individuals with incomplete data. Using `cor()` on the cleaned dataset, we can see the relatively high correlations among some of the traits.

```{r}
pitcher_traits <- pitcher_traits[complete.cases(pitcher_traits), ]

pitcher_sites <- as.factor(pitcher_traits$site_id)

round(cor(pitcher_traits[, -(1:2)]), 2)
```

We select four traits that represent dimensions of different parts of the plant's anatomy. As a result they are relatively uncorrelated with one another: rosette diameter, pitcher width, mouth diameter, and lip thickness.

```{r}
traits_to_use <- c("rosette_diameter_1", "pitcher_width", "mouth_diameter", "lip_thickness")

pitcher_trait_matrix <- as.matrix(pitcher_traits[, traits_to_use])

round(cor(pitcher_trait_matrix), 2)
```

Next we center and scale all the columns to ensure that the hypervolume O-statistic is not influenced by different magnitudes of each trait.

```{r}
apply(pitcher_trait_matrix, 2, range)

pitcher_trait_matrix_scaled <- scale(pitcher_trait_matrix, center = TRUE, scale = TRUE)

round(apply(pitcher_trait_matrix_scaled, 2, range), 2)
```

## Univariate versus multivariate O-statistics

For demonstration purposes, let's calculate the univariate O-statistics for each of the four traits so that we can compare them to the multivariate statistic.

```{r univariate O-stats, message = FALSE, results = 'hide'}
pitcher_univariate <- Ostats(traits = pitcher_trait_matrix_scaled,
                             plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
                             sp = pitcher_sites,
                             run_null_model = FALSE
)
```

Now calculate the multivariate O-statistic for the traits combined. We still specify the argument `random_seed` even in the absence of a null model, because the hypervolumes are constructed with a stochastic algorithm.

```{r multivariate O-stats, message = FALSE, results = 'hide'}
pitcher_multivariate <- Ostats_multivariate(traits = pitcher_trait_matrix_scaled,
                                            plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
                                            sp = pitcher_sites,
                                            random_seed = 333,
                                            run_null_model = FALSE, 
                                            hypervolume_args = list(method = 'box'),
                                            hypervolume_set_args = list(num.points.max = 1000)
)
```

Notice that `Ostats_multivariate()` accepts additional arguments to `hypervolume::hypervolume()`, which it uses to construct the hypervolumes, and `hypervolume::hypervolume_set()`, which it uses to estimate the hypervolume overlaps. For example, the default method for hypervolume construction is `method = "gaussian"`, and the default maximum number of points to use for hypervolume overlap estimation is `num.points.max = 10^(3+sqrt(n))`, where `n` is the number of dimensions. In this vignette, we modified the defaults to use a different method (`"box"`) for hypervolume construction and fewer points (`1000`) for overlap estimation: `hypervolume_args = list(method = 'box'), hypervolume_set_args = list(num.points.max = 1000).` Those options speed processing time dramatically. See the help documentation for `hypervolume::hypervolume` and `hypervolume::hypervolume_set` for more details.

Compare the univariate and multivariate O-statistics.

```{r}
pitcher_univariate$overlaps_norm

pitcher_multivariate$overlaps_norm
```

The single multivariate O-statistic is not comparable to the univariate statistics because of differing dimensionality. Therefore we need to evaluate them against null models.

## Null models for multivariate O-statistics 

If `run_null_model` is set to `TRUE`, `Ostats_multivariate()` will do `nperm` iterations of a null model, defaulting to `99`. Here we use a smaller number for demonstration purposes.

First, for comparison purposes, run the null models for the univariate O-statistics for each of the four traits.

```{r univariate null, message = FALSE, results = 'hide'}
pitcher_univariate_withnull <- Ostats(traits = pitcher_trait_matrix_scaled,
                                      plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
                                      sp = pitcher_sites,
                                      random_seed = 666,
                                      run_null_model = TRUE,
                                      nperm = 50
)
```

Next, do the same for the multivariate overlap. 

```{r multivariate null, message = FALSE, results = 'hide'}
pitcher_multivariate_withnull <- Ostats_multivariate(traits = pitcher_trait_matrix_scaled,
                                            plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
                                            sp = pitcher_sites,
                                            random_seed = 555,
                                            run_null_model = TRUE,
                                            nperm = 50,
                                            hypervolume_args = list(method = 'box'),
                                            hypervolume_set_args = list(num.points.max = 1000)
)
```

Comparing the standardized effect sizes, normalized to remove the effect of varying sample sizes across sites, we can see that three of the four individual traits have O-statistics significantly lower than the null expectation (lower than the 2.5%ile of the null distribution), while the multivariate O-statistic is not significantly different from the null expectation:

```{r, echo = FALSE}
uni_result <- with(pitcher_univariate_withnull, rbind(overlaps_norm, 
                                                      overlaps_norm_ses$raw_lower,
                                                      overlaps_norm_ses$raw_upper))
multi_result <- with(pitcher_multivariate_withnull, c(overlaps_norm,
                                                      overlaps_norm_ses$raw_lower,
                                                      overlaps_norm_ses$raw_upper))
knitr::kable(data.frame(Ostat = c('observed', 'null 2.5%', 'null 97.5%'),
                        cbind(uni_result, multi_result)),
             row.names = FALSE,
             col.names = c('O-statistic', 'rosette diameter', 'pitcher width', 'mouth diameter', 'lip thickness', 'multivariate'),
             digits = 3)
```


# Visualizing hypervolume O-statistics

We can visualize the overlap of the different hypervolumes using the function `Ostats_multivariate_plot()`. This function returns a list of plot objects, one for each community. Each plot displays an upper triangular matrix of panels, one for each pair of traits, with the cross-sections of the hypervolumes in those two dimensions, as well as the raw data points.

For example, we can plot the overlap of the hypervolumes in 4-dimensional space formed by the individual traits of the pitcher plants at each site. Note that although we calculated the hypervolume overlaps on the scaled traits, we are plotting the hypervolumes using the unscaled trait axes. However, this requires us to set the argument `contour_level = 0.0001` because the densities of the unscaled hypervolumes are too low to plot using the default `contour_level = 0.01`.

```{r default plot, message = FALSE, fig.height = 9, fig.width = 9}
Ostats_multivariate_plot(plots = factor(rep(1, nrow(pitcher_trait_matrix))),
                         sp = pitcher_sites,
                         traits = pitcher_trait_matrix,
                         contour_level = 0.0001,
                         overlap_dat = pitcher_multivariate
)
```

We can customize the plot by supplying a color palette to the `colorvalues` argument (the default is a colorblind-friendly palette imported from [viridis](https://CRAN.R-project.org/package=viridis/vignettes/intro-to-viridis.html)), and add additional spacing around the hypervolume contours using the `axis_expansion` argument. The default value of this argument is `0.01`, or 1% in all directions. Finally, we can suppress plotting the raw data and only plot the contours using `plot_points = FALSE`.

```{r custom plot, message = FALSE, fig.height = 9, fig.width = 9}
Ostats_multivariate_plot(plots = factor(rep(1, nrow(pitcher_trait_matrix))),
                         sp = pitcher_sites,
                         traits = pitcher_trait_matrix,
                         contour_level = 0.0001,
                         overlap_dat = pitcher_multivariate,
                         colorvalues = rainbow(length(unique(pitcher_sites))),
                         axis_expansion = 0.05,
                         plot_points = FALSE
)
```

The code underlying `Ostats_multivariate_plot()` was adapted from the `plot.Hypervolumelist()` function in the [hypervolume][hypervolume] package.

# Some practical notes

When working with multivariate trait data, it is critical to examine correlations among traits in the dataset. If you construct hypervolumes with traits that are highly correlated and measure their overlap, you will get spurious results. In addition, it's important to consider the issue of dimensionality. It is difficult if not impossible to estimate hypervolume overlap in more than five or six dimensions. A study that quantified trait space across a wide variety of systems found that using three to six dimensions jointly maximizes trait space quality and computational feasibility (Mouillot et al., in press). Others have critiqued using hypervolume kernel density estimation to quantify species niches (Qiao et al. 2017); the technical issues only get worse as more dimensions are added. Using a variance maximization technique like PCA could be helpful to reduce dimensionality. Regardless of what method you choose, the absolute amount of overlap can't be compared among hypervolumes with different dimensionality, making it especially important to compare the O-statistics with null distributions. 

Finally, some work has shown that trait-based inference derived from analyzing multiple traits may not improve our understanding of biological systems in many cases, relative to inference based on one well-chosen biologically meaningful trait, especially when many commonly-measured traits are different manifestations of the same few underlying biological processes (Swenson 2013, but see Kraft et al. 2015). In light of that, you should carefully consider which trait or traits are relevant to your research question. Whenever possible, compare inference based on single trait overlap to the inference you get when considering multiple traits, and select the most parsimonious model.

# Works cited

- Blonder, B. 2018. Hypervolume concepts in niche- and trait-based ecology. *Ecography* 41:1441–1455.
- Blonder, B., C. Lamanna, C. Violle, and B.J. Enquist, 2014. The n-dimensional hypervolume. *Global Ecology and Biogeography* 23:595–609.
- Ellison, A.M., and N.J. Gotelli. 2002. Nitrogen availability alters the expression of carnivory in the Northern Pitcher Plant, *Sarracenia purpurea*. *Proceedings of the National Academy of Science* 99(7):4409-12.
- Freedman, Z.B., A. McGrew, B. Baiser, M. Besson, D. Gravel, T. Poisot, S. Record, L.B. Trotta, and N.J. Gotelli. 2021. Environment-host-microbial interactions shape the *Sarracenia purpurea* microbiome at the continental scale. *Ecology* 102(5):e03308.
- Green, R.H. 1971. A multivariate statistical approach to the Hutchinsonian niche: bivalve molluscs of central Canada. *Ecology* 52:543–556.
- Hutchinson, G.E. 1957. Concluding remarks. *Cold Spring Harb. Symp. Quant. Biol.* 22:415–427.
- Kraft, N.J.B., O. Godoy, and J.M. Levine. 2015. Plant functional traits and the multidimensional nature of species coexistence. *Proceedings of the National Academy of Sciences* 112:797–802.
- Lamanna, C., B. Blonder, C. Violle, N.J.B. Kraft, B. Sandel, I. Imova, J.C. Donoghue, J.-C. Svenning, B.J. McGill, B. Boyle, V. Buzzard, S. Dolins., P.M. Jorgensen., A. Marcuse-Kubitza, N. Morueta-Holme, R.K. Peet, W.H. Piel, J. Regetz, M. Schildhauer, N. Spencer, B. Thiers, S.K. Wiser., and B.J. Enquist. 2014. Functional trait space and the latitudinal diversity gradient. *Proceedings of the National Academy of Sciences* 111:13745–13750. 
- Mouillot, D., N. Loiseau, M. Grenié, A.C. Algar, M. Allegra, M.M. Cadotte, N. Casajus, P. Denelle, M. Guéguen, A. Maire, B. Maitner, B.J. McGill, M. McLean, N. Mouquet, F. Munoz, W. Thuiller, S. Villéger, C. Violle, and A. Auber. The dimensionality and structure of species trait spaces. *Ecology Letters*, in press.
- Qiao, H., L.E. Escobar, E.E. Saupe, L. Ji, and J. Soberón. 2017. A cautionary note on the use of hypervolume kernel density estimators in ecological niche modelling. *Global Ecology and Biogeography* 26:1066–1070.
- Swanson, H.K., M. Lysy, M. Power, A.D. Stasko, J.D. Johnson, and J.D. Reist. 2015. A new probabilistic method for quantifying n-dimensional ecological niches and niche overlap. *Ecology* 96:318–324.
- Swenson, N.G. 2013. The assembly of tropical tree communities – the advances and shortcomings of phylogenetic and functional trait analyses. *Ecography* 36:264–276.


[hypervolume]: https://cran.r-project.org/package=hypervolume