---
title: "Plot design optimization"
author: "Anika Seppelt"
#date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{3. Plot design optimization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The function `metrics.variables` used for the calculation of stand-level variables and metrics (see vignette "Stand-level") requires arguments specifying the plot designs and sizes. If the optimal plot design and size for the calculation of stand-level variables is not known, the optimal plots design for the corresponding TLS data can be determined by two different approaches implemented in FORTLS. The approaches depend on whether field data for the sample plots is available or not. 

```{r include=FALSE}
dir.data <- system.file("data", package="FORTLS")
setwd(dir.data)
load("Rioja.data.RData")
load("Rioja.simulations.RData")
tree.tls <- Rioja.data$tree.tls
tree.field <- Rioja.data$tree.field
dir.data <- system.file("exdata", package="FORTLS")
library(FORTLS)
```

## Estimating optimal plot size without field data

If no field data is available, the function **`estimation.plot.size`** can be applied to determine the optimal plot design and size. This function uses the data frame containing the list of detected trees (introduced in **`tree.tls`**) and estimates stand-level density ($N$, trees/ha) and basal area ($G$, m$^2$/ha) for many simulated differently-sized plots and the three plot designs (circular fixed area, k-tree and angle-count) by increasing continuously their sizes. 

Thus, circular fixed area plots with increasing radius (increment of 0.1 m) to the maximum radius defined by **`radius.max`** in **`plot.parameters`** (by default set to 25, if radius is larger than furthest tree, the horizontal distance to this furthest tree is considered as maximum radius) will be simulated and for each plot, the variables (N and G) are estimated. Similarly, k-tree plots with tree numbers ($k$) ranging from 1 to **`k.max`** (specified in `plot.parameters`, default value set to 50 or total number of trees in the plot) and angle-count plots with increasing basal area factor (BAF, increments of 0.1 m$^2$/ha) to the maximum value specified by **`BAF.max`** in `plot.parameters` (set to 4 by default) are simulated and the respective stand-level variables are calculated. Optionally, the minimum diameter at breast height (**`dbh.min`**, in cm) to include the trees in the estimations can be defined. By default the minimum $dbh$ is set to 4 cm.

The function generates size-estimation charts i.e., plots showing the estimated stand-level density ($N$) and basal area ($G$) on the $y$ axes respective to the different plot sizes ($x$ axes). The estimations will be performed for simulated plots corresponding to all sample plots. By default the output graphs will contain one line for each sample plot. When **`average`** is set to `TRUE`, the average of all estimations (for all plots) as a continuous line and the standard deviation as grey shaded area will be drawn instead of multiple lines for each sample plot. One chart for each plot design is drawn by default. If **`all.plot.designs`** is set to `TRUE`, the line charts of all three plot design will be drawn in one graph with different colours for each plot design.

```{r fig.height=7, fig.width=7, fig.align = "center", message=FALSE, warning=FALSE}
estimation.plot.size(tree.tls = tree.tls,
                     plot.parameters = data.frame(radius.max = 25, k.max = 50, BAF.max = 4),
                     dbh.min = 4,
                     average = TRUE, all.plot.designs = FALSE)
```

The continuous line represents the average over all sample plots (i.e. 16 plots in the example shown here) of the estimated density ($N$) on the left and the basal area ($G$) on the right. The dotted line indicates the number of sample plots. This figure helps to find suitable plot designs for the calculation of stand-level metrics and variables. The optimal plot design and size should be chosen within a range where the estimated values for $N$ and $G$ reach a stable level. A too small plot leads to high errors of estimation, since only few trees enter the plot and therefore the sample is too small. In the example above, the basal area estimated for fixed area plots with radius smaller than 5 m is much higher (around 40-50 m$^2$/ha) than the true value (around 20 m$^2$/ha). On the other hand, too large plots come along with systematic errors due to occlusion of trees. Therefore, the basal area in the same example of fixed area plots with radius bigger than 20 m is estimated lower than the true value. In order to avoid both types of errors, the figure helps to find a plot size range with stable values.

## Validation with field data and optimizing plot design

When data from field measurements are available for the same sample plots, the TLS-based estimates can be validated and the optimal plot designs can be found applying functions implemented in FORTLS. In the first step of the optimization process, the function **`simulations`** simulates plots with incremental size and computes the corresponding stand-level metrics and variables (similar to the function `metrics.variables`, see "Stand-level" vignette). Based on the simulated data, two different processes can be performed. First, the bias between TLS data and field data for each individual estimated variable can be assessed with the function **`relative.bias`**. Second, correlations between all estimated variables and metrics based on TLS-data (output data of the `simulations` function) and the variables estimated from field data can be calculated with the **`correlations`** function. This function calculates both the Pearson and Spearman correlation coefficients. To visualize the correlation coefficients, heat maps can be drawn with the **`optimize.plot.design`** function.

### Plot simultaion and estimation of metrics and variables

The **`simulations`** function is applied as follows:

```{r eval=FALSE, include=TRUE}
simulations <- simulations(tree.tls = tree.tls, tree.ds = tree.ds, tree.field = tree.field,
            plot.design = c("fixed.area", "k.tree", "angle.count"),
            plot.parameters = data.frame(radius.max = 25, k.max = 50, BAF.max = 4),
            scan.approach = "single", var.metr = list(tls = NULL, field = NULL),
            dbh.min = 4, h.min = 1.3, max.dist = Inf,
            dir.data = dir.data, save.result = FALSE, dir.result = NULL)
```

#### The input data frames

Both TLS and field data from the same sample plots are required to compute the function. The TLS data introduced in **`tree.tls`** should have the same format as the data frame returned from the `tree.detection.single.scan` and `tree.detection.multi.scan` functions. Thus, each row must correspond to a detected tree and it must contain at least the following columns: `id`, `file`, `tree`, `x`, `y`, `phi.left`, `phi.right`, `horizontal.distance`, `dbh`, `num.points`, `num.points.hom`, `num.points.est`, `num.points.hom.est` and `partial.occlusion`. The data frame containing the field data must be inserted in the argument **`tree.field`**. Similar to the TLS data table, each row must correspond to a tree (specified in the columns `id` and `tree`) and the values for `horizontal.distance`, `dbh`, `total.height` and an integer value indicating whether the tree is dead (1) or alive (NA, specified in `dead`) must be included in the data frame. 

When the distance sampling method for correction of occlusion effects was applied (function `distance.sample`, see "Stand-level" vignette), a list with the results in the output data frames from the aforesaid function can be introduced in **`tree.ds`**. The list must contain at least the data frame `tree` with the detection probabilities (`P.hn`, `P.hn.cov`, `P.hr` and `P.hr.cov`) for each tree. By default `tree.ds` is set to `NULL` and as a result, the calculations of the variables based on occlusion correction will not be performed.

#### Specifying designs of simulated plots

A vector containing the names of the plot designs can specify the plot designs (`"fixed.area"`, `"k.tree"`, `"angle.count"`) that are to be considered for the simulations in **`plot.design`**. By default, this argument is set to `NULL` and all three plots designs will be considered. 

Furthermore the argument **`plot.parameters`** allows for manually specifying the design of the simulated plots. Many differently-sized plots of the plot designs specified in `plot.design` are simulated. The list introduced in `plot.parameters` can include the following elements to customize the generated plots. The elements `radius.max`, `k.tree.max` and `BAF.max` define the maximum radius (in m), the maximum number of trees and the maximum BAF (in m$^2$/ha) respectively to which the sizes of circular fixed area, k-tree and angle-count plots respectively should increase. By default the values are set to `radius.max = 25`, `k.tree.max = 50` and `BAF.max = 4`. The increment by which the sizes of circular fixed area and angle-count plots sequentially increase can also be customized by specifying the elements `radius.increment` and `BAF.increment` respectively. The default settings are `radius.increment = 0.1` (in m) and `BAF.increment = 0.1` (in m$^2$/ha). An additional element of the list can be `num.trees` defining the number of dominant trees per hectare (trees/ha). This value is needed for the calculation of dominant diamters and heights and is set to 100 trees/ha by default.

#### Further adjustable arguments

Similar to the other functions, the scan approach can be specified in **`scan.approach`** and is by default set to `"multi"`. Metrics and variables of interest can be defined as a vector in **`var.metr`**. Thus, only those metrics and variables named in the vector are calculated. If not specified, `var.metr` is set to `NULL` and all possible metrics and variables are computed. The arguments **`dbh.min`**, **`h.min`** and **`max.dist`** can optionally define the minimum $dbh$, height and maximum distance of a tree to be included in the calculations. The default values are `dbh.min = 4` (in cm), `h.min = 1.3` (in m) and `max.dist = NULL` (no maximal distance is considered). 

The argument **`dir.data`** should specify the working directory of the .txt files with the normalized reduced point clouds (output of `normalize` function). If not specified, it is set to `NULL` and the current working directory is assigned to it. The output files will be saved by default, since the argument **`save.result`** is set to `TRUE`, to the directory path indicated in **`dir.result`**. For each plot design (circular fixed area, k-tree and angle-count plots), a .csv file is created using the `write.csv` function from the [utils](https://CRAN.R-project.org/package=R.utils) package.

#### Output of the simulations function

The `simulations` function generates a list with one element for each plot design. The elements are data frames containing the simulated plot. Each row represents a simulated plot defined by their respective plot identification number `id` and their size determined by either `radius`, `k` or `BAF` depending on the plot design. The columns `N`, `G`, `V`, `V.user`, `W.user`, `d`, `dg`, `dgeom`, `dharm`, `h`, `hg`, `hgeom`, `hharm`, `d.0`, `dg.0`, `dgeom.0`, `dharm.0`, `h.0`, `hg.0`, `hgeom.0` and `hharm.0` display the stand-level variables based on the field data. The remaining columns contain the stand-level variables and metrics estimated for each plot based on the TLS data. As an example the data frame for circular fixed area plots is shown below.

```{r eval=FALSE, include=TRUE}
head(simulations$fixed.area)
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(head(Rioja.simulations$fixed.area), 
                                                       format = "html"), width = "100%")
```

As described above, plots with increasing sizes are estimated and the corresponding variables an metrics are calculated. The plots are ordered from the smallest to the biggest. Plots with small radius can not be simulated for all sample plots (see table above), since trees not always enter, because the plots are too small. But plots with e.g. a radius of 20 m, can be simulated for all sample plots (see end of table below).

```{r eval=FALSE, include=TRUE}
tail(simulations$fixed.area)
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(tail(Rioja.simulations$fixed.area), 
                                                       format = "html"), width = "100%")
```

### Calculation of relative bias

In order to estimate the goodness of the TLS-based estimation of the variables, the function **`relative.bias`** computes the relative bias between the variables estimated from field data and their respective TLS-based estimates. The relative bias is calculated for each sample plot and each simulated plot (i.e. different plot sizes and designs). Therefore, the input data for this function (introduced in **`simulations`**) must be a list of data frames containing the estimated variables (based on field and TLS data) for all the simulated plots. Thus, a similar list to the output of the `simulations` function (see above) is required, that has the same description and format. 

Optionally, the variables for which the relative bias will be computed can be specified in a vector in **`variables`**. Only, the names of the field data based estimates can be introduced. If not otherwise specified, the argument will be set to `c("N", "G", "V", "d", "dg", "d.0", "h", "h.0")` by default. Other possible variables are `dgeom`, `dharm`, `dg.0`, `dgeom.0`, `dharm.0`, `hg`, `hgeom`, `hharm`, `hg.0`, `hgeom.0` or `hharm.0`. 

The arguments **`save.result`** and **`dir.result`** define whether and to which directory the output files should be saved. Two different output files are generated. First, the data frames for each plot design (as shown below for circular fixed areas) are saved as .csv files using the `write.csv` function from the [utils](https://CRAN.R-project.org/package=R.utils) package. Second, interactive line charts representing the relative biases are saved as .html files by means of the `saveWidget` function in the [htmlwidgets](https://CRAN.R-project.org/package=htmlwidgets) package. An example of these interactive line charts is provided below.

```{r eval=TRUE}
bias <- relative.bias(simulations = Rioja.simulations,
              variables = c("N", "G", "d", "dg", "d.0", "h", "h.0"),
              save.result = FALSE, dir.result = NULL)
```

The function calculates the relative bias between the field data estimates (specified in `variables`) and the counterpart variables that are estimated based on TLS data. The TLS counterparts for the density (`N`) are the variables `N.tls`, `N.hn`, `N.hr`, `N.hn.cov`, `N.hr.cov` and `N.sh` for circular fixed area and k-tree plots, and `N.tls` and `N.pam` for angle-count plots. The same pattern applies to the basal area (`G`) and the volume (`V`) where the corresponding TLS-based estimates are `G.tls`, `G.hn`, `G.hr`, `G.hn.cov`, `G.hr.cov`, `G.sh` and `G.pam`, and `V.tls`, `V.hn`, `V.hr`, `V.hn.cov`, `V.hr.cov`, `V.sh` and `V.pam` respectively. In case of mean and dominant diameters (`d`, `dg`, `dgeom`, `dharm`, `d.0`, `dg.0`, `dgeom.0`, and `dharm.0`) and heights (`h`, `hg`, `hgeom`, `hharm`, `h.0`, `hg.0`, `hgeom.0` and `hharm.0`), for all three plot designs their respective counterpart variables are `d.tls`, `dg.tls`, `dgeom.tls`, `dharm.tls`, `d.0.tls`, `dg.0.tls`, `dgeom.0.tls` and `dharm.0.tls` (for the diameter), and `h.tls`, `hg.tls`, `hgeom.tls`, `hharm.tls`, `h.0.tls`, `hg.0.tls`, `hgeom.0.tls`, `hharm.0.tls` and in addition `P99` (for the height). The relative bias are calculated as follows

$$ 
\frac{\frac{1}{n} \sum_{i=1}^{n}y_i - \frac{1}{n}\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i}
$$

where $x_i$ is the value of the field estimate and $y_i$ the value of its TLS counterpart corresponding to plot $i$ of $n$ sample plots. For each plot size defined by the radius, k or BAF, the biases are calculated and stored as a data frame (shown below). Each row represents the a simulated plot of a certain size (here defined by `radius`) and the columns contain the calculate bias between the variables indicated in the column names. The two compared variables are joint with `.` as separation, e.g. `N.N.tls` means that the bias between `N` and `N.tls` was calculated. 

```{r eval=FALSE, include=TRUE}
head(bias$fixed.area)
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(head(bias$fixed.area), 
                                                       format = "html"), width = "100%")
```

For better visualization, line charts are created that show the relative bias of a given variable and plot design. As an example, the interactive graphic showing the relative bias of basal area estimations (`G`) for fixed are plots can be seen when following the link ([RB.G.fixed.area.html](https://www.dropbox.com/s/8a0hwh1xpqlprjc/RB.G.fixed.area.plot.html?dl=0)).

### Functions facilitating model-based or model-assisted sampling approaches 

To facilitate the application of model-based sampling, two additional functions are included in the FORTLS package. The function **`correlations`** computes the correlations between variables estimated from field data and those estimated from TLS data and calculates the respective Pearson and Spearman correlation coefficients. The results are saved as .csv files and represented as line charts and heat maps (when applying the function **`optimize.plot.design`**). 

#### Computing correlations

The **`correlations`** function computes the correlations for all the plot designs that are introduces as elements of a list in **`simulations`**. The format and description must be the same as the output list of the `simulations` function. Also similar to the function `relative.bias`, the variables for which the correlations are to be calculated can be specified in **`variables`**. By default, this argument is set to `variables = c("N", "G", "V", "d", "dg", "d.0", "h", "h.0")`. If only one of the two above-mentioned correlation measures should be calculated, it can be specified in **`method`**. This argument is set to `method = c("pearson", "spearman")` by default and both correlation coefficients are computed. 

```{r}
fixed.area.simulations <- list(fixed.area = Rioja.simulations$fixed.area[Rioja.simulations$fixed.area$radius < 7.5, ])
cor <- correlations(simulations = fixed.area.simulations,
             variables = c("N", "G", "d", "dg", "d.0", "h", "h.0"),
             method = c("pearson", "spearman"), 
             save.result = FALSE, dir.result = NULL)
```

In addition to the calculation of the correlation measures, the function also performs tests of association and returns the p-values. The output of this function is a list containing the following three elements `correlations`, `correlations.pval` and `opt.correlations`. Each of them is a list including, if not otherwise specified (in `method`), two elements `pearson` and `spearman`. These two elements are lists again that include separate data frames for each plot design (circular fixed area, k-tree and angle-count plots). The data frames contain the corresponding correlation coefficients (in `correlations`), the calculated p-values (in `correlations.pval`) and the optimal correlations for a given plot size and field data estimate (in `opt.correlations`).

```{r eval=FALSE, include=TRUE}
cor$correlations$pearson$fixed.area[20:26,1:15]
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(
  cor$correlations$pearson$fixed.area[20:26,1:15], format = "html"), width = "100%")
```

All mentioned data frames are divided into rows each of which represent a given plot size defined by `radius` (for circular fixed area plots), `k` (for k-tree plots) or `BAF` (for angle-count plots). The columns of the data frames in `correlations` (shown above) and `correlations.pval` (shown below) contain the calculated coefficients and p-values respectively for the corresponding correlation. The column names are composed of the two variables (e.g. `N` and `N.tls`) separated by `.` (giving `N.N.tls`) that were correlated as described for the `relative.bias` function.

```{r eval=FALSE, include=TRUE}
cor$correlations.pval$pearson$fixed.area[20:26,1:15]
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(
  cor$correlations.pval$pearson$fixed.area[20:26,1:15], format = "html"), width = "100%")
```

The data frames of the `opt.correlations` list (example shown below) are also divided into rows that represent the different plot sizes. For a given plot size and variable (specified in the argument `variables`), the best correlating TLS-based estimate and the corresponding correlation coefficient is displayed in this table. The columns named `<variable>.metric` (with `<variable>` being here `N`, `G`, `d`, `dg`, `d.0`, `h` and `h.0`) contain the TLS-based variable or metric that yielded the best correlation with the respective field data-based variable of the column name for a certain plot radius. The columns `<variable>.cor` display the measures of the respective correlations. That means, in the example shown here, the TLS-based estimate that yielded the best correlation with the field data based variable density (`N`) for circular fixed area plots with a radius of 4.4 m is `ID.rho`. And the correlation coefficient is 0.7286.

```{r eval=FALSE, include=TRUE}
cor$opt.correlations$pearson$fixed.area[20:26,]
```

```{r echo=FALSE}
kableExtra::scroll_box(kable_input = kableExtra::kable(
  cor$opt.correlations$pearson$fixed.area[20:26,], format = "html"), width = "100%")
```

The `correlations` functions creates different files and saves them (if `save.result = TRUE`, default setting) to the directory indicated in `dir.result`. These files are, on the one hand, .csv files of the data frames in the lists `correlations` and `opt.correlations` created by means of the `write.csv` function from the [utils](https://CRAN.R-project.org/package=R.utils) package. These .csv files will be named as `correlations.<plot design>.<method>.csv` and `opt.correlations.<plot design>.plot.<method>.csv` with `<plot design>` being `fixed.area.plot`, `k.tree.plot` or `angle.count.plot` and `<method>` being `pearson` or `spearman`. On the other hand, interactive line charts representing the correlation coefficients will be created for each variable (selected in `variables`) as .html file using the `saveWidget` function in the [htmlwidgets](https://CRAN.R-project.org/package=htmlwidgets) package. As an example, the interactive line chart for the variable height (`h`) and fixed area plots (pearson measure) is shown ([correlations.h.fixed.area.pearson.html](https://www.dropbox.com/s/etriqgcmbahab7z/correlations.h.fixed.area.pearson.html?dl=0)).

#### Visualizing correlations

In order to visualize the optimal correlations, the function **`optimize.plot.design`** creates heat maps and is applied as follows:

```{r eval=FALSE}
optimize.plot.design(correlations = cor$opt.correlations,
                     variables = c("N", "G", "d", "dg", "d.0", "h", "h.0"),
                     dir.result = NULL)
```

The function creates the heat maps based on the optimal correlation list (`opt.correlations` from the output of the `correlations` function) introduced in **`correlations`**. The introduced list must have the same format and description as the `opt.correlation` list. Similar to the other functions described above, the variables of interest can be selected in **`variables`** (default setting: `variables = c("N", "G", "V", "d", "dg", "d.0", "h", "h.0")`). This function generates interactive heat maps with the `saveWidget` function of the [htmlwidgets](https://CRAN.R-project.org/package=htmlwidgets) package and saves these graphics to the directory indicated in **`dir.result`** (or by default to the working directory). For each plot design and correlation measure a plot is generated and named as `opt.correlations.<plot design>.<method>.html` where `<plot design>` can be `fixed.area.plot`, `k.tree.plot` or `angle.count.plot` and `<method>` either `pearson` or `spearman` according to the plot design and correlation measure. 

As an example, the heat map of the pearson correlation coefficient for fixed area plots and pearson measure is provided and can be seen when opening the link ([opt.correlations.fixed.area.pearson.html](https://www.dropbox.com/s/e0wzq32ltjjeren/opt.correlations.fixed.area.pearson.html?dl=0)).