---
title: "Note on OBI score aggregation"
author: "Brent Riechelman, Yuki Fujita"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: [packages.bib, vignettes_references.bib]
vignette: >
  %\VignetteIndexEntry{Note-on-OBI-score-aggregation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = FALSE,
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

# Introduction
The OBIC is a framework that takes a multitude of soil parameters and variables from agricultural fields and ultimately gives a single value expressing the soil quality of that field. To take this multitude of measured, modeled and calculated values to a single value between 0 and 1, three aggregation steps take place as illustrated below.

```{r include score integration image, echo=FALSE, out.width = '85%', out.height = '85%', fig.cap = 'Figure 1. Graphic representation of how measured soil properties are aggregated to scores.'}
# include graphic
knitr::include_graphics('../vignettes/OBIC_score_integratie_2.png')
```

There is no scientific principle dictating how this aggregation should be done and there are several ways to do the aggregation. For example; averaging, linearly weighted averaging, logarithmically weighted averaging. The last one is used in OBIC.
This document dives deeper into the three aggregation steps within the framework and will explain  why logarithmically weighted aggregation is chosen. We will also compare the three methods of aggregation to illustrate the influence the aggregation method on the final score, which provides a kind of sensitivity analysis.
The aggregation methods will be compared with a mock dataset of a selection of soil functions.
Demonstration of aggregation will be performed using  the dataset `binnenveld`. The dataset contains soil properties from 11 agricultural fields in the surroundings of Wageningen, with different soil texture and land use, and is documented in `?binnenveld`.

```{r load binnenveld}

  # load packages
  library(OBIC); library(data.table); library(ggplot2); library(patchwork)
  setDTthreads(1)

  # load data
  dt <- OBIC::binnenveld[ID==1]
  
  
```

# Aggregation
## Pre-processing
After calculating soil function scores and in prior to the aggregation procedure, a reformatting step takes place. 
The reformatting step consists of the following tasks:

  * it is assessed which indicators are relevant for a field given its soil type and crop category using the `weight.obic` table.
  * year numbers are assigned from 1 to n with one being the most recent year (used later for the aggregation over years)
  * a molten data.table is created with all indicators in a single column and soil type, crop category and year as identifying variables
  * indicators are assigned to  categories (chemical, physical, biological, environmental, management)
  * the number of indicators per category is counted (used later for the aggregation over categories)
  * a correction factor per score is calculated based on the height of the score (used later for the aggregation within each category)
  * soil function scores irrelevant to the land use are set to -999
  
```{r reformatting code, echo = TRUE, eval=FALSE}
  # Step 3 Reformat dt given weighing per indicator and prepare for aggregation  ------------------
    
    # load weights.obic (set indicator to zero when not applicable)
    w <- as.data.table(OBIC::weight.obic)
    
    # Add years per field
    dt[,year := 1:.N, by = ID]
    
    # Select all indicators used for scoring
    cols <- colnames(dt)[grepl('I_C|I_B|I_P|I_E|I_M|year|crop_cat|SOILT',colnames(dt))]

    # Melt dt and assign main categories for OBI
    dt.melt <- melt(dt[,mget(cols)],
                    id.vars = c('B_SOILTYPE_AGR','crop_category','year'), 
                    variable.name = 'indicator')
    
    # add categories relevant for aggregating
    # C = chemical, P = physics, B = biological, BCS = visual soil assessment
    # indicators not used for integrating: IBCS and IM
    dt.melt[,cat := tstrsplit(indicator,'_',keep = 2)]
    dt.melt[grepl('_BCS$',indicator) & indicator != 'I_BCS', cat := 'IBCS']
    dt.melt[grepl('^I_M_',indicator), cat := 'IM']
    
    # Determine number of indicators per category
    dt.melt.ncat <- dt.melt[year==1 & !cat %in% c('IBCS','IM')][,list(ncat = .N),by='cat']
    
    # add weighing factor to indicator values
    dt.melt <- merge(dt.melt,w[,list(crop_category,indicator,weight_nonpeat,weight_peat)], 
                     by = c('crop_category','indicator'), all.x = TRUE)
    
    # calculate correction factor for indicator values (low values have more impact than high values, a factor 5)
    dt.melt[,cf := cf_ind_importance(value)]
    
    # calculate weighted value for crop category
    dt.melt[,value.w := value]
    dt.melt[grepl('veen',B_SOILTYPE_AGR) & weight_peat < 0,value.w := -999]
    dt.melt[!grepl('veen',B_SOILTYPE_AGR) & weight_nonpeat < 0,value.w := -999]
    
```

```{r eval = FALSE, include = FALSE}
# YF: I think this paragraph is not necessary
# After reformatting the data in step 3, an indicator data.table is created in step 4. This data.table uses the soil function scores adjusted for their applicability for the soiltype and crop category. In step 4 indicators are calculated to display them as output. In step 5 the total OBI score is calculated, since part of step 5 overlaps with step 4, we only discuss step 5 onwards. 
```

## Aggregation within category
To aggregate scores, the relevant columns and rows are taken from the molten data.table.

```{r echo = TRUE, eval=FALSE}
 # Step 5 Add scores ------------------
    
    # subset dt.melt for relevant columns only
    out.score <-  dt.melt[,list(cat, year, cf, value = value.w)]
  
    # remove indicator categories that are not used for scoring
    out.score <- out.score[!cat %in% c('IBCS','IM','BCS')]
```    

The indicators within each category are aggregated to a single score per category (chemical, physical, biological, management, environmental) using the correction factor (*vcf*) calculated previously using `cf_ind_importance()`. This correction factor *vcf* gives a higher weight to indicators with a lower score, as:
$$
vcf = 1/(I+0.2)
$$
where *I* is the score of the indicator.
This way, the lowest indicator, supposedly also the most limiting factor for crop production, becomes more important. Consequently, improving a low scoring indicator by 0.1 has a greater impact on the aggregated category score than improving a high scoring indicator by the same amount, making it more worthwhile to invest in the poorest and most limiting indicator.

Subsequently, the score of each category is computed by summing up the weighted values (scaled by the sum of all weights) of all soil indicators within the category:
$$
S =  \sum_{i}(I_{i} \frac{vcf_{i}}{\sum_{i}vcf_{i}})
$$
where *S* is the score of the category, *vcf*<sub>i</sub> is the weighing factor of the indicator *i*, *I*<sub>i</sub> is the score of the indicator *i*. This gives a single score for each of the five indicator categories (chemical, physical, biological, management, and environmental) for a specific year.


```{r echo = TRUE, eval=FALSE}    
    # calculate weighted average per indicator category
    out.score <- out.score[,list(value = sum(cf * pmax(0,value) / sum(cf[value >= 0]))), by = list(cat,year)]
  
      # for case that a cat has one indicator or one year and has NA
      out.score[is.na(value), value := -999]
```

## Aggregation over years
To account for the entire crop rotation, OBIC aggregates scores of multiple years. For the aggregation over years, another correction factor *ycf* is used to give more weight to recent years on a logarithmic scale. OBIC set the maximum length of period as 10 years, as crop rotation in the Netherlands are hardly ever longer than 10 years. When data older than 10 years ago is used, then those years get the same weight as 10 years ago.
*ycf* is formulated as:
$$
ycf = ln(12 - min(y, 10))
$$
where *y* is the length of years before the assessment. *y = 1* means the year for which the assessment is conducted for (i.e. the most recent year).

This gives the correction factors for a period of eleven years as follows (from the most recent years to 11 years before):
```{r output year cf}
  # create data
  y <- 1:11
  cf <- log(12 - pmin(10, y))
  cat(round(cf, 3))
```

The most recent year carries about `r round(cf[1]/cf[10],digits =1)` times the weight of the tenth year. Notice that years ten and eleven have the same correction factor value, the minimum *ycf* value for a year is equal to that of year ten.

More priority (weight) is given to recent years because they better reflect the current situation. Additionally, changes in management or soil properties have a more visible effect on the scores in the recent years.

Aggregation of scores over years is done with the following two lines of code. This is analogue to the aggregation procedure within each category as described above (i.e. sum of weighted score, scaled by the sum of all weighing factors).

The aggregation procedure is coded in the following lines.
      
```{r echo = TRUE, eval=FALSE}
            
    # calculate correction factor per year; recent years are more important
    out.score[,cf := log(12 - pmin(10,year))]
  
    # calculate weighted average per indicator category per year
    out.score <- out.score[,list(value = sum(cf * pmax(0,value)/ sum(cf[value >= 0]))), by = cat]
```

This gives us a single score for each of the five indicator categories (chemical, physical, biological, management, and environmental), without time dimension.

## Aggregation to single OBI score
The scores of five indicator categories are aggregated to a single, holistic, OBI-score. The category scores are weighed logarithmically based on the number of indicators underlying the category. The number of indicators per category was retrieved previously with the line  
`dt.melt.ncat <- dt.melt[year==1 & !cat %in% c('IBCS','IM')][,list(ncat = .N),by='cat']`.  
Now its merged with our score data.table.

```{r echo = TRUE, eval=FALSE}
      # merge out with number per category
      out.score <- merge(out.score,dt.melt.ncat, by='cat')
```

The correction factor for each category, *ccf*, are computed based on the number of indicators as:

$$
ccf = ln(ncat+1)
$$
where *ncat* is the number of the underlying indicators within the category.
The weights for categories with 1 to 10 indicators are: `r round(log(1:10 +1),2)`. Thus, a category based on 10 indicators affects the total score roughly `r round(log(1:10 +1)[10]/log(1:10 +1)[1],1)` times more than a category based on 1 indicator. The idea behind giving more weight to categories with more underlying indicators sprouts from the idea that such a category is better supported by measurable data and better understood. 
Finally, the total OBIC score is calculated by summing up the weighted scores of 5 categories and dividing it by the sum of the weighing factors, in the same way as the other 2 aggregation steps.

```{r eval = FALSE, include = FALSE}
# this text is probably wrong
# Furthermore, by aggregating indicators to categories and then to a score rather than directly from indicators to a score; individual indicators from categories with few underlying indicators, affect the holistic score more than indicators in categories with many indicators. For example, if there is one biological indicator, its weight in affecting the holistic score is log(1+1)= `r round(log(1+1),2)`, while a indicator within a chemical indicator with nine indicators individually only weighs log(9+1)/9= `r round(log(9+1)/9,2)`. While on category level, biology only weighs `r round(log(1+1),2)` and chemical `r round(log(9+1),2)`.
```

This aggregation procedure is coded in the following lines.

```{r echo = TRUE, eval=FALSE}
    # calculate weighing factor depending on number of indicators
    out.score[,cf := log(ncat + 1)]
    # calculated final obi score
    out.score <- rbind(out.score[,list(cat,value)],
                       out.score[,list(cat = "T",value = sum(value * cf / sum(cf)))])
```  

After the aggregation there is just a bit of code to format the names of the scores.
```{r echo = TRUE, eval=FALSE}
    # update element names
    out.score[,cat := paste0('S_',cat,'_OBI_A')]
    out.score[, value := round(value,3)]
```

## Brief recap

* Soil functions with low scores gain more weight than ones with high scores because these soil functions are supposed to be more limiting. This makes it more worthwhile (both in reality as for the OBI score) to invest in improving low scores
* Values from recent years count more than values from long ago. Recent years are more reflective of the current situation and it becomes easier to see the effect of changes in management or soil properties in subsequent years
* Categories with more underlying indicators have more weight in determining the total OBI score. This is because these indicators are better understood and supported and may also be more important.


```{r make mock data, eval= TRUE}

# make mock data and calculate scores with different aggregation methods (averaging without weight and averaging with linearly changing weight)
# data like:
# soil_function_value|indicator|group|year|cf_base|cf_noweight|cf_linearweight|score_base|score_noweight|score_linearweight

# visualise differences

# make veldnr
fieldid <- 1

# define standard deviation 
std <- 0.2

# make indicator
inds <- c('I_C_CEC', 'I_C_CU', 'I_C_K', 'I_C_MG', 'I_C_N', 'I_C_P', 'I_C_PH', 'I_C_S', 'I_C_ZN',
          'I_B_DI', 'I_B_SF', 
          'I_E_NGW', 'I_E_NSW',
          'I_M',
          'I_P_CEC', 'I_P_CO', 'I_P_CR', 'I_P_DS', 'I_P_DU', 'I_P_SE', 'I_P_WRI', 'I_P_WS')

# make jaar
year <- 1:10

# combine in dt
dt <- data.table(field = sort(rep(fieldid,length(inds)*length(year))),
                 indicator = sort(rep(inds, length(year)*length(fieldid))),
                 year = rep(year, length(inds)*length(fieldid))
                 )

# add category
dt <- dt[,cat := tstrsplit(indicator,'_',keep = 2)]

# iteratively add fields
dto <- data.table(field = NULL, indicator = NULL, year = NULL)
for(i in 1:100){
  dtn <- dt
  dtn <- dtn[,field := i]
  dto <- rbindlist(list(dto, dtn))
}

# dto is a almost ready set of 100 fields, only values and value description need to be added
set.seed(222)

# helper function to make random values within 0:1
rtnorm <- function(n, mean = 0, sd = 1, min = 0, max = 1) {
    bounds <- pnorm(c(min, max), mean, sd)
    u <- runif(n, bounds[1], bounds[2])
    qnorm(u, mean, sd)
}

# make baseline
dt1 <- copy(dto)
dt1 <- dt1[,field := field+100-1]
dt1 <- dt1[,treatment := 'baseline']
dt1 <- dt1[,value := rtnorm(n = nrow(dt1),mean = 0.7, sd = std)]

# make treatment where c = 0.3
dt2 <- copy(dto)
dt2 <- dt2[,field := field+200-1]
dt2 <- dt2[,treatment := 'low C values']
dt2 <- dt2[cat == 'C',value := rtnorm(n = nrow(dt2[cat=='C']),mean = 0.3, sd = std)]
dt2 <- dt2[!cat == 'C',value := rtnorm(n = nrow(dt2[!cat=='C']),mean = 0.7, sd = std)]

# make treatment where B = 0.3
dt3 <- copy(dto)
dt3 <- dt3[,field := field+300-1]
dt3 <- dt3[,treatment := 'low B values']
dt3 <- dt3[cat == 'B',value := rtnorm(n = nrow(dt3[cat=='B']),mean = 0.3, sd = std)]
dt3 <- dt3[!cat == 'B',value := rtnorm(n = nrow(dt3[!cat=='B']),mean = 0.7, sd = std)]

# make treatment where one C indicator = 0
dt4 <- copy(dto)
dt4 <- dt4[,field := field+400-1]
dt4 <- dt4[,treatment := 'one low C']
dt4 <- dt4[indicator == 'I_C_CEC',value := 0]
dt4 <- dt4[!indicator == 'I_C_CEC',value := rtnorm(n = nrow(dt4[!indicator == 'I_C_CEC']),mean = 0.7, sd = std)]

# make where one B indicator = 0
dt5 <- copy(dto)
dt5 <- dt5[,field := field+500-1]
dt5 <- dt5[,treatment := 'one low B']
dt5 <- dt5[indicator == 'I_B_DI',value := 0]
dt5 <- dt5[!indicator == 'I_B_DI',value := rtnorm(n = nrow(dt5[!indicator == 'I_B_DI']),mean = 0.7, sd = std)]

# make treatment where recent years score low and old years high
dt6 <- copy(dto)
dt6 <- dt6[,field := field+600-1]
dt6 <- dt6[,treatment := 'Recent years low']
dt6 <- dt6[year %in% 1:5, value := rtnorm(n = nrow(dt6[year %in% 1:5]), mean = 0.3, sd = std)]
dt6 <- dt6[!year %in% 1:5, value := rtnorm(n = nrow(dt6[!year %in% 1:5]), mean = 0.7, sd = std)]

# make treatment where recent years score high and old years low
dt7 <- copy(dto)
dt7 <- dt7[,field := field+700-1]
dt7 <- dt7[,treatment := 'Recent years high']
dt7 <- dt7[!year %in% 1:5, value := rtnorm(n = nrow(dt7[!year %in% 1:5]), mean = 0.3, sd = std)]
dt7 <- dt7[year %in% 1:5, value := rtnorm(n = nrow(dt7[year %in% 1:5]), mean = 0.7, sd = std)]

# combine all data
dta <- rbindlist(list(dt1, dt2, dt3, dt4, dt5, dt6, dt7))

# make sure all values are between 0 and 1
if(any(dta$value >1|dta$value<0)){cat('values outside acceptable bounds')}

```

## Comparison with other aggregation methods
By now, we have some understanding of how measured soil function data are aggregated to an integral score within the current OBIC framework. So, now we can explore and reflect on some of the choices that were made in designing this aggregation process. The first choice we will reflect upon is that of the correction factors. In the OBIC framework, these are determined logarithmically but could also be determined linearly or not be used at all. Second, we will reflect on our choice of 2-step aggregation (i.e. first aggregated to categories and then aggregated to a holistic score),instead of aggregating indicators directly to a holistic score.

### Data description
To reflect on alternative aggregation methods we have made a mock data.table similar to a data.table in the obic_field function just before aggregating scores. We will compare the aggregation methods with seven scenarios or treatments. The treatments are as follows:

1. **Baseline** means of all values are around 0.67
2. **Low C values** means of Chemical indicators are around 0.33, indicators in other categories are around 0.67
3. **Low B values** means of Biological indicators are around 0.33, indicators in other categories are around 0.67
4. **One low C value** one chemical indicator is set to 0, all other values are generated as in the baseline
5. **One low B value** one biological indicator is set to 0, all other values are generated as in the baseline
6. **Recent years low** mean values of the most recent five years are around 0.33, while the five years before that have mean values of around 0.67
7. **Recent years high** mean values of the most recent five years are around 0.67, while the five years before that have mean values of around 0.33

Each treatment has a 100 replicates whose soil function scores are randomly drawn from a distribution with a standard deviation of approximately `r std[1]`. The mean of the distributions depends on the scenario. All values are in the 0 to 1 range.

```{r plot orignal values histogram,fig.width = 7, fig.height = 4,fig.fullwidth = TRUE, fig.cap = 'Figure 5. Distribution of indicator values per scenario as histogram'}
ggplot(dta, aes(x = value, fill = cat)) +
  geom_histogram(bins = 40) + 
  theme_bw() +facet_wrap(~treatment, ncol = 4)

```

```{r table with mock data before aggregation}
# get relevant data from dta
dtat <- dta[, mean(value), by = c( 'cat', 'treatment')]

# rounc V1
dtat <- dtat[, V1 := round(V1, digits = 3)]

# improve category descrition
dtat <- dtat[,cat := paste('mean', cat)]

# dcast
dtat <- dcast(dtat ,treatment~cat, value.var = 'V1')

# factorise and order treatment
dtat <- dtat[, treatment := factor(treatment, levels = c('baseline', 'low B values', 'low C values',
                                                         'one low B', 'one low C',
                                                         'Recent years high', 'Recent years low'))]

# order dtat
setorder(dtat)

# make table 
  knitr::kable(dtat,
               caption = 'Mean scores per category for each scenario')

```


### Correction factors
To compare the influence of different correction factors, we conducted the three aggregation steps with three methods to compute a correction factor (*cf*): 

* logarithmically
* linearly
* no correction (averaging all values, cf = 1)

The log and linear cf's are illustrated in Figure 2 in the range that they operate. **value** is the correction factor for the indicator, which ranges between 0 to 1. **year** is the correction factor for the year a measurement is from, 1 being the most recent year, 10 being ten years earlier. **ncat** is the correction for the number of soil indicators within a category. For Chemical indicators, this typically is `r length(grep('I_C_', names(obic_field_dt(binnenveld[ID==1], output = 'indicators'))))`.
The slope of the linear correction factor is chosen such that the highest and lowest correction factor is identical between linear and logarithmic scale. In practice, another slope could be chosen for linear aggregation.
The no correction method is not presented in Figure 2 as it would be a horizontal line with an arbitrary value.

```{r plot mock cf, fig.width = 7, fig.height = 3,fig.fullwidth = TRUE, fig.cap = 'Figure 2. Correction factors calculated with linear or logarithmic methods per aggregation step.'}

  # plot correction factors
  pdtlog <- data.table(x = c(seq(0,1,0.1),rep(0:10,2)),
                       cf_type = c(rep('value',11),rep('year',11), rep('ncat', 11)))
  # calc cf's log
  pdtlog <- pdtlog[cf_type == 'value', cf := OBIC::cf_ind_importance(x)]
  pdtlog <- pdtlog[cf_type == 'year', cf := log(12 - pmin(10,x))]
  pdtlog <- pdtlog[cf_type == 'ncat', cf := log(x + 1)]
  pdtlog[,cf_method := 'log']

  # calc cf's linear
  pdtlin <- data.table(x = c(seq(0,1,0.1),rep(0:10,2)),
                      cf_type = c(rep('value',11),rep('year',11), rep('ncat', 11)))
  pdtlin <- pdtlin[cf_type == 'value', cf := 5-4.17*x]
  pdtlin <- pdtlin[cf_type == 'year', cf := 2.59-0.19*x]
  pdtlin <- pdtlin[cf_type == 'ncat', cf := x*log(11)/10]
  pdtlin[,cf_method := 'linear'] 

  # combine
  pdt <- rbindlist(list(pdtlog, pdtlin))

  # format pdt
  pdt <- pdt[,cf_type := factor(cf_type, levels = c('value', 'ncat', 'year'))]

# plot
gg <- ggplot(pdt, aes(x = x, y = cf, color = cf_method, group = cf_type))+
      geom_point() +
      theme_bw() + 
      facet_wrap(~cf_type, ncol = 3, scales = 'free') + 
      scale_colour_viridis_d()+
      xlab('') + ylab('cf (weight)')

# plot a line in each
for(i in 1:uniqueN(pdt$cf_method)){
  gg <- gg + geom_line(data = pdt[cf_method == unique(pdt$cf_method)[i]], color = c('#FDE725FF', '#440154FF')[i])
}

# plot gg
gg
```

```{r calc correction factors}

# Use three ways to calc correction factors (giving weight to each value), log (standard in OBIC), lin (linearly increasing/decreasing), non (everything has the same weight)

# value correction factors ======
dt <- copy(dta)

# function to add  correction factors
addcf <- function(dt){
  
  # copy input
  dt.int <- copy(dt)
  
  # add correction factor for indicator value
	dt.int[,log := OBIC::cf_ind_importance(value)]
	dt.int[,lin := 5-4.17*value]
	dt.int[,non := 1]
	
	# melt dt by cf method
	dt.int <- melt(dt.int, measure.vars = c('log', 'lin', 'non'), value.name = 'v_cf', variable.name = 'cf_method')
	
	# calculate cf for cat ====
	dt.int[,ncat := .N,by=c('field','year','cat','cf_method')]
	
	# add correction factor for number of categories
	dt.int[cf_method == 'log',c_cf := log(ncat + 1)]
	dt.int[cf_method == 'lin',c_cf := ncat*log(10 + 1)/10]
	dt.int[cf_method == 'non',c_cf := 1]
	
	# dd correction factor for number of years
	dt.int[cf_method == 'log',y_cf := log(12 - pmin(10,year))]
	dt.int[cf_method == 'lin',y_cf := 2.59-0.19*year]
	dt.int[cf_method == 'non',y_cf := 1]
}

# add correction factors
dt <- addcf(dt)

```

```{r calc correction factors 2}

# Use three ways to calc correction factors (giving weight to each value), log (standard in OBIC), lin (linearly increasing/decreasing), non (everything has the same weight)
# for log and lin, 4 scenarios are added (c_cf only, v_cf only, y-cf only, or all)

# value correction factors ======
dtp <- copy(dta)

# function to add  correction factors
addcf2 <- function(dt){
  
  # copy input
  dt.int <- copy(dt)
  
  # add correction factor for indicator value
	dt.int[,log_all := OBIC::cf_ind_importance(value)]
	dt.int[,log_vcf := OBIC::cf_ind_importance(value)]
	dt.int[,log_ccf := 1]
	dt.int[,log_ycf := 1]
	dt.int[,lin_all := 5-4.17*value]
	dt.int[,lin_vcf := 5-4.17*value]
	dt.int[,lin_ccf := 1]
	dt.int[,lin_ycf := 1]
	dt.int[,non := 1]
	
	# melt dt by cf method
	dt.int <- melt(dt.int, measure.vars = c('log_all', 'log_vcf', 'log_ccf', 'log_ycf',
	                                        'lin_all', 'lin_vcf', 'lin_ccf', 'lin_ycf', 
	                                        'non'), 
	               value.name = 'v_cf', variable.name = 'cf_method')
	
	# calculate cf for cat ====
	dt.int[,ncat := .N,by=c('field','year','cat','cf_method')]
	
	# add correction factor for number of categories
	dt.int[cf_method == 'log_all',c_cf := log(ncat + 1)]
	dt.int[cf_method == 'log_ccf',c_cf := log(ncat + 1)]
	dt.int[cf_method == 'log_vcf',c_cf := 1]
	dt.int[cf_method == 'log_ycf',c_cf := 1]
	dt.int[cf_method == 'lin_all',c_cf := ncat*log(10 + 1)/10]
	dt.int[cf_method == 'lin_ccf',c_cf := ncat*log(10 + 1)/10]
	dt.int[cf_method == 'lin_vcf',c_cf := 1]
	dt.int[cf_method == 'lin_ycf',c_cf := 1]
	dt.int[cf_method == 'non',c_cf := 1]
	
	# dd correction factor for number of years
	dt.int[cf_method == 'log_all',y_cf := log(12 - pmin(10,year))]
	dt.int[cf_method == 'log_ycf',y_cf := log(12 - pmin(10,year))]
	dt.int[cf_method == 'log_vcf',y_cf := 1]
	dt.int[cf_method == 'log_ccf',y_cf := 1]
	dt.int[cf_method == 'lin_all',y_cf := 2.59-0.19*year]
	dt.int[cf_method == 'lin_ycf',y_cf := 2.59-0.19*year]
	dt.int[cf_method == 'lin_vcf',y_cf := 1]
	dt.int[cf_method == 'lin_ccf',y_cf := 1]
	dt.int[cf_method == 'non',y_cf := 1]
	
	return(dt.int)
}

# add correction factors
dt3 <- addcf2(dtp)

```


```{r aggregate scores}

# make function to aggregate scores
aggscores <- function(dt) {
  
  # copy input
  dt.agg <- copy(dt)
  
  # calculate weighted value per category and year
	dt.agg <- dt.agg[,list(w.value = sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])),
	                       y_cf = mean(y_cf),
	                       c_cf = mean(c_cf)),
	                  by = .(treatment,field, cf_method, cat, year)]
	
	# calculated weighted average value per category (so mean over years)
	dt.agg <- dt.agg[,list(wy.value = sum(y_cf * pmax(0, w.value) / sum(y_cf[w.value >= 0])),
	                       c_cf = mean(c_cf)),
	                  by = .(treatment,field, cf_method, cat)] 
	
	# calculated weighted average value per field (so mean over categories)
	dt.agg.tot <- dt.agg[,list(value = sum(wy.value * c_cf / sum(c_cf))), 
	                      by = .(treatment,field, cf_method)]
	
	# output
	dt.out <- rbind(dt.agg[,.(treatment,field,cf_method,cat,value = wy.value)],
	                dt.agg.tot[,.(treatment,field,cf_method,cat='total',value)])
	
	}

# add scores to dt
dt.out <- aggscores(dt)
dt3.out <- aggscores(dt3)

```


```{r orig score funs of brent, eval=TRUE}


aggscores_brent <- function(dt) {
  # copy input
  dt.agg <- copy(dt)
  
  	# calculate weighted value per category and year
	dt.agg <- dt.agg[,w.value := sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])), by = .(field, cf_method, cat, year)]
	
	# calculated weighted average value per category (so mean over years)
	dt.agg <- dt.agg[,wy.value := sum(y_cf * pmax(0, w.value) / sum(y_cf[w.value >= 0])), by = .(field, cf_method, cat)] # scores per category
	
	# calculate total obi score (log method)
	dt.agg <- dt.agg[,S_T := sum(wy.value * c_cf / sum(c_cf)), by = .(field, cf_method)]
	# calculate total obi score (lin method)
	dt.agg[, c_cf_lin := ncat*log(10 + 1)/10]
	dt.agg <- dt.agg[,S_T_catlin := sum(wy.value * c_cf_lin / sum(c_cf_lin)), by = .(field, cf_method)]
	# calculate total obi score (non method)
	dt.agg[, c_cf_non := 1]
	dt.agg <- dt.agg[,S_T_catnon := sum(wy.value * c_cf_non / sum(c_cf_non)), by = .(field, cf_method)]
	
	# calculate total obi score if not aggregated by cat
	dt.agg <- dt.agg[, nocat.value := sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])), by = .(field, cf_method, year)]
	dt.agg <- dt.agg[, S_T_nocat := sum(y_cf * pmax(0, nocat.value) / sum(y_cf[nocat.value >= 0])), by = .(field, cf_method)]
	
	# select data for scores
	dts <- unique(dt.agg[,.(field, indicator,cat, wy.value, S_T ,treatment, cf_method, S_T_nocat, S_T_catlin, S_T_catnon)])
	
	# reshape dts so total scores are in same column as cat scores (with T being a cat)
	dts1 <- unique(dts[,.(field, cat, wy.value, treatment, cf_method)])
	dts2 <- unique(dts[,.(field, S_T, treatment, cf_method)])
	dts3 <- unique(dts[,.(field, S_T_nocat, treatment, cf_method)])
	dts4 <- unique(dts[,.(field, S_T_catlin, treatment, cf_method)])
	dts5 <- unique(dts[,.(field, S_T_catnon, treatment, cf_method)])
	
	# rename  cols
	setnames(dts1, 'wy.value', 'score')
	setnames(dts2, 'S_T', 'score')
	setnames(dts3, 'S_T_nocat', 'score')
	setnames(dts4, 'S_T_catlin', 'score')
	setnames(dts5, 'S_T_catnon', 'score')
	# add cat column to dts2
	dts2$cat <- 'T_cf_log'
	dts3$cat <- 'T_nocat'
	dts4$cat <- 'T_cf_lin'
	dts5$cat <- 'T_cf_non'
	
	# bind scores dt's
	dt.agg <- rbindlist(list(dts1, dts2, dts3, dts4, dts5), use.names = TRUE)
	
	# update element names
	dt.agg[,cat := paste0('S_',cat)]
	dt.agg[, score := round(score,3)]
	
	# factorise cat and cf_method
	dt.agg <- dt.agg[, cat := factor(cat, levels = c('S_T_cf_log', 'S_C', 'S_P',
											'S_B', 'S_E','S_M',
											"S_T_cf_lin", "S_T_cf_non", "S_T_nocat"))]
	dt.agg <- dt.agg[, cf_method := factor(cf_method, levels = c('log', 'lin', 'non'))]
}

dt2 <- aggscores_brent(dt)
```

### Effects of different aggregation method on OBIC total score

The effects of different aggregation methods on total score are shown in the box plot. Red dashed lines show arithmetic means of all 22 indicators.

```{r plot baselines scores, fig.width = 7, fig.height = 7,fig.fullwidth = TRUE, fig.cap = 'Figure 3. Total OBI score boxplots per aggregation method for each scenario.'}

# arithmetric mean of all indicators (this should be equal to lin_ccf?)
arimean <- dt[cf_method == "non", arimean := mean(value), by = c("field", "treatment")]
arimean <- dta[,arimean := mean(value), by = c("field", "treatment")]
arimean2 <- unique(arimean[, .(field, treatment, arimean)])
arimean3 <- arimean2[, median(arimean), by = treatment]

# plot
ggplot(dt.out[cat == 'total'], aes(x = value, y = cf_method)) +
        geom_vline(data = arimean3, aes(xintercept = V1), col = "red", lty = 2) +
        geom_boxplot() +
        theme_bw() + scale_colour_viridis_d() + scale_y_discrete(limits = rev) +
        coord_cartesian(xlim = c(0,1)) +
        facet_wrap(~treatment, ncol = 1)


```

```{r OBI score summary table}

  # get relevant data from dt
  dtt <- dt.out[, list(value = round(mean(value),3)), by = c( 'cat', 'cf_method', 'treatment')]

  # improve category discretion
  dtt <- dtt[,ct := paste('mean', cat)]

  # dcast
  dtt <- dcast(dtt ,treatment+cf_method~cat, value.var = 'value')

  # factorise and order treatment
  dtt <- dtt[, treatment := factor(treatment, levels = c('baseline', 'low B values', 'low C values',
                                                         'one low B', 'one low C',
                                                         'Recent years high', 'Recent years low'))]

  # order 
  setorder(dtt, treatment, cf_method)

# make table 
  knitr::kable(dtt,
               caption = 'Mean scores per category and total per aggregation method')
  
  # make dtt version with just total scores to use for in text reporting
  dttt <- dtt[,.(treatment, cf_method, S_T_OBI_A = total)]
```


```{r plot original values, eval=FALSE, fig.width = 7, fig.height = 20,fig.fullwidth = TRUE, fig.cap = 'Figure 4. Distribution of indicator values  per scenario.'}
# factorise dta cat levels
# dta <- dta[, cat := factor(cat, levels = c('C', 'P', 'B', 'E'))]
# dta <- dta[, indicator := factor(indicator, levels = c("I_C_CEC","I_C_CU", "I_C_K",  "I_C_MG", "I_C_N",  "I_C_P",  "I_C_PH", "I_C_S",  "I_C_ZN", 
#                                                        "I_P_CR", "I_P_DS", "I_P_DU", "I_P_SE", "I_P_WRI", "I_P_WS","I_P_CEC","I_P_CO", 
#                                                        "I_B_DI", "I_B_SF", "I_E_NGW","I_E_NSW"))]
# 
# # plot
# ggplot(dta, aes(x = value, y = indicator, color = cat))+
#   geom_boxplot() +
#   theme_bw() + coord_cartesian(xlim = c(0,1)) + scale_y_discrete(limits = rev)+
#   facet_wrap(~treatment, ncol = 1)

```



In the baseline scenario, total scores are slightly lower when aggregating logarithmically or linearly compared to using no special aggregation method. In all three methods, the indicator values in the baseline are around `r round(mean(dta[treatment == 'baseline', value]),2)`, this number is preserved in the score when averaging all indicator values (cf_method = non) while 'log' and 'lin' scores are on average `r round(mean(dta[treatment == 'baseline', value]) - dtt[treatment == 'baseline'& cf_method == 'log',total],2)` and `r round(mean(dta[treatment == 'baseline', value]) - dtt[treatment == 'baseline'& cf_method == 'lin',total],2)` lower. The change of 0.05 (in the score ranging between 0 and 1) may seem small, but it is quite large compared to the standard deviation of the distribution of the indicator values (which is 0.2). So, the difference in aggregation methods can influence the total score substantially. 

The 'lin' and 'log' methods yield lower average scores because they penalize low indicator values (i.e. left-hand side in their distribution) with the weighing factor *vcf*. Since the 'lin' and 'log' methods were harmonised for their highest value at low indicator value (cf = 5 for value = 0), the penalty at intermediate indicator values are much higher for 'lin' method. This makes the average value of the OBIC score lower for 'lin' than 'log' methods. The other two aggregation steps (year aggregation (with *ycf* ) and category aggregation (with *ccf*)) causes no difference in the baseline scenario. 

Note that the variation of total score is not larger for 'lin' and 'log' than 'non', irrespective of the larger variations in the correction factors for 'lin' and 'log'. To illustrate the patterns in more details, we zoom up to the baseline scenario (Fig 4). Here we split the 'log' method into 4 variations: when 'log' method is applied for all 3 aggregation steps ('log_all'), only for the indicator aggregation ('log_vcf'), year aggregation ('log_ycf'), or category aggregation ('log_ccf'). Same applies to 'lin' method. Standard deviation of total score is slightly smaller when 'log' or 'lin' method is used (SD = `r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "log_all", sd(value)],3)` and `r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "lin_all", sd(value)],3)`, respectively) compared to when no special aggregation method is used ('non', SD = `r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "non", sd(value)],3)`). Looking closer at different steps of aggregation, the variation increases due to indicator aggregation (*vcf*) and the year aggregation (*ycf*), but decreased due to category aggregation (*ccf*).  As a result, the variation becomes smaller when linear or log method is applied to all 3 aggregation steps, largely owing to the category aggregation step. Category aggregation, irrespective of the method used for correction factor, increases variation in total score because an extreme indicator value in categories with a few indicators is strongly reflected in the total score and therefore increase the variation, compared to when total scores is directly computed from all indicators. By using 'log' or 'lin' methods in category aggregation, both of which give smaller weights to categories with a few indicators, this impact is diluted, and therefore the increase in variation becomes limited.

```{r plot baselines scores zoomed in, fig.width = 7, fig.height = 2,fig.fullwidth = TRUE, fig.cap = 'Figure 4. Total OBI score per aggregation method for baseline sceinario.'}

# plot (different colors per method, 'lin' and 'log' are divided into 4 variation)
dt3.out[, method_main := substr(cf_method, 1, 3)]
dt3.out[, alpha := 1]
dt3.out[!grepl("_all|non", cf_method), alpha := 0.5]

ggplot(dt3.out[cat == 'total' & treatment =="baseline"],
       aes(x = value, y = cf_method, fill = method_main, alpha = alpha)) +
  # geom_vline(data = arimean3, aes(xintercept = V1), col = "red", lty = 2) +
  geom_boxplot() +
  theme_bw() + scale_colour_viridis_d() + scale_y_discrete(limits = rev) +
  labs(fill = "")+
  guides(alpha = F)+ xlab("total score") + ylab("scenario") +
  coord_cartesian(xlim = c(0,1))

# check standard deviation
#dt3.out[cat == 'total' & treatment =="baseline", sd(value), by = cf_method]
```


In the two scenario's where one category performs poorly; 'low B values' and 'low C values', we can see the effect of using a correction factor for the number of indicators in a category. The total score in 'low B values' of 'non' method dropped by `r dtt[treatment == 'baseline'& cf_method == 'non', total]-dtt[treatment == 'low B values'& cf_method == 'non', total]` from the baseline. This decrease is larger than the other two methods: 'log' and 'lin'  dropped `r dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'low B values'& cf_method == 'log', total]` and `r dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'low B values'& cf_method == 'lin', total]` from the baseline scenario. In contract, in scenario 'low C values, 'log', 'lin' dropped more drastically than 'non': the decrease in total scores was `r paste0(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'low C values'& cf_method == 'log', total], ', ', dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'low C values'& cf_method == 'lin', total], ' and ', round(dtt[treatment == 'baseline'& cf_method == 'non', total]-dtt[treatment == 'low C values'& cf_method == 'non', total],3))` points for 'log', 'lin', and 'non' methods, respectively. The contrasting patterns in the change in the total score illustrate the effect of the aggregation from the 5 category scores to a single total score, which is based on the number of indicators in each category. The 'lin' and 'log' aggregation methods are more sensitive to low scores in C (a category with 9 indicators) than low scores in B (which has 2 indicators) compared to 'non' which is equally sensitive to a low B or C value. Furthermore, 'log' gives relatively heavier weight to a category with a few indicators relative to that with many indicators than 'log': the ratio between the weight on C and B scores is `r (9*log(10 + 1)/10)/(2*log(10 + 1)/10)` for 'lin' and `r round(log(9+1)/log(2+1), 1)` for 'log'.

The sensitivity of the scores to the number of indicators in a category is even more pronounced when looking at scenario's 'one low C' and 'one low B', where either one chemical indicator or one biological indicator was set to 0. There is no difference between these scenario's for the 'non' aggregation method, in both scenario's a score of around `r round(mean(dtt[treatment %in% c('one low C', 'one low B')& cf_method == 'non', total]),3)` is achieved, `r round(abs(mean(dtt[treatment %in% c('one low C', 'one low B')& cf_method == 'non', total])-dtt[treatment == 'baseline'& cf_method == 'non', total]),3)` points lower than in the baseline. When using 'log' and 'lin' methods, scores drop `r paste0(round(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'one low C'& cf_method == 'log', total], 3), ' and ',dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'one low C'& cf_method == 'lin', total])` in one low C and `r paste0(round(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'one low B'& cf_method == 'log', total], 3), ' and ',round(dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'one low B'& cf_method == 'lin', total],3))` in one low B. Note that the drop in 'one low B' and 'one low C' scenarios compared to the baseline scenario is larger for 'log' than 'lin' method, showing that 'log' method is suitable for highlighting poorly-scoring indicators. In other words, 'lin' method imposes relatively severe penalty on moderately-scoring indicators, whereas it does not let poorly-scoring indicators stand out as much as 'log' method does.

The scenario's 'Recent years high' and 'Recent years low' were included to illustrate the effect of different methods to aggregate multiple year records. In 'Recent years low' scenario, scores calculated with 'lin' and 'log' are substantially lower than 'non' method, because the lower scores in recent years gained heavier weight. In 'Recent years high' scenario, 'log' and 'lin' scores only slightly higher than 'non'. Although the high recent values in 'lin' and 'log' received higher weights in the aggregation of multiple years, they are canceled out by old, low indicator values, which are severely penalized from being low. 

Using correction factors, either logarithmically or linearly, can make scores more responsive to recent year, low indicator values, and categories with many underlying indicators. The magnitude of the sensitivity depends on the parameter values of the correction factors. Thus, the sensitivity presented above is merely the consequence of our current (arbitrary) choice of the parameter values, and it can be adjusted when necessary.
The scores calculated with the logarithmic and linear aggregation method do not deviate largely, yet they tune the scores in slightly different ways. The difference is summarized below:

1. Linear aggregation method penalize intermediately-scoring indicator relatively heavily, whereas logarithmic aggregation method penalize poorly-scoring indicator heavily. If highlighting limiting soil functions is the main purpose, then logarithmic aggregation method is preferred.
2. Both logarithmic and linear methods intend to give heavier weight on categories with many indicators. However, the relative weight on categories with few indicators (compared to that with many indicators) are larger for logarithmic than linear methods. In other words, the contribution of a single indicator to the total score is equal for all indicators in the linear methods, whereas in the logarithmic method that is smaller for the categories with many indicators than those with a few indicators. 
3. Both logarithmic and linear methods gives higher weight on recent years than old years. The relative weight on recent years is slightly higher for 'lin' than 'log' method, but the difference is minor.


### Effects of aggregating categories on OBIC total score
Grouping indicators in categories and aggregating these categories to a single score is a choice, its not mathematically necessary. Here we look deeper at how our choice of the category aggregation influence the behavior of the OBIC score.

In the figure below, we calculated the total OBIC score both with category aggregation ('S_T_cf_log'; with log' method) and without category aggregation ('S_T_nocat'; total score was calculated directly from all indicators).  The other 2 aggregation steps were done with 'log' method for both scenarios. 

When indicators of 1 category have low scores (i.e. 'low B values' and 'low C values'), the total score becomes higher when the category aggregation is executed. This is because the category aggregation assures that the impact of a category on total score does not exceed a fixed proportion: the contribution of C, P, B, E and M category to the total score is `r  round(100*log(9+1)/sum(log(c(9,8,2,2,1)+1)),0)`, `r  round(100*log(8+1)/sum(log(c(9,8,2,2,1)+1)),0)`, `r  round(100*log(2+1)/sum(log(c(9,8,2,2,1)+1)),0)`, `r  round(100*log(2+1)/sum(log(c(9,8,2,2,1)+1)),0)`, `r  round(100*log(1+1)/sum(log(c(9,8,2,2,1)+1)),0)`%, respectively. When no category aggregation is done, then influence of the poor indicator values can influence the total score more prominently.
When a single indicator has low scores ('one low B' and 'one low C') similar patterns were observed: the impact of the poor indicator on total score was smaller when category aggregation was one, especially in 'one low B' scenario.

Another advantage of using category aggregation is that it gives interpretable intermediate products of OBIC, in the form of 5 separate category scores. As shown in the previous section,the effects of the 3 aggregation steps on total score are large, and it is difficult to trace where and how the total score is influenced by different aggregation steps. In this light, computing category scores before aggregating to a total score is a nice way to provide disentangled insights to the users, allowing them to interpret different aspects of soils separately. 


```{r plot scores with and without cat, fig.width = 7, fig.height = 5,fig.fullwidth = TRUE, fig.cap = 'Figure 5. Scores when categories are ignored during aggregation (S_Tnocat_OBI_A) and regular aggregation (S_T_OBI_A).'}

#dt <- copy(dta)
#dt <- addcf(dt)
#dt2 <- aggscores_brent(dt)
dt2[, colv := "no category aggregation"]
dt2[grepl("S_T_cf", cat), colv := "with category aggregation"]
gg3 <- ggplot(dt2[cat %in% c('S_T_cf_log', 
                             #'S_T_cf_lin', 'S_T_cf_non', 
                             'S_T_nocat') & 
                  cf_method == 'log' & 
                  !treatment %in% c('Recent years low', 'Recent years high')],
              aes(x = score, y = cat)) +
  geom_boxplot(aes(fill = colv)) +
  labs(fill="") + xlab("total score") + ylab("")+
  theme_bw() + scale_y_discrete(limits = rev) +
  coord_cartesian(xlim = c(0,1)) +
  facet_wrap(~treatment, ncol = 1) #+ geom_boxplot(data = dt[cat == 'S_Tnocat_OBI_A'], mapping = aes(fill = 'blue')) 
gg3 
```


```{r, eval=FALSE}
  obic_field(B_SOILTYPE_AGR =  dt$B_SOILTYPE_AGR, B_GWL_CLASS =  dt$B_GWL_CLASS,
             B_SC_WENR = dt$B_SC_WENR, B_HELP_WENR = dt$B_HELP_WENR, B_AER_CBS = dt$B_AER_CBS,
             B_LU_BRP = dt$B_LU_BRP, A_SOM_LOI = dt$A_SOM_LOI, A_SAND_MI = dt$A_SAND_MI,
             A_SILT_MI = dt$A_SILT_MI, A_CLAY_MI = dt$A_CLAY_MI, A_PH_CC = dt$A_PH_CC,
             A_N_RT = dt$A_N_RT, A_CN_FR = dt$A_CN_FR,
             A_S_RT = dt$A_S_RT, A_N_PMN = dt$A_N_PMN,
             A_P_AL = dt$A_P_AL, A_P_CC = dt$A_P_CC, A_P_WA = dt$A_P_WA, A_CEC_CO = dt$A_CEC_CO,
             A_CA_CO_PO = dt$A_CA_CO_PO, A_MG_CO_PO = dt$A_MG_CO_PO, A_K_CO_PO = dt$A_K_CO_PO,
             A_K_CC = dt$A_K_CC, A_MG_CC = dt$A_MG_CC, A_MN_CC = dt$A_MN_CC,
             A_ZN_CC = dt$A_ZN_CC, A_CU_CC = dt$A_CU_CC, output = 'obic_score')
```

## Alternative aggregation methods on Binnenveld fields
The scenario's in the experiment above use artificial data and may not be representative of actual fields in the Netherlands. Therefore we will explore the aggregation methods described above using indicator values of fields in the `binnenveld` dataset.

```{r get binnenveld indicator values, eval=TRUE}
# cleanup bini if required
if(exists('bini')){rm(bini)}

# select columns
bcols <- names(binnenveld)[!grepl('BCS$', names(binnenveld))]

# get indicator values per field, for first 10 fields
for(i in unique(binnenveld$ID)[1:10]){
  # calc OBIC inidicators for i
  bini.n <- obic_field_dt(binnenveld[ID == i,..bcols], output = 'unaggregated')
  # re add ID
  bini.n <- bini.n[,ID := i]
  if(!exists('bini')){
    # if bini doesn't exist yet make it
    bini <- bini.n
  } else{
    # if bini exists, add bini.n to bini (binnenveld indicators)
    bini <- rbindlist(list(bini, bini.n))
  }
}

# remove inidicators not used for scoring
bini <- bini[!cat %in% c('BCS', 'IBCS', 'IM')]

# remove irrelevant columns
rmcols <- names(bini)[!grepl('^weight|cf|w$', names(bini))]
bini <- bini[,..rmcols]

# rename ID to field
setnames(bini, 'ID', 'field')
# add treatment
bini$treatment <- bini$field

```

```{r aggregate binneveld, eval=TRUE}
# add correction factors
bini <- addcf(bini)

# add scores
bini <- aggscores(bini)

```

```{r plot binnenveld scores, eval=TRUE, message = FALSE, fig.height= 9, fig.width= 8 ,fig.cap="Figure 6. Total and category OBI scores of binnenveld fields aggregated with 'log', 'lin' or 'non' method, as well as total scores when disregarding categories in aggregating scores."}
# make labels
ldt <- binnenveld[,.(ID, B_LU_BRP, B_SOILTYPE_AGR, B_GWL_CLASS)]
ldt <- ldt[ID %in% unique(ID)[1:10]]

# get most occurring soil type and crop type
ldt <- ldt[, lapply(.SD, function (x) names(sort(table(x),decreasing = TRUE)[1])), 
              .SDcols = c('B_LU_BRP','B_SOILTYPE_AGR', 'B_GWL_CLASS'),by = ID]
ldt[, B_LU_BRP := as.integer(B_LU_BRP)]

# add crop name
ldt <- merge(ldt, crops.obic[,.(crop_code, crop_name)], by.x = 'B_LU_BRP', by.y = 'crop_code')

# order ldt
setorder(ldt, ID)

# make cat labels more readable
bini[grepl('^T', cat), lcat := "Total"]
bini[grepl('^C', cat), lcat := "Chemical"]
bini[grepl('^B', cat), lcat := "Biological"]
bini[grepl('^P', cat), lcat := "Physical"]
bini[grepl('^M', cat), lcat := "Management"]
bini[grepl('^E', cat), lcat := "Environmental"]
bini[, lcat := factor(lcat, levels = c('Chemical', 'Physical', 'Biological', 'Environmental',
                                       'Management','Total'))]


# make plot
gg <- ggplot(bini, aes(x= lcat, y= value, color = cf_method)) +
  geom_point(size = 2,alpha = 0.5) +
  ylab('OBI-score') + xlab('') +
  theme_bw(12) + theme(panel.grid.major.x = element_blank(),
                       panel.grid.major.y = element_line(size = 0.5),
                       panel.grid.minor.x = element_blank()) +
  coord_cartesian(xlim = c(0,1))+ 
  scale_x_discrete(limits = rev)+ scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1))+
  coord_flip() +
  facet_wrap(~field, ncol = 3) +
  theme(legend.position = c(0.8,0.1))

# show gg
gg

# show table with data on binnenveld fields
knitr::kable(ldt[,.(ID, B_SOILTYPE_AGR, B_GWL_CLASS, crop_name)], caption = 'Soiltype, groundwaterclass and most frequent crop per binnenveld field.')
```
The `binnenveld` dataset lacks actual information on management parameters. In the dataset these were all set to `FALSE`, resulting in low management scores for all fields. For most total and categorical scores, the highest scores are obtained with the 'non' aggregation method and the lowest with 'lin'.

When the aggregation step to categories is omitted, total scores differ little for some fields (eg. 2, 4, 5, 8), but are lower when using 'log' or 'lin' in other fields (eg. 3, 6, 7, 9)

## Aggregation in other soil quality assessment frameworks
The OBI is not the first attempt to express soil quality with a single score, see for example: @Rutgers2012 and @VanWijnen2012. 
These authors described a method to numerically express the performance of ecosystem service provision of fields with what they call: the Ecosystem Performance Index (EPX). An EPX was calculated using a set of measured properties of a field and of a reference. The reference, dubbed the Maximum Ecological Potential (MEP) was derived by selecting the best performing fields in a sample for a given land-use and soil type. Depending on the ecosystem service, a selecting of soil properties was made that acted as proxy. An EPX was calculated by comparing the selected soil properties of a field with those of the reference/MEP like this:  
$$
EPX = 10^{-{\left(\frac{+\left|\log\left(\frac{VAR^i_{obs}}{VAR^i_{ref}}\right)\right|+\dots-\left|\log\left(\frac{VAR^{j}_{obs}}{VAR^{j}_{ref}}\right)\right|}{n}\right)}}
$$
Where VAR~obs~ is a soil property of an observed field and VAR~ref~ a soil property of the MEP, n is the number of distinct soil properties used to derive the EPX. Normally a variable's contribution to the EPX is calculated using the i-type, where any deviation of the observed property from the MEP negatively affects the EPX. For some properties*ecosystem services a positive or negative deviation of the observed value from the MEP can be positive for the EPX, in such a case, the j-type has to be used. @Rutgers2012 provides as example, where more soil organic matter (SOM) is always positive for providing a certain service, if an observed SOM content is higher than the reference, the j-type is used and SOM contributes negatively to the deviation from the MEP. Consequently, if the j-type is applied, the EPX of a field can be larger then one (provided that all i-type variables are at or very close to the reference).
The indices of different services can be aggregated to a single score by taking the arithmetic mean. A weighted mean was also calculated based on the relative importance stakeholders assign to each service (here, a farmer, water manager and national government representative were used).  

This method relies on statistical modeling for the selection of soil properties relevant for specific services rather then empirical relations. Furthermore, the reliance on BPJ for determining the reference makes the method vulnerable to the available experts and data. Both are drawbacks the authors recommend addressing.

In addition, we found that this method is sensitive to the numeric spread, values of a soil property are likely to have. When calculating natural attenuation capacity according to @VanWijnen2012 using observed and reference values from @Rutgers2008, we found that one of the parameters FMA, had a much larger impact on the EPX than the others. Values of FMA within the 5^th^ and 95^th^ percentile ranged from 14 to 3960, while pH for example, ranged from 7.3 to 7.7 

```{r graphical representation of EPX , fig.cap= "NAC scores where one variable varies and the others are set to Maximum ecological potential (MEP), variable values range from 5th percentile to 95th percentile reported by Rutgers (2008). Type indicates whether standard calculation (i) is used or not(j) where j improves on the NAC."}
# make table with reference data from Rutgers2008
ac.mep <- data.table(var = c('FMA', 'PotC', 'PotN', 'SOM', 'pH', 'PAL'),
                     mep.v = c(2700, 18, 2.0, 2.2, 7.6, 47),
                     soiltype = rep('klei', 6),
                     landuse = rep('akkerbouw', 6),
                     nl.mean = c(1150,22, 2, 2.5, 7.5, 47),
                     nl5perc = c(14, 9, 0.5, 1.6, 7.3, 31),
                     nl95perc = c(3960, 48, 3.7, 3.6, 7.7, 62))

# make data
dtS <- data.table(x = seq(1.6,3.6,length.out = 100), var = 'SOM')
dtF <- data.table(x=seq(10,3990,length.out = 100), var = 'FMA')
dtF <- dtF[x>200]
dtH <- data.table(x=seq(7.3,7.7,length.out = 100), var = 'pH')
dtP <- data.table(x=seq(31,62,length.out = 100), var = 'PAL')
dtC <- data.table(x=seq(9,48,length.out = 100), var = 'PotC')
dtN <- data.table(x=seq(0.5,3.7,length.out = 100), var = 'PotN')

# combine data
dta <- rbindlist(list(dtS, dtF, dtH, dtP, dtC, dtN))

# add mep values
dta <- merge(dta, ac.mep[,.(var, mep.v)], by = 'var')

# indicate when to use i or j type
dta[,type := 'i']
dta[(var == 'FMA' & x<mep.v) | var == 'SOM'& x>mep.v, type := 'j']

# calc y
dta[,y := fifelse(type == 'i',10^-(abs(log10(x/mep.v)/6)), 10^(abs(log10(x/mep.v)/6)))]

# plot
ggplot(dta) + 
  geom_line(aes(x = x, y = y, col = type))+
  ylab("NAC score if all other variables are at MEP") +
  xlab("Variable value") +
  # labs(title = 'j (red) and i (black) type plot, j-type is applied') +
  facet_wrap(facets = ~var, scales = 'free_x', ncol = 3) +
  theme_bw()
```

 
```{r eval = FALSE, include=FALSE}
# e.g. (van Wijnen et. al. 2012 and  Rutgers et. al. 2012) or Moebius-Clune 2016
```



## References