---
title: "Filter helpers"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{04_filter_summarised_result}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

## Introduction

Dealing with an `<summarised_result>` object can be difficult to handle specially when we are trying to filter. For example, difficult tasks would be to filter to a certain result_type or when there are many strata joined together filter only one of the variables. On the other hand it exists the [tidy](https://darwin-eu.github.io/visOmopResults/articles/tidy.html) format that makes it easy to filter, but then you loose the `<summarised_result>` object. There exist some functions that only work with `<summarised_result>` objects.

**visOmopResults** package contains some functionalities that helps on this process:

- `filterSettings` to filter the `<summarised_result>` object using the `settings()` attribute.
- `filterGroup` to filter the `<summarised_result>` object using the group_name-group_level tidy columns.
- `filterStrata` to filter the `<summarised_result>` object using the strata_name-starta_level tidy columns.
- `filterAdditional` to filter the `<summarised_result>` object using the additional_name-additional_level tidy columns.

In this vignette we will also cover two types of utility functions:

- `unite*` type functions: to join multiple columns in the name-level structure.
- `*Columns` type functions: to identify the columns that are contained in a name-level structure.

Now we will see some examples.

## filterSettings

For this example we will use some mock data.

```{r}
library(visOmopResults)
library(dplyr, warn.conflicts = FALSE)
```

Let's generate two sets of results:
```{r}
result1 <- mockSummarisedResult()
result2 <- mockSummarisedResult()
```

We can change the settings of the second set of results simulating that results come from a different package ans set of results:
```{r}
result2 <- result2 |>
  omopgenerics::newSummarisedResult(settings = tibble(
    result_id = 1L,
    result_type = "second_mock_result",
    package_name = "omopgenerics",
    package_version = "1.0.0",
    my_parameter = TRUE
  ))
```

We can now merge both results in a unique `<summarised_result>` object:
```{r}
result <- bind(result1, result2)
settings(result)
result
```

You could potentially add the settings with `addSettings()`, then filter and finally eliminate the columns. Let's see an example where we subset to the results that have *my_parameter == TRUE*:

```{r}
resultMyParam <- result |>
  addSettings(settingsColumns = "my_parameter") |>
  filter(my_parameter == TRUE) |>
  select(!"my_parameter")
resultMyParam
```

This approach has some problems:

- It is not efficient.
- We have to use three different functions.
- The settings attribute still contains both sets:

```{r}
settings(resultMyParam)
```

We can do the same solving the three problems using `filterSettings()`:

```{r}
resultMyParam <- result |>
  filterSettings(my_parameter == TRUE)
resultMyParam
settings(resultMyParam)
```

## filterStrata

Using the same mock data we can try to filter the rows that contain data related to 'Female', the problematic with the strata_name-strata_level display is that it is difficult to easily filter the "Female" columns:

```{r}
result |>
  select(strata_name, strata_level) |>
  distinct()
```

One option that we could use is `splitStrata()`, `filter()` and then `uniteStrata()` again:

```{r}
result |>
  splitStrata() |>
  filter(sex == "Female") |>
  uniteStrata(c("age_group", "sex"))
```

Problem of this is that:

- It is extremely inefficient (all the rows must be splitted and united back).
- You need to know which are the strata columns (you could potentially use `strataColumns()`).
- You have to use multiple functions.

We could do exactly the same with the function `filterStrata()`:

```{r}
result |>
  filterStrata(sex == "Female")
```

`filterGroup()` and `filterAdditional()` work exactly in the same way than `filterStrata()` but with their analogous columns.

A nice functionality that you may have is that is that if you filter by a column/setting that does not exist the output will be warning + return emptySummarisedResult() which can be quite hekpful in some occasions.

```{r}
result |>
  filterSettings(setting_that_does_not_exist == 1)
```

## unite functions

In this previous section we have mentioned the function `uniteStrata()` without explaining its functionality so let's cover it with a couple of examples. The `uniteGroup()`, `uniteStrata()` and `uniteAdditional()` functions are the opposite to the `split*()` type functions:

```{r}
result |>
  splitStrata()
```

```{r}
result |>
  splitStrata() |>
  uniteStrata(c("age_group", "sex"))
```

Note that missing will be not included (if all missing: overall-overall will be considered), for example:
```{r, echo = FALSE}
dplyr::tibble(
  age_group = c(NA, NA, "<40", ">40"),
  sex = c(NA, "Male", "Female", NA),
  year = c(NA, NA, 2010L, 2011L)
) |> 
  gt::gt()
```

With `uniteStrata(cols = c("age_group", "sex", "year"))` the output would be:
```{r, echo = FALSE}
dplyr::tibble(
  strata_name = c("overall", "sex", "age_group &&& sex &&& year", "age_group &&& year"),
  strata_level = c("overall", "Male", "<40 &&& Female &&& 2010", ">40 &&& 2011")
) |> 
  gt::gt()
```

Note that is we split again then year will be a character vector instead of an integer. In future releases conserving type may be possible.

`uniteGroup()` and `uniteAdditional()` work exactly in the same way than `uniteStrata()` but with their analogous columns.

## Columns

Splitting and tidying your `<summarised_result>` can give you many advantages, as we have seen across the different vignettes. One of the problems that you may face is the ability to know which are the available `settings` to add, or how many columns will be generated when splitting group. That's why visOmopResults created some helper functions:

- `settingsColumns()` gives you the setting names that are available in a `<summarised_result>` object.
- `groupColumns()` gives you the new columns that will be generated when splitting group_name-group_level pair into different columns.
- `strataColumns()` gives you the new columns that will be generated when splitting strata_name-strata_level pair into different columns.
- `additionalColumns()` gives you the new columns that will be generated when splitting additional_name-additional_level pair into different columns.
- `tidyColumns()` gives you the columns that will have the object if you tidy it (`tidy(result)`). This function in very useful to know which are the columns that can be included in **plot** and **table** functions.

Let's see the different values with out example mock data set:

```{r}
settingsColumns(result)
groupColumns(result)
strataColumns(result)
additionalColumns(result)
tidyColumns(result)
```