--- title: "Filter helpers" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{04_filter_summarised_result} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` ## Introduction Dealing with an `` object can be difficult to handle specially when we are trying to filter. For example, difficult tasks would be to filter to a certain result_type or when there are many strata joined together filter only one of the variables. On the other hand it exists the [tidy](https://darwin-eu.github.io/visOmopResults/articles/tidy.html) format that makes it easy to filter, but then you loose the `` object. There exist some functions that only work with `` objects. **visOmopResults** package contains some functionalities that helps on this process: - `filterSettings` to filter the `` object using the `settings()` attribute. - `filterGroup` to filter the `` object using the group_name-group_level tidy columns. - `filterStrata` to filter the `` object using the strata_name-starta_level tidy columns. - `filterAdditional` to filter the `` object using the additional_name-additional_level tidy columns. In this vignette we will also cover two types of utility functions: - `unite*` type functions: to join multiple columns in the name-level structure. - `*Columns` type functions: to identify the columns that are contained in a name-level structure. Now we will see some examples. ## filterSettings For this example we will use some mock data. ```{r} library(visOmopResults) library(dplyr, warn.conflicts = FALSE) ``` Let's generate two sets of results: ```{r} result1 <- mockSummarisedResult() result2 <- mockSummarisedResult() ``` We can change the settings of the second set of results simulating that results come from a different package ans set of results: ```{r} result2 <- result2 |> omopgenerics::newSummarisedResult(settings = tibble( result_id = 1L, result_type = "second_mock_result", package_name = "omopgenerics", package_version = "1.0.0", my_parameter = TRUE )) ``` We can now merge both results in a unique `` object: ```{r} result <- bind(result1, result2) settings(result) result ``` You could potentially add the settings with `addSettings()`, then filter and finally eliminate the columns. Let's see an example where we subset to the results that have *my_parameter == TRUE*: ```{r} resultMyParam <- result |> addSettings(settingsColumns = "my_parameter") |> filter(my_parameter == TRUE) |> select(!"my_parameter") resultMyParam ``` This approach has some problems: - It is not efficient. - We have to use three different functions. - The settings attribute still contains both sets: ```{r} settings(resultMyParam) ``` We can do the same solving the three problems using `filterSettings()`: ```{r} resultMyParam <- result |> filterSettings(my_parameter == TRUE) resultMyParam settings(resultMyParam) ``` ## filterStrata Using the same mock data we can try to filter the rows that contain data related to 'Female', the problematic with the strata_name-strata_level display is that it is difficult to easily filter the "Female" columns: ```{r} result |> select(strata_name, strata_level) |> distinct() ``` One option that we could use is `splitStrata()`, `filter()` and then `uniteStrata()` again: ```{r} result |> splitStrata() |> filter(sex == "Female") |> uniteStrata(c("age_group", "sex")) ``` Problem of this is that: - It is extremely inefficient (all the rows must be splitted and united back). - You need to know which are the strata columns (you could potentially use `strataColumns()`). - You have to use multiple functions. We could do exactly the same with the function `filterStrata()`: ```{r} result |> filterStrata(sex == "Female") ``` `filterGroup()` and `filterAdditional()` work exactly in the same way than `filterStrata()` but with their analogous columns. A nice functionality that you may have is that is that if you filter by a column/setting that does not exist the output will be warning + return emptySummarisedResult() which can be quite hekpful in some occasions. ```{r} result |> filterSettings(setting_that_does_not_exist == 1) ``` ## unite functions In this previous section we have mentioned the function `uniteStrata()` without explaining its functionality so let's cover it with a couple of examples. The `uniteGroup()`, `uniteStrata()` and `uniteAdditional()` functions are the opposite to the `split*()` type functions: ```{r} result |> splitStrata() ``` ```{r} result |> splitStrata() |> uniteStrata(c("age_group", "sex")) ``` Note that missing will be not included (if all missing: overall-overall will be considered), for example: ```{r, echo = FALSE} dplyr::tibble( age_group = c(NA, NA, "<40", ">40"), sex = c(NA, "Male", "Female", NA), year = c(NA, NA, 2010L, 2011L) ) |> gt::gt() ``` With `uniteStrata(cols = c("age_group", "sex", "year"))` the output would be: ```{r, echo = FALSE} dplyr::tibble( strata_name = c("overall", "sex", "age_group &&& sex &&& year", "age_group &&& year"), strata_level = c("overall", "Male", "<40 &&& Female &&& 2010", ">40 &&& 2011") ) |> gt::gt() ``` Note that is we split again then year will be a character vector instead of an integer. In future releases conserving type may be possible. `uniteGroup()` and `uniteAdditional()` work exactly in the same way than `uniteStrata()` but with their analogous columns. ## Columns Splitting and tidying your `` can give you many advantages, as we have seen across the different vignettes. One of the problems that you may face is the ability to know which are the available `settings` to add, or how many columns will be generated when splitting group. That's why visOmopResults created some helper functions: - `settingsColumns()` gives you the setting names that are available in a `` object. - `groupColumns()` gives you the new columns that will be generated when splitting group_name-group_level pair into different columns. - `strataColumns()` gives you the new columns that will be generated when splitting strata_name-strata_level pair into different columns. - `additionalColumns()` gives you the new columns that will be generated when splitting additional_name-additional_level pair into different columns. - `tidyColumns()` gives you the columns that will have the object if you tidy it (`tidy(result)`). This function in very useful to know which are the columns that can be included in **plot** and **table** functions. Let's see the different values with out example mock data set: ```{r} settingsColumns(result) groupColumns(result) strataColumns(result) additionalColumns(result) tidyColumns(result) ```