--- title: "Tidy your summarised result object" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{03_tidy_summarised_result} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) options(rmarkdown.html_vignette.check_title = FALSE) ``` ## `` format The `` format is a standard output defined in [omopgenerics](https://darwin-eu-dev.github.io/omopgenerics/articles/summarised_result.html). The fact that it is standardised output make it a very powerful tool so multiple functions can export on the same format and built functionalities on top of it, as it can be seen in tables and [plots](https://darwin-eu.github.io/visOmopResults/articles/plots.html) vignettes. This standard output it can be some times hard to manipulate to do your custom analysis. `visOmopResults` contains tools to **tidy** your `` object that are covered in this vignette. ## Tidy `` `visOmopResults` defines the method tidy for `` object, what this function does is to: ### 1. Split *group*, *strata*, and *additional* pairs into separate columns: The `` object has the following pair columns: group_name-group_level, strata_name-strata_level, and additional_name-additional_level. These pairs use the `&&&` separator to combine multiple fields, for example if you want to combine cohort_name and age_group in group_name-group_level pair: `group_name = "cohort_name &&& age_group"` and `group_level = "my_cohort &&& <40"`. By default if no aggregation is produced in group_name-group_level pair: `group_name = "overall"` and `group_level = "overall"`. **ORIGINAL FORMAT:** ```{r, echo = FALSE} dplyr::tibble( group_name = c("cohort_name", c("cohort_name &&& sex"), c("sex &&& age_group")), group_level = c("acetaminophen", c("acetaminophen &&& Female"), c("Male &&& <40")) ) |> gt::gt() ``` The tidy format puts each one of the values as a columns. Making it easier to manipulate but at the same time the output is not standardised anymore as each `` object will have a different number and names of columns. Missing values will be filled with the "overall" label. **TIDY FORMAT:** ```{r, echo = FALSE} dplyr::tibble( group_name = c("cohort_name", c("cohort_name &&& sex"), c("sex &&& age_group")), group_level = c("acetaminophen", c("acetaminophen &&& Female"), c("Male &&& <40")) ) |> visOmopResults::splitGroup() |> gt::gt() ``` ### 2. Add settings of the `` object as columns: Each `` object has a setting attribute that relates the 'result_id' column with each different set of settings. The columns 'result_type', 'package_name' and 'package_version' are always present in settings, but then we may have some extra parameters depending how the object was created. So in the `` format we need to use these `settings()` functions to see those variables: **ORIGINAL FORMAT:** `settings`: ```{r, echo = FALSE} dplyr::tibble( result_id = c(1L, 2L), my_setting = c(TRUE, FALSE), package_name = "visOmopResults" ) |> gt::gt() ``` ``: ```{r, echo = FALSE} dplyr::tibble( result_id = c("1", "...", "2", "..."), cdm_name = c("omop", "...", "omop", "..."), " " = c("..."), additional_name = c("overall", "...", "overall", "...") ) |> gt::gt() ``` But in the tidy format we add the settings as columns, making that their value is repeated multiple times (there is only one row per result_id in settings, whereas there can be multiple rows in the `` object). The column 'result_id' is eliminated as it does not provide information anymore. Again we loose on standardisation (multiple different settings), but we gain in flexibility: **TIDY FORMAT:** ```{r, echo = FALSE} dplyr::tibble( cdm_name = c("omop", "...", "omop", "..."), " " = c("..."), additional_name = c("overall", "...", "overall", "..."), my_setting = c("TRUE", "...", "FALSE", "..."), package_name = c("visOmopResults", "...", "visOmopResults", "...") ) |> gt::gt() ``` ### 3. Pivot estimates as columns: In the `` format estimates are displayed in 3 columns: - 'estimate_name' indicates the name of the estimate. - 'estimate_type' indicates the type of the estimate (as all of them will be casted to character). Possible values are: *`r omopgenerics::estimateTypeChoices()`*. - 'estimate_value' value of the estimate as ``. **ORIGINAL FORMAT:** ```{r, echo = FALSE} dplyr::tibble( variable_name = c("number individuals", "age", "age"), estimate_name = c("count", "mean", "sd"), estimate_type = c("integer", "numeric", "numeric"), estimate_value = c("100", "50.3", "20.7") ) |> gt::gt() ``` In the tidy format we pivot the estimates, creating a new column for each one of the 'estimate_name' values. The columns will be casted to 'estimate_type'. If there are multiple estimate_type(s) for same estimate_name they won't be casted and they will be displayed as character (a warning will be thrown). Missing data are populated with NAs. **TIDY FORMAT:** ```{r, echo = FALSE} dplyr::tibble( variable_name = c("number individuals", "age"), count = c(100L, NA), mean = c(NA, 50.3), sd = c(NA, 20.7) ) |> gt::gt() ``` ### Example Let's see a simple example with some toy data: ```{r} library(visOmopResults) result <- mockSummarisedResult() result |> tidy() ``` ## Customise your tidy summarised_result We have several functions to customise the tidy version of the `` object. ### Split The functions split are provided independent: - `splitGroup()` only splits the pair group_name-group_level columns. - `splitStrata()` only splits the pair strata_name-strata_level columns. - `splitAdditional()` only splits the pair additional_name-additional_level columns. There is also the function: - `splitAll()` that splits any pair x_name-x_level that is found on the data. ```{r} splitAll(result) ``` ### Pivot estimates `pivotEstimates()` can be used to pivot the variables that we are interested in. The argument `pivotEstimatesBy` specifies which are the variables that we want to use to pivot by, there are four options: - `NULL/character()` to not pivot anything. - `c("estimate_name")` to pivot only estimate_name. - `c("variable_level", "estimate_name")` to pivot estimate_name and variable_level. - `c("variable_name", "variable_level", "estimate_name")` to pivot estimate_name, variable_level and variable_name. Note that `variable_level` can contain NA values, these will be ignored on the naming part. ```{r} pivotEstimates( result, pivotEstimatesBy = c("variable_name","variable_level", "estimate_name") ) ``` ### Add settings `addSettings()` is used to add the settings that we want as new columns to our `` object. The `settingsColumns` argument is used to choose which are the settings we want to add. ```{r} addSettings( result, settingsColumns = "result_type" ) ```