--- title: "How to use autoslider.core" date: "`r Sys.Date()`" output: rmarkdown::html_document: theme: "spacelab" highlight: "kate" toc: true toc_float: true author: - Stefan Thoma, Joe Zhu ([`shajoezhu`](https://github.com/shajoezhu)) vignette: > %\VignetteIndexEntry{Introduction to `AutoslideR.core`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} editor_options: markdown: wrap: 72 --- In this vignette we show the general `autoslider.core` workflow, how you can create functions that produce study-specific outputs, and how you can integrate them into the `autoslider.core` framework. ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Requirements Of course, you need to have the `autoslider.core` package installed, and you need to have data available. In this example I use example data stored in the `autoslider.core` package. The data needs to be stored in a named list where the names should correspond to ADaM data sets. ```{r, echo = FALSE, include = FALSE} library(autoslider.core) library(dplyr) ``` ```{r, eval = TRUE, include = FALSE} # hidden setup # Install and load the necessary packages library(yaml) # Create the YAML content yaml_content <- ' ITT: title: Intent to Treat Population condition: ITTFL == "Y" target: adsl type: slref SAS: title: Secondary Analysis Set condition: SASFL == "Y" target: adsl type: slref SE: title: Safety Evaluable Population condition: SAFFL== "Y" target: adsl type: slref SER: title: Serious Adverse Events condition: AESER == "Y" target: adae type: anl LBCRP: title: CRP Values condition: PARAMCD == "CRP" target: adlb type: slref LBNOBAS: title: Only Visits After Baseline condition: ABLFL != "Y" & !(AVISIT %in% c("SCREENING", "BASELINE")) target: adlb type: slref ' # Create a temporary YAML file filters <- tempfile(fileext = ".yaml") # Write the YAML content to the temporary file write(yaml_content, file = filters) # Create the specs entry specs_entry <- ' - program: t_ds_slide titles: Patient Disposition ({filter_titles("adsl")}) footnotes: "t_ds footnotes" paper: L6 suffix: ITT - program: t_dm_slide titles: Patient Demographics and Baseline Characteristics footnotes: "t_dm_slide footnote" paper: L6 suffix: ITT args: arm: "TRT01A" vars: ["SEX", "AGE", "RACE", "ETHNIC", "COUNTRY"] - program: lbt06 titles: Patient Disposition ({filter_titles("adsl")}) footnotes: "t_ds footnotes" paper: L6 suffix: ITT_LBCRP_LBNOBAS ' # Create a temporary specs entry file spec_file <- tempfile(fileext = ".yaml") # Write the specs entry to the temporary file write(specs_entry, file = spec_file) # Clean up the temporary files # file.remove(filters) # file.remove(spec_file) ``` # Workflow The folder structure could look something like: ``` ├── programs │ ├── run_script.R │ ├── R | | ├── helping_functions.R | | ├── output_functions.R ├── outputs ├── specs.yml ├── filters.yml ``` The `autoslideR` workflow would be implemented in the `run_script.R` file. This workflow does not require the files in `R/`. However, if custom output-creating functions are implemented, `R/` would be the place to put them. The `autoslideR` workflow has four main aspects: ## The specifications `specs.yml` This file contains the specifications of all outputs you would like to create. For each output we define specific information, namely the program name, the footnotes & titles, the paper (this indicates the orientation, P for portrait and L for landscape, the number indicates the font size), the suffix and `args`. It could look something like that: ``` - program: t_ds_slide titles: Patient Disposition ({filter_titles("adsl")}) footnotes: 't_ds footnotes' paper: L6 suffix: ITT - program: t_dm_slide titles: Patient Demographics and Baseline Characteristics footnotes: 't_dm_slide footnote' paper: L6 suffix: ITT args: arm: "TRT01A" vars: ["SEX", "AGE", "RACE", "ETHNIC", "COUNTRY"] ``` The program name refers to a function that produces an output. This could be one of the `_slide` functions in `autoslider.core` or a custom function. Titles and footnotes are added once the outputs are created. We refer to that as decorating the outputs. The suffix specifies the name of the filters that are applied to the data, before the data is funneled into the function (program). The filters themselves are specified in the `filters.yml` file. ## The filters `filters.yml` In `filters.yml` we specify the names of the filters used across the outputs. Each filter has a name (e.g. `FAS`), a title (`Full Analysis Set`), and then the filtering condition on a target dataset. The filter title may be appended to the output title. For the `t_ds_slides` slide above all filter titles that target the adsl dataset would be included in the brackets. We would thus expect the title to read: "Patient Disposition (Full Analysis Set)" [what is the type?] As you can see, we don't just have population filters, but also filters on serious adverse events. We can thus produce SAE tables by just supplying the serious adverse events to the AE table function. This concept generalizes also to `PARAMCD` values. ``` ITT: title: Intent to Treat Population condition: ITTFL =='Y' target: adsl type: slref SAS: title: Secondary Analysis Set condition: SASFL == 'Y' target: adsl type: slref SE: title: Safety Evaluable Population condition: SAFFL=='Y' target: adsl type: slref SER: title: Serious Adverse Events condition: AESER == 'Y' target: adae type: anl ``` ## The functions You can find an overview of all `autoslider.core` functions [here](https://insightsengineering.github.io/autoslider.core/latest-tag/reference/index.html). Note that all output-producing functions end with `_slide` while the prefix (i.e. `t_`, `l_`, `g_`) specify the type of output (i.e. table, listing, or graph respectively). Custom functions are needed if the `autoslider.core` functions do not cover the outputs you need. More on that further down. ## The backend machinery A typical workflow could look something like this: ```{r, eval = FALSE} # define path to the yml files spec_file <- "spec.yml" filters <- "filters.yml" ``` ```{r} library("dplyr") # load all filters filters::load_filters(filters, overwrite = TRUE) # read data data <- list( "adsl" = eg_adsl %>% mutate( FASFL = SAFFL, # add FASFL for illustrative purpose for t_pop_slide # DISTRTFL is needed for t_ds_slide but is missing in example data DISTRTFL = sample(c("Y", "N"), size = length(TRT01A), replace = TRUE, prob = c(.1, .9)) ) %>% preprocess_t_ds(), # this preproccessing is required by one of the autoslider.core functions "adae" = eg_adae, "adtte" = eg_adtte, "adrs" = eg_adrs, "adlb" = eg_adlb ) # create outputs based on the specs and the functions outputs <- spec_file %>% read_spec() %>% # we can also filter for specific programs, if we don't want to create them all filter_spec(., program %in% c( "t_ds_slide", "t_dm_slide" )) %>% # these filtered specs are now piped into the generate_outputs function. # this function also requires the data generate_outputs(datasets = data) %>% # now we decorate based on the specs, i.e. add footnotes and titles decorate_outputs( version_label = NULL ) ``` We can have a look at one of the outputs stored in the outputs file: ```{r} outputs$t_dm_slide_ITT ``` Now we can save it to a slide. For this example I store the output in a tempfile, you would likely store it in the `outputs/` folder. ```{r, eval = FALSE} # Output to slides with template and color theme outputs %>% generate_slides( outfile = tempfile(fileext = ".ppts"), template = file.path(system.file(package = "autoslider.core"), "/theme/basic.pptx"), table_format = autoslider_format ) ``` # Writing custom functions Unless your requirements are really specific, the most efficient way to write a study function is to base it off of a template function from the [TLG catalogue](https://insightsengineering.github.io/tlg-catalog/stable/). The function you would want to create should take as input a list of datasets and potentially additional arguments. Within the function, you should not worry about filtering the data, as this should be taken care of with the `filters.yml` file and the general workflow. To work properly with `autoslider.core` your function should return either: - A `ggplot2` object for graphs - An `rtables` object for tables - An `rlistings` object for listings That's it! Now let's see how this works in practice. ## create example function A function that works within the `autoslideR` workflow should also work on its own. This makes it straightforward to develop and test. As an example, lets create a function corresponding to a [TLG catalogue](https://insightsengineering.github.io/tlg-catalog/stable/) output. We are going create a table based on [LBT06 Laboratory Abnormalities by Visit and Baseline Status](https://insightsengineering.github.io/tlg-catalog/stable/tables/lab-results/lbt06.html): ```{r} lbt06 <- function(datasets) { # Ensure character variables are converted to factors and empty strings and NAs are explicit missing levels. adsl <- datasets$adsl %>% tern::df_explicit_na() adlb <- datasets$adlb %>% tern::df_explicit_na() # Please note that df_explict_na has a na_level argument defaulting to "", # Please don't change the na_level to anything other than NA, empty string or the default "". adlb_f <- adlb %>% dplyr::filter(ABLFL != "Y") %>% dplyr::filter(!(AVISIT %in% c("SCREENING", "BASELINE"))) %>% dplyr::mutate(AVISIT = droplevels(AVISIT)) %>% formatters::var_relabel(AVISIT = "Visit") adlb_f_crp <- adlb_f %>% dplyr::filter(PARAMCD == "CRP") # Define the split function split_fun <- rtables::drop_split_levels lyt <- rtables::basic_table(show_colcounts = TRUE) %>% rtables::split_cols_by("ARM") %>% rtables::split_rows_by("AVISIT", split_fun = split_fun, label_pos = "topleft", split_label = formatters::obj_label(adlb_f_crp$AVISIT) ) %>% tern::count_abnormal_by_baseline( "ANRIND", abnormal = c(Low = "LOW", High = "HIGH"), .indent_mods = 4L ) %>% tern::append_varlabels(adlb_f_crp, "ANRIND", indent = 1L) %>% rtables::append_topleft(" Baseline Status") result <- rtables::build_table( lyt = lyt, df = adlb_f_crp, alt_counts_df = adsl ) %>% rtables::trim_rows() result } ``` Let's see if this works: ```{r} lbt06(data) ``` This works! To increase code-reusability and to have filtering control centralised in the `filters.yml` file, I would recommend to remove most filtering processes from the function, namely the following chunk: ```{r} adlb_f <- eg_adlb %>% dplyr::filter(ABLFL != "Y") %>% dplyr::filter(!(AVISIT %in% c("SCREENING", "BASELINE"))) %>% dplyr::mutate(AVISIT = droplevels(AVISIT)) %>% formatters::var_relabel(AVISIT = "Visit") adlb_f_crp <- adlb_f %>% dplyr::filter(PARAMCD == "CRP") ``` For this, I'll add two separate filters into the `filters.yml` file; one to filter the right parameter, and one to take care of the AVISIT and ABLFL. ``` LBCRP: title: CRP Values condition: PARAMCD == 'CRP' target: adlb type: slref LBNOBAS: title: Only Visits After Baseline condition: ABLFL != "Y" & !(AVISIT %in% c("SCREENING", "BASELINE")) target: adlb type: slref ``` And the corresponding specs entry: ``` - program: lbt06 titles: Patient Disposition ({filter_titles("adsl")}) footnotes: 't_ds footnotes' paper: L6 suffix: FAS_LBCRP_LBNOBAS ``` Now we can rewrite the function. But keep in mind: We are applying a filter to the ADSL data set but create the output on ADLB. To forward the filter to ADLB, we must semi-join the ADSL to ADLB. ```{r} lbt06 <- function(datasets) { # Ensure character variables are converted to factors and empty strings and NAs are explicit missing levels. adsl <- datasets$adsl %>% tern::df_explicit_na() adlb <- datasets$adlb %>% tern::df_explicit_na() # join adsl to adlb adlb <- adlb %>% semi_join(adsl, by = "USUBJID") # Please note that df_explict_na has a na_level argument defaulting to "", # Please don't change the na_level to anything other than NA, empty string or the default "". adlb_f <- adlb %>% dplyr::mutate(AVISIT = droplevels(AVISIT)) %>% formatters::var_relabel(AVISIT = "Visit") # Define the split function split_fun <- rtables::drop_split_levels lyt <- rtables::basic_table(show_colcounts = TRUE) %>% rtables::split_cols_by("ARM") %>% rtables::split_rows_by("AVISIT", split_fun = split_fun, label_pos = "topleft", split_label = formatters::obj_label(adlb_f_crp$AVISIT) ) %>% tern::count_abnormal_by_baseline( "ANRIND", abnormal = c(Low = "LOW", High = "HIGH"), .indent_mods = 4L ) %>% tern::append_varlabels(adlb_f, "ANRIND", indent = 1L) %>% rtables::append_topleft(" Baseline Status") result <- rtables::build_table( lyt = lyt, df = adlb_f, alt_counts_df = adsl ) %>% rtables::trim_rows() result } ``` lets do a dry-run before we integrate this function into the workflow: ```{r} # filter data | this step will be performed by the workflow later on adsl <- eg_adsl adlb <- eg_adlb adlb_f <- adlb %>% dplyr::filter(ABLFL != "Y") %>% dplyr::filter(!(AVISIT %in% c("SCREENING", "BASELINE"))) adlb_f_crp <- adlb_f %>% dplyr::filter(PARAMCD == "CRP") adsl_f <- adsl %>% filter(ITTFL == "Y") ``` ```{r} lbt06(list(adsl = adsl_f, adlb = adlb_f_crp)) ``` Looks like it works! ## Integrate it into the general workflow You have to keep in mind that the function you created must be in the global environment when calling the `create_outputs` function. This is the case for all `autoslider.core` functions, as you attach the `autoslider.core` package (with your `library(autoslider.core)` call), so all (exported) function of the `autoslider.core` package are available. If you store your custom function in a separate script, you would need to source that script at some point before calling the function, i.e.: ```{r, eval = FALSE} source("programs/R/output_functions.R") ``` Now you just have to make sure the two `.yml` files are correctly specified. Set the path to the `.yml` files. ```{r, eval = FALSE} filters <- "filters.yml" spec_file <- "specs.yml" ``` Then load the filters and generate the outputs. ```{r} filters::load_filters(filters, overwrite = TRUE) outputs <- spec_file %>% read_spec() %>% generate_outputs(data) %>% decorate_outputs() outputs$lbt06_ITT_LBCRP_LBNOBAS ``` Once this works, we can finally generate the slides. ```{r} filepath <- tempfile(fileext = ".pptx") generate_slides(outputs, outfile = filepath) ``` Of course, you would not use a temporary file, and you might want to use a custom `.pptx` template for your slides.