---
title: "Creating ADFACE"
output: 
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating ADFACE}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

This article describes about creating `ADFACE` ADaM dataset which is
part of Vaccine - Reactogenicity based on the Center for Biologics Evaluation and Research (CBER) guidelines.

See the below links for more information:

[Center for Biologics Evaluation and Research (CBER) Guidelines](https://www.fda.gov/media/112581/download)

[Therapeutic Area Data Standards User Guide for Vaccines (TAUG-Vax)](https://www.cdisc.org/standards/therapeutic-areas/vaccines/vaccines-therapeutic-area-user-guide-v11/html)

Examples are currently tested using `ADSL` (ADaM) and
`face`, `vs`, `ex` (SDTM) inputs.

**Note**: *All examples assume CDISC SDTM and/or ADaM format as input
unless otherwise specified.*

# Programming Workflow

-   [Read in Data](#readdata)
-   [Pre-processing of Input Dataset](#input)
-   [Merge `FACE` with `EX`](#merge)
-   [Merge Required `ADSL` Variables Needed for Analysis](#adsl)
-   [Derive Fever Records from `VS` Domain](#fever)
-   [Derive/Impute Numeric Date/Time and Analysis Day (`ADT`, `ADTM`, `ADY`, `ADTF`, `ATMF`)](#datetime)
-   [Derive Period Variables (e.g. `APxxSDT`, `APxxEDT`, ...)](#periodvars)
-   [Derive Direct Mapping Variables](#mapping)
-   [Derive Severity Records for Administration Site Events](#sev)
-   [Derive Maximum Records](#max)
-   [Assign `PARAMCD`, `PARAM`, `PARAMN`, `PARCAT1`](#paramcd)
-   [Derive Maximum Severity Flag](#maxflag)
-   [Derive Event Occurrence Flag](#eventflag)
-   [Post-processing of the Dataset](#post)
-   [Add ADSL Variables](#adsl_vars)

## Read in Data {#readdata}

To start, all data frames needed for the creation of `ADFACE` should be read into
the environment. Some of the data frames needed are `VS`,`EX` and `FACE`.

```{r message=FALSE}
library(admiral)
library(admiralvaccine)
library(admiraldev)
library(pharmaversesdtm)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)
library(stringr)
library(tidyr)
library(tibble)

data("face_vaccine")
data("suppface_vaccine")
data("ex_vaccine")
data("suppex_vaccine")
data("vs_vaccine")
data("admiralvaccine_adsl")

face <- convert_blanks_to_na(face_vaccine)
ex <- convert_blanks_to_na(ex_vaccine)
vs <- convert_blanks_to_na(vs_vaccine)
suppface <- convert_blanks_to_na(suppface_vaccine)
suppex <- convert_blanks_to_na(suppex_vaccine)
adsl <- convert_blanks_to_na(admiralvaccine_adsl)
```

## Pre-processing of Input Dataset {#input}

This step involves company-specific pre-processing of required input dataset for
further analysis. In this step, we will filter records that has only reactogenicity events and
combine the `face` and `ex` with their supplementary datasets `suppface` and `suppex` respectively.

```{r eval=TRUE}
face <- face %>%
  filter(FACAT == "REACTOGENICITY" & grepl("ADMIN|SYS", FASCAT)) %>%
  mutate(FAOBJ = str_to_upper(FAOBJ)) %>%
  metatools::combine_supp(suppface)
ex <- metatools::combine_supp(ex, suppex)
```

```{r, echo=FALSE}
dataset_vignette(
  face,
  display_vars = exprs(USUBJID, FAOBJ, FATESTCD, FACAT, FASCAT, FATPTREF, FADTC)
)
```

## Merge `FACE` with `EX` {#merge}

In this step, we will merge `face` with `ex` domain and add required variables from `ex` domain to the input dataset. If subjects have multiple vaccination at same visit then this function will not merge input dataset with `ex` dataset and throws a warning.

The function `derive_vars_merged_vaccine()` is used to merge `face` with `ex` domain.

```{r eval=TRUE}
adface <- derive_vars_merged_vaccine(
  dataset = face,
  dataset_ex = ex,
  by_vars_sys = exprs(USUBJID, FATPTREF = EXLNKGRP),
  by_vars_adms = exprs(USUBJID, FATPTREF = EXLNKGRP, FALOC = EXLOC, FALAT = EXLAT),
  ex_vars = exprs(EXTRT, EXDOSE, EXSEQ, EXSTDTC, EXENDTC, VISIT, VISITNUM)
)
```
  
```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, FATESTCD, FATPTREF, EXTRT)
)
```  

This call would return the input dataset with columns from `ex_vars` added if the subjects does not have multiple vaccination at same visit.

Though the function will throw warning if subjects have multiple vaccination at same visit, this call would return the input dataset merging it with supplementary dataset.

## Merge Required `ADSL` Variables Needed for Analysis {#adsl}

At this step, it may be useful to join `ADSL` to your `face` domain. Only the 
`ADSL` variables used for derivations are selected at this step. The rest of the
relevant `ADSL` variables would be added later.

```{r eval=TRUE}
adsl_vars <- exprs(RFSTDTC, RFENDTC)

adface <- derive_vars_merged(
  face,
  dataset_add = adsl,
  new_vars = adsl_vars,
  by_vars = get_admiral_option("subject_keys")
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, RFSTDTC, RFENDTC)
)
```  

This call would return the input dataset with columns `RFSTDTC`, `RFENDTC` added.

## Derive Fever Records from `VS` Domain {#fever}

In this step, we will merge fever records from the `VS` domain with the input dataset if the fever records does not present in the input dataset.

The function `derive_fever_records()` is used to merge fever records. These records will also be used in maximum temperature calculation.

```{r eval=TRUE}
adface <- derive_fever_records(
  dataset = adface,
  dataset_source = ungroup(vs),
  filter_source = VSCAT == "REACTOGENICITY" & VSTESTCD == "TEMP",
  faobj = "FEVER"
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, FATESTCD, FAORRES, VSSTRESN),
  filter = FAOBJ == "FEVER"
)
``` 

This call returns the input dataset with `FEVER` records added if the input dataset does not have `FEVER` records. If the input dataset has `FEVER` records, the output dataset will be same as the input dataset.


## Derive/Impute Numeric Date/Time and Analysis Day (`ADT`, `ADTM`, `ADTF`, `ATMF`, `ADY`) {#datetime}

The function `derive_vars_dt()` can be used to derive `ADT`. This function allows 
the user to impute the date as well.

Similarly, `ADTM` can be created using the function `derive_vars_dtm()`. 
Imputation can be done on both the date and time components of `ADTM`.

Example calls:

```{r eval=TRUE}
adface <- adface %>%
  derive_vars_dt(
    new_vars_prefix = "A",
    dtc = FADTC
  ) %>%
  derive_vars_dtm(
    new_vars_prefix = "A",
    dtc = FADTC,
    highest_imputation = "n"
  )
```

Once `ADT` is derived, the function `derive_vars_dy()` can be used to derive `ADY`.
This example assumes both `ADT` and `RFSTDTC` exist on the data frame.

```{r eval=TRUE}
adface <- adface %>%
  mutate(RFSTDTC = as.Date(RFSTDTC)) %>%
  derive_vars_dy(reference_date = RFSTDTC, source_vars = exprs(ADT))
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FATPTREF, FAOBJ, ADT, ADTM, ADY)
)
```

## Derive Period Variables (e.g. `APxxSDT`, `APxxEDT`, ...) {#periodvars}

The `{admiral}` core package has separate functions to handle period variables since these variables are study specific.

See the ["Visit and Period Variables"
vignette](https://pharmaverse.github.io/admiral/articles/visits_periods.html) for more information.

If the variables are not derived based on a period reference dataset, they may
be derived at a later point of the flow. For example, phases like "Treatment
Phase" and "Follow up" could be derived based on treatment start and end date.

```{r eval=TRUE}
period_ref <- create_period_dataset(
  dataset = adsl,
  new_vars = exprs(
    APERSDT = APxxSDT, APEREDT = APxxEDT, TRTA = TRTxxA,
    TRTP = TRTxxP
  )
)

adface <- derive_vars_joined(
  adface,
  dataset_add = period_ref,
  by_vars = get_admiral_option("subject_keys"),
  filter_join = ADT >= APERSDT & ADT <= APEREDT,
  join_type = "all"
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, APERSDT, APEREDT, TRTA, TRTP)
)
```

## Derive Direct Mapping Variables {#mapping}

In this step,we will create the user defined function to assign `AVAL` values from `AVALC` which
will be used in further steps. 

The user defined functions would look like the following:

```{r eval=TRUE}
sev_to_numeric <- function(x, y) {
  case_when(
    x == "NONE" ~ 0,
    x == "MILD" ~ 1,
    x == "MODERATE" ~ 2,
    x == "SEVERE" ~ 3,
    TRUE ~ y
  )
}
```

The mapping of these variables is left to the User. An 
example mapping may be:

```{r eval=TRUE}
adface <- adface %>%
  mutate(
    AVALC = as.character(FASTRESC),
    AVAL = suppressWarnings(as.numeric(FASTRESN)),
    AVAL = sev_to_numeric(AVALC, AVAL),
    ATPTREF = FATPTREF,
    ATPT = FATPT,
    ATPTN = FATPTNUM
  )
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, AVAL, AVALC, ATPTREF, ATPTN)
)
```
## Derive AN01FL
Creating ANL01FL which would flag the records that will be considered for analysis and if there is any Investigator and Subject record for the same day, it would flag the Investigator record over the subject record.

Note: Please, consider which assessment is needed for your analysis. If you want to prioritize Investigator assessment, please proceed as follows. Otherwise, change FAEVAL order.

```{r eval = TRUE}
adface <- adface %>% derive_var_extreme_flag(
  by = exprs(STUDYID, USUBJID, FATPTREF, FAOBJ, FATESTCD, FATPTNUM),
  order = exprs(STUDYID, USUBJID, FATPTREF, FAOBJ, FATESTCD, FATPTNUM, FAEVAL),
  new_var = ANL01FL,
  mode = "first",
  true_value = "Y",
  false_value = NA_character_
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, FATESTCD, FATEST, AVAL, ANL01FL),
  filter = ANL01FL == "Y"
)
```

## Derive Severity Records for Administration Site Events {#sev}

The function `derive_diam_to_sev_records()` is used to derive the severity records from the diameter records for an event.

The severity records created will be useful for calculating the maximum severity.

```{r eval=TRUE}
adface <- derive_diam_to_sev_records(
  dataset = adface,
  filter_add = ANL01FL == "Y",
  diam_code = "DIAMETER",
  faobj_values = c("REDNESS", "SWELLING"),
  testcd_sev = "SEV",
  test_sev = "Severity/Intensity",
  none = 0,
  mild = 2,
  mod = 5,
  sev = 10
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, FATESTCD, FATEST, AVAL, AVALC),
  filter = FATESTCD == "SEV"
)
```

This call returns the input dataset with severity records derived from the diameter records for an event.

By default, we will populate `SEV` and `Severity/Intensity` as the `FATESTCD` and `FATEST` for the newly added records. The function allows the user to change the `FATESTCD` and `FATEST` as well.

## Derive Maximum Records{#max}

In this step, we will derive maximum records for severity, diameter, temperature  using the function `derive_extreme_records()`.

```{r eval=TRUE}
adface <- derive_extreme_records(
  dataset = adface,
  dataset_add = adface,
  filter_add = FATESTCD == "SEV" & ANL01FL == "Y",
  by_vars = exprs(USUBJID, FAOBJ, ATPTREF),
  order = exprs(AVAL),
  check_type = "none",
  mode = "last",
  set_values_to = exprs(
    FATEST = "Maximum Severity",
    FATESTCD = "MAXSEV"
  )
)

adface <- derive_extreme_records(
  dataset = adface,
  dataset_add = adface,
  filter_add = FAOBJ %in% c("REDNESS", "SWELLING") & FATESTCD == "DIAMETER" & ANL01FL == "Y",
  by_vars = exprs(USUBJID, FAOBJ, FALNKGRP),
  order = exprs(AVAL),
  check_type = "none",
  mode = "last",
  set_values_to = exprs(
    FATEST = "Maximum Diameter",
    FATESTCD = "MAXDIAM"
  )
)

adface <- derive_extreme_records(
  dataset = adface,
  dataset_add = adface,
  filter_add = FAOBJ == "FEVER" & ANL01FL == "Y",
  by_vars = exprs(USUBJID, FAOBJ, ATPTREF),
  order = exprs(VSSTRESN),
  check_type = "none",
  mode = "last",
  set_values_to = exprs(
    FATEST = "Maximum Temperature",
    FATESTCD = "MAXTEMP"
  )
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, FATESTCD, FATEST, AVAL, AVALC),
  filter = FATESTCD %in% c("MAXSEV", "MAXDIAM", "MAXTEMP")
)
```

This call returns the input dataset with maximum records added for the severity, diameter, temperature.

## Assign `PARAMCD`, `PARAM`, `PARAMN`, `PARCAT1` {#paramcd}

To assign parameter level values such as `PARAMCD`, `PARAM`, `PARAMN`,
etc., a lookup needs to be created to join to the source data.
`PARCAT1`, `PARCAT2` variables are assigned from `FACAT`, `FASCAT` variables.



For example, when creating `ADFACE` dataset, a lookup based on the SDTM `--TESTCD` value can be created:

`FATESTCD` | `PARAMCD` | `PARAMN` |`FATEST` | `FAOBJ`
--------- | --------- | -------- | ------- 
SEV | SEVREDN | 1 | Severity | Redness 
DIAMETER | DIARE | 2 | Diameter | Redness 
MAXDIAM | MDIRE | 3 | Maximum Diameter cm | Redness
MAXTEMP | MAXTEMP | 4 | Maximum Temperature | Fever
MAXSEV | MAXSWEL | 5 | Maximum Severity | Swelling
OCCUR | OCFEVER | 6 | Occurrence Indicator | Fever
OCCUR | OCERYTH | 7 | Occurrence Indicator | Erythema
SEV | SEVPAIN | 8 | Severity | Pain at Injection site
OCCUR | OCPAIN | 9 | Occurrence Indicator | Pain at Injection site
OCCUR | OCSWEL | 10 | Occurrence Indicator | Swelling

This lookup can now be joined to the source data:

```{r eval=TRUE, include=FALSE}
lookup_dataset <- tribble(
  ~FATESTCD, ~PARAMCD, ~PARAMN, ~FATEST, ~FAOBJ,
  "SEV", "SEVREDN", 1, "Severity/Intensity", "REDNESS",
  "DIAMETER", "DIARE", 2, "Diameter", "REDNESS",
  "MAXDIAM", "MDIRE", 3, "Maximum Diameter", "REDNESS",
  "MAXTEMP", "MAXTEMP", 4, "Maximum Temperature", "FEVER",
  "OCCUR", "OCFEVER", 5, "Occurrence Indicator", "FEVER",
  "OCCUR", "OCERYTH", 6, "Occurrence Indicator", "ERYTHEMA",
  "MAXSEV", "MAXSWEL", 7, "Maximum Severity", "SWELLING",
  "MAXSEV", "MAXREDN", 8, "Maximum Severity", "REDNESS",
  "MAXSEV", "MAXSFAT", 9, "Maximum Severity", "FATIGUE",
  "MAXSEV", "MAXSHEA", 10, "Maximum Severity", "HEADACHE",
  "MAXSEV", "MSEVNWJP", 11, "Maximum Severity", "NEW OR WORSENED JOINT PAIN",
  "MAXSEV", "MSEVNWMP", 12, "Maximum Severity", "NEW OR WORSENED MUSCLE PAIN",
  "OCCUR", "OCISR", 13, "Occurrence Indicator", "REDNESS",
  "OCCUR", "OCINS", 14, "Occurrence Indicator", "SWELLING",
  "OCCUR", "OCPIS", 15, "Occurrence Indicator", "PAIN AT INJECTION SITE",
  "OCCUR", "OCFATIG", 16, "Occurrence Indicator", "FATIGUE",
  "OCCUR", "OCHEAD", 17, "Occurrence Indicator", "HEADACHE",
  "OCCUR", "OCCHILLS", 18, "Occurrence Indicator", "CHILLS",
  "OCCUR", "OCDIAR", 19, "Occurrence Indicator", "DIARRHEA",
  "OCCUR", "OCCNWJP", 20, "Occurrence Indicator", "NEW OR WORSENED JOINT PAIN",
  "OCCUR", "OCCNWMP", 21, "Occurrence Indicator", "NEW OR WORSENED MUSCLE PAIN",
  "SEV", "SEVSWEL", 22, "Severity/Intensity", "SWELLING",
  "SEV", "SEVPIS", 23, "Severity/Intensity", "PAIN AT INJECTION SITE",
  "SEV", "SEVFAT", 24, "Severity/Intensity", "FATIGUE",
  "SEV", "SEVHEAD", 25, "Severity/Intensity", "HEADACHE",
  "SEV", "SEVDIAR", 26, "Severity/Intensity", "DIARRHEA",
  "SEV", "SEVNWJP", 27, "Severity/Intensity", "NEW OR WORSENED JOINT PAIN",
  "SEV", "SEVNWMP", 28, "Severity/Intensity", "NEW OR WORSENED MUSCLE PAIN",
  "MAXDIAM", "MDISW", 29, "Maximum Diameter", "SWELLING",
  "MAXSEV", "MAXSPIS", 30, "Maximum Severity", "PAIN AT INJECTION SITE",
  "OCCUR", "OCCVOM", 31, "Occurrence Indicator", "VOMITING",
  "DIAMETER", "DIASWEL", 32, "Diameter", "SWELLING"
)
```

```{r eval=TRUE}
adface <- derive_vars_params(
  dataset = adface,
  lookup_dataset = lookup_dataset,
  merge_vars = exprs(PARAMCD, PARAMN)
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, FATEST, PARAMCD, PARAM, PARAMN, PARCAT1, PARCAT2)
)
```

`PARAMCD` will be always derived from lookup dataset whereas `PARAMN`, `PARAM`, `PARCAT1`, `PARCAT2` can be either derived from lookup dataset if mentioned in `merge_vars` argument or derived in the function.

## Derive Maximum Severity Flag {#maxflag}

The function `derive_vars_max_flag()` is used to derive flag variable for the maximum values of an event.

`flag1` - Flags the maximum value per subject per event per Vaccination.
`flag2` - Flags the maximum value per subject per event for Overall.

```{r eval=TRUE}
adface <- derive_vars_max_flag(
  dataset = adface,
  flag1 = "ANL02FL",
  flag2 = "ANL03FL"
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, AVAL, AVALC, ANL01FL, ANL02FL)
)
```

This call would return the input dataset with columns `ANL02FL`, `ANL03FL` added by default. 
This function allows the user to change the name of the new variables created.

## Derive Event Occurrence Flag {#eventflag}

The function `derive_vars_event_flag()` is used to derive flag variable for the events that occurred.

`new_var1` - Flags the record if at least one of the event occurred within the observation period. 
`new_var2` - Flags the record if the event is occurred.

```{r eval=TRUE}
adface <- derive_vars_event_flag(
  dataset = adface,
  by_vars = exprs(USUBJID, FAOBJ, ATPTREF),
  aval_cutoff = 2.5,
  new_var1 = EVENTFL,
  new_var2 = EVENTDFL
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, FAOBJ, ATPTREF, AVAL, AVALC, EVENTFL, EVENTDFL)
)
```

This call would return the input dataset with columns `EVENTFL`, `EVENTDFL` added by default. 
This function allows the user to change the name of the new variables created as well.

## Post-processing of the Dataset {#post}

In this step, we will remove values for all the derived records in SDTM variables.

```{r eval=TRUE}
adface <- post_process_reacto(
  dataset = adface,
  filter_dataset = FATESTCD %in% c("MAXDIAM", "MAXSEV", "MAXTEMP") |
    (FATESTCD %in% c("OCCUR", "SEV") & FAOBJ %in% c("FEVER", "REDNESS", "SWELLING"))
)
```

## Add ADSL variables {#adsl_vars}

If needed, the other `ADSL` variables can now be added.
List of ADSL variables already merged held in vector `adsl_vars`

```{r eval=TRUE}
adsl <- adsl %>%
  convert_blanks_to_na() %>%
  filter(!is.na(USUBJID))

adface <- derive_vars_merged(
  dataset = adface,
  dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
  by_vars = get_admiral_option("subject_keys")
)
```

```{r, echo=FALSE}
dataset_vignette(
  adface,
  display_vars = exprs(USUBJID, TRTSDT, TRTEDT, AGE, SEX)
)
```

# Example Script 

ADaM | Sample Code
---- | --------------
ADFACE | [ad_adface.R](https://github.com/pharmaverse/admiralvaccine/blob/main/inst/templates/ad_adface.R){target="_blank"}