Conditioned Data Frames

Introduction

Conditioned data frames, or cnd_df, are a feature of the {sdtm.oak} package that restricts a transformation to selected rows of a data frame. This article explains how to create and use them, particularly in the context of SDTM domain derivations.

Creating Conditioned Data Frames

A conditioned data frame is a regular data frame extended with a logical vector cnd that marks rows for subsequent conditional transformations. The condition_add() function is used to create these conditioned data frames.
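
To make the idea of "a data frame extended with a logical vector" concrete, here is a minimal base R sketch. It is not the {sdtm.oak} source code: the constructor as_marked_df(), the class name marked_df, and the choice to store the mask in an attribute named cnd are all assumptions made for illustration.

```r
# Hypothetical constructor (not part of {sdtm.oak}): attach a row mask to a
# data frame as an attribute and tag the object with an extra class.
as_marked_df <- function(dat, cnd) {
  stopifnot(is.data.frame(dat), is.logical(cnd), length(cnd) == nrow(dat))
  attr(dat, "cnd") <- cnd
  class(dat) <- c("marked_df", class(dat))
  dat
}

df <- data.frame(x = 1L:3L, y = letters[1L:3L])
marked <- as_marked_df(df, df$x > 1L)

attr(marked, "cnd")             # the row mask travels with the data
inherits(marked, "data.frame")  # the object is still a data frame
marked$x                        # the columns themselves are unchanged
```

The point of the sketch is only that the mask is carried alongside the data rather than stored as an ordinary column, so the columns stay exactly as they were.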

Simple Example

Consider a simple data frame df:

library(sdtm.oak)
library(tibble)

(df <- tibble(x = 1L:3L, y = letters[1L:3L]))
## # A tibble: 3 × 2
##       x y    
##   <int> <chr>
## 1     1 a    
## 2     2 b    
## 3     3 c

We can create a conditioned data frame in which only the rows with x > 1 are marked:

(cnd_df <- condition_add(dat = df, x > 1L))
## # A tibble:  3 × 2
## # Cond. tbl: 2/1/0
##         x y    
##     <int> <chr>
## 1 F     1 a    
## 2 T     2 b    
## 3 T     3 c

Here, only the second and third rows are marked as TRUE, shown as T in the unnamed column at the left of the printout.
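
The three numbers after "Cond. tbl:" in the printout appear to be tallies of the condition. The base R sketch below recomputes them for the example above, assuming they are the counts of TRUE, FALSE, and NA values, in that order (an inference from the printed output). The helper tally_cnd() is hypothetical and not part of {sdtm.oak}.

```r
# Hypothetical helper (not part of {sdtm.oak}): count TRUE, FALSE and NA
# values in a logical condition vector.
tally_cnd <- function(cnd) {
  c(
    n_true = sum(cnd, na.rm = TRUE),
    n_false = sum(!cnd, na.rm = TRUE),
    n_na = sum(is.na(cnd))
  )
}

df <- data.frame(x = 1L:3L, y = letters[1L:3L])
cnd <- df$x > 1L                  # FALSE TRUE TRUE
tally <- tally_cnd(cnd)
paste(tally, collapse = "/")      # "2/1/0"
```

A condition that evaluates to NA for some row (for example, a comparison against a missing value) would be counted in the third position under this reading.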

Usage in SDTM Domain Derivations

The real power of conditioned data frames shows when they are used with functions such as assign_no_ct(), assign_ct(), hardcode_no_ct(), and hardcode_ct(). Given a conditioned data frame, these functions perform the derivation only for the records marked TRUE.
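
The effect of a row mask on a derivation can be sketched in base R. The helper derive_where() below is hypothetical and is not how {sdtm.oak} is implemented; it only illustrates the behaviour described above, with one call that writes a constant (in the spirit of the hardcode_* functions) and one that copies row-wise values (in the spirit of the assign_* functions). The data and column names are made up for the illustration.

```r
# Hypothetical helper, base R only; not the {sdtm.oak} implementation.
# Writes `value` into column `tgt_var` for rows where `cnd` is TRUE and
# leaves NA everywhere else (rows where `cnd` is FALSE or NA).
derive_where <- function(dat, cnd, tgt_var, value) {
  stopifnot(is.logical(cnd), length(cnd) == nrow(dat))
  n <- nrow(dat)
  marked <- !is.na(cnd) & cnd
  out <- rep_len(value[NA_integer_], n)     # all-NA vector of value's type
  out[marked] <- rep_len(value, n)[marked]  # constants are recycled
  dat[[tgt_var]] <- out
  dat
}

dat <- data.frame(
  id = 1L:4L,
  trt = c("A", "B", "A", "C"),
  dose = c(10L, 20L, 30L, 40L)
)
cnd <- dat$trt == "A"

# A constant value on the marked rows.
hard <- derive_where(dat, cnd, "FLAG", "Y")
hard$FLAG    # "Y" NA "Y" NA

# Row-wise values on the marked rows.
asgn <- derive_where(dat, cnd, "DOSE_A", dat$dose)
asgn$DOSE_A  # 10 NA 30 NA
```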

Example with Concomitant Medications (CM) Domain

Suppose we want to derive a new variable, CMGRPID (Concomitant Medication Group ID), only for those records whose medication treatment (CMTRT) is "BENADRYL".

Here is a simplified raw Concomitant Medications data set (cm_raw):

cm_raw <- tibble::tibble(
  oak_id = seq_len(14L),
  raw_source = "ConMed",
  patient_number = c(375L, 375L, 376L, 377L, 377L, 377L, 377L, 378L, 378L, 378L, 378L, 379L, 379L, 379L),
  MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L),
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
    "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
    "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
    "BENADRYL", "SOMINEX", "ZQUILL"
  )
)
cm_raw
## # A tibble: 14 × 5
##    oak_id raw_source patient_number MDNUM MDRAW                        
##     <int> <chr>               <int> <int> <chr>                        
##  1      1 ConMed                375     1 BABY ASPIRIN                 
##  2      2 ConMed                375     2 CORTISPORIN                  
##  3      3 ConMed                376     1 ASPIRIN                      
##  4      4 ConMed                377     1 DIPHENHYDRAMINE HCL          
##  5      5 ConMed                377     2 PARCETEMOL                   
##  6      6 ConMed                377     3 VOMIKIND                     
##  7      7 ConMed                377     5 ZENFLOX OZ                   
##  8      8 ConMed                378     4 AMITRYPTYLINE                
##  9      9 ConMed                378     1 BENADRYL                     
## 10     10 ConMed                378     2 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378     3 TETRACYCLINE                 
## 12     12 ConMed                379     1 BENADRYL                     
## 13     13 ConMed                379     2 SOMINEX                      
## 14     14 ConMed                379     3 ZQUILL

To derive CMTRT, we use assign_no_ct() to map the raw variable MDRAW to the target variable CMTRT. The result keeps the three identifier columns and adds CMTRT:

tgt_dat <- assign_no_ct(
  tgt_var = "CMTRT",
  raw_dat = cm_raw,
  raw_var = "MDRAW"
)
tgt_dat
## # A tibble: 14 × 4
##    oak_id raw_source patient_number CMTRT                        
##     <int> <chr>               <int> <chr>                        
##  1      1 ConMed                375 BABY ASPIRIN                 
##  2      2 ConMed                375 CORTISPORIN                  
##  3      3 ConMed                376 ASPIRIN                      
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL          
##  5      5 ConMed                377 PARCETEMOL                   
##  6      6 ConMed                377 VOMIKIND                     
##  7      7 ConMed                377 ZENFLOX OZ                   
##  8      8 ConMed                378 AMITRYPTYLINE                
##  9      9 ConMed                378 BENADRYL                     
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378 TETRACYCLINE                 
## 12     12 ConMed                379 BENADRYL                     
## 13     13 ConMed                379 SOMINEX                      
## 14     14 ConMed                379 ZQUILL

Then we create a conditioned data frame from the target data set (tgt_dat) in which only the rows with CMTRT equal to "BENADRYL" are marked:

(cnd_tgt_dat <- condition_add(tgt_dat, CMTRT == "BENADRYL"))
## # A tibble:  14 × 4
## # Cond. tbl: 2/12/0
##      oak_id raw_source patient_number CMTRT                        
##       <int> <chr>               <int> <chr>                        
## 1  F      1 ConMed                375 BABY ASPIRIN                 
## 2  F      2 ConMed                375 CORTISPORIN                  
## 3  F      3 ConMed                376 ASPIRIN                      
## 4  F      4 ConMed                377 DIPHENHYDRAMINE HCL          
## 5  F      5 ConMed                377 PARCETEMOL                   
## 6  F      6 ConMed                377 VOMIKIND                     
## 7  F      7 ConMed                377 ZENFLOX OZ                   
## 8  F      8 ConMed                378 AMITRYPTYLINE                
## 9  T      9 ConMed                378 BENADRYL                     
## 10 F     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11 F     11 ConMed                378 TETRACYCLINE                 
## 12 T     12 ConMed                379 BENADRYL                     
## 13 F     13 ConMed                379 SOMINEX                      
## 14 F     14 ConMed                379 ZQUILL

Finally, we derive CMGRPID, the group ID for the medication, conditionally. Because the conditioned data set is passed as tgt_dat, assign_no_ct() copies MDNUM into CMGRPID only for the marked rows; all other rows receive NA:

derived_tgt_dat <- assign_no_ct(
  tgt_dat = cnd_tgt_dat,
  tgt_var = "CMGRPID",
  raw_dat = cm_raw,
  raw_var = "MDNUM"
)
derived_tgt_dat
## # A tibble: 14 × 5
##    oak_id raw_source patient_number CMTRT                         CMGRPID
##     <int> <chr>               <int> <chr>                           <int>
##  1      1 ConMed                375 BABY ASPIRIN                       NA
##  2      2 ConMed                375 CORTISPORIN                        NA
##  3      3 ConMed                376 ASPIRIN                            NA
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
##  5      5 ConMed                377 PARCETEMOL                         NA
##  6      6 ConMed                377 VOMIKIND                           NA
##  7      7 ConMed                377 ZENFLOX OZ                         NA
##  8      8 ConMed                378 AMITRYPTYLINE                      NA
##  9      9 ConMed                378 BENADRYL                            1
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
## 11     11 ConMed                378 TETRACYCLINE                       NA
## 12     12 ConMed                379 BENADRYL                            1
## 13     13 ConMed                379 SOMINEX                            NA
## 14     14 ConMed                379 ZQUILL                             NA
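
As a cross-check of the result above, the same three steps can be reproduced in base R without {sdtm.oak}. This is a sketch under assumptions: it treats oak_id, raw_source, and patient_number as the key columns that align raw and target records (inferred from the columns that survive in tgt_dat), and it is not a description of the package internals.

```r
# Key columns assumed to align raw and target records in this example.
id_vars <- c("oak_id", "raw_source", "patient_number")

cm_raw <- data.frame(
  oak_id = seq_len(14L),
  raw_source = "ConMed",
  patient_number = c(375L, 375L, 376L, 377L, 377L, 377L, 377L,
                     378L, 378L, 378L, 378L, 379L, 379L, 379L),
  MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L),
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
    "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
    "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
    "BENADRYL", "SOMINEX", "ZQUILL"
  ),
  stringsAsFactors = FALSE
)

# Step 1: unconditional mapping MDRAW -> CMTRT.
tgt <- cm_raw[, id_vars]
tgt$CMTRT <- cm_raw$MDRAW

# Step 2: mark the rows of interest.
cnd <- tgt$CMTRT == "BENADRYL"

# Step 3: look up each target row in the raw data by its key, then keep the
# raw value only where the row is marked.
key <- function(d) do.call(paste, c(d[id_vars], sep = "\r"))
pos <- match(key(tgt), key(cm_raw))
tgt$CMGRPID <- ifelse(cnd, cm_raw$MDNUM[pos], NA_integer_)

tgt[cnd, ]  # rows 9 and 12, both with CMGRPID equal to 1
```

Matching on the key columns rather than on row position means the sketch would still pair the right records if the two data sets were sorted differently.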

Conditioned data frames in the {sdtm.oak} package provide a flexible way to perform conditional transformations on data sets. By marking specific rows for transformation, users can efficiently derive SDTM variables, ensuring that only relevant records are processed.