Splitting cohorts

For this example we’ll use the Eunomia synthetic data from the CDMConnector package.

library(CohortConstructor)
library(PatientProfiles)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main",
                                writeSchema = "main", writePrefix = "my_study_")

Let’s start by creating two drug cohorts, one for users of diclofenac and another for users of acetaminophen.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = list("diclofenac" = 1124300,
                                                   "acetaminophen" = 1127433), 
                                 name = "medications")
cohortCount(cdm$medications)
#> # A tibble: 2 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1           9365            2580
#> 2                    2            830             830

We can stratify cohorts based on specified columns using the function stratifyCohorts(). In this example, let’s stratify the medications cohort by age and sex.

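cdm$stratified <- cdm$medications |>
  # Add age_group and sex columns; note the "18 to 65" label covers ages 18 to 64
  addAge(ageGroup = list("Child" = c(0, 17), "18 to 65" = c(18, 64), "65 and Over" = c(65, Inf))) |>
  addSex(name = "stratified") |>
  # One stratum per sex, per age group, and per sex and age group combination
  stratifyCohorts(strata = list("sex", "age_group", c("sex", "age_group")), name = "stratified")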

settings(cdm$stratified)
#> # A tibble: 22 × 10
#>    cohort_definition_id cohort_name          target_cohort_id target_cohort_name
#>                   <int> <chr>                           <int> <chr>             
#>  1                    1 acetaminophen_female                1 acetaminophen     
#>  2                    2 acetaminophen_male                  1 acetaminophen     
#>  3                    3 diclofenac_female                   2 diclofenac        
#>  4                    4 diclofenac_male                     2 diclofenac        
#>  5                    5 acetaminophen_18_to…                1 acetaminophen     
#>  6                    6 acetaminophen_65_an…                1 acetaminophen     
#>  7                    7 acetaminophen_child                 1 acetaminophen     
#>  8                    8 diclofenac_18_to_65                 2 diclofenac        
#>  9                    9 diclofenac_65_and_o…                2 diclofenac        
#> 10                   10 diclofenac_child                    2 diclofenac        
#> # ℹ 12 more rows
#> # ℹ 6 more variables: cdm_version <chr>, vocabulary_version <chr>,
#> #   target_cohort_table_name <chr>, strata_columns <chr>, sex <chr>,
#> #   age_group <chr>

The age and sex columns are added using the functions addAge() and addSex() from the package PatientProfiles. The ‘stratified’ table includes 22 cohorts: for each of the two drugs there are 2 cohorts by sex, 3 by age group, and 6 by sex and age group combined.
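The count of 22 can be checked with a little arithmetic. The following is a minimal base R sketch, not part of the package: the object names (levels_by_column, combos_per_stratum, and so on) are made up for illustration, and it assumes every sex and age group combination is actually observed in both drug cohorts, as the settings output above suggests.

```r
# Levels of each stratification column, written as they appear in the cohort names.
levels_by_column <- list(
  sex = c("female", "male"),
  age_group = c("child", "18_to_65", "65_and_over")
)

# The three strata passed to stratifyCohorts() above.
strata <- list("sex", "age_group", c("sex", "age_group"))

# Number of level combinations per stratum: 2, 3, and 2 * 3 = 6.
combos_per_stratum <- vapply(
  strata,
  function(cols) prod(lengths(levels_by_column[cols])),
  numeric(1)
)

n_targets <- 2  # acetaminophen and diclofenac
total_cohorts <- n_targets * sum(combos_per_stratum)
total_cohorts
```

If a combination had no records in the data, the real table could contain fewer cohorts than this upper bound.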

We can also split cohorts for specified years using the function yearCohorts().

cdm$years <- cdm$medications |>
  yearCohorts(years = 2005:2010, name = "years")

settings(cdm$years)
#> # A tibble: 12 × 7
#>    cohort_definition_id cohort_name        target_cohort_definitio…¹ cdm_version
#>                   <int> <chr>                                  <int> <chr>      
#>  1                    1 acetaminophen_2005                         1 5.3        
#>  2                    2 diclofenac_2005                            2 5.3        
#>  3                    3 acetaminophen_2006                         1 5.3        
#>  4                    4 diclofenac_2006                            2 5.3        
#>  5                    5 acetaminophen_2007                         1 5.3        
#>  6                    6 diclofenac_2007                            2 5.3        
#>  7                    7 acetaminophen_2008                         1 5.3        
#>  8                    8 diclofenac_2008                            2 5.3        
#>  9                    9 acetaminophen_2009                         1 5.3        
#> 10                   10 diclofenac_2009                            2 5.3        
#> 11                   11 acetaminophen_2010                         1 5.3        
#> 12                   12 diclofenac_2010                            2 5.3        
#> # ℹ abbreviated name: ¹target_cohort_definition_id
#> # ℹ 3 more variables: vocabulary_version <chr>, year <int>,
#> #   target_cohort_name <chr>

The ‘years’ table includes 12 cohorts, one for each combination of drug (acetaminophen or diclofenac) and year (2005 to 2010).
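To make the idea of splitting by year concrete, here is a rough base R sketch of what happens to a single cohort entry. It assumes that each entry is trimmed to the part that falls inside each requested calendar year, which is our reading of yearCohorts() rather than something shown above. The helper split_by_year() and the example dates are hypothetical and are not part of CohortConstructor.

```r
# Hypothetical helper: trim one cohort entry to each requested calendar year.
split_by_year <- function(start, end, years) {
  rows <- lapply(years, function(y) {
    year_start <- as.Date(sprintf("%d-01-01", y))
    year_end <- as.Date(sprintf("%d-12-31", y))
    new_start <- max(start, year_start)
    new_end <- min(end, year_end)
    # Skip years that the entry does not overlap.
    if (new_start > new_end) {
      return(NULL)
    }
    data.frame(year = y, cohort_start_date = new_start, cohort_end_date = new_end)
  })
  # Drop the skipped years before stacking the remaining rows.
  do.call(rbind, Filter(Negate(is.null), rows))
}

# An invented entry running from late 2006 to early 2008.
entry <- split_by_year(as.Date("2006-11-15"), as.Date("2008-02-10"), 2005:2010)
entry
```

Under this assumption the invented entry contributes to the 2006, 2007, and 2008 cohorts only, with its dates clipped to each year's boundaries.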