--- title: "Introduction to CodelistGenerator" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{01_Introduction_to_CodelistGenerator} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Creating a code list for dementia For this example we are going to generate a candidate codelist for dementia, only looking for codes in the condition domain. Let's first load some libraries ```{r, message=FALSE, warning=FALSE,echo=FALSE} library(DBI) library(dplyr) library(CodelistGenerator) library(CDMConnector) ``` ## Connect to the OMOP CDM vocabularies CodelistGenerator works with a `cdm_reference` to the vocabularies tables of the OMOP CDM using the CDMConnector package. ```{r, eval=FALSE} # example with postgres database connection details db <- DBI::dbConnect(RPostgres::Postgres(), dbname = Sys.getenv("server"), port = Sys.getenv("port"), host = Sys.getenv("host"), user = Sys.getenv("user"), password = Sys.getenv("password") ) # create cdm reference cdm <- CDMConnector::cdm_from_con( con = db, cdm_schema = Sys.getenv("vocabulary_schema") ) ``` ## Check version of the vocabularies It is important to note that the results from CodelistGenerator will be specific to a particular version of the OMOP CDM vocabularies. We can see the version of the vocabulary being used like so ```{r, eval=FALSE} getVocabVersion(cdm = cdm) ``` ```{r, message=FALSE, warning=FALSE,echo=FALSE} vocabVersion <- load(system.file("introVocab.RData", package = "CodelistGenerator")) vocabVersion ``` ## A code list from "Dementia" (4182210) and its descendants The simplest approach to identifying potential codes is to take a high-level code and include all its descendants. ```{r, eval=FALSE} codesFromDescendants <- tbl( db, sql(paste0( "SELECT * FROM ", vocabularyDatabaseSchema, ".concept_ancestor" )) ) |> filter(ancestor_concept_id == "4182210") |> select("descendant_concept_id") |> rename("concept_id" = "descendant_concept_id") |> left_join(tbl(db, sql(paste0( "SELECT * FROM ", vocabularyDatabaseSchema, ".concept" )))) |> select( "concept_id", "concept_name", "domain_id", "vocabulary_id" ) |> collect() ``` ```{r, message=FALSE, warning=FALSE,echo=FALSE} codesFromDescendants <- readRDS(system.file("introData01.RData", package = "CodelistGenerator")) ``` ```{r, message=FALSE, warning=FALSE } codesFromDescendants |> glimpse() ``` This looks to pick up most relevant codes. But, this approach misses codes that are not a descendant of 4182210. For example, codes such as "Wandering due to dementia" (37312577; https://athena.ohdsi.org/search-terms/terms/37312577) and "Anxiety due to dementia" (37312031; https://athena.ohdsi.org/search-terms/terms/37312031) are not picked up. ## Generating a candidate code list using CodelistGenerator To try and include all such terms that could be included we can use CodelistGenerator. First, let's do a simple search for a single keyword of "dementia", including descendants of the identified codes. ```{r, eval=FALSE } dementiaCodes1 <- getCandidateCodes( cdm = cdm, keywords = "dementia", domains = "Condition", includeDescendants = TRUE ) ``` ```{r, message=FALSE, warning=FALSE,echo=FALSE} dementiaCodes1 <- readRDS(system.file("introData02.RData", package = "CodelistGenerator")) ``` ```{r, message=FALSE, warning=FALSE } dementiaCodes1|> glimpse() ``` ## Comparing code lists What is the difference between this code list and the one from 4182210 and its descendants? ```{r, eval=FALSE } codeComparison <- compareCodelists( codesFromDescendants, dementiaCodes1 ) ``` ```{r, message=FALSE, warning=FALSE,echo=FALSE} codeComparison <- readRDS(system.file("introData03.RData", package = "CodelistGenerator")) ``` ```{r, message=FALSE, warning=FALSE } codeComparison |> group_by(codelist) |> tally() ``` What are these extra codes picked up by CodelistGenerator? ```{r, message=FALSE, warning=FALSE } codeComparison |> filter(codelist == "Only codelist 2") |> glimpse() ``` ## Review mappings from non-standard vocabularies Perhaps we want to see what ICD10CM codes map to our candidate code list. We can get these by running ```{r, message=FALSE, warning=FALSE,echo=FALSE} icdMappings <- readRDS(system.file("introData04.RData", package = "CodelistGenerator")) ``` ```{r, eval=FALSE } icdMappings <- getMappings( cdm = cdm, candidateCodelist = dementiaCodes1, nonStandardVocabularies = "ICD10CM" ) ``` ```{r, message=FALSE, warning=FALSE } icdMappings |> glimpse() ``` ```{r, message=FALSE, warning=FALSE,echo=FALSE} readMappings <- readRDS(system.file("introData05.RData", package = "CodelistGenerator")) ``` ```{r, eval=FALSE } readMappings <- getMappings( cdm = cdm, candidateCodelist = dementiaCodes1, nonStandardVocabularies = "Read" ) ``` ```{r, message=FALSE, warning=FALSE } readMappings |> glimpse() ```