---
title: "The cdm reference"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_cdm_reference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

A cdm reference is a single R object that represents OMOP CDM data. The tables in a cdm reference may be in a database. A cdm reference may also contain OMOP CDM tables held in data frames or tibbles, or in Arrow. In those cases the cdm reference is typically a subset of an original cdm reference, derived as part of a particular analysis.

omopgenerics provides a general class definition for a cdm reference and a data frame/tibble implementation. To create a cdm reference from a database, see the CDMConnector package.

A cdm reference is a list of tables. These tables come in four types: standard OMOP CDM tables, cohort tables, Achilles result tables, and other auxiliary tables.

### 1) Standard OMOP CDM tables

There are multiple versions of the OMOP CDM. The tables included in version 5.3 are listed below.

```{r}
library(omopgenerics)
omopTables()
```

The standard OMOP tables have required fields. For example, we can check the required columns of the person table like so:

```{r}
omopColumns(table = "person", version = "5.3")
```

The same applies to the observation period table:

```{r}
omopColumns(table = "observation_period", version = "5.3")
```

### 2) Cohort tables

Studies using the OMOP CDM often create study-specific cohort tables. Once created, we also consider these part of the cdm reference. Each cohort table is associated with a specific class of its own, a `generatedCohortSet`, which is described in more detail in a subsequent vignette. As with the standard OMOP CDM tables, cohort tables are expected to contain a specific set of fields. There is no restriction on whether they include additional fields.
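The "required fields, extra fields allowed" rule can be sketched in base R. This is a minimal illustration, not how omopgenerics validates cohort tables. `missingColumns()` is a hypothetical helper and the toy table is made up. The required columns are assumed to be the four columns of the OMOP `cohort` table (`cohort_definition_id`, `subject_id`, `cohort_start_date`, `cohort_end_date`).

```{r}
# Hypothetical helper (not part of omopgenerics): returns the required
# columns that are absent from a table. Extra columns are never flagged.
missingColumns <- function(table, required) {
  setdiff(required, colnames(table))
}

# Assumed required columns, following the OMOP cohort table
requiredCohortColumns <- c(
  "cohort_definition_id", "subject_id", "cohort_start_date", "cohort_end_date"
)

# Toy cohort table with one additional, study-specific column
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L),
  subject_id = c(101L, 102L),
  cohort_start_date = as.Date(c("2020-01-01", "2021-03-15")),
  cohort_end_date = as.Date(c("2020-06-30", "2021-12-31")),
  age_group = c("18-64", "65+")
)

# All required columns present: the extra age_group column is fine
missingColumns(cohort, requiredCohortColumns)

# Dropping columns makes the missing required ones show up
missingColumns(cohort[, c("subject_id", "cohort_start_date")], requiredCohortColumns)
```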
```{r}
cohortColumns(table = "cohort", version = "5.3")
cohortColumns(table = "cohort_set", version = "5.3")
cohortColumns(table = "cohort_attrition", version = "5.3")
```

### 3) Achilles result tables

The Achilles R package generates descriptive statistics for an OMOP CDM database. The results from Achilles are stored in tables in the database. Achilles creates the following tables, with the columns shown.

```{r}
achillesTables()
achillesColumns("achilles_analysis")
achillesColumns("achilles_results")
achillesColumns("achilles_results_dist")
```

### 4) Other tables

Beyond the standard OMOP CDM tables, cohort tables, and Achilles result tables, additional tables can be added to the cdm reference. These could, for example, be OMOP extension/expansion tables or extra tables containing data that a study requires but that is not normally part of the OMOP CDM. These tables may contain any set of fields.

## General rules for a cdm reference

A cdm reference has to fulfill the following conditions:

- All tables must share a common source (that is, mixing database tables with in-memory tables is not permitted).
- Table names must be in lower snake_case.
- Column names of each table must be in lower snake_case.
- The `person` and `observation_period` tables must be present.
- The cdm reference must have an attribute "cdmName" that gives the name of the data source it contains.

## Export metadata about the cdm reference

When the export method is applied to a cdm reference, metadata about that cdm reference is written to a csv file. The csv contains the following columns:

| Variable | Description | Datatype | Required |
|:----------------|:-----------------|:----------------|:--------------------|
| result_type | Always "Snapshot". Identifies this result as a summary of a cdm reference. | Character | Yes |
| cdm_name | The name of the data source. | Character | Yes |
| cdm_source_name | Value of cdm source name taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_description | Value of cdm description taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_documentation_reference | Value of cdm documentation reference taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_version | The cdm version associated with the cdm reference. | Character | Yes |
| cdm_holder | Value of cdm holder taken from the cdm source table (if present in the cdm reference). | Character | No |
| cdm_release_date | Value of cdm release date taken from the cdm source table (if present in the cdm reference). | Date | No |
| vocabulary_version | Version of the vocabulary in use, taken from the concept table (if present in the cdm reference). | Character | No |
| person_count | Number of records in the person table. | Integer | Yes |
| observation_period_count | Number of records in the observation period table. | Integer | Yes |
| earliest_observation_period_start_date | Earliest date in the observation period start date field of the observation period table. | Date | Yes |
| latest_observation_period_end_date | Latest date in the observation period end date field of the observation period table. | Date | Yes |
| snapshot_date | Date on which this snapshot was created. | Date | Yes |
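The sketch below makes a few of the required columns more concrete. It computes them from toy in-memory `person` and `observation_period` data frames and writes the result to a csv. It is an illustration under stated assumptions, not the export method itself. `snapshotSketch()` is a hypothetical helper and the data are made up. The optional columns taken from the cdm source and concept tables, as well as `cdm_version`, are left out.

```{r}
# Toy in-memory tables (made-up data for illustration only)
person <- data.frame(
  person_id = c(1L, 2L, 3L),
  year_of_birth = c(1980L, 1975L, 1990L)
)
observation_period <- data.frame(
  observation_period_id = c(1L, 2L, 3L, 4L),
  person_id = c(1L, 2L, 3L, 3L),
  observation_period_start_date = as.Date(c("2010-01-01", "2012-05-01", "2015-01-01", "2019-01-01")),
  observation_period_end_date = as.Date(c("2020-12-31", "2018-04-30", "2017-12-31", "2022-06-30"))
)

# Hypothetical helper (not the omopgenerics implementation): builds a
# one-row data frame with a subset of the required snapshot columns
snapshotSketch <- function(cdmName, person, observationPeriod) {
  data.frame(
    result_type = "Snapshot",
    cdm_name = cdmName,
    person_count = nrow(person),
    observation_period_count = nrow(observationPeriod),
    earliest_observation_period_start_date = min(observationPeriod$observation_period_start_date),
    latest_observation_period_end_date = max(observationPeriod$observation_period_end_date),
    snapshot_date = Sys.Date()
  )
}

snapshot <- snapshotSketch("toy_cdm", person, observation_period)

# Write the one-row summary to a csv in a temporary directory
csvPath <- file.path(tempdir(), "snapshot.csv")
write.csv(snapshot, csvPath, row.names = FALSE)
read.csv(csvPath)
```

Each count is simply the number of rows in the corresponding table. The two dates are the minimum start date and the maximum end date across all observation periods.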