---
title: "Summarise population characteristics"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{C-summarise_pop_characteristics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomia_is_available()) downloadEunomiaData()
```

# Introduction

In this vignette, we will explore the *OmopSketch* functions that provide information about individuals' characteristics at specific points in time. We will use `summarisePopulationCharacteristics()` to generate a summary of the demographic details of the database population. We will then tidy and present the results using `tablePopulationCharacteristics()`, which supports either [gt](https://gt.rstudio.com/) or [flextable](https://davidgohel.github.io/flextable/) for formatting the output.

## Create a mock cdm

Before we dive into the *OmopSketch* functions, we first need to load the essential packages and create a mock CDM using the Eunomia database.

```{r, warning=FALSE}
library(dplyr)
library(CDMConnector)
library(DBI)
library(duckdb)
library(OmopSketch)

# Connect to Eunomia database
con <- DBI::dbConnect(duckdb::duckdb(), CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdmFromCon(
  con = con, cdmSchema = "main", writeSchema = "main"
)

cdm
```

# Summarise population characteristics

To start, we will use the `summarisePopulationCharacteristics()` function to generate a summarised result object that captures demographic characteristics at both `observation_period_start_date` and `observation_period_end_date`.
```{r, warning=FALSE}
summarisedResult <- summarisePopulationCharacteristics(cdm)

summarisedResult |>
  glimpse()
```

To tidy and display the summarised result as a [gt](https://gt.rstudio.com/) table, we can use the `tablePopulationCharacteristics()` function.

```{r, warning=FALSE}
summarisedResult |>
  tablePopulationCharacteristics(type = "gt")
```

To obtain a [flextable](https://davidgohel.github.io/flextable/) instead of a [gt](https://gt.rstudio.com/) table, simply change the `type` argument to `"flextable"`.

Note that age at start, prior observation, and future observation are calculated at the defined start date (in this case, each individual's `observation_period_start_date`). Age at end, on the other hand, is calculated at the defined end date (i.e., each individual's `observation_period_end_date`).

## Trim study period

To focus on a specific period rather than analysing each individual's entire observation period, we can trim the study period using the `studyPeriod` argument. This allows us to analyse the demographic metrics within a defined time range instead of at the default observation start and end dates.

```{r, warning=FALSE}
summarisePopulationCharacteristics(cdm,
                                   studyPeriod = c("1950-01-01", "1999-12-31")) |>
  tablePopulationCharacteristics()
```

If you are interested in analysing the demographic characteristics from a specific date onwards without restricting the study end, you can define just the start of the study period. When the end date is not defined, `summarisePopulationCharacteristics()` uses each individual's `observation_period_end_date` to calculate the end-point statistics.
```{r, warning=FALSE}
summarisePopulationCharacteristics(cdm,
                                   studyPeriod = c("1950-01-01", NA)) |>
  tablePopulationCharacteristics()
```

Similarly, if you are only interested in analysing the population characteristics up to a specific end date, you can define only the end date and set the first element of `studyPeriod` to `NA`. In that case, each individual's `observation_period_start_date` is used as the start date.

## Stratify by age groups and sex

Population characteristics can also be estimated by stratifying the data by age and sex using the `ageGroup` and `sex` arguments.

```{r, warning=FALSE}
summarisePopulationCharacteristics(cdm,
                                   sex = TRUE,
                                   ageGroup = list("<60" = c(0, 59), ">=60" = c(60, Inf))) |>
  tablePopulationCharacteristics()
```
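
To make the `ageGroup` format more concrete, the following base R sketch shows how a named list of `c(lower, upper)` ranges can map ages to group labels. It is only an illustration and not the *OmopSketch* implementation: the helper `assignAgeGroup()` and the example ages are hypothetical, and it assumes that both bounds of each range are inclusive.

```{r}
# Hypothetical ages (in years) for six individuals; illustrative values only
age <- c(12, 45, 59, 60, 73, 88)

# Same named-list format as the one passed to `ageGroup` above
ageGroup <- list("<60" = c(0, 59), ">=60" = c(60, Inf))

# Hypothetical helper: label each age with the first group whose
# [lower, upper] range contains it (bounds assumed inclusive)
assignAgeGroup <- function(age, ageGroup) {
  vapply(age, function(a) {
    hit <- vapply(ageGroup, function(r) a >= r[1] && a <= r[2], logical(1))
    if (any(hit)) names(ageGroup)[which(hit)[1]] else NA_character_
  }, character(1))
}

groups <- assignAgeGroup(age, ageGroup)

# Number of individuals per age group
table(groups)
```

Under these assumptions, ages 12, 45, and 59 fall in `"<60"` and ages 60, 73, and 88 fall in `">=60"`. An age that matches no range is labelled `NA`, so the ranges should cover every age you expect to see in the data.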