---
title: "Summarise clinical tables records"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{A-summarise_clinical_tables_records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we will explore the *OmopSketch* functions designed to provide an overview of the clinical tables within a CDM object (*observation_period*, *visit_occurrence*, *condition_occurrence*, *drug_exposure*, *procedure_occurrence*, *device_exposure*, *measurement*, *observation*, and *death*). Specifically, there are four key functions that facilitate this:

-   `summariseClinicalRecords()` and `tableClinicalRecords()`: use them to compute summary statistics with key information about a clinical table (e.g., number of records, number of concepts mapped) and to display them as a formatted table.

-   `summariseRecordCount()` and `plotRecordCount()`: use them to count the records of a table within specific time intervals and to plot the resulting trend.

## Create a mock cdm

Let's see an example of these functionalities. To start, we load the essential packages and create a mock cdm using `mockOmopSketch()`.

```{r, warning=FALSE}
library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()
```

# Summarise clinical tables

Let's now use `summariseClinicalRecords()` from the OmopSketch package to get an overview of one of the clinical tables of the cdm (i.e., **condition_occurrence**).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")

summarisedResult |> print()
```

Notice that the output is in the summarised result format.

We can use the function arguments to specify which statistics to compute. For example, use the `recordsPerPerson` argument to indicate which estimates you are interested in regarding the number of records per person.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"))

summarisedResult |> 
    filter(variable_name == "records_per_person") |>
    select(variable_name, estimate_name, estimate_value)
```
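
To make the `recordsPerPerson` estimates more concrete, the sketch below computes the same statistics by hand on a small, made-up table with dplyr. It only illustrates what the estimates mean: the toy data are hypothetical, and it assumes that only people who appear in the table are counted (OmopSketch may also include people with zero records), so it is not how the package computes them internally.

```{r, warning=FALSE}
library(dplyr)

# Hypothetical toy records: one row per condition record
toy_records <- tibble(
  person_id = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4)
)

# Count the records of each person, then summarise the distribution of those counts
records_per_person <- toy_records |>
  count(person_id, name = "n_records") |>
  summarise(
    mean = mean(n_records),
    sd = sd(n_records),
    q05 = quantile(n_records, 0.05),
    q95 = quantile(n_records, 0.95)
  )

records_per_person
```

Here the four people have 3, 2, 1 and 4 records, so the mean number of records per person is 2.5.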

You can further specify whether to include the number of records in observation (`inObservation = TRUE`), the number of concepts mapped (`standardConcept = TRUE`), the source vocabularies the table contains (`sourceVocabulary = TRUE`), the domains of the concepts (`domainId = TRUE`), or the type concepts of the records (`typeConcept = TRUE`).

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE)

summarisedResult |> 
  select(variable_name, estimate_name, estimate_value) |> 
  glimpse()
```
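
As a rough illustration of what "in observation" means, the following sketch flags toy records whose start date falls within the person's observation period. Both the data and the rule (record start date between `observation_period_start_date` and `observation_period_end_date`, inclusive) are assumptions made for illustration; OmopSketch's exact criterion may differ, for example in how record end dates are handled.

```{r, warning=FALSE}
library(dplyr)

# Hypothetical observation periods (one per person in this toy example)
toy_observation_period <- tibble(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("2010-01-01", "2015-06-01")),
  observation_period_end_date = as.Date(c("2012-12-31", "2020-12-31"))
)

# Hypothetical condition records
toy_conditions <- tibble(
  person_id = c(1, 1, 2, 2),
  condition_start_date = as.Date(c("2011-03-15", "2013-07-01",
                                   "2014-01-01", "2018-09-30"))
)

# Assumed rule: a record is "in observation" if its start date lies
# inside the person's observation period (bounds inclusive)
in_observation <- toy_conditions |>
  inner_join(toy_observation_period, by = "person_id") |>
  filter(
    condition_start_date >= observation_period_start_date,
    condition_start_date <= observation_period_end_date
  )

nrow(in_observation)
```

Only two of the four toy records (2011-03-15 for person 1 and 2018-09-30 for person 2) fall inside an observation period.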

You can also stratify the previous results by sex and age group:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE,
                                             sex = TRUE,
                                             ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))

summarisedResult |> 
  select(variable_name, strata_level, estimate_name, estimate_value) |> 
  glimpse()
```

Notice that, by default, the "overall" group is also included, as well as the crossed strata (for example, records from female individuals in the ">=35" age group).
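
The sketch below mimics that stratification on a toy table: it counts records overall, by sex, by age group, and by each sex and age group combination, then stacks the results. The data and the column names (`sex`, `age_group`, `strata`) are hypothetical and only show which groups appear; the output does not reproduce the summarised result layout.

```{r, warning=FALSE}
library(dplyr)

# Hypothetical records with a pre-computed sex and age group for each one
toy_strata_records <- tibble(
  sex = c("Female", "Female", "Male", "Male", "Female"),
  age_group = c("<35", ">=35", "<35", ">=35", ">=35")
)

# One block of counts per stratification; bind_rows() fills the
# columns a block does not use with NA
strata_counts <- bind_rows(
  toy_strata_records |> summarise(n_records = n()) |> mutate(strata = "overall"),
  toy_strata_records |> count(sex, name = "n_records") |> mutate(strata = "sex"),
  toy_strata_records |> count(age_group, name = "n_records") |> mutate(strata = "age_group"),
  toy_strata_records |> count(sex, age_group, name = "n_records") |>
    mutate(strata = "sex and age_group")
)

strata_counts
```

With two sexes and two age groups this yields 1 overall row, 2 sex rows, 2 age group rows and 4 crossed rows.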

The analysis can also be conducted for multiple OMOP tables at the same time:

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             c("observation_period","drug_exposure"),
                                             recordsPerPerson =  c("mean","sd"),
                                             inObservation = FALSE,
                                             standardConcept = FALSE,
                                             sourceVocabulary = FALSE,
                                             domainId = FALSE,
                                             typeConcept = FALSE)

summarisedResult |> 
  select(group_level, variable_name, estimate_name, estimate_value) |> 
  glimpse()
```

## Tidy the summarised object

`tableClinicalRecords()` helps you tidy the previous results and present them as a gt table.

```{r, warning=FALSE}
summarisedResult <- summariseClinicalRecords(cdm, 
                                             "condition_occurrence",
                                             recordsPerPerson =  c("mean", "sd", "q05", "q95"),
                                             inObservation = TRUE,
                                             standardConcept = TRUE,
                                             sourceVocabulary = TRUE,
                                             domainId = TRUE,
                                             typeConcept = TRUE, 
                                             sex = TRUE)

summarisedResult |> 
  tableClinicalRecords()
```

# Summarise record counts

OmopSketch can also help you summarise how the number of records in an OMOP table changes over time. In the example below, we use `summariseRecordCount()` to count the number of records within each year, and then `plotRecordCount()` to create a ggplot with the trend.

```{r, warning=FALSE}
summarisedResult <- summariseRecordCount(cdm, "drug_exposure", unit = "year", unitInterval = 1)

summarisedResult |> print()

summarisedResult |> plotRecordCount()
```

Note that you can adjust the time interval using the `unit` argument, which can be set to either "year" or "month", and the `unitInterval` argument, which must be an integer specifying the number of years or months in each interval. The example below shows the number of records every 18 months:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure", unit = "month", unitInterval = 18) |> 
  plotRecordCount()
```
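
To see how `unit = "month"` and `unitInterval = 18` partition time, the sketch below assigns toy record dates to consecutive 18-month windows and counts the records in each. Anchoring the first window at January of the earliest year in the data is an assumption made here for illustration; OmopSketch may choose its window boundaries differently.

```{r, warning=FALSE}
library(dplyr)

# Hypothetical record start dates
toy_dates <- as.Date(c("2010-02-10", "2010-11-30", "2011-06-15",
                       "2011-07-01", "2013-01-20"))

unit_interval <- 18  # months per window, like unitInterval = 18 with unit = "month"

# Assumed anchor: January of the earliest year in the data
anchor_year <- min(as.integer(format(toy_dates, "%Y")))

# Whole calendar months between the anchor and each date
months_since_anchor <- (as.integer(format(toy_dates, "%Y")) - anchor_year) * 12L +
  (as.integer(format(toy_dates, "%m")) - 1L)

# Integer division places each date in a window of `unit_interval` months
window_months <- (months_since_anchor %/% unit_interval) * unit_interval
window_start <- as.Date(sprintf("%d-%02d-01",
                                as.integer(anchor_year + window_months %/% 12),
                                as.integer(window_months %% 12 + 1)))

interval_counts <- tibble(window_start = window_start) |>
  count(window_start, name = "n_records")

interval_counts
```

The first three dates share the window starting on 2010-01-01, while 2011-07-01 opens the second window and 2013-01-20 falls in the third.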

We can further stratify our counts by sex (by setting the argument `sex = TRUE`) or by age (by providing age groups through `ageGroup`). Notice that in both cases, the function automatically creates a group called *overall* that includes all sex groups and all age groups.

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     unit = "month",
                     unitInterval = 18,
                     sex = TRUE,
                     ageGroup = list("<30" = c(0, 29),
                                     ">=30" = c(30, Inf))) |>
  plotRecordCount()
```

By default, `plotRecordCount()` does not apply faceting or colour to any variable. This can be confusing when stratifying by different variables, as seen in the previous plot. We can use the [visOmopResults](https://darwin-eu.github.io/visOmopResults/) package to find out which columns we can colour or facet by:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     unit = "month", 
                     unitInterval = 18, 
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0,29),
                                     "30-Inf" = c(30,Inf)))  |>
  visOmopResults::tidyColumns()
```

Then, we can specify these columns using the `facet` and `colour` arguments of `plotRecordCount()`:

```{r, warning=FALSE}
summariseRecordCount(cdm, "drug_exposure",
                     unit = "month", 
                     unitInterval = 18, 
                     sex = TRUE,
                     ageGroup = list("0-29" = c(0,29),
                                     "30-Inf" = c(30,Inf))) |>
    plotRecordCount(facet = omop_table ~ age_group, colour = "sex")
```

Finally, disconnect from the cdm:

```{r, warning=FALSE}
PatientProfiles::mockDisconnect(cdm = cdm)
```