The ATQ
package provides tools for public health
institutions to detect epidemics using school absenteeism data. It
offers functions to simulate regional populations of households,
elementary schools, and epidemics, and to calculate alarm metrics from
these simulations.
This package builds on the work of Ward et al. and Vanderkruk et al. It introduces the Alert Time Quality (ATQ) metrics such as the Average ATQ (AATQ) and First ATQ (FATQ), to evaluate the timeliness and accuracy of epidemic alerts. This vignette demonstrates the package’s use through a simulation study based on Vanderkruk et al., modeling yearly influenza epidemics and their alarm metrics in the Wellington-Dufferin-Guelph public health unit, Canada.
To use the package, install and load it with:
tryCatch({
devtools::install_github("vjoshy/ATQ_Surveillance_Package")
}, error = function(e) {
message("Unable to install package from GitHub. Using local version if available.")
})
#> colorspace (2.1-0 -> 2.1-1) [CRAN]
#> rlang (1.1.1 -> 1.1.4) [CRAN]
#> glue (1.6.2 -> 1.7.0) [CRAN]
#> cli (3.6.1 -> 3.6.3) [CRAN]
#> withr (3.0.0 -> 3.0.1) [CRAN]
#> utf8 (1.2.3 -> 1.2.4) [CRAN]
#> fansi (1.0.4 -> 1.0.6) [CRAN]
#> dplyr (1.1.2 -> 1.1.4) [CRAN]
#>
#> There are binary versions available but the source versions are later:
#> binary source needs_compilation
#> colorspace 2.1-0 2.1-1 TRUE
#> rlang 1.1.3 1.1.4 TRUE
#> cli 3.6.2 3.6.3 TRUE
#> withr 3.0.0 3.0.1 FALSE
#>
#> package 'glue' successfully unpacked and MD5 sums checked
#> package 'utf8' successfully unpacked and MD5 sums checked
#> package 'fansi' successfully unpacked and MD5 sums checked
#> package 'dplyr' successfully unpacked and MD5 sums checked
#>
#> The downloaded binary packages are in
#> C:\Users\Vinay\AppData\Local\Temp\RtmpsZG2vi\downloaded_packages
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#>
#>
C:\Users\Vinay\AppData\Local\Temp\RtmpsZG2vi\file8e8c3c7263c3>doskey make=mingw32-make.exe
#>
checking for file 'C:\Users\Vinay\AppData\Local\Temp\RtmpsZG2vi\remotes8e8c2b36593\vjoshy-ATQ_Surveillance_Package-db6f668/DESCRIPTION' ...
checking for file 'C:\Users\Vinay\AppData\Local\Temp\RtmpsZG2vi\remotes8e8c2b36593\vjoshy-ATQ_Surveillance_Package-db6f668/DESCRIPTION' ...
✔ checking for file 'C:\Users\Vinay\AppData\Local\Temp\RtmpsZG2vi\remotes8e8c2b36593\vjoshy-ATQ_Surveillance_Package-db6f668/DESCRIPTION'
#>
─ preparing 'ATQ':
#> checking DESCRIPTION meta-information ...
checking DESCRIPTION meta-information ...
✔ checking DESCRIPTION meta-information
#>
─ checking for LF line-endings in source and make files and shell scripts
#>
─ checking for empty or unneeded directories
#>
Omitted 'LazyData' from DESCRIPTION
#>
─ building 'ATQ_0.2.3.tar.gz'
#>
#>
library(ATQ)
The following sections will guide you through population simulation,
epidemic modeling, and alarm metric calculation using the
ATQ
package.
ATQ
provides a simulation model that consists of three
sequential parts: 1) a population of individuals, 2) annual influenza
epidemics, 3) school absenteeism and laboratory confirmed influenza case
data. The final part of this section will include alarm metrics
evaluation.
To simulate the population of the Wellington-Dufferin-Guelph (WDG) region in Ontario, Canada, the package offers the following functions:
catchment_sim
: Simulates catchment areas using a
default gamma distribution for the number of schools in each area. The
dist_func
argument allows for specifying other
distributions.elementary_pop
: Simulates elementary school enrollment
and assigns students to catchments using a default gamma distribution.
This function requires the output of catchment_sim.
The
dist_func
argument can be modified for other
distributions.subpop_children
: Simulates households with children
using the output of elementary_pop.
It requires specifying
population proportions such as coupled parents, number of children per
household type, and proportion of elementary school-age children.
Distributions for parent, child, and age simulations can be
specified.subpop_noChildren
: Simulates households without
children using the outputs of subpop_children
and
elementary_pop.
It requires specifying proportions of
household sizes and the overall proportion of households with
children.simulate_households
: Creates a list containing two
simulated populations: households and individuals.If population proportions are not provided to
subpop_children
and subpop_noChildren
, the
functions will prompt the user for input.
set.seed(123)
# Simulate 16 catchments of 80x80 squares and the number of schools they contain
catchment <- catchment_sim(16, 80, dist_func = stats::rgamma, shape = 4.313, rate = 3.027)
# Simulate population size of elementary schools
elementary<- elementary_pop(catchment, dist_func = stats::rgamma, shape = 5.274, rate = 0.014)
# Simulate households with children
house_children <- subpop_children(elementary, n = 5,
prop_parent_couple = 0.7668901,
prop_children_couple = c(0.3634045, 0.4329440, 0.2036515),
prop_children_lone = c(0.5857832, 0.3071523, 0.1070645),
prop_elem_age = 0.4976825)
# Simulate households without children using pre-specified proportions
house_noChild <- subpop_noChildren(house_children, elementary,
prop_house_size = c(0.23246269, 0.34281716, 0.16091418, 0.16427239, 0.09953358),
prop_house_Children = 0.4277052)
# Combine household simulations and generate individual-level data
households <- simulate_households(house_children, house_noChild)
The package simulates epidemics using a stochastic Susceptible-Infected-Removed (SIR) framework. This approach differs from Vanderkruk et al., who used a spatial and network-based individual-level model.
Transmission Probability (p_inf): Calculated as \(1 - e^{-\alpha {\frac{I[t-1]}{N}}}\), where \(\alpha\) is the transmission rate, \(I[t-1]\) is the number of infectious individuals at the previous time step, and \(N\) is the total population.
New Infections (new_inf): Determined by drawing from a binomial distribution with parameters n (number of susceptible individuals) and p (transmission probability).
Compartment Updates:
Reported Cases: A subset of new infections is reported based on the reporting rate, with delays added using an exponential distribution to reflect reporting lag.
The summary and plot methods can be used to visualize and summarize the simulated epidemics:
# isolate individuals data
individuals <- households$individual_sim
# simulate epidemics for 10 years, each with a period of 300 days and 32 individuals infected initially
# infection period of 4 days
epidemic <- ssir(nrow(individuals), T = 300, alpha = 0.298, avg_start = 45,
min_start = 20, inf_period = 4, inf_init = 32, report = 0.02, lag = 7, rep = 10)
# Summarize and plot the epidemic simulation results
summary(epidemic)
#> SSIR Epidemic Summary (Multiple Simulations):
#> Number of simulations: 10
#>
#> Average total infected: 20714.2
#> Average total reported cases: 414.7
#> Average peak infected: 1589.4
#>
#> Model parameters:
#> $N
#> [1] 67822
#>
#> $T
#> [1] 300
#>
#> $alpha
#> [1] 0.298
#>
#> $inf_period
#> [1] 4
#>
#> $inf_init
#> [1] 32
#>
#> $report
#> [1] 0.02
#>
#> $lag
#> [1] 7
#>
#> $rep
#> [1] 10
plot(epidemic)
The compile_epi
function in this code compiles and
processes epidemic data, simulating school absenteeism using epidemic
and individual data. It creates a data set for actual cases, absenteeism
and laboratory confirmed cases, this data set will also include a “True
Alarm Window”, reference dates for each epidemic year and seasonal lag
values.
Absenteeism data is simulated as follows:
For each day, the proportion of infected individuals based on new infection over the past few days
Whether each child is absent or not is determined using the logic:
95% of infected children stay home
5% of healthy children are absent for reasons other than sickness
The data is aggregated across all schools.
absent_data <- compile_epi(epidemic, individuals)
dplyr::glimpse(absent_data)
#> Rows: 3,000
#> Columns: 28
#> $ Date <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
#> $ ScYr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ pct_absent <dbl> 0.05184225, 0.05135436, 0.05130075, 0.04814988, 0.05181312…
#> $ absent <dbl> 388, 403, 404, 367, 397, 388, 384, 376, 434, 378, 360, 404…
#> $ absent_sick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ new_inf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lab_conf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Case <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ sinterm <dbl> 0.01720158, 0.03439806, 0.05158437, 0.06875541, 0.08590610…
#> $ costerm <dbl> 0.9998520, 0.9994082, 0.9986686, 0.9976335, 0.9963032, 0.9…
#> $ window <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ ref_date <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lag0 <dbl> 0.05184225, 0.05135436, 0.05130075, 0.04814988, 0.05181312…
#> $ lag1 <dbl> NA, 0.05184225, 0.05135436, 0.05130075, 0.04814988, 0.0518…
#> $ lag2 <dbl> NA, NA, 0.05184225, 0.05135436, 0.05130075, 0.04814988, 0.…
#> $ lag3 <dbl> NA, NA, NA, 0.05184225, 0.05135436, 0.05130075, 0.04814988…
#> $ lag4 <dbl> NA, NA, NA, NA, 0.05184225, 0.05135436, 0.05130075, 0.0481…
#> $ lag5 <dbl> NA, NA, NA, NA, NA, 0.05184225, 0.05135436, 0.05130075, 0.…
#> $ lag6 <dbl> NA, NA, NA, NA, NA, NA, 0.05184225, 0.05135436, 0.05130075…
#> $ lag7 <dbl> NA, NA, NA, NA, NA, NA, NA, 0.05184225, 0.05135436, 0.0513…
#> $ lag8 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 0.05184225, 0.05135436, 0.…
#> $ lag9 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05184225, 0.05135436…
#> $ lag10 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05184225, 0.0513…
#> $ lag11 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05184225, 0.…
#> $ lag12 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05184225…
#> $ lag13 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0518…
#> $ lag14 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.…
#> $ lag15 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
The eval_metrics
function assesses the performance of
epidemic alarm systems across various lags and thresholds using school
absenteeism data. It evaluates the following key metrics:
False Alarm Rate (FAR): Proportion of alarms raised outside the true alarm window.
Added Days Delayed (ADD): Measures how many days after the optimal alarm day the first true alarm was raised.
Average Alarm Time Quality (AATQ): Mean quality of all alarms raised, considering their timing relative to the optimal alarm day.
First Alarm Time Quality (FATQ): Quality of the first alarm raised, based on its timing.
Weighted versions (WAATQ, WFATQ): Apply year-specific weights to AATQ and FATQ.
A logistic regression model with lagged absenteeism and fixed seasonal terms given by:
\[ \text{logit}(\theta_{tj}) = \beta_0 + \sum_{k=0}^l \beta_{k+1}x_{(t-k)j} + \beta_{l+2}\sin\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \beta_{l+3}\cos\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \gamma_j \]
where \(\theta_{tj}\) represents the probability of having at least one reported case on day \(t\) for school year \(j\). The predictor variables \(x_{(t-k)j}\) are the mean daily absenteeism percentages with lag times ranging from 0 to \(l\). To account for seasonal variations in influenza, trigonometric functions with sine and cosine terms are included, where \(t^*\) denotes the calendar day of the year when \(x_{tj}\) is observed and \(T^* = 365.25\) days represents the length of a year. Additionally, \(\gamma_j\) captures random effects specific to each school year \(j\).
eval_metrics
also identifies the best model parameters
(lag & threshold) for each metric. The output is a list with three
main components:
metrics
: An object containing:
matrices of each metric (FAR, ADD, AATQ, FATQ, WAATQ, WFATQ) for all lag and threshold combinations.
best models according to each metric, including lag and threshold values.
plot_data
: plot object to visualize epidemic data
and the best model for each metric
results
: provides summary statistics
In the example provided, alarms are calculated for school years 2 to 10, considering lags up to 15 days and threshold values ranging from 0.1 to 0.6 in 0.05 increments. Year weights are assigned proportionally to the school year number.
# Evaluate alarm metrics for epidemic detection
# lag of 15
alarms <- eval_metrics(absent_data, maxlag = 15, thres = seq(0.1,0.6,by = 0.05))
summary(alarms$results)
#> Alarm Metrics Summary
#> =====================
#>
#> FAR :
#> Mean: 0.5065
#> Variance: 0.0493
#> Best lag: 1
#> Best threshold: 0.45
#> Best value: 0.1111
#>
#> ADD :
#> Mean: 17.701
#> Variance: 63.9814
#> Best lag: 1
#> Best threshold: 0.1
#> Best value: 5.3333
#>
#> AATQ :
#> Mean: 0.5027
#> Variance: 0.0506
#> Best lag: 15
#> Best threshold: 0.1
#> Best value: 0.1382
#>
#> FATQ :
#> Mean: 0.4949
#> Variance: 0.0497
#> Best lag: 1
#> Best threshold: 0.15
#> Best value: 0.1477
#>
#> WAATQ :
#> Mean: 0.5308
#> Variance: 0.0701
#> Best lag: 12
#> Best threshold: 0.1
#> Best value: 0.1474
#>
#> WFATQ :
#> Mean: 0.5293
#> Variance: 0.0689
#> Best lag: 1
#> Best threshold: 0.15
#> Best value: 0.171
#>
#> Reference Dates:
#> epidemic_years ref_dates
#> 1 1 35
#> 2 2 26
#> 3 3 49
#> 4 4 36
#> 5 5 38
#> 6 6 40
#> 7 7 33
#> 8 8 20
#> 9 9 50
#> 10 10 46
#>
#> Best Prediction Dates:
#> FAR :
#> [1] NA 26 35 36 29 40 33 NA 40 46
#>
#> ADD :
#> [1] NA 26 21 29 27 30 19 NA 6 22
#>
#> AATQ :
#> [1] NA 26 32 29 24 30 28 19 6 15
#>
#> FATQ :
#> [1] NA 26 29 29 27 30 19 NA 32 32
#>
#> WAATQ :
#> [1] NA NA 32 29 20 33 28 19 13 1
#>
#> WFATQ :
#> [1] NA 26 29 29 27 30 19 NA 32 32
# Plot various alarm metrics values
plot(alarms$metrics, "FAR") # False Alert Rate
Vanderkruk, K.R., Deeth, L.E., Feng, Z. et al. ATQ: alert time quality, an evaluation metric for assessing timely epidemic detection models within a school absenteeism-based surveillance system. BMC Public Health 23, 850 (2023). https://doi.org/10.1186/s12889-023-15747-z