--- title: "Use Case 03: Estimation of intracluster correlations (ICC) by cluster size" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Use Case 03: Estimation of intracluster correlations (ICC) by cluster size} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The Intracluster Correlation Coefficient (ICC) is one of the inputs to standard power and sample size calculations for CRTs. Trialists often have difficulty identifying an appropriate source for their ICC calculations, or use a value from a source of questionable relevance. The [`CRTanalysis`](../reference/CRTanalysis.html) function has an option to use Generalised Estimating Equations, which provide an estimate of the ICC. This can be applied to baseline data, and hence to different cluster configurations. This makes it possible to estimate the ICC which is appropriate for any given cluster definition, in the chosen geography, assuming baseline data are available. ``` r # use the same dataset as for Use Case 1. library(CRTspat) example_locations <- readdata('example_site.csv') example_locations$base_denom <- 1 library(dplyr) example <- CRTsp(example_locations) %>% aggregateCRT(auxiliaries = c("RDT_test_result", "base_denom")) summary(example) ``` ``` ## ===============================CLUSTER RANDOMISED TRIAL =========================== ## ## Summary of coordinates ## ---------------------- ## Min. : 1st Qu.: Median : Mean : 3rd Qu.: Max. : ## x -3.20 -1.40 -0.30 -0.07 1.26 5.16 ## y -5.08 -2.84 0.19 0.05 2.49 6.16 ## ## Total area (within 0.2 km of a location) : 27.6 sq.km ## Total area (convex hull) : 48.2 sq.km ## ## Locations and Clusters ## ---------------------- - ## Coordinate system (x, y) ## Locations: 1181 ## Available clusters (across both arms) Not assigned ## No randomization - ## No power calculations to report - ## ## Other variables in dataset ## -------------------------- RDT_test_result base_denom ``` ``` r # randomly sample an array of values of c (use a small sample size for testing # the plots were produced with n=5000) set.seed(5) c_vec <- round(runif(50, min = 6, max = 150)) # a user function randomizes and analyses each simulated trial CRTscenario3 <- function(c, CRT) { ex <- specify_clusters(CRT, c = c, algo = "kmeans") %>% randomizeCRT() GEEanalysis <- CRTanalysis(ex, method = "GEE", baselineOnly = TRUE, excludeBuffer = FALSE, baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom") locations <- GEEanalysis$description$locations ICC <- GEEanalysis$pt_ests$ICC value <- c(c = c, ICC = ICC, mean_h = locations/c) return(value) } # The results are collected in a data frame results <- t(sapply(c_vec, FUN = CRTscenario3, simplify = "array", CRT = example)) %>% data.frame() ``` There is a clear downward trend in the ICC estimates, as cluster size increases (Figure 3.1). The ICC expected for a trial in this, or similar, geographies can be read off the curve. Note that the ICC is expected to vary not just with cluster size, but also to vary between different outcomes.
Fig 3.1 Intracluster correlation by size of cluster