--- title: "The making-of the figures in the article" bibliography: "../inst/REFERENCES.bib" csl: "../inst/apa-6th.csl" output: rmarkdown::html_vignette description: > This vignette provides all the steps needed to make the figures from the article. vignette: > %\VignetteIndexEntry{The making-of the figures in the article} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- Herein we provide the full code to realize Figures 1 to 4 that illustrate the article describing the ``superb`` framework [@cgh21]. These figures are based on ficticious data sets that are provided with the package superb (``dataFigure1`` to ``dataFigure4``). Before we begin, we need to load a few packages, including ``superb`` of course: ```{r, message=FALSE, warning=FALSE, echo=TRUE, eval=TRUE} ## Load relevant packages library(superb) # for superbPlot library(ggplot2) # for all the graphic directives library(gridExtra) # for grid.arrange ``` If they are not present on your computer, first upload them to your computer with ``install.packages("name of the package")``. # Making Figure 1 The purpose of Figure 1 is to illustrate the difference in error bars when the purpose of these measures of precision is to perform pair-wise comparisons. It is based on the data from ``dataFigure1``, whose columns are ```{r, message=FALSE, echo=TRUE, eval=TRUE} head(dataFigure1) ``` where ``id`` is just a participant identifier, ``grp`` indicate group membership (here only group 1 and group 2), and finally, ``score`` is the dependent variable. The first panel on the left is based on stand-alone confidence intervals and is obtained with : ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 1a**. Left panel of Figure 1."} plt1a <- superbPlot(dataFigure1, BSFactors = "grp", variables = "score", plotStyle = "line" ) plt1a ``` Note that these *stand-alone* error bars could have been obtained by adding the argument ``adjustments = list(purpose = "single")`` but as it is the default value, it can be omitted. The default ``theme`` to ``ggplot``s is not very attractive. Let's decorate the plot a bit! To that end, I collected some additional ggplot directives in a list: ```{r, message=FALSE, echo=TRUE, eval=TRUE} ornateBS <- list( xlab("Group"), ylab("Attitude towards class activities"), scale_x_discrete(labels = c("Collaborative\ngames", "Unstructured\nactivities")), #new! coord_cartesian( ylim = c(70,130) ), geom_hline(yintercept = 100, colour = "black", size = 0.5, linetype=2), theme_light(base_size = 10) + theme( plot.subtitle = element_text(size=12)) ) ``` so that the first plot, with these ornaments and a title, is: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 1b**. Decorating left panel of Figure 1."} plt1a <- plt1a + ornateBS + labs(subtitle="(stand-alone)\n95% CI") plt1a ``` The second plot is obtained in a simar fashion with just one additional argument requesting difference-adjusted confidence intervals: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 1c**. Making and decorating central panel of Figure 1."} plt1b <- superbPlot(dataFigure1, BSFactors = "grp", variables = "score", adjustments = list(purpose = "difference"), #new! plotStyle = "line" ) plt1b <- plt1b + ornateBS + labs(subtitle="Difference-adjusted\n95% CI") plt1b ``` Finally, the *raincloud* plot is obtained by changing the ``plotStyle`` argument: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 1d**. Making and decorating right panel of Figure 1."} plt1c <- superbPlot(dataFigure1, BSFactors = "grp", variables = "score", adjustments = list(purpose = "difference"), plotStyle = "raincloud", # new layout! violinParams = list(fill = "green", alpha = 0.2) ) # changed color to the violin plt1c <- plt1c + ornateBS + labs(subtitle="Difference-adjusted\n95% CI") plt1c ``` All three plots are shown side-by-side with: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=9, fig.height=4, fig.cap="**Figure 1**. The complete Figure 1."} grid.arrange(plt1a, plt1b, plt1c, ncol=3) ``` and exported to a file, with, e.g.: ```{r, message=FALSE, echo=TRUE, eval=FALSE} png(filename = "Figure1.png", width = 640, height = 320) grid.arrange(plt1a, plt1b, plt1c, ncol=3) dev.off() ``` You can change the smooth density estimator (the ``kernel`` function) by adding to the ``violinParams`` list the following attribute: ``kernel = "rectangular"`` for example to have a ragged density estimator. In the end, which error bars are correct? Remember that the *golden rule of adjusted confidence intervals* is to look for **inclusion**: for example, is the mean of the second group part of the possible results suggested by the first group's confidence interval? yes it is in the central plot. The central plot therefore indicate an absence of signficant difference. The confidence intervals being 95%, this conclusion is statistically significant at the 5% level. Still not convinced? Let's do a *t* test (actually, a Welch test; add ``var.equal = TRUE`` for the regular *t* test; @dll17): ```{r, message=FALSE, echo=TRUE, eval=TRUE} t.test(dataFigure1$score[dataFigure1$grp==1], dataFigure1$score[dataFigure1$grp==2], ) ``` There are no significant difference in these data at the .05 level, so that the error bar of a 95% confidence interval should contain the other result, as is the case in the central panel. Finally, you could try here the Tryon's adjustments (changing to ``adjustments = list(purpose = "tryon")``). However, you will notice no difference at all. Indeed, the variances are almost identical in the two groups (`r round(sd(dataFigure1[dataFigure1$grp ==1,]$score),2)` vs. `r round(sd(dataFigure1[dataFigure1$grp ==2,]$score),2)`). # Making Figure 2 Figure 2 is made in a similar fashion, and using the same decorations ``ornate`` as above with just a different name for the variable on the horizontal axis: ```{r, message=FALSE, echo=TRUE, eval=TRUE} ornateWS <- list( xlab("Moment"), #different! scale_x_discrete(labels=c("Pre\ntreatment", "Post\ntreatment")), ylab("Statistics understanding"), coord_cartesian( ylim = c(75,125) ), geom_hline(yintercept = 100, colour = "black", linewidth = 0.5, linetype=2), theme_light(base_size = 10) + theme( plot.subtitle = element_text(size=12)) ) ``` The difference in the present example is that the data are from a within-subject design with two repeated measures. The dataset must be in a wide format, e.g., ```{r} head(dataFigure2) ``` This makes the left panel: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 2a**. Making left panel of Figure 2."} plt2a <- superbPlot(dataFigure2, WSFactors = "Moment(2)", variables = c("pre","post"), adjustments = list(purpose = "single"), plotStyle = "line" ) plt2a <- plt2a + ornateWS + labs(subtitle="Stand-alone\n95% CI") plt2a ``` ... and this makes the central panel, specifying the ``CA`` (*correlation-adjusted*) decorelation technique: ```{r, message=TRUE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 2b**. Making central panel of Figure 2."} plt2b <- superbPlot(dataFigure2, WSFactors = "Moment(2)", variables = c("pre","post"), adjustments = list(purpose = "difference", decorrelation = "CA"), #new plotStyle = "line" ) plt2b <- plt2b + ornateWS + labs(subtitle="Correlation and difference-\nadjusted 95% CI") plt2b ``` As seen, ``superbPlot`` issues relevant information (indicated with ``FYI`` messages), here the correlation. To get a sense of the general trends in the data, we can examine the data participants per participants, joining their results with a line. As seen below, for most participants, the trend is upward, suggesting a strongly reliable effect of the moment: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 2c**. Making third panel of Figure 2."} plt2c <- superbPlot(dataFigure2, WSFactors = "Moment(2)", variables = c("pre","post"), adjustments = list(purpose = "difference", decorrelation = "CA"), plotStyle = "pointindividualline" ) #new plt2c <- plt2c + ornateWS + labs(subtitle="Correlation and difference-\nadjusted 95% CI") plt2c ``` Just for the exercice, we also compute the plot of the difference between the scores. To that end, we need different labels on the x-axis and a different range: ```{r} ornateWS2 <- list( xlab("Difference"), scale_x_discrete(labels=c("Post minus Pre\ntreatment")), ylab("Statistics understanding"), coord_cartesian( ylim = c(-25,+25) ), geom_hline(yintercept = 0, colour = "black", linewidth = 0.5, linetype=2), theme_light(base_size = 10) + theme( plot.subtitle = element_text(size=12)) ) ``` We then compute the differences and make the plot: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 2d**. Making right panel of Figure 2."} dataFigure2$diff <- dataFigure2$post - dataFigure2$pre plt2d <- superbPlot(dataFigure2, WSFactor = "Moment(1)", variables = c("diff"), adjustments = list(purpose = "single", decorrelation = "none"), plotStyle = "raincloud", violinParams = list(fill = "green") ) #new plt2d <- plt2d + ornateWS2 + labs(subtitle="95% CI \nof the difference") plt2d ``` This last plot does not require decorrelation and is not adjusted for difference. Decorrelation would not do anything as ``diff`` is a single column; difference-adjustment is inadequate here as the difference is to be compared to a fix value (namely 0, for zero improvement) Assembling all four panels, we get: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=9, fig.height=4, fig.cap="**Figure 2**. The complete Figure 2."} grid.arrange(plt2a, plt2b, plt2c, plt2d, ncol=4) ``` ... which can be exported to a file as usual: ```{r, message=FALSE, echo=TRUE, eval=FALSE} png(filename = "Figure2.png", width = 850, height = 320) grid.arrange(plt2a, plt2b, plt2c, plt2d, ncol=4) dev.off() ``` Which error bars are depicting the significance of the result most aptly? The adjusted ones seen in the central panel, as confirmed by a t-test on paired data: ```{r, message=FALSE, echo=TRUE, eval=TRUE} t.test(dataFigure2$pre, dataFigure2$post, paired=TRUE) ``` The confidence interval of one moment does not *include* the result from the other moment, indicating a significant difference between the two moments. # Making Figure 3 The novel element in Figure 3 is the fact that the participants have been recruited by clusters of participants. We first adapt the ornaments for this example: ```{r, message=FALSE, echo=TRUE, eval=TRUE} ornateCRS <- list( xlab("Group"), ylab("Quality of policies"), scale_x_discrete(labels=c("From various\nfields", "From the\nsame field")), #new! coord_cartesian( ylim = c(75,125) ), geom_hline(yintercept = 100, colour = "black", linewidth = 0.5, linetype=2), theme_light(base_size = 10) + theme( plot.subtitle = element_text(size=12)) ) ``` Then, we get an unadjusted plot as usual: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 3a**. The left panel of Figure 3."} plt3a <- superbPlot(dataFigure3, BSFactors = "grp", variables = "VD", adjustments = list(purpose = "single", samplingDesign = "SRS"), plotStyle = "line" ) plt3a <- plt3a + ornateCRS + labs(subtitle="Stand-alone\n95% CI") plt3a ``` Here, the option ```samplingDesign = "SRS"``` is the default and can be omitted. To indicate the presence of *cluster-randomized sampling*, the ``samplingDesign`` option is set to ``"CRS"`` and an additional information, ``clusterColumn`` is indicated to identify the column containing the cluster membership information: ```{r, message=TRUE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 3b**. The central panel of Figure 3."} plt3b <- superbPlot(dataFigure3, BSFactors = "grp", variables = "VD", adjustments = list(purpose = "difference", samplingDesign = "CRS"), #new plotStyle = "line", clusterColumn = "cluster" ) #new plt3b <- plt3b + ornateCRS + labs(subtitle="Cluster and difference-\nadjusted 95% CI") plt3b ``` An inspection of the distribution does not make the cluster structure evident and therefore a raincloud plot is maybe little informative in the context of cluster randomized sampling... ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=8, fig.height=4, fig.cap="**Figure 3c**. The right panel of Figure 3."} plt3c <- superbPlot(dataFigure3, BSFactors = "grp", variables = "VD", adjustments = list(purpose = "difference", samplingDesign = "CRS"), plotStyle = "raincloud", violinParams = list(fill = "green", alpha = 0.2), clusterColumn = "cluster" ) plt3c <- plt3c + ornateCRS + labs(subtitle="Cluster and difference-\nadjusted 95% CI") ``` Here is the complete Figure 3: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=8, fig.height=4, fig.cap="**Figure 3**. The complete Figure 3."} grid.arrange(plt3a, plt3b, plt3c, ncol=3) ``` This figure is saved as before with ```{r, message=FALSE, echo=TRUE, eval=FALSE} png(filename = "Figure3.png", width = 640, height = 320) grid.arrange(plt3a, plt3b, plt3c, ncol=3) dev.off() ``` To make the correct *t* test in the present case, we need a correction factor called $\lambda$. An easy way is the following [see @cl16 for more] ```{r, message=FALSE, echo=TRUE, eval=TRUE} res <- t.test( dataFigure3$VD[dataFigure3$grp==1], dataFigure3$VD[dataFigure3$grp==2], ) # mean ICCs per group, as given by superbPlot micc <- mean(c(0.491335, 0.203857)) # lambda from five clusters of 5 participants each lambda <- CousineauLaurencelleLambda(c(micc, 5, 5, 5, 5, 5, 5)) tcorrected <- res$statistic / lambda pcorrected <- 1 - pt(tcorrected, 4) cat(paste("t-test corrected for cluster-randomized sampling: t(", 2*(dim(dataFigure3)[1]-2),") = ", round(tcorrected, 3), ", p = ", round(pcorrected, 3),"\n", sep= "")) ``` As seen, the proper test is returning a coherent decision with the proper error bars. # Making Figure 4 Figure 4 is an illustration of the impact of sampling among a finite population. ```{r, message=FALSE, echo=TRUE, eval=TRUE} ornateBS <- list( xlab(""), ylab("Metabolic score"), scale_x_discrete(labels=c("Response to treatment")), #new! coord_cartesian( ylim = c(75,125) ), geom_hline(yintercept = 100, colour = "black", linewidth = 0.5, linetype=2), theme_light(base_size = 10) + theme( plot.subtitle = element_text(size=12)) ) ``` Lets do Figure 4 (see below for each plot in a single figure). ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 4a**. The left panel of Figure 4."} plt4a <- superbPlot(dataFigure4, BSFactors = "group", variables = "score", adjustments=list(purpose = "single", popSize = Inf), plotStyle="line" ) plt4a <- plt4a + ornateBS + labs(subtitle="Stand-alone\n95% CI") ``` The option ``popSize = Inf`` is the default; it indicates that the population is presumed of infinite size. A finite size can be given, as ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 4b**. The central panel of Figure 3b."} plt4b <- superbPlot(dataFigure4, BSFactors = "group", variables = "score", adjustments=list(purpose = "single", popSize = 50 ), # new! plotStyle="line" ) plt4b <- plt4b + ornateBS + labs(subtitle="Population size-\nadjusted 95% CI") ``` We illustrate the plot along some distribution information with a violin plot: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=3, fig.height=4, fig.cap="**Figure 4c**. The right panel of Figure 3b."} plt4c <- superbPlot(dataFigure4, BSFactors = "group", variables = "score", adjustments=list(purpose = "single", popSize = 50 ), # new! plotStyle="pointjitterviolin", violinParams = list(fill = "green", alpha = 0.2) ) plt4c <- plt4c + ornateBS + labs(subtitle="Population size-\nadjusted 95% CI") ``` Which are reunited as usual: ```{r, message=FALSE, echo=TRUE, eval=TRUE, fig.width=9, fig.height=4, fig.cap="**Figure 4**. The complete Figure 4."} plt4 <- grid.arrange(plt4a, plt4b, plt4c, ncol=3) ``` ...and saved with: ```{r, message=FALSE, echo=TRUE, eval=FALSE} png(filename = "Figure4.png", width = 640, height = 320) grid.arrange(plt4a, plt4b, plt4c, ncol=3) dev.off() ``` The corrected *t* test, performed by adjusting for the proportion of the population examined [see @t12], confirms the presence of a significant difference: ```{r, message=FALSE, echo=TRUE, eval=TRUE} res <- t.test(dataFigure4$score, mu=100) tcorrected <- res$statistic /sqrt(1-nrow(dataFigure4) / 50) pcorrected <- 1-pt(tcorrected, 24) cat(paste("t-test corrected for finite-population size: t(", nrow(dataFigure4)-1,") = ", round(tcorrected, 3), ", p = ", round(pcorrected, 3),"\n", sep= "")) ``` # References