--- title: "Compute cross-tabulation statistics with `stat_cross()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Compute cross-tabulation statistics with `stat_cross()`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(ggstats) library(ggplot2) ``` This statistic is intended to be used with two discrete variables mapped to **x** and **y** aesthetics. It will compute several statistics of a cross-tabulated table using `broom::tidy.test()` and `stats::chisq.test()`. More precisely, the computed variables are: - **observed**: number of observations in x,y - **prop**: proportion of total - **row.prop**: row proportion - **col.prop**: column proportion - **expected**: expected count under the null hypothesis - **resid**: Pearson's residual - **std.resid**: standardized residual - **row.observed**: total number of observations within row - **col.observed**: total number of observations within column - **total.observed**: total number of observations within the table - **phi**: phi coefficients, see `augment_chisq_add_phi()` By default, `stat_cross()` is using `ggplot2::geom_points()`. If you want to plot the number of observations, you need to map `after_stat(observed)` to an aesthetic (here **size**): ```{r} d <- as.data.frame(Titanic) ggplot(d) + aes(x = Class, y = Survived, weight = Freq, size = after_stat(observed)) + stat_cross() + scale_size_area(max_size = 20) ``` Note that the **weight** aesthetic is taken into account by `stat_cross()`. We can go further using a custom shape and filling points with standardized residual to identify visually cells who are over- or underrepresented. ```{r fig.height=6, fig.width=6} ggplot(d) + aes( x = Class, y = Survived, weight = Freq, size = after_stat(observed), fill = after_stat(std.resid) ) + stat_cross(shape = 22) + scale_fill_steps2(breaks = c(-3, -2, 2, 3), show.limits = TRUE) + scale_size_area(max_size = 20) ``` We can easily recreate a cross-tabulated table. ```{r} ggplot(d) + aes(x = Class, y = Survived, weight = Freq) + geom_tile(fill = "white", colour = "black") + geom_text(stat = "cross", mapping = aes(label = after_stat(observed))) + theme_minimal() ``` Even more complicated, we want to produce a table showing column proportions and where cells are filled with standardized residuals. Note that `stat_cross()` could be used with facets. In that case, computation is done separately in each facet. ```{r} ggplot(d) + aes( x = Class, y = Survived, weight = Freq, label = scales::percent(after_stat(col.prop), accuracy = .1), fill = after_stat(std.resid) ) + stat_cross(shape = 22, size = 30) + geom_text(stat = "cross") + scale_fill_steps2(breaks = c(-3, -2, 2, 3), show.limits = TRUE) + facet_grid(rows = vars(Sex)) + labs(fill = "Standardized residuals") + theme_minimal() ```