--- title: "Compute custom proportions with `stat_prop()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Compute custom proportions with `stat_prop()`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(ggstats) library(ggplot2) ``` `stat_prop()` is a variation of `ggplot2::stat_count()` allowing to compute custom proportions according to the **by** aesthetic defining the denominator (i.e. all proportions for a same value of **by** will sum to 1). The **by** aesthetic should be a factor. Therefore, `stat_prop()` requires the **by** aesthetic and this **by** aesthetic should be a factor. ## Adding labels on a percent stacked bar plot When using `position = "fill"` with `geom_bar()`, you can produce a percent stacked bar plot. However, the proportions corresponding to the **y** axis are not directly accessible using only `ggplot2`. With `stat_prop()`, you can easily add them on the plot. In the following example, we indicated `stat = "prop"` to `ggplot2::geom_text()` to use `stat_prop()`, we defined the **by** aesthetic (here we want to compute the proportions separately for each value of **x**), and we also used `ggplot2::position_fill()` when calling `ggplot2::geom_text()`. ```{r} d <- as.data.frame(Titanic) p <- ggplot(d) + aes(x = Class, fill = Survived, weight = Freq, by = Class) + geom_bar(position = "fill") + geom_text(stat = "prop", position = position_fill(.5)) p ``` Note that `stat_prop()` has properly taken into account the **weight** aesthetic. `stat_prop()` is also compatible with faceting. In that case, proportions are computed separately in each facet. ```{r} p + facet_grid(cols = vars(Sex)) ``` ## Displaying proportions of the total If you want to display proportions of the total, simply map the **by** aesthetic to `1`. Here an example using a stacked bar chart. ```{r} ggplot(d) + aes(x = Class, fill = Survived, weight = Freq, by = 1) + geom_bar() + geom_text( aes(label = scales::percent(after_stat(prop), accuracy = 1)), stat = "prop", position = position_stack(.5) ) ``` ## A dodged bar plot to compare two distributions A dodged bar plot could be used to compare two distributions. ```{r} ggplot(d) + aes(x = Class, fill = Sex, weight = Freq, by = Sex) + geom_bar(position = "dodge") ``` On the previous graph, it is difficult to see if first class is over- or under-represented among women, due to the fact they were much more men on the boat. `stat_prop()` could be used to adjust the graph by displaying instead the proportion within each category (i.e. here the proportion by sex). ```{r} ggplot(d) + aes(x = Class, fill = Sex, weight = Freq, by = Sex, y = after_stat(prop)) + geom_bar(stat = "prop", position = "dodge") + scale_y_continuous(labels = scales::percent) ``` The same example with labels: ```{r} ggplot(d) + aes(x = Class, fill = Sex, weight = Freq, by = Sex, y = after_stat(prop)) + geom_bar(stat = "prop", position = "dodge") + scale_y_continuous(labels = scales::percent) + geom_text( mapping = aes( label = scales::percent(after_stat(prop), accuracy = .1), y = after_stat(0.01) ), vjust = "bottom", position = position_dodge(.9), stat = "prop" ) ``` ## Displaying unobserved levels With the `complete` argument, it is possible to indicate an aesthetic for those statistics should be completed for unobserved values. ```{r} d <- diamonds |> dplyr::filter(!(cut == "Ideal" & clarity == "I1")) |> dplyr::filter(!(cut == "Very Good" & clarity == "VS2")) |> dplyr::filter(!(cut == "Premium" & clarity == "IF")) p <- ggplot(d) + aes(x = clarity, fill = cut, by = clarity) + geom_bar(position = "fill") p + geom_text( stat = "prop", position = position_fill(.5) ) ``` Adding `complete = "fill"` will generate "0.0%" labels where relevant. ```{r} p + geom_text( stat = "prop", position = position_fill(.5), complete = "fill" ) ``` ## Using `geom_prop_bar()` and `geom_prop_text()` The dedicated geometries `geom_prop_bar()` and `geom_prop_text()` could be used for quick and easy proportional bar plots. They use by default `stat_prop()` with relevant default values. For example, proportions are computed by **x** or **y** if the `by` aesthetic is not specified. It allows to generate a quick proportional bar plot. ```{r} ggplot(diamonds) + aes(y = clarity, fill = cut) + geom_prop_bar() + geom_prop_text() ``` You can specify a `by` aesthetic. For example, to reproduce the comparison of the two distributions presented earlier. ```{r} d <- as.data.frame(Titanic) ggplot(d) + aes(x = Class, fill = Sex, weight = Freq, by = Sex) + geom_prop_bar(position = "dodge") + geom_prop_text( position = position_dodge(width = .9), vjust = - 0.5 ) + scale_y_continuous(labels = scales::percent) ``` You can also display counts instead of proportions. ```{r} ggplot(diamonds) + aes(x = clarity, fill = cut) + geom_prop_bar(height = "count") + geom_prop_text( height = "count", labels = "count", labeller = scales::number ) ```