---
title: "Plot Likert-type items with `gglikert()`"
author: Joseph Larmarange
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Plot Likert-type items with `gglikert()`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(ggstats)
library(dplyr)
library(ggplot2)
```

The purpose of `gglikert()` is to generate a centered bar plot comparing the answers of several questions sharing a common Likert-type scale.

## Generating an example dataset

```{r}
likert_levels <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)
set.seed(42)
df <-
  tibble(
    q1 = sample(likert_levels, 150, replace = TRUE),
    q2 = sample(likert_levels, 150, replace = TRUE, prob = 5:1),
    q3 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    q4 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    q5 = sample(c(likert_levels, NA), 150, replace = TRUE),
    q6 = sample(likert_levels, 150, replace = TRUE, prob = c(1, 0, 1, 1, 0))
  ) |>
  mutate(across(everything(), ~ factor(.x, levels = likert_levels)))

likert_levels_dk <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree",
  "Don't know"
)
df_dk <-
  tibble(
    q1 = sample(likert_levels_dk, 150, replace = TRUE),
    q2 = sample(likert_levels_dk, 150, replace = TRUE, prob = 6:1),
    q3 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
    q4 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
    q5 = sample(c(likert_levels_dk, NA), 150, replace = TRUE),
    q6 = sample(
      likert_levels_dk, 150,
      replace = TRUE, prob = c(1, 0, 1, 1, 0, 1)
    )
  ) |>
  mutate(across(everything(), ~ factor(.x, levels = likert_levels_dk)))
```

## Quick plot

Simply call `gglikert()`.

```{r}
gglikert(df)
```

The list of variables to plot (all by default) could by specify with `include`. This argument accepts tidy-select syntax.

```{r}
gglikert(df, include = q1:q3)
```

## Customizing the plot

The generated plot is a standard `ggplot2` object. You can therefore use `ggplot2` functions to custom many aspects.

```{r}
gglikert(df) +
  ggtitle("A Likert-type items plot", subtitle = "generated with gglikert()") +
  scale_fill_brewer(palette = "RdYlBu")
```

### Sorting the questions

You can sort the plot with `sort`.

```{r}
gglikert(df, sort = "ascending")
```

By default, the plot is sorted based on the proportion being higher than the center level, i.e. in this case the proportion of answers equal to "Agree" or "Strongly Agree". Alternatively, the questions could be transformed into a score and sorted accorded to their mean.

```{r}
gglikert(df, sort = "ascending", sort_method = "mean")
```

### Sorting the answers

You can reverse the order of the answers with `reverse_likert`.

```{r}
gglikert(df, reverse_likert = TRUE)
```

### Proportion labels

Proportion labels could be removed with `add_labels = FALSE`.

```{r}
gglikert(df, add_labels = FALSE)
```

or customized.

```{r}
gglikert(
  df,
  labels_size = 3,
  labels_accuracy = .1,
  labels_hide_below = .2,
  labels_color = "white"
)
```

### Totals on each side

By default, totals are added on each side of the plot. In case of an uneven number of answer levels, the central level is not taken into account for computing totals. With `totals_include_center = TRUE`, half of the proportion of the central level will be added on each side.

```{r}
gglikert(
  df,
  totals_include_center = TRUE,
  sort = "descending",
  sort_prop_include_center = TRUE
)
```

Totals could be customized.

```{r}
gglikert(
  df,
  totals_size = 4,
  totals_color = "blue",
  totals_fontface = "italic",
  totals_hjust = .20
)
```

Or removed.

```{r}
gglikert(df, add_totals = FALSE)
```

## Variable labels

If you are using variable labels (see `labelled::set_variable_labels()`), they will be taken automatically into account by `gglikert()`.

```{r}
if (require(labelled)) {
  df <- df |>
    set_variable_labels(
      q1 = "first question",
      q2 = "second question",
      q3 = "this is the third question with a quite long variable label"
    )
}
gglikert(df)
```

You can also provide custom variable labels with `variable_labels`.

```{r}
gglikert(
  df,
  variable_labels = c(
    q1 = "alternative label for the first question",
    q6 = "another custom label"
  )
)
```

You can control how variable labels are wrapped with `y_label_wrap`.

```{r}
gglikert(df, y_label_wrap = 20)
gglikert(df, y_label_wrap = 200)
```

## Custom center

By default, Likert plots will be centered, i.e. displaying the same number of categories on each side on the graph. When the number of categories is odd, half of the "central" category is displayed negatively and half positively.

It is possible to control where to center the graph, using the `cutoff` argument, representing the number of categories to be displayed negatively: `2` to display the two first categories negatively and the others positively; `2.25` to display the two first categories and a quarter of the third negatively.

```{r}
gglikert(df, cutoff = 0)
gglikert(df, cutoff = 1)
gglikert(df, cutoff = 1.25)
gglikert(df, cutoff = 1.75)
gglikert(df, cutoff = 2)
gglikert(df, cutoff = NULL)
gglikert(df, cutoff = 4)
gglikert(df, cutoff = 5)
```

## Symmetric x-axis

Simply specify `symmetric = TRUE`.

```{r}
gglikert(df, cutoff = 1)
gglikert(df, cutoff = 1, symmetric = TRUE)
```


## Removing certain values

Sometimes, the dataset could contain certain values that you should not be displayed.

```{r}
gglikert(df_dk)
```

A first option could be to convert the don't knows into `NA`. In such case, the proportions will be computed on non missing.

```{r}
df_dk |>
  mutate(across(everything(), ~ factor(.x, levels = likert_levels))) |>
  gglikert()
```

Or, you could use `exclude_fill_values` to not display specific values, but still counting them in the denominator for computing proportions.

```{r}
df_dk |> gglikert(exclude_fill_values = "Don't know")
```

## Facets

To define facets, use `facet_rows` and/or `facet_cols`.

```{r message=FALSE}
df_group <- df
df_group$group1 <- sample(c("A", "B"), 150, replace = TRUE)
df_group$group2 <- sample(c("a", "b", "c"), 150, replace = TRUE)

gglikert(df_group,
  q1:q6,
  facet_cols = vars(group1),
  labels_size = 3
)
gglikert(df_group,
  q1:q2,
  facet_rows = vars(group1, group2),
  labels_size = 3
)
gglikert(df_group,
  q3:q6,
  facet_cols = vars(group1),
  facet_rows = vars(group2),
  labels_size = 3
) +
  scale_x_continuous(
    labels = label_percent_abs(),
    expand = expansion(0, .2)
  )
```

To compare answers by subgroup, you can alternatively map `.question` to facets, and define a grouping variable for `y`.

```{r}
gglikert(df_group,
  q1:q4,
  y = "group1",
  facet_rows = vars(.question),
  labels_size = 3,
  facet_label_wrap = 15
)
```

## Stacked plot

For a more classical stacked bar plot, you can use `gglikert_stacked()`.

```{r}
gglikert_stacked(df)

gglikert_stacked(
  df,
  sort = "asc",
  add_median_line = TRUE,
  add_labels = FALSE
)

gglikert_stacked(
  df_group,
  include = q1:q4,
  y = "group2"
) +
  facet_grid(
    rows = vars(.question),
    labeller = label_wrap_gen(15)
  )
```

## Long format dataset

Internally, `gglikert()` is calling `gglikert_data()` to generate a long format dataset combining all questions into two columns, `.question` and `.answer`.

```{r}
gglikert_data(df) |>
  head()
```

Such dataset could be useful for other types of plot, for example for a classic stacked bar plot.

```{r}
ggplot(gglikert_data(df)) +
  aes(y = .question, fill = .answer) +
  geom_bar(position = "fill")
```

## Weighted data

`gglikert()`, `gglikert_stacked()` and `gglikert_data()` accepts a `weights` argument, allowing to specify statistical weights.

```{r}
df$sampling_weights <- runif(nrow(df))
gglikert(df, q1:q4, weights = sampling_weights)
```

## See also

The function `position_likert()` used to center bars.