--- title: "Select variables" author: "Dan Chaltiel" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: keep_md: yes vignette: > %\VignetteIndexEntry{Select variables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning=FALSE, message=FALSE ) old = options(width = 100) ``` This vignette will review the different ways of selecting variables to describe with `crosstable`. For more general informations about `crosstable`, see `vignette("crosstable")` ([link](crosstable.html)). For more information and tips on `tidyselect`, see [tidyselect syntax](https://tidyselect.r-lib.org/articles/syntax.html) ## Whole table The simplest case is when you want to describe the whole table, as you need no further argument. If you really want to be more explicit, you also can use `tidyselect::everything()`. All `tidyselect` helpers are re-exported by `dplyr` so we only want to load this latter package. Here are the 10 first lines of the `iris2` dataset: ```{r crosstable-bare} library(crosstable) ct = crosstable(iris2, everything()) #or simply `crosstable(iris2)` ct %>% as_flextable(keep_id=TRUE) ``` ## Select by column name ### Name Just like with `dplyr::select()`, you can use names with or without quotes to select variables you want to describe. Use `c()` to select several columns: ```{r crosstable-names} crosstable(mtcars2, c(mpg, "qsec"), by=vs) %>% as_flextable(keep_id=TRUE) ``` ### External vector However, it is better to use `all_of` or `any_of` when you take your column names from an external vector. Otherwise, there would be an ambiguity as you might have wanted to select a column named like this vector. ```{r crosstable-external} qsec = c("mpg", "cyl") #wouldn't that be the most evil variable name ever? crosstable(mtcars2, any_of(qsec), by=vs) %>% as_flextable(keep_id=TRUE) ``` ### Negation You can use negation to keep all but some columns: ```{r crosstable-negation} crosstable(mtcars2, c(-mpg, -cyl, -1), by=vs) %>% head(8) %>% #-c(mpg, cyl, 1) would also work as_flextable(keep_id=TRUE) ``` ### Indice This can be useful sometimes, for instance when you want to quickly describe the 3 first columns. ```{r crosstable-indice} crosstable(mtcars2, 2:4, by=vs) %>% as_flextable(keep_id=TRUE) ``` You can also use negation (`-(1:3)`), concatenation (`c(1,2,3)`), or both (`crosstable(mtcars2, 1:4, -2, by=vs)`). ## Select with `tidyselect` helpers Along with `everything()`, `tidyselect` provides a large choice of helpers. You can browse `?tidyselect::select_helpers` for a complete list. Note that all have the useful `ignore.case` argument which is often very convenient. The main ones are re-exported by crosstable: `starts_with()`, `ends_with()`, `contains()` and `matches()`. Here are some examples: ```{r crosstable-helpers1} crosstable(mtcars2, starts_with("d")) %>% as_flextable(keep_id=TRUE) ``` ```{r crosstable-helpers2} crosstable(mtcars2, c(ends_with("g"), contains("yl"))) %>% as_flextable(keep_id=TRUE) ``` ```{r crosstable-helpers3} #to all regex haters: the following call selects all columns which name #starts with "d" or "g", followed by exactly 3 characters crosstable(mtcars2, matches("^d|g.{3}$")) %>% as_flextable(keep_id=TRUE) ``` ## Select with predicate functions Sometimes, you want to select columns if they meet a set of specifications, for instance of type or of value. You can then use predicate functions: functions that return a single logical value. If the function is named, it is a good practice to wrap it in `where()`. For instance, you might want to keep only `character` variables: ```{r crosstable-functions1} crosstable(mtcars2, c(where(is.character), where(is.factor), -model)) %>% as_flextable(keep_id=TRUE) ``` Using anonymous functions, you can even use more complicated patterns. For instance, you might want only numeric variables which mean is higher than 100: ```{r crosstable-functions2} crosstable(mtcars2, where(function(x) is.numeric(x) && mean(x)>100)) %>% as_flextable(keep_id=TRUE) ``` Of note, crosstable support lambda-functions, so you could instead write `crosstable(mtcars2, where(~is.numeric(.x) && mean(.x)>100))` for the exact same result but a tidier code. The only logical constraint is that the function in `where()` should return a single logical value. Use `&&`, `||`, and parenthesis to combine functions in complex patterns. ## Select with a formula If you want to mutate some variables in real-time, you can use the formula interface. The left-hand-side are the variables to describe, while the right-hand-side is the `by` variable (which can be set to `NULL`, `0` or `1` for "no variable"). ```{r crosstable-formula1} crosstable(mtcars2, mpg+cyl ~ vs) %>% as_flextable(keep_id=TRUE) ``` This permits very complex and interesting patterns, using functions *in situ* and operations using with the `I` function. Labels are inherited and make little sense though. ```{r crosstable-formula2} crosstable(mtcars2, sqrt(mpg) + I(qsec^2) ~ ifelse(mpg>20,"mpg>20","mpg<20"), label=FALSE) %>% as_flextable() ``` Note that you cannot use `tidyselect` helpers in formulas and that you cannot use formula declared in an object. ## Ultimate example Lets play a little with all of this :-) I want all numeric variables that do not start by "d" or "w", but I still want `drat` in the end. ```{r crosstable-ultimate1} crosstable(mtcars2, c(where(is.numeric), -matches("^d|w"), drat), label=FALSE) %>% as_flextable() ``` ```{r, include = FALSE} options(old) ```