---
title: "Advanced functionalities"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced functionalities}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(joyn)
library(data.table)

x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))

y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)
```

## Advanced use

This vignette explores some additional features of `joyn` through an example use case.

Suppose you want to join tables `x` and `y`, where the variable *country* is available in both. You could do one of five things:

### 1. Use variable *country* as one of the key variables

If you don't use the argument `by`, `joyn` will by default treat *country* and *id* as key variables, because they are common to `x` and `y`.

```{r ex1}
# The variables with the same name, `id` and `country`, are used as key
# variables.
joyn(x = x, y = y)
```

Alternatively, you can join by *country* only:

```{r ex2}
# Joining by country
joyn(x = x, y = y, by = "country")
```

### 2. Ignore the values of *country* from `y` and don't bring it into the resulting table

This is the default if you do not include *country* among the key variables in argument `by`.

```{r}
joyn(x = x, y = y, by = "id")
```

### 3. Update only NAs in table x

Another possibility is to use the `update_NAs` argument of `joyn()`. It updates the NA values of *country* in table `x` with the actual values of the matching observations in *country* from table `y`. In this case, actual values of *country* in table `x` remain unchanged.

```{r ex3}
joyn(x = x, y = y, by = "id", update_NAs = TRUE)
```

### 4. Update actual values in table x

You can also update all the values, both NAs and actual values, in variable *country* of table `x` with the actual values of the matching observations in *country* from `y`. This is done by setting `update_values = TRUE`.

Notice that the reporting variable (`reportvar`) lets you keep track of how the update worked. In this case, *value update* means that only the values that differ between *country* from `x` and *country* from `y` are updated.

However, consider the other possible cases:

-   If, for the same matching observations, the values of the two *country* variables were the same, the reporting variable would report *x & y* instead (so you know that there is no update to make).

-   If there are NAs in *country* from `y`, the actual values in `x` will be unchanged, and you would see a *not updated* status in the reporting variable.

Notice, however, that there is another way to bring *country* from `y` to `x`: the argument `keep_common_vars` (*see 5. below* ⬇️).

```{r ex4}
# Notice that only the values that differ between x and y are updated
joyn(x = x, y = y, by = "id", update_values = TRUE)
```

### 5. Keep original *country* variable from y into returning table

#### (Keep matching-names variable from y into x, without updating values in x)

Another option is to bring the original variable *country* from `y` into the resulting table without using it to update the values in `x`. To distinguish *country* from `x` and *country* from `y`, `joyn` appends a suffix to the variable's name, so that you get *country.y* and *country.x*. All of this can be done by specifying `keep_common_vars = TRUE`.

```{r ex5}
joyn(x = x, y = y, by = "id", keep_common_vars = TRUE)
```

### Bring other variables from y into returning table

In `joyn`, you can also choose which non-common variables from `y` are brought into the resulting table.
To do so, list them in `y_vars_to_keep`, as shown in the example below:

```{r ex6}
# Keeping variable gdp
joyn(x = x, y = y, by = "id", y_vars_to_keep = "gdp")
```

Notice that if you set `y_vars_to_keep = FALSE` or `y_vars_to_keep = NULL`, `joyn` won't bring any variable from `y` into the returning table.
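
The last case can be checked with a short sketch. It assumes `y_vars_to_keep = FALSE` behaves as described above; the chunk label `ex7` and the object name `res` are illustrative only, and the tables are repeated from the setup chunk so the chunk runs on its own.

```{r ex7}
library(joyn)
library(data.table)

# Same tables as in the setup chunk, repeated so this chunk is self-contained
x <- data.table(id      = c(1, 4, 2, 3, NA),
                t       = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))

y <- data.table(id      = c(1, 2, 5, 6, 3),
                gdp     = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Join by id without bringing any variable from y
res <- joyn(x = x, y = y, by = "id", y_vars_to_keep = FALSE)

# Names of the variables in y that are absent from the joined table
setdiff(names(y), names(res))
```

Comparing the column names of `y` with those of the result is a quick way to confirm which variables were left out, without inspecting the full joined table.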