goodpractice

Hannah Frick

2024-06-03

What’s it for?

Building an R package is a great way of encapsulating code, documentation and data in a single testable and easily distributable unit.

For a package to be distributed via CRAN, it needs to pass a set of checks implemented in R CMD check, such as: Is there minimal documentation, e.g., are all arguments of exported functions documented? Are all dependencies declared?

These checks are helpful in developing a solid R package, but they don’t cover several other good practices. For example, a package does not need to contain any tests, but it is good practice to include them. Following a coding standard helps readability. Avoiding overly complex functions reduces the risk of bugs. Including a URL for bug reports lets people report bugs more easily if they find any.

Tools for automatically checking several of these aspects already exist. The goodpractice package bundles the checks from rcmdcheck with code coverage through the covr package, source code linting via the lintr package, and cyclomatic complexity via the cyclocomp package. It augments them with some further checks on good practice for R package development, such as avoiding T and F in favour of TRUE and FALSE, and it provides advice on which practices to follow and which to avoid.
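To illustrate why T and F are discouraged, here is a minimal base R sketch (it does not use goodpractice itself). It shows that T can be masked by an ordinary assignment, while the same assignment to TRUE is rejected. The variable name attempt and the message string are chosen purely for illustration.

```r
# T and F are ordinary variables bound to TRUE and FALSE in the base package,
# so an assignment can mask them in the current environment.
T <- 0

# Code relying on T now silently takes the wrong branch.
if (T) "T is truthy" else "T is falsy"
#> [1] "T is falsy"

# TRUE is a reserved word, so the same assignment is rejected with an error.
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "assignment to TRUE failed"
)
attempt
#> [1] "assignment to TRUE failed"

# Remove the masking variable so that T refers to the base value again.
rm(T)
isTRUE(T)
#> [1] TRUE
```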

You can use goodpractice checks as a reminder for you and your colleagues - and if you have custom checks to run, you can make goodpractice run those as well! Please see the vignette “Custom Checks” for more details.

Good practice out of the box

Main function

The main function is goodpractice(), with gp() as an alias. Both take the path to the source code of a package as their first argument. The goodpractice package contains the source for a simple package which violates some good practices. We’ll use this for the examples.

library(goodpractice)

# get path to example package
pkg_path <- system.file("bad1", package = "goodpractice")

# run gp() on it
g <- gp(pkg_path)
#> Preparing: covr
#> Warning in MYPREPS[[prep]](state, quiet = quiet): Prep step for test coverage
#> failed.
#> Preparing: cyclocomp
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/tmp/RtmpiNjOgY/remotes55a95662a77f/badpackage/DESCRIPTION’ ... OK
#> * preparing ‘badpackage’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking vignette meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘badpackage_1.0.0.tar.gz’
#> Preparing: description
#> Preparing: lintr
#> Preparing: namespace
#> Preparing: rcmdcheck

# show the result
g
#> ── GP badpackage ───────────────────────────────────────────────────────────────
#> 
#> It is good practice to
#> 
#>   ✖ not use "Depends" in DESCRIPTION, as it can cause name clashes, and
#>     poor interaction with other packages. Use "Imports" instead.
#>   ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid
#>     quite often. A build date will be added to the package when you
#>     perform `R CMD build` on it.
#>   ✖ add a "URL" field to DESCRIPTION. It helps users find information
#>     about your package online. If your package does not have a
#>     homepage, add an URL to GitHub, or the CRAN package package page.
#>   ✖ add a "BugReports" field to DESCRIPTION, and point it to a bug
#>     tracker. Many online code hosting services provide bug trackers for
#>     free, https://github.com, https://gitlab.com, etc.
#>   ✖ omit trailing semicolons from code lines. They are not needed and
#>     most R coding standards forbid them
#> 
#>     R/semicolons.R:4:30
#>     R/semicolons.R:5:29
#>     R/semicolons.R:9:38
#> 
#>   ✖ not import packages as a whole, as this can cause name clashes
#>     between the imported packages. Instead, import only the specific
#>     functions you need.
#>   ✖ fix this R CMD check ERROR: VignetteBuilder package not declared:
#>     ‘knitr’ See section ‘The DESCRIPTION file’ in the ‘Writing R
#>     Extensions’ manual.
#>   ✖ avoid 'T' and 'F', as they are just variables which are set to the
#>     logicals 'TRUE' and 'FALSE' by default, but are not reserved words
#>     and hence can be overwritten by the user.  Hence, one should always
#>     use 'TRUE' and 'FALSE' for the logicals.
#> 
#>     R/tf.R:NA:NA
#>     R/tf.R:NA:NA
#>     R/tf.R:NA:NA
#>     R/tf.R:NA:NA
#>     R/tf.R:NA:NA
#>     ... and 4 more lines
#> 
#> ────────────────────────────────────────────────────────────────────────────────

So with this package, we’ve done a few things in the DESCRIPTION file that are best avoided, left unnecessary trailing semicolons in the code, and used T and F for TRUE and FALSE. The output of gp() tells you what you did that isn’t considered good practice and, if the problem is in the R code, points you to the location of your faux-pas. In general, the messages are meant to tell you not only what you might want to avoid but also why.

The above example tries to run all 230 checks available; to see the full list, use all_checks(). If you only want to run a subset of the checks, e.g., the one on the URL field in the DESCRIPTION, you can specify the checks by name:

# what is the name of the check?
grep("url", all_checks(), value = TRUE)
#> [1] "description_url"

# run only this check
g_url <- gp(pkg_path, checks = "description_url")
#> Preparing: description

g_url
#> ── GP badpackage ───────────────────────────────────────────────────────────────
#> 
#> It is good practice to
#> 
#>   ✖ add a "URL" field to DESCRIPTION. It helps users find information
#>     about your package online. If your package does not have a
#>     homepage, add an URL to GitHub, or the CRAN package package page.
#> ────────────────────────────────────────────────────────────────────────────────
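Selecting more than one check should work the same way. The sketch below assumes that the checks argument accepts a character vector with several names. Both names used here are expected to appear in all_checks(), and the code verifies this before running. The object names wanted and g_desc are illustrative.

```r
library(goodpractice)

# Path to the example package shipped with goodpractice
pkg_path <- system.file("bad1", package = "goodpractice")

# Pick two DESCRIPTION-related checks and confirm they exist before running
wanted <- c("description_url", "description_bugreports")
stopifnot(all(wanted %in% all_checks()))

# Run only the selected checks (assumes `checks` takes a character vector)
g_desc <- gp(pkg_path, checks = wanted)

# Print the advice for the selected checks
g_desc
```

Validating the names against all_checks() first is a cheap way to catch a typo before any preparation step runs.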

Doing more than just printing

Besides printing a goodPractice object returned by gp() to read the advice, you can also query which checks were carried out and which of those failed:

# which checks were carried out?
checks(g_url)
#> [1] "description_url"

# which checks failed?
failed_checks(g)
#> [1] "no_description_depends"                
#> [2] "no_description_date"                   
#> [3] "description_url"                       
#> [4] "description_bugreports"                
#> [5] "lintr_trailing_semicolon_linter"       
#> [6] "no_import_package_as_a_whole"          
#> [7] "rcmdcheck_package_dependencies_present"
#> [8] "truefalse_not_tf"
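One possible use of failed_checks() is to turn the result into a single message, for example in a script that runs before a release. The following is only a sketch: failed_summary() is a hypothetical helper, not part of goodpractice, and it assumes that failed_checks() returns a zero-length result when nothing failed.

```r
library(goodpractice)

# Hypothetical helper (not part of goodpractice): summarise failed checks
# in one string, or return NULL when no check failed.
failed_summary <- function(gp_result) {
  failed <- failed_checks(gp_result)
  if (length(failed) == 0) {
    return(NULL)
  }
  paste0(length(failed), " check(s) failed: ", paste(failed, collapse = ", "))
}

# Run a single check on the example package to keep the run short
pkg_path <- system.file("bad1", package = "goodpractice")
g_url <- gp(pkg_path, checks = "description_url")

# Emit a warning if anything failed
msg <- failed_summary(g_url)
if (!is.null(msg)) warning(msg, call. = FALSE)
```

In an automated setting you might replace warning() with stop() so that a failed check aborts the script. Which of the two is appropriate depends on how strictly you want to enforce the advice.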

To access all the checks carried out with their results in a data frame, use results() on your goodPractice object.

# show the first 5 checks carried out and their results
results(g)[1:5,]
#>                    check result
#> 1                   covr     NA
#> 2              cyclocomp   TRUE
#> 3 no_description_depends  FALSE
#> 4    no_description_date  FALSE
#> 5        description_url  FALSE

Note that the code coverage could not be calculated. The corresponding check does not show up in the failed checks (because it was not carried out) and the result is NA. It is also possible to export the results to a JSON file with export_json().
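As a rough sketch of working with these results programmatically, the code below splits the data frame into checks that could not be carried out (NA) and checks that failed (FALSE), then writes the results to a temporary JSON file. It assumes that the result column is logical, as the printed output suggests, and that export_json() takes the goodPractice object and a file path as its first two arguments. It runs a single check to keep the example quick, so not_run will most likely be empty here.

```r
library(goodpractice)

# Run a single check on the example package
pkg_path <- system.file("bad1", package = "goodpractice")
g_url <- gp(pkg_path, checks = "description_url")

res <- results(g_url)

# Rows with an NA result are checks that could not be carried out
not_run <- res[is.na(res$result), ]

# Rows with a FALSE result correspond to failed checks
failed <- res[!is.na(res$result) & !res$result, ]
failed$check

# Export the results to a temporary JSON file
# (assumed argument order: object first, then file path)
json_file <- tempfile(fileext = ".json")
export_json(g_url, json_file)
file.exists(json_file)
```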