This vignette is an excerpt from the DANTE Project’s beta release of Open, Reproducible, and Distributable Research with R Packages. To view the entire current release, please visit the bookdown site. If you would like to contribute to this bookdown project, please visit the project GitLab repository.
Default Package Files
RStudio leaves you with a handful of files and directories after creating a new package. We’ll review the new files and create some additional commonly used directories.
- `.Rbuildignore`: The build ignore file is where you list files that should not be bundled up with your package. These files sit inside the package root directory because you use them for package development. They may include images, notes, files used to pre-process larger embedded datasets, or any other file that is non-essential to the final package. By default, the RStudio project files are listed in `.Rbuildignore`. Build ignore uses regular expression syntax. If you're not comfortable with regular expressions, you can use `usethis::use_build_ignore()`, which escapes and anchors file names for you.
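- `DESCRIPTION`: The package description file contains basic information about your package. By default it's fairly simple. The only mandatory fields are `Package`, `Version`, `License`, `Description`, `Title`, `Author`, and `Maintainer`; however, it would be very rare to not have an `Imports:` field as well.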
- `man/`: This directory contains manual and reference materials generated automatically by `roxygen2`. You do not have to edit this directory. It will be repopulated every time you run `Install and Restart` on your package, as long as you followed the steps above to `Configure Build Tools...`.
- `NAMESPACE`: This file is also generated automatically by `Install and Restart`. It's not something to cloud your mind with as a beginner; more information is available here.
- `R/`: This directory contains all of your functions. By default it will contain the `hello.R` file for the `hello()` function. This directory should contain only function files and a `data.R` file that we will discuss later. Current best practice is to put each function in its own file named after the function, but you may also place multiple functions in a single file.
- The final default file is the RStudio project file (`myresearch.Rproj`). You can open this file from anywhere to start an RStudio session for your package project.
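The sketch below shows how the `.Rbuildignore` patterns described above behave, using base R only. The first two patterns are the ones RStudio typically writes for a new project. The `^raw-data$` and `^raw-scripts$` patterns and the file list are illustrative assumptions. `R CMD build` matches each pattern case-insensitively against paths relative to the package root, and `grepl()` only approximates that here.

```r
# Patterns as they might appear in .Rbuildignore, one regular expression per line.
# The first two are RStudio's usual defaults; the last two are hypothetical additions.
ignore_patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^raw-data$", "^raw-scripts$")

# Hypothetical top-level contents of a package root
pkg_files <- c("myresearch.Rproj", ".Rproj.user", "raw-data", "raw-scripts",
               "DESCRIPTION", "NAMESPACE", "R", "man")

# TRUE if any pattern matches the path (case-insensitive, Perl-style regex)
is_ignored <- function(path, patterns) {
  any(vapply(patterns, grepl, logical(1),
             x = path, ignore.case = TRUE, perl = TRUE))
}

# Files that would still be bundled into the built package
bundled <- pkg_files[!vapply(pkg_files, is_ignored, logical(1),
                             patterns = ignore_patterns)]
print(bundled)
```

Note the escaped dots (`\\.`). An unescaped `.` matches any character, and `usethis::use_build_ignore()` avoids that slip by escaping file names for you.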
The `DESCRIPTION` file merits additional discussion because it is one of the primary package files you edit directly. Here are the important fields in more detail:
- `Title:` is a slightly more explanatory title for your project than the package name alone.
- `Version:` is not terribly important in this context. I usually leave it at the default. You can read more about R package versioning here.
- `Author:` is self-explanatory and may be written in plain text. However, it's strongly suggested that you replace it with the `Authors@R:` field. This field sets the authors in a more programmatic way and records their emails and roles: author `"aut"`, creator/maintainer `"cre"`, contributor `"ctb"`, and copyright holder `"cph"`. For example:
  `Authors@R: person("Joshua", "Brinks", email = "firstname.lastname@example.org", role = c("aut", "cre"))`
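- `Maintainer:` is the package maintainer, typically the same person as the author. It is written in plain text followed by an email address in angle brackets: `Joshua Brinks <email@example.com>`. If you use `Authors@R:`, `R CMD build` fills this field in from the person with the `"cre"` role.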
- `Description:` is a comprehensive description of your package's functionality. I usually include a few sentences for context and functionality.
- `License:` is the operating license of your package. It determines how, and by whom, your package may legally be used. Because this is an article on open science, we strongly recommend using a Free and Open Source Software (FOSS) license when possible. However, there are several contexts where this simply doesn't work. There is a lot of discussion comparing software licenses on the internet, and I suggest you gain a good understanding of the options before choosing one. When possible, I implement a GPL-3 open source license with `usethis::use_gpl3_license()`.
- `Encoding:` determines your package encoding. It's usually a good idea to leave this as the default, `UTF-8`.
- `LazyData:` determines how the data you embed in your package is loaded. It's best to leave this set to `true`. Embedded datasets are then available as soon as your package is attached, but each one is only read into memory when you first use it. Without lazy loading, users have to call `data()` explicitly to load a dataset.
- `RoxygenNote:` specifies the version of `roxygen2` being used to manage your package documentation. It will be updated automatically.
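The sketch below shows how these fields fit together. It parses a minimal `DESCRIPTION` with base R's `read.dcf()` and checks for the required fields. It then evaluates `Authors@R:` to build a maintainer string similar to the `Maintainer:` field that `R CMD build` generates. Every value is a placeholder, not the contents of an actual DANTE package.

```r
# A minimal DESCRIPTION held in a string; all values are placeholders.
desc_text <- 'Package: myresearch
Title: Tools for a Hypothetical Research Project
Version: 0.1.0
Authors@R: person("Jane", "Doe", email = "jane.doe@example.org",
    role = c("aut", "cre"))
Description: Functions and data supporting a hypothetical analysis.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true'

# DESCRIPTION files use the Debian Control File (DCF) format;
# indented lines continue the previous field.
con <- textConnection(desc_text)
desc <- read.dcf(con)
close(con)

# Required fields (Author and Maintainer can be derived from Authors@R)
required <- c("Package", "Title", "Version", "Description", "License")
missing_fields <- setdiff(required, colnames(desc))

# Authors@R holds R code that evaluates to a utils::person object
authors <- eval(parse(text = desc[1, "Authors@R"]))
maintainer <- format(authors, include = c("given", "family", "email"))

print(missing_fields)
print(maintainer)
```

The same approach works on a real file: `read.dcf("DESCRIPTION")` run from the package root.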
These are other common fields.
- `URL:` is any appropriate package or personal website. I usually list the package's Git repository.
- `Imports:` is a list of packages that your package depends on to carry out the core functions found in the `R/` directory. For example, if a function in the `R/` directory uses `ggplot2::geom_point()`, then `ggplot2` must be listed in `Imports:`. This ensures that these additional dependencies are also installed when your package is installed. The syntax for `Imports:` is a comma-separated list of package names:
  `Imports: data.table, dplyr, ggplot2`
- `Suggests:` is similar to `Imports:`, but for packages that are not part of your core `Imports:`. These are typically packages used only in your vignettes (e.g., `leaflet`). You may also have a package used by a rarely needed function in the `R/` directory that, as a courtesy, you don't want to force all of your users to install.
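- `Remotes:` is used to specify packages your package depends on that are not released on CRAN but are available on GitHub or GitLab. These packages still belong in `Imports:` or `Suggests:`; `Remotes:` only tells tools such as `remotes` and `devtools` where to find them. The syntax is:
  `Remotes: gitlab::dante-sttr/commonCodes, gitlab::dante-sttr/untools`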
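Packages listed under `Suggests:` may not be installed. For that reason, functions in `R/` that rely on them conventionally check with `requireNamespace()` first. The helper `need_suggested()` and the function `map_sites()` below are hypothetical names used for illustration, and `leaflet` stands in for any suggested package.

```r
# Hypothetical helper: stop with an informative error when a package
# listed in Suggests: is not installed.
need_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is required for this function. Please install it.", pkg),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical function in R/ that uses leaflet (listed under Suggests:)
map_sites <- function(lng, lat) {
  need_suggested("leaflet")
  m <- leaflet::addTiles(leaflet::leaflet())
  leaflet::addMarkers(m, lng = lng, lat = lat)
}
```

The function calls `leaflet::` explicitly rather than importing it. As a result, your package still loads when `leaflet` is absent, and only `map_sites()` fails with a clear message.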
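The simplest way to add a package dependency is with `usethis::use_package()`, for example `usethis::use_package("dplyr")`, or `usethis::use_package("leaflet", type = "Suggests")` for a suggested package. That said, I typically edit the `DESCRIPTION` file directly.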
Here is an example of a completed
DESCRIPTION from the
When importing either the `data.table` or `tidyverse` packages, you must accommodate their special operators and naming conventions that are not part of base R programming (e.g., `dplyr` and `data.table` don't need quoted variable names in functions). For the `tidyverse`, this refers to the `%>%` (pipe) operator that comes from the `magrittr` package. `data.table` implements several additional operators, including `.N`, `.I`, and `:=`. If these operators are not addressed, your package will produce notes and errors when you run build checks.
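The sketch below shows why these operators trip up build checks. It uses `codetools`, which ships with R and which `R CMD check` relies on for its code usage analysis. The three functions are hypothetical package functions written with `data.table` and `magrittr` syntax. They are only analysed, never called, so neither package needs to be installed. Each free variable or function reported is something `R CMD check` would flag with a "no visible binding" or "no visible global function definition" message unless it is imported.

```r
# codetools is a recommended package distributed with R
library(codetools)

# Hypothetical package functions; analysed only, never run
count_by_group <- function(dt) dt[, .N, by = "grp"]
add_total      <- function(dt) dt[, total := x + y]
count_rows     <- function(df) df %>% nrow()

# Free variables: the special symbol .N, plus column names used with data.table syntax
globals_n     <- findGlobals(count_by_group, merge = FALSE)$variables
globals_total <- findGlobals(add_total, merge = FALSE)$variables

# Free functions: %>% is not base R, so it must be imported from magrittr
funs_pipe <- findGlobals(count_rows, merge = FALSE)$functions

print(globals_n)
print(globals_total)
print(funs_pipe)
```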
`usethis` has two functions to help set these up: `usethis::use_pipe()` and `usethis::use_data_table()`. These functions adjust your `Imports:` field. They also create non-function files in your `R/` directory (`utils-pipe.R` and `utils-data-table.R`). The `utils-data-table.R` file needs an addendum to handle the special operators. The base file created is:
# data.table is generally careful to minimize the scope for namespace
# conflicts (i.e., functions with the same name as in other packages);
# a more conservative approach using @importFrom should be careful to
# import any needed data.table special symbols as well, e.g., if you
# run DT[ , .N, by='grp'] in your package, you'll need to add
# @importFrom data.table .N to prevent the NOTE from R CMD check.
# See ?data.table::`special-symbols` for the list of such symbols
# data.table defines; see the 'Importing data.table' vignette for more
# advice (vignette('datatable-importing', 'data.table')).
#
#' @import data.table
NULL
As the comment explains, if you take the more conservative `@importFrom` route, you must also import the special symbols explicitly. I add a line for the most common ones:
# data.table is generally careful to minimize the scope for namespace
# conflicts (i.e., functions with the same name as in other packages);
# a more conservative approach using @importFrom should be careful to
# import any needed data.table special symbols as well, e.g., if you
# run DT[ , .N, by='grp'] in your package, you'll need to add
# @importFrom data.table .N to prevent the NOTE from R CMD check.
# See ?data.table::`special-symbols` for the list of such symbols
# data.table defines; see the 'Importing data.table' vignette for more
# advice (vignette('datatable-importing', 'data.table')).
#
#' @import data.table
#' @importFrom data.table .N .I ":="
NULL
There are additional directories that are both common constructs in the R community and helpful for research-specific workflows. These include `raw-data/`, `raw-scripts/`, and `inst/`. Click on `New Folder` in the RStudio Files pane to create these directories.
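If you prefer code to clicking, you can also create the `raw-data/`, `raw-scripts/`, and `inst/` folders from the R console. The helper `setup_research_dirs()` is a hypothetical sketch. It creates the three folders and adds `raw-data/` and `raw-scripts/` to `.Rbuildignore`, assuming both hold development-only material. If left in the build, non-standard top-level folders draw a NOTE from `R CMD check`. `inst/` is deliberately not ignored, because its contents must ship with the package.

```r
# Hypothetical helper: create research folders in a package root and keep the
# development-only ones out of the built package.
setup_research_dirs <- function(pkg_root = ".") {
  dirs <- c("raw-data", "raw-scripts", "inst")
  for (d in dirs) {
    # showWarnings = FALSE makes re-running harmless if a folder already exists
    dir.create(file.path(pkg_root, d), showWarnings = FALSE)
  }

  # Add anchored patterns for the development-only folders, skipping duplicates
  ignore_file <- file.path(pkg_root, ".Rbuildignore")
  existing <- if (file.exists(ignore_file)) readLines(ignore_file, warn = FALSE) else character()
  additions <- setdiff(c("^raw-data$", "^raw-scripts$"), existing)
  writeLines(c(existing, additions), ignore_file)

  invisible(file.path(pkg_root, dirs))
}

# Run from your package root:
# setup_research_dirs()
```

Rewriting the whole file, rather than appending, avoids joining two patterns onto one line when the existing file lacks a trailing newline.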
- `raw-data/`: This folder is where you place scripts used to import, pre-process, and embed datasets into your package. This will be explained in greater detail later.
- `raw-scripts/`: This directory is where you keep standard scripts with notes as you work out your workflow. It also holds code you will eventually wrap up and document in a function. This directory is less common and its name is not widely standardized. However, it's good practice to keep rough drafts of your code before wrapping them up into functions.
- `inst/`: This folder contains additional files vital to your package that are not function scripts or vignettes and cannot be directly embedded as `.RData` files. These may include:
  - complex copyright or licensing agreements that cannot be captured by the `DESCRIPTION`;
  - external and unprocessed data;
  - the package citation (`inst/CITATION`);
  - code from other languages.

  These files are installed along with the package when someone else installs it. Think carefully before including massive amounts of data or potentially harmful or sensitive scripts and data. When your package is installed, everything in the `inst/` folder is moved up to the root level of the installed package, which is somewhat confusing at first. For example, a file you keep in `inst/extdata/` while working on your package is found in `extdata/` once the package is installed, whether locally or on another computer.
We will discuss how to programmatically access `inst/` data in the embedded data section.
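As a small preview of that section, base R's `system.file()` resolves a path relative to an installed package's root, which is why files lose their `inst/` prefix. The `extdata/sites.csv` file in `myresearch` is a hypothetical example. The `stats` lines use a package that is always installed, so you can see the call return a real path.

```r
# Source layout (hypothetical): myresearch/inst/extdata/sites.csv
# Installed layout:             <library>/myresearch/extdata/sites.csv
# system.file() takes path components relative to the installed package root.
sites_path <- system.file("extdata", "sites.csv", package = "myresearch")

# system.file() returns "" when the package or file cannot be found;
# mustWork = TRUE turns that into an error instead.
if (nzchar(sites_path)) {
  sites <- read.csv(sites_path)
} else {
  message("myresearch (or its extdata/sites.csv) is not installed")
}

# Every installed package has a DESCRIPTION at its root, e.g. stats:
desc_path <- system.file("DESCRIPTION", package = "stats")
print(desc_path)
```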
At this time you may also create the `data/` and `vignettes/` folders. However, `usethis` will create them automatically with functions specifically designed to embed data and create vignettes (`usethis::use_data()` and `usethis::use_vignette()`).