How to Stop R Projects and Scripts Breaking
A how-to on getting started with Package Management in R
r, packrat, package-management
The problem: Let's say you coded a project in R in 2014 using the package dplyr, and you called the project 2014project. Packages get updated over time, sometimes changing syntax and function names. Suppose dplyr had some major syntax changes in 2015, and you used that latest version of dplyr in other projects that year. Now it is 2016 and you want to re-run the code in 2014project, but you hit a bunch of errors: because of the 2015 changes, the syntax you used in 2014 no longer runs. Now for the solution...
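Before getting there, it can help to make the drift visible. Here is a minimal sketch, using only base R, that compares the installed version of a package with the version a project was written against. The helper name version_drift and the version strings are hypothetical; this is not something packrat provides.

```r
# Hypothetical helper: report whether the installed version of a package is
# older than, the same as, or newer than the version a project was written with.
version_drift <- function(pkg, written_against) {
  installed <- utils::packageVersion(pkg)
  # compareVersion() returns -1, 0 or 1
  cmp <- utils::compareVersion(as.character(installed), written_against)
  c("older", "same", "newer")[cmp + 2]
}

# "stats" ships with R, so this runs anywhere; in the scenario above you
# would pass "dplyr" and the version you used in 2014 instead.
version_drift("stats", "0.0.1")
```

A result of "newer" is the warning sign: the code may have been written against behaviour that has since changed.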
R scripts breaking over time is annoying for anyone, but it's worth highlighting that organisations using proper package management can significantly reduce their costs, in the form of maintenance time spent by developers, data scientists and data analysts fixing projects broken by changes in package dependencies. In contrast to the Python community, who seem very familiar with virtualenv for isolating project dependencies, the R community doesn't seem to have widely adopted the practice. In the data science world, I attribute this difference to Python coming out of the computer science discipline and R coming from the statistics discipline. If you are an organisation using R heavily, you have a lot of productivity to gain from your R developers adopting package management.
    > readLines(system.file("DESCRIPTION", package = "packrat"))[c(3,9,10)]
    "Title: A Dependency Management System for Projects and their R Package"
    "Description: Manage the R packages your project depends on in an"
    "isolated, portable, and reproducible way."
To get the RStudio niceties/integrations it seems necessary to treat your project folder as an R Project. I've created a new project, /testpackrat2, and with the latest RStudio you should be able to select packrat to be initialised when you create the project. This runs the packrat::init() function to enter packrat mode, takes a snapshot of the package dependencies (of which there are currently none), places the binaries inside the project folder, and runs restore() to apply the latest snapshot to the project folder.
R will restart once this process is finished.
To double-check that packrat has been initialised properly, run:
    # Check I am in the right directory
    > getwd()
    "/Users/lukesingham/Projects/testpackrat2"
    # See if packrat is working
    > packrat::status()
    Up to date.
You should also notice the RStudio integration in the packages area, where it shows the version in your project library against the version packrat holds, along with the source of that package.
For test purposes I created script.R, which loads only one package.
    # Create script
    > system("echo 'library(stringr)' >| script.R")
    # Confirm project contents
    > system("ls")
    packrat script.R testpackrat.Rproj
If I open script.R in RStudio and try to load the library, this is the expected behaviour:
    > library(stringr)
    Error in library(stringr) : there is no package called ‘stringr’
I do have stringr installed on my system, but I am now in an isolated packrat project, which only reads from the project folder's list of packages. Now to run the packrat commands: snapshot() to record the addition of stringr to the project, and restore() to install it ready for use in the project.
    > packrat::status()
    The following packages are referenced in your code, but are not present
    in your library nor in packrat:

        stringr

    You will need to install these packages manually, then use
    packrat::snapshot() to record these packages in packrat.

    # Well I do have stringr in my library, anyway...
    # now to take a snapshot
    > packrat::snapshot()
    Adding these packages to packrat:
                 _
        magrittr   1.5
        stringi    1.1.2
        stringr    1.1.0

    Fetching sources for magrittr (1.5) ... OK (CRAN current)
    Fetching sources for stringi (1.1.2) ... OK (CRAN current)
    Fetching sources for stringr (1.1.0) ... OK (CRAN current)
    Snapshot written to '/Users/lukesingham/Projects/testpackrat2/packrat/packrat.lock'

    # Now to install/'restore' those sources to the snapshot just taken
    > packrat::restore()
    Installing magrittr (1.5) ... OK (built source)
    Installing stringi (1.1.2) ... OK (built source)
    Installing stringr (1.1.0) ... OK (built source)
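The snapshot is written to packrat/packrat.lock, as the output above shows. As a rough sketch of what you could do with that file, the following reads a lockfile as DCF-style records and lists the pinned versions. I am assuming the lockfile uses that layout and those field names; the sample text is hand-written and simplified from the versions in the printout, not a copy of a real lockfile.

```r
# Illustrative lockfile contents (an assumption about the layout, simplified).
lock_text <- c(
  "RVersion: 3.2.3",
  "",
  "Package: magrittr",
  "Source: CRAN",
  "Version: 1.5",
  "",
  "Package: stringi",
  "Source: CRAN",
  "Version: 1.1.2",
  "",
  "Package: stringr",
  "Source: CRAN",
  "Version: 1.1.0",
  "Requires: magrittr, stringi"
)

# Write the sample to a temporary file so the example is self-contained.
lock_file <- tempfile(fileext = ".lock")
writeLines(lock_text, lock_file)

# read.dcf(all = TRUE) returns one row per record and NA for missing fields.
records <- read.dcf(lock_file, all = TRUE)

# Keep only the package records (the header record has no Package field).
pkgs <- records[!is.na(records$Package), c("Package", "Version")]
print(pkgs, row.names = FALSE)
```

In a real project you would point lock_file at packrat/packrat.lock instead of a temporary file.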
It's probably worth pointing out what packrat is doing to your library path, particularly when you go to install new packages in your project.
    # Turn packrat off
    > packrat::packrat_mode()
    Packrat mode off. Resetting library paths to:
    - "/usr/local/lib/R/3.2/site-library"
    - "/usr/local/Cellar/r/3.2.3/R.framework/Versions/3.2/Resources/library"
    # Turn packrat back on
    > packrat::packrat_mode()
    Packrat mode on. Using library in directory:
    - "~/Projects/testpackrat2/packrat/lib"
So as long as you are in packrat_mode, running install.packages() or remove.packages() will only modify the project library.
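To see why the library path matters, here is a small base R sketch. When no lib argument is given, install.packages() installs into the first element of .libPaths(), so whichever directory sits first decides where packages land. The demo-project directory is a hypothetical stand-in created under tempdir(); this mimics the effect of packrat mode rather than reproducing what packrat does internally.

```r
# Remember the current library paths so they can be put back afterwards.
old_paths <- .libPaths()

# Hypothetical project library, created in a temporary directory.
project_lib <- file.path(tempdir(), "demo-project", "packrat", "lib")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first on the search path.
.libPaths(c(project_lib, old_paths))

# This is where install.packages() would now install by default.
first_lib <- .libPaths()[1]
print(first_lib)

# Put the original paths back.
.libPaths(old_paths)
```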
Existing Project with Out-Of-Date Packages
I want to test the scenario where I have existing projects using old versions of packages. To do this I ran:
    # Find old packages
    > old.packages()
                Installed Built   ReposVer
    lubridate   "1.5.6"   "3.2.3" "1.6.0"
    htmlwidgets "0.6"     "3.2.3" "0.7"
I then created a folder containing a script that loads these packages, to simulate an old project that I want packrat to manage.
    mkdir testpackratOutofDatePackages
    echo "library(lubridate); library(htmlwidgets)" >| testpackratOutofDatePackages/oldScript.R
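If you would rather stay inside R than drop to the shell, the same set-up can be sketched with base R file functions. Creating the folder under tempdir() is my assumption, chosen so the example does not touch your real projects directory; swap in your own path as needed.

```r
# Create the simulated old project in a temporary location (an assumption).
project_dir <- file.path(tempdir(), "testpackratOutofDatePackages")
dir.create(project_dir, showWarnings = FALSE)

# Write a script that loads the two out-of-date packages.
script_path <- file.path(project_dir, "oldScript.R")
writeLines("library(lubridate); library(htmlwidgets)", script_path)

# Confirm the project contents.
list.files(project_dir)
```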
Now go to RStudio and open the existing folder testpackratOutofDatePackages as a project. Click on File > New Project > Existing Directory.
Unfortunately there is no option to init packrat as part of the process. So...
    > packrat::init()
    Initializing packrat project in directory:
    - "~/Projects/testpackratOutofDatePackages"
    ... (edited down the console printout)
    Fetching sources for htmlwidgets (0.6) ... OK (CRAN archived)
    Fetching sources for lubridate (1.5.6) ... OK (CRAN archived)
OK, great: it fetches the old versions from the CRAN archive. Now let's test upgrading these packages and then reverting to the initial project state.
    > install.packages("htmlwidgets") # v0.7
    Installing package into ‘/Users/lukesingham/Projects/testpackratOutofDatePackages/packrat/lib/x86_64-apple-darwin15.2.0/3.2.3’
    ...
    * DONE (htmlwidgets)
So it seems packrat auto-snapshots the project, such that I couldn't run restore() and return to htmlwidgets v0.6. Let's try that again without auto.snapshot and with my out-of-date lubridate.
    # Turn off auto.snapshot
    > packrat::set_opts(auto.snapshot=FALSE)
    # Check status
    > packrat::status()
    Up to date.
    # Now try updating lubridate
    > install.packages("lubridate")
    DONE
    > packrat::status()
    The following packages are out of sync between packrat and your current library:
                  packrat   library
        lubridate   1.5.6     1.6.0

    Use packrat::snapshot() to set packrat to use the current library, or use
    packrat::restore() to reset the library to the last snapshot.
    # Return to old version of lubridate
    > packrat::restore(overwrite.dirty=T)
    Downgrading these packages in your library:
                     from      to
        lubridate   1.6.0   1.5.6

    Do you want to continue? [Y/n]: Y
    Replacing lubridate (downgrade 1.6.0 to 1.5.6) ... OK (built source)
By default, packrat's auto.snapshot option snapshotted my project when I updated a package. Crucially, this meant I could not use packrat::restore(), because I was already up to date. Of course, if I had been using git and making commits before and after, there wouldn't have been a problem. However, without git, using packrat::restore() to downgrade a package requires the overwrite.dirty=TRUE option and only seems possible if you have switched off auto.snapshot. I'd highly recommend turning it off anyway, as updating manually gives you a better understanding of, and more control over, packrat.
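Putting that recommendation together, the manual workflow could be wrapped up roughly like this. update_with_review is a hypothetical helper of my own, not part of packrat; it only uses the packrat calls shown in the transcripts above and assumes you call it from inside a packrat project.

```r
# Hypothetical helper: update one package with auto.snapshot switched off,
# then explicitly keep the change (snapshot) or roll it back (restore).
update_with_review <- function(pkg, keep) {
  # Validate the arguments before touching anything.
  stopifnot(is.character(pkg), length(pkg) == 1)
  stopifnot(is.logical(keep), length(keep) == 1, !is.na(keep))

  if (!requireNamespace("packrat", quietly = TRUE)) {
    stop("packrat is not installed; run install.packages('packrat') first")
  }

  # Make snapshots a manual step.
  packrat::set_opts(auto.snapshot = FALSE)

  # In packrat mode this installs into the project library.
  install.packages(pkg)

  # Report what is now out of sync with the last snapshot.
  packrat::status()

  if (keep) {
    # Record the new version in packrat.lock.
    packrat::snapshot()
  } else {
    # Reset the library to the last snapshot.
    packrat::restore(overwrite.dirty = TRUE)
  }
}
```

For example, update_with_review("lubridate", keep = FALSE) would repeat the lubridate experiment above and end by downgrading to the snapshotted version.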