Packrat support #110

Closed
dpmccabe opened this issue Feb 13, 2018 · 9 comments


@dpmccabe

Packrat is, in my opinion, a necessary component in R projects these days, especially if there are multiple developers and/or multiple servers running your project's code.

While I could write my init.R to specify the exact package versions to install, it would be great if this buildpack could use Packrat to install dependencies since we're using it anyway.

When I deploy my code to a VPS for the first time (or when I add a dependency), it's just a matter of running packrat::restore() inside the project folder. If you're not familiar with the package, there's a local .Rprofile file that sources packrat/init.R. The restore function parses packrat/packrat.lock, which is like the Gemfile.lock file in a Ruby project, and runs install.packages for each specific package version. Tarballs for the dependencies are already in packrat/src/ (since they're checked in to git) and the dependencies are installed into packrat/lib/x86_64-apple-darwin15.6.0/3.4.3 (which is gitignored). So no need to go out to CRAN and download them, which is nice.
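To make the lockfile description concrete, here is a minimal sketch of reading one with base R's read.dcf. The sample contents are an assumption about the general shape of packrat.lock (a header record followed by one record per package); the package names, versions, SHA, and header values are made up for illustration.

```r
# Hypothetical, abbreviated lockfile contents (illustrative values only).
sample_lock <- c(
  "PackratFormat: 1.4",
  "PackratVersion: 0.4.8-1",
  "RVersion: 3.4.3",
  "",
  "Package: examplepkg",
  "Source: CRAN",
  "Version: 1.2.3",
  "",
  "Package: otherpkg",
  "Source: github",
  "Version: 0.1.0",
  "GithubSha1: abc123"
)

lock_file <- tempfile(fileext = ".lock")
writeLines(sample_lock, lock_file)

# read.dcf returns a character matrix with one row per record and one
# column per field; fields missing from a record are NA.
records <- read.dcf(lock_file)

# The first record is the header, so drop it to keep only package entries.
pkgs <- as.data.frame(records[-1, , drop = FALSE], stringsAsFactors = FALSE)
print(pkgs[, c("Package", "Source", "Version")])
```

Each remaining row carries the exact version (or commit SHA) that a restore would need to install.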

In my buildpack's init.R (not the packrat/init.R), I figured you could do something like this:

my_packages = c("packrat")
install_if_missing = <snip>
invisible(sapply(my_packages, install_if_missing))

setwd("/app")
source(".Rprofile") # which just sources packrat/init.R
packrat::restore()

This doesn't work, though, since I'm apparently not in the app directory. If I replace the last three lines with

list.files("/app")
setwd("/app")
print(getwd())
print(list.files(getwd()))

then git push heroku master prints

remote: -----> Executing init.r script
remote:        [1] "data.R"                "environment.example.R" "init.R"
remote:        [4] "packrat"               "README.md"             "run.R"
remote:        [7] "server.R"              "ui.R"                  "wrapper.R"
remote:        [1] "/tmp/build_50bdef0edfab8ae27abbeb3a2c074a7c"
remote:        character(0)

So /app symlinks to a temp directory, which contains my app files. But when I try to enter that directory, the files disappear. Any idea what's going on? Thanks!
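One way to investigate the path behavior is to reproduce the symlink setup locally. The sketch below is only a diagnostic under assumptions: it uses throwaway temporary paths as stand-ins for the build directory and /app, assumes a Unix-like system where file.symlink works, and does not by itself explain why the files vanish on Heroku.

```r
# Build a throwaway directory with one file, plus a symlink pointing at it.
# Both paths are temporary stand-ins for the build directory and /app.
target <- tempfile("build_")
dir.create(target)
file.create(file.path(target, "init.R"))
link <- tempfile("app_")
file.symlink(target, link)

# Sys.readlink reports where the link points; normalizePath resolves it.
print(Sys.readlink(link))
print(normalizePath(link))

# After setwd() through the link, getwd() typically reports the resolved
# directory rather than the link, which is consistent with the
# /tmp/build_... path printed in the deploy log.
old <- setwd(link)
resolved_wd <- getwd()
files_seen <- list.files(resolved_wd)
setwd(old)
print(resolved_wd)
print(files_seen)
```

If the files are visible locally but not during slug compilation, the difference likely lies in the build environment rather than in setwd() itself.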

@virtualstaticvoid
Owner

I guess there are two ways of addressing this.

Firstly, for the least impact in the short term, take your approach: use the init.R file to install the packrat package and run the package restore.

Secondly, in the medium term, add "native support" by including packrat by default and changing the detection logic to look for a packrat directory (and/or packrat.lock file) in the root to determine whether the R buildpack applies (possibly removing the need for the init.R file), and then run the restore.

I'll try to get the first option working in the meantime.
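The detection idea in the second option could be sketched roughly as follows. This is an illustration only, not the buildpack's actual detect logic: the helper names uses_packrat and detect_r_app are hypothetical, and the lockfile location (packrat/packrat.lock under the project root) is taken from the description earlier in this thread.

```r
# Hypothetical helper: TRUE when `dir` contains a packrat lockfile.
uses_packrat <- function(dir = ".") {
  file.exists(file.path(dir, "packrat", "packrat.lock"))
}

# Hypothetical helper: decide how dependencies would be installed.
# Returns "packrat", "init.R", or NA when neither marker is present.
detect_r_app <- function(dir = ".") {
  if (uses_packrat(dir)) {
    "packrat"
  } else if (file.exists(file.path(dir, "init.R"))) {
    "init.R"
  } else {
    NA_character_
  }
}

# Usage against a throwaway project directory.
project <- tempfile("project_")
dir.create(file.path(project, "packrat"), recursive = TRUE)
file.create(file.path(project, "packrat", "packrat.lock"))
print(detect_r_app(project))
```

Checking for the lockfile first means a packrat project would not need an init.R at all.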

@dpmccabe
Author

dpmccabe commented Mar 23, 2018

Thanks for looking into this. My workaround for the time being is to parse the packrat.lock file myself in init.R and install the packages from the tarballs already contained in packrat/src:

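install.packages("remotes")

# Read the lockfile; the first record is packrat's own header, so drop it.
pkgs <- as.data.frame(
  read.dcf("/app/packrat/packrat.lock")[-1, , drop = FALSE],
  stringsAsFactors = FALSE
)

# seq_len() avoids iterating over c(1, 0) when the lockfile lists no packages.
for (i in seq_len(nrow(pkgs))) {
  pkg <- pkgs[i, ]

  message("Trying to install ", pkg$Package)

  # identical() is FALSE (rather than an error) when Source is missing.
  if (pkg$Package %in% rownames(installed.packages())) {
    message(pkg$Package, " is already installed")
  } else if (identical(pkg$Source, "CRAN")) {
    # CRAN tarballs are stored as <Package>_<Version>.tar.gz
    f <- file.path("/app/packrat/src", pkg$Package, paste0(pkg$Package, "_", pkg$Version, ".tar.gz"))
    message("...from ", f)

    remotes::install_local(f, INSTALL_opts = "--no-docs --no-help --no-demo")
  } else if (identical(pkg$Source, "github")) {
    # GitHub tarballs are stored under the commit SHA
    f <- file.path("/app/packrat/src", pkg$Package, paste0(pkg$GithubSha1, ".tar.gz"))
    message("...from ", f)

    remotes::install_local(f, INSTALL_opts = "--no-docs --no-help --no-demo")
  }
}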

Packrat supports more sources than just CRAN and GitHub, but this is good enough for my purposes right now.

Note: I also had to add .Rprofile to .slugignore so that the source("packrat/init.R") line isn't evaluated during compilation.
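Since the workaround only covers CRAN and GitHub entries, it may help to list the lockfile entries it does not handle before deploying. The helper below is a hypothetical sketch: unhandled_packages is not part of packrat or remotes, and the example data frame (including the "Bioconductor" source value) is made up for illustration.

```r
# Hypothetical helper: names of lockfile entries whose Source is missing
# or is not one of the sources the workaround installs.
# `pkgs` is a data frame with Package and Source columns, as built from
# packrat.lock with read.dcf.
unhandled_packages <- function(pkgs, handled = c("CRAN", "github")) {
  skipped <- is.na(pkgs$Source) | !(pkgs$Source %in% handled)
  pkgs$Package[skipped]
}

# Illustrative input only; real values would come from the lockfile.
example <- data.frame(
  Package = c("a", "b", "c"),
  Source = c("CRAN", "Bioconductor", "github"),
  stringsAsFactors = FALSE
)
print(unhandled_packages(example))
```

An empty result means every entry in the lockfile has a source the loop knows how to install.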

@virtualstaticvoid
Owner

virtualstaticvoid commented Mar 26, 2018

Yes, I found the issue with execution of the .Rprofile file during slug compilation. I tweaked the compile script by adding the --no-init-file switch to the command in my test, since .slugignore would cause the file not to be included at runtime, which wouldn't be desirable.

I've experimented with the first approach I suggested above, but without much luck. The main issue is that /app is symlinked into the fake chroot, under /app/.root/app, which causes problems when packrat tries to restore the packages. I tried copying the files instead of symlinking them, but then ran into other issues.

I have reached out to the Heroku team to see whether they have any ideas for a better way to package R given its unique requirements and the restrictions imposed during slug compilation, which is why I had to resort to using a fake chroot.

@virtualstaticvoid
Owner

See the heroku-buildpack-r-packrat-test project. You will see that packrat::restore() succeeds. However, without the /app symlink and with the project packages installed under /app/packrat/lib*, the slug size exceeds the 500MB limit.

Also, since /app is copied instead of being symlinked, the files may get out of sync in a multi-buildpack scenario.

@ankane

ankane commented Jun 12, 2018

Hey @virtualstaticvoid, did the Heroku team have any ideas?

@virtualstaticvoid
Owner

virtualstaticvoid commented Jun 12, 2018

Hi @ankane

Yes, their feedback was to use the container stack together with a build manifest (heroku.yml).

I went ahead and implemented a compatible solution which will work for most cases, but will require rework if multiple buildpacks are used.

Currently I am working on implementing "native support" for packrat in the buildpack. See the heroku-16-packrat branch. Note that this is still a work in progress.

@ankane

ankane commented Jun 13, 2018

Thanks @virtualstaticvoid, I'll check out that branch. The general issue I've seen with container-only solutions is that they don't allow the same fine-grained caching as buildpacks (like package-level caching). Even as containers gain popularity, I still think buildpacks are needed for fast deploys (Dokku does a great job combining both technologies).

@stephenhmarsh

+1

I definitely need packrat support in my project, but I also need to run R along with Ruby, so it looks like the container approach is the way to go.

@virtualstaticvoid
Owner

Fixed by #123.
