Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize arguments of webshot #33

Merged
merged 20 commits into from Dec 1, 2016
Merged

Vectorize arguments of webshot #33

merged 20 commits into from Dec 1, 2016

Conversation

@FrancoisGuillem
Copy link
Contributor

@FrancoisGuillem FrancoisGuillem commented Nov 28, 2016

Hello,

This pull request solves #32 . It vectorizes all arguments of "webshot" in order to improve performance when taking a lot of screenshots. For instance, on my laptop, taking 10 screenshots with default parameters takes 7 seconds instead of 27 seconds.

Parameters for each screenshot are passed to script webshot.js as a JSON array. Then casper.eachThen is used to loop over each configuration object.

@timelyportfolio
Copy link
Contributor

@timelyportfolio timelyportfolio commented Nov 28, 2016

Wow, this looks really nice! Thanks for your efforts adding this functionality.

Copy link
Owner

@wch wch left a comment

I like the general approach, although since the change is pretty substantial, I'm concerned about the risk of breaking existing behavior. If you can verify that you've tested the behavior for the following (vectorized and un-vectorized), that would help:

  • Broken URLs
  • Missing selectors
  • Different zoom levels
  • General error handling

Also, it would be good to have one or two examples in the README and in the vignette.

R/webshot.R Outdated
#' @param file Name of output file. Should end with \code{.png}, \code{.pdf}, or
#' \code{.jpeg}.
#' @param url A vector of URLs to visit.
#' @param file A Vector of names of output files. Should end with \code{.png},

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

Should be "vector" instead of "Vector".

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

Also, can you describe what happens to filenames if there are multiple URLs but only one filename?

R/webshot.R Outdated
#' \code{.jpeg}.
#' @param url A vector of URLs to visit.
#' @param file A Vector of names of output files. Should end with \code{.png},
#' \code{.pdf}, or \code{.jpeg}.

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

Could you indent consistently with the other documentation? If you're using RStudio, you can press Ctrl-Shift-/ (or Cmd-Shift-/ on Mac) to automatically reflow/indent correctly.

R/webshot.R Outdated
@@ -112,6 +124,17 @@ webshot <- function(
stop("Need url.")
}

if (length(url) > 1) {
if (length(file) == 1) {
file <- sapply(1:length(url), function(i) {

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

Please use vapply or lapply instead of sapply -- this makes it easier for others to infer what is returned.

Also, use seq_along(url) instead of 1:length(url).

R/webshot.R Outdated
gsub("\\.(.{3,4})$", replacement, file)
})
} else if (length(file) != length(url)) {
stop("parameters 'url' and 'file' should have same length")

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

Should say that file can also be length 1.

R/webshot.R Outdated
@@ -112,6 +124,17 @@ webshot <- function(
stop("Need url.")
}

if (length(url) > 1) {
if (length(file) == 1) {

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

It would be helpful to have a short comment describing what this block does.

if (opts.selector) {
opts.selector = opts.selector.split(",");
}
var data = JSON.parse(args[1]);

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

I think data is a bit vague. Something like allOpts would be better (though there's probably a better name that could be used).

var opts = response.data;

// Prepare options
opts = utils.merge(opt_defaults, opts);

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

I believe the utils.merge function has the side effect of modifying opts_defaults, which is undesirable, since that variable gets reused now.

This comment has been minimized.

@FrancoisGuillem

FrancoisGuillem Nov 28, 2016
Author Contributor

Does replacing this by opts = utils.merge(opts, opt_defaults); resolve the problem ?

This comment has been minimized.

@FrancoisGuillem

FrancoisGuillem Nov 28, 2016
Author Contributor

Ho no I see the problem, forget it.

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

That has different behavior: it will overwrite values in opts with those from opt_defaults, so that the defaults will always supersede the values in opts. I think it would make sense to add a (shallow) object cloning function called utils.clone.

this.capture(filename, cr);
// Go to url and perform the desired screenshot
this.thenOpen(url, function(r) {
this.zoom(opts.zoom)

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

I'm not 100% sure about this, but I believe that it's more efficient to do the zoom and viewport sizing before opening the URL.

I think it would be better to do the chaining like this:

casper
  .zoom(...)
  .viewport(...)
  .thenOpen(...)
  .wait(..., function() {
    ...
  });
R/webshot.R Outdated
})
}

if (!is.null(selector)) data$selector <- argToVec(selector)

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

selector and expand should probably check for the correct number of items (either 1 or 4) in the R code instead of in the Javascript, because an error in the JS could now result in a partially-successful run. Better to have it not work at all than to partially work.

eval(opts.eval);
}
casper.eachThen(data, function(response) {
var url = response.data.url;

This comment has been minimized.

@wch

wch Nov 28, 2016
Owner

url and file don't need to be pulled out separately from the rest of the opts, do they?

@FrancoisGuillem
Copy link
Contributor Author

@FrancoisGuillem FrancoisGuillem commented Nov 29, 2016

I think I have corrected every problem. I have tested unvectorized and vectorized versions of examples of the vignette and everything looks fine.

Invalid URLs are better handled. Before, webshot was creating en empty screenshot. Now it throws an explicit error.

I Have not tested functions resize and shrink because I have not the required tools on my profesional computer. Could you test it ?

Copy link
Owner

@wch wch left a comment

@cuche27 Looking good! I have more comments though... I don't have time right now to test out those functions but I may in the next couple of days.

@yihui with these changes, the object of class webshot can now be a vector of filenames instead of just one. What kind of changes will knit_print.webshot need to work with a vector of names?

@jimhester will the vector of names work with the code you've written to use webshot objects?

R/image.R Outdated
@@ -21,6 +21,11 @@
#' }
#' @export
resize <- function(filename, geometry) {
mapply(.resize, filename = filename, geometry = geometry)

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

2 issues with mapply:

  • Like sapply, the return type isn't known in advance (it could be a list, char vector, numeric vector, etc.) unless you use SIMPLIFY = FALSE.
  • If you pass an unnamed char vector to mapply, it returns a named vector or list, which is undesirable here. (For example, try mapply(function(x) 1, c("a","b")).)

To ensure that the result is an unnamed char vector, it should be something like:

as.character(mapply( ... , USE.NAMES = FALSE, SIMPLIFY = FALSE))

Also, I think the documentation needs to be updated to indicate that the args can be vectors. This would be similar to the documentation changes for webshot.

R/image.R Outdated
structure(filename, class = "webshot")
}

.resize <- function(filename, geometry) {

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I'd prefer this to be called resize_one.

R/webshot.R Outdated
if(!is.null(expand) && !is.list(expand)) expand <- list(expand)

# Check length of arguments
lengthOfArgs <- vapply(environment(), length, numeric(1))

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I don't like the vapply(environment()), because it's fragile (if code above is modified to add a new var, it could break) and because using vapply on an environment just seems strange.

gsub("\\.(.{3,4})$", replacement, file)
})
}

if (is_windows()) {
url <- fix_windows_url(url)

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I think fix_windows_url won't work with multiple URLs.

This comment has been minimized.

@FrancoisGuillem

FrancoisGuillem Nov 29, 2016
Author Contributor

It looks fine actually. All functions used in fix_windows_url accept vectors.

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

Yes, but it calls if(grepl(...)), and if is not vectorized. For example:

> if (c(T, F, F)) 1 else 0
[1] 1
Warning message:
In if (c(T, F, F)) 1 else 0 :
  the condition has length > 1 and only the first element will be used

This comment has been minimized.

@FrancoisGuillem

FrancoisGuillem Nov 29, 2016
Author Contributor

Oh yes I missed that. It should be fixed now.

' phantomjs webshot.js <url> <name>.png [options]');
if (args.length < 2) {
console.log(
'usage: phantomjs webshot.js <optsList>\n' +

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I'd prefer the original formatting, which puts "Usage:" on a separate line:

Usage:
  phantomjs webshot.js <optsList>

...
casper.zoom(opts.zoom)
.viewport(opts.zoom * opts.vwidth, opts.zoom * opts.vheight)
.thenOpen(opts.url, function(r) {
casper.wait(opts.delay * 1000, function() {

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I think the usual convention is to use this instead of casper, right?

This comment has been minimized.

@FrancoisGuillem

FrancoisGuillem Nov 29, 2016
Author Contributor

I fill unconfortable with this convention but indeed it is the case. I will change it back :)

R/image.R Outdated
@@ -77,6 +81,11 @@ resize <- function(filename, geometry) {
#' }
#' @export
shrink <- function(filename) {
mapply(.shrink, filename = filename)
structure(filename, class = "webshot")

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

Same issues here as with resize.

R/webshot.R Outdated
#' of multiple URLs, this parameter can also be a list with same length as
#' \code{url} with each element of the list being "viewport" or a
#' four-elements numeric vector. This option is not compatible with
#' \code{selector}

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

Missing period at end of sentence.

R/webshot.R Outdated
@@ -45,6 +56,12 @@
#' # Might need a longer delay for all assets to display
#' webshot("http://rstudio.github.io/leaflet", delay = 0.5)
#'
#' # One can also take screenshots of several URLs with only one command.
#' # This is more efficient than calling multiple times 'webshot'.

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

"... calling 'webshot' multiple times."

R/appshot.R Outdated
@@ -2,6 +2,8 @@
#'
#' @inheritParams webshot
#' @param app A Shiny app object, or a string naming an app directory.
#' @param file Name of output file. Should end with \code{.png}, \code{.pdf}, or

This comment has been minimized.

@wch

wch Nov 29, 2016
Owner

I don't think this is necessary, because appshot's documentation should inherit values from webshot.

@yihui
Copy link
Collaborator

@yihui yihui commented Nov 29, 2016

I can make changes in knitr after this PR is merged.

@jimhester
Copy link
Contributor

@jimhester jimhester commented Nov 29, 2016

No this does not currently work with printing within knitr documents. The knit_print.webshot() function assumes only one filename and would need to be modified to return more than one html_screenshot object. Not sure if there is some functionality in knitr to do this already or not.

@jimhester
Copy link
Contributor

@jimhester jimhester commented Nov 29, 2016

Actually this seems to be all that is needed to get it working.

diff --git a/R/webshot.R b/R/webshot.R
index 02f9c0d..1aadbdc 100644
--- a/R/webshot.R
+++ b/R/webshot.R
@@ -236,9 +236,11 @@ webshot <- function(
 }

 knit_print.webshot <- function(x, ...) {
-  res <- readBin(x, "raw", file.size(x))
-  ext <- gsub(".*[.]", "", basename(x))
-  structure(list(image = res, extension = ext), class = "html_screenshot")
+  lapply(x, function(filename) {
+    res <- readBin(filename, "raw", file.size(filename))
+    ext <- gsub(".*[.]", "", basename(filename))
+    structure(list(image = res, extension = ext), class = "html_screenshot")
+  })
 }

A simple test is

knit(text = c("```{r}", "webshot(c(\"https://www.r-project.org/\", \"https://cran.r-project.org/package=webshot\"))", "```"))
#> [1] "\n```r\nwebshot(c(\"https://www.r-project.org/\", \"https://cran.r-project.org/package=webshot\"))\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-1.png)![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-2.png)"
@FrancoisGuillem
Copy link
Contributor Author

@FrancoisGuillem FrancoisGuillem commented Nov 29, 2016

I have tested functions shrink and resize. They seem to work correctly.

@wch wch merged commit 1d6e6b1 into wch:master Dec 1, 2016
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@wch
Copy link
Owner

@wch wch commented Dec 1, 2016

Thanks!

@wch wch mentioned this pull request Dec 1, 2016
@wch
Copy link
Owner

@wch wch commented Dec 1, 2016

Oh, I just realized that the result of the mapply calls in resize and shrink aren't saved, so the SIMPLIFY and USE.NAMES args aren't needed. But there's also no harm in having those there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.