Skip to content

order_by() with call to lag() (but not dplyr::lag()) causing crash #3065

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rebeccaferrell opened this issue Aug 31, 2017 · 6 comments
Closed
Assignees

Comments

@rebeccaferrell
Copy link

On Windows 10 Enterprise, I am experiencing crashes in R 3.4.1 when using the order_by() helper function (as in the window function vignette) with lag() in dplyr 0.7.2, tidyverse 1.1.1. As you can see below, it works with dplyr::lag() but not unspecified lag() so perhaps some namespace issue?

After running the last block of code, my R session gives a crash error message and must be restarted:

library(tidyverse)

data <- data_frame(
    ordering = 1:10,
    y = 1:10
)

# this works with order_by argument to lag()
happy_face <- data %>%
    mutate(z = lag(y, order_by = ordering))

# order_by() works with call to cumsum()
also_happy_face <- data %>%
    mutate(z = order_by(ordering, cumsum(y)))

# order_by() works with explicit call to dplyr::lag()
still_a_happy_face <- data %>%
    mutate(z = order_by(ordering, dplyr::lag(y)))

# order_by() helper crashes with call to unspecified lag()
sad_face <- data %>%
    mutate(z = order_by(ordering, lag(y)))
@rebeccaferrell
Copy link
Author

Following up: I confirm I experience the same crashing in R 3.4.0, dplyr 0.7.2 on Mac OS X El Capitan. Running debug(order_by), isolating the problem to running this line in order_by() (I don't understand these quazy quosures well enough to probe further):

fn <- set_expr(quo, node_car(get_expr(quo)))

@cderv
Copy link
Contributor

cderv commented Sep 2, 2017

Using debug(order_by), we see that the argument call of the order_by function is not set to lag(y) but to the result c(NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L).
Reproducing the call to the order_by function, I could isolate the issue to rlang::node_car

library(dplyr)
library(rlang)
# example data
data <- data_frame(
  ordering = 1:10,
  y = 1:10
)
# reproduce the call argument
call <- lag(data$y)
# quo instead of enquo inside of order_by
quo <- quo(!!call)
# this expression is ok
temp1 <- get_expr(quo)
# this one make the r session crashed
temp2 <- node_car(temp1)

rlang::node_car should take as input an expression or a formula. Problem here is it is neither.

library(rlang)
data <- dplyr::data_frame(
  ordering = 1:10,
  y = 1:10
)
call <- dplyr::lag(data$y)
quo <- quo(!!call)
temp1 <- get_expr(quo)
temp1
#>  [1] NA  1  2  3  4  5  6  7  8  9
is_expr(temp1)
#> [1] FALSE
is_formula(temp1)
#> [1] FALSE
class(temp1)
#> [1] "integer"

I think it is the reason why it is crashing.
When using dplyr::lag(y), we have this working behaviour

library(rlang)
data <- dplyr::data_frame(
  ordering = 1:10,
  y = 1:10
)

quo <- quo(dplyr::lag(data$y))
temp1 <- get_expr(quo)
temp1
#> dplyr::lag(data$y)
is_expr(temp1)
#> [1] TRUE
is_formula(temp1)
#> [1] FALSE
class(temp1)
#> [1] "call"
temp2 <- node_car(temp1)

node_car should maybe check its input to see if it is correct. This way crash could be prevented. rlang issue on this one.

However, there will still be a dplyr issue because order_by is not working. It seems it should pass lag(y) and not its result. I will continue to investigate and understand why.

I hope this help understand.

@lionel- lionel- self-assigned this Sep 2, 2017
@lionel-
Copy link
Member

lionel- commented Sep 2, 2017

node_car should maybe check its input to see if it is correct.

Nope, the check should have been this line in order_by():

stopifnot(is_lang(quo))

because in early rlang versions is_lang() would check the contents of the quosure, e.g. it would do is_lang(get_expr(quo)) automatically. We changed that later on but forgot to update that line. It should be:

stopifnot(quo_is_lang(quo))

@cderv
Copy link
Contributor

cderv commented Sep 2, 2017

Could not have known this one but could have guess it. 😉 indeed there is a check in order_by !

This new check will prevent the crash of r session.
However, order_by will not work lag(y) but dplyr::lag(y). I did not understand why mutate and its internal do not pass lag(y) to order_by function but evaluate lag(y) and pass its results.
It is not the case with dplyr::lag(y) and neither with cumsum(y).

@lionel-
Copy link
Member

lionel- commented Sep 4, 2017

@krlmlr The issue here, besides the wrong type checking, is that the hybrid handler preemptively replaces the lag() call with its value. That's not appropriate because order_by() is a quoting function, not a pure function.

@krlmlr
Copy link
Member

krlmlr commented Sep 15, 2017

Yes, we have a whole battery of skipped tests. The underlying issue is the ad-hoc argument matching, ideally we'd like to do the argument matching properly and only once, not for every group. There's an open issue for this problem.

Can we close this ticket?

@hadley hadley closed this as completed Nov 2, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jun 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants