Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lag inconsistent with base R and clobbers all other uses of lag #1586

Closed
ggrothendieck opened this issue Dec 15, 2015 · 13 comments
Closed

lag inconsistent with base R and clobbers all other uses of lag #1586

ggrothendieck opened this issue Dec 15, 2015 · 13 comments

Comments

@ggrothendieck
Copy link

@ggrothendieck ggrothendieck commented Dec 15, 2015

lag in dplyr clobbers lag in base R and also any lag methods. To make it worse it clobbers it with a function that works differently from lag in R reversing the notation of lead and lag.

lag in dplyr should be renamed to Lag or some other name which will not conflict with core R and other packages.

A warning is issued when dplyr is loaded but this inconsistency is so egregious that I don't think that that is sufficient.

Try the following and note the difference.

lag(ts(1:4))
library(dplyr)
lag(ts(1:4))

Note that zoo and dyn work consistently with R. lag.xts works the opposite way that R works but at least it just adds an xts method and does not interfere with other methods. dynlm provides L which is like lag but in the opposite direction which seems ok because it uses a different name.

@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented Dec 15, 2015

This is a natural consequence of having lots of R packages. Just be explicit and use stats::lag or dplyr::lag

@hadley hadley closed this Dec 15, 2015
@ggrothendieck

This comment has been minimized.

Copy link
Author

@ggrothendieck ggrothendieck commented Dec 15, 2015

I really don't agree at all. This is not the result of many packages. It's a conflict between R out of the box and dplyr. Even if there were no other packages this problem would exist. The names in R ought to be respected.

@joshuaulrich

This comment has been minimized.

Copy link

@joshuaulrich joshuaulrich commented Dec 16, 2015

@ggrothendieck see the discussion on the commit f8a46e0 for a more thoughtful line of reasoning. FWIW, I agree with you. The least-bad solution is: don't mask generics from base packages.

@ggrothendieck

This comment has been minimized.

Copy link
Author

@ggrothendieck ggrothendieck commented May 3, 2016

This just came up yet again on stackoverflow and the situation did not even involve lag in an obvious way although dplyr's lag was the culprit.
http://stackoverflow.com/questions/36991517/retain-zoo-class-after-lapply
The comments there refer to yet another SO post where it came up too showing how prevalent this problem is.

@jsekamane

This comment has been minimized.

Copy link

@jsekamane jsekamane commented Oct 18, 2017

@hadley: It is easy for me to use stats::lag and dplyr::lag in my own code. But how about the many packages that use the base R lag-function, without explicitly writing stats::lag, how do I proceed here?

  • Should I unload/reload dplyr every time I want to use a function from a different package with which there is a conflict? (Note: this is not possible if I use tidyverse. In this case I would have to unload tidyverse/broom).
  • Should I toggle between lag = function(...) { stats::lag(...) } and lag = function(...) { dplyr::lag(...) } whenever there is a conflict.
  • Should I manually add stats:: to the function with which there is a conflict: trace("conflictFunction", edit=TRUE) + some search/replace?
  • or ... ?

While I haven't been able to search specifically for R packages that use the (unspecified) lag-function, the 3,439 available code results give some indication of the scale of this problem. On the other hand, it is quite clear that very few packages are specific in terms of which lag function they use; stats::lag is used ~10 times and dplyr::lag is used ~7 times.

@lionel-

This comment has been minimized.

Copy link
Member

@lionel- lionel- commented Oct 18, 2017

The namespaces of packages are encapsulated and fixed in advance. So other packages will always use the intended lag() function even if you attach dplyr. The packages attached with library() are sort of a namespace for your scripts (the full story is a little more complicated) so you only need to be careful about lag in your scripts.

@lionel-

This comment has been minimized.

Copy link
Member

@lionel- lionel- commented Oct 18, 2017

The namespace encapsulation of packages is a very good reason to turn your utility functions into a package, this way they will aways work the same way no matter what packages you've attached to the search path.

@ljanissen

This comment has been minimized.

Copy link

@ljanissen ljanissen commented Dec 21, 2017

@lionel- I am experiencing the same problems as @jsekamane . I don't really understand your explanation of namespace encapsulation, but you seem to suggest that loading dplyr will not interfere with other package's usage of lag(). That has not been my experience. I am forced to unload/load dplyr depending on the project I'm working on. I'll know right away if I've forgotten to do this because I get crash with msg: "Error: n must be a nonnegative integer scalar, not double of length 1". Aha! forgot to unload dplyr!!

I love the tidyverse but it makes my whole development environment unstable.
What can we do?

@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented Dec 21, 2017

@ljanissen I'd suggest asking on https://community.rstudio.com — it's likely you'll need a solid reprex to get a good answer, and I'd bet the process of creating a reprex will help you identify exactly where the problem occurs.

@lionel-

This comment has been minimized.

Copy link
Member

@lionel- lionel- commented Dec 21, 2017

@ljanissen Are these CRAN packages? CRAN forces you to import functions explicitly in order to make your package completely encapsulated but if the package isn't on CRAN this might not be the case. If lag() isn't explicitly imported then the package is relying on the interactive search path to get the definition of lag. This is very brittle. Instead it should use the roxygen directive #' @importFrom stats lag. Imported functions will never be masked by something on the search path.

@ljanissen

This comment has been minimized.

Copy link

@ljanissen ljanissen commented Dec 21, 2017

Thank you @hadley and @lionel- . I believe the problem is in quantstrat. I traced it (painfully) in the debugger which is how I worked out it was lag(), but I didn't pull out the code to create a reproducible example. I found some r-forge logs from 2015 that seem to be related, and that they feel they addressed the issue, so now I'm doubting myself. I think putting together a clear example is a good idea.

@joshuaulrich

This comment has been minimized.

Copy link

@joshuaulrich joshuaulrich commented Dec 21, 2017

@ljanissen quantstrat never calls lag(), so it cannot be the problem. quantstrat depends on blotter, which does call lag(). But blotter imports stats::lag() to avoid this issue, so that can't be the problem either. I suspect the issue is with a function you're passing into the quantstrat workflow, but it's impossible to know for sure without a reproducible example.

@ljanissen

This comment has been minimized.

Copy link

@ljanissen ljanissen commented Dec 21, 2017

@joshuaulrich thanks for the clarification. Yes, I think a reprex is called for.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
6 participants
You can’t perform that action at this time.