Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter filtered out my filter! #1480

Closed
lianos opened this issue Oct 27, 2015 · 12 comments
Closed

filter filtered out my filter! #1480

lianos opened this issue Oct 27, 2015 · 12 comments

Comments

@lianos
Copy link

lianos commented Oct 27, 2015

The general problem is that dplyr's exported filter function tramples over stats::filter. This is all fine and good in the context of a workspace/interactive session, because we are at least told as much:

R> library(dplyr)
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

But this also tramples calls to filter that are invoked within other packages that were previously loaded.

For instance, the limma bioconductor package, calls filter in its tricubeMovingAverage function, but dplyr will hose it:

R> library(dplyr)
R> library(limma)
R> tricubeMovingAverage(rnorm(100))
Error in UseMethod("filter_") :
  no applicable method for 'filter_' applied to an object of class "c('double', 'numeric')"

One way to get around this issue is to ask the limma authors (and who knows what other packages call filter assuming they will get stats::filter) to explicitly call stats::filter, or import stats in their NAMESPACE, however I can also imagine that a package author might be surprised to have to explicitly import functions from base, stats, etc. that they would (should(?)) reasonably expect to be available to them without special consideration.

So I'm left wondering what other options are available here? Unfortunately there is no alternate British spelling of the word filter that I'm aware of, which would have made this a bit easier :-)

Would promoting filter to an S3 generic in dplyr's NAMESPACE be one way to get around this problem? I could then imagine having dplyr::filter dispatch down to stats::filter if it's not called on a tbl_df (or data.frame, or whatever).

Or, perhaps, define a filter_.numeric (or filter_.default(?)) function which sends all the right arguments over to stats::filter

@hadley
Copy link
Member

hadley commented Oct 27, 2015

That is what R packages are supposed to do, and R CMD check now warns if you call a function from stats without being explicit about it.

@hadley hadley closed this as completed Oct 27, 2015
@lianos
Copy link
Author

lianos commented Oct 27, 2015

I realized that R CMD check now fires off warnings about this shortly after posting, actually, but would you accept a patch that enables dplyr::filter to play nice with stats::filter, even when it tramples it in the working environment (ie. outside of the package context)? Seems like we'd have the best of both worlds where filter works with dplyr as expected, and doesn't stamp out stats::filter, no?

This would also buy time for other packages to clean up their internals.

If you would entertain such a patch, which way would you like to see that go? Defining a filter_.numeric function, perhaps?

@hadley
Copy link
Member

hadley commented Oct 27, 2015

Making a method to the generic won't work, and is likely to break a lot of existing code. I think in this case it's pretty clear that the problem is with limma.

@lianos
Copy link
Author

lianos commented Oct 27, 2015

Although I used limma as an example, I wasn't necessarily talking about limma specifically. As far as limma in particular is concerned, it has already been fixed in SVN.

I was raising this issue in more general terms. I would have thought that it would be desirable in these cases (in general) to allow for "safe passage" of core R functions, if at all possible.

Forgive me if I'm missing the obvious, but I don't follow how this would break existing code. As it stands now, when dplyr is loaded, a call to filter that uses a value that normally works with stats::filter (ie. a numeric, ts, or whatever) now breaks anyway because dplyr::filter can't find the relevant function to dispatch to.

Including and exporting the following (very ugly and can surely be written more elegantly) function seems to do the trick:

filter_.default <- function(.data, ..., .dots) {
  fargs <- formals(stats::filter)
  dnames <- names(.dots)
  pos.args <- lapply(.dots[nchar(dnames) == 0], identity)
  named.args <- lapply(.dots[dnames %in% names(fargs)], identity)
  call.args <- lapply(c(pos.args, named.args), lazyeval::lazy_eval)
  do.call(stats::filter, c(list(.data), call.args))
}

Now the examples from stats::filter work, as well as dplyr::filter.

Yes, package authors will have to eventually clean up their packages and fix up their NAMESPACEs to comply to this new standard (which, for the record, I totally think is a good thing to do). This is good programing practice, and it's best encourage people to do it.

In the meantime, why not allow even more code to work (not less). You also get the added bonus of allowing someone to work in an interactive session with dplyr, and filter will work as expected in all of the same scenarios that worked before dplyr was loaded.

@hadley
Copy link
Member

hadley commented Oct 27, 2015

The problem is that it also introduces potentially new bugs, and I'd need to spend some time thinking about your proposed implementation of filter_.default to figure out what the new failure modes are. Currently I know exactly what the problem is and how to tell people how to fix it, introducing this new code is likely to introduce unknown problems which I don't know how to fix.

The point seems moot as to submit a package to CRAN you know need to fix the namespace problem, so it's likely this problem will have disappeared in 6 months without me having to do anything :)

@mdsumner
Copy link
Contributor

Yet

x <- 1:100
## all is well
filter(x, rep(1, 3))
#Time Series:
#Start = 1 
#End = 100 
# ...
# ...


library(dplyr)
filter(x, rep(1, 3))

#Error in UseMethod("filter_") : 
#  no applicable method for 'filter_' applied to an object of class "c('integer', 'numeric')"

Isn't this a case where dplyr should define the S3 generic for filter and provide filter.default as stats::filter, as per sets.r?

@lianos
Copy link
Author

lianos commented Feb 20, 2016

Yes, that was my point exactly.

@mdsumner
Copy link
Contributor

It goes deeper, dplyr clashes with raster S4 set funs, and filter/select.
Dplyr masks those, and raster breaks dplyr intersect. Adding S3 methods to
raster fixes some but not all. More soon

On Sat, 20 Feb 2016 1:19 pm Steve Lianoglou notifications@github.com
wrote:

Yes, that was my point exactly.


Reply to this email directly or view it on GitHub
#1480 (comment).

Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

@mikedolanfliss
Copy link

I believe I'm having similar issues with dplyr and raster not playing nice together, depending, I believe, on load order. I'm following the conversation a little bit... not sure exactly what to do in a pinch. Either "no applicable method for 'extract_'" or "unable to find an inherited method for function" ‘select’ for signature ‘"tbl_df"’" - I think depending on load.

@hadley
Copy link
Member

hadley commented Jul 14, 2016

@mikedolanfliss I'd recommend creating a minimal reprex and sending to one of the R spatial mailing lists

@mikedolanfliss
Copy link

Thanks Hadley, I'll look into that. I've seen some SO remarks suggesting temporarily detach()-ing the packages to try to minimize the conflicts in the namespace, or using unloadNamespace(). Horrible idea? :)

@hadley
Copy link
Member

hadley commented Jul 17, 2016

Detaching is generally a bad idea because it typically only does 95% of the job and you get a set of weirder conflicts afterwards

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants