Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dangerous silent failures due to no argument matching #413

Closed
alexpghayes opened this issue Jun 30, 2018 · 8 comments · Fixed by #1104
Closed

Dangerous silent failures due to no argument matching #413

alexpghayes opened this issue Jun 30, 2018 · 8 comments · Fixed by #1104

Comments

@alexpghayes
Copy link
Collaborator

One example: it can look like you're augmenting newdata:

augment.Mclust(Mclust_obj, newdata = df)

while you're just getting the augmented original data because augment.Mclust doesn't actually take a newdata argument.

validate_augment_input is going to deal with this in the augment case but this is generally an issue with generics where you think you've passed an argument but it just gets absorbed in ... and ignored. In general some sort of match.arg should fix this. This is especially likely to happen in packages like broom that wrap lots of things and send stuff through preprocessing helpers like augment_columns or process_lm.

There are probably a lot of places in broom where we need to check that this isn't happening.

Argument matching would be a good thing to include a Best practices for developing modelling packages document. cc @topepo

@dgrtwo
Copy link
Collaborator

dgrtwo commented Jul 2, 2018

I think a simple solution would be to have a utility function that checks for common mistakes:

check_dots <- function(...) {
  dots <- list(...)
  if ("newdata" %in% names(dots)) {
    stop("This method does not take a newdata argument")
  }
}

f <- function(x, ...) {
  check_dots(...)
  x
}

f(3, newdata = 4)

@alexpghayes
Copy link
Collaborator Author

The big issue is when you have a function that takes ... as argument but doesn't do anything with .... My guess is this happens most often to match the signature of an S3 generic. Then you can misspell something and accidentally pass the argument to ... and have it ignored. You don't know how the user is going to mispecify the argument though.

Suppose you have some function g that's the last function that will be called. Then you can just check if anything got passed to ...:

check_dots <- function(...) {
  # use match.call to avoid evaluating arguments in ...
  passed <- names(as.list(match.call())[-1])
  if (length(passed) > 0)
    stop("Unused arguments: ", paste(passed, collapse = ", "))
}

# suppose this takes ... to align with a generic signature
g <- function(a, b, round_result = FALSE, ...) {
  check_dots(...)
  res <- a * b
  if (round_result) round(res) else res
}

g(1, 2, some_misspelled_argument = 4)

Often, though, you might pass ... to a function that someone else wrote that doesn't do any checking. If g is some function from base R or another package, it could easily look like:

g <- function(a, b, round_result = FALSE, ...) {
  res <- a * b
  if (round_result) round(res) else res
}

g(1, 2, some_misspelled_argument = 4)  # succeeds

This is especially worrisome in a modelling context. Imagine something like misspelling conf.lvel = 0.99 and having this get ignored, and then doing inference. You could easily get a tibble back with exactly the structure you expect, but the numbers will just be wrong. Unless you have killer intuition or find the code error later, you'll never know.

In broom we often use ... to pass arguments to modelling utilities that other people wrote, along the lines of:

f <- function(x = 3, y = 4, ...) {
  res <- g(x, y, ...)
  res
}

and g doesn't have check_dots(...). One option would be to manually look at everything ... gets passed to, make a list of acceptable arguments and check our call against this list, but this is a huge amount of work. I don't know if we ever passed ... to functions outside of broom in a way that potentially allows important arguments to disappear, but I wouldn't be terribly surprised if we do in one or two places.

@alexpghayes
Copy link
Collaborator Author

This turns out to be a much deeper problem than I had any clue about, but luckily the tidyverse crew has https://github.com/hadley/ellipsis. To use this package we just add an ellipsis::check_dots_used() call to the model generics. Will begin experimenting with this once some other higher priority items get dealt with.

@github-actions
Copy link

This issue has been automatically closed due to inactivity.

@github-actions
Copy link

github-actions bot commented Jul 2, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 2, 2021
@alexpghayes alexpghayes reopened this Dec 28, 2021
@alexpghayes
Copy link
Collaborator Author

We should keep this open, but really the solution here is to use ellipsis/rlang::check_dots_used() in the generic definitions in generics. We tried that at one point but it caused a bunch of revdep breakages that we didn't have time to fix on the timeline generics needed, so we left it out and never revisited it.

@simonpcouch

This comment was marked as off-topic.

@simonpcouch
Copy link
Collaborator

Some more recent thoughts on this:

  • Agreed that we ought to check for unused dots!
  • Some tidiers do make use of the dots, so unfortunately we can't just drop the dots check into the generics. Sounds like that may also result in revdep breakages from generics itself.
  • Some other packages in the tidymodels just warn rather than erroring in this situation, which would be easier on reverse dependencies.

:)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants