Can't use 'recover' with dplyr 1.0.0. How to interactively debug? #5308

@MilesMcBain

I've noticed that options(error = recover) is now seemingly unusable with dplyr.

For example this code

library(tidyverse)

mtc <- mtcars

set.seed(1111)
rand_na <- function(vec) {
  index_na  <- rdunif(15, a = 1, b = length(vec))
  vec[index_na] <- NA
  vec
}

mtc$cyl <- rand_na(mtcars$cyl)

mtc %>%
  group_by(carb) %>%
  nest() %>%
  mutate(model = map(data, ~lm(mpg ~ cyl + hp + wt,
                               data = .x)))

Produces this stack of frames:

Enter a frame number, or 0 to exit   

 1: mtc %>% group_by(carb) %>% nest() %>% mutate(model = map(data, ~lm(mpg ~ cy
 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 3: eval(quote(`_fseq`(`_lhs`)), env, env)
 4: eval(quote(`_fseq`(`_lhs`)), env, env)
 5: `_fseq`(`_lhs`)
 6: freduce(value, `_function_list`)
 7: withVisible(function_list[[k]](value))
 8: function_list[[k]](value)
 9: mutate(., model = map(data, ~lm(mpg ~ cyl + hp + wt, data = .x)))
10: mutate.data.frame(., model = map(data, ~lm(mpg ~ cyl + hp + wt, data = .x))
11: mutate_cols(.data, ...)
12: tryCatch({
    for (i in seq_along(dots)) {
        not_named <- (is.null(d
13: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatchOne(expr, names, parentenv, handlers[[1]])
15: value[[3]](cond)
16: stop_dplyr(i, dots, fn = "mutate", problem = conditionMessage(e), parent = 
17: abort(bullets, class = "dplyr_error", error_name = error_name, error_expres
18: signal_abort(cnd)

And I haven't been able to actually find the call to lm buried in there, since I seem to hit some C code in mutate_cols that breaks the chain.

Just shy of a year ago I made a video of myself debugging this error with recover. So we can look at what the stack of frames was then:

[image: screenshot of the recover frame stack from the video]

And we can see, I had access to lm and lm.fit, where the error is occurring.

IMHO this is a real shame as recover was a really efficient technique for jumping to the source of an error and tinkering with offending code to understand why it was breaking. In the latest dplyr we do get a handy error message:

Error: Problem with `mutate()` input `model`.
x 0 (non-NA) cases
i Input `model` is `map(data, ~lm(mpg ~ cyl + hp + wt, data = .x))`.
i The error occured in group 6: carb = 8.
Run `rlang::last_error()` to see where the error occurred.

This points me to the group that caused the problem, but I still have to write code to inspect that group and figure out what is going on, e.g. mtc %>% filter(carb == 8). In this case that might be enough, but in more complex cases, like the second error I debug in the video and the case I hit today, it won't be.
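A minimal sketch of that manual inspection, using base R only. One assumption is stated loudly: instead of re-running rand_na(), it sets cyl to NA directly for the single carb == 8 row, which should be enough to reproduce the same lm() failure outside the pipeline.

```r
# Assumption: cyl is set to NA by hand for the carb == 8 row, standing in
# for whatever rand_na() produced in the original session.
mtc <- mtcars
mtc$cyl[mtc$carb == 8] <- NA

# Pull out the group named in the dplyr error message.
bad_group <- mtc[mtc$carb == 8, ]

# Re-run the model on that group alone and capture the error message
# instead of letting it abort the script.
result <- tryCatch(
  lm(mpg ~ cyl + hp + wt, data = bad_group),
  error = function(e) conditionMessage(e)
)
result
```

This gets at the data for one group, but it still means rebuilding the failing call by hand, which is the overhead described above.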

I'm looking at writing a lot of code, overloading other people's functions, and messing around with environments to get to a situation where I can debug an error right at the point it fails.

I had a look at rlang::last_error and it seems like it has almost all the pieces I'd need to re-run the offending function with debugonce. That got me thinking that maybe rlang could bring this facility back with something like rlang::debug_last_error. That might be jumping the gun though. Maybe I am just stuck in the old way of doing things.
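To illustrate the kind of re-run meant here, a rough sketch done by hand rather than through rlang. The function name fit_model is hypothetical, and the sketch assumes the model call has been pulled out into a named function so that debugonce() has something to flag.

```r
# Hypothetical named wrapper around the model call from the pipeline.
fit_model <- function(d) lm(mpg ~ cyl + hp + wt, data = d)

# Assumption: the failing group is recreated by hand (cyl set to NA for
# the carb == 8 row) rather than taken from the original session.
mtc <- mtcars
mtc$cyl[mtc$carb == 8] <- NA
bad_group <- mtc[mtc$carb == 8, ]

# Only flag and call the function in an interactive session, because the
# flagged call drops into the browser.
if (interactive()) {
  debugonce(fit_model)   # the next call to fit_model() opens the browser
  fit_model(bad_group)   # step through it with n / s / c
}
```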

Is there another pathway to get to interactive debugging with dplyr, like what I was doing with recover?
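One possible pathway, sketched under assumptions rather than offered as a dplyr or rlang feature: wrap the mapped function in withCallingHandlers() and call recover() from the handler. A calling handler runs before the stack unwinds, so the frames for lm and anything it called should still be live at that point, even though an outer tryCatch later takes over the error. The helper name debug_on_error is hypothetical.

```r
# Hypothetical helper, not part of dplyr, purrr, or rlang.
debug_on_error <- function(f) {
  force(f)
  function(...) {
    withCallingHandlers(
      f(...),
      error = function(e) {
        # The stack has not been unwound yet, so recover() can still list
        # the frames of f() and the functions it called.
        if (interactive()) utils::recover()
        # Returning normally lets the error continue to outer handlers.
      }
    )
  }
}

fit_model <- function(d) lm(mpg ~ cyl + hp + wt, data = d)
fit_debug <- debug_on_error(fit_model)

# Intended use inside the pipeline (not run here):
#   mutate(model = map(data, fit_debug))
```

Outside an interactive session the wrapper does nothing extra and the error propagates as before.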

Labels: bug (an unexpected problem or unintended behavior), wip (work in progress)
