Skip to content

Unenhanced warnings are emitted from mutate() when run in a reprex() #6005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DavisVaughan opened this issue Sep 7, 2021 · 4 comments · Fixed by #6443
Closed

Unenhanced warnings are emitted from mutate() when run in a reprex() #6005

DavisVaughan opened this issue Sep 7, 2021 · 4 comments · Fixed by #6443
Assignees
Labels
columns ↔️ Operations on columns: mutate(), select(), rename(), relocate() feature a feature request or enhancement rlang 🔬
Milestone

Comments

@DavisVaughan
Copy link
Member

When run with reprex::reprex():

library(dplyr)

foo <- function() {
  warning("foo")
  NA
}

df <- tibble(x = 1)

df <- mutate(df, x = foo())
#> Warning in foo(): foo

vs when run interactively:

df <- mutate(df, x = foo())
#> Warning message:
#> Problem with `mutate()` column `x`.
#> ℹ `x = foo()`.
#> ℹ foo 

I think this is related to the changes from #5686, in particular it is about the maybe_restart() bit here that occurs when the warning is muffled

dplyr/R/mutate.R

Lines 427 to 446 in 87d1cec

warning = function(w) {
# Check if there is an upstack calling handler that would muffle
# the warning. This avoids doing the expensive work below for a
# silenced warning (#5675).
if (check_muffled_warning(w)) {
maybe_restart("muffleWarning")
}
local_call_step(dots = dots, .index = i, .fn = "mutate")
warn(c(
cnd_bullet_header(),
i = cnd_bullet_column_info(),
i = conditionMessage(w),
i = cnd_bullet_cur_group_label(what = "warning")
))
# Cancel `w`
maybe_restart("muffleWarning")
})

I guess in the stack of reprex(), someone is muffling the warnings as it captures them to be able to print them out later on?

Since it was slow to throw the enhanced warnings, maybe this is the best we can do, but it wasn't mentioned in #5686 as a potential issue so I figured I'd bring it up in case @lionel- has any thoughts here

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Sep 7, 2021
There shouldn't be any performance issues from doing this, unlike tidyverse#5686 where the warning was rethrown per group, which could be slow.

Removing this allows the enhanced warnings to be shown in `reprex()`s, see tidyverse#6005.
@lionel-
Copy link
Member

lionel- commented Sep 8, 2021

hmmm, it might be possible to create a structured warning where the conditionMessage() method communicates with the calling handler. If the instrumenting handler is alive when the method is called, we'd call it back to return a full warning. I'm happy to work on this if you like (or assist if you want to implement).

@DavisVaughan
Copy link
Member Author

If you have the time I'd appreciate if you could take the lead on this because I only understood about half of that 😬

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Sep 8, 2021
There shouldn't be any performance issues from doing this, unlike tidyverse#5686 where the warning was rethrown per group, which could be slow.

Removing this allows the enhanced warnings to be shown in `reprex()`s, see tidyverse#6005.
@hadley hadley added feature a feature request or enhancement verbs 🏃‍♀️ labels Sep 16, 2021
@hadley
Copy link
Member

hadley commented Sep 16, 2021

@lionel- can you take a look at this when you start integrating the new rlang error stuff into dplyr?

@lionel- lionel- self-assigned this Sep 20, 2021
@romainfrancois romainfrancois added columns ↔️ Operations on columns: mutate(), select(), rename(), relocate() and removed verbs 🏃‍♀️ labels Oct 1, 2021
DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Oct 6, 2021
There shouldn't be any performance issues from doing this, unlike tidyverse#5686 where the warning was rethrown per group, which could be slow.

Removing this allows the enhanced warnings to be shown in `reprex()`s, see tidyverse#6005.
@lionel-
Copy link
Member

lionel- commented Nov 29, 2021

I implemented a working strategy to support capturing+muffling handlers without incurring message generation costs on muffling-only handlers:

modified   R/mutate.R
@@ -443,23 +443,40 @@ mutate_cols <- function(.data, dots, caller_env, error_call = caller_env()) {
       call = error_call
     )
   },
-  warning = function(w) {
+  warning = function(orig) {
+    stale <- FALSE
+    on.exit(stale <- TRUE)
+
+    header <- function(cnd, ...) {
+      if (stale) {
+        return(conditionMessage(orig))
+      }
+      local_error_context(dots = dots, .index = i, mask = mask)
+      cnd_bullet_header("computing")
+    }
+    body <- function(cnd, ...) {
+      if (stale) {
+        return(NULL)
+      }
+      local_error_context(dots = dots, .index = i, mask = mask)
+      c(
+        i = cnd_header(orig),
+        i = cnd_bullet_cur_group_label(what = "warning")
+      )
+    }
+
+    rich <- warning_cnd(header = header, body = body)
+
     # Check if there is an upstack calling handler that would muffle
     # the warning. This avoids doing the expensive work below for a
     # silenced warning (#5675).
-    if (check_muffled_warning(w)) {
+    if (check_muffled_warning(rich)) {
       maybe_restart("muffleWarning")
     }
 
-    local_error_context(dots = dots, .index = i, mask = mask)
-
-    warn(c(
-      cnd_bullet_header("computing"),
-      i = cnd_header(w),
-      i = cnd_bullet_cur_group_label(what = "warning")
-    ))
+    cnd_signal(rich)
 
-    # Cancel `w`
+    # Cancel `orig`
     maybe_restart("muffleWarning")
   })

This uses the newly implemented support for storing header and footer fields in condition objects (r-lib/rlang@5aefc4a7).

However this doesn't help with rmarkdown/knitr because they collect warnings and generate the message at the end of the chunk evaluation. At this point, the state necessary to generate the enriched message is gone.

So it seems like there is a conflict between our optimisation strategy for suppressWarnings() and the way knitr collects messages.

DavisVaughan added a commit that referenced this issue May 9, 2022
* Hack in `vec_matches()` for testing

* Add prototype of `join_by()`

* Support `join_by()` with varying conditions in all joins

* Add remote on the `vec_matches()` vctrs PR

* Add support for controlling multiple matches

* Support the `filter` option of `vec_matches()`

* Let `vec_matches()` be responsible for dropping `x` specific rows

* Use `remaining` to avoid a call to `vec_in()`

* First pass of `join_by()` documentation

* Add `new_join_by()` constructor

* Implement `keep = NULL`

This defaults to `keep = FALSE` on equi conditions, but `keep = TRUE` on non-equi conditions, as this is generally what you want. The information in columns involved in a non-equi condition rarely overlaps, so you almost never want to drop the keys of the RHS.

* Revise `join_by()` examples section

* Let `vec_matches()` drop unmatched `needles` in `semi_join()`

* Clean up function signature styling

* Implement `complete` for forcing unmatched rows to error

* Implement `unique` for enforcing uniqueness in `x` or `y`

* Abstract out `dplyr_matches()`

* Give `nest_join()` `multiple`, `complete` and `unique` arguments

* Remove `check_multiple()` since `multiple` passes straight through

* Use `dplyr_matches()` in `join_filter()`

* Rename `unique` to `check_duplicates`

* Rename `complete` to `check_unmatched`.

This is much less ambiguous, and parallels nicely to `check_duplicates`.

* Take a pass at updating the documentation

* Implement `as_join_by()` and `join_by_common()`

These replace `standardise_join_by()` and separate logic and error handling in a cleaner way

* Push `anti_join()` and `semi_join()` through `join_rows()`

* Push `nest_join()` through `join_rows()`

* Second pass through `join_by()` documentation

* Implement `na_matches = "error"`

If you want to catch missing keys in `x` as an issue, this is the only way to do it. Missing keys in `y` can be caught by setting `check_unmatched = "y"`, since they will be treated as unmatched values.

* NEWS bullet

* Don't add `()` to binary operators

* Add `join_by()` tests

* Test join column generation with non-equi conditions

* Catch `multiple` related error and warning

Copying `mutate_cols()` for the catching of warnings

* Add tests to `join_rows()`

* Test that empty `join_by()` generates a cross join

* No longer need `is_join_by()`

* Add tests to `test-join.r`

* Add a note about the column name restriction of `join_by()`

* Only `setdiff()` away the equi key columns

Non-equi key columns should be considered in the duplicate check when `keep = NULL`, since both sides are kept

* Report the names of the duplicated columns, not the positions

The positions might be different from what the user supplied, since we rearrange them beforehand when we drop any duplicated non-equi column names

* Allow explicit referencing with `x$` and `y$` and add join helpers

There are three helpers: `between()`, `within()`, and `overlaps()`. If people need to tweak them at all, then they should change to constructing them from the binary conditions directly.

* Remove `na_matches = "error"` support

We now think this will be part of `enforce(key_validate(a, b, c))`, which will check for missing values in the keys and for uniqueness.

Additionally, this again limits `na_matches` to being strictly about the join process. `"error"` could be validated independently from the join, but the other options can't be.

* Remove `check_duplicates`

This will also be handled by something like `enforce(key_validate())`.

Again, note that this is a check that can be handled independently of the join process, so removing it seems logical.

* Replace `check_unmatched` with `unmatched`

Further analysis shows that `check_unmatched` is really only useful in a few scenarios. In particular, it is only helpful to avoid the surprising case of dropping rows, which can only occur in a few of the join types.

* Always throw enhanced warning from `dplyr_matches()`

There shouldn't be any performance issues from doing this, unlike #5686 where the warning was rethrown per group, which could be slow.

Removing this allows the enhanced warnings to be shown in `reprex()`s, see #6005.

* Change the default of `multiple` to `NULL`

This uses `"warning"` for equi and rolling joins, and `"all"` for cross joins and when there are any non equi join conditions present

* Class join specific errors and warnings

* Update tests affected by change to `multiple` default

* Use `.named = NULL` now that r-lib/rlang#1223 is fixed

* Update snapshots for dev vctrs

* Add `join_by()` to `_pkgdown.yml`

* Adjust position of the `multiple` NEWS bullet

* Add a non-equi join example to the main join page

* Expand on `keep = NULL` behavior

* Mention each type of join that is possible through `join_by()`

* Various `join_by()` doc updates from code review

* Mention data.table in NEWS bullet

* Tweak `join_by()` named expression error

* Use a switch statement to clean up control flow

* Tweak error message

* Further explain what "supplying a single name" does

* Use `needles_arg` and `haystack_arg` to improve error message generation

* Streamline if/else conditions in `standardise_` helpers

* Mention the multiple matches warning in the join description

* Various doc changes to main join help page based on feedback

* Move dots of `nest_join()` to right after `name = NULL`

So we can at least force the new arguments to be named

* Add `na_matches` to `nest_join()`

* NEWS bullet for `na_matches` in `nest_join()`

* Tweak `keep = NULL` documentation

* Adapt to removal of `missing = "propagate"` in `vec_matches()`

* Update snapshot tests for latest rlang

* Update for latest vctrs. `missing` -> `incomplete`

* Update to new vctrs PR

* Mix in a partial evaluation approach to match arguments by name

* Remove `max()` and `min()` in favor of `preceding()` and `following()`

* Use new faster `multiple = "any"` argument for semi/anti join

This is generally faster than first/last, sometimes by a lot (i.e. in the case where there would generally be a ton of matches)

* Update remotes and version requirements for vctrs and rlang

* Update join related snapshot tests

* Update `incomplete` from `"match"` to `"compare"`

And add a test for this behavior

* Handle cross joins on the dplyr side

`condition = NULL` is no longer allowed on the vctrs side, because we can easily emulate a cross join by generating two vectors of `1L`s to match with, and this makes the vctrs implementation much simpler.

* Use latest CRAN versions of rlang and vctrs

* Use uppercase `J`

* Be overly defensive against typos

* `by` is a required argument for `join_filter()`

* Pass the error call through `check_na_matches()`

* Pass the error call through `check_unmatched()`

* Defensively pass the `error_call` through more places in `join_mutate()`

I don't think these can error, as the common type is handled earlier, but it doesn't hurt and is good to be consistent

* Pass the error call through `join_by()`

* Pass the error call through `join_by_common()`

* Officially add support for `multiple = "any"`

* Make the `join_by()` helper documentation more user friendly

* Optimize `join_by()` docs for pkgdown and fix some typos

* Add a comment about the signatures of the `binding_*()` functions

* Upload revdep results

* Bump to 99 dev version so revdeps can depend on it
@hadley hadley added this to the 1.1.0 milestone Aug 18, 2022
lionel- added a commit that referenced this issue Sep 1, 2022
lionel- added a commit that referenced this issue Sep 1, 2022
lionel- added a commit that referenced this issue Sep 8, 2022
lionel- added a commit that referenced this issue Sep 20, 2022
lionel- added a commit that referenced this issue Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
columns ↔️ Operations on columns: mutate(), select(), rename(), relocate() feature a feature request or enhancement rlang 🔬
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants