Make pivot_wider error messages more useful, like spread errors #1113

Closed
hrecht opened this issue Mar 31, 2021 · 5 comments · Fixed by #1267
Labels
feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")

Comments

@hrecht

hrecht commented Mar 31, 2021

Hello, I have found that the pivot_ family error messages are much less helpful than spread and gather error messages. For example, here's an error message from the exact same dataset that had a few rows with repeated values, using pivot_wider vs spread.

The spread error message tells me exactly where the issue is. The pivot_wider one is very difficult to understand and doesn't point me to the error. When I added values_fn = length to the pivot call I got another error: Error: 1 components of `...` were not used. We detected these problematic arguments: * `vaues_fn`
Could you please consider incorporating some of the old context?

pivot_wider:

Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

spread:

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 6 rows:
* 427, 428
* 540, 541
* 800, 802

(edited to add text for error messages)

@hadley
Member

hadley commented Apr 22, 2021

From my experience, most people found that error message cryptic; I don't think we want to recreate it.

@hrecht
Author

hrecht commented Apr 22, 2021

Understood. Could the new messages be reworded then, ideally so they point out where the issues are? I find the current message very confusing. I've tried this suggestion ("Use values_fn = length to identify where the duplicates arise"), but that always gives another error. Perhaps it needs more description of how to use that argument.

@hadley
Member

hadley commented Apr 22, 2021

Can you please provide a reprex?

@hrecht
Author

hrecht commented Apr 22, 2021

Here's an example. I couldn't quickly reproduce the length error on this dataset. But even when values_fn = length works, the output doesn't really show where the duplicate is. I'm not sure how the resulting tibble helps identify duplicates, especially if the dataset had thousands of rows.

library(tidyverse)

# Add a duplicate row
row_dupe <- fish_encounters %>% filter(row_number() == 20)
fish_dupe <- bind_rows(fish_encounters, row_dupe)

temp <- fish_dupe %>%
    pivot_wider(names_from = station, values_from = seen)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates

# Trying to follow suggestion in error message
fish_dupe %>%
    pivot_wider(names_from = station, values_from = seen, values_fn = length)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     2     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#> 12 4857        1     1      1     1       1     1     1     1     1    NA    NA
#> 13 4858        1     1      1     1       1     1     1     1     1     1     1
#> 14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#> 15 4861        1     1      1     1       1     1     1     1     1     1     1
#> 16 4862        1     1      1     1       1     1     1     1     1    NA    NA
#> 17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA

Created on 2021-04-22 by the reprex package (v2.0.0)
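
One possible way to pull the offending cells out of that wide count table, rather than scanning it by eye, is to pivot it back to long form and keep only counts above 1. This is just a sketch on the same fish_dupe data; the n column and the dupes object are illustrative names, not anything tidyr prescribes.

```r
library(dplyr)
library(tidyr)

# Same setup as the reprex: duplicate row 20 of fish_encounters
row_dupe <- fish_encounters %>% filter(row_number() == 20)
fish_dupe <- bind_rows(fish_encounters, row_dupe)

dupes <- fish_dupe %>%
  # Count how many values land in each fish/station cell
  pivot_wider(names_from = station, values_from = seen, values_fn = length) %>%
  # Return to long form so each cell count is a row again
  pivot_longer(-fish, names_to = "station", values_to = "n") %>%
  # Cells with more than one value are the duplicates; NA counts are dropped
  filter(n > 1)

dupes
#> expect a single row: fish 4843, station BCW2, n = 2
```

On a dataset with thousands of rows this would list only the key combinations that collide, though it does not report the original row positions.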

@hadley
Member

hadley commented Apr 22, 2021

Hmmmm, you're looking for the cell with a 2 in it, but obviously that might be hard to find. What I would do in this case is:

library(tidyverse)

row_dupe <- fish_encounters %>% filter(row_number() == 20)
fish_dupe <- bind_rows(fish_encounters, row_dupe)

fish_dupe %>%
  count(fish, station, seen) %>% 
  filter(n > 1)
#> # A tibble: 1 x 4
#>   fish  station  seen     n
#>   <fct> <fct>   <int> <int>
#> 1 4843  BCW2        1     2

Created on 2021-04-22 by the reprex package (v2.0.0)

So maybe the error message could suggest that more directly.
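
If row positions like the ones in the old spread message are wanted, a variation on the same idea might look like the following. It is a sketch only, assuming the key columns are fish and station; the row column and dupe_rows object are hypothetical names introduced for illustration.

```r
library(dplyr)
library(tidyr)  # provides the fish_encounters dataset

# Same setup as above: duplicate row 20 of fish_encounters
row_dupe <- fish_encounters %>% filter(row_number() == 20)
fish_dupe <- bind_rows(fish_encounters, row_dupe)

dupe_rows <- fish_dupe %>%
  # Record each row's position before grouping
  mutate(row = row_number()) %>%
  # Group by the columns that should uniquely identify a cell
  group_by(fish, station) %>%
  # Keep only the groups that contain more than one row
  filter(n() > 1) %>%
  ungroup()

dupe_rows
```

Here the key columns leave out seen, so rows that share a fish and station but differ in value would also be flagged.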

hadley added the feature and pivoting ♻️ labels on Aug 23, 2021