dplyr::distinct appears to consider empty rows different #2954

Closed
JohnMount opened this issue Jul 10, 2017 · 5 comments

@JohnMount JohnMount commented Jul 10, 2017

dplyr::distinct() appears to consider empty rows different. Notice that in the example below dplyr::distinct() returns a 2-row data frame in which both rows are identical. This is a corner case where there are no columns, but I think dplyr::distinct() should not return more than 1 row here. Notice that adding a column in the example then decreases the number of rows considered distinct.

suppressPackageStartupMessages(library("dplyr"))
packageVersion("dplyr")
#> [1] '0.7.1.9000'

d <- data.frame(x= c(1, 1))

d0 <- select(d, one_of(character(0)))
dD <- distinct(d0)
print(dD)
#> data frame with 0 columns and 2 rows

d2 <- mutate(dD, newCol = 1)
print(d2)
#>   newCol
#> 1      1
#> 2      1

distinct(d2)
#>   newCol
#> 1      1

@krlmlr krlmlr commented Jul 12, 2017

Thanks. Honestly, I'd rather throw an error here: distinct() on a data frame without columns smells like division by zero to me. @hadley?

@JohnMount JohnMount commented Jul 12, 2017

Erroring out is not a bad idea. I consider this enough of a dangerous corner case that I have already re-coded my application to check for this situation and not call distinct() when it occurs.
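
For readers who hit the same corner case, a check of this kind could look roughly like the sketch below. It is only an illustration: `distinct_guarded()` is a hypothetical helper name, not part of dplyr and not the code from the application mentioned above, and it assumes that all rows of a zero-column data frame should count as identical.

```r
library(dplyr)

# Hypothetical wrapper (not part of dplyr): treat all rows of a
# zero-column data frame as identical and keep at most one of them.
distinct_guarded <- function(df, ...) {
  if (ncol(df) == 0L) {
    # head() keeps the first row if there is one, and zero rows otherwise.
    return(head(df, 1L))
  }
  distinct(df, ...)
}

d  <- data.frame(x = c(1, 1))
d0 <- d[, character(0), drop = FALSE]  # 0 columns, 2 rows

nrow(distinct_guarded(d0))  # zero-column input goes through the guard
nrow(distinct_guarded(d))   # ordinary input is passed on to distinct()
```

The guard keeps the call site unchanged apart from the function name, so it can be dropped once the installed dplyr version handles the zero-column case itself.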

@hadley hadley commented Aug 23, 2017

I think if there are no columns, we could just special case distinct() to return the first row.

@romainfrancois romainfrancois commented Apr 11, 2018

Just returning 1 row feels less surprising than an error. I'll do that.

@krlmlr krlmlr closed this in 366585b May 2, 2018
krlmlr added a commit that referenced this issue May 2, 2018
- `distinct()` respects the order of the variables provided (#3195, @foo-bar-baz-qux).

- Special case when the input data to `distinct()` has 0 rows and 0 columns (#2954).

@lock lock bot commented Oct 29, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Oct 29, 2018