Unique operator #97

Closed
hadley opened this issue Oct 25, 2013 · 10 comments

@hadley hadley commented Oct 25, 2013

A unique operator, or something similar, that would translate to DISTINCT in SQL.

@hadley hadley added this to the 0.3 milestone Mar 17, 2014
@hadley hadley removed this from the v0.2 milestone Mar 17, 2014
@romainfrancois romainfrancois commented Apr 2, 2014

Perhaps call it distinct, so that we don't have more problems with the namespace police. It should not be too much trouble to use the visitor classes we already have to find unique rows more efficiently than R's unique does.

@romainfrancois romainfrancois commented Apr 2, 2014

Alright, for data frames with more than one column, R's unique pastes all columns together (separated by "\r") and then calls duplicated on that silly character vector:

> unique.data.frame
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
}
<bytecode: 0x105a2a188>
<environment: namespace:base>
> duplicated.data.frame
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    if (length(x) != 1L)
        duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    else duplicated(x[[1L]], fromLast = fromLast, ...)
}
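A small base-R sketch makes the weakness of the paste step concrete. paste_key_duplicated() repeats the do.call("paste", c(x, sep = "\r")) line printed above, and row_list_duplicated() is a hypothetical alternative that compares each row as a list of values, so no separator is involved. Both names are illustrative only; the sketch calls the paste step directly because later R releases may not implement duplicated.data.frame exactly as printed here.

```r
# Mimics the printed source: key each row by pasting its columns with "\r".
paste_key_duplicated <- function(df) {
  keys <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  duplicated(keys)
}

# Hypothetical alternative: turn each row into a list of its values and let
# duplicated() compare those lists directly, so no separator can collide.
row_list_duplicated <- function(df) {
  rows <- do.call(Map, c(list(f = list), unname(as.list(df))))
  duplicated(rows)
}

# Two different rows whose pasted keys are both "x\r\ry".
df <- data.frame(a = c("x\r", "x"), b = c("y", "\ry"),
                 stringsAsFactors = FALSE)

paste_key_duplicated(df)  # FALSE TRUE: second row wrongly flagged
row_list_duplicated(df)   # FALSE FALSE
```

The list-based version trades the separator problem for building one small list per row, which is also why a dedicated C++ hashing approach is attractive here.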

@hadley hadley commented Apr 2, 2014

I like distinct! Another option would be to use union, which I've already made generic. We also need intersect and setdiff for data frames to cover the remaining options.

romainfrancois added a commit that referenced this issue Apr 2, 2014
@romainfrancois romainfrancois commented Apr 2, 2014

I pushed some initial code.

> distinct <- distinct_impl
> mtcars %.% select(cyl) %.% distinct()
  cyl
1   6
2   4
3   8

Should the result be sorted? At the moment, rows appear in the order in which they first occur in the original data.
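For comparison, both possible orderings can be reproduced on mtcars with base R alone. This is a minimal sketch using unique() and order(), not the distinct_impl code from the commit.

```r
# Distinct values of cyl, in order of first appearance (as distinct() does above).
first_seen <- unique(mtcars["cyl"])

# The same rows sorted by cyl, if ordered output were preferred.
sorted <- first_seen[order(first_seen$cyl), , drop = FALSE]

first_seen$cyl  # 6 4 8
sorted$cyl      # 4 6 8
```

Keeping first-appearance order falls out of a single pass that records which rows have already been seen; sorting would be a separate step on top of that.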

@romainfrancois romainfrancois commented Apr 3, 2014

Turns out we already have union, intersect, setdiff and match:
https://github.com/hadley/dplyr/blob/master/src/dplyr.cpp#L806
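As a rough illustration of what row-wise union, intersect and setdiff mean for data frames, here is a base-R sketch built on duplicated() and %in% over pasted row keys. The helpers row_keys(), df_union(), df_intersect() and df_setdiff() are hypothetical names, not the C++ functions linked above, and they assume no value contains "\r", since the pasted keys could otherwise collide.

```r
# Key each row by pasting its columns with "\r".
# Assumes no value contains "\r"; otherwise different rows can share a key.
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Distinct rows found in x or y, in order of first appearance.
df_union <- function(x, y) {
  both <- rbind(x, y)
  both[!duplicated(row_keys(both)), , drop = FALSE]
}

# Distinct rows of x that also occur in y.
df_intersect <- function(x, y) {
  kx <- row_keys(x)
  x[!duplicated(kx) & kx %in% row_keys(y), , drop = FALSE]
}

# Distinct rows of x that do not occur in y.
df_setdiff <- function(x, y) {
  kx <- row_keys(x)
  x[!duplicated(kx) & !(kx %in% row_keys(y)), , drop = FALSE]
}

a <- data.frame(id = c(1, 2, 2, 3), g = c("u", "v", "v", "w"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c(2, 4), g = c("v", "z"), stringsAsFactors = FALSE)

df_union(a, b)$id      # 1 2 3 4
df_intersect(a, b)$id  # 2
df_setdiff(a, b)$id    # 1 3
```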

@romainfrancois romainfrancois commented Apr 3, 2014

@hadley hadley self-assigned this Jul 28, 2014
@hadley hadley commented Jul 28, 2014

Looks good. I'll add the generic and methods for data frames, data tables and SQL.

@hadley hadley commented Jul 28, 2014

@romainfrancois could you add a second argument to distinct_impl: a vector of column names or indices to use? The convention is to take the first row when several rows match.
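The requested behaviour can be sketched in base R. distinct_by() is a hypothetical helper, not dplyr's API: it accepts column names or positions and keeps the first row of each group of matching rows, returning all columns.

```r
# Hypothetical helper: keep the first row for each distinct combination of
# the columns in `cols`, which may be names or positions.
distinct_by <- function(df, cols) {
  df[!duplicated(df[cols]), , drop = FALSE]
}

df <- data.frame(x = rep(1:4, each = 4), y = rep(1:8, each = 2), z = 1:16)

res <- distinct_by(df, c("x", "y"))
nrow(res)  # 8
res$z      # 1 3 5 7 9 11 13 15 (first row of each (x, y) pair)
```

To drop the non-key columns instead, the same index can be combined with a column subset: df[!duplicated(df[cols]), cols, drop = FALSE].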

@hadley hadley commented Jul 28, 2014

I've implemented the generic and basic methods. Let me know when you've added this feature and I'll add the docs & some tests.

@romainfrancois romainfrancois commented Sep 10, 2014

Done, I get this now:

> df <- data.frame(x = rep(1:4, each = 4), y = rep(1:8, each = 2), z = 1:16)
> distinct_impl(df, c("x", "y"))
  x y
1 1 1
2 1 2
3 2 3
4 2 4
5 3 5
6 3 6
7 4 7
8 4 8

@hadley hadley assigned hadley and unassigned romainfrancois Sep 11, 2014
@hadley hadley closed this in 9f17598 Sep 25, 2014
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018