union the exactly same two copies gives different number of rows #3238
I often think that data.frame with rownames does not play well with
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# with rownames
intersect(iris, iris) %>% nrow()
#> [1] 149
union(iris, iris) %>% nrow()
#> [1] 149
setequal(union(iris,iris),iris)
#> FALSE: Different number of rows
# Without rownames
iris <- tibble::rownames_to_column(iris)
intersect(iris, iris) %>% nrow()
#> [1] 150
union(iris, iris) %>% nrow()
#> [1] 150
setequal(union(iris,iris),iris)
#> TRUE
Created on 2017-12-02 by the reprex package (v0.1.1.9000).
I think it is a good practice to not have rownames when working with
However, I am not sure why this behaviour with
Hope it helps. |
I think some of this is that one of |
Thank you all for the insights. How about issuing a warning message when the input has rownames? |
Just a follow-up:
Seems rownames() <- NULL would not have an effect here |
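The follow-up above can be checked with a small sketch. It assumes a fresh R session with dplyr and tibble installed and the unmodified built-in `iris`; the variable names (`df`, `with_id`, and the `n_*` counters) are illustrative only. Resetting row names leaves the column values untouched, so the duplicated row is still there, whereas moving the row names into a column gives every row a distinct value.

```r
library(dplyr)

# Assumption: start from the unmodified built-in dataset.
df <- datasets::iris

# Resetting row names only restores the default 1..n labels;
# the values stored in the columns do not change.
rownames(df) <- NULL
n_dupes_reset <- sum(duplicated(df))   # 1: the duplicated row is still present
n_union_reset <- nrow(union(df, df))   # 149, as reported in this thread

# Moving the row names into a real column makes every row distinct.
with_id <- tibble::rownames_to_column(datasets::iris)
n_dupes_id <- sum(duplicated(with_id))        # 0
n_union_id <- nrow(union(with_id, with_id))   # 150, as reported in this thread
```

This suggests the row count depends on the column values rather than on the row names themselves.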
There is a duplicate row and
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris %>% group_by_at(., colnames(.)) %>% summarize(n = n()) %>% filter(n > 1)
#> # A tibble: 1 x 6
#> # Groups:   Sepal.Length, Sepal.Width, Petal.Length, Petal.Width [1]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species       n
#>          <dbl>       <dbl>        <dbl>       <dbl> <fctr>    <int>
#> 1          5.8         2.7          5.1         1.9 virginica     2
Please try
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
union_all(iris, iris) %>% nrow()
#> [1] 300 |
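The counts quoted in this thread can be put side by side in one sketch. It assumes a fresh R session with dplyr installed and the unmodified built-in `iris`; the variable names are illustrative only. The idea is that `union()` drops duplicate rows, much like `distinct()`, while `union_all()` keeps every row.

```r
library(dplyr)

# Assumption: the unmodified built-in dataset.
df <- datasets::iris

n_rows      <- nrow(df)                 # 150
n_distinct  <- nrow(distinct(df))       # 149: one row is an exact duplicate
n_union     <- nrow(union(df, df))      # 149: union() removes duplicates
n_union_all <- nrow(union_all(df, df))  # 300: union_all() keeps them

# Show both copies of the duplicated observation (base R only).
dupes <- df[duplicated(df) | duplicated(df, fromLast = TRUE), ]
print(dupes)
```

Comparing `n_union` with `n_distinct` rather than with `n_rows` is one way to see why `setequal(union(iris, iris), iris)` reported a different number of rows.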
Thanks. Looks like |
I can help with this, what is the best way to proceed? Add the clarification at the beginning of the docs, add one example, both? |
Clarification and example would be great. Does this affect |
@edublancas, are you still interested in submitting a PR for this? If so, awesome, and, if not, that's totally fine, too. If you wouldn't mind letting us know when you get a chance (or if you have any questions), that'd be great so I know whether to add to my TODO list! 👍 |
@batpigandme Yes, I can work on this. I can probably find some time to do it during this week. Will post updates here. |
Submitted PR (#3474), let me know what you think. |
* Improved documentation for set operations (#3238, @edublancas).
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
tested with latest tidyverse
setequal(union(iris,iris),iris)
union(iris,iris) (149) has different number of rows from iris (150)! How can this be?