distinct rearranges columns separately when used with objects of class Period #2568
Even more minimal reprex for me:
df <- tibble(
x = lubridate::hm("10:30", "10:30", "0:0"),
y = c("apple", "apple", "tomato")
)
df
distinct(df)
# Different problem with group_by
df %>% group_by(x) %>% summarise(n())
Probably related to hashing S4 objects, and unlikely to get fixed in this release (probably will get bundled into vctrs) |
Indeed it seems like the duration I was wondering, should |
Yes, that's why I labelled this issue with "bug". |
@krlmlr this seems to be a little different because it's not a two-table verb. It may be that something in the hybrid evaluator isn't preserving the S4 bit. |
Filtering and reordering should also be a responsibility of vctrs, I think, but this might be a separate problem indeed. |
@krlmlr I'm picking this one up |
Right, it definitely sounds like something I guess currently distinct only uses the For now, we could refuse to go forward in https://github.com/tidyverse/dplyr/blob/master/inst/include/dplyr/visitor_impl.h#L43 if the object bit is set, but that might be a bit strong.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tibble(
x = lubridate::hm("10:30", "10:30", "0:0"),
y = c("apple", "apple", "tomato")
)
df
#> # A tibble: 3 x 2
#> x y
#> <S4: Period> <chr>
#> 1 10H 30M 0S apple
#> 2 10H 30M 0S apple
#> 3 0S tomato
str( df$x )
#> Formal class 'Period' [package "lubridate"] with 6 slots
#> ..@ .Data : num [1:3] 0 0 0
#> ..@ year : num [1:3] 0 0 0
#> ..@ month : num [1:3] 0 0 0
#> ..@ day : num [1:3] 0 0 0
#> ..@ hour : num [1:3] 10 10 0
#> ..@ minute: num [1:3] 30 30 0
Created on 2018-03-12 by the reprex package (v0.2.0). |
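The `str()` output above suggests why hashing can go wrong for this column: `.Data` is 0 for all three rows, while the `hour` and `minute` slots differ. Here is a minimal C++ sketch of that effect. `PeriodColumn`, `distinct_data_only` and `distinct_all_slots` are hypothetical names for illustration only; they are not lubridate's or dplyr's actual representation or API.

```cpp
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a Period column: one vector per slot.
// Only the slots that differ in the reprex are modelled.
struct PeriodColumn {
    std::vector<double> data;    // the .Data slot (seconds)
    std::vector<double> hour;    // the hour slot
    std::vector<double> minute;  // the minute slot
};

// Count distinct values when only the underlying vector is inspected.
std::size_t distinct_data_only(const PeriodColumn& x) {
    std::set<double> seen(x.data.begin(), x.data.end());
    return seen.size();
}

// Count distinct values when every modelled slot takes part in the key.
std::size_t distinct_all_slots(const PeriodColumn& x) {
    std::set<std::tuple<double, double, double>> seen;
    for (std::size_t i = 0; i < x.data.size(); ++i) {
        seen.insert(std::make_tuple(x.data[i], x.hour[i], x.minute[i]));
    }
    return seen.size();
}
```

With the reprex values (`data = {0, 0, 0}`, `hour = {10, 10, 0}`, `minute = {30, 30, 0}`), the first function sees a single distinct value while the second sees two, so a key built from `.Data` alone cannot tell `10H 30M 0S` from `0S`.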
This is just a workaround until `vctrs`
This is definitely a
// [[Rcpp::export]]
SEXP distinct_impl(DataFrame df, const SymbolVector& vars, const SymbolVector& keep) {
  if (df.size() == 0)
    return df;
  // No vars means ungrouped data with keep_all = TRUE.
  if (vars.size() == 0)
    return df;
  check_valid_colnames(df);
  DataFrameVisitors visitors(df, vars);
  std::vector<int> indices;
  VisitorSetIndexSet<DataFrameVisitors> set(visitors);
  int n = df.nrows();
  for (int i = 0; i < n; i++) {
    if (set.insert(i).second) {
      indices.push_back(i);
    }
  }
  return DataFrameSubsetVisitors(df, keep).subset(indices, get_class(df));
}
it means we need two visitor classes:
So until we have a |
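The two roles in `distinct_impl` above (one visitor to decide which rows are new, another to subset the kept rows) can be sketched in plain C++ without Rcpp. This is a simplified illustration, not dplyr's code: `first_occurrences` and `subset_column` are hypothetical helpers, and the row key is assumed to be a `(number, string)` pair.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Step 1: collect the index of every row whose key has not been seen yet,
// mirroring the `set.insert(i).second` loop in distinct_impl.
std::vector<int> first_occurrences(
        const std::vector<std::pair<double, std::string>>& keys) {
    std::set<std::pair<double, std::string>> seen;
    std::vector<int> indices;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (seen.insert(keys[i]).second) {
            indices.push_back(static_cast<int>(i));
        }
    }
    return indices;
}

// Step 2: keep only the selected rows of one column.
template <typename T>
std::vector<T> subset_column(const std::vector<T>& column,
                             const std::vector<int>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (int i : indices) {
        out.push_back(column[i]);
    }
    return out;
}
```

For keys `(0, "apple")`, `(0, "apple")`, `(0, "tomato")`, step 1 yields indices 0 and 2. Step 2 then has to be applied to every vector that belongs to a column. If it only touched the underlying vector of an S4 column and left the slots alone, the slots would keep their original length; whether that is exactly what happens in dplyr is not confirmed by this thread.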
- Dedicated error message when trying to use columns of the `Interval` or `Period` classes (#2568).
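The workaround described here (refusing unsupported columns up front with a dedicated error) could look roughly like the following sketch. `ColumnInfo`, its `is_s4` flag and the error text are assumptions made for illustration; dplyr's real visitors work on R objects and its actual message may differ.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of a data frame column.
struct ColumnInfo {
    std::string name;        // column name, e.g. "x"
    std::string class_name;  // e.g. "Period" or "character"
    bool is_s4;              // stands in for the S4 object bit
};

// Refuse to go forward as soon as one column is an S4 object,
// naming the column and its class in the error.
void check_supported(const std::vector<ColumnInfo>& columns) {
    for (const ColumnInfo& col : columns) {
        if (col.is_s4) {
            throw std::invalid_argument(
                "Column `" + col.name + "` has unsupported class " +
                col.class_name);
        }
    }
}
```

Failing early trades functionality for safety: the caller gets an error instead of silently scrambled rows, and can convert the column (for example to a number of seconds) before calling the verb.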
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
This bug happens when working with lubridate + dplyr. If a data.frame contains some period object of length 0s, and also some other non-date column, distinct() actually rearranges the rows in each column, creating erroneous data.