-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make mutate use collecter h #2487
Conversation
We want to support difftime in bind_rows and combine. We are already supporting mutate and I'm preparing a PR to make mutate use Collecter.h as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into these issues! I have a few comments and questions.
@@ -457,3 +457,10 @@ test_that("bind_rows rejects data frame columns (#2015)", { | |||
fixed = TRUE | |||
) | |||
}) | |||
|
|||
test_that("bind_rows accepts difftime objects", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test case for "hms"
:
df1 <- data.frame(x = hms::hms(hours = 1))
df2 <- data.frame(x = as.difftime(1, units = "mins"))
res <- bind_rows(df1, df2)
expect_equal(res$x, hms::hms(hours = 1, minutes = 1))
inst/include/dplyr/Collecter.h
Outdated
} | ||
|
||
inline SEXP get() { | ||
set_class(Parent::data, "difftime"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the "hms"
class that inherits from "difftime"
, I guess we'd like to return "hms"
objects if the first object is of that class.
double factor_data = time_conversion_factor(units); | ||
if (factor_data != 1.0) { | ||
for (int i=0; i<Parent::data.size(); i++) { | ||
Parent::data[i] = factor_data*Parent::data[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to copy the data before in-place modification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if a copy has been made, I'm missing it ;-)
inst/include/dplyr/Collecter.h
Outdated
units = wrap("secs"); | ||
double factor_v = time_conversion_factor(v_units); | ||
NumericVector v_sec(v); | ||
double* v_sec_ptr = v_sec.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unsafe, better use an iterator and advance that (in addition to i
).
inst/include/dplyr/Collecter.h
Outdated
// then collect the data: | ||
Parent::collect(index, v); | ||
} else { | ||
// We already units, is the new vector with the same units? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check grammar.
inst/include/dplyr/Collecter.h
Outdated
} | ||
if (units.isNULL()) { | ||
// if current unit is NULL, grab the new one | ||
units = v_units; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply reject "difftime"
objects that don't have a "units"
attribute?
inst/include/dplyr/Collecter.h
Outdated
@@ -25,12 +25,13 @@ namespace dplyr { | |||
static bool is_class_known(SEXP x) { | |||
/* C++11 (need initializer lists) | |||
static std::set<std::string> known_classes { | |||
"POSIXct", "factor", "Date", "AsIs", "integer64", "table" | |||
"difftime", "POSIXct", "factor", "Date", "AsIs", "integer64", "table" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to carry around code in comments. The old-style code also works in C++11, we'll rewrite that when we are on C++11 and when we touch the code again.
inst/include/dplyr/Gatherer.h
Outdated
if (first_non_na < gdf.ngroups()) | ||
grab(first, indices); | ||
copy_most_attributes(data, first); | ||
class Gatherer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please indent like the original, to make review simpler?
`mutate(col2 = fun(col1))` on a grouped data frame calls `fun` once per group. It used to require that `fun` returns the exact same type and that was not desirable in functions that may return different (but compatible) types, such as integer and numeric. This PR changes that behaviour, so the returned vectors from each of the `fun` calls are combined using the same coercion rules than `combine` and `bind_rows`, defined in `Collecter.h`.
1f1b494
to
36319d8
Compare
9b0fca5
to
dad85f8
Compare
dad85f8
to
1231ced
Compare
@krlmlr I have taken care of all the issues you mentioned. Apologies for the grammar error and for all the indentation issues, thanks for all the comments and suggestions 👍 |
double factor_data = time_conversion_factor(units); | ||
if (factor_data != 1.0) { | ||
for (int i=0; i<Parent::data.size(); i++) { | ||
Parent::data[i] = factor_data*Parent::data[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if a copy has been made, I'm missing it ;-)
inst/include/dplyr/Collecter.h
Outdated
for (int i=0; i<index.size(); i++) { | ||
Parent::data[index[i]] = factor_v * REAL(v)[i]; | ||
} | ||
} else if (TYPEOF(v) == INTSXP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you create a difftime of mode integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only think of structure(4L, units="secs", class = "difftime")
and I agree it is forcing it. I can drop the else if
if you prefer.
inst/include/dplyr/Collecter.h
Outdated
double time_conversion_factor(RObject v_units) { | ||
// Acceptable units based on r-source/src/library/base/R/datetime.R | ||
double factor = 1; | ||
std::string v_units_c = Rcpp::as<std::string>(v_units); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wounder if we could use a map (as a static variable in a function) that allows lookup by SEXP, both here and in has_valid_time_unit().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the last commit I have used an std::string
as key. If that is not good enough I will try to fix it tomorrow.
inst/include/dplyr/Collecter.h
Outdated
} | ||
|
||
private: | ||
RObject units; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move data members to the bottom?
inst/include/dplyr/Collecter.h
Outdated
} | ||
} | ||
|
||
double time_conversion_factor(SEXP v_units) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels overworked to me now - I preferred the previous version with explicit if statements. Might be better to make the argument std::string()
inst/include/dplyr/Collecter.h
Outdated
} | ||
|
||
void collect_difftime(const SlicingIndex& index, SEXP v) { | ||
RObject v_units(Rf_getAttrib(v, Rf_install("units"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you do v.attr("units")
? I'm pretty sure there's a C++ api here.
inst/include/dplyr/Collecter.h
Outdated
void collect_difftime(const SlicingIndex& index, SEXP v) { | ||
RObject v_units(Rf_getAttrib(v, Rf_install("units"))); | ||
if (v_units.isNULL()) { | ||
stop("Can't collect difftime without units"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here, and for the non REALSXP case below, you can simply do stop("Invalid difftime object")
.
inst/include/dplyr/Collecter.h
Outdated
} | ||
} | ||
|
||
RObject units; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would all this code be simpler if units
was a std::string
?
I took care of the comments, feel free to do another review if you want |
Looks good, thanks! |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
This PR is a continuation to #2486 and closes #1892.
mutate(col2 = fun(col1))
on a grouped data frame callsfun
once per group.It used to require that
fun
returns the exact same type on each call. That is not desirable in functions that may return different (but compatible) types, such as integer and numeric.This PR changes that behavior, so the returned vectors from each of the
fun
calls are combined using the same coercion rules thancombine
andbind_rows
, defined inCollecter.h
.Comments are welcome. Feel free to be picky, so I can improve a bit my C++ and Rcpp skills. 😃