Make mutate use collecter h #2487

Merged
merged 6 commits into from Mar 5, 2017

3 participants
@zeehio
Contributor

zeehio commented Mar 2, 2017

This PR is a continuation of #2486 and closes #1892.

mutate(col2 = fun(col1)) on a grouped data frame calls fun once per group.

It used to require that fun returns the exact same type on each call. That is not desirable in functions that may return different (but compatible) types, such as integer and numeric.

This PR changes that behavior, so the vectors returned by each of the fun calls are combined using the same coercion rules as combine and bind_rows, defined in Collecter.h.

Comments are welcome. Feel free to be picky, so I can improve my C++ and Rcpp skills a bit. 😃
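
To illustrate the behavior described above, here is a minimal sketch in plain C++ (no Rcpp) of combining per-group results when some groups return integers and others return doubles. It is not the code in Collecter.h: the `Kind`, `Column` and `collect_chunks` names are hypothetical, and only the integer-to-double promotion is modeled.

```cpp
#include <vector>

enum class Kind { Integer, Double };

// Hypothetical stand-in for an R vector: only the member matching `kind` is used.
struct Column {
  Kind kind;
  std::vector<int> ints;
  std::vector<double> dbls;
};

// Combine per-group results into one column. The output starts as integer and
// is promoted to double the first time a double chunk is seen.
Column collect_chunks(const std::vector<Column>& chunks) {
  Column out{Kind::Integer, {}, {}};
  for (const Column& chunk : chunks) {
    if (chunk.kind == Kind::Double && out.kind == Kind::Integer) {
      // Promote everything collected so far from integer to double.
      out.dbls.assign(out.ints.begin(), out.ints.end());
      out.ints.clear();
      out.kind = Kind::Double;
    }
    if (out.kind == Kind::Double) {
      if (chunk.kind == Kind::Double) {
        out.dbls.insert(out.dbls.end(), chunk.dbls.begin(), chunk.dbls.end());
      } else {
        // Integer chunk arriving after promotion: convert while appending.
        out.dbls.insert(out.dbls.end(), chunk.ints.begin(), chunk.ints.end());
      }
    } else {
      out.ints.insert(out.ints.end(), chunk.ints.begin(), chunk.ints.end());
    }
  }
  return out;
}
```

With a strict "same type on every call" rule, the second chunk of an integer, double, integer sequence would be an error. Here it triggers a one-time promotion instead.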

zeehio added some commits Mar 1, 2017

Add difftime support to Collecter.h
We want to support difftime in bind_rows and combine.

We are already supporting mutate and I'm preparing a PR
to make mutate use Collecter.h as well.
@krlmlr
Member

krlmlr left a comment

Thanks for looking into these issues! I have a few comments and questions.

@@ -457,3 +457,10 @@ test_that("bind_rows rejects data frame columns (#2015)", {
fixed = TRUE
)
})

test_that("bind_rows accepts difftime objects", {

@krlmlr

krlmlr Mar 2, 2017

Member

Test case for "hms":

  df1 <- data.frame(x = hms::hms(hours = 1))
  df2 <- data.frame(x = as.difftime(1, units = "mins"))
  res <- bind_rows(df1, df2)
  expect_equal(res$x, hms::hms(hours = 1, minutes = 1))
}

inline SEXP get() {
set_class(Parent::data, "difftime");

@krlmlr

krlmlr Mar 2, 2017

Member

There is also the "hms" class that inherits from "difftime", I guess we'd like to return "hms" objects if the first object is of that class.

double factor_data = time_conversion_factor(units);
if (factor_data != 1.0) {
for (int i=0; i<Parent::data.size(); i++) {
Parent::data[i] = factor_data*Parent::data[i];

@krlmlr

krlmlr Mar 2, 2017

Member

Do we need to copy the data before in-place modification?

@krlmlr

krlmlr Mar 2, 2017

Member

So, if a copy has been made, I'm missing it ;-)

units = wrap("secs");
double factor_v = time_conversion_factor(v_units);
NumericVector v_sec(v);
double* v_sec_ptr = v_sec.begin();

@krlmlr

krlmlr Mar 2, 2017

Member

This looks unsafe, better use an iterator and advance that (in addition to i).
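
A minimal sketch of what this suggestion could look like, written against `std::vector` rather than Rcpp types so that it stands alone. The `scatter_scaled` name and its parameters are hypothetical. The source is walked with an iterator that advances together with `i`, and the loop stops if either sequence runs out.

```cpp
#include <cstddef>
#include <vector>

// Write factor * source[i] into data[index[i]], reading the source through an
// iterator instead of a raw pointer.
void scatter_scaled(std::vector<double>& data,
                    const std::vector<std::size_t>& index,
                    const std::vector<double>& source,
                    double factor) {
  std::vector<double>::const_iterator it = source.begin();
  for (std::size_t i = 0; i < index.size() && it != source.end(); ++i, ++it) {
    // at() throws std::out_of_range if index[i] is past the end of data.
    data.at(index[i]) = factor * (*it);
  }
}
```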

// then collect the data:
Parent::collect(index, v);
} else {
// We already units, is the new vector with the same units?

@krlmlr

krlmlr Mar 2, 2017

Member

Please check grammar.

}
if (units.isNULL()) {
// if current unit is NULL, grab the new one
units = v_units;

@krlmlr

krlmlr Mar 2, 2017

Member

Can we simply reject "difftime" objects that don't have a "units" attribute?

@@ -25,12 +25,13 @@ namespace dplyr {
static bool is_class_known(SEXP x) {
/* C++11 (need initializer lists)
static std::set<std::string> known_classes {
"POSIXct", "factor", "Date", "AsIs", "integer64", "table"
"difftime", "POSIXct", "factor", "Date", "AsIs", "integer64", "table"

@krlmlr

krlmlr Mar 2, 2017

Member

We shouldn't need to carry around code in comments. The old-style code also works in C++11, we'll rewrite that when we are on C++11 and when we touch the code again.

if (first_non_na < gdf.ngroups())
grab(first, indices);
copy_most_attributes(data, first);
class Gatherer {

@krlmlr

krlmlr Mar 2, 2017

Member

Could you please indent like the original, to make review simpler?

@krlmlr

Member

krlmlr commented Mar 2, 2017

@zeehio: If you address the feedback here and backport to #2486, we can continue the discussion there.

Make mutate use collecter.h. Closes #1892
`mutate(col2 = fun(col1))` on a grouped data frame calls `fun` once per group.

It used to require that `fun` returns the exact same type and that was not desirable in functions that may return different (but compatible) types, such as integer and numeric.

This PR changes that behaviour, so the vectors returned by each of the `fun` calls are combined using the same coercion rules as `combine` and `bind_rows`, defined in `Collecter.h`.

@zeehio zeehio force-pushed the zeehio:make_mutate_use_collecter_h branch from 1f1b494 to 36319d8 Mar 2, 2017

@zeehio zeehio changed the title Make mutate use collecter h [WIP] Make mutate use collecter h Mar 2, 2017

@zeehio zeehio force-pushed the zeehio:make_mutate_use_collecter_h branch from 9b0fca5 to dad85f8 Mar 2, 2017

@zeehio zeehio force-pushed the zeehio:make_mutate_use_collecter_h branch from dad85f8 to 1231ced Mar 2, 2017

@zeehio zeehio changed the title [WIP] Make mutate use collecter h Make mutate use collecter h Mar 2, 2017

@zeehio

Contributor Author

zeehio commented Mar 2, 2017

@krlmlr I have taken care of all the issues you mentioned. Apologies for the grammar error and for all the indentation issues, thanks for all the comments and suggestions 👍

for (int i=0; i<index.size(); i++) {
Parent::data[index[i]] = factor_v * REAL(v)[i];
}
} else if (TYPEOF(v) == INTSXP) {

@krlmlr

krlmlr Mar 2, 2017

Member

How do you create a difftime of mode integer?

@zeehio

zeehio Mar 2, 2017

Author Contributor

I can only think of structure(4L, units="secs", class = "difftime") and I agree it is forcing it. I can drop the else if if you prefer.

double time_conversion_factor(RObject v_units) {
// Acceptable units based on r-source/src/library/base/R/datetime.R
double factor = 1;
std::string v_units_c = Rcpp::as<std::string>(v_units);

@krlmlr

krlmlr Mar 2, 2017

Member

I wonder if we could use a map (as a static variable in a function) that allows lookup by SEXP, both here and in has_valid_time_unit().

@zeehio

zeehio Mar 3, 2017

Author Contributor

In the last commit I have used a std::string as key. If that is not good enough I will try to fix it tomorrow.
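
For reference, a rough sketch of the map-based variant discussed here, keyed by `std::string` and held in a function-local static. It is not the merged code. The function names echo the ones in this thread, but the bodies and the `time_unit_factors` helper are assumptions, as is the use of a C++11 initializer list. The unit names and factors are the ones R accepts for difftime.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: the table is built once, on the first call.
const std::map<std::string, double>& time_unit_factors() {
  static const std::map<std::string, double> factors = {
    {"secs", 1.0},
    {"mins", 60.0},
    {"hours", 3600.0},
    {"days", 86400.0},
    {"weeks", 604800.0}
  };
  return factors;
}

// True if `units` is one of the known difftime units.
bool has_valid_time_unit(const std::string& units) {
  return time_unit_factors().count(units) > 0;
}

// Number of seconds in one `units`; throws for unknown units.
double time_conversion_factor(const std::string& units) {
  const std::map<std::string, double>& factors = time_unit_factors();
  auto it = factors.find(units);
  if (it == factors.end()) {
    throw std::invalid_argument("Invalid difftime units: " + units);
  }
  return it->second;
}
```

Sharing one table keeps the validity check and the conversion in sync, which seems to be the point of the suggestion above.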

}

private:
RObject units;

@krlmlr

krlmlr Mar 2, 2017

Member

Can you please move data members to the bottom?

}
}

double time_conversion_factor(SEXP v_units) {

@hadley

hadley Mar 3, 2017

Member

This feels overworked to me now - I preferred the previous version with explicit if statements. Might be better to make the argument std::string()

}

void collect_difftime(const SlicingIndex& index, SEXP v) {
RObject v_units(Rf_getAttrib(v, Rf_install("units")));

@hadley

hadley Mar 3, 2017

Member

Can't you do v.attr("units")? I'm pretty sure there's a C++ api here.

void collect_difftime(const SlicingIndex& index, SEXP v) {
RObject v_units(Rf_getAttrib(v, Rf_install("units")));
if (v_units.isNULL()) {
stop("Can't collect difftime without units");

@hadley

hadley Mar 3, 2017

Member

I think here, and for the non REALSXP case below, you can simply do stop("Invalid difftime object").

}
}

RObject units;

@hadley

hadley Mar 3, 2017

Member

Would all this code be simpler if units was a std::string?
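
One way to picture the suggestions in this round of review (explicit if statements, a single "Invalid difftime object" error, and units held as a `std::string`) is the sketch below. It uses plain C++ containers instead of Rcpp, so `DifftimeCollector`, `unit_in_seconds` and the accessors are hypothetical names. The fallback to seconds when units differ follows what the snippets earlier in this thread appear to do.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Seconds per unit, with explicit branches; unknown or empty units are rejected.
double unit_in_seconds(const std::string& units) {
  if (units == "secs") return 1.0;
  if (units == "mins") return 60.0;
  if (units == "hours") return 3600.0;
  if (units == "days") return 86400.0;
  if (units == "weeks") return 604800.0;
  throw std::invalid_argument("Invalid difftime object");
}

class DifftimeCollector {
public:
  explicit DifftimeCollector(std::size_t n) : data(n, 0.0) {}

  // Store v[i] at data[index[i]], keeping all stored values in one unit.
  void collect(const std::vector<std::size_t>& index,
               const std::vector<double>& v,
               const std::string& v_units) {
    const double factor_v = unit_in_seconds(v_units);  // also validates v_units
    if (index.size() != v.size()) {
      throw std::invalid_argument("index and values differ in length");
    }
    if (units.empty()) {
      units = v_units;  // the first chunk decides the units
    }
    double scale = 1.0;
    if (units != v_units) {
      // Units differ: convert what is already stored to seconds, then
      // store the new values in seconds as well.
      const double factor_data = unit_in_seconds(units);
      for (double& x : data) x *= factor_data;
      units = "secs";
      scale = factor_v;
    }
    for (std::size_t i = 0; i < index.size(); ++i) {
      data.at(index[i]) = scale * v[i];
    }
  }

  const std::vector<double>& values() const { return data; }
  const std::string& current_units() const { return units; }

private:
  // Data members at the bottom, as requested earlier in the review.
  std::vector<double> data;
  std::string units;
};
```

With a `std::string` member, "no units seen yet" becomes an empty string and unit comparison is a plain `!=`, which avoids wrapping and unwrapping R objects in this part of the logic.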

@zeehio

Contributor Author

zeehio commented Mar 4, 2017

I took care of the comments, feel free to do another review if you want

@hadley hadley merged commit 996318b into tidyverse:master Mar 5, 2017

4 checks passed

codecov/patch: 85.18% of diff hit (target 77.23%)
codecov/project: Absolute coverage decreased by -<.01% but relative coverage increased by +7.94% compared to 8430adc
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@hadley

Member

hadley commented Mar 5, 2017

Looks good, thanks!

@zeehio zeehio deleted the zeehio:make_mutate_use_collecter_h branch Mar 5, 2017

@lock

lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019