Fix #2203 (combine with list of characters with NA) #2209

Merged: 10 commits merged into tidyverse:master on Feb 1, 2017


zeehio
Contributor

@zeehio zeehio commented Oct 27, 2016

This rather small pull request:

Any feedback will be very much appreciated

@krlmlr
Member

krlmlr commented Nov 7, 2016

LGTM. Fixes #2203. @hadley: Okay to merge?

@hadley
Member

hadley commented Nov 7, 2016

I wonder if it would be better to be stricter and only check for logicals of length 1?

Do we need to apply the same principle for other atomic vectors, or are they already covered? Either way, it's probably worth pulling LGLSXP == TYPEOF(x) && all_na(x) out into an informatively named function.

@zeehio
Contributor Author

zeehio commented Nov 7, 2016

Logicals of length > 1 that are all NA are also meaningful to me:

dplyr::combine(list("a", "b", "c", c(NA, NA), "e")) 

Should work like:

dplyr::combine(list("a", "b", "c", c(NA_character_, NA_character_), "e"))

Returning

c("a", "b", "c", NA, NA, "e")

I will write tests for the other collecters to check (a) whether this NA conversion affects them too and (b) to prevent regressions. As hadley suggests, I will pull LGLSXP == TYPEOF(x) && all_na(x) into its own function named all_logical_na so I can use it in other collecters if required.
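The behaviour requested above can be sketched at the R level. Both functions below are hypothetical illustrations, not dplyr's C++ collecter code: all_logical_na() mirrors the proposed C++ helper, and combine_chr_sketch() handles only character input.

```r
# Hypothetical R-level analogue of the C++ test
# LGLSXP == TYPEOF(x) && all_na(x); not dplyr's implementation.
all_logical_na <- function(x) {
  is.logical(x) && all(is.na(x))
}

# Hypothetical sketch of the requested combine() behaviour for character
# input: every all-NA logical element becomes a character NA vector of the
# same length before the list is flattened.
combine_chr_sketch <- function(lst) {
  lst <- lapply(lst, function(x) {
    if (all_logical_na(x)) rep(NA_character_, length(x)) else x
  })
  if (!all(vapply(lst, is.character, logical(1)))) {
    stop("Can not automatically convert to character")
  }
  unlist(lst)
}

combine_chr_sketch(list("a", "b", "c", c(NA, NA), "e"))
```

Note that a logical vector containing TRUE or FALSE is left untouched, so it still triggers the error, which matches the strictness discussed later in this thread.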

@krlmlr
Member

krlmlr commented Nov 7, 2016

> combine(list(1:3, NA))
[1]  1  2  3 NA
> combine(list(1.5:3, NA))
[1] 1.5 2.5  NA
> combine(list(complex(real = 1.5:3), NA))
Error in eval(substitute(expr), envir, enclos) : 
  Can not automatically convert from complex to logical.
> combine(list(factor(1.5:3), NA))
Error in eval(substitute(expr), envir, enclos) : 
  Can not automatically convert from factor to logical.

@zeehio
Contributor Author

zeehio commented Nov 7, 2016

Test that combine handles missing values with each of the following types, and fix it where necessary:

  • logical
  • integer
  • factor
  • real
  • character
  • POSIXct
  • Date
  • complex
  OK: 1609 SKIPPED: 3 FAILED: 4
  1. Error: combine works with NA and factors (#2203) (@test-combine.R#49) 
  2. Error: combine works with NA and POSIXct (#2203) (@test-combine.R#86) 
  3. Error: combine works with NA and Date (#2203) (@test-combine.R#103) 
  4. Error: combine works with NA and complex (#2203) (@test-combine.R#131) 

@hadley
Member

hadley commented Nov 7, 2016

My main concern with checking the value for vectors > length 1 is performance since you're now scanning the value of every element. But I guess it can't actually cause a performance regression, since currently the code throws an error.
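The cost can be bounded further: a check only has to scan the whole vector when every element really is NA, whereas all(is.na(x)) in R always allocates and fills a full logical vector first. Below is a minimal base-R sketch of an early-exit scan. The function name is hypothetical; in the C++ collecters the same idea would be a loop that returns at the first non-NA value.

```r
# Hypothetical sketch: stop scanning at the first non-NA element, so vectors
# that are not all NA usually cost only a few checks.
all_na_early_exit <- function(x) {
  for (v in x) {
    if (!is.na(v)) return(FALSE)
  }
  TRUE
}

all_na_early_exit(c(TRUE, rep(NA, 1e6)))   # stops after the first element
all_na_early_exit(rep(NA, 10))             # has to look at all 10 elements
```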

Member

@krlmlr krlmlr left a comment

Awesome work, thanks. @hadley: Okay to merge after collect() is split in the two places marked below?

STORAGE* source_ptr = Rcpp::internal::r_vector_start<RTYPE>(source);
for (int i=0; i<index.size(); i++) {
data[index[i]] = source_ptr[i];
if (all_logical_na(v)) {
Member

I like the way Collecter::collect() is split into functions with clear names.

int* source_ptr = Rcpp::internal::r_vector_start<INTSXP>(source);
for (int i=0; i<index.size(); i++) {
if (source_ptr[i] == NA_INTEGER) {
if (Rf_inherits(v, "factor") && has_same_levels_as(v)) {
Member

Same.

@zeehio
Contributor Author

zeehio commented Nov 8, 2016

This still requires further work: some cases where the first element is NA still fail:

combine(list(NA, "hello"))
# Error cannot convert from logical to character

Similar errors happen with factors and complex.

Possible solutions (I still need to work on them):

  • Promote logical collecter to character collecter if the logical collecter only has NA.
  • Promote logical collecter to factor collecter if the logical collecter only has NA.
  • Promote logical collecter to complex collecter. Would you like to convert FALSE to 0+0i and TRUE to 1+0i as well in this case?

@@ -9,6 +9,10 @@

namespace dplyr {

static inline bool all_logical_na(SEXP x) {
Member

Can we somehow reuse is_logical_all_na() below?

@krlmlr
Member

krlmlr commented Nov 8, 2016

I think the collecter should be strict and only mix logicals with other types (including numerics) if only NAs were seen.

I remember fixing very similar issues in DelayedProcessor.h.
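The strict rule can be sketched in base R for plain atomic vectors. This is a hypothetical illustration, not dplyr's Collecter API: an all-NA logical on either side is promoted to the other side's type, while TRUE/FALSE values are never mixed with other types. It deliberately leaves out the integer -> double exception discussed later in this thread.

```r
# Hypothetical base-R sketch of "mix logicals with other types only if only
# NAs were seen"; not dplyr's Collecter API.
all_logical_na <- function(x) is.logical(x) && all(is.na(x))

# NAs with the same type and class as `model`: indexing with NA_integer_
# keeps attributes such as factor levels or the Date class.
typed_na <- function(model, n) model[rep(NA_integer_, n)]

combine_two_sketch <- function(seen, x) {
  same_type <- identical(typeof(seen), typeof(x)) &&
    identical(oldClass(seen), oldClass(x))
  if (!same_type && !all_logical_na(seen) && !all_logical_na(x)) {
    stop("Can not automatically convert from ", class(x)[1],
         " to ", class(seen)[1], ".")
  }
  # Promote whichever side is an all-NA logical to the other side's type.
  if (all_logical_na(seen) && !is.logical(x)) seen <- typed_na(x, length(seen))
  if (all_logical_na(x) && !is.logical(seen)) x <- typed_na(seen, length(x))
  c(seen, x)
}

combine_two_sketch(NA, "hello")        # returns c(NA, "hello")
try(combine_two_sketch(TRUE, "hello")) # TRUE is not promoted: error
```

Because c() on factors behaves differently across R versions, the usage above sticks to unclassed vectors; typed_na() itself does keep factor levels.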

@zeehio
Contributor Author

zeehio commented Nov 8, 2016

@krlmlr,

  • I hope I understood your comments in the code review. I tried to address both splitting the collect() function and reusing is_logical_all_na(). If I misunderstood you, please correct me.
  • If we change the collecter to be strict with respect to numbers then we break backwards compatibility. I will document this breaking change in the NEWS file.
  • I still have some test cases to fix, but it's time for me to go to sleep. I will try to fix this tomorrow (in approx 20 hours).

Thanks for all your feedback, comments and suggestions

@krlmlr
Member

krlmlr commented Nov 8, 2016

@hadley: What should combine(list(1, TRUE, FALSE, 2)) do -- raise an error or return c(1, 1, 0, 2) ?

Member

@hadley hadley left a comment

Looks great. I made some comments on the tests in order to make them a bit more compact.


test_that("combine works with NA and Date (#2203)", {
# NA first
expected_result <- as.Date(c(NA, "2010-01-01", "2010-01-02", NA, "2010-01-04"))
Member

I think this would be a little easier to read if you initialised as as.Date("2010-01-01") + c(NA, 0, 1, NA, 3) etc
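In base R, the two ways of building the expected value can be compared directly (no dplyr needed; the variable names are illustrative):

```r
# Expected value as written in the original test ...
expected_long <- as.Date(c(NA, "2010-01-01", "2010-01-02", NA, "2010-01-04"))
# ... and the compact form suggested above: day offsets from one base date.
expected_short <- as.Date("2010-01-01") + c(NA, 0, 1, NA, 3)

all.equal(expected_long, expected_short)
```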

expected_result <- as.Date(c(NA, "2010-01-01", "2010-01-02", NA, "2010-01-04"))
works1 <- combine(list(NA, as.Date("2010-01-01"), as.Date("2010-01-02"),
as.Date(NA), as.Date("2010-01-04")))
expect_equal(works1, expected_result)
Member

I think this could just be expect_equal(combine(as.list(works1)), works1)


# NA length > 1
expected_result <- c(as.Date(c("2010-01-01", "2010-01-02", NA, NA, "2010-01-04")))
works1 <- combine(list(as.Date("2010-01-01"), as.Date("2010-01-02"),
Member

Here you could maybe make the principle easier to understand by doing split(works1, c(1, 2, 3, 3, 4)) or similar.
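A base-R sketch of what the split() suggestion produces for the "NA length > 1" case. Base c() stands in for combine() here, and the variable names are illustrative only:

```r
x <- as.Date("2010-01-01") + c(0, 1, NA, NA, 3)

# Grouping ids 1, 2, 3, 3, 4 turn x into a list of four Date vectors; the
# third one is the two-element NA piece.
pieces <- unname(split(x, c(1, 2, 3, 3, 4)))
lengths(pieces)

# A test could then check that combine() undoes the split, e.g.
#   expect_equal(combine(pieces), x)
# Base R's c() is used below only as a stand-in for combine().
rebuilt <- do.call(c, pieces)
```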

expect_equal(works1, expected_result)

# NA length == 1
expected_result <- complex(real = c(1, 2, NA, 4),
Member

c(1, 2, NA, 4) + 1i would be more compact

@hadley
Member

hadley commented Nov 8, 2016

@krlmlr For now I'd say it should be an error because we want to be strict about coercing types (integer -> double is one exception). We will need to write up the exact principles in vctrs; currently the behaviour is spread somewhat across multiple functions (see r-lib/vctrs#7).

@zeehio
Contributor Author

zeehio commented Nov 9, 2016

  • I have improved the readability of the tests following @hadley's suggestions.
  • I have added a test case that loops through all pairs of atomic types, combines them, and checks that the result or error matches the expected one. Following a strict type-coercion policy, I only allow coercing NA values to any type, integers to reals, and factors to characters.
  • I have fixed all the test cases dealing with combine. However, one other test case now fails:
# (@test-binds.R#123)
test_that("bind_rows promotes logical to integer", {
  df1 <- data_frame(a = FALSE)
  df2 <- data_frame(a = 1L)

  res <- bind_rows(df1, df2)
  expect_equal(res$a, c(0L, 1L))
})

So the question now is, do we remove the coercion from logical to integer in bind_rows or do we use different rules in each case?

If you decide to remove the coercion, I will replace the expect_equal() in the test above with an expect_error().

Before merging, the NEWS entry will need to be updated. Probably tomorrow.

Member

@krlmlr krlmlr left a comment

Thanks. The logical -> integer test originates in 51ddd60, I think we can change it.

# NA first
expected_result <- factor(c(NA, "a", "c", NA, "b"), levels = c("a","b","c"))
works1 <- combine(list(NA,
factor("a", levels = c("a","b","c")),
Member

This can be simplified e.g. by defining x <- factor(letters[1:3]) and then using x[[1]] etc.

Contributor Author

Done eaa9b0d
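For reference, the simplification suggested above works because indexing a factor keeps its levels. A base-R sketch (variable names are illustrative):

```r
x <- factor(letters[1:3])

x[[1]]            # one value, still carrying levels a, b, c

# The expected value from the test, built by indexing x instead of
# repeating levels = c("a", "b", "c") for every element:
expected_long  <- factor(c(NA, "a", "c", NA, "b"), levels = c("a", "b", "c"))
expected_short <- x[c(NA, 1, 3, NA, 2)]
all.equal(expected_long, expected_short)
```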

# Date-NA is Date (unlist would coerce to numeric)
# logicalNA_POSIXct: We add tzone = ""
# POSIXct_POSIXct (unlist would coerce to numeric)
pairs_result <- list("factor_character" = c("a", "b"),
Member

How about defining another column in the data frame (of type list)?

Contributor Author

Done 698e750

expect_equal(works3, expected_result)
})

test_that("combine is strict combining/promoting types", {
Member

This is a very thorough test, I appreciate that. Can we turn this into meta-code that generates explicit but simple expect_equal() or expect_error() statements?

Contributor Author

Sorry, I do not know how to do that and I could not find any example using testthat to guide me.

However, when I run the test-combine.R script, the failing expectations print the info argument, which contains "Pair: (item1, item2)". This is what gets printed when running R CMD check:

  1. Failure: combine is strict combining/promoting types (@test-combine.R#214) --
  combine(items[c(pairs$Var1[i], pairs$Var2[i])]) did not throw an error.
  Pair: ( integer , character )

So by reading this I know that coercing an integer and a character did not throw an error as expected (this is a dummy example, it actually gives an error)

I imagine you would rather have "combine(items[c("integer", "character")]) did not throw an error." as the failure message, but I do not know how to do that substitution. 😅

Contributor Author

I know how to do that now. Fix is on the way
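One way to build such a label is plain string pasting, as the later version of this test does with paste0(); testthat expectations such as expect_equal() and expect_error() accept the resulting string through their label argument. A minimal sketch of the string construction alone (the pair values are illustrative):

```r
var1 <- "integer"
var2 <- "character"

# Build the call text for one pair, so a failing expectation can report
# combine(items[c("integer", "character")]) instead of a generic expression.
label <- paste0("combine(items[c(\"", var1, "\", \"", var2, "\")])")
cat(label, "\n")
```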

@@ -177,11 +210,13 @@ namespace dplyr {

inline bool compatible(SEXP x) {
int RTYPE = TYPEOF(x);
return (INTSXP == RTYPE || RTYPE == LGLSXP) && !Rf_inherits(x, "factor");
return (INTSXP == RTYPE && !Rf_inherits(x, "factor") &&
!Rf_inherits(x, "POSIXct") && !Rf_inherits(x, "Date")) ||
Member

This isn't completely failsafe against other classes such as "difftime". I wonder if the additional checks here are worth the effort, or if we should just let it slide and wait for r-lib/vctrs#7 (and potentially r-lib/vctrs#27). @hadley?

Member

I think we should probably treat any vector with a class as incompatible by default, but I agree we don't need to do that in this PR.

Contributor Author

@zeehio zeehio Nov 10, 2016

I did not implement support for "difftime", but now any "integer with a class" or "real with a class" is treated as incompatible (0b9ed6a), so if someone tries to collect an unsupported class (such as "difftime", or "dummyint" in the test case) the collecter raises an error. If you are not sure about accepting this behaviour, I can revert it and difftime will be coerced to integer again (with potential loss of information).
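The "any vector with a class is incompatible" rule can be sketched at the R level. The C++ code in this PR uses a helper called has_classes(); the R functions below are hypothetical analogues, not its implementation:

```r
# Hypothetical R analogue of a has_classes() check: oldClass() returns only
# an explicit class attribute, so it is NULL for plain vectors (unlike
# class(), which also reports implicit classes such as "integer").
has_classes <- function(x) !is.null(oldClass(x))

# Treat an integer as "plain" only when it carries no class at all.
compatible_integer <- function(x) typeof(x) == "integer" && !has_classes(x)

compatible_integer(4L)                                  # TRUE
compatible_integer(factor("a"))                         # FALSE
compatible_integer(structure(4L, class = "dummyint"))   # FALSE
has_classes(as.difftime(5, units = "mins"))             # TRUE
```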

Contributor Author

Maybe raising a warning and then coercing the class is a better solution for now (breaks less code and allows users to adapt).

I could also allow coercion from logical to integer with a warning, so the breaking change is softer. The decision is up to you. :-)

@@ -63,6 +63,8 @@

* `mutate_all()` etc now accept unnamed additional arguments.

* `combine()` accepts `NA` values (#2203, @zeehio)

Member

Breaking changes should be documented clearly.

Contributor Author

Done in 9fb42da

zeehio added a commit to zeehio/dplyr that referenced this pull request Nov 10, 2016
return new Collecter_Impl<INTSXP>(n);
else {
SEXP classes = Rf_getAttrib(model, R_ClassSymbol);
stop("Can't collect elements of class %s", CHAR(STRING_ELT(classes, 0)));
Member

I think we should make this a warning for now

@zeehio zeehio force-pushed the fix_2203 branch 2 times, most recently from be40e25 to 918dac4 on November 13, 2016 16:08
@zeehio
Contributor Author

zeehio commented Nov 13, 2016

This is a summary of all 11x11 = 121 possible coercions I have tested, covering all pairs of the following types:

  • logicalvalue (logical with a non-NA value)
  • logicalNA (NA)
  • integer
  • double
  • complex
  • factor
  • character
  • POSIXct
  • Date
  • num_with_class (custom numeric class, think of difftime)
  • int_with_class (custom integer class)

The behaviour implemented in this PR allows combining 35 of the 121 possible pairs.

Allowed combinations with NA

Custom classes may lose attributes, or get the wrong ones, if an NA is present, so they give a warning.

Var1            Var2            can_combine  warning  result          result_class
logicalvalue    logicalNA       TRUE         FALSE    TRUE, NA        logical
logicalNA       logicalvalue    TRUE         FALSE    NA, TRUE        logical
logicalNA       logicalNA       TRUE         FALSE    NA, NA          logical
logicalNA       integer         TRUE         FALSE    NA, 4           integer
logicalNA       factor          TRUE         FALSE    NA, a           factor
logicalNA       double          TRUE         FALSE    NA, 4.5         numeric
logicalNA       character       TRUE         FALSE    NA, b           character
logicalNA       POSIXct         TRUE         FALSE    NA, 2010-01-01  POSIXct, POSIXt
logicalNA       Date            TRUE         FALSE    NA, 2016-01-01  Date
logicalNA       complex         TRUE         FALSE    NA, 1+2i        complex
logicalNA       int_with_class  TRUE         TRUE     NA, 4           int_with_class
logicalNA       num_with_class  TRUE         TRUE     NA, 4.5         num_with_class
integer         logicalNA       TRUE         FALSE    4, NA           integer
factor          logicalNA       TRUE         FALSE    a, NA           factor
double          logicalNA       TRUE         FALSE    4.5, NA         numeric
character       logicalNA       TRUE         FALSE    b, NA           character
POSIXct         logicalNA       TRUE         FALSE    2010-01-01, NA  POSIXct, POSIXt
Date            logicalNA       TRUE         FALSE    2016-01-01, NA  Date
complex         logicalNA       TRUE         FALSE    1+2i, NA        complex
int_with_class  logicalNA       TRUE         TRUE     4, NA           int_with_class
num_with_class  logicalNA       TRUE         TRUE     4.5, NA         num_with_class

Allowed combinations of the same type

Custom classes may lose attributes and therefore give a warning.

Var1            Var2            can_combine  warning  result                  result_class
logicalvalue    logicalvalue    TRUE         FALSE    TRUE, TRUE              logical
logicalNA       logicalNA       TRUE         FALSE    NA, NA                  logical
integer         integer         TRUE         FALSE    4, 4                    integer
factor          factor          TRUE         FALSE    a, a                    factor
double          double          TRUE         FALSE    4.5, 4.5                numeric
character       character       TRUE         FALSE    b, b                    character
POSIXct         POSIXct         TRUE         FALSE    2010-01-01, 2010-01-01  POSIXct, POSIXt
Date            Date            TRUE         FALSE    2016-01-01, 2016-01-01  Date
complex         complex         TRUE         FALSE    1+2i, 1+2i              complex
int_with_class  int_with_class  TRUE         TRUE     4, 4                    int_with_class
num_with_class  num_with_class  TRUE         TRUE     4.5, 4.5                num_with_class

Allowed coercions

Var1            Var2            can_combine  warning  result          result_class
factor          character       TRUE         TRUE     a, b            character
character       factor          TRUE         FALSE    b, a            character
integer         double          TRUE         FALSE    4, 4.5          numeric
double          integer         TRUE         FALSE    4.5, 4          numeric

Not allowed combinations

Var1 Var2 can_combine warning result result_class
logicalvalue integer FALSE FALSE
logicalvalue factor FALSE FALSE
logicalvalue double FALSE FALSE
logicalvalue character FALSE FALSE
logicalvalue POSIXct FALSE FALSE
logicalvalue Date FALSE FALSE
logicalvalue complex FALSE FALSE
logicalvalue int_with_class FALSE FALSE
logicalvalue num_with_class FALSE FALSE
integer logicalvalue FALSE FALSE
integer factor FALSE FALSE
integer character FALSE FALSE
integer POSIXct FALSE FALSE
integer Date FALSE FALSE
integer complex FALSE FALSE
integer int_with_class FALSE FALSE
integer num_with_class FALSE FALSE
factor logicalvalue FALSE FALSE
factor integer FALSE FALSE
factor double FALSE FALSE
factor POSIXct FALSE FALSE
factor Date FALSE FALSE
factor complex FALSE FALSE
factor int_with_class FALSE FALSE
factor num_with_class FALSE FALSE
double logicalvalue FALSE FALSE
double factor FALSE FALSE
double character FALSE FALSE
double POSIXct FALSE FALSE
double Date FALSE FALSE
double complex FALSE FALSE
double int_with_class FALSE FALSE
double num_with_class FALSE FALSE
character logicalvalue FALSE FALSE
character integer FALSE FALSE
character double FALSE FALSE
character POSIXct FALSE FALSE
character Date FALSE FALSE
character complex FALSE FALSE
character int_with_class FALSE FALSE
character num_with_class FALSE FALSE
POSIXct logicalvalue FALSE FALSE
POSIXct integer FALSE FALSE
POSIXct factor FALSE FALSE
POSIXct double FALSE FALSE
POSIXct character FALSE FALSE
POSIXct Date FALSE FALSE
POSIXct complex FALSE FALSE
POSIXct int_with_class FALSE FALSE
POSIXct num_with_class FALSE FALSE
Date logicalvalue FALSE FALSE
Date integer FALSE FALSE
Date factor FALSE FALSE
Date double FALSE FALSE
Date character FALSE FALSE
Date POSIXct FALSE FALSE
Date complex FALSE FALSE
Date int_with_class FALSE FALSE
Date num_with_class FALSE FALSE
complex logicalvalue FALSE FALSE
complex integer FALSE FALSE
complex factor FALSE FALSE
complex double FALSE FALSE
complex character FALSE FALSE
complex POSIXct FALSE FALSE
complex Date FALSE FALSE
complex int_with_class FALSE FALSE
complex num_with_class FALSE FALSE
int_with_class logicalvalue FALSE FALSE
int_with_class integer FALSE FALSE
int_with_class factor FALSE FALSE
int_with_class double FALSE FALSE
int_with_class character FALSE FALSE
int_with_class POSIXct FALSE FALSE
int_with_class Date FALSE FALSE
int_with_class complex FALSE FALSE
int_with_class num_with_class FALSE FALSE
num_with_class logicalvalue FALSE FALSE
num_with_class integer FALSE FALSE
num_with_class factor FALSE FALSE
num_with_class double FALSE FALSE
num_with_class character FALSE FALSE
num_with_class POSIXct FALSE FALSE
num_with_class Date FALSE FALSE
num_with_class complex FALSE FALSE
num_with_class int_with_class FALSE FALSE
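The can_combine column of the tables above can be cross-checked with a short base-R sketch. The helpers are hypothetical reconstructions of the rules as tabulated (they ignore the warning and result columns), not the PR's test code:

```r
# kind() labels each item the way the tables do.
kind <- function(x) {
  if (is.logical(x) && all(is.na(x))) return("logicalNA")
  if (is.factor(x)) return("factor")
  cls <- oldClass(x)
  if (!is.null(cls)) return(cls[1])   # "Date", "POSIXct", custom classes
  typeof(x)                           # "logical", "integer", "double", ...
}

# Same kind, or an all-NA logical on either side, or one of the two
# allowed coercions (integer <-> double, factor <-> character).
can_combine <- function(x, y) {
  kx <- kind(x)
  ky <- kind(y)
  if (kx == ky || kx == "logicalNA" || ky == "logicalNA") return(TRUE)
  pair <- sort(c(kx, ky))
  identical(pair, c("double", "integer")) ||
    identical(pair, c("character", "factor"))
}

items <- list(
  logicalvalue = TRUE, logicalNA = NA, integer = 4L, factor = factor("a"),
  double = 4.5, character = "b", POSIXct = as.POSIXct("2010-01-01"),
  Date = as.Date("2016-01-01"), complex = 1 + 2i,
  int_with_class = structure(4L, class = "int_with_class"),
  num_with_class = structure(4.5, class = "num_with_class")
)

pairs <- expand.grid(Var1 = names(items), Var2 = names(items),
                     stringsAsFactors = FALSE)
pairs$can_combine <- mapply(function(a, b) can_combine(items[[a]], items[[b]]),
                            pairs$Var1, pairs$Var2)
sum(pairs$can_combine)   # 35 of the 121 pairs
```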

I have rebased and squashed the commits into 4 manageable commits. I would appreciate your comments.

Oh, and if a regression appears in the future, the error message will be explicit (here is an example of what would be printed):

1. Failure: Coercion from factor to character (@test-combine.R#282) ----------
combine(items[c("factor", "character")]) not equal to c("a", "b").

@zeehio
Contributor Author

zeehio commented Dec 8, 2016

This is ready for review, but there is no hurry (the end of the year is usually busy, so happy new year to all of you).

Member

@krlmlr krlmlr left a comment

Thank you for all your work on "this small pull request" ;-) This will be very useful for vctrs, too, we can probably use most of the tests there right away.

Would you mind doing another small round?

label = paste0("combine(items[c(\"", var1, "\", \"", var2, "\")])"),
expected.label = deparse(result))
} else {
expect_error(suppressWarnings(combine(item_pair)),
Member

I'd prefer expect_error(expect_warning(...)), does that work?

Contributor Author

expect_error(expect_warning(...)) will not work as expected, because if a bug causes ... to run without warnings, then expect_warning will raise an error that will be accepted and suppressed by expect_error and the test will succeed where it should have failed.

I am working on all the other comments

Member

How about expect_warning(expect_error(...)) ?

library(testthat)

with_reporter("summary",
  test_that("error + warning", {
    expect_warning(expect_error({warning(); stop()}))
    expect_warning(expect_error({stop()}))
    expect_warning(expect_error({warning()}))
    expect_warning(expect_error({}))
  })
)
#> ...12.34
#> Failed --------------------------------------------------------------------
#> 1. Failure: error + warning (@<text>#6) -----------------------------------
#> expect_error(...) showed 0 warnings
#> 
#> 
#> 2. Failure: error + warning (@<text>#7) -----------------------------------
#> {
#>     ...
#> } did not throw an error.
#> 
#> 
#> 3. Failure: error + warning (@<text>#8) -----------------------------------
#> {
#>     ...
#> } did not throw an error.
#> 
#> 
#> 4. Failure: error + warning (@<text>#8) -----------------------------------
#> expect_error(...) showed 0 warnings
#> 
#> 
#> DONE ======================================================================
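The reasoning about nesting order can be reproduced without testthat, using two deliberately simplified stand-ins for expect_error() and expect_warning(). These are hypothetical helpers, not testthat's implementations:

```r
my_expect_error <- function(expr) {
  failed <- tryCatch({
    force(expr)
    FALSE
  }, error = function(e) TRUE)
  if (!failed) stop("expected an error, but none was thrown")
  invisible(TRUE)
}

my_expect_warning <- function(expr) {
  warned <- FALSE
  withCallingHandlers(force(expr), warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  })
  if (!warned) stop("expected a warning, but none was signalled")
  invisible(TRUE)
}

# Error check outside: the missing warning becomes an error that the outer
# check swallows, so this "passes" even though nothing was signalled.
my_expect_error(my_expect_warning(NULL))

# Warning check outside: the missing error is reported as a failure.
try(my_expect_warning(my_expect_error(NULL)))

# Both conditions present: passes.
my_expect_warning(my_expect_error({ warning("w"); stop("e") }))
```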

combine_pair_test <- function(item_pair, var1, var2, result,
can_combine = TRUE, warning = FALSE) {
if (can_combine) {
if (warning) {
Member

I think this can be simplified:

warning_regexp <- if (warning) ".*" else NA
expect_warning(..., warning_regexp)

if (can_combine) {
if (warning) {
expect_warning(res <- combine(item_pair),
label = paste0("combine(items[c(\"", var1, "\", \"", var2, "\")])"))
Member

Could you please extract a variable here?

}


prepare_table_with_coercion_rules <- function(items) {
Member

It might be better to declare items in the function, because the code below is tightly coupled with the names. Also, you could split this large function into several smaller functions.

expect_equal(works3, expected_result)
})

combine_pair_test <- function(item_pair, var1, var2, result,
Member

The definition of the pair tests and the test logic should perhaps live in a helper file, tests/testthat/helper-combine.R or similar.

}

combine_coercion_types <- function() {
items <- list(logicalvalue = TRUE, logicalNA = NA, integer = 4L,
Member

Could you use logicalNA = NA, logicalNA = c(NA, NA) here? I suspect we could then get rid of the explicit tests.

switch (TYPEOF(model)) {
case INTSXP:
if (Rf_inherits(model, "Date"))
return new TypedCollecter<INTSXP>(n, get_date_classes());
if (Rf_inherits(model, "factor"))
return new Collecter_Impl<STRSXP>(n);
if (has_classes(model)) {
SEXP classes = Rf_getAttrib(model, R_ClassSymbol);
Rf_warning("Coercing class %s, with possible loss of information",
Member

Coercing into what?

return new Collecter_Impl<INTSXP>(n);
case REALSXP:
if (Rf_inherits(model, "POSIXct"))
return new POSIXctCollecter(n, Rf_getAttrib(model, Rf_install("tzone")));
if (Rf_inherits(model, "Date"))
return new TypedCollecter<REALSXP>(n, get_date_classes());
if (has_classes(model)) {
SEXP classes = Rf_getAttrib(model, R_ClassSymbol);
Rf_warning("Coercing class %s, with possible loss of information",
Member

... into what?
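One possible answer to "into what?" is to name the target type in the message. Below is a hypothetical R-level sketch; the C++ collecters would use Rf_warning() instead, and the function name and message wording are assumptions:

```r
# Hypothetical sketch: drop an unsupported class, warning with both the
# original class and the type the value is coerced into.
strip_class_with_warning <- function(x) {
  cls <- oldClass(x)
  if (is.null(cls)) return(x)
  warning(sprintf("Coercing class %s to %s, with possible loss of information",
                  cls[1], typeof(x)), call. = FALSE)
  attributes(x) <- NULL   # drops the class and other attributes such as "units"
  x
}

strip_class_with_warning(as.difftime(5, units = "mins"))
```

For a difftime the message names "double" as the target; for an integer-based custom class it would name "integer".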

@krlmlr
Member

krlmlr commented Jan 26, 2017

Oh, don't worry about the merge conflict, I can resolve it.

zeehio added a commit to zeehio/dplyr that referenced this pull request Jan 27, 2017
prepare_table_with_coercion_rules is decoupled from the items variable as suggested in comments on tidyverse#2209.
@zeehio
Contributor Author

zeehio commented Jan 27, 2017

I have taken care of all your comments. The most noticeable change is that I decoupled prepare_table_with_coercion_rules from the items variable.

I am ready for the next round of comments if there still are any other left 👍

zeehio added a commit to zeehio/dplyr that referenced this pull request Jan 27, 2017
prepare_table_with_coercion_rules is decoupled from the items variable as suggested in comments on tidyverse#2209.
Member

@krlmlr krlmlr left a comment

Thanks. Almost ready ;-)

int_with_class = structure(4L, class = "int_with_class"),
num_with_class = structure(4.5, class = "num_with_class"))

pairs <- prepare_table_with_coercion_rules(items)
Member

Can you define items in prepare_table_with_coercion_rules() and call that function without arguments? This probably means that we need to add a new list column "item_pair" to the pairs data frame.

Contributor Author

Done
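A minimal sketch of the requested shape, assuming a reduced set of five items and a hypothetical function name: the items are defined inside the function, so it takes no arguments, and each row stores its pair of items in an item_pair list column.

```r
prepare_pairs_sketch <- function() {
  items <- list(logicalvalue = TRUE, logicalNA = NA, integer = 4L,
                double = 4.5, character = "b")
  pairs <- expand.grid(Var1 = names(items), Var2 = names(items),
                       stringsAsFactors = FALSE)
  # One list element per row: the two items named by Var1 and Var2.
  pairs$item_pair <- unname(Map(function(a, b) items[c(a, b)],
                                pairs$Var1, pairs$Var2))
  pairs
}

pairs <- prepare_pairs_sketch()
pairs$item_pair[[2]]   # items for Var1 = "logicalNA", Var2 = "logicalvalue"
```

Keeping the items and the pair table in one function means a test loop only needs the rows of pairs, without also passing items around.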

can_be_combined <- function(item1, item2,
class1, class2,
all_na1, all_na2,
known_to_dplyr1, known_to_dplyr2) {
Member

I'm not a big fan of local functions. Do they access variables other than their arguments?

Contributor Author

As you prefer, I have turned those local functions into conventional functions. 👍

Like you, I don't like functions that use variables out of their scope (variables other than their arguments). However, I find local functions practical when only one function calls them (a local function is like a local variable).

As this is just a matter of taste, I have avoided using local functions as you requested 😃

@zeehio
Contributor Author

zeehio commented Jan 27, 2017

I will finish on Monday evening. A romantic weekend in Rome awaits me.

Thanks for all the time and reviews!

…n rules

This commit fixes issue tidyverse#2203, allowing combine to deal with missing
values.

Additionally it restricts coercion rules, in particular coercing
logical to integer or double is not allowed anymore.

Other coercion cases will give warnings, if information may be
lost in the conversion, for instance when coercing integers with
classes, such as difftime.
This commit checks the coercion rules of many pairs of types:

 - logical values
 - logical missing value
 - character
 - integer
 - factor
 - double
 - complex
 - integer with class
 - double with class
 - Date
 - POSIXct
prepare_table_with_coercion_rules is decoupled from the items variable as suggested in comments on tidyverse#2209.
- Replace local functions
- Define variable inside function
@krlmlr krlmlr merged commit 0833d5b into tidyverse:master Feb 1, 2017
@krlmlr
Member

krlmlr commented Feb 1, 2017

Thanks!

@lock

lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019