Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join warnings #2728

Merged
merged 4 commits into from May 3, 2017
Merged

Join warnings #2728

merged 4 commits into from May 3, 2017

Conversation

@hadley
Copy link
Member

@hadley hadley commented May 2, 2017

The main purpose of this PR is to switch from an error to a warning when the attributes don't match. I also used this as an opportunity to pull down the join variable names so that we can start to make better warnings and errors.

@hadley hadley requested a review from krlmlr May 2, 2017
Copy link
Member

@krlmlr krlmlr left a comment

Thanks. I think this could benefit from a new Column class that encapsulates a SymbolString and an RObject, to avoid these huge argument lists.

Loading

src/join.cpp Outdated

if (var_left == var_right) {
std::string var_utf8 = var_left.get_utf8_cstring();
Rf_warningcall(R_NilValue,
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unsafe with options(warn = 2), no stack unwinding will occur. But Rcpp::warning() isn't any safer :-(

Loading

src/join.cpp Outdated
@@ -33,6 +33,31 @@ int count_attributes(SEXP x) {
return n;
}

void warn_bad_var(const SymbolString& var_left, const SymbolString var_right,
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to use a reference for var_right here and elsewhere (also name_right)?

Loading

Copy link
Member Author

@hadley hadley May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, thanks.

Loading

src/join.cpp Outdated
@@ -51,8 +76,10 @@ void check_attribute_compatibility(SEXP left, SEXP right) {
return;
}

if (n_left != n_right)
stop("attributes of different sizes");
if (n_left != n_right) {
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should extract a function and just write if (attributes_different(left, right)) warn_bad_var(...); .

Loading

Copy link
Member Author

@hadley hadley May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and it's probably better to do that at the R level and use all.equal(). Something like this:

attributes_equal <- function(x, y) {
  attr_x <- attributes(x)
  attr_x <- attr_x[sort(names(attr_x))]
  
  attr_y <- attributes(y)
  attr_y <- attr_y[sort(names(attr_y))]
  
  isTRUE(all.equal(attr_x, attr_y)
}

Loading

@hadley hadley mentioned this pull request May 2, 2017
if (!is<bool>(test) || !as<bool>(test)) {
stop("attributes are different");
// Otherwise rely on R function based on all.equal
static Function attr_equal = Function("attr_equal", Environment::namespace_env("dplyr"));
Copy link
Member Author

@hadley hadley May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a more accurate approach than previously which would fail with an error if there was an attribute on the LHS but not the RHS. @krlmlr what do you think?

Loading

Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the R code looks much cleaner, too. The one-time call into R should't be too much of an issue performance-wise.

Loading

hadley added 3 commits May 2, 2017
Fixes #2701

Currently emits too warnings, but I can't figure out where that's coming from and it's not very high priority.
Copy link
Member

@krlmlr krlmlr left a comment

This looks good, we can merge and clean up later (e.g., Column class).

Loading

src/join.cpp Outdated
std::string left_utf8 = var_left.get_utf8_cstring();
std::string right_utf8 = var_right.get_utf8_cstring();
Rf_warningcall(R_NilValue,
"Variable `%s`/`%s` %s",
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we use "Column `xxx`" in error messages?

Loading

if (!is<bool>(test) || !as<bool>(test)) {
stop("attributes are different");
// Otherwise rely on R function based on all.equal
static Function attr_equal = Function("attr_equal", Environment::namespace_env("dplyr"));
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the R code looks much cleaner, too. The one-time call into R should't be too much of an issue performance-wise.

Loading

c = 2:1
)

expect_warning(out1 <- left_join(df1, df2, by = "a"), "different attributes")
Copy link
Member

@krlmlr krlmlr May 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to check the precise wording here? Worked well for the error messages.

Loading

@hadley hadley merged commit 25ed407 into master May 3, 2017
0 of 4 checks passed
Loading
@hadley hadley deleted the join-errors branch Jun 22, 2018
@lock
Copy link

@lock lock bot commented Dec 19, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

Loading

@lock lock bot locked and limited conversation to collaborators Dec 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants