Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

when a column has not classes but has the object bit set, pretend it … #3394

Merged
merged 8 commits into from Mar 12, 2018

Conversation

romainfrancois
Copy link
Member

…has not. closes #3349

This is pretty straightforward. stats::model.frame does weird stuff, and broom was an innocent bystander. Just added some defensive code.

@MichaelChirico
Copy link
Contributor

Perhaps can be deprecated once dependency on R passes if/when the underlying R bug is fixed?

Certainly in that case the test here will fail; given that, according to the is.object documentation, this situation should never arise, it seems it'll be difficult to create a test.

@romainfrancois
Copy link
Member Author

Yeah. We can leave the defensive code just in case, but disable the test once the issue is fixed upstream in R.

@MichaelChirico
Copy link
Contributor

sounds good. maybe add a comment to tag for later readers? not sure the best pipeline for dealing with something minor like this that will (hopefully) be hardly visible.

@@ -571,6 +571,13 @@ test_that("bind_rows() accepts lists of dataframe-like lists as first argument",
expect_identical(bind_rows(list(list(a = 1, b = 2))), tibble(a = 1, b = 2))
})

test_that("columns that are OBJECT but have NULL class are handled gracefully (#3349)", {
mod <- lm(y~ ., data=freeny)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to follow http://style.tidyverse.org/, the styler package gets most of it right with style_pkg(strict = FALSE) .

if (!is_class_known(x)) {
SEXP classes = Rf_getAttrib(x, R_ClassSymbol);
SEXP classes = Rf_getAttrib(x, R_ClassSymbol);
if (!Rf_isNull(classes) && !is_class_known(x)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not implement this change in is_class_known() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoiding to do Rf_getAttrib(x, R_ClassSymbol) twice.

while we are here, I don't see the point of

if (!OBJECT(x)) {
    return false;
  }

in inherits_from which is only called when OBJECT(x) is set.

And the std::set_intersection looks a bit overkill, this could exit earlier, as soon as there is at least one class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this avoid querying the class attribute twice? I'm fine with removing the classes variable in this function, fix is_class_known(), and only query the classes when constructing the warning (we don't need to optimize for the failure case).

Happy to look at the other problems you mentioned in a separate PR/issue.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move checks into inherits_from().

I agree that the inheritance check in that function can be improved, if I were to reimplement it today I would:

  • let known_classes be a set (or maybe sorted array) of CHARSXP
  • iterate manually over all elements of the "class" vector and do a lookup in the set/array

But I'm not sure it has an impact on the overall performance. Let's not spend too much time on that.

@@ -34,7 +34,7 @@ static bool is_class_known(SEXP x) {
known_classes.insert("integer64");
known_classes.insert("table");
}
if (OBJECT(x)) {
if (OBJECT(x) && !Rf_isNull(Rf_getAttrib(x, R_ClassSymbol))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to understand the problem here. If you want to avoid querying the class symbol and the OBJECT() bit twice (and also clean up the code a bit), maybe this check could be added to line 17 in this file? I suspect inherits_from() will fail if the class attribute is NULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inherits_from and is_class_known are only used once each anyway. it all looks kind of confusing. But I feel that this patch is good enough for fixing the problem, and we can revisit the code later.

I suspect inherits_from() will fail if the class attribute is NULL.

yes that was part of the initial problem.

@krlmlr
Copy link
Member

krlmlr commented Mar 12, 2018

Thanks! Could you please resolve conflicts and merge?

@romainfrancois romainfrancois merged commit 895ab0b into master Mar 12, 2018
@romainfrancois romainfrancois deleted the fix-issue-3349 branch March 12, 2018 08:47
@lock
Copy link

lock bot commented Sep 8, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bind_rows: Expecting a string vector: [type=NULL; required=STRSXP].
3 participants