Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bind_rows() with columns of class "table" #2406

Closed
krlmlr opened this issue Feb 4, 2017 · 5 comments · Fixed by #2425
Closed

bind_rows() with columns of class "table" #2406

krlmlr opened this issue Feb 4, 2017 · 5 comments · Fixed by #2425
Labels
bug an unexpected problem or unintended behavior

Comments

@krlmlr
Copy link
Member

krlmlr commented Feb 4, 2017

now works, with a warning. Is this intended? CC @zeehio.

Stable version, and 8aa1bdb (the one before #2209):

> bind_rows(data_frame(a = table(1:3)), data_frame(a = table(4:6)))
# A tibble: 6 × 1
      a
  <int>
1     1
2     1
3     1
4     1
5     1
6     1

Current master, and 0833d5b (which contains #2209):

> bind_rows(data_frame(a = table(1:3)), data_frame(a = table(4:6)))
# A tibble: 6 × 1
            a
  <S3: table>
1           1
2           1
3           1
4           1
5           1
6           1
Warning message:
In bind_rows_(x, .id) :
  Vectorizing 'table' elements may not preserve their attributes
@zeehio
Copy link
Contributor

zeehio commented Feb 4, 2017

It is intended, but we could improve how table is handled to prevent the warning if desired. Here is an explanation of the reasons:

A table is an integer vector with several attributes:

  • The attribute dim, with the dimensionality
  • The attribute dimnames with the name of each component for each dimension
  • The attribute class, with value "table"

There is not (and there was not) any specific code in dplyr's inst/include/dplyr/Collecter.h module for handling table objects so they are treated as any other "custom integer class".

Before #2209, when two custom integer classes were combined they were treated as conventional integers, with POSIXct, Date and factor as the only exceptions (see here). This means that the attributes of all custom integer classes were dropped. While this works as expected for some classes (e.g. table), it can also be dangerous. See for instance the following dummy example where the "unit" attribute would be silently dropped in dplyr-0.5.0:

some_lengths <- structure(c(1, 2, 3), unit = "m", class = "num_with_unit")
more_lengths <- structure(c(1, 2, 3), unit = "cm", class = "num_with_unit")
bind_rows(data_frame(x=some_lengths), data_frame(x=more_lengths))
# A tibble: 6 × 1
      x
  <dbl>
1     1
2     2
3     3
4     1
5     2
6     3

After #2209 the code works, attributes are still dropped but a warning is given. This warning can help the user to guess that their "units" are lost and prevent further errors (see here how the default case for objects with a class gives a warning).

Adding table as another special case (with POSIXct, Date and factor) seems easy to do and I can submit a PR if you want. I believe that the general case should raise a warning anyway.

Thinking on vctrs, I wonder if there would be an approach for allowing other packages to define how the attributes should be combined. For instance, in my dummy example above I would like to define a method for num_with_unit that would allow me to convert the units as needed or raise an error if the units were incompatible.

I hope I have given a reasonable explanation. Comments are welcome and I am willing to help if needed, in the little spare time I currently have (I am writing my PhD thesis these months)

Best

@krlmlr
Copy link
Member Author

krlmlr commented Feb 8, 2017

Thanks. I see that after #2209 the "class" attribute remains in the result. I'm not sure this is always safe, but stripping it away is wrong e.g. for integer64 objects. While we agreed to postpone further discussion to vctrs, changing the behavior feels dangerous to me and might give maintenance overhead. (A warning feels ok, though.)

I think we should revert to the old behavior and strip "unknown" classes, and perhaps add support for integer64 (#1092). @hadley: Could you please comment?

@hadley
Copy link
Member

hadley commented Feb 9, 2017

That sounds reasonable to me

@krlmlr krlmlr added the bug an unexpected problem or unintended behavior label Feb 10, 2017
@krlmlr
Copy link
Member Author

krlmlr commented Feb 10, 2017

@zeehio: Would you like to contribute a fix?

@zeehio
Copy link
Contributor

zeehio commented Feb 10, 2017

Yes, I will submit it in the following days 👍

hadley pushed a commit that referenced this issue Feb 26, 2017
* Combine strips unknown classes. Support integer64.

* Fixes suggested by @krlmlr

* More fixes suggested by @krlmlr

* Rely on the STL to simplify code per @hadley suggestion

* Factor out inherits_from.

Fixes #2406.
@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants