New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as_tibble(matrix w/ attributes) copies too many attributes from matrix to its columns #184

Closed
BillDunlap opened this Issue Oct 21, 2016 · 11 comments

Comments

Projects
None yet
4 participants
@BillDunlap

BillDunlap commented Oct 21, 2016

The as_tibble method for matrices copies all the attributes of the input matrix to
each of its columns, then removes the 'names' and 'dim' attributes (see matrixToDataFrame.cpp:
copy_most_attributes). This can make illegal columns because, e.g., certain classes require certain attributes to be present. E.g., "mts" and "matrix" objects are expected to have a "dim" attribute so you can
get problems like the following

mts <- cbind(A=ts(c(1,1,2,2), start=2016, freq=4), B=ts(c(11,11,12,13), start=2016, freq=4))
tblMts <- tibble::as_tibble(mts)
str(attributes(tblMts[["A"]]))
   # List of 2
   # $ tsp  : num [1:3] 2016 2017 4
   # $ class: chr [1:3] "mts" "ts" "matrix"
duplicated(tblMts[["A"]])
   # Error in duplicated.matrix(tblMts[["A"]]) : character(0)

With TIBCO's TERR there are further problems with the same code because its SET_ATTRIB checks that the dim attribute is compatible with the object's length and R's SET_ATTRIB does not.

Calling out to the R-level "[" to extract the columns would solve the problem.

(tibble-1.2, R-3.3.0)

@krlmlr

This comment has been minimized.

Member

krlmlr commented Nov 17, 2016

Tibbles currently don't support matrix columns, so the above should give an error.

@krlmlr krlmlr added the bug label Nov 17, 2016

@BillDunlap

This comment has been minimized.

BillDunlap commented Nov 17, 2016

Note that I am not trying insert a matrix column into a tibble,
I am making a 2-column tibble from a 2-column time-series matrix.

I would expect each column to be a time-series vector or perhaps
a pure vector. as_tibble tries to produces such a thing but it
can put illegal combinations of attributes on the columns (since
it does not go through the "[" method for the time-series matrix).
The illegal combinations of attributes are not immediately apparent,
but will show up when a method assumes the input object is legal.

On Thu, Nov 17, 2016 at 7:07 AM, Kirill Müller notifications@github.com
wrote:

Tibbles currently don't support matrix columns, so the above should give
an error.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#184 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ALHWBZROuqOdVxl9jQ7-IkdT46IvfF_1ks5q_G2wgaJpZM4KdlI8
.

@krlmlr krlmlr added this to the 1.3.0 milestone Jan 4, 2017

@krlmlr krlmlr closed this in c64b760 Jan 7, 2017

krlmlr added a commit that referenced this issue Jan 9, 2017

Merge tag 'v1.2-15'
- Test R 3.1.3 and later in AppVeyor, using `Depends: R (>= 3.1.0)` in `DESCRIPTION`. Support for R 3.0.0 requires a `lazyeval` update (#189).
- An attempt to partially update a missing column now throws a clearer warning (#199).
- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declard class (`data.frame` vs. `tbl_df`) are ignored. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).
- Backtick `NA` names in printing (#206, #207, @jennybc).
@cfhammill

This comment has been minimized.

cfhammill commented Jan 10, 2017

@krlmlr I think this problem is more general, any time a matrix is given an extra class it breaks matrixToDataFrame.

e.g.

mat <- matrix(1:100, nrow = 10)
class(mat) <- c("special_matrix", "matrix")
as_tibble(mat) %>% mutate(colnum = 1:10) ## error
@krlmlr

This comment has been minimized.

Member

krlmlr commented Jan 10, 2017

I currently don't have a better solution except to add special handling for each matrix class that breaks, as I did with ts here.

@BillDunlap

This comment has been minimized.

BillDunlap commented Jan 10, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented Jan 10, 2017

@BillDunlap: Your strategy fails for an existing test for a matrix-like object of class Date, this is supposed to be converted to a tibble with date columns. @hadley: Do you remember the rationale behind this behavior?

@BillDunlap

This comment has been minimized.

BillDunlap commented Jan 10, 2017

@cfhammill

This comment has been minimized.

cfhammill commented Jan 10, 2017

Would it make more sense to special case the matrix-like objects that copy their attributes to the
columns? That way the general rule would be to either discard extra class/attribute information and
users would be free to arbitrarily subclass their matrices.

@hadley

This comment has been minimized.

Member

hadley commented Jan 10, 2017

The original motivation is probably that as.data.frame(date) doesn't do the right thing.

@hadley

This comment has been minimized.

Member

hadley commented Jan 10, 2017

And I have a vague memory that's important for tidyr.

@BillDunlap

This comment has been minimized.

BillDunlap commented Jan 10, 2017

krlmlr added a commit that referenced this issue Apr 1, 2017

Merge tag 'v1.3.0'
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment