New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as_tibble.matrix should keep rownames attribute #288

Closed
ClaytonJY opened this Issue Aug 2, 2017 · 5 comments

Comments

Projects
None yet
2 participants
@ClaytonJY

ClaytonJY commented Aug 2, 2017

Currently, as_tibble.matrix() throws away rownames:

library(tibble)

mtmatrix <- as.matrix(mtcars)
rownames(mtmatrix)
#>  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
#>  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
#>  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
#> [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
#> [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
#> [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
#> [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
#> [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
#> [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
#> [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
#> [31] "Maserati Bora"       "Volvo 142E"
rownames(as_tibble(mtmatrix))
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
#> [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
#> [29] "29" "30" "31" "32"

This is inconsistent with as_tibble.data.frame, which makes sure to keep rownames

all.equal(
  rownames(mtcars),
  rownames(as_tibble(mtcars))
)
#> [1] TRUE

If a user has a rownamed matrix and wants a tibble with those rownames in it (e.g. confusion matrices from the randomForest package), they have to convert to a base data.frame first

mtmatrix %>%
  as.data.frame() %>%
  as_tibble() %>%
  rownames_to_column("rname")

Fixing this bug would remove the need for the as.data.frame() call, and make the steps match those for doing the same with a data.frame, e.g.

mtcars %>%
  as_tibble() %>%
  rownames_to_column("rname")

This only saves one step, but standardizing the tibble interface across matrices and data.frames where possible seems like a worthy UX improvement.

Session Info

Used newest github version to make sure this wasn't fixed already

devtools::session_info("tibble")
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.1 (2017-06-30)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2017-08-02
#> Packages ------------------------------------------------------------------
#>  package * version    date       source                           
#>  Rcpp      0.12.12    2017-07-15 cran (@0.12.12)                  
#>  rlang     0.1.1      2017-05-18 CRAN (R 3.4.0)                   
#>  tibble  * 1.3.3.9001 2017-08-02 Github (tidyverse/tibble@12767ae)
@ClaytonJY

This comment has been minimized.

ClaytonJY commented Aug 2, 2017

Looks like this conversion is done in C-land with matrixToDataFrame; does the fix need to happen in there?

That's a bit outside my skills, but I can try re-setting the attribute after the conversion, inside as_tibble.matrix, and put that in a PR with a test or two.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 12, 2017

Thanks. I agree that there is some inconsistency, but I'd rather not support row names in tibbles altogether. We may want to give a warning at some point, so there seems to be no point in enabling this functionality now.

We may want to consider adding a key argument instead, once we figure out support for keys (tidyverse/dplyr#1792), a special value would then convert the row names to a column and also work for matrices.

@ClaytonJY

This comment has been minimized.

ClaytonJY commented Sep 12, 2017

@krlmlr I think the key idea is really interesting, and something I've definitely missed from data.table, so it's exciting to see work in that direction.

I agree that tibble shouldn't support rownames, but I think it's important to make it easy for users to work with data that has rownames through no fault of theirs. That's why I made the PR, to make it as easy to get matrix rownames into a column as it is for data.frames.

Even better might be to have the as_tibble methods for matrix/data.frames accept a "rownames" argument, which will do what rownames_to_column does up-front, thus avoiding the need to keep the rownames attribute on the tibble at all. Is that what you're thinking the keys will allow for?

@ClaytonJY

This comment has been minimized.

ClaytonJY commented Sep 12, 2017

If that wouldn't be handled best with new keys functionality, I'd be happy to write up a PR to modify as_tibble.matrix and as_tibble.data.frame to do the rownames-to-column conversion up front!

@krlmlr

This comment has been minimized.

Member

krlmlr commented Oct 4, 2017

as_tibble(rownames = "id") works now for matrices and data frames.

krlmlr added a commit that referenced this issue Nov 3, 2017

Merge tag 'v1.3.4.9002'
- In `glimpse()`, compute `type_sum()` from data frame for dbplyr compatibility (#328).
- `as_tibble.matrix()` repairs column names.
- Compatible with R 3.1 (#323).
- Make add_case() and alias for add_row() (#324, @LaDilettante).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Tibbles now support character subsetting (#312).
- `as_tibble()` gains `rownames` argument (#288, #289).
- Remove Rcpp dependency (#313, @patperry).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- Fixed width for word wrapping of the extra information (#301).

krlmlr added a commit that referenced this issue Dec 28, 2017

Merge tag 'v1.4.1'
New formatting
--------------

The new pillar package is now responsible for formatting tibbles. Pillar will try to display as many columns as possible, if necessary truncating or shortening the output. Colored output highlights important information and guides the eye. The vignette in the tibble package describes how to adapt custom data types for optimal display in a tibble.

New features
------------

- Make `add_case()` an alias for `add_row()` (#324, @LaDilettante).
- `as_tibble()` gains `rownames` argument (#288, #289).
- `as_tibble.matrix()` repairs column names.
- Tibbles now support character subsetting (#312).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).

Bug fixes
---------

- Improved compatibility with remote data sources for `glimpse()` (#328).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Fixed width for word wrapping of the extra information (#301).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).

Internal changes
----------------

- Reexporting `has_name()` from rlang, instead of forwarding, to avoid warning when importing both rlang and tibble.
- Compatible with R 3.1 (#323).
- Remove Rcpp dependency (#313, @patperry).

@krlmlr krlmlr referenced this issue Mar 1, 2018

Closed

row names #123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment