New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove support for row names #114

Closed
krlmlr opened this Issue Jul 3, 2016 · 22 comments

Comments

Projects
None yet
5 participants
@krlmlr
Copy link
Member

krlmlr commented Jul 3, 2016

Complicates the code. Should first warn when accessing (r/w) row names of a data frame, and then (in a following version) discard them when coercing.

@zeehio

This comment was marked as off-topic.

Copy link

zeehio commented Sep 28, 2016

Do you mean you are planning to remove tibble::rownames_to_column?

Given a matrix with row names that I want to convert to a tibble, I am currently using:

tib <- as.data.frame(mymatrix) %>% tibble::rownames_to_column("myrownames")

Would you think it is better that I go back to the base package for this simple stuff?

The main advantage of tibble is that it provides a rethought interface for creating data.frames with nice defaults, and that is very convenient. The javascript left_pad chaos taught us that using a dependency just for the sake of convenience has its risks. I understand that preserving API compatibility can be a burden and therefore I would appreciate to know if I can rely on the tibble API to be stable (until I found this issue I thought I could rely on it since tibble reached version 1.0).

@krlmlr

This comment was marked as resolved.

Copy link
Member

krlmlr commented Sep 28, 2016

No, rownames_to_column() will stay, but I was planning for it to work only for plain data frames. The code you show would then raise a warning -- indirectly, because rownames_to_column() accesses row names of a tibble. The documentation for rownames_to_column() does mention that "tibbles can have row names but they are removed on subsetting". I'd prefer to gently change this into "tibbles cannot have row names" to simplify the code base, if this happens your code would not work as expected.

So, how to make a gentle transition? We could support a rownames_column argument to as_tibble(), and first give messages before moving over to real warnings if a tibble still has row names after conversion. Would that work for you?

@zeehio

This comment was marked as resolved.

Copy link

zeehio commented Sep 28, 2016

That would be great. Having understood what you meant, I could not agree more with you. It is much bettter to remove the row names from tibbles.

It would be great to have the rownames_column argument in as_tibble (at least both for matrix and data.frame classes).

Thank you for your quick reply, I am looking forward to using it! 👍

@flying-sheep

This comment was marked as resolved.

Copy link

flying-sheep commented Nov 16, 2016

how do go back to rownames? some APIs need them and i’d like to have an easy way to finish a pipeline by converting the tibble into a dataframe or matrix while consuming a column into rownames

@krlmlr

This comment was marked as off-topic.

Copy link
Member

krlmlr commented Nov 16, 2016

You can use . %>% as.data.frame %>% column_to_rownames(...). This also protects you from APIs that expect drop = TRUE when subsetting via [.

@flying-sheep

This comment was marked as resolved.

Copy link

flying-sheep commented Nov 16, 2016

i meant more your new world with an one-step conversion from rownamed df to rowname-less tibble.

i think if you do that, there should be a reverse operation. (one step conversion from rowname-less tibble to rownamed df)

@michaellevy

This comment was marked as outdated.

Copy link

michaellevy commented May 10, 2018

@krlmlr I can't imagine this is news to you, but this invokes substantial pain. Do you really want tibbles to be incompatible with any package that uses rownames? I have a package that uses tibbles, and I just added support for glmnet, and as result the "setting rownames" warning appears all over the place, so now I need to go make sure that if we do lose rowname support it won't break everything, and I have to deal with the warnings that show up all over my tests. Are you open to maybe just leaving support for rownames in tibbles?

@krlmlr

This comment was marked as outdated.

Copy link
Member

krlmlr commented May 10, 2018

Please use . %>% as.data.frame() %>% column_to_rownames(...) before passing your tibbles to code that uses row names, and . %>% rownames_to_column() %>% as_tibble() after. I'm happy to provide shortcut functions that allow doing this in one step, can you think of a good name?

@michaellevy

This comment was marked as outdated.

Copy link

michaellevy commented May 10, 2018

Sure, but then you're asking the entire R community to remember which packages use tibbles and which packages use rownames and forcing us to convert between them every time there's a junction. That's a lot of brain cycles and lines of code for what doesn't appear to be any major benefit. What do you see as the compelling reason for not supporting rownames that outweighs that cost?

@krlmlr

This comment was marked as outdated.

Copy link
Member

krlmlr commented May 18, 2018

It simplifies the code in tibble, dplyr, tidyr, and quite a few other tidyverse packages. It's an opinionated decision, but at this point I really have to ask to use data frames if you need row names. Code that expects data frames is also likely to expect a default of drop = TRUE for [.

Can you point me to the package where you are now seeing the "row names" warnings?

@michaellevy

This comment was marked as resolved.

Copy link

michaellevy commented May 18, 2018

We've silenced them, but this is the package I'm working on: https://github.com/HealthCatalyst/healthcareai-r/

I didn't realize it simplified so much else. I take your point that there are other differences between data frames and tibbles, though I suspect pushing the community to not use drop = TRUE would be a much lighter lift than not using rownames (and save bugs, whereas not using rownames doesn't seem to have that benefit).

Anyway, thanks for indulging me in this. To answer your previous question, how about as_data_frame(tbl, rownames_to = new_col_name)?

@krlmlr

This comment has been minimized.

Copy link
Member

krlmlr commented May 19, 2018

Thanks. An argument to as_tibble() (and to as.data.frame.tibble()) seems like a simple solution.

@hadley

This comment was marked as outdated.

Copy link
Member

hadley commented Jul 24, 2018

I think warning on access is too aggressive, but we could certainly make row.names.tibble() always return 1:nrow(x)

@krlmlr

This comment was marked as resolved.

Copy link
Member

krlmlr commented Jul 29, 2018

@hadley: Agree that warning on access can't work. Is there a difference between your suggestion and stripping row names altogether?

@hadley

This comment was marked as outdated.

Copy link
Member

hadley commented Jul 30, 2018

It depends on whether we re-strip on [ or not

@krlmlr

This comment was marked as resolved.

Copy link
Member

krlmlr commented Jul 30, 2018

We're stripping on [ now:

attr(result, "row.names") <- .set_row_names(nr)

Currently, the following works:

tibble::as_tibble(mtcars)["Mazda RX4", ]
#> # A tibble: 1 x 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4

Created on 2018-07-30 by the reprex package (v0.2.0).

We'd lose that if we strip on creation (and subsetting), but overriding row names would keep that behavior (until the first subsetting which strips row names). To me, the current behavior is inconsistent, but changing it might break other things. I'm leaning towards a warning once per session, and a compatibility option with pkgconfig.

@krlmlr krlmlr added this to the 1.5.0 milestone Aug 20, 2018

@hadley

This comment was marked as outdated.

Copy link
Member

hadley commented Oct 6, 2018

Can we leave any major changes to row names to the next release? I'm not convinced that we've fully thought this through.

@krlmlr

This comment was marked as outdated.

Copy link
Member

krlmlr commented Oct 6, 2018

We now have a rownames argument for the relevant as_tibble() methods, with a default taken from pkgconfig. We still need to coerce to plain data frames in column_to_rownames(). I'd like to try this with the revdeps, we can always pull back by changing the default for the rownames argument.

@krlmlr

This comment was marked as outdated.

Copy link
Member

krlmlr commented Oct 6, 2018

Do you have any specific concerns?

@hadley

This comment was marked as outdated.

Copy link
Member

hadley commented Oct 7, 2018

That this is a large change, and I have no confidence that we have correctly analysed the problem and solution.

@krlmlr

This comment was marked as outdated.

Copy link
Member

krlmlr commented Oct 12, 2018

Do we have a corresponding issue in vctrs?

@hadley

This comment has been minimized.

Copy link
Member

hadley commented Oct 22, 2018

The only new behaviour in this version is that as_tibble() now strips rownames (as it probably always should have). You can get the previous behaviour by setting rownames = NA (or pkgconfig::set_config("tibble::rownames", NA) to affect all code in a package), and there's a convenient way to turn rownames into a variable: rownames = "new_variable_name". This seems like a small change and unlikely to cause problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment