`[` drops attributes of df #155

t-kalinowski · 2016-08-17T17:29:55Z

possibly related to tidyverse/dplyr#1984

library(tibble)
df <- tibble(x = 1:2, y = x)
attr(df, "along for the ride") <- "still here"

attr(df, "along for the ride")
# [1] "still here"

df <- df[names(df)]
attr(df, "along for the ride")
# NULL

this is my current workaround:

df[] <- df[names(df)]

The text was updated successfully, but these errors were encountered:

krlmlr · 2016-08-18T08:30:50Z

Hm, I'm not sure if we want to maintain all attributes. What if they somehow relate to the number of rows or columns?

A safer way would be to define your own class and implement [ by calling NextMethod() and copying over all attributes you want to.

What is your use case?

t-kalinowski · 2016-08-18T11:44:32Z

I'm not sure if my use case is broadly applicable. I'm trying to delicately (lazily?) add some functionality to existing code by adding an attribute "rider" that contains some additional information doesn't fit neatly into the dataframe. My thought was that the attribute would be safely ignored in all the code except where I then explicitly look for it.

The alternatives approach that comes to mind (besides a new class as you mention) is forcing my core data frame and new "rider" into a list at one point, then unpacking it midway in the code path into two variables that get passed as separate arguments to the functions I'm trying to add.

I'm not sure where attributes fit into the tidyverse. To me, user-defined attributes (i.e., not dim and names) are never added "by mistake", and therefore seem like something that should be added, dropped, or modified only when explicitly called for by the user, and should otherwise be carried over by all the "infrastructure" code.

But maybe that encourages bad coding. I honestly don't know.

krlmlr · 2016-08-18T12:54:24Z

As far as I understand tidy data, assuming there is more than one rider, you'd probably use a data frame with a "rider" and a "data" column, the "data" column would be a list of data frames. A row from that data frame naturally becomes a named list, which is in the spirit of the alternative approach you sketched.

From another angle, the attribute defines something like a "has-a" relationship: the tabular data also "has a rider". The list approach you suggested would be more like a compound object that has both tabular data and a rider. This looks tidier to me, but then I don't know the structure of your data.

As shown in in the dplyr issue you linked, dplyr verbs also don't seem to (always) maintain attributes. This guarantee would be a feature that needs to be specified, implemented, and thoroughly tested. I'd review a pull request, but to me this is currently not a top priority for tibble.

krlmlr · 2016-08-26T12:10:50Z

Closing for now, please comment or open a new issue for further questions.

t-kalinowski · 2016-08-26T16:10:39Z

Thanks for taking a look. I agree this is not high priority.

I was reminded of this recently when I updated readr and saw that readr tables now come with a new "spec" attribute. This addition is very analogous to my use case that led me to file this issue. I was adding an attribute that described how the tbl_df was generated (a legend, of sorts), and i was also grafting it onto existing code while trying to not break anything.

It's clear that "spec" as an attribute is the correct way for readr to add that information to the returned object. The alternatives of adding a new column to the returned data frame, or returning a list (compound object), are not appropriate. So the "spec" attribute from readr makes for a good and specific thought experiment. Should functions like [ or mutate() preserve it (and arbitrary, user-defined, attributes like it)? Or should the "spec" attribute be lost as soon as the data frame is modified in any way.

The more I think about it, the less of an opinion I have on what is the correct decision. It would just be nice if there was a consistent behavior across dplyr / purrr / tibble et al.

But again, I agree this is low priority right now. Thank you for your time and attention.

jennybc · 2016-08-26T22:03:23Z

define your own class and implement [

I basically have the same question asked here. I've been building S3 classes based on tibble with the main motivation being to create a print method. So the loss of nice printing (due to the loss of my class) just from looking at a subset of the rows is a source of aggravation.

But I'm also waiting to see if I can exert my will over printing by applying a class to specific columns and then allowing tibble printing to happen normally. Therefore, in my case, this is very linked to #144.

krlmlr · 2016-08-26T22:08:03Z

@t-kalinowski: In the readr case, I think the spec apply to the imported data, but not so much for any modifications of it. My feeling is that stripping the attribute isn't the worst thing to do here.

@jennybc: But who is removing your class attribute? Can you please show an example?

t-kalinowski · 2016-08-27T13:21:49Z

Some further thoughts:
Conceptually, I view this as related to #141 and #156, and driven by the same goal as Hmisc::label and base::comment (which is then respected by, for example, package:zoo).

All are driven by a desire to add richer annotations to the tidy data, and for those annotations to travel with the object.

Each of the examples above is limited to character strings unfortunately. It would be nice if annotations could also be lists.

What if they somehow relate to the number of rows or columns?

It might make sense to support attributes at both levels, annotations for individual vectors, and annotations for the data frame.
Also, annotations like this are primarily intended for human interpretation, and so the potential harm is limited.

jennybc · 2016-08-27T15:24:28Z

Here's an example of a tibble losing my class from row-subsetting:

library(tibble)

x <- structure(head(iris), class = c("jenny", "data.frame"))
class(x)
#> [1] "jenny"      "data.frame"
class(x[1:2, ])
#> [1] "jenny"      "data.frame"

xt <- structure(tibble::as_tibble(head(iris)),
                class = c("jenny", "tbl_df", "tbl", "data.frame"))
class(xt)
#> [1] "jenny"      "tbl_df"     "tbl"        "data.frame"
class(xt[1:2, ])
#> [1] "tbl_df"     "tbl"        "data.frame"

t-kalinowski · 2016-10-11T16:11:53Z

I just came across the last example in ?[.data.frame`` link and realized that using NextMethod() with my own class this is much simpler than I originally thought. I'll be doing that. Thanks for the pointer.

krlmlr · 2017-04-17T21:01:21Z

Can we agree that subclasses of tbl_df should provide their own [ method, like grouped_df does?

library(dplyr, warn.conflicts = FALSE)
group_by(iris, Species)[45:55,]
#> Source: local data frame [11 x 5]
#> Groups: Species [2]
#> 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#>           <dbl>       <dbl>        <dbl>       <dbl>     <fctr>
#> 1           5.1         3.8          1.9         0.4     setosa
#> 2           4.8         3.0          1.4         0.3     setosa
#> 3           5.1         3.8          1.6         0.2     setosa
#> 4           4.6         3.2          1.4         0.2     setosa
#> 5           5.3         3.7          1.5         0.2     setosa
#> 6           5.0         3.3          1.4         0.2     setosa
#> 7           7.0         3.2          4.7         1.4 versicolor
#> 8           6.4         3.2          4.5         1.5 versicolor
#> 9           6.9         3.1          4.9         1.5 versicolor
#> 10          5.5         2.3          4.0         1.3 versicolor
#> 11          6.5         2.8          4.6         1.5 versicolor

t-kalinowski · 2017-04-18T12:15:56Z

Over time, my opinion on this has completely reversed. I think the current behavior, where subclasses of tbl_df should provide their own [ method, is more preferable now. I'm ok to close this issue. Thank you for considering.

jennybc · 2017-04-18T15:48:03Z

I think that's fine. Then I think it's important to document this well. Maybe there could be a vignette on making your own tibble-based classes. And it could list the minimal set of methods you should implement, with an example.

krlmlr · 2018-03-01T07:45:00Z

I'm happy to call sloop::reconstruct() in [.tbl_df once we standardize on it.

Deleetdk · 2018-03-09T15:57:41Z

Any news on this? Many dplyr verbs drop classes too and it seems silly to have to implement a custom version of every single dplyr verb as well as the base subsetting ones to avoid custom classes being dropped. My case use is similar to @jennybc, namely that I'm building on top of tibble to add a custom print and some light data integrity testing, but I want it to stay functional with all the usual data frame functions and my data is tidy.

There's a lot of related issues and questions (some of which relate to attributes generally):

krlmlr · 2018-03-12T08:24:00Z

Thanks for the reminder and for collecting the pointers. We need sloop before we can work on it. tidyverse/dplyr#3259 is the only open dplyr issue that covers this problem. Please upvote (add 👍 to the first post with the "smiley" icon).

vspinu · 2018-03-18T18:03:29Z

Is there (will it be) an easy solution without requiring dplyr, tibble, sloop etc? Maybe an attributes marker "leave-my-attributes-alone" or a simple similarly-minded method?

My problem is exactly as the one described by @jennybc . I add an extra class to a data.frame to customize printing and a bunch of non-structural attributes.

vspinu · 2018-03-18T18:39:58Z

Can we agree that subclasses of tbl_df should provide their own [

Would the default that none of the attributes are vectorized on columns or rows make more sense? Then, only the subclasses dealing with such attributes would need to provide the [ method.

krlmlr · 2018-03-19T00:22:18Z

I like the idea of a special attribute that contains the names of all attributes that are safe to copy over. This can be used later with a solution based on reconstruct(), too.

t-kalinowski · 2018-03-19T01:16:22Z

I like the idea, but mostly as a mechanism to more precisely control attribute-copying behavior. However, I would prefer if the default was to copy over all attributes, even when this attribute is missing.
In any case, it made me think of this:

vspinu · 2018-03-19T09:58:12Z

The exclusion declaration - list all attributes which should't be copied - seems more natural to me as well. But I am not sure how useful would it be. Subclasses which deal with such attributes will probably have to declare their [ method anyway, unless there is a provision for row-subsetable-attributes and col-subsetable-attributes attributes.

t-kalinowski · 2018-03-19T21:53:49Z

hmm... if the attribute so neatly fits into being row-subsettable, why would it be an attribute and not an additional column? Also, if it neatly fits into being column subsettable, why would the attribute be set on the whole tibble and not on the column itself? Are you using attributes to implement user-hidden columns?

vspinu · 2018-03-19T23:03:57Z

I am not using attributes vectorized on rows or columns, but from what I see others do and, if I understand correctly, this is the foremost reason why attributes are dropped by dplyr verbs in the first place.

attribute and not an additional column?

Because it's internal to the class representation and do not constitute user level data.

krlmlr · 2018-03-21T22:43:25Z

I think e.g. sf is doing the right thing by keeping the geometry in a regular column. Supporting "special" attributes that act as an extra column or row feels out of scope.

The opt-in interface (attributes³) feels like the safer option, though. This only affects packages that subclass tibbles, I'm happy to make this easier but I'd rather not trade this for the safety of just stripping all attributes if in doubt.

Deleetdk · 2018-03-21T23:01:43Z

In my case, the information stored in attributes usually has a different structure than the main data. Often it's a single row data frame with other rows than the main one, used for various metadata. E.g. a chromosomes data frame has 24 chromosomes (rows) and their metrics (e.g. copy number, in columns), while the attribute has some genome level information (e.g. ISCN karyotype).

hadley · 2018-09-07T15:20:24Z

As a short-term solution, I think the functions in tibble that create new tibbles from an existing tibble (e.g. [.tbl_df) should copy over all attributes — and I think the short-term if that doesn't work for your class, you'll need to override the method. (In other words, the default will change from dropping all attributes to preserving all attributes). This also allows random additional attributes to be carried along for the ride.

In the long-term, these function should call sloop::reconstruct() (which is likely to move into the vctrs package and get a new name); that way people extending tibble with row or col based attributes will only need to override a single method.

hadley · 2018-09-07T15:21:14Z

This issue can be closed a PR. We should then open a new issue that lists all functions that create a tibble, and that will need to use reconstruct() in the future.

hadley · 2018-09-28T15:07:00Z

Note that sloop::reconstruct() is now vctrs::vec_restore()

github-actions · 2020-12-09T00:40:03Z

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

krlmlr closed this as completed Aug 26, 2016

krlmlr reopened this Aug 26, 2016

krlmlr added this to the 1.3 milestone Aug 26, 2016

jennybc mentioned this issue Sep 7, 2016

print.tbl_df needs extra args #170

Closed

tinu-schneider mentioned this issue Oct 14, 2016

class of element in data_frame is dropped when subsetting with [ tidyverse/dplyr#2172

Closed

hadley mentioned this issue Jul 5, 2017

Clarify how to extend tibbles #275

Closed

krlmlr added the category:interface label Mar 1, 2018

krlmlr removed this from the 1.4.0 milestone Sep 28, 2018

krlmlr added this to the 1.5.0 milestone Oct 5, 2018

This was referenced Oct 13, 2018

Copy most attributes in [.tbl_df() #501

Closed

vec_restore_tbl_df() implementation #512

Merged

jennybc mentioned this issue Oct 19, 2018

add_column loses rows if applied to a tibble with custom class & [ method for the class #511

Closed

krlmlr closed this as completed in #512 Oct 24, 2018

github-actions bot locked and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`[` drops attributes of df #155

`[` drops attributes of df #155

t-kalinowski commented Aug 17, 2016 •

edited by krlmlr

Loading

krlmlr commented Aug 18, 2016

t-kalinowski commented Aug 18, 2016

krlmlr commented Aug 18, 2016

krlmlr commented Aug 26, 2016

t-kalinowski commented Aug 26, 2016 •

edited

Loading

jennybc commented Aug 26, 2016 •

edited

Loading

krlmlr commented Aug 26, 2016

t-kalinowski commented Aug 27, 2016 •

edited

Loading

jennybc commented Aug 27, 2016

t-kalinowski commented Oct 11, 2016

krlmlr commented Apr 17, 2017

t-kalinowski commented Apr 18, 2017 •

edited

Loading

jennybc commented Apr 18, 2017

krlmlr commented Mar 1, 2018

Deleetdk commented Mar 9, 2018

krlmlr commented Mar 12, 2018

vspinu commented Mar 18, 2018 •

edited

Loading

vspinu commented Mar 18, 2018

krlmlr commented Mar 19, 2018

t-kalinowski commented Mar 19, 2018 •

edited

Loading

vspinu commented Mar 19, 2018

t-kalinowski commented Mar 19, 2018 •

edited

Loading

vspinu commented Mar 19, 2018

krlmlr commented Mar 21, 2018

Deleetdk commented Mar 21, 2018

hadley commented Sep 7, 2018

hadley commented Sep 7, 2018

hadley commented Sep 28, 2018

github-actions bot commented Dec 9, 2020

[ drops attributes of df #155

[ drops attributes of df #155

Comments

t-kalinowski commented Aug 17, 2016 • edited by krlmlr Loading

krlmlr commented Aug 18, 2016

t-kalinowski commented Aug 18, 2016

krlmlr commented Aug 18, 2016

krlmlr commented Aug 26, 2016

t-kalinowski commented Aug 26, 2016 • edited Loading

jennybc commented Aug 26, 2016 • edited Loading

krlmlr commented Aug 26, 2016

t-kalinowski commented Aug 27, 2016 • edited Loading

jennybc commented Aug 27, 2016

t-kalinowski commented Oct 11, 2016

krlmlr commented Apr 17, 2017

t-kalinowski commented Apr 18, 2017 • edited Loading

jennybc commented Apr 18, 2017

krlmlr commented Mar 1, 2018

Deleetdk commented Mar 9, 2018

krlmlr commented Mar 12, 2018

vspinu commented Mar 18, 2018 • edited Loading

vspinu commented Mar 18, 2018

krlmlr commented Mar 19, 2018

t-kalinowski commented Mar 19, 2018 • edited Loading

vspinu commented Mar 19, 2018

t-kalinowski commented Mar 19, 2018 • edited Loading

vspinu commented Mar 19, 2018

krlmlr commented Mar 21, 2018

Deleetdk commented Mar 21, 2018

hadley commented Sep 7, 2018

hadley commented Sep 7, 2018

hadley commented Sep 28, 2018

github-actions bot commented Dec 9, 2020

`[` drops attributes of df #155

`[` drops attributes of df #155

t-kalinowski commented Aug 17, 2016 •

edited by krlmlr

Loading

t-kalinowski commented Aug 26, 2016 •

edited

Loading

jennybc commented Aug 26, 2016 •

edited

Loading

t-kalinowski commented Aug 27, 2016 •

edited

Loading

t-kalinowski commented Apr 18, 2017 •

edited

Loading

vspinu commented Mar 18, 2018 •

edited

Loading

t-kalinowski commented Mar 19, 2018 •

edited

Loading

t-kalinowski commented Mar 19, 2018 •

edited

Loading