tibble preserves names attribute of added vector in column (and as.data.frame preserves as well) #837

Closed
ghost opened this issue Jan 12, 2021 · 7 comments · Fixed by #927

ghost commented Jan 12, 2021

I wonder whether a tibble should preserve a vector's names attribute when the vector is added as a column. A base R data frame drops this attribute when a column is added.

?tibble might hint that this is intended behaviour, but the wording does not make it clear ("tibble() is much lazier than base::data.frame() in terms of transforming the user's input. Character vectors are not coerced to factor. List-columns are expressly anticipated and do not require special tricks. Column names are not modified.").

The example below shows the difference in behaviour.

I would "vote" for mimicking base R's data frame behaviour here.

At the very least, converting back to a data frame via as.data.frame.tbl_df should strip the names attribute from the columns.

library(tibble)
a <- 1:5
b <- 6:10
names(b) <- letters[1:5]

df <- data.frame(a, a * 2)
tb <- tibble(df)

df$b <- b
tb$b <- b

lapply(df, names) # data frame's columns do not have names
#> $a
#> NULL
#> 
#> $a...2
#> NULL
#> 
#> $b
#> NULL
lapply(tb, names) # tibble's column b preserved names
#> $a
#> NULL
#> 
#> $a...2
#> NULL
#> 
#> $b
#> [1] "a" "b" "c" "d" "e"

df_tb <- as.data.frame(tb)
lapply(df_tb, names) # data frame's column b has names - illegal data frame?
#> $a
#> NULL
#> 
#> $a...2
#> NULL
#> 
#> $b
#> [1] "a" "b" "c" "d" "e"

krlmlr commented Feb 14, 2021

Thanks. This has been added in tibble 3.0.0 as an experimental feature.


ghost commented Feb 14, 2021

Thank you for clarifying that this is intended for a tibble. What about as.data.frame.tbl_df, though? Shouldn't it produce a data frame without "inner names", since such a data frame cannot be generated by the default data frame methods?


krlmlr commented Feb 14, 2021

I missed that part. Can you show an example where this is important?


ghost commented Feb 14, 2021

It caused a hiccup in a package's data processing (see https://stackoverflow.com/a/65646743/4640346): the package did not cater for columns that carry names (why should it? a data frame does not have them). The data was converted with as.data.frame before processing, yet the names survived.
Further, and only slightly related, see PR#18034 about R: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=18034
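
For code that receives such a data frame, one possible defensive step is to drop the inner names explicitly. The following is a minimal base R sketch, not tibble's implementation. `strip_inner_names()` is a hypothetical helper name, and the input is built with `structure()` only to guarantee a column that carries names without depending on tibble.

```r
# Hypothetical helper: remove the names attribute from plain vector columns.
# Columns with dimensions (matrix or data frame columns) and list-columns
# are left untouched.
strip_inner_names <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.atomic(col) && is.null(dim(col))) unname(col) else col
  })
  df
}

# Build a data frame whose column b keeps its names (assumption: this mimics
# what as.data.frame() returned for a tibble in the report above).
b <- 6:10
names(b) <- letters[1:5]
dirty <- structure(
  list(a = 1:5, b = b),
  class = "data.frame",
  row.names = c(NA_integer_, -5L)
)

clean <- strip_inner_names(dirty)

names(dirty$b) # the names are still attached here
names(clean$b) # the names have been removed here
```

Assigning with `df[] <-` keeps the data frame's class, column names and row names, so only the columns' own names attributes change.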

krlmlr added this to the 3.1.2 milestone Apr 16, 2021
krlmlr modified the milestones: 3.1.2, 3.1.3 Jun 24, 2021
krlmlr modified the milestones: 3.1.3, 3.1.4 Jul 17, 2021


krlmlr commented Jul 30, 2021

Let's remove inner names in as.data.frame.tbl_df().

krlmlr added a commit that referenced this issue Jul 31, 2021
- `as.data.frame.tbl_df()` strips inner column names (#837).

krlmlr commented Jul 31, 2021

Done now, also tweaked documentation.


github-actions bot commented Aug 1, 2022

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

github-actions bot locked and limited conversation to collaborators Aug 1, 2022