Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grouped_df throws errors in add_column #303

Closed
t-kalinowski opened this issue Sep 6, 2017 · 15 comments
Closed

grouped_df throws errors in add_column #303

t-kalinowski opened this issue Sep 6, 2017 · 15 comments

Comments

@t-kalinowski
Copy link

@t-kalinowski t-kalinowski commented Sep 6, 2017

library(dplyr, warn.conflicts = FALSE)
library(tibble)

df <- tibble(x = 1:3, y = 1:3)

df %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3
df %>% group_by(x) %>% add_column(z = 4:6, .before = "y")
#> Error: is.data.frame(df) is not TRUE

digging into this a little it seems to be caused by cbind() returning a matrix when called on a grouped_df

cbind(group_by(df, x), tibble(z = 1:3))
#>   [,1]      [,2]     
#> x Integer,3 Integer,3
#> y Integer,3 Integer,3
cbind(df, tibble(z = 1:3))
#>   x y z
#> 1 1 1 1
#> 2 2 2 2
#> 3 3 3 3
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 6, 2017

Does bind_cols() work?

@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 6, 2017

Yes (with dev version of dplyr and tibble)

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 12, 2017

Is the behavior any different with the CRAN version?

@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 12, 2017

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 12, 2017

Closing, we don't currently support cbind().

@krlmlr krlmlr closed this Sep 12, 2017
@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 12, 2017

The issue is aboutadd_column, not cbind.

@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 12, 2017

The problem is caused by the call of cbind here:

tibble/R/add.R

Line 141 in 6320d77

out <- cbind(df, .data)

@krlmlr krlmlr reopened this Sep 13, 2017
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 13, 2017

Bummer. This seems to mean that we need to make add_column() generic, with an implementation in dplyr.

@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 13, 2017

Hmm, seems not ideal.
Would an approach based on [[<-to add the columns and then [ to reorder the columns be simpler?

@DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Sep 15, 2017

@krlmlr I was just about to post an issue asking for add_column() to be generic.

For packages that extend tibble, like my new tibbletime package, I have to handle the dropping of extra classes and attributes that occurs with most tidyverse functions. Normally I create a tidyverse_fun.tbl_time generic version that drops the classes and attributes that I dont want to lose, calls the function and then adds them back.

Since add_column isn't generic, I can't do that there, and that call to cbind() removes the classes and attributes that I need to keep. At the end of add_column(), the classes are added back, but the attributes are not, which causes a lot of problems!

So for my problem, either add_column would need to be generic or it should also try and add the attributes of the original .data back on. Any help there is definitely welcome!

@t-kalinowski
Copy link
Author

@t-kalinowski t-kalinowski commented Sep 15, 2017

Most classes that build on top of tbl_df I think already need to define a [ method.... my preference would be to try to minimize the total number of methods that need to be defined in order to extend and build ontop of tbl_df. I think building add_column with [ is one such opportunity.

@DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Sep 16, 2017

Solution

I think I have something that fixes everyone's problems using [.

https://github.com/DavisVaughan/tibble/blob/master/R/add.R#L141

Passes all tests for test-add.R.

Fixes t-kalinowski's problem

library(dplyr, warn.conflicts = FALSE)
library(tibble)

df <- tibble(x = 1:3, y = 1:3)

df %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3

df %>% group_by(x) %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#> # Groups: x [3]
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3

Fixes my own problem

Classes and attrs are no longer lost.
Classes of the original .data are kept.

library(tibbletime)
#> 
#> Attaching package: 'tibbletime'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(tibble)

data(FB)
FB <- as_tbl_time(FB, date)

class(FB)
#> [1] "tbl_time"   "tbl_df"     "tbl"        "data.frame"
attr(FB, "index")
#> <quosure: frame>
#> ~date

FB <- FB %>% add_column(adj2 = FB$adjusted + 1)

class(FB)
#> [1] "tbl_time"   "tbl_df"     "tbl"        "data.frame"
attr(FB, "index")
#> <quosure: frame>
#> ~date
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 16, 2017

Thanks, happy to review a PR.

krlmlr added a commit that referenced this issue Sep 17, 2017
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Oct 4, 2017

Closed by #306.

@krlmlr krlmlr closed this Oct 4, 2017
krlmlr added a commit that referenced this issue Nov 3, 2017
- In `glimpse()`, compute `type_sum()` from data frame for dbplyr compatibility (#328).
- `as_tibble.matrix()` repairs column names.
- Compatible with R 3.1 (#323).
- Make add_case() and alias for add_row() (#324, @LaDilettante).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Tibbles now support character subsetting (#312).
- `as_tibble()` gains `rownames` argument (#288, #289).
- Remove Rcpp dependency (#313, @patperry).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- Fixed width for word wrapping of the extra information (#301).
krlmlr added a commit that referenced this issue Dec 28, 2017
New formatting
--------------

The new pillar package is now responsible for formatting tibbles. Pillar will try to display as many columns as possible, if necessary truncating or shortening the output. Colored output highlights important information and guides the eye. The vignette in the tibble package describes how to adapt custom data types for optimal display in a tibble.

New features
------------

- Make `add_case()` an alias for `add_row()` (#324, @LaDilettante).
- `as_tibble()` gains `rownames` argument (#288, #289).
- `as_tibble.matrix()` repairs column names.
- Tibbles now support character subsetting (#312).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).

Bug fixes
---------

- Improved compatibility with remote data sources for `glimpse()` (#328).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Fixed width for word wrapping of the extra information (#301).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).

Internal changes
----------------

- Reexporting `has_name()` from rlang, instead of forwarding, to avoid warning when importing both rlang and tibble.
- Compatible with R 3.1 (#323).
- Remove Rcpp dependency (#313, @patperry).
@github-actions
Copy link

@github-actions github-actions bot commented Dec 12, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants