New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grouped_df throws errors in add_column #303

Closed
t-kalinowski opened this Issue Sep 6, 2017 · 14 comments

Comments

Projects
None yet
3 participants
@t-kalinowski

t-kalinowski commented Sep 6, 2017

library(dplyr, warn.conflicts = FALSE)
library(tibble)

df <- tibble(x = 1:3, y = 1:3)

df %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3
df %>% group_by(x) %>% add_column(z = 4:6, .before = "y")
#> Error: is.data.frame(df) is not TRUE

digging into this a little it seems to be caused by cbind() returning a matrix when called on a grouped_df

cbind(group_by(df, x), tibble(z = 1:3))
#>   [,1]      [,2]     
#> x Integer,3 Integer,3
#> y Integer,3 Integer,3
cbind(df, tibble(z = 1:3))
#>   x y z
#> 1 1 1 1
#> 2 2 2 2
#> 3 3 3 3
@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 6, 2017

Does bind_cols() work?

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 6, 2017

Yes (with dev version of dplyr and tibble)

@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 12, 2017

Is the behavior any different with the CRAN version?

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 12, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 12, 2017

Closing, we don't currently support cbind().

@krlmlr krlmlr closed this Sep 12, 2017

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 12, 2017

The issue is aboutadd_column, not cbind.

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 12, 2017

The problem is caused by the call of cbind here:

tibble/R/add.R

Line 141 in 6320d77

out <- cbind(df, .data)

@krlmlr krlmlr reopened this Sep 13, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 13, 2017

Bummer. This seems to mean that we need to make add_column() generic, with an implementation in dplyr.

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 13, 2017

Hmm, seems not ideal.
Would an approach based on [[<-to add the columns and then [ to reorder the columns be simpler?

@DavisVaughan

This comment has been minimized.

Member

DavisVaughan commented Sep 15, 2017

@krlmlr I was just about to post an issue asking for add_column() to be generic.

For packages that extend tibble, like my new tibbletime package, I have to handle the dropping of extra classes and attributes that occurs with most tidyverse functions. Normally I create a tidyverse_fun.tbl_time generic version that drops the classes and attributes that I dont want to lose, calls the function and then adds them back.

Since add_column isn't generic, I can't do that there, and that call to cbind() removes the classes and attributes that I need to keep. At the end of add_column(), the classes are added back, but the attributes are not, which causes a lot of problems!

So for my problem, either add_column would need to be generic or it should also try and add the attributes of the original .data back on. Any help there is definitely welcome!

@t-kalinowski

This comment has been minimized.

t-kalinowski commented Sep 15, 2017

Most classes that build on top of tbl_df I think already need to define a [ method.... my preference would be to try to minimize the total number of methods that need to be defined in order to extend and build ontop of tbl_df. I think building add_column with [ is one such opportunity.

@DavisVaughan

This comment has been minimized.

Member

DavisVaughan commented Sep 16, 2017

Solution

I think I have something that fixes everyone's problems using [.

https://github.com/DavisVaughan/tibble/blob/master/R/add.R#L141

Passes all tests for test-add.R.

Fixes t-kalinowski's problem

library(dplyr, warn.conflicts = FALSE)
library(tibble)

df <- tibble(x = 1:3, y = 1:3)

df %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3

df %>% group_by(x) %>% add_column(z = 4:6, .before = "y")
#> # A tibble: 3 x 3
#> # Groups: x [3]
#>       x     z     y
#>   <int> <int> <int>
#> 1     1     4     1
#> 2     2     5     2
#> 3     3     6     3

Fixes my own problem

Classes and attrs are no longer lost.
Classes of the original .data are kept.

library(tibbletime)
#> 
#> Attaching package: 'tibbletime'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(tibble)

data(FB)
FB <- as_tbl_time(FB, date)

class(FB)
#> [1] "tbl_time"   "tbl_df"     "tbl"        "data.frame"
attr(FB, "index")
#> <quosure: frame>
#> ~date

FB <- FB %>% add_column(adj2 = FB$adjusted + 1)

class(FB)
#> [1] "tbl_time"   "tbl_df"     "tbl"        "data.frame"
attr(FB, "index")
#> <quosure: frame>
#> ~date
@krlmlr

This comment has been minimized.

Member

krlmlr commented Sep 16, 2017

Thanks, happy to review a PR.

krlmlr added a commit that referenced this issue Sep 17, 2017

Merge pull request #306 from DavisVaughan/master
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
@krlmlr

This comment has been minimized.

Member

krlmlr commented Oct 4, 2017

Closed by #306.

@krlmlr krlmlr closed this Oct 4, 2017

krlmlr added a commit that referenced this issue Nov 3, 2017

Merge tag 'v1.3.4.9002'
- In `glimpse()`, compute `type_sum()` from data frame for dbplyr compatibility (#328).
- `as_tibble.matrix()` repairs column names.
- Compatible with R 3.1 (#323).
- Make add_case() and alias for add_row() (#324, @LaDilettante).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Tibbles now support character subsetting (#312).
- `as_tibble()` gains `rownames` argument (#288, #289).
- Remove Rcpp dependency (#313, @patperry).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- Fixed width for word wrapping of the extra information (#301).

krlmlr added a commit that referenced this issue Dec 28, 2017

Merge tag 'v1.4.1'
New formatting
--------------

The new pillar package is now responsible for formatting tibbles. Pillar will try to display as many columns as possible, if necessary truncating or shortening the output. Colored output highlights important information and guides the eye. The vignette in the tibble package describes how to adapt custom data types for optimal display in a tibble.

New features
------------

- Make `add_case()` an alias for `add_row()` (#324, @LaDilettante).
- `as_tibble()` gains `rownames` argument (#288, #289).
- `as_tibble.matrix()` repairs column names.
- Tibbles now support character subsetting (#312).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).

Bug fixes
---------

- Improved compatibility with remote data sources for `glimpse()` (#328).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Fixed width for word wrapping of the extra information (#301).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).

Internal changes
----------------

- Reexporting `has_name()` from rlang, instead of forwarding, to avoid warning when importing both rlang and tibble.
- Compatible with R 3.1 (#323).
- Remove Rcpp dependency (#313, @patperry).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment