`corrupt 'group_df'` error with dplyr 0.4.3 #1385

Closed
eipi10 opened this issue Sep 4, 2015 · 8 comments

Contributor

@eipi10 eipi10 commented Sep 4, 2015

I just installed the latest version of dplyr and began getting corrupt 'grouped_df' errors with code that used to work. Here's an example error message:

Error: corrupt 'grouped_df', contains 657842 rows, and 657865 rows in groups

My current workaround is to add %>% ungroup() to the end of the code that returns the corrupt data frame, but I'd like to figure out what's causing the error so I can implement a real fix.

I'm working with data that I can't share and I haven't been able to recreate the error with fake data. I will update this post if I can manage to create a reproducible example.

Member

@hadley hadley commented Sep 4, 2015

Please provide a reproducible example

Contributor Author

@eipi10 eipi10 commented Sep 5, 2015

I was able to reproduce the error with fake data, so here is a reproducible example:

library(dplyr)

set.seed(111)
dat1 = data.frame(group=sample(LETTERS[1:3], 1000, replace=TRUE),
                  y=rnorm(1000))
dat1 = dat1 %>% group_by(group)

dat2 = data.frame(group=sample(LETTERS[1:2], 100, replace=TRUE),
                  y=rnorm(100))
dat2 = dat2 %>% group_by(group)

dat3 = rbind(dat1, dat2)

dat3 %>% 
  summarise(n=n())

Error: corrupt 'grouped_df', contains 1100 rows, and 1000 rows in groups

In my actual use case I don't just do, say, dat2 = dat2 %>% group_by(group). After the grouping, I do filtering, mutating, etc., but I wasn't ungrouping after completing those operations. As a result, I ended up rbind-ing two data frames that had been separately grouped.
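
A minimal sketch of the pattern described above with the missing ungroup() added, so that rbind() only ever sees ungrouped tables. The y_centered column and the mutate() step are hypothetical stand-ins for the real filtering and mutating, which are not shown in this thread.

```r
library(dplyr)

set.seed(111)

# Do the per-group work on each piece, then ungroup() before combining.
# y_centered is a hypothetical stand-in for the real per-group steps.
dat1 <- data.frame(group = sample(LETTERS[1:3], 1000, replace = TRUE),
                   y = rnorm(1000)) %>%
  group_by(group) %>%
  mutate(y_centered = y - mean(y)) %>%
  ungroup()

dat2 <- data.frame(group = sample(LETTERS[1:2], 100, replace = TRUE),
                   y = rnorm(100)) %>%
  group_by(group) %>%
  mutate(y_centered = y - mean(y)) %>%
  ungroup()

# Neither input carries grouping metadata, so there is nothing stale to copy.
dat3 <- rbind(dat1, dat2)

# Group the combined data explicitly before summarising.
counts <- dat3 %>%
  group_by(group) %>%
  summarise(n = n())
```

Ungrouping at the end of each pipeline keeps the grouping local to the step that needs it, at the cost of one extra group_by() after the pieces are combined.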

This kind of thing never caused a problem with earlier versions of dplyr but is causing an error with the new version. For my own edification, is there something wrong (in the sense of good programming style) with separately grouping and rbind-ing, or is this just a bug in dplyr?

Member

@hadley hadley commented Sep 7, 2015

Use bind_rows() instead of rbind(). Need to see if we can un-break rbind() for this use.
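
A minimal sketch of this suggestion applied to the reproducible example. The explicit group_by() after binding is an assumption added here so that the result does not depend on whether bind_rows() keeps the grouping in a given dplyr version.

```r
library(dplyr)

set.seed(111)
dat1 <- data.frame(group = sample(LETTERS[1:3], 1000, replace = TRUE),
                   y = rnorm(1000)) %>%
  group_by(group)

dat2 <- data.frame(group = sample(LETTERS[1:2], 100, replace = TRUE),
                   y = rnorm(100)) %>%
  group_by(group)

# bind_rows() combines the rows itself instead of going through rbind(),
# which is the call that produced the corrupt grouped_df above.
dat3 <- bind_rows(dat1, dat2)

# Re-group explicitly (an assumption for safety), then summarise.
counts <- dat3 %>%
  group_by(group) %>%
  summarise(n = n())
```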

Contributor Author

@eipi10 eipi10 commented Sep 7, 2015

Will do. Thanks Hadley.

Member

@romainfrancois romainfrancois commented Sep 10, 2015

Yep, rbind() grabs the attributes from dat1. Perhaps just:

> rbind.tbl_df <- function( x, ..., deparse.level = 1 ) bind_rows(x, ...)

@hadley hadley added this to the 0.5 milestone Oct 21, 2015
@hadley hadley self-assigned this Oct 21, 2015
Member

@hadley hadley commented Mar 7, 2016

@krlmlr can you take this for tibble please?

@hadley hadley closed this Mar 7, 2016
Member

@krlmlr krlmlr commented Mar 7, 2016

For this particular issue, wouldn't it be enough to define rbind.grouped_df() here?

Member

@hadley hadley commented Mar 7, 2016

@krlmlr oh good point
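
The rbind.grouped_df() idea could look roughly like the sketch below. It illustrates the approach discussed in this thread and is not necessarily the code that was eventually merged. It simply forwards to bind_rows(); the small data frames a and b and the explicit group_by() afterwards are assumptions added for the example.

```r
library(dplyr)

# Sketch of an S3 method: when rbind() is called on grouped_df inputs,
# hand the work to bind_rows() instead of the default data frame method.
rbind.grouped_df <- function(..., deparse.level = 1) {
  bind_rows(...)
}

# Two small, separately grouped inputs (hypothetical data).
a <- data.frame(group = c("A", "B", "C"), y = 1:3,
                stringsAsFactors = FALSE) %>%
  group_by(group)
b <- data.frame(group = c("A", "B"), y = 4:5,
                stringsAsFactors = FALSE) %>%
  group_by(group)

combined <- rbind(a, b)

# Re-group explicitly (an assumption for safety), then count rows per group.
counts <- combined %>%
  group_by(group) %>%
  summarise(n = n())
```

Defining the method on grouped_df rather than tbl_df keeps the change narrow: only inputs that carry grouping metadata are rerouted.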

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018