
lost Groups cause dplyr functions to crash R #308

Closed
statsandwich opened this issue Mar 7, 2014 · 2 comments

@statsandwich commented Mar 7, 2014

Losing the groups of a grouped_df and then calling dplyr functions on it causes R to crash.

If I mix non-dplyr functions such as subset() with grouped_dfs, the groups are lost. Once the groups are lost, any call to filter(), select(), mutate(), or summarise() crashes R.

library(dplyr)
DF <- data.frame(C1 = 1:2, C2 = 1:2) %.% group_by(C1)

DF
Source: local data frame [2 x 2]
Groups: C1

C1 C2
1 1 1
2 2 2

Using dplyr functions on either of the following grouped_dfs crashes R. Note that the Groups line is empty:

DF[, names(DF)] ### instead of using select()
Source: local data frame [2 x 2]
Groups:

C1 C2
1 1 1
2 2 2

subset( DF, C1 == 1) ### instead of using filter()
Source: local data frame [1 x 2]
Groups:

C1 C2
1 1 1

This crash keeps cropping up when converting old code to dplyr.

@romainfrancois (Member) commented Mar 14, 2014

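What happens is that R's default [ method for data frames keeps the class but drops all other attributes, including the grouping information, instead of resetting the class to plain data.frame. The result therefore still claims to be a grouped_df while no longer carrying any groups.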
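The attribute-dropping behaviour can be reproduced with base R alone. This is a minimal sketch, assuming no tibble or dplyr `[` methods are involved: the class name `fake_grouped_df` and the attribute `vars` are hypothetical stand-ins for dplyr's grouping metadata, not its exact internals.

```r
# Hypothetical stand-in for a grouped data frame: a data.frame carrying an
# extra class and a "vars" attribute holding the grouping column name.
# (Both names are illustrative assumptions, not dplyr's exact internals.)
df <- data.frame(C1 = 1:2, C2 = 1:2)
attr(df, "vars") <- "C1"
class(df) <- c("fake_grouped_df", "data.frame")

# No `[` method exists for "fake_grouped_df", so `[.data.frame` is used.
d1 <- df[, names(df)]        # column subsetting, as in the report
d2 <- subset(df, C1 == 1)    # subset() on a data frame also goes through `[`

class(d1)         # the extra class survives ...
attr(d1, "vars")  # ... but the grouping attribute is NULL
attr(d2, "vars")  # same for subset()
```

Because the class still says "grouped", any code that trusts the class alone will treat `d1` and `d2` as grouped even though the metadata it expects is missing.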

> load_all()
Loading dplyr
>
> DF <- data.frame( C1 = 1:2, C2 = 1:2) %.% group_by(C1)
>
> d <- DF[ , names(DF) ]
> str(d)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  2 variables:
 $ C1: int  1 2
 $ C2: int  1 2
>
> summarise(d, x = mean(C2) )

 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, args, env)
 2: summarise_impl(.data, named_dots(...), environment())
 3: summarise.tbl_df(d, x = mean(C2))
 4: summarise(d, x = mean(C2))

dplyr's internal code is now stricter about deciding whether an object is a grouped data frame, instead of relying on the class alone.
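The thread does not show dplyr's actual check (the traceback suggests it sits behind compiled `.Call()` entry points). As a rough R-level illustration of the idea only, here is a sketch with hypothetical helpers `is_really_grouped()` and `as_safe_df()`, using an illustrative `fake_grouped_df` class and `vars` attribute (both assumptions): an object counts as grouped only if its grouping metadata is actually present, and otherwise it is downgraded to a plain data frame.

```r
# Hypothetical helpers illustrating "check the metadata, not just the class".
# The class name "fake_grouped_df" and attribute "vars" are assumptions.
is_really_grouped <- function(x, meta = "vars") {
  inherits(x, "fake_grouped_df") &&
    length(attr(x, meta, exact = TRUE)) > 0L
}

# If the class claims "grouped" but the metadata is gone, drop the stale
# class so grouped-only code paths are never reached with missing groups.
as_safe_df <- function(x, meta = "vars") {
  if (inherits(x, "fake_grouped_df") && !is_really_grouped(x, meta)) {
    class(x) <- setdiff(class(x), "fake_grouped_df")
  }
  x
}

df <- data.frame(C1 = 1:2, C2 = 1:2)
attr(df, "vars") <- "C1"
class(df) <- c("fake_grouped_df", "data.frame")
d <- df[, names(df)]   # base `[` keeps the class but drops "vars"

is_really_grouped(df)  # TRUE
is_really_grouped(d)   # FALSE
class(as_safe_df(d))   # "data.frame"
```

Downgrading silently is only one possible design; raising an informative error when the class and the metadata disagree would be the other.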

@statsandwich (Author) commented Mar 14, 2014

Cool - makes sense. Thanks for the explanation.

@lock (bot) locked as resolved and limited conversation to collaborators Jun 10, 2018