Calling first() in summarise() on a non-existent column causes segfault (and crashes R) #600

Closed
Qzrx opened this issue Sep 15, 2014 · 5 comments

@Qzrx Qzrx commented Sep 15, 2014

R version 3.1.0 (2014-04-10) -- "Spring Dance", OSX 64-bit
dplyr 0.2.0.99

If I take a data frame and then try summarizing using the first value of a non-existent column, dplyr gets very unhappy. Which is good--it shouldn't silently eat the error!

Unfortunately it doesn't return an error or warning. Instead it just outright crashes the entire R session. Meep.

foo = data.frame(x = 1:10, y = 1:10)
foo %>%
  group_by(x) %>% 
  summarise(first_y = first(y)) # perfectly happy!

foo = data.frame(x = 1:10, y = 1:10)
foo %>%
  group_by(x) %>% 
  summarise(first_y = first(z)) # CRAAAAASH!
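
On an affected version, one possible workaround is to verify that the columns exist before they reach `summarise()`, so a typo produces an ordinary R error instead of a crash. This is only a sketch in base R: `check_columns` is a hypothetical helper, not part of dplyr, and it only catches column names you list explicitly.

```r
# Hypothetical helper (not a dplyr function): raise a normal R error
# if any of the requested columns is missing from the data frame.
check_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("column(s) not found: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)  # return the data unchanged so the helper can sit in a pipe
}

foo <- data.frame(x = 1:10, y = 1:10)

check_columns(foo, c("x", "y"))  # passes silently

# A missing column becomes a catchable error rather than a segfault.
res <- tryCatch(check_columns(foo, "z"),
                error = function(e) conditionMessage(e))
print(res)
```

Because the helper returns its input invisibly, it could also be placed at the start of a pipeline, e.g. `foo %>% check_columns("z") %>% group_by(x) %>% ...`.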

@Qzrx Qzrx commented Sep 15, 2014

Full terminal output for usefulness:

> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> foo = data.frame(x = 1:10, y = 1:10)
> foo %>%
+   group_by(x) %>% 
+   summarise(first_y = first(z)) 

 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, args, env)
 2: summarise_impl(.data, named_dots(...), environment())
 3: summarise.tbl_df(`foo %>% group_by(x)`, first_y = first(z))
 4: summarise(`foo %>% group_by(x)`, first_y = first(z))
 5: eval(expr, envir, enclos)
 6: eval(e, env)
 7: withVisible(eval(e, env))
 8: foo %>% group_by(x) %>% summarise(first_y = first(z))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

@Qzrx Qzrx changed the title from "Calling first() in summarise() on a non-existent column crashes R" to "Calling first() in summarise() on a non-existent column segfaults R" Sep 16, 2014
@Qzrx Qzrx changed the title from "Calling first() in summarise() on a non-existent column segfaults R" to "Calling first() in summarise() on a non-existent column causes segfault (and crashes R)" Sep 16, 2014

@hadley hadley commented Sep 16, 2014

Could you please try with the latest version? It should be fixed in d822e0d
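
Before retesting, it may help to confirm which version R actually finds. A small sketch using base R's `packageVersion()`; `installed_version` is a hypothetical wrapper added here only so that a missing package yields `NA` instead of an error:

```r
# Hypothetical wrapper: return the installed version of a package as a
# string, or NA if R cannot find the package in its library paths.
installed_version <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

installed_version("dplyr")  # NA if dplyr is not installed
```

The result can be compared against the version reported in the issue (0.2.0.99), for example with `packageVersion("dplyr") > "0.2.0.99"`.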

@hadley hadley commented Sep 22, 2014

@romainfrancois this still crashes for me. Can you please take a look?

@hadley hadley added the bug label Sep 22, 2014
@hadley hadley added this to the 0.3 milestone Sep 22, 2014

@cnjr2 cnjr2 commented Feb 19, 2016

I still seem to get the segfault using the example from above.

Please find my session output below:

All is good with:

foo = data.frame(x = 1:10, y = 1:10)
foo %>%
  group_by(x) %>% 
  summarise(first_y = first(y)) # perfectly happy!

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin11.4.2 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library("dplyr")

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> 
> foo = data.frame(x = 1:10, y = 1:10)
> foo %>%
+     group_by(x) %>%
+     summarise(first_y = first(y)) # perfectly happy!
Source: local data frame [10 x 2]

       x first_y
   (int)   (int)
1      1       1
2      2       2
3      3       3
4      4       4
5      5       5
6      6       6
7      7       7
8      8       8
9      9       9
10    10      10

But for:

foo = data.frame(x = 1:10, y = 1:10)
foo %>%
  group_by(x) %>% 
  summarise(first_y = first(z)) # CRAAAAASH!

It crashes:

> foo = data.frame(x = 1:10, y = 1:10)
> foo %>%
+   group_by(x) %>% 
+   summarise(first_y = first(z)) # CRAAAAASH!

 *** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
 1: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
 2: summarise_impl(.data, dots)
 3: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
 4: summarise_(.data, .dots = lazyeval::lazy_dots(...))
 5: summarise(., first_y = first(z))
 6: function_list[[k]](value)
 7: withVisible(function_list[[k]](value))
 8: freduce(value, `_function_list`)
 9: `_fseq`(`_lhs`)
10: eval(expr, envir, enclos)
11: eval(quote(`_fseq`(`_lhs`)), env, env)
12: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13: foo %>% group_by(x) %>% summarise(first_y = first(z))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

Here is also my session info.

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin11.4.2 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3

loaded via a namespace (and not attached):
[1] lazyeval_0.1.10 magrittr_1.5    R6_2.1.1        assertthat_0.1  parallel_3.2.2  tools_3.2.2     DBI_0.3.1       Rcpp_0.12.2    

I have installed R via anaconda.

@cnjr2 cnjr2 commented Feb 22, 2016

Despite my sessionInfo() printing that dplyr_0.4.3 is attached, I upgraded my dplyr in anaconda with:

>conda install -c r r-dplyr
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ......
Solving package specifications: .................
Package plan for installation in environment /Users/user/anaconda:

The following packages will be UPDATED:

    r-dplyr: 0.4.1-0 --> 0.4.3-r3.2.2_0
    zlib:    1.2.8-0 --> 1.2.8-1       

Proceed ([y]/n)? y

Unlinking packages ...
[      COMPLETE      ]|###################################################################################################################################################################################################################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################################################################################################################################################################################################################| 100%
>

It seems that version 0.4.1-0 was installed after all... I am not sure where it got the 0.4.3 from.
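
One way to investigate a mismatch like this is to list every copy of a package that R can see, since a session may search more than one library directory. Whether that was the cause here is an assumption; the thread does not confirm it. A sketch in base R, where `installed_versions` is a hypothetical helper:

```r
# Hypothetical helper: list every installed copy of a package across the
# given library paths, with the directory and version of each copy.
installed_versions <- function(pkg, libs = .libPaths()) {
  ip <- installed.packages(lib.loc = libs)
  hits <- ip[ip[, "Package"] == pkg,
             c("Package", "LibPath", "Version"), drop = FALSE]
  data.frame(hits, row.names = NULL, stringsAsFactors = FALSE)
}

.libPaths()                  # directories R searches, in order
installed_versions("dplyr")  # one row per installed copy (zero rows if none)
```

If more than one row comes back, `find.package("dplyr")` shows which copy would be loaded first.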

Anyways, running the following now works fine!

> foo = data.frame(x = 1:10, y = 1:10)
> foo %>%
+   group_by(x) %>% 
+   summarise(first_y = first(z)) # CRAAAAASH!
Error: variable 'z' not found

Thanks!

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018