Repeatable segfault when using 'do' #1074

Closed
cswarth opened this issue Apr 13, 2015 · 1 comment

@cswarth cswarth commented Apr 13, 2015

This simple script reliably crashes R 3.1.3 with dplyr 0.4.1:

suppressMessages(library(dplyr))
sessionInfo()
df <- data.frame(groups=c(1, 2, 3, 4,4,4), value=1)
df %>% 
    group_by(groups) %>% 
    do( {
           .[.$value==first(.$value)]
       })

The traceback is:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_rbind_all", PACKAGE = "dplyr", dots)
 2: rbind_all(out[[1]])
 3: label_output_dataframe(labels, out, groups(.data))
 4: do_.grouped_df(.data, .dots = lazyeval::lazy_dots(...))
 5: do_(.data, .dots = lazyeval::lazy_dots(...))
 6: do(., {    .[.$value == first(.$value)]})
 7: function_list[[k]](value)
 8: withVisible(function_list[[k]](value))
 9: freduce(value, `_function_list`)
10: `_fseq`(`_lhs`)
11: eval(expr, envir, enclos)
12: eval(quote(`_fseq`(`_lhs`)), env, env)
13: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
14: df %>% group_by(groups) %>% do({    .[.$value == first(.$value)]})

The crash can be avoided by indexing the data frame properly inside do(), e.g. by adding a ',' after the condition so that it selects rows and all columns of the group are returned:

df %>% 
    group_by(groups) %>% 
    do( {
           .[.$value==first(.$value),]
       })
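
For illustration, here is a minimal base R sketch of why the missing comma matters. It uses plain data frames rather than dplyr, so it shows only the indexing semantics and not what dplyr does internally; the names `g`, `g1`, `keep`, `rows`, `cols` and `one` are made up for this example.

```r
# One group of three rows and two columns, shaped like group 4 in the report.
g <- data.frame(groups = c(4, 4, 4), value = 1)
keep <- g$value == g$value[1]   # logical vector with one entry per row

# Two-argument form: `keep` selects rows, and all columns are returned.
rows <- g[keep, ]

# One-argument form: `keep` is treated as a column selector. It has three
# entries for two columns, so base R signals an error instead of subsetting.
cols <- tryCatch(g[keep], error = function(e) conditionMessage(e))

# A single-row group hides the mistake: a length-one TRUE is recycled over
# the columns, so the whole data frame comes back unchanged.
g1 <- data.frame(groups = 1, value = 1)
one <- g1[g1$value == g1$value[1]]
```

This would explain why only the group with more than one row triggers the problem: in the single-row groups the row condition happens to be a valid column selector.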

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C              LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8     LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                 LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.1

loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 DBI_0.3.1           magrittr_1.5        parallel_3.1.3      Rcpp_0.11.4        
@romainfrancois romainfrancois self-assigned this Apr 14, 2015

@romainfrancois romainfrancois commented Apr 14, 2015

The code was creating a corrupt data frame at index 4. Instead of a segfault, we now get an error:

> df <- data.frame(groups=c(1, 2, 3, 4,4,4), value=1)
> df %>%
+     group_by(groups) %>%
+     do({
+            .[.$value==first(.$value)]
+
+        })
Error: corrupt data frame at index 4

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018