Repeatable segfault when using 'do' #1074

Closed
cswarth opened this issue Apr 13, 2015 · 1 comment

cswarth commented Apr 13, 2015

This simple script reliably crashes R 3.1.3 (with dplyr 0.4.1):

suppressMessages(library(dplyr))
sessionInfo()
df <- data.frame(groups = c(1, 2, 3, 4, 4, 4), value = 1)
df %>%
    group_by(groups) %>%
    do({
        # No comma after the condition, so the logical vector indexes columns, not rows
        .[.$value == first(.$value)]
    })

The traceback is:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_rbind_all", PACKAGE = "dplyr", dots)
 2: rbind_all(out[[1]])
 3: label_output_dataframe(labels, out, groups(.data))
 4: do_.grouped_df(.data, .dots = lazyeval::lazy_dots(...))
 5: do_(.data, .dots = lazyeval::lazy_dots(...))
 6: do(., {    .[.$value == first(.$value)]})
 7: function_list[[k]](value)
 8: withVisible(function_list[[k]](value))
 9: freduce(value, `_function_list`)
10: `_fseq`(`_lhs`)
11: eval(expr, envir, enclos)
12: eval(quote(`_fseq`(`_lhs`)), env, env)
13: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
14: df %>% group_by(groups) %>% do({    .[.$value == first(.$value)]})

The crash can be avoided by indexing the data frame properly inside the do() clause, e.g. by adding a ',' after the condition so that it selects rows and keeps all columns of the group:

df %>% 
    group_by(groups) %>% 
    do( {
           .[.$value==first(.$value),]
       })
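
To illustrate why the comma matters, here is a minimal base R sketch (no dplyr). The names `one_row`, `chunk` and `keep` are hypothetical, a plain data.frame stands in for the per-group chunk that do() passes as `.`, and `[1]` replaces dplyr's first(). It only shows base R indexing semantics; it does not reproduce the segfault itself.

```r
# Hypothetical stand-ins for the per-group chunks seen inside do()
one_row <- data.frame(groups = 1, value = 1)            # like groups 1, 2, 3
chunk   <- data.frame(groups = c(4, 4, 4), value = 1)   # like group 4

# Single-row group: the condition is a length-1 logical, which is recycled
# across the columns, so column indexing happens to return every column.
cols_one <- one_row[one_row$value == one_row$value[1]]

# Three-row group: the condition has one element per row (length 3).
keep <- chunk$value == chunk$value[1]

# With a comma, `keep` indexes rows and all columns are kept.
rows <- chunk[keep, ]

# Without a comma, `keep` indexes columns. A length-3 index does not fit a
# 2-column data frame, so base R raises an error, captured here as text.
res <- tryCatch(chunk[keep], error = function(e) conditionMessage(e))
```

This is consistent with the report: the single-row groups pass through unchanged, and only the group with three rows produces an invalid column selection.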

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C              LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8     LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                 LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.1

loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 DBI_0.3.1           magrittr_1.5        parallel_3.1.3      Rcpp_0.11.4        

@romainfrancois (Member)

The code was creating a corrupt data frame for the group at index 4. We now get an error instead of a segfault:

> df <- data.frame(groups=c(1, 2, 3, 4,4,4), value=1)
> df %>%
+     group_by(groups) %>%
+     do({
+            .[.$value==first(.$value)]
+
+        })
Error: corrupt data frame at index 4

lock bot locked this issue as resolved and limited conversation to collaborators on Jun 10, 2018