Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on invalid group_by_ and mutate_impl #1750

Closed
karldw opened this issue Apr 4, 2016 · 2 comments
Closed

Segfault on invalid group_by_ and mutate_impl #1750

karldw opened this issue Apr 4, 2016 · 2 comments
Assignees
Labels
bug an unexpected problem or unintended behavior

Comments

@karldw
Copy link
Contributor

karldw commented Apr 4, 2016

Debugging on some group_by_ code, I found I could generate a segfault with these three lines:

n_groups <- 3
sortvar <- 'Sepal.Length'
iris2 <- group_by_(iris, .dots = lazyeval::interp(~ntile(sortvar, n_groups), 
                   .values = list(sortvar=sortvar, n_groups=n_groups)))

(Sometimes the last line has to be run multiple times to segfault, but I can consistently generate a crash with fewer than six attempts.)

It's also interesting that the error from the group_by_ call is not consistent (in the cases where R doesn't segfault). The message seems to alternate between:

  • CHAR() can only be applied to a 'CHARSXP', not a 'logical'
  • CHAR() can only be applied to a 'CHARSXP', not a 'closure'
  • CHAR() can only be applied to a 'CHARSXP', not a 'integer'
  • CHAR() can only be applied to a 'CHARSXP', not a 'character'
  • CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Here's the segfault traceback:

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .Call("dplyr_mutate_impl", PACKAGE = "dplyr", df, dots)
 2: mutate_impl(.data, dots)
 3: mutate_.tbl_df(tbl_df(.data), .dots = dots)
 4: mutate_(tbl_df(.data), .dots = dots)
 5: as.data.frame(mutate_(tbl_df(.data), .dots = dots))
 6: mutate_.data.frame(.data, .dots = new_groups[needs_mutate])
 7: mutate_(.data, .dots = new_groups[needs_mutate])
 8: group_by_prepare(.data, ..., .dots = .dots, add = add)
 9: group_by_.data.frame(iris, .dots = lazyeval::interp(~ntile(sortvar,     n_groups), .values = list(sortvar = sortvar, n_groups = n_groups)))
10: group_by_(iris, .dots = lazyeval::interp(~ntile(sortvar, n_groups),     .values = list(sortvar = sortvar, n_groups = n_groups)))

The issue is the same for the CRAN and development versions of dplyr (versions 0.4.3 and 0.4.3.9001). Here's my sessionInfo (with the development versions of dplyr and lazyeval):

R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3.9001

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.1.2       assertthat_0.1 DBI_0.3.1      tools_3.2.4   
[6] tibble_1.0-1   Rcpp_0.12.4    ulimit_0.0-3  

Possibly related: #1668, #1641.

This is not urgent! Obviously, fixing edge cases where users are writing buggy code shouldn't be a priority, but segfaults are always a little disconcerting.

@hadley hadley added bug an unexpected problem or unintended behavior data frame labels Apr 19, 2016
@romainfrancois
Copy link
Member

romainfrancois commented May 3, 2016

I would not expect this to work as the expression you are interpreting is:

> lazyeval::interp(~ntile(sortvar, n_groups), .values = list(sortvar=sortvar, n_groups=n_groups))
~ntile("Sepal.Length", 3)

See the quotes there. This however works:

> iris2 <- group_by_(iris, .dots = lazyeval::interp(~ntile(sortvar, n_groups), .values = list(sortvar=as.name(sortvar), n_groups=n_groups)))
> iris2
Source: local data frame [150 x 6]
Groups: ntile(Sepal.Length, 3) [3]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...
Variables not shown: ntile(Sepal.Length, 3) <int>.

See the as.name.

@romainfrancois romainfrancois self-assigned this May 3, 2016
@romainfrancois
Copy link
Member

So the problem becomes:

> iris %>% mutate( a = ntile("Sepal.Length", 3 ) )
Error in mutate_impl(.data, dots) :
  CHAR() can only be applied to a 'CHARSXP', not a 'list'

romainfrancois added a commit that referenced this issue May 3, 2016
sicarul pushed a commit to sicarul/dplyr that referenced this issue May 4, 2016
as the number of rows of the data to process, fall back to R
eval otherwise.

This deals with the segfault reported in tidyverse#1750.

closes tidyverse#1750
sicarul pushed a commit to sicarul/dplyr that referenced this issue May 4, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants