I've been trying to reproduce this error, but I'm having difficulties. Please bear with me. Code to reproduce appears below!
I have a file with a few columns, which gets read in via read_tsv. I can then go on to group_by and mutate, but if I pipe into distinct() it throws an error if, and only if, I add .keep_all = TRUE (or if this is the case implicitly, as in dplyr >= 0.7.0).
The error I get is:
Error in distinct_impl(dist$data, dist$vars, dist$keep) :
Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'integer'
To make this reproducible, I created a gist from the file. Even so, the error sometimes 'magically' disappears, and sometimes I can reproduce it.
Here's the code that should reproduce it:
library(tidyverse)
gist <- 'https://gist.githubusercontent.com/breichholf/3b2e5eb253a932b8b0e540812811ecb6/raw/2798b2a58e281fcd3867e4dbf4adbe11f8a7b4f3/test.bed'
bed <- read_tsv(gist, col_names = c('chromosome', 'start', 'end', 'gene', 'score', 'strand', 'anno.id', 'interval.id', 'window.id'))
geneBed <-
  bed %>%
  group_by(interval.id) %>%
  mutate(min.start = min(start),
         max.end = max(end),
         dist.to.start = start - min.start,
         exon.len = end - start,
         cds.start = min.start,
         cds.end = max.end,
         all.starts = paste(dist.to.start, collapse = ","),
         all.lens = paste(exon.len, collapse = ","))
> geneBed %>% distinct(interval.id, .keep_all = TRUE)
Error in distinct_impl(dist$data, dist$vars, dist$keep) :
Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'integer'
I suspect it might have something to do with the encoding, because write_tsv also throws an error:
> geneBed %>% write_tsv('test.txt')
Error in stream_delim_(df, path, ...) :
'translateCharUTF8' must be called on a CHARSXP
However, as mentioned above, geneBed %>% distinct(interval.id) without .keep_all = TRUE works as expected. Also, perhaps of note, unique() throws an error as well:
> geneBed %>% unique()
Error in paste(chromosome = c("chr1", "chr10", "chr11", "chr12", "chr11", :
'translateChar' must be called on a CHARSXP
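Since all three errors complain about a CHARSXP, here is a minimal sketch of how the column types could be inspected after the mutate step. The small tibble is a hypothetical stand-in that I made up for illustration (it is not the gist contents), and the names bed_small, gene_small and col_types are placeholders. Note that this only shows the declared class and storage type of each column, so it may well not reveal a character vector whose individual elements are corrupted.

```r
library(dplyr)

# Hypothetical stand-in for the BED data (NOT the real gist contents)
bed_small <- tibble(
  interval.id = c("a", "a", "b"),
  start = c(10L, 50L, 5L),
  end = c(20L, 80L, 30L)
)

# Same kind of grouped mutate as in the report, reduced to two new columns
gene_small <- bed_small %>%
  group_by(interval.id) %>%
  mutate(dist.to.start = start - min(start),
         all.starts = paste(dist.to.start, collapse = ","))

# Collect the declared class and the internal storage type of every column
col_types <- data.frame(
  column = names(gene_small),
  class = vapply(gene_small, function(x) class(x)[1], character(1)),
  storage = vapply(gene_small, typeof, character(1)),
  stringsAsFactors = FALSE
)
print(col_types)
```

Running the same vapply() calls on geneBed itself should show whether all.starts and all.lens are at least declared as character columns.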
I've tried the same code on another machine (OS X instead of Linux), and can reproduce the error there if I start from a fresh R session. I've managed to resolve the error (strangely, only sometimes) by splitting the mutate into several statements, or by piping directly into distinct after mutate, but unfortunately I haven't been able to work out how to reproduce the fix reliably so far.
If there's anything I can do or try on my end, please let me know.
Relevant session info:
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 7 (wheezy)
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 dplyr_0.7.1 purrr_0.2.2.2 readr_1.1.1
[5] tidyr_0.6.3 tibble_1.3.3 ggplot2_2.2.1 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 cellranger_1.1.0 compiler_3.4.0 plyr_1.8.4
[5] bindr_0.1 forcats_0.2.0 tools_3.4.0 jsonlite_1.5
[9] lubridate_1.6.0 nlme_3.1-131 gtable_0.2.0 lattice_0.20-35
[13] pkgconfig_2.0.1 rlang_0.1.1 psych_1.7.5 curl_2.7
[17] parallel_3.4.0 haven_1.1.0 xml2_1.1.1 stringr_1.2.0
[21] httr_1.2.1 hms_0.3 grid_3.4.0 glue_1.1.1
[25] R6_2.2.2 readxl_1.0.0 foreign_0.8-69 reshape2_1.4.2
[29] modelr_0.1.0 magrittr_1.5 scales_0.4.1 rvest_0.3.2
[33] assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2 stringi_1.1.5
[37] lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2
Edit
FWIW, downgrading to dplyr == 0.5.0 makes the above code work fine.