The code below works when using "dplyr::n_distinct" but not when just using "n_distinct". The error in the non-working case is below.
Is it a bug or am I missing something? It doesn't appear that "n_distinct" is being picked up from a different package, as RStudio indicates it is coming from 'dplyr'. I have also included the equivalent "length(unique(data))", which does work.
I have included the code, output, and the session info.
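The claim that "n_distinct" is not masked by another package can also be checked without relying on RStudio. The following is only a minimal sketch of such a check, using base R functions (find, environment, environmentName, identical); it was not part of the original report, so no output is shown.

```r
library(dplyr)

# List every attached package/environment that exposes a visible `n_distinct`
find("n_distinct")

# Name of the namespace that the bare `n_distinct` resolves to
environmentName(environment(n_distinct))

# TRUE when the bare name and the qualified name refer to the same function
identical(n_distinct, dplyr::n_distinct)
```

If the last call returns TRUE, the bare and the qualified name point at the same function, which would suggest the difference in behaviour comes from how the call is evaluated inside summarise/mutate rather than from masking.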
Simplified Example
This is a simplified example that shows the issue.
library(dplyr)
dat3 <- data.frame(id = c(2,6,7,10,10))
dat3 %>% summarise(n_unique = length(unique(id[id>6])))
## n_unique
## 1 2
dat3 %>% summarise(n_unique = n_distinct(id[id>6]))
## Error in summarise_impl(.data, dots) :
## Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'language'
dat3 %>% summarise(n_unique = dplyr::n_distinct(id[id>6]))
## n_unique
## 1 2
Original Example
NOTE: this is just a reproducible example to show the issue; I know the operation can be approached in a different way.
library(dplyr)
yrdf2 <- data.frame(year = 2012:2015)
dat2 <- data.frame(id = 1:4, start_yr = c(2012, 2012, 2013, 2013))
### WORKS
yrdf2 %>% group_by(year) %>% mutate(count = dplyr::n_distinct( dat2$id[ dat2$start_yr <= year ] ))
## Source: local data frame [4 x 2]
## Groups: year [4]
##
## year count
## (int) (int)
## 1 2012 2
## 2 2013 4
## 3 2014 4
## 4 2015 4
### FAILS
yrdf2 %>% group_by(year) %>% mutate(count = n_distinct( dat2$id[ dat2$start_yr <= year ] ))
## Error in mutate_impl(.data, dots) :
## Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'language'
### WORKS
yrdf2 %>% group_by(year) %>% mutate(count = length(unique(( dat2$id[ dat2$start_yr <= year ]))))
## Source: local data frame [4 x 2]
## Groups: year [4]
##
## year count
## (int) (int)
## 1 2012 2
## 2 2013 4
## 3 2014 4
## 4 2015 4
Session Info
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3.9000 plyr_1.8.3 scales_0.3.0 ggplot2_2.0.0 data.table_1.9.6
[6] bit64_0.9-5 bit_1.1-12 tidyr_0.3.1 lubridate_1.5.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 magrittr_1.5 munsell_0.4.2 statmod_1.4.22
[5] colorspace_1.2-6 lattice_0.20-33 R6_2.1.2 stringr_1.0.0
[9] tools_3.2.3 parallel_3.2.3 grid_3.2.3 gtable_0.1.2
[13] nlme_3.1-122 h2o_3.6.0.8 DBI_0.3.1 lazyeval_0.1.10.9000
[17] assertthat_0.1 bitops_1.0-6 RCurl_1.95-4.7 stringi_1.0-1
[21] jsonlite_0.9.19 chron_2.3-47