I recently noticed that using `dtplyr` with `summarise()` and the `.data$var` syntax silently produces incorrect results. I am using `dtplyr` within my package and have been referencing column names with `.data$varname`, as recommended for package code. `dtplyr` does not seem to recognize that syntax: in `group_by()` it throws an error, and in `summarise()` it returns sums of 0 with no warning or error.
Please see the reprex below.
Thank you,
Sam
library(magrittr) # Normally my package would import just %>%
library(rlang) # Normally my package would import .data
#>
#> Attaching package: 'rlang'
#> The following object is masked from 'package:magrittr':
#>
#> set_names
d <- tibble::tibble(Group = rep(c("A", "B"), 10), Num = 1:20)
d
#> # A tibble: 20 x 2
#> Group Num
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 A 3
#> 4 B 4
#> 5 A 5
#> 6 B 6
#> 7 A 7
#> 8 B 8
#> 9 A 9
#> 10 B 10
#> 11 A 11
#> 12 B 12
#> 13 A 13
#> 14 B 14
#> 15 A 15
#> 16 B 16
#> 17 A 17
#> 18 B 18
#> 19 A 19
#> 20 B 20
# Works without `dtplyr`
d %>%
  dplyr::group_by(.data$Group) %>%
  dplyr::summarise(Num = sum(.data$Num, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#> Group Num
#> <chr> <int>
#> 1 A 100
#> 2 B 110
# `.data` does not seem to work with `dtplyr`
d %>%
  dtplyr::lazy_dt() %>%
  dplyr::group_by(.data$Group) %>%
  dplyr::summarise(Num = sum(.data$Num, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  tibble::as_tibble()
#> Error in eval(bysub, x, parent.frame()): object 'Group' not found
# If you remove the `.data$` from `group_by` but leave it in the
# `summarise` call, it returns 0s with no warnings or errors
d %>%
  dtplyr::lazy_dt() %>%
  dplyr::group_by(Group) %>%
  dplyr::summarise(Num = sum(.data$Num, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#> Group Num
#> <chr> <int>
#> 1 A 0
#> 2 B 0
# With `group_by_at` (what I was actually trying to use in my case),
# `.data$` is accepted in the grouping, but `summarise` again returns 0s
# with no warnings or errors
d %>%
  dtplyr::lazy_dt() %>%
  dplyr::group_by_at(dplyr::vars(.data$Group)) %>%
  dplyr::summarise(Num = sum(.data$Num, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#> Group Num
#> <chr> <int>
#> 1 A 0
#> 2 B 0
Created on 2019-12-20 by the reprex package (v0.3.0)
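A possible workaround, sketched here as an assumption rather than a confirmed fix: since the problem seems specific to the `.data` pronoun, referring to each column by name with `rlang::sym()` and unquoting it with `!!` should hand `dtplyr` a plain column symbol instead. This still avoids bare variable names in package code, but I have not checked it across `dtplyr` versions.

```r
library(magrittr) # for %>%

d <- tibble::tibble(Group = rep(c("A", "B"), 10), Num = 1:20)

# Assumption: unquoting a symbol built from a string means dtplyr never
# sees the `.data` pronoun, only the column symbols `Group` and `Num`.
res <- d %>%
  dtplyr::lazy_dt() %>%
  dplyr::group_by(!!rlang::sym("Group")) %>%
  dplyr::summarise(Num = sum(!!rlang::sym("Num"), na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  tibble::as_tibble()
res
```

If this behaves as expected, the sums should match the plain `dplyr` result above rather than coming back as 0.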