summarise shouldn't force base R functions like sum and mean #3255

@msberends

The summarise function forces the use of base R functions. I expected it to use whichever functions are currently loaded ('active') in the calling environment.

# (from another package:)
#' @inherit base::mean
#' @export
mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}

library(dplyr)
df <- tibble(group = LETTERS[1:10], 
             score = c(runif(3, 0, 1), 
                       NA,
                       runif(6, 0, 1)))

mean(df$score) # now na.rm = TRUE by default
# [1] 0.5479927

df %>% summarise(m = mean(score))
# # A tibble: 1 x 1
#       m
#   <dbl>
# 1    NA

df %>% summarise(m = mean(score, na.rm = TRUE))
# # A tibble: 1 x 1
#       m
#   <dbl>
# 1    0.5479927

Why not use the functions from one's own environment? This mean function comes from another package, which I loaded deliberately, so I would expect it to be used everywhere in my code. Apparently the summarise function is an exception. I think dplyr should not force the use of base R functions like base::mean.

You might think that package developers simply shouldn't overwrite functions like mean, but even dplyr does this (for example intersect, setdiff, setequal, union). Those also give different results than when only base R is loaded, exactly as you would expect. That is why you loaded dplyr in the first place. So I ask you to let summarise use the most recently loaded functions, not just those from base R. Another example: if I wrote a function that overwrites sum to solve #3189, summarise would not use it by default, even if my package was loaded after dplyr.
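
For reference, here is a minimal sketch of the lookup behaviour I am relying on, with dplyr not involved at all. The locally defined mean is only a stand-in for the packaged one shown above, and x, masked and qualified are illustrative names, not part of any package.

```r
# Stand-in for the masking function exported by the other package
# (defined locally here so the sketch is self-contained).
mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}

x <- c(1, NA, 3)

# An unqualified call resolves to the most recently defined `mean`,
# so the NA is dropped by default.
masked <- mean(x)

# A namespace-qualified call always reaches the base R version,
# which keeps the NA unless na.rm = TRUE is passed.
qualified <- base::mean(x)

print(masked)
print(qualified)
```

My expectation is that an unqualified mean(score) inside summarise behaves like the masked call here, not like the qualified one.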

Labels: bug (an unexpected problem or unintended behavior)
