The type of the columns returned by summarise_all() appears to depend on the Encoding declared for the names of the input. The example below shows this with max(); the same happens with min(). I would expect the output type to be independent of the names, i.e., all summary columns should be of "num" ("double") type in this case. Currently, that expectation fails when the Encoding of a name is anything other than "unknown". The example was run on R-devel with the dplyr development version, but R 3.5.3 RC with dplyr 0.8.0.1 from CRAN produced the same results (on Windows 10 64-bit).
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
foo_unknown <- data.frame(a = 1L, "\xc2\xaa" = 2L)
foo_latin1 <- foo_unknown
Encoding(names(foo_latin1)) <- "latin1"
foo_utf8 <- foo_unknown
Encoding(names(foo_utf8)) <- "UTF-8"
str(summarise_all(foo_unknown, max))
#> 'data.frame': 1 obs. of 2 variables:
#> $ a : num 1
#> $ ª: num 2
str(summarise_all(foo_latin1, max))
#> 'data.frame': 1 obs. of 2 variables:
#> $ a : num 1
#> $ ª: int 2
str(summarise_all(foo_utf8, max))
#> 'data.frame': 1 obs. of 2 variables:
#> $ a: num 1
#> $ ª: int 2
Created on 2019-03-06 by the reprex package (v0.2.1)
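As a sanity check, the following base R sketch suggests that the three variants of the column name differ only in their declared encoding, not in their underlying bytes. The variable names `nm_unknown`, `nm_latin1`, `nm_utf8` and `same_bytes` are introduced here for illustration and are not part of the reprex above. The sketch looks at the bare strings rather than at the data frame names.

```r
# Same two bytes (0xC2 0xAA) as passed for the second column name in the reprex
nm_unknown <- "\xc2\xaa"

# Copies that differ only in the declared encoding mark
nm_latin1 <- nm_unknown
Encoding(nm_latin1) <- "latin1"
nm_utf8 <- nm_unknown
Encoding(nm_utf8) <- "UTF-8"

# Declared encodings of the two marked copies
print(Encoding(c(nm_latin1, nm_utf8)))

# Compare the raw bytes of all three strings
same_bytes <- identical(charToRaw(nm_unknown), charToRaw(nm_latin1)) &&
  identical(charToRaw(nm_unknown), charToRaw(nm_utf8))
print(same_bytes)
```

If `same_bytes` is TRUE, the only difference between the inputs is the encoding mark, which is consistent with the output type depending on that mark rather than on the content of the names.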
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#> setting value
#> version R Under development (unstable) (2019-03-03 r76192)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Finnish_Finland.1252
#> ctype Finnish_Finland.1252
#> tz Europe/Helsinki
#> date 2019-03-06
#>
#> - Packages --------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.2)
#> backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.2)
#> callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.2)
#> cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.2)
#> devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.2)
#> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.2)
#> dplyr * 0.8.0.9006 2019-03-06 [1] Github (tidyverse/dplyr@2ef1fd9)
#> evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2)
#> fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.2)
#> glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.2)
#> highr 0.7 2018-06-09 [1] CRAN (R 3.5.2)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.2)
#> knitr 1.21 2018-12-10 [1] CRAN (R 3.5.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.2)
#> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)
#> pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.2)
#> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.2)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.2)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.2)
#> processx 3.2.1 2018-12-05 [1] CRAN (R 3.5.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)
#> purrr 0.3.0 2019-01-27 [1] CRAN (R 3.5.2)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
#> Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.2)
#> remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.2)
#> rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.2)
#> rmarkdown 1.11 2018-12-08 [1] CRAN (R 3.5.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.2)
#> stringi 1.3.1 2019-02-13 [1] CRAN (R 3.5.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
#> testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.2)
#> tibble 2.0.1 2019-01-12 [1] CRAN (R 3.5.2)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.2)
#> usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.2)
#> xfun 0.5 2019-02-20 [1] CRAN (R 3.5.2)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.2)
#>
#> [1] C:/Omat/R/win-library/3.6
#> [2] C:/Program Files/R/R-devel/library