Return type of summarise_all(x, max) depends on Encoding(names(x)) #4258

Closed
mvkorpel opened this issue Mar 6, 2019 · 3 comments

@mvkorpel mvkorpel commented Mar 6, 2019

The type of the columns returned by summarise_all() sometimes depends on the Encoding declared for the names of the input. The example below shows this with max(); the same happens with min(). I would expect the output type to be independent of the names, i.e., all summary columns would be of "num" ("double") type in this case. That expectation fails for a column whose name has an Encoding other than "unknown". The example was run on R-devel with the dplyr dev version, but R 3.5.3 RC with dplyr 0.8.0.1 from CRAN produced the same results (on Windows 10 64-bit).

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

foo_unknown <- data.frame(a = 1L, "\xc2\xaa" = 2L)
foo_latin1 <- foo_unknown
Encoding(names(foo_latin1)) <- "latin1"
foo_utf8 <- foo_unknown
Encoding(names(foo_utf8)) <- "UTF-8"

str(summarise_all(foo_unknown, max))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a : num 1
#>  $ ª: num 2
str(summarise_all(foo_latin1, max))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a : num 1
#>  $ ª: int 2
str(summarise_all(foo_utf8, max))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: num 1
#>  $ ª: int 2

Created on 2019-03-06 by the reprex package (v0.2.1)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                                             
#>  version  R Under development (unstable) (2019-03-03 r76192)
#>  os       Windows 10 x64                                    
#>  system   x86_64, mingw32                                   
#>  ui       RTerm                                             
#>  language (EN)                                              
#>  collate  Finnish_Finland.1252                              
#>  ctype    Finnish_Finland.1252                              
#>  tz       Europe/Helsinki                                   
#>  date     2019-03-06                                        
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                          
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.2)                  
#>  backports     1.1.3      2018-12-14 [1] CRAN (R 3.5.2)                  
#>  callr         3.1.1      2018-12-21 [1] CRAN (R 3.5.2)                  
#>  cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.2)                  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.2)                  
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.2)                  
#>  devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.2)                  
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.2)                  
#>  dplyr       * 0.8.0.9006 2019-03-06 [1] Github (tidyverse/dplyr@2ef1fd9)
#>  evaluate      0.13       2019-02-12 [1] CRAN (R 3.5.2)                  
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.2)                  
#>  glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.2)                  
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.2)                  
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.2)                  
#>  knitr         1.21       2018-12-10 [1] CRAN (R 3.5.2)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.2)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.2)                  
#>  pillar        1.3.1      2018-12-15 [1] CRAN (R 3.5.2)                  
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.2)                  
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.2)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.2)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.2)                  
#>  processx      3.2.1      2018-12-05 [1] CRAN (R 3.5.2)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.2)                  
#>  purrr         0.3.0      2019-01-27 [1] CRAN (R 3.5.2)                  
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.2)                  
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.2)                  
#>  remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.2)                  
#>  rlang         0.3.1      2019-01-08 [1] CRAN (R 3.5.2)                  
#>  rmarkdown     1.11       2018-12-08 [1] CRAN (R 3.5.2)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.2)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.2)                  
#>  stringi       1.3.1      2019-02-13 [1] CRAN (R 3.5.2)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.2)                  
#>  testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.2)                  
#>  tibble        2.0.1      2019-01-12 [1] CRAN (R 3.5.2)                  
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.2)                  
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.2)                  
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.2)                  
#>  xfun          0.5        2019-02-20 [1] CRAN (R 3.5.2)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.2)                  
#> 
#> [1] C:/Omat/R/win-library/3.6
#> [2] C:/Program Files/R/R-devel/library

@romainfrancois romainfrancois commented Mar 6, 2019

Thanks. There are actually two problems:

  • the hybrid implementation, defined by the MinMax class, always returns a numeric vector
  • hybrid does not kick in when there is an encoding mismatch, so you get standard R evaluation, which correctly returns an integer vector.
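
To illustrate the second point, here is a minimal base R sketch (no dplyr involved; the data frame `foo` and the column name `b` are stand-ins chosen for illustration, not the reporter's exact input). It shows that ordinary R evaluation of max() on integer columns returns integers, which matches the "int" columns in the report:

```r
# Hypothetical stand-in for the reporter's data: two integer columns.
foo <- data.frame(a = 1L, b = 2L)

# Standard (non-hybrid) evaluation: apply base max() to each column.
res <- lapply(foo, max)

# Both results keep the integer type of their input.
types <- vapply(res, typeof, character(1))
print(types)  # a and b are both "integer"

# vapply() with a numeric template promotes integer results to double,
# one possible way to get a uniform "num" result in base R.
as_double <- vapply(foo, max, numeric(1))
print(typeof(as_double))  # "double"
```

The vapply() call at the end is only one way to force a consistent type; it is not something proposed in this thread.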

@romainfrancois romainfrancois added this to the 0.8.1 milestone Mar 6, 2019
@romainfrancois romainfrancois self-assigned this Mar 6, 2019

@romainfrancois romainfrancois commented Mar 7, 2019

The first one is tricky because of the flexible return type of min() in R, which does not play nicely with C++.
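
As a rough illustration of that flexibility (a base R sketch, not taken from the dplyr sources), the type returned by min() follows its arguments, so a single fixed C++ return type cannot cover every case:

```r
# The return type of min() depends on the type of its input.
t_int <- typeof(min(1L, 2L))       # "integer"
t_dbl <- typeof(min(1L, 2.5))      # "double": integer is promoted
t_chr <- typeof(min("a", "b"))     # "character"
t_lgl <- typeof(min(TRUE, FALSE))  # "integer": logicals are coerced

# An empty integer vector yields Inf (a double), with a warning.
t_empty <- typeof(suppressWarnings(min(integer(0))))  # "double"

print(c(t_int, t_dbl, t_chr, t_lgl, t_empty))
```

max() behaves the same way, except that the empty case yields -Inf.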


@lock lock bot commented Sep 15, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 15, 2019