Windowed rank functions don't work with character columns in tibbles #2988

Closed
foo-bar-baz-qux opened this issue Jul 21, 2017 · 9 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@foo-bar-baz-qux
Contributor

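This is a follow-on from #2792: it appears that many of the windowed ranking functions fail on character columns in tibbles, although they work on the same data in a plain data.frame (where data.frame() converted strings to factors by default at the time).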

library(dplyr)

df <- data.frame(a = c("a", "C", "z"))
df_t <- data_frame(a = c("a", "C", "z"))

print(df %>% mutate(r = dense_rank(a)))
#>   a r
#> 1 a 1
#> 2 C 2
#> 3 z 3
print(df_t %>% mutate(r = dense_rank(a)))
#> Error in mutate_impl(.data, dots): STRING_ELT() can only be applied to a 'character vector', not a 'char'

print(df %>% mutate(r = min_rank(a)))
#>   a r
#> 1 a 1
#> 2 C 2
#> 3 z 3
print(df_t %>% mutate(r = min_rank(a)))
#> Error in mutate_impl(.data, dots): STRING_ELT() can only be applied to a 'character vector', not a 'char'

print(df %>% mutate(r = cume_dist(a)))
#>   a         r
#> 1 a 0.3333333
#> 2 C 0.6666667
#> 3 z 1.0000000
print(df_t %>% mutate(r = cume_dist(a)))
#> Error in mutate_impl(.data, dots): STRING_ELT() can only be applied to a 'character vector', not a 'char'

print(df %>% mutate(r = percent_rank(a)))
#>   a   r
#> 1 a 0.0
#> 2 C 0.5
#> 3 z 1.0
print(df_t %>% mutate(r = percent_rank(a)))
#> Error in mutate_impl(.data, dots): STRING_ELT() can only be applied to a 'character vector', not a 'char'
@krlmlr
Member

krlmlr commented Jul 27, 2017

Thanks, confirmed. Slightly less confusing reprex:

# Packages already on the search path:
suppressPackageStartupMessages(library(dplyr))

# User code:
df_f <- data_frame(a = factor(c("a", "C", "z")))
df_s <- data_frame(a = c("a", "C", "z"))

print(df_f %>% mutate(r = dense_rank(a)))
#> # A tibble: 3 x 2
#>        a     r
#>   <fctr> <int>
#> 1      a     1
#> 2      C     2
#> 3      z     3
print(df_s %>% mutate(r = dense_rank(a)))
#> Error in mutate_impl(.data, dots): STRING_ELT() can only be applied to a 'character vector', not a 'NULL'

@krlmlr krlmlr added the bug and data frame labels Jul 27, 2017
@krlmlr krlmlr modified the milestone: 0.7.3 Aug 16, 2017
@krlmlr
Member

krlmlr commented Aug 23, 2017

Please use dplyr::dense_rank() for now to fall back to standard evaluation.
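A minimal sketch of that workaround, applied to the character-column data from the original report. It assumes current dplyr and tibble installations and uses tibble::tibble() in place of data_frame(), which later tibble releases deprecated. The exact ranks depend on how your dplyr version orders strings, so no output is shown.

```r
library(dplyr)

# Same data as the original reprex; tibble::tibble() stands in for the
# deprecated data_frame() constructor.
df_t <- tibble::tibble(a = c("a", "C", "z"))

# Calling the ranking function with the dplyr:: prefix is the workaround
# suggested for dplyr 0.7.x: it reportedly fell back to standard evaluation
# instead of the code path that raised the STRING_ELT() error.
res <- df_t %>% mutate(r = dplyr::dense_rank(a))
print(res)
```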

@krlmlr
Member

krlmlr commented Aug 23, 2017

Before we can usefully look into this problem, we first need to figure out how to sort character vectors quickly and consistently with base R (#3044).
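The ordering question can be seen in base R alone. This sketch (base R only, no dplyr) contrasts sort() with its default method, which collates strings according to the current locale, against sort(method = "radix"), which always compares strings in the C locale, i.e. by byte value, so upper-case ASCII letters come before lower-case ones. The C-locale ranks it derives happen to match the tibble output in the 2018 reprex later in this thread; that dplyr used C-locale ordering there is an inference, not something stated here.

```r
x <- c("a", "C", "z")

# Default method: compares strings using the collation rules of the current
# locale (LC_COLLATE), so the result can differ between machines.
locale_order <- sort(x)

# Radix method: compares strings in the C locale (byte values), where all
# upper-case ASCII letters come before lower-case ones.
c_order <- sort(x, method = "radix")

# Rank of each element of x under the C-locale ordering (no ties here).
c_rank <- match(x, c_order)

print(locale_order)
print(c_order)  # "C" "a" "z" in every locale
print(c_rank)   # 2 1 3
```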

@romainfrancois
Member

I now get this with @foo-bar-baz-qux's code:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data.frame(a = c("a", "C", "z"))
df_t <- data_frame(a = c("a", "C", "z"))

print(df %>% mutate(r = dense_rank(a)))
#>   a r
#> 1 a 1
#> 2 C 2
#> 3 z 3
print(df_t %>% mutate(r = dense_rank(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <chr> <int>
#> 1 a         2
#> 2 C         1
#> 3 z         3
print(df %>% mutate(r = min_rank(a)))
#>   a r
#> 1 a 1
#> 2 C 2
#> 3 z 3
print(df_t %>% mutate(r = min_rank(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <chr> <int>
#> 1 a         2
#> 2 C         1
#> 3 z         3
print(df %>% mutate(r = cume_dist(a)))
#>   a         r
#> 1 a 0.3333333
#> 2 C 0.6666667
#> 3 z 1.0000000
print(df_t %>% mutate(r = cume_dist(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <chr> <dbl>
#> 1 a     0.667
#> 2 C     0.333
#> 3 z     1.00
print(df %>% mutate(r = percent_rank(a)))
#>   a   r
#> 1 a 0.0
#> 2 C 0.5
#> 3 z 1.0
print(df_t %>% mutate(r = percent_rank(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <chr> <dbl>
#> 1 a     0.500
#> 2 C     0.   
#> 3 z     1.00

Created on 2018-03-26 by the reprex package (v0.2.0).

@romainfrancois
Member

And this with @krlmlr's code:

suppressPackageStartupMessages(library(dplyr))

# User code:
df_f <- data_frame(a = factor(c("a", "C", "z")))
df_s <- data_frame(a = c("a", "C", "z"))

print(df_f %>% mutate(r = dense_rank(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <fct> <int>
#> 1 a         1
#> 2 C         2
#> 3 z         3
print(df_s %>% mutate(r = dense_rank(a)))
#> # A tibble: 3 x 2
#>   a         r
#>   <chr> <int>
#> 1 a         2
#> 2 C         1
#> 3 z         3

Created on 2018-03-26 by the reprex package (v0.2.0).

@romainfrancois
Member

@krlmlr, perhaps this was fixed as a side effect of something else?

@krlmlr
Member

krlmlr commented Mar 26, 2018

Works for me now, even with v0.7.4 from CRAN. Victor, can you confirm?

@krlmlr krlmlr closed this as completed Mar 26, 2018
@foo-bar-baz-qux
Contributor Author

Hey @krlmlr, confirmed that it's now working for me on v0.7.4 from CRAN.

@lock

lock bot commented Sep 23, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 23, 2018