The actual behavior of function fct_inseq() is not documented well (see here) and the behavior of this function violates some principles of the Tidyverse:
- I expect the function to change the order of levels only. It must not modify the labels of the levels and must not modify the values of a variable (see Ex. A-1, C-1 and read the comments there).
- In some situations, the function fails, but it should not (see Ex. B-1).
- If something is changed in a variable, I expect at least a warning and not silent modifications.
- What I really expected, is the output similar to the result of
gtools::mixedsort() (see Ex. A-2, B-2, C-2). Can this be implemented in forcats?
- I expect all peculiarities of the function to be well documented with a section of representative examples.
- Shouldn't there be a unit test that checks if all levels are included (despite the order of levels)?
The examples of fct_inseq() (A-1, B-1, C-1 shows actual and A-2, B-2, C-2 shows expected behavior of fct_inseq()):
my_fct_a <- factor(c("010", "001", "005"))
# (Ex. A-1):
# 1. values are converted to NA;
# 2. leading zeros were dropped;
# 3. no warning was issued.
forcats::fct_inseq(my_fct_a)
#> [1] <NA> <NA> <NA>
#> Levels: 1 5 10
# (Ex. A-2):
# The expected result for my_fct_a
forcats::fct_relabel(my_fct_a, gtools::mixedsort)
#> [1] 010 001 005
#> Levels: 001 005 010
my_fct_b <- factor(c("10", "001", "1", "23", "05", "01"))
# (Ex. B-1):
# 1. Original values are different, but function fails with error
forcats::fct_inseq(my_fct_b)
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated
# (Ex. B-2)
# The expected result for my_fct_b
forcats::fct_relabel(my_fct_b, gtools::mixedsort)
#> [1] 10 001 05 23 1 01
#> Levels: 001 01 1 05 10 23
my_fct_c <- factor(c("A10", 9, "A1", "B1", 6, 5, "A05", "01"))
# (Ex. C-1)
# 1) The levels of the result are different than the original values
# 2) some values are SILENTLY converted to NAs
forcats::fct_inseq(my_fct_c)
#> [1] <NA> 9 <NA> <NA> 6 5 <NA> <NA>
#> Levels: 1 5 6 9
# (Ex. C-2)
# The expected result for my_fct_c
forcats::fct_relabel(my_fct_c, gtools::mixedsort)
#> [1] A10 9 A05 B1 6 5 A1 01
#> Levels: 01 5 6 9 A1 A05 A10 B1
# (Ex. C-3)
# 1) mixed sorting (C-2) gives different result than regular sorting (C-3)
forcats::fct_relabel(my_fct_c, sort)
#> [1] A10 9 A1 B1 6 5 A05 01
#> Levels: 01 5 6 9 A05 A1 A10 B1
Created on 2019-11-19 by the reprex package (v0.3.0)
For unit testing:
x <- factor(c("A10", 9, "A1", "B1", 6, 5, "A05", "01"))
testthat::expect_silent({x_after <- forcats::fct_inseq(x)})
testthat::expect_true(all(levels(x_after) %in% levels(x)))
testthat::expect_true(all(unique(x_after) %in% unique(x)))
Session info
```r
> sessioninfo::session_info("forcats")
- Session info -------------------------------------------------------------------------------------------------
setting value
version R version 3.6.1 (2019-07-05)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz Europe/Helsinki
date 2019-11-19
- Packages -----------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.1)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
forcats 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
rlang 0.4.1.9000 2019-11-17 [1] Github (r-lib/rlang@88deeb2)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
The actual behavior of function
fct_inseq()is not documented well (see here) and the behavior of this function violates some principles of the Tidyverse:gtools::mixedsort()(see Ex. A-2, B-2, C-2). Can this be implemented inforcats?The examples of
fct_inseq()(A-1, B-1, C-1 shows actual and A-2, B-2, C-2 shows expected behavior offct_inseq()):Created on 2019-11-19 by the reprex package (v0.3.0)
For unit testing:
Session info
```r > sessioninfo::session_info("forcats") - Session info ------------------------------------------------------------------------------------------------- setting value version R version 3.6.1 (2019-07-05) os Windows 10 x64 system x86_64, mingw32 ui RStudio language (EN) collate English_United States.1252 ctype English_United States.1252 tz Europe/Helsinki date 2019-11-19package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.1)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
forcats 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
rlang 0.4.1.9000 2019-11-17 [1] Github (r-lib/rlang@88deeb2)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)