Feature proposal: enhanced base::factor() with coherence warnings #299

@DanChaltiel

The more I review my own code and others', the more I realize that base::factor() causes a lot of problems: it throws no warning when `x` contains a value that is not among `levels`. Instead, it silently generates NA, which can lead to serious misunderstandings later on.

Since it makes little sense to me to specify levels that do not match the actual input, I would expect factor() to be more verbose about it.
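In its smallest form, the behaviour looks like the sketch below (plain base R, toy vectors chosen for illustration). The `as.integer()` line is included only for comparison: base R does warn when a coercion produces NA, which makes the silence of factor() stand out.

```r
x <- c("a", "b", "c")

# No warning: "c" is not in `levels`, so it silently becomes NA
f <- factor(x, levels = c("a", "b"))
f

# By contrast, base R warns when coercion produces NA
# ("NAs introduced by coercion")
n <- as.integer(c("1", "two"))
```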

Here is an example:

```r
library(glue)
library(rlang)
library(magrittr)

df_labels = read.table(header=TRUE, text="
    level       label
    setsa       SETO     #typo
    verssicolor VERSICO  #typo
    virginica   VIRGINI
")

x=as.character(iris$Species)

f1 = factor(x, levels=df_labels$level, labels=df_labels$label)
table(f1)
#> f1
#>    SETO VERSICO VIRGINI 
#>       0       0      50
```

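Nothing signals that 100 of the 150 values were silently turned into NA: factor() gives no warning, and even table() only hints at the problem through zero counts, since it drops NA by default.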
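Without a wrapper, the silent conversion can still be caught after the fact with base R alone. A minimal sketch, reusing the (typo'd) levels and labels from the example above, written inline so the snippet runs on its own: `useNA = "ifany"` makes table() show the NA count, and comparing `is.na()` before and after the conversion isolates the values that factor() turned into NA.

```r
# Same levels/labels as in the example; only "virginica" matches the data
lvls <- c("setsa", "verssicolor", "virginica")
labs <- c("SETO", "VERSICO", "VIRGINI")

x  <- as.character(iris$Species)
f1 <- factor(x, levels = lvls, labels = labs)

# table() hides NA unless asked; useNA = "ifany" adds an <NA> column
table(f1, useNA = "ifany")

# Values that were present in x but became NA in f1
new_na <- is.na(f1) & !is.na(x)
sum(new_na)        # 100 for this data
unique(x[new_na])  # the input values that matched no level
```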

Here is the function I'm using instead:

```r
fct = function(x=character(), levels, labels=levels, ...){
    # values of `x` that match none of the supplied `levels`
    miss_x = !x %in% levels
    if(any(miss_x)){
        # list each unknown value once, comma-separated
        miss_x_s = unique(x[miss_x]) %>% glue_collapse(", ")
        warn(c("Unknown factor level in `x`, NA generated.", 
               x=glue("Unknown levels: {miss_x_s}")))
    }
    # then defer to base::factor() unchanged
    factor(x, levels, labels, ...)
}
f2 = fct(x, levels=df_labels$level, labels=df_labels$label)
#> Warning: Unknown factor level in `x`, NA generated.
#> x Unknown levels: setosa, versicolor
table(f2)
#> f2
#>    SETO VERSICO VIRGINI 
#>       0       0      50
```

Created on 2022-02-13 by the reprex package (v2.0.1)
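For comparison, here is a dependency-free sketch of the same idea using only base R (`warning()` and `paste()` in place of rlang, glue and magrittr). The name `fct_base` is hypothetical, and, as an assumption beyond the proposal, it ignores values that were already NA in `x`, since factor() does not generate those; the rlang version above would report them as an unknown level.

```r
# Hypothetical base-R-only variant of the proposed helper
fct_base <- function(x = character(), levels, labels = levels, ...) {
  # Non-missing values of x that match none of the supplied levels;
  # factor() would silently turn these into NA
  unknown <- !is.na(x) & !(x %in% levels)
  if (any(unknown)) {
    warning(
      "Unknown factor level in `x`, NA generated.\n",
      "Unknown levels: ", paste(unique(x[unknown]), collapse = ", "),
      call. = FALSE
    )
  }
  # Same conversion as base::factor(); extra arguments pass through
  factor(x, levels = levels, labels = labels, ...)
}

x  <- as.character(iris$Species)
f2 <- fct_base(x,
               levels = c("setsa", "verssicolor", "virginica"),
               labels = c("SETO", "VERSICO", "VIRGINI"))
table(f2)
```

Emitting a warning rather than an error keeps factor()'s existing result (NA generated) while making it visible; replacing `warning()` with `stop()` would turn this into a strict variant instead.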

As more and more people rely on the tidyverse to write cleaner code, I think this could belong here.
