lag() working incorrectly on factors when used inside mutate() #955

@yhf8377

I want to compare each element of a factor vector with the previous element of the same vector, so that I can identify the elements that differ from their predecessor.

For example (expected behaviour):

> test_factor <- factor(rep(c('A','B','C'), each = 3))
> test_factor
[1] A A A B B B C C C
Levels: A B C
> test_factor != lag(test_factor)
[1]    NA FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

However, the same expression does not work as intended inside a mutate() call: every row after the first comes back TRUE.

> test_df <- tbl_df(data.frame(test = test_factor))
> str(test_df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   9 obs. of  1 variable:
 $ test: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 3 3 3
> test_df %>% mutate(is_diff = (test != lag(test)))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A    TRUE
3    A    TRUE
4    B    TRUE
5    B    TRUE
6    B    TRUE
7    C    TRUE
8    C    TRUE
9    C    TRUE

I found a workaround: explicitly convert the factor to a numeric or character vector before comparing. However, this is inconvenient and makes the code less readable.

> test_df %>% mutate(is_diff = (as.numeric(test) != lag(as.numeric(test))))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A   FALSE
3    A   FALSE
4    B    TRUE
5    B   FALSE
6    B   FALSE
7    C    TRUE
8    C   FALSE
9    C   FALSE
> test_df %>% mutate(is_diff = (as.character(test) != lag(as.character(test))))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A   FALSE
3    A   FALSE
4    B    TRUE
5    B   FALSE
6    B   FALSE
7    C    TRUE
8    C   FALSE
9    C   FALSE
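
As a cross-check on what the result should look like, the same comparison can be sketched in base R without lag() or mutate(). This is only an illustration of the expected values, not a fix for the behaviour reported above; the names `values`, `prev` and `is_diff` are made up for the example.

```r
# Rebuild the factor from the example above.
test_factor <- factor(rep(c("A", "B", "C"), each = 3))

# Compare on the character values so factor level codes are not involved.
values <- as.character(test_factor)

# Shift by one position: the first element has no predecessor, so it gets NA.
prev <- c(NA, values[-length(values)])

# TRUE where an element differs from the one before it; NA for the first row.
is_diff <- values != prev
print(is_diff)
```

The logical vector this produces should match the `is_diff` column from the two workaround calls above, with NA in the first position.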

Please let me know if I have misunderstood something. Thanks.
