lag() working incorrectly on factors when used inside mutate() #955

Closed
yhf8377 opened this issue Feb 9, 2015 · 2 comments

@yhf8377 yhf8377 commented Feb 9, 2015

I want to compare each element in a factor vector against the previous element in the same vector in order to identify those elements that are different from the previous one.

For example (expected behaviour):

> test_factor <- factor(rep(c('A','B','C'), each = 3))
> test_factor
[1] A A A B B B C C C
Levels: A B C
> test_factor != lag(test_factor)
[1]    NA FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

However, the same comparison gives a different result inside a mutate() call: every row after the first is reported as TRUE.

> test_df <- tbl_df(data.frame(test = test_factor))
> str(test_df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   9 obs. of  1 variable:
 $ test: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 3 3 3
> test_df %>% mutate(is_diff = (test != lag(test)))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A    TRUE
3    A    TRUE
4    B    TRUE
5    B    TRUE
6    B    TRUE
7    C    TRUE
8    C    TRUE
9    C    TRUE

I found a workaround: explicitly convert the factor to a numeric or character vector before comparing. However, this is inconvenient and makes the code harder to read.

> test_df %>% mutate(is_diff = (as.numeric(test) != lag(as.numeric(test))))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A   FALSE
3    A   FALSE
4    B    TRUE
5    B   FALSE
6    B   FALSE
7    C    TRUE
8    C   FALSE
9    C   FALSE
> test_df %>% mutate(is_diff = (as.character(test) != lag(as.character(test))))
Source: local data frame [9 x 2]

  test is_diff
1    A      NA
2    A   FALSE
3    A   FALSE
4    B    TRUE
5    B   FALSE
6    B   FALSE
7    C    TRUE
8    C   FALSE
9    C   FALSE
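To avoid repeating the conversion on both sides of the comparison, the workaround could be wrapped in a small helper. The following is only a sketch in base R: `differs_from_previous` is a hypothetical name, not a dplyr function, and it assumes that comparing by label (rather than by level code) is what is wanted. It does not call lag() at all, so it sidesteps the behaviour reported here rather than explaining it.

```r
# Hypothetical helper (not part of dplyr): flags elements that differ
# from the element immediately before them.
differs_from_previous <- function(x) {
  # Compare labels, so the result does not depend on factor level codes.
  x <- as.character(x)
  n <- length(x)
  if (n == 0) {
    return(logical(0))
  }
  # The first element has no predecessor, so its result is NA.
  # x[-1] is elements 2..n; x[-n] is elements 1..(n - 1).
  c(NA, x[-1] != x[-n])
}

test_factor <- factor(rep(c("A", "B", "C"), each = 3))
is_diff <- differs_from_previous(test_factor)
print(is_diff)
```

Assuming the helper behaves the same way inside mutate(), it could be used as `mutate(is_diff = differs_from_previous(test))`.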

Please let me know if I have misunderstood something. Thanks.

@romainfrancois romainfrancois self-assigned this Apr 23, 2015
@hadley hadley added this to the 0.5 milestone May 19, 2015
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018