Mutate and ifelse() fail upon NA comparison and is grouping dependent #958

Closed
ryanabdella opened this issue Feb 11, 2015 · 11 comments

@ryanabdella ryanabdella commented Feb 11, 2015

I've encountered an issue when finding the day of each group's maximum value and then mutating with ifelse() while NAs are present.

library(dplyr)

x <- rep(c("Bob", "Jane"), each = 36)
y <- rep(rep(c("A", "B", "C"), each = 12), 2)
day <- rep(rep(1:12, 3), 2)
values <- rep(rep(c(10, 11, 30, 12, 13, 14, 15, 16, 17, 18, 19, 20), 3), 2)

df <- data.frame(x = x, y = y, day = day, values = values)
df$values[1:12] <- NA

df_works <- df %>% 
  group_by(x, y) %>%
  mutate(max.sum = day[which.max(values)[1]]) %>%
  group_by(y) %>%
  mutate(adjusted_values = ifelse(day < max.sum, 30, values))

This works. The regrouping seems to be required though...

df_logic_fail <- df %>% 
  group_by(x, y) %>%
  mutate(max.sum = day[which.max(values)[1]]) %>%
  mutate(adjusted_values = ifelse(day < max.sum, 30, values))
Error: incompatible types, expecting a logical vector

Even though after the first mutate the data frame still appears to be grouped...

df_still_grouped <- df %>% 
  group_by(x, y) %>%
  mutate(max.sum = day[which.max(values)[1]])
Source: local data frame [72 x 5]
Groups: x, y

     x y day values max.sum
1  Bob A   1     NA      NA
2  Bob A   2     NA      NA
3  Bob A   3     NA      NA
4  Bob A   4     NA      NA
5  Bob A   5     NA      NA
6  Bob A   6     NA      NA
7  Bob A   7     NA      NA
8  Bob A   8     NA      NA
9  Bob A   9     NA      NA
10 Bob A  10     NA      NA
.. ... . ...    ...     ...

What also doesn't work is grouping by only one column:

x2 <- rep(c("Bob", "Jane"), each = 12)
day2 <- rep(1:12, 2)
values2 <- rep(c(10, 11, 30, 12, 13, 14, 15, 16, 17, 18, 19, 20), 2)

df2 <- data.frame(x = x2, day = day2, values = values2)
df2$values[1:12] <- NA

df_test2 <- df2 %>%
  group_by(x) %>%
  mutate(max.sum = day[which.max(values)[1]]) %>%
  group_by(x) %>%
  mutate(adjusted_values = ifelse(day < max.sum, 30, values))
Error: incompatible types, expecting a logical vector
@ryanabdella ryanabdella changed the title from "Mutate and ifelse() fail with upon NA comparison and single grouping" to "Mutate and ifelse() fail upon NA comparison and is grouping dependent" Feb 11, 2015
@hadley hadley added this to the 0.5 milestone May 19, 2015

@rubenarslan rubenarslan commented Jun 12, 2015

I think this comes down to groups that become all NA through the mutate. It bit me a couple of times today, so I made a reproducible gist. I think dplyr misbehaves: it coerces to logical (and then errors out) where it shouldn't, when one group ends up all missing as the result of a mutate command.

@hadley hadley commented Aug 24, 2015

@romainfrancois is this related to #1334?

@romainfrancois romainfrancois self-assigned this Aug 25, 2015

@romainfrancois romainfrancois commented Aug 25, 2015

What happens is that ifelse() returns an all-NA vector for the first group, which is logical, while for the second group it returns numbers:

> x <- rep(c("Bob", "Jane"), each = 36)
> y <- rep(rep(c("A", "B", "C"), each = 12), 2)
> day <- rep(rep(1:12, 3), 2)
> values <- rep(rep(c(10, 11, 30, 12, 13, 14, 15, 16, 17, 18, 19, 20), 3), 2)
>
> df <- data.frame(x = x, y = y, day = day, values = values)
> df$values[1:12] <- NA
>
> res <- df_logic_fail <- df %>%
+   group_by(x, y) %>%
+   mutate(max.sum = day[which.max(values)[1]])
>
> res %>%  mutate(adjusted_values = { print(day<max.sum); print( ifelse(day < max.sum, 30, values) ) } )
 [1] NA NA NA NA NA NA NA NA NA NA NA NA
 [1] NA NA NA NA NA NA NA NA NA NA NA NA
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [1] 30 30 30 12 13 14 15 16 17 18 19 20
Error: incompatible types, expecting a logical vector

I fixed something similar before.
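
The type difference between the two groups can be reproduced without dplyr. The following is a minimal base R sketch that mirrors the first two groups of the example data; the variable names (`values_na`, `max_sum_na`, `res_group1`, and so on) are made up for illustration and do not come from the thread.

```r
# Base R only: reproduce the per-group results outside of dplyr.
day <- 1:12
values <- c(10, 11, 30, 12, 13, 14, 15, 16, 17, 18, 19, 20)

# First group: every value is NA, so which.max() returns integer(0)
# and indexing with integer(0)[1] (an NA) yields NA.
values_na <- rep(NA_real_, 12)
max_sum_na <- day[which.max(values_na)[1]]
res_group1 <- ifelse(day < max_sum_na, 30, values_na)

# Second group: the maximum (30) falls on day 3.
max_sum_ok <- day[which.max(values)[1]]
res_group2 <- ifelse(day < max_sum_ok, 30, values)

typeof(res_group1)  # "logical"
typeof(res_group2)  # "double"
```

When the condition is entirely NA, ifelse() never assigns from `yes` or `no`, so the result keeps the logical type of the condition itself.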

@hadley hadley commented Aug 25, 2015

I'm going to close this for now. I think we could do better with the error message, but basically I think it's better for dplyr to force you to think through what you actually want. That ifelse() call doesn't make sense to me, since sometimes it returns a logical and sometimes a numeric.

@hadley hadley closed this Aug 25, 2015

@romainfrancois romainfrancois commented Aug 25, 2015

Ah. I was having a go at it. The problem is that this is not the fault of the user, but of ifelse(), which returns a logical vector full of NA when the condition is all NA.
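
Until a fix lands, one possible user-side workaround is to coerce the ifelse() result explicitly, so every group yields the same type. This is only a sketch, not the fix discussed in this thread; `ifelse_numeric` is a hypothetical helper name, not a dplyr or base R function.

```r
# Hypothetical helper (not part of dplyr or base R): force a numeric result
# even when ifelse() falls back to a logical all-NA vector.
ifelse_numeric <- function(test, yes, no) {
  as.numeric(ifelse(test, yes, no))
}

day <- 1:12
values <- rep(NA_real_, 12)  # an all-NA group, like the first 12 rows of df
max_sum <- NA_integer_       # what day[which.max(values)[1]] gives for that group

adjusted <- ifelse_numeric(day < max_sum, 30, values)
typeof(adjusted)  # "double"
```

Inside the grouped mutate() from the original post, the same idea would read `mutate(adjusted_values = as.numeric(ifelse(day < max.sum, 30, values)))`. Later dplyr releases also provide a type-strict `if_else()`, which may be a better fit where it is available.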

@hadley hadley commented Aug 25, 2015

But what about the third return which is a logical vector? Maybe it's a combination of user error and dplyr bug?

@hadley hadley commented Aug 25, 2015

Unless this is an easy fix, let's leave this for 0.5.

@hadley hadley reopened this Aug 25, 2015
@hadley hadley added the bug label Aug 25, 2015

@romainfrancois romainfrancois commented Aug 25, 2015

You mean in my modified version, I guess?
I'm printing the result of the condition and then the result of the ifelse().

So the lines:

 [1] NA NA NA NA NA NA NA NA NA NA NA NA
 [1] NA NA NA NA NA NA NA NA NA NA NA NA

are for the first group, and the lines:

 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [1] 30 30 30 12 13 14 15 16 17 18 19 20

for the second group.

We do have a test case for what I think you mean:

test_that("mutate errors when results are not compatible accross groups (#299)",{
  d <- data.frame(x = rep(1:5, each = 3))
  expect_error(mutate(group_by(d,x),val = ifelse(x < 3, "foo", 2)))
})

and that's fine. It's an easy fix anyway, I'll submit the PR shortly.
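
To illustrate why the test case above is a different situation, here is a base R sketch that emulates the grouping with split() and lapply(). That is an assumption made for illustration and not how dplyr evaluates groups internally; `by_group` and `types` are made-up names.

```r
# Emulate the grouped ifelse() from the test case with base R only.
x <- rep(1:5, each = 3)

# One ifelse() call per group, as a grouped mutate() would make.
by_group <- lapply(split(x, x), function(g) ifelse(g < 3, "foo", 2))

# Result type per group: character where x < 3, double elsewhere.
types <- vapply(by_group, typeof, character(1))
types
```

Here the groups disagree between character and double, which is a real conflict that should error. In this issue the conflict is between an all-NA logical vector and a double, where the NAs can be promoted to numeric without losing anything.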

@hadley hadley commented Aug 25, 2015

Oh oops. I didn't read closely enough.

@rgayler rgayler commented Jun 22, 2016

Possible regression?

I am using dplyr 0.4.3 and just bumped into a problem using mutate, ifelse, groups and NA in the ifelse conditional.

My groups are very small (1 - 4 rows), so it's quite possible that the conditional is sometimes all NA.

It fails with a message like:
Error: incompatible types, expecting an integer vector

@hadley hadley commented Jun 22, 2016

@rgayler please open a new issue with a minimal reprex

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018