# group_by() %>% mutate(factor()) strange behavior #1414

Closed
opened this issue Sep 17, 2015 · 3 comments

### paulanka commented Sep 17, 2015

Using `factor()` inside `mutate()` on a grouped data frame gives strange results:

```r
library(dplyr)
library(Lahman)

d <- Batting %>%
  group_by(lgID, yearID) %>%
  summarise(s = sum(G)) %>%
  mutate(f0 = s > 9000, f1 = factor(s > 9000))

xtabs(~f0, d)  # 20 250
xtabs(~f1, d)  # 136 134 (wrong)
```

Calling `ungroup()` before `mutate()` solves the problem:

```r
d <- Batting %>%
  group_by(lgID, yearID) %>%
  summarise(s = sum(G)) %>%
  ungroup() %>%
  mutate(f0 = s > 9000, f1 = factor(s > 9000))

xtabs(~f0, d)  # 20 250
xtabs(~f1, d)  # 20 250
```

### romainfrancois commented Sep 18, 2015

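This is because the distribution of unique values is not the same across groups, so `factor()` builds a different set of levels in each group. Printing the unique values each group sees shows this:

```r
> fact <- function(x) { print(unique(x)); factor(x) }
> d <- Batting %>% group_by(lgID, yearID) %>%
+   summarise(s = sum(G)) %>%
+   mutate(f0 = s > 9000, f1 = fact(s > 9000))
[1] FALSE  TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE  TRUE
[1] TRUE
[1] FALSE
```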
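A base-R sketch of how differing per-group levels could produce wrong counts. It assumes, for illustration only, that the per-group `factor()` results are stitched back together by their integer codes using the first group's levels; this is not dplyr's actual code. The data `x`, `g` and the helper `stitch()` are hypothetical.

```r
# Hypothetical data: a logical column (like s > 9000) and group ids
x <- c(FALSE, TRUE, TRUE, TRUE, FALSE)
g <- c(1, 1, 2, 2, 3)

# Hypothetical helper: combine per-group factors by integer code,
# reusing the levels of the first group (the assumed faulty step)
stitch <- function(pieces) {
  codes <- unlist(lapply(pieces, as.integer), use.names = FALSE)
  lv <- levels(pieces[[1]])
  factor(lv[codes], levels = lv)
}

# factor() per group: levels are FALSE/TRUE, then TRUE only, then FALSE only,
# so code 1 means "FALSE" in group 1 but "TRUE" in group 2
naive <- stitch(lapply(split(x, g), factor))

# Reference result: factor() over the whole column, as after ungroup()
correct <- factor(x)

# Workaround sketch: fix the levels up front so every group's codes agree
fixed <- stitch(lapply(split(x, g), factor, levels = c(FALSE, TRUE)))

print(table(naive))
print(table(correct))
print(identical(fixed, correct))
```

Under this assumption the naive combination counts 4 `FALSE` and 1 `TRUE`, while `factor(x)` counts 2 and 3, the same kind of mismatch as the `xtabs()` output in the report. Giving every group the same explicit levels makes the codes line up again.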

### paulanka commented Sep 18, 2015

Thank you for your quick reply. This is a bit tricky. Maybe the documentation should warn about using `factor()` with grouped data frames.

### romainfrancois commented Sep 18, 2015

Nah, that's a bug. I'll probably pick it up tonight.