separate doesn't work on first grouped column when more than one column in group_by #177

Closed
benmarwick opened this issue Mar 28, 2016 · 1 comment

@benmarwick

commented Mar 28, 2016

In a data frame where two column names are passed to group_by, the separate function cannot find the first of those columns (but it can find the second one).

Here's an example...

df <- data.frame(the_num = 1:30,
        the_chr1 = rep(sapply(1:10, function(i) paste(sample(c(letters,LETTERS),3),collapse="")),3),
        the_chr2 = sapply(1:30, function(i) paste(sample(c(letters,LETTERS),3),collapse="")))
head(df)
  the_num the_chr1 the_chr2
1       1      Fiq      LMH
2       2      Ozf      hdv
3       3      NVK      ROc
4       4      IRe      HpE
5       5      Aeq      Rrd
6       6      vaU      Qkt

If we pipe a few functions together and then separate the contents of the first column passed to group_by (the_chr1 in this example), here's what happens:

library(dplyr); library(tidyr)
df %>% 
  group_by(the_chr1, the_chr2) %>% 
  summarize(mean_i = mean(the_num)) %>% 
  separate(the_chr1, c('first_bit', 'second_bit'), sep = 1)

The result is an unexpected error: Error: unknown column 'the_chr1'

However, if we try to separate the second column in the group_by (here it's the_chr2), it works fine:

df %>%
  group_by(the_chr1, the_chr2) %>% 
  summarize(mean_i = mean(the_num)) %>% 
  separate(the_chr2, c('first_bit', 'second_bit'), sep = 1)
Source: local data frame [30 x 4]
Groups: the_chr1 [10]

   the_chr1 first_bit second_bit mean_i
     (fctr)     (chr)      (chr)  (dbl)
1       Aeq         h         uX     15
2       Aeq         R         rd      5
3       Aeq         W         GJ     25
4       Fiq         F         OU     11
5       Fiq         L         MH      1
6       Fiq         y         IE     21
7       FlV         G         da     19
8       FlV         i         pU     29
9       FlV         l         Yn      9
10      hPy         A         MN      7
..      ...       ...        ...    ...
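
The "Groups: the_chr1 [10]" line in the output above suggests that the summarised result is still grouped by the first column. A minimal sketch for checking this, assuming a recent dplyr where group_vars() is available; the toy data and its column names (g1, g2, n) are hypothetical and not taken from the report:

```r
library(dplyr)

# Hypothetical toy data, not the data from the report
toy <- data.frame(
  g1 = c("ab", "ab", "cd"),
  g2 = c("x", "y", "z"),
  n  = 1:3
)

summarised <- toy %>%
  group_by(g1, g2) %>%
  summarize(mean_n = mean(n))

# summarize() drops only the last grouping variable by default,
# so the first one is still a grouping column here
group_vars(summarised)
```

If the first column is reported as a remaining grouping variable, that would be consistent with the error only appearing for the_chr1.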

Of course it works fine if we group_by and separate on the same single column:

df %>% 
  group_by(the_chr2) %>% 
  summarize(mean_i = mean(the_num)) %>% 
  separate(the_chr2, c('first_bit', 'second_bit'), sep = 1)
Source: local data frame [30 x 3]

   first_bit second_bit mean_i
       (chr)      (chr)  (dbl)
1          A         MN      7
2          C         ur     18
3          e         rc     24
4          F         OU     11
5          G         da     19
6          h         dv      2
7          H         pE      4
8          h         uX     15
9          h         Wv     28
10         I         JP     27
..       ...        ...    ...

So there seems to be a problem with how separate handles data frames with multiple grouping variables.
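
One possible workaround, on versions where the error still occurs, is to drop the leftover grouping with ungroup() before calling separate. This is only a sketch under assumptions: the set.seed() call and stringsAsFactors = FALSE are additions for reproducibility and are not part of the original example, and this is not necessarily how the fix in the package itself works.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the random strings are reproducible

df <- data.frame(
  the_num  = 1:30,
  the_chr1 = rep(sapply(1:10, function(i) paste(sample(c(letters, LETTERS), 3), collapse = "")), 3),
  the_chr2 = sapply(1:30, function(i) paste(sample(c(letters, LETTERS), 3), collapse = "")),
  stringsAsFactors = FALSE  # assumption: keep the strings as character columns
)

res <- df %>%
  group_by(the_chr1, the_chr2) %>%
  summarize(mean_i = mean(the_num)) %>%
  ungroup() %>%  # remove the remaining grouping on the_chr1
  separate(the_chr1, c("first_bit", "second_bit"), sep = 1)

head(res)
```

With the grouping removed, the_chr1 is an ordinary column, so separate can split it into first_bit (one character) and second_bit (the remaining two).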

@hadley hadley closed this in 3dbcbf7 May 16, 2016

@benmarwick

commented May 16, 2016

Thanks!
