Should gather default to character instead of factor for the key column? #96

Closed
klmr opened this Issue Aug 13, 2015 · 4 comments


klmr commented Aug 13, 2015

gather currently dispatches to reshape2::melt, and the resulting key column is of type factor, unless convert=TRUE is specified, in which case it’s converted to another type, depending on the actual data.

This default makes sense if the resulting table is used with modelling functions, but not much else. In particular, a very common workflow for me is to use gather followed by inner_join, e.g. (where counts is a gene expression experiment consisting of several libraries, and col_data is the experiment’s column data, in Bioconductor parlance, i.e. describes the experiments):

counts = gather(counts, Library, Count) %>% inner_join(col_data, by = 'Library')

This invariably yields a warning from inner_join:

joining factor and character vector, coercing into character vector

And while harmless, it’s a bit annoying, especially as this is usually in a knitr document and I don’t want to clutter my output.
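For anyone hitting the same warning, here is a minimal sketch of a workaround: coerce the key column to character before joining. The data frames below are made-up stand-ins for counts and col_data, not the real experiment, and the as.character() call is simply a no-op if the key is already character.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data standing in for `counts` and `col_data`.
counts <- data.frame(lib_a = c(10, 20), lib_b = c(5, 7))
col_data <- data.frame(
  Library = c("lib_a", "lib_b"),
  Condition = c("control", "treated"),
  stringsAsFactors = FALSE
)

long <- gather(counts, Library, Count) %>%
  # Make both join keys character so no factor/character coercion is needed.
  mutate(Library = as.character(Library)) %>%
  inner_join(col_data, by = "Library")

str(long)
```

This works, but it has to be repeated in every pipeline, which is the annoyance described above.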

convert=TRUE isn’t really what I want: For instance, I almost always want the result to be of type character, regardless of the actual data format.
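To illustrate why convert=TRUE is not a substitute, here is a small sketch with hypothetical year-named columns. As far as I understand it, convert=TRUE runs type.convert() on the key column, so the resulting type follows the column names rather than being reliably character.

```r
library(tidyr)

# Hypothetical data frame whose column names look like integers.
df <- data.frame(`2014` = c(1, 2), `2015` = c(3, 4), check.names = FALSE)

long <- gather(df, year, value, convert = TRUE)

# The key column type is inferred from the names, so it is not character here.
class(long$year)
```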

Am I right in thinking that using factor rather than character as the type of the key column is more for historical reasons than actually due to a hard advantage? Would it make sense to change the default?

Member

hadley commented Aug 24, 2015

I used it in order to preserve the order of the columns, which is sometimes important. It's fairly easy to control in reshape2 via the factorsAsStrings argument (which is a terrible name!). Maybe for tidyr I should default it to FALSE and provide a way to override it.
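A base R sketch of the ordering point, using made-up column names: a factor whose levels follow the original column order keeps that order when sorted, whereas a plain character vector sorts alphabetically.

```r
# Hypothetical column names, listed in their original (non-alphabetical) order.
cols <- c("day10", "day2", "day1")

key_chr <- rep(cols, each = 2)              # key column as character
key_fct <- factor(key_chr, levels = cols)   # key column as factor

# Sorting the character key is alphabetical and loses the column order.
chr_order <- unique(sort(key_chr))

# Sorting the factor key follows the levels, i.e. the original column order.
fct_order <- as.character(unique(sort(key_fct)))

print(chr_order)
print(fct_order)
```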

mr-majkel commented Sep 28, 2015

I would also be interested in gather() returning character for the key column. I have had the same "problem" with reshape2::melt().

hrbrmstr commented Oct 1, 2015

Adding my $0.02 USD in support of gather not defaulting to factor for character vectors (if I can scrounge some cycles, that may be a PR at some point).

Member

hadley commented Dec 30, 2015

Any ideas about what to call the argument? I'm going with factor_key for now.
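A sketch of how that could look in use, assuming a tidyr version in which gather() accepts factor_key and defaults it to FALSE. The toy data frame is hypothetical.

```r
library(tidyr)

# Hypothetical counts table; columns are deliberately not in alphabetical order.
counts <- data.frame(lib_b = c(5, 7), lib_a = c(10, 20))

# Assumed default (factor_key = FALSE): the key column is character.
as_chr <- gather(counts, Library, Count)

# Opting in: the key column is a factor whose levels follow the column order.
as_fct <- gather(counts, Library, Count, factor_key = TRUE)

class(as_chr$Library)
levels(as_fct$Library)
```

With the character default, the gather followed by inner_join workflow from the original report should no longer need a coercion, while factor_key = TRUE keeps the column ordering for those who rely on it.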
