Should gather default to character instead of factor for the key column? #96

Closed
klmr opened this issue Aug 13, 2015 · 4 comments
@klmr

klmr commented Aug 13, 2015

gather currently dispatches to reshape2::melt, and the resulting key column is of type factor, unless convert=TRUE is specified, in which case it’s converted to another type, depending on the actual data.

This default makes sense if the resulting table is used with modelling functions, but not much else. In particular, a very common workflow for me is to use gather followed by inner_join, e.g. (where counts is a gene expression experiment consisting of several libraries, and col_data is the experiment’s column data, in Bioconductor parlance, i.e. describes the experiments):

counts = gather(counts, Library, Count) %>% inner_join(col_data, by = 'Library')

This invariably yields a warning from inner_join:

joining factor and character vector, coercing into character vector

And while harmless, it’s a bit annoying, especially as this is usually in a knitr document and I don’t want to clutter my output.

convert=TRUE isn’t really what I want: For instance, I almost always want the result to be of type character, regardless of the actual data format.
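As a stopgap for the workflow described above, the key column can be coerced to character explicitly before joining. This is only a minimal sketch: the data frames and the `Condition` column are made up for illustration, and the `mutate()` step is an assumed workaround, not something gather does on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the real counts / col_data.
counts <- data.frame(lib_a = c(10, 20), lib_b = c(5, 7))
col_data <- data.frame(
  Library = c("lib_a", "lib_b"),
  Condition = c("control", "treated"),
  stringsAsFactors = FALSE
)

long <- counts %>%
  gather(Library, Count) %>%
  # Force the key column to character so both join keys have the same type.
  mutate(Library = as.character(Library)) %>%
  inner_join(col_data, by = "Library")
```

The extra `mutate()` is harmless if the key column is already character, but having to repeat it after every gather is exactly the kind of noise a character default would avoid.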

Am I right in thinking that using factor rather than character as the type of the key column is more for historical reasons than actually due to a hard advantage? Would it make sense to change the default?

@hadley
Member

hadley commented Aug 24, 2015

I used it in order to preserve the order of the columns, which is sometimes important. It's fairly easy to control in reshape2 via the factorsAsStrings argument (which is a terrible name!). Maybe for tidyr, I should default it to FALSE, and provide a way to override it.
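The ordering point can be illustrated in base R alone. This is a rough sketch with made-up column names, not code from tidyr or reshape2: a factor keeps the original column order in its levels, whereas a character vector carries no order of its own, so anything that sorts it falls back to alphabetical order.

```r
# Hypothetical column names, in the order they appear in the wide table.
cols <- c("week2", "week10", "week1")

# A factor built with explicit levels remembers the original order.
key_factor <- factor(cols, levels = cols)
levels(key_factor)

# A character vector has no levels; sorting it gives alphabetical order,
# which differs from the original column order here.
key_char <- cols
sort(key_char)
```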

@mr-majkel

I would also be interested in gather() returning character for the key column. I have had the same "problem" with reshape2::melt().

@hrbrmstr

hrbrmstr commented Oct 1, 2015

Adding my $0.02USD in support of gather not defaulting to factor for character vectors (if I can scrounge some cycles, that may be a PR at some point).

@hadley
Member

hadley commented Dec 30, 2015

Any ideas about what to call the argument? I'm going with factor_key for now.
