New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nest_string()/unnest_string() functions #69

Merged
merged 14 commits into from May 24, 2016

Conversation

Projects
None yet
3 participants
@aaronwolen
Contributor

aaronwolen commented Mar 25, 2015

This PR slightly modifies unnest() so the transform step from the example isn't necessary and can just be written as:

df <- data.frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  stringsAsFactors = FALSE
)
unnest(df, y)

I'm often doing these kind of unnesting operations and just wanted to save myself some typing by letting unnest() handle the string splitting.

This also adds a nest() function so it's possible to round trip:

df %>% unnest(y) %>% nest(y)
@hadley

This comment has been minimized.

Member

hadley commented Mar 25, 2015

Hmmmm, that's only one type of nesting - the other is when you have something like:

dplyr::data_frame(x = c(1, 2), y = list(1:3, 9:10))

So maybe a more specific name like unnest_string() ?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented Mar 25, 2015

Ah, good call. unnest_string() makes sense and fits with the precedent set by extract_numeric().

But which type of nesting do you think is more common? If it's string nesting then maybe unnest_list() makes more sense?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented Mar 25, 2015

Thinking about it for the past 15 seconds, unnest_list() sounds more like a base::unsplit() replacement than a data.frame function. unnest_string() is probably the way to go.

Are you envisioning nest_string()/unnest_string() and nest()/unnest() for list columns? Or is that nest overload?

@aaronwolen aaronwolen changed the title from Simplify unnest's interface and add nest function to Add nest_string()/unnest_string() functions Mar 25, 2015

@aaronwolen aaronwolen force-pushed the aaronwolen:nest branch from c05cd39 to b4b4e4c Mar 27, 2015

@hadley

This comment has been minimized.

Member

hadley commented Mar 30, 2015

To me, unnest() seems like the atomic operation (in the sense you combine it with other atoms to do something useful), so I'd like to keep it the primary verb.

#' stringsAsFactors = FALSE
#' )
#' unnest_string(df, y)
unnest_string <- function(data, col = NULL, sep = "[^[:alnum:]]+", ...) {

This comment has been minimized.

@hadley

hadley Mar 30, 2015

Member

Doesn't this always need a col?

#' stringsAsFactors = FALSE
#' )
#' nest_string(df, y)
nest_string <- function(data, col = NULL, sep = ",") {

This comment has been minimized.

@hadley

hadley Mar 30, 2015

Member

I'm not sure I want nest_string() - I think unnest_string() is useful, but the reverse seems a little ill-specified to me (esp how the grouping work)

@hadley

This comment has been minimized.

Member

hadley commented Mar 30, 2015

Could you PTAL at the build failure?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented Mar 30, 2015

Looks like the build failure was introduced in #59. I added a fix here (9c6fcc4) but can submit a separate PR If you prefer.

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented Mar 30, 2015

Thanks for the comments. I see your point about nest_string() and removed it to wrap up this PR.

Any interest in extending unnest() (and unnest_string()) to accept multiple columns as suggested in #44?

DESCRIPTION Outdated
@@ -17,6 +17,8 @@ Imports:
dplyr (>= 0.2),
stringi,
lazyeval
Suggets:

This comment has been minimized.

@hadley

hadley Apr 16, 2015

Member

Was this deliberate?

This comment has been minimized.

@aaronwolen

aaronwolen Apr 16, 2015

Contributor

I meant to add data.table to the suggested packages to address this build failure, but listing it under a redundant/misspelled key was just a silly mistake (insert embarrassed emoji here).

This comment has been minimized.

@hadley

hadley Apr 16, 2015

Member

Ah I just discovered that too and fixed it (in a slightly different way). Could you please merge/rebase?

@hadley

This comment has been minimized.

Member

hadley commented Apr 16, 2015

Would you mind including a couple of unit tests too please?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented Apr 16, 2015

Sure, I'll rebase and add a few tests.

@aaronwolen aaronwolen force-pushed the aaronwolen:nest branch from e5048cb to a907a56 Apr 16, 2015

@coveralls

This comment has been minimized.

coveralls commented Apr 16, 2015

Coverage Status

Coverage increased (+5.75%) to 63.83% when pulling a907a56 on aaronwolen:nest into a2bab46 on hadley:master.

@aaronwolen aaronwolen force-pushed the aaronwolen:nest branch 2 times, most recently from 3c3e044 to 5938ed1 May 26, 2015

@hadley

This comment has been minimized.

Member

hadley commented May 22, 2016

I wonder if this should be unnest_separate() since it's similar to separate?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented May 22, 2016

Yeah, I think that makes sense. Should I update the PR?

@hadley

This comment has been minimized.

Member

hadley commented May 22, 2016

Yes, that would be great! I think maybe it should have col and convert arguments to be as similar as possible to separate(). And maybe it should be called separate_rows()? Or something like that.

It would also be useful give this function, extract(), and separate() a common @family, maybe @family string splitting ?

@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented May 23, 2016

From a readability perspective I definitely prefer separate_rows() to unnest_separate().

However, I think an argument can be made that this function is conceptually more similar to unnest(), which produces new rows, than it is to separate(), which produces a new column. If you buy that argument, unnest_*something*() might be more inline with user expectations, and if you don't... well, then separate_rows() it is!

@hadley

This comment has been minimized.

Member

hadley commented May 23, 2016

It's half-way between both, but I think people are more likely to look for it after discovering that separate() doesn't do what they want. I think if you understand unnest() you might not need unnest_string().

@aaronwolen aaronwolen force-pushed the aaronwolen:nest branch from eb8ea1d to 5a0640e May 24, 2016

aaronwolen added some commits May 23, 2016

Additional tests for separate_rows()
* Preserves grouping
* Avoid modifying grouped variable
* Convert works as expected

@aaronwolen aaronwolen force-pushed the aaronwolen:nest branch from 5a0640e to 38db7d4 May 24, 2016

#' @export
separate_rows_.grouped_df <- function(data, cols, sep = "[^[:alnum:].]+",
convert = FALSE) {

This comment has been minimized.

@aaronwolen

aaronwolen May 24, 2016

Contributor

Should separate_rows() prevent modification of grouped variables, @hadley?

This comment has been minimized.

@hadley

hadley May 24, 2016

Member

It shouldn't - you can copy the approach from separate_:

#' @export
separate_.grouped_df <- function(data, col, into, sep = "[^[:alnum:]]+",
                                 remove = TRUE, convert = FALSE,
                                 extra = "warn", fill = "warn", ...) {
  regroup(NextMethod(), data, if (remove) col)
}
@aaronwolen

This comment has been minimized.

Contributor

aaronwolen commented May 24, 2016

Cool, this should be ready for review.

@hadley hadley merged commit 32d9dd2 into tidyverse:master May 24, 2016

3 checks passed

codecov/patch 100% of diff hit (target 92.26%)
Details
codecov/project 92.38% (+0.08%) compared to 3917138
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@hadley

This comment has been minimized.

Member

hadley commented May 24, 2016

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment