Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow unpack() to work with vctrs_rcrd #755

Closed
mitchelloharawild opened this issue Sep 23, 2019 · 4 comments
Closed

Allow unpack() to work with vctrs_rcrd #755

mitchelloharawild opened this issue Sep 23, 2019 · 4 comments
Labels
df-col 👜 feature a feature request or enhancement
Milestone

Comments

@mitchelloharawild
Copy link
Contributor

mitchelloharawild commented Sep 23, 2019

Original/motivating issue: tidyverts/fabletools#122

The current application of separate() and extract() is limited to character columns, however there are other vector classes that you may want to separate() or extract() information from.

For my particular use-case this is the fabletools hilo class. As a vector (specifically a record), it stores a confidence interval for some prediction, consisting of the lower and upper bounds with their confidence level. Stylistically, [lower, upper]level e.g. [-3, 3]95. Some other examples of composite objects can be dreamt up, such as the latlon example class in Extending tibble.

Gaining access to the underlying elements of these classes within a data context is difficult, and I think the separate() and extract() verbs could be the appropriate interface for simplifying this.

To achieve this, separate() and extract() would now call some generic to separate or extract things from a vector (say separate_col() and extract_col()). Existing functionality would be preserved via separate_col.character(x, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) and new functionality could be added with separate_col.class(x, ...).

For the hilo class, this would look like separate_col.hilo(x, into = c("lower", "upper", "level"), ...) allowing this workflow:

library(fabletools)
library(tidyverse)
df <- tibble(hilo = new_hilo(rnorm(5), rnorm(5, 3), 95))
df
#> # A tibble: 5 x 1
#>                       hilo
#>                     <hilo>
#> 1 [ 0.7348065, 3.369723]95
#> 2 [-0.9783929, 2.784432]95
#> 3 [-0.7608599, 4.152290]95
#> 4 [ 0.4115154, 2.760376]95
#> 5 [-1.4132969, 2.969952]95
df %>% separate(hilo)
#> # A tibble: 5 x 3
#>    lower upper level
#>    <dbl> <dbl> <dbl>
#> 1  0.735  3.37    95
#> 2 -0.978  2.78    95
#> 3 -0.761  4.15    95
#> 4  0.412  2.76    95
#> 5 -1.41   2.97    95

Similarly extract_col.hilo(x, into, what = into, ...) could be used to pull out elements of interest from the hilo.

df %>% extract(hilo, c("lower", "upper"))
#> # A tibble: 5 x 3
#>    lower upper
#>    <dbl> <dbl>
#> 1  0.735  3.37
#> 2 -0.978  2.78
#> 3 -0.761  4.15
#> 4  0.412  2.76
#> 5 -1.41   2.97

I don't think this would introduce breaking changes, although to do so the separate_col.default() must coerce to character instead of erroring (perhaps a warning/message would be useful here?).

Adding this support may make it harder for users to find the right place for documentation of each separation/extraction methods. Using generics style method linking and plenty examples in the docs would be helpful for this.

The alternative (if separate() and extract() shouldn't be generalised) would be to create separate_hilo(data, col, ...) and extract_hilo(data, col, ...). Keeping the status quo would naturally also have some advantages (such as familiarity and not requiring as much knowledge of dispatch methods).

Which approach is preferable? Would generalising separate() and extract() to dispatch on the column type be a useful contribution to tidyr?

@hadley
Copy link
Member

hadley commented Sep 23, 2019

I think this is going to be most useful for record classes, so it feels more like unpacking than separating to me. So maybe unpack() just needs to gain the ability to work with records, and to optionally select which internal columns you want to unpack?

@mitchelloharawild
Copy link
Contributor Author

I haven't looked too closely into the new tidyr verbs yet, although reading the docs for unpack() suggests that this also would be a suitable for records.

In non-record applications, separate() may be a better match as the number of elements to separate wouldn't be specific to the class. Separating HTML nodes in web scraping comes to mind, although this can be thought about more later.

@hadley
Copy link
Member

hadley commented Sep 24, 2019

My sense is that handling this for other object types would be sufficiently different that it would require a new verb.

@hadley hadley changed the title Generalise separate() and extract() to support column class methods Allow unpack() to work with vctrs_rcrd Nov 24, 2019
@hadley hadley added df-col 👜 feature a feature request or enhancement labels Nov 24, 2019
@hadley hadley added this to the v1.1.0 milestone Nov 28, 2019
@hadley
Copy link
Member

hadley commented Nov 28, 2019

After thinking some more, I think this might be a bad idea, or it's at least premature, so I'm going to close for now. Assuming hilo is a record, with the next version of dplyr you'll be able to achieve what you want with df %>% mutate(vec_proxy(hilo)[c("lower", "upper")]

@hadley hadley closed this as completed Nov 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
df-col 👜 feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants