Allow unpack() to work with vctrs_rcrd #755

mitchelloharawild · 2019-09-23T06:03:55Z

Original/motivating issue: tidyverts/fabletools#122

The current application of separate() and extract() is limited to character columns, however there are other vector classes that you may want to separate() or extract() information from.

For my particular use-case this is the fabletools hilo class. As a vector (specifically a record), it stores a confidence interval for some prediction, consisting of the lower and upper bounds with their confidence level. Stylistically, [lower, upper]level e.g. [-3, 3]95. Some other examples of composite objects can be dreamt up, such as the latlon example class in Extending tibble.

Gaining access to the underlying elements of these classes within a data context is difficult, and I think the separate() and extract() verbs could be the appropriate interface for simplifying this.

To achieve this, separate() and extract() would now call some generic to separate or extract things from a vector (say separate_col() and extract_col()). Existing functionality would be preserved via separate_col.character(x, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) and new functionality could be added with separate_col.class(x, ...).

For the hilo class, this would look like separate_col.hilo(x, into = c("lower", "upper", "level"), ...) allowing this workflow:

library(fabletools)
library(tidyverse)
df <- tibble(hilo = new_hilo(rnorm(5), rnorm(5, 3), 95))
df
#> # A tibble: 5 x 1
#>                       hilo
#>                     <hilo>
#> 1 [ 0.7348065, 3.369723]95
#> 2 [-0.9783929, 2.784432]95
#> 3 [-0.7608599, 4.152290]95
#> 4 [ 0.4115154, 2.760376]95
#> 5 [-1.4132969, 2.969952]95
df %>% separate(hilo)
#> # A tibble: 5 x 3
#>    lower upper level
#>    <dbl> <dbl> <dbl>
#> 1  0.735  3.37    95
#> 2 -0.978  2.78    95
#> 3 -0.761  4.15    95
#> 4  0.412  2.76    95
#> 5 -1.41   2.97    95

Similarly extract_col.hilo(x, into, what = into, ...) could be used to pull out elements of interest from the hilo.

df %>% extract(hilo, c("lower", "upper"))
#> # A tibble: 5 x 3
#>    lower upper
#>    <dbl> <dbl>
#> 1  0.735  3.37
#> 2 -0.978  2.78
#> 3 -0.761  4.15
#> 4  0.412  2.76
#> 5 -1.41   2.97

I don't think this would introduce breaking changes, although to do so the separate_col.default() must coerce to character instead of erroring (perhaps a warning/message would be useful here?).

Adding this support may make it harder for users to find the right place for documentation of each separation/extraction methods. Using generics style method linking and plenty examples in the docs would be helpful for this.

The alternative (if separate() and extract() shouldn't be generalised) would be to create separate_hilo(data, col, ...) and extract_hilo(data, col, ...). Keeping the status quo would naturally also have some advantages (such as familiarity and not requiring as much knowledge of dispatch methods).

Which approach is preferable? Would generalising separate() and extract() to dispatch on the column type be a useful contribution to tidyr?

The text was updated successfully, but these errors were encountered:

hadley · 2019-09-23T12:31:25Z

I think this is going to be most useful for record classes, so it feels more like unpacking than separating to me. So maybe unpack() just needs to gain the ability to work with records, and to optionally select which internal columns you want to unpack?

mitchelloharawild · 2019-09-23T13:57:25Z

I haven't looked too closely into the new tidyr verbs yet, although reading the docs for unpack() suggests that this also would be a suitable for records.

In non-record applications, separate() may be a better match as the number of elements to separate wouldn't be specific to the class. Separating HTML nodes in web scraping comes to mind, although this can be thought about more later.

hadley · 2019-09-24T17:56:49Z

My sense is that handling this for other object types would be sufficiently different that it would require a new verb.

hadley · 2019-11-28T19:26:17Z

After thinking some more, I think this might be a bad idea, or it's at least premature, so I'm going to close for now. Assuming hilo is a record, with the next version of dplyr you'll be able to achieve what you want with df %>% mutate(vec_proxy(hilo)[c("lower", "upper")]

mitchelloharawild mentioned this issue Oct 1, 2019

Unnest_tsibble -- "Error: Can't slice a scalar" tidyverts/fabletools#125

Closed

hadley changed the title ~~Generalise separate() and extract() to support column class methods~~ Allow unpack() to work with vctrs_rcrd Nov 24, 2019

hadley added df-col 👜 feature a feature request or enhancement labels Nov 24, 2019

hadley added this to the v1.1.0 milestone Nov 28, 2019

hadley closed this as completed Nov 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow unpack() to work with vctrs_rcrd #755

Allow unpack() to work with vctrs_rcrd #755

mitchelloharawild commented Sep 23, 2019 •

edited

hadley commented Sep 23, 2019

mitchelloharawild commented Sep 23, 2019

hadley commented Sep 24, 2019

hadley commented Nov 28, 2019

Allow unpack() to work with vctrs_rcrd #755

Allow unpack() to work with vctrs_rcrd #755

Comments

mitchelloharawild commented Sep 23, 2019 • edited

hadley commented Sep 23, 2019

mitchelloharawild commented Sep 23, 2019

hadley commented Sep 24, 2019

hadley commented Nov 28, 2019

mitchelloharawild commented Sep 23, 2019 •

edited