separate() and extract() currently work only on character columns, but there are other vector classes that you may want to separate() or extract() information from.
For my particular use-case this is the fabletools hilo class. As a vector (specifically a record), it stores a confidence interval for some prediction, consisting of the lower and upper bounds with their confidence level. It is displayed as [lower, upper]level, e.g. [-3, 3]95. Some other examples of composite objects can be dreamt up, such as the latlon example class in Extending tibble.
Gaining access to the underlying elements of these classes within a data context is difficult, and I think the separate() and extract() verbs could be the appropriate interface for simplifying this.
To achieve this, separate() and extract() would now call some generic to separate or extract things from a vector (say separate_col() and extract_col()). Existing functionality would be preserved via separate_col.character(x, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) and new functionality could be added with separate_col.class(x, ...).
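A minimal base R sketch of the dispatch idea follows. The names separate_col() and separate_col.character() come from the proposal above; the body is a simplified stand-in for the existing separate() logic (it ignores remove, convert, extra and fill) and is an assumption for illustration, not tidyr code.

```r
# Hypothetical generic: split one vector into a data frame of pieces.
separate_col <- function(x, into, ...) {
  UseMethod("separate_col")
}

# Character method: simplified stand-in for the current behaviour.
# Assumption: only `sep` is honoured; missing pieces become NA.
separate_col.character <- function(x, into, sep = "[^[:alnum:]]+", ...) {
  pieces <- strsplit(x, sep)
  cols <- lapply(seq_along(into), function(i) {
    vapply(pieces, function(p) p[i], character(1))
  })
  names(cols) <- into
  as.data.frame(cols, stringsAsFactors = FALSE)
}

res <- separate_col(c("a-1", "b-2"), into = c("letter", "number"))
```

A data-level separate() would then call separate_col() on the selected column and bind the result back onto the data.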
For the hilo class, this would look like separate_col.hilo(x, into = c("lower", "upper", "level"), ...), so that a hilo column could be separated within an ordinary tidyr workflow. Similarly, extract_col.hilo(x, into, what = into, ...) could be used to pull out elements of interest from the hilo.
I don't think this would introduce breaking changes, although to avoid them separate_col.default() must coerce to character instead of erroring (perhaps a warning or message would be useful here?).
Adding this support may make it harder for users to find the right documentation for each separation/extraction method. Using generics-style method linking and plenty of examples in the docs would help with this.
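As a rough illustration of what such methods might look like, the sketch below uses a toy record built on a plain list. new_hilo() is a hypothetical stand-in, not the fabletools constructor, and both generics are the proposed (not existing) ones.

```r
# Toy record: a stand-in for the real hilo class (assumption).
new_hilo <- function(lower, upper, level) {
  structure(list(lower = lower, upper = upper, level = level), class = "hilo")
}

separate_col <- function(x, ...) UseMethod("separate_col")
extract_col <- function(x, ...) UseMethod("extract_col")

# Split the record into all of its fields, renamed to `into`.
separate_col.hilo <- function(x, into = c("lower", "upper", "level"), ...) {
  fields <- unclass(x)
  names(fields) <- into
  as.data.frame(fields)
}

# Pull out only the fields named in `what`, renamed to `into`.
extract_col.hilo <- function(x, into, what = into, ...) {
  fields <- unclass(x)[what]
  names(fields) <- into
  as.data.frame(fields)
}

interval <- new_hilo(lower = c(-3, -1), upper = c(3, 1), level = c(95, 80))
parts <- separate_col(interval)
bounds <- extract_col(interval, into = c("lo", "hi"), what = c("lower", "upper"))
```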
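One way the coercing default could be sketched is shown below. The message text and the simplified character method are assumptions for illustration only.

```r
separate_col <- function(x, into, ...) UseMethod("separate_col")

# Simplified character method (assumption: only `sep` is honoured).
separate_col.character <- function(x, into, sep = "[^[:alnum:]]+", ...) {
  pieces <- strsplit(x, sep)
  cols <- lapply(seq_along(into), function(i) {
    vapply(pieces, function(p) p[i], character(1))
  })
  names(cols) <- into
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Default method: coerce to character rather than erroring, and say so.
separate_col.default <- function(x, into, ...) {
  message("Coercing <", class(x)[1], "> to character before separating.")
  separate_col(as.character(x), into = into, ...)
}

# A Date has no dedicated method here, so it falls through to the default.
res <- separate_col(as.Date("2019-08-01"), into = c("year", "month", "day"))
```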
The alternative (if separate() and extract() shouldn't be generalised) would be to create separate_hilo(data, col, ...) and extract_hilo(data, col, ...). Keeping the status quo would naturally also have some advantages (such as familiarity and not requiring as much knowledge of dispatch methods).
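For comparison, a dedicated helper could be sketched as follows. This is only an assumption about its shape: col is taken as a string, the intervals are held in a plain list-column rather than a real hilo vector, and the into and remove arguments are hypothetical.

```r
# Hypothetical class-specific verb: no dispatch, one function per class.
separate_hilo <- function(data, col, into = c("lower", "upper", "level"),
                          remove = TRUE) {
  intervals <- data[[col]]
  for (i in seq_along(into)) {
    # Take the i-th field of every interval in the column.
    data[[into[i]]] <- vapply(intervals, function(h) h[[i]], numeric(1))
  }
  if (remove) data[[col]] <- NULL
  data
}

fc <- data.frame(id = 1:2)
fc$interval <- list(
  list(lower = -3, upper = 3, level = 95),
  list(lower = -1, upper = 1, level = 80)
)

out <- separate_hilo(fc, "interval")
```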
Which approach is preferable? Would generalising separate() and extract() to dispatch on the column type be a useful contribution to tidyr?
I think this is going to be most useful for record classes, so it feels more like unpacking than separating to me. So maybe unpack() just needs to gain the ability to work with records, and to optionally select which internal columns you want to unpack?
I haven't looked too closely into the new tidyr verbs yet, although reading the docs for unpack() suggests that it would also be suitable for records.
In non-record applications, separate() may be a better match as the number of elements to separate wouldn't be specific to the class. Separating HTML nodes in web scraping comes to mind, although this can be thought about more later.
After thinking some more, I think this might be a bad idea, or it's at least premature, so I'm going to close for now. Assuming hilo is a record, with the next version of dplyr you'll be able to achieve what you want with df %>% mutate(vec_proxy(hilo)[c("lower", "upper")]).
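A small sketch of that approach, assuming dplyr >= 1.0.0 (where an unnamed data frame returned inside mutate() is spliced into columns) and using a toy vctrs record in place of the real hilo class:

```r
library(dplyr)  # assumes dplyr >= 1.0.0
library(vctrs)

# Toy record standing in for fabletools' hilo (assumption).
new_hilo <- function(lower, upper, level) {
  new_rcrd(list(lower = lower, upper = upper, level = level), class = "hilo")
}

# Records need a format method to print; mimic the [lower, upper]level style.
format.hilo <- function(x, ...) {
  paste0("[", field(x, "lower"), ", ", field(x, "upper"), "]", field(x, "level"))
}

df <- tibble(
  id = 1:2,
  hilo = new_hilo(lower = c(-3, -1), upper = c(3, 1), level = c(95, 80))
)

# vec_proxy() exposes the record's fields as a data frame;
# mutate() unpacks the selected fields into new columns.
out <- df %>% mutate(vec_proxy(hilo)[c("lower", "upper")])
```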
Original/motivating issue: tidyverts/fabletools#122