In `separate()`, an `into=` column name can be NA to not generate that column #397
Nice idea!
Here is a second example, taken from https://stackoverflow.com/questions/48707294/turn-singular-row-with-interval-into-multiple-rows-which-equal-the-interval/48707563#48707563 . It is less compelling because there is no subsequent reduction in code, but it still reduces the developer's mental load by freeing them from having to come up with junk names, and it also reduces the size of the data relative to specifying names for all columns. In this example, with the feature under discussion,
Here is a third example. The objective is to create two columns. It is based on https://stackoverflow.com/questions/48727230/how-to-split-columns-when-writing-csv-files/48727340#48727360 Had we been able to use NA instead of
Also note that it would have been nice to be able to write the
+1 for this idea. This is a use case I've found myself in several times, and needing to use dummy column names felt clunky. I've tried using
Collation of previous examples, also using `extract()`:
library(tidyverse)
### Example 1
ex1 <- structure(
list(pos = 1:2, BZ_SP = c(-300000L, 0L), BZ_SP_m1 = c(2L, 0L),
BZ_SP_m2 = c(3L, 0L), CL_SP = c(2540544L, -118621L),
CL_SP_m1 = c(1L, 3L), CL_SP_m2 = c(2L, 4L)),
.Names = c("pos", "BZ_SP", "BZ_SP_m1", "BZ_SP_m2",
"CL_SP", "CL_SP_m1", "CL_SP_m2"),
class = "data.frame", row.names = c(NA, -2L))
sep1 <- ex1 %>%
gather(variable, value, -pos) %>%
separate(variable, c("CurveGroup", "X", "suffix"), sep = 5:6, fill = "right") %>%
select(-X) %>% # drop the junk column X, which only holds the "_" delimiter
spread(suffix, value)
ext1 <- ex1 %>%
gather(variable, value, -pos) %>%
extract(variable, c("CurveGroup", "suffix"), "(.{5})[_]*(.*)") %>%
spread(suffix, value)
identical(sep1, ext1)
#> [1] TRUE
### Example 2
ex2 <- structure(
list(Col1 = c("a", "b", "c"),
Col2 = c("odd from 1 to 9", "even from 2 to 14", "even from 30 to 50")),
.Names = c("Col1", "Col2"), row.names = c(NA, -3L), class = "data.frame")
sep2 <- ex2 %>%
# X1 and X2 are junk columns that only receive the words "from" and "to"
separate(Col2, into = c("parity", "X1", "from", "X2", "to")) %>%
group_by(Col1) %>%
do(data.frame(Col2 = seq(.$from, .$to, 2))) %>%
ungroup
ext2 <- ex2 %>%
extract(Col2, c("from", "to"), regex = ("([0-9]+) to ([0-9]+)")) %>%
mutate(Col2 = map2(from, to, ~ seq(.x, .y, 2))) %>%
unnest(Col2) %>% select(Col1, Col2) %>% as_tibble()
identical(sep2, ext2)
#> [1] TRUE
### Example 3
ex3 <- c("(Wirtschaft, 00:00)", "(Kultur, 23:42)")
sep3 <- ex3 %>%
as_tibble() %>%
# X1 and X2 are junk columns that receive the empty pieces outside the parentheses
separate(value, into = c("X1", "Name", "Time", "X2"), sep = ", |[()]") %>%
select(Name, Time)
ext3 <- ex3 %>% as_tibble() %>%
extract(value, into = c("Name", "Time"), "\\((.+), (.+)\\)")
identical(sep3, ext3)
#> [1] TRUE
Created on 2018-04-18 by the reprex package (v0.2.0).
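For comparison, Example 3 can be written with `NA` entries in `into`, so that neither the trailing `select()` nor a switch to `extract()` is needed. This is a sketch that assumes a tidyr release in which `separate()` accepts `NA` in `into` to omit a piece (the behaviour requested in this issue); it was not part of the original reprex.

```r
library(tibble)
library(tidyr)

# Same input as Example 3, built directly as a one-column tibble.
ex3 <- tibble(value = c("(Wirtschaft, 00:00)", "(Kultur, 23:42)"))

# Splitting on ", " or a parenthesis yields four pieces: an empty piece before
# "(", the name, the time, and an empty piece after ")". The NA entries in
# `into` drop the two empty pieces instead of naming them X1 and X2.
sep3_na <- separate(ex3, value, into = c(NA, "Name", "Time", NA),
                    sep = ", |[()]")
sep3_na
```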
I've proposed a draft PR for this in #445, which can be installed with devtools::install_github("tidyverse/tidyr#445").
Short example:
library(tidyverse)
df <- tribble(
~foo,
"Amanda likes apple and asparagus",
"Barry likes banana and broccoli",
"Cornelia likes cherry and carrot"
)
df %>% separate(foo, into = c("name", NA, "fruit", NA, "vegetable"))
#> # A tibble: 3 x 3
#> name fruit vegetable
#> <chr> <chr> <chr>
#> 1 Amanda apple asparagus
#> 2 Barry banana broccoli
#> 3 Cornelia cherry carrot
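The semantics proposed here can also be sketched in base R, independently of tidyr. `separate_drop_na()` below is a hypothetical helper written only for illustration, not the code in #445: it splits on the same default separator that `separate()` uses (`"[^[:alnum:]]+"`), pads short rows with `NA` (roughly `fill = "right"`), silently drops surplus pieces (roughly `extra = "drop"`), and keeps only the columns whose name in `into` is not `NA`.

```r
# Hypothetical helper, for illustration only (not part of tidyr):
# split each string on a regex and keep only the pieces whose name in
# `into` is not NA.
separate_drop_na <- function(x, into, split = "[^[:alnum:]]+") {
  pieces <- strsplit(x, split, perl = TRUE)
  # Take exactly length(into) pieces per string: indexing past the end pads
  # with NA, and surplus pieces are silently dropped. Note that strsplit()
  # discards a trailing empty piece, which therefore appears here as NA.
  mat <- do.call(rbind, lapply(pieces, function(p) p[seq_along(into)]))
  keep <- !is.na(into)
  out <- as.data.frame(mat[, keep, drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- into[keep]
  out
}

# The short example above: the NA positions swallow "likes" and "and".
foods <- separate_drop_na(
  c("Amanda likes apple and asparagus",
    "Barry likes banana and broccoli",
    "Cornelia likes cherry and carrot"),
  into = c("name", NA, "fruit", NA, "vegetable")
)
foods

# Example 3: the NA positions swallow the pieces outside the parentheses.
times <- separate_drop_na(c("(Wirtschaft, 00:00)", "(Kultur, 23:42)"),
                          into = c(NA, "Name", "Time", NA),
                          split = ", |[()]")
times
```

Dropping the unnamed columns after the split keeps the helper simple; a real implementation could avoid materialising them at all.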
Here is yet another example, this one based on https://stackoverflow.com/questions/52431841/how-to-find-the-first-space-in-a-sentence-with-regular-expressions-within-r/52432371#52432371 . The input has a single column, each element of which looks like
With the suggested feature, this one reduces to the following, where the last line of code above is no longer needed:
In the example below, modified from https://stackoverflow.com/questions/48255809/to-change-tabular-data-to-a-different-format-in-r/48256012#48256012 , the `X` column is a junk column that just contains the second underscore delimiter for those rows that have it. If the `separate` statement allowed `NA` or `""`, say, as a column name meaning "do not generate a column for this component", then the `select(-X)` could have been omitted and the `separate` statement would become `separate(variable, c("CurveGroup", NA, "suffix"), sep = 5:6, fill = "right")`. Not only would this reduce the code by one line, but it would also eliminate the need to generate a junk name for a column that is subsequently dropped.
giving:
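A self-contained sketch of the proposed `separate` statement, applied to the `ex1` data from the collated examples. It assumes a tidyr version in which `separate()` accepts `NA` in `into`; the data are rebuilt with `data.frame()` rather than `structure()` for brevity, and the names `long` and `split_na` are illustrative.

```r
library(tidyr)

# The ex1 data from the collated examples, built with data.frame().
ex1 <- data.frame(
  pos = 1:2,
  BZ_SP = c(-300000L, 0L), BZ_SP_m1 = c(2L, 0L), BZ_SP_m2 = c(3L, 0L),
  CL_SP = c(2540544L, -118621L), CL_SP_m1 = c(1L, 3L), CL_SP_m2 = c(2L, 4L)
)

long <- gather(ex1, variable, value, -pos)

# sep = 5:6 cuts e.g. "BZ_SP_m1" into "BZ_SP", "_" and "m1" ("BZ_SP" gives
# "BZ_SP", "" and ""). The NA name discards the middle piece, so no junk X
# column is created and no select(-X) is needed.
split_na <- separate(long, variable, c("CurveGroup", NA, "suffix"),
                     sep = 5:6, fill = "right")
names(split_na)
```

From here the original `spread(suffix, value)` step applies unchanged.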