Allow mutate() to choose the position of new columns #2047

Closed
krlmlr opened this issue Jul 30, 2016 · 49 comments · Fixed by #4774
Labels
feature verbs 🏃‍♀️

Comments

@krlmlr
Member

krlmlr commented Jul 30, 2016

The proposed before and after arguments to mutate() would specify where the new columns are inserted: before or after an existing column, given by index or name.

See also tidyverse/tibble#99.
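
For illustration, a minimal sketch of what such positioning could look like. It assumes a dplyr version (1.0.0 or later) in which mutate() accepts .before and .after, the argument names that appear later in this thread; the tibble and its column names are made up for the example.

```r
library(dplyr)

# Illustrative data; the column names a, b, c are assumptions
df <- tibble::tibble(a = 1:3, b = 4:6, c = 7:9)

# Insert the new column before an existing column, given by name
before_b <- df %>% mutate(x = a * 2, .before = b)
names(before_b)

# Insert the new column after an existing column, given by position
after_second <- df %>% mutate(x = a * 2, .after = 2)
names(after_second)
```

Without either argument, mutate() appends x at the end, which is the behaviour this issue asks to make configurable.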

hadley added the data frame and feature labels on Feb 2, 2017

@hadley
Member

hadley commented Feb 2, 2017

To me, that seems overly complicated. I think 90% of the time people just want variables to go in the front, instead of in the back.
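
A minimal sketch of that common case, moving a freshly created column to the front with select() and everything(). The data and column names are illustrative only.

```r
library(dplyr)

# Illustrative data; a and b stand in for the existing columns
df <- tibble::tibble(a = 2, b = 3)

# Create the column, then list it first and keep everything else after it
front <- df %>%
  mutate(x = 1) %>%
  select(x, everything())

names(front)
```

This works without any new arguments, at the cost of naming the new column twice.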

@krlmlr
Member Author

krlmlr commented Mar 29, 2018

If we're only interested in supporting adding to the front, maybe we could use transmute()?

@lionel-: What would it take to make the following syntax work?

library(tidyverse)
tibble(a = 2, b = 3) %>% transmute(x = 1, everything())
#> Error in mutate_impl(.data, dots): Evaluation error: No tidyselect variables were registered.

Created on 2018-03-30 by the reprex package (v0.2.0).

@krlmlr
Member Author

krlmlr commented Mar 29, 2018

Agreed, but mutate() and transmute() already seem to support select semantics: you can use just the name of an existing variable. Under the hood this is auto-naming and evaluation, but it feels like a selection. What if we just supported everything() as a special case here?

Users might be tempted to do the following (I have no idea why the second occurrence of Species is discarded, feels like a bug):

library(tidyverse)
iris %>% transmute(Species, !!!.) %>% head()
#>   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1  setosa          5.1         3.5          1.4         0.2
#> 2  setosa          4.9         3.0          1.4         0.2
#> 3  setosa          4.7         3.2          1.3         0.2
#> 4  setosa          4.6         3.1          1.5         0.2
#> 5  setosa          5.0         3.6          1.4         0.2
#> 6  setosa          5.4         3.9          1.7         0.4

Created on 2018-03-30 by the reprex package (v0.2.0).

@krlmlr
Member Author

krlmlr commented May 30, 2018

This is important when developing data transformation pipes: especially after a mutate(), the effect of the operation is difficult to see if the data has more than just a few columns. While you'd usually want the new column added to the end, you still want to double-check it manually.

I wonder if we could attach an attribute to the result of mutate() that changes how the resulting tibble is printed, so that we show the new data instead of existing columns. This attribute would vanish when the result is further transformed, e.g. with select() or summarize().

Sketch:

library(tidyverse)
options(width = 60)
iris %>%
  as_tibble() %>%
  mutate(Sepal.Area = Sepal.Width * Sepal.Length)
#> # A tibble: 150 x 4
#>    Sepal.Length Sepal.Width Petal.Length | Sepal.Area
#>           <dbl>       <dbl>        <dbl> |      <dbl>
#>  1          5.1         3.5          1.4 |       17.8
#>  2          4.9         3            1.4 |       14.7
#>  3          4.7         3.2          1.3 |       15.0
#>  4          4.6         3.1          1.5 |       14.3
#>  5          5           3.6          1.4 |       18  
#>  6          5.4         3.9          1.7 |       21.1
#>  7          4.6         3.4          1.4 |       15.6
#>  8          5           3.4          1.5 |       17  
#>  9          4.4         2.9          1.4 |       12.8
#> 10          4.9         3.1          1.5 |       15.2
#> # ... with 140 more rows, and 2 more variables:
#> #   Petal.Width <dbl>, Species <fct>

Composed on 2018-05-30 with the help of the reprex package (v0.2.0).

@lionel-
Member

lionel- commented Jan 25, 2019

If we support automatic splicing of unnamed data frames (named ones would be added as df-cols), and sels() and acts() helpers returning data frames, one possible syntax for this would be:

data %>% transmute(new = foo, sels(everything()))

@hadley
Member

hadley commented Feb 9, 2019

It may be worth considering that the primary reason to put variables on the left is so you can see them in the default display. It's possible that a better fix for the underlying issue would also be to show some columns on the far right (i.e. elide columns in the middle, not on the right)

@japhir

japhir commented Feb 9, 2019

I think it would be a very useful default to display the first and last few columns instead of just the first few. I've had to use options(tibble.width = Inf) quite often recently, even though most of the time I'm only interested in the last few columns. I could use glimpse() of course, but I prefer to see the actual data.
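
The workarounds mentioned here can be sketched as follows. This is only an illustration: it assumes dplyr (which provides glimpse() and last_col()) and tibble's print method with its width argument, and it reuses the Sepal.Area example from earlier in the thread.

```r
library(dplyr)

wide <- as_tibble(iris) %>%
  mutate(Sepal.Area = Sepal.Width * Sepal.Length)

# Show every column for this one call, without changing the global option
print(wide, width = Inf)

# Transposed view: one line per column, so trailing columns stay visible
glimpse(wide)

# Or pull the last column to the front just for inspection
inspect <- wide %>% select(last_col(), everything())
names(inspect)
```

None of these change where mutate() puts the column; they only change what gets displayed.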

@braiam

braiam commented Dec 30, 2019

In MySQL we have two keywords, FIRST and AFTER, to position a column added to a table with ALTER TABLE. Maybe dplyr can copy that behavior? It could also solve #4598.
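
A rough R analogue of FIRST and AFTER, sketched under the assumption that dplyr 1.0.0 or later is installed, where relocate() is available. The data and column names are illustrative.

```r
library(dplyr)

# Illustrative data; a, b, c are assumed column names
df <- tibble::tibble(a = 1, b = 2, c = 3)

# Analogue of FIRST: with no position given, relocate() moves x to the front
first <- df %>%
  mutate(x = a + b) %>%
  relocate(x)
names(first)

# Analogue of AFTER a: place x directly after column a
after_a <- df %>%
  mutate(x = a + b) %>%
  relocate(x, .after = a)
names(after_a)
```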

@hadley
Member

hadley commented Dec 31, 2019

I'm once again leaning towards a .where argument, which would use tidyselect along the lines of r-lib/tidyselect#151.

@hadley
Member

hadley commented Dec 31, 2019

Discussion of dedicated move verb at #4598

You can now use transmute() + across() to add new cols at the start:

df %>%
  transmute(x = y ^ 2, across(everything()))
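
A self-contained version of the same idea, assuming dplyr 1.0.0 or later (where across() exists). The tibble and the column names y and z are made up for the example.

```r
library(dplyr)

# Illustrative data; y and z are assumed column names
df <- tibble::tibble(y = 1:3, z = c("a", "b", "c"))

# transmute() keeps only what is listed: the new column first,
# then all existing columns via across(everything())
res <- df %>%
  transmute(x = y^2, across(everything()))

names(res)
```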

hadley changed the title from "FR: before and after arguments to mutate()" to "Allow mutate() to choose the position of new columns" on Jan 7, 2020
hadley added a commit that referenced this issue Jan 23, 2020
@courtiol
Contributor

courtiol commented Feb 8, 2020

@hadley, this is a great commit, but perhaps it should support relocating multiple columns (the thread title is indeed Allow mutate() to choose the position of new columns):

iris %>%
  as_tibble() %>%
  mutate(Petal.Length.mm = Petal.Length * 10,
         Sepal.Length.mm = Sepal.Length * 10,
         .after = c(Petal.Length, Sepal.Length))

produces

# A tibble: 150 x 6
   Petal.Length Sepal.Length Petal.Length.mm Sepal.Length.mm Petal.Width Species

and not

# A tibble: 150 x 6
   Petal.Length Petal.Length.mm Sepal.Length Sepal.Length.mm Petal.Width Species
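
One possible workaround for the interleaved placement, sketched under the assumption that mutate() accepts .after as in the call above: chain one mutate() per new column, so that each column gets its own anchor.

```r
library(dplyr)

# One mutate() per new column, each placed after its own source column
res <- iris %>%
  as_tibble() %>%
  mutate(Petal.Length.mm = Petal.Length * 10, .after = Petal.Length) %>%
  mutate(Sepal.Length.mm = Sepal.Length * 10, .after = Sepal.Length)

names(res)
```

This is more verbose than a single call, but it avoids having to define how several new columns map onto several anchors.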

@lionel-
Member

lionel- commented Feb 8, 2020

@courtiol Your suggestion would not work with plural selections like starts_with() or is.numeric.

@courtiol
Contributor

courtiol commented Feb 8, 2020

I agree @lionel-, but it would make sense in mutate() calls without suffix (as shown above), wouldn't it?
Perhaps for mutate_if() and mutate_at() a possibility would be to consider a .before = .x (or .after = .x) to imply that each newly created column goes before (or after) its source.
