Select column to be last #3051

Closed
jrosen48 opened this issue Aug 24, 2017 · 13 comments

@jrosen48

jrosen48 commented Aug 24, 2017

Brief description of the problem

Using dplyr, it is easy to select a column to be ordered first (i.e., the first column from the left side), or to order all of the columns, but it seems like there is not a straightforward way to select a column to be ordered last (i.e., the first from the right side).

For example, if I run the following, Sepal.Length is still the first column:

library(dplyr)
iris <- tbl_df(iris)
iris <- select(iris, everything(), Sepal.Length)

Here is the output:

> iris
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa
 3          4.7         3.2          1.3         0.2  setosa
 4          4.6         3.1          1.5         0.2  setosa
 5          5.0         3.6          1.4         0.2  setosa
 6          5.4         3.9          1.7         0.4  setosa
 7          4.6         3.4          1.4         0.3  setosa
 8          5.0         3.4          1.5         0.2  setosa
 9          4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
# ... with 140 more rows

An answer to this Stack Overflow question gives the following workaround, but it seems counter-intuitive:

iris <- select(iris, everything(), -Sepal.Length, Sepal.Length)

Here is the output:

> iris
# A tibble: 150 x 5
   Sepal.Width Petal.Length Petal.Width Species Sepal.Length
         <dbl>        <dbl>       <dbl>  <fctr>        <dbl>
 1         3.5          1.4         0.2  setosa          5.1
 2         3.0          1.4         0.2  setosa          4.9
 3         3.2          1.3         0.2  setosa          4.7
 4         3.1          1.5         0.2  setosa          4.6
 5         3.6          1.4         0.2  setosa          5.0
 6         3.9          1.7         0.4  setosa          5.4
 7         3.4          1.4         0.3  setosa          4.6
 8         3.4          1.5         0.2  setosa          5.0
 9         2.9          1.4         0.2  setosa          4.4
10         3.1          1.5         0.1  setosa          4.9
# ... with 140 more rows

jrosen48 changed the title from "Move column to be last in a data.frame" to "Select column to be last" Aug 24, 2017
krlmlr added the "feature" label (a feature request or enhancement) Aug 24, 2017

@krlmlr
Member

krlmlr commented Aug 24, 2017

@lionel-: Does this now belong in tidyselect?

@lionel-
Member

lionel- commented Aug 24, 2017

Yes. @hadley we could have everything_but(), or a but argument to everything()? Though that's not a clear improvement over the SO solution.

The issue is that everything() returns all positions without other semantic information, so we cannot reason over its output. Maybe it should return a tagged vector? Not sure how much that would complicate the implementation.

@romainfrancois
Member

Maybe that can be a new verb, e.g.

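library(dplyr)
library(tidyselect)  # provides vars_select()

# Put the columns matched by `...` last; all other columns keep their order in front.
stash <- function(data, ...) {
  nms <- names(data)
  all_vars <- vars_select(nms, everything())
  sel_vars <- vars_select(nms, ...)

  # Unmatched columns first, then the matched ones.
  selection <- c(setdiff(all_vars, sel_vars), sel_vars)
  select(data, one_of(selection))
}
stash(iris, Sepal.Length)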

It can also move a column to the front, by negating the selection:

> ir <- as_tibble(iris)
> stash( ir, Sepal.Length )
# A tibble: 150 x 5
   Sepal.Width Petal.Length Petal.Width Species Sepal.Length
         <dbl>        <dbl>       <dbl>  <fctr>        <dbl>
 1         3.5          1.4         0.2  setosa          5.1
 2         3.0          1.4         0.2  setosa          4.9
 3         3.2          1.3         0.2  setosa          4.7
 4         3.1          1.5         0.2  setosa          4.6
 5         3.6          1.4         0.2  setosa          5.0
 6         3.9          1.7         0.4  setosa          5.4
 7         3.4          1.4         0.3  setosa          4.6
 8         3.4          1.5         0.2  setosa          5.0
 9         2.9          1.4         0.2  setosa          4.4
10         3.1          1.5         0.1  setosa          4.9
# ... with 140 more rows
> stash( ir, -Species )
# A tibble: 150 x 5
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
 1  setosa          5.1         3.5          1.4         0.2
 2  setosa          4.9         3.0          1.4         0.2
 3  setosa          4.7         3.2          1.3         0.2
 4  setosa          4.6         3.1          1.5         0.2
 5  setosa          5.0         3.6          1.4         0.2
 6  setosa          5.4         3.9          1.7         0.4
 7  setosa          4.6         3.4          1.4         0.3
 8  setosa          5.0         3.4          1.5         0.2
 9  setosa          4.4         2.9          1.4         0.2
10  setosa          4.9         3.1          1.5         0.1
# ... with 140 more rows

@lionel-
Member

lionel- commented Aug 24, 2017

Another useful variant might be rotate(). In datasets with many columns, the variables are often sorted thematically.

cc @jennybc

@hadley
Member

hadley commented Aug 24, 2017

This also works:

select(iris, -Sepal.Length, Sepal.Length)

i.e. select everything except Sepal.Length, then select Sepal.Length.

This also works:

dplyr::select(iris, -Sepal.Length, everything())

I don't think this is such a common operation that it needs its own verb.
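
The "everything except the column, then the column" logic can be checked on column names alone. The following is a minimal base R sketch of that idea, not how select() is implemented; the variable names target, reordered, and iris_last are made up for illustration.

```r
# Base R sketch of "everything except X, then X", using only column names.
cols <- names(iris)
target <- "Sepal.Length"

# Columns other than the target keep their original order; the target goes last.
reordered <- c(setdiff(cols, target), target)
iris_last <- iris[, reordered]

names(iris_last)
```

Because only the order of the names changes, the data itself is untouched: iris_last has the same rows and columns as iris.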

@jrosen48
Author

Is the logic of selecting a column to be first through select(iris, Species, everything()) that everything() works by selecting "everything else"?

If so, then could select(iris, everything(), Sepal.Length) be modified to (implicitly) work as "everything but"?

@lionel-
Member

lionel- commented Aug 25, 2017

No, it selects everything. It's like supplying 1, 1:10: the redundant 1 inside 1:10 is ignored, so column 1 still appears first. But if you supply 1:10, 1, the trailing 1 is ignored.
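
As a rough analogy only, assuming repeated positions are dropped after their first occurrence as described above, base R's unique() behaves the same way. This is not a claim about how tidyselect is implemented, and the variable names are made up for illustration.

```r
# Duplicated positions are dropped, keeping the first occurrence of each.
first_then_all <- unique(c(1, 1:5))  # the 1 inside 1:5 is dropped; column 1 stays first
all_then_first <- unique(c(1:5, 1))  # the trailing 1 is dropped; column 1 still stays first

# Both orderings collapse to the same positions, so neither moves column 1 to the end.
identical(first_then_all, all_then_first)
```

This is why select(iris, everything(), Sepal.Length) leaves Sepal.Length in first position.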

@krlmlr
Member

krlmlr commented Jan 19, 2018

This seems easy enough to implement with existing functionality, and rare enough that it doesn't need its own verb. We could add an example to the documentation, either here or in tidyselect.

@romainfrancois
Member

romainfrancois commented Apr 23, 2018

Another take, with some tidyeval:

library(purrr)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

back <- function(data, ...){
  dots <- quos(...)
  ndots <- map(dots, function(q) expr(-!!q) )
  select( data, !!!ndots, !!!dots )
}
back(iris, Species) %>% head
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
back(iris, Species, starts_with("Petal")) %>% head
#>   Sepal.Length Sepal.Width Species Petal.Length Petal.Width
#> 1          5.1         3.5  setosa          1.4         0.2
#> 2          4.9         3.0  setosa          1.4         0.2
#> 3          4.7         3.2  setosa          1.3         0.2
#> 4          4.6         3.1  setosa          1.5         0.2
#> 5          5.0         3.6  setosa          1.4         0.2
#> 6          5.4         3.9  setosa          1.7         0.4

Created on 2018-04-23 by the reprex package (v0.2.0).

@romainfrancois
Member

@hadley should we close this, as this can easily be done outside of dplyr?

@krlmlr
Member

krlmlr commented Apr 23, 2018

Do we want to enhance the documentation first?

@hadley
Member

hadley commented Apr 23, 2018

I think one example in the documentation (along the lines of my code above) would be nice.

romainfrancois self-assigned this Apr 23, 2018
krlmlr closed this as completed in 5c235ec May 2, 2018
krlmlr added a commit that referenced this issue May 2, 2018
- Add documentation example for moving variable to back in `?select` (#3051).
@lock

lock bot commented Oct 29, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Oct 29, 2018

5 participants