arrange()-ing by data frame (instead of by variable) results in strange output instead of error message #3153

Closed
huftis opened this issue Oct 23, 2017 · 3 comments

@huftis huftis commented Oct 23, 2017

It’s currently possible to use arrange() to sort a data frame by another data frame instead of by a variable. This does not make sense and should result in an error message. Instead, it returns a strangely sorted, truncated version of the original data frame.

Here is an example. I generate a 150-row tibble with two variables, x and iri. I want to sort the tibble by the iri variable, but accidentally misspell it as iris, which is also the name of a built-in example data set. The result is a 5-row subset of the original tibble, with the rows in a seemingly random order.

library(dplyr)
set.seed(1)
d = tibble(x = 1:150, iri = rnorm(150))
arrange(d, iris)
#> # A tibble: 5 x 2
#>       x        iri
#>   <int>      <dbl>
#> 1     4  1.5952808
#> 2     3 -0.8356286
#> 3     2  0.1836433
#> 4     5  0.3295078
#> 5     1 -0.6264538
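
A possible explanation for why the typo does not simply fail with "object not found": column names are looked up in the data first, and anything not found there is looked up in the surrounding environment, where the iris data set lives. The following is only a base-R sketch of that lookup order using eval(); it is not dplyr's actual implementation, and the names key_ok and key_typo are made up for illustration.

```r
# Base-R sketch of data-masked lookup (an illustration, not dplyr internals).
set.seed(1)
d <- data.frame(x = 1:150, iri = rnorm(150))

# `iri` is a column of d, so it resolves to that numeric vector.
key_ok <- eval(quote(iri), d, globalenv())

# `iris` is not a column of d, so the lookup falls through to the
# enclosing environment and finds the built-in data set instead.
key_typo <- eval(quote(iris), d, globalenv())

is.numeric(key_ok)       # the intended sort key is a plain vector
is.data.frame(key_typo)  # the misspelled name resolved to a data frame
```

So the sort key handed to arrange() is a whole data frame rather than a vector, which is the situation this issue is about.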

Here’s a similar example, using the mtcars dataset. The 32 rows of the tibble are reduced to 11 rows, again in a seemingly random order:

arrange(d[1:32,], mtcars)
#> # A tibble: 11 x 2
#>        x        iri
#>    <int>      <dbl>
#>  1     7  0.4874291
#>  2    11  1.5117812
#>  3     6 -0.8204684
#>  4     5  0.3295078
#>  5    10 -0.3053884
#>  6     1 -0.6264538
#>  7     2  0.1836433
#>  8     4  1.5952808
#>  9     3 -0.8356286
#> 10     9  0.5757814
#> 11     8  0.7383247

@huftis huftis commented Oct 23, 2017

Note that iris has 5 columns and mtcars has 11 columns. That’s probably the reason the resulting data frames had 5 and 11 rows, respectively.
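
The column counts can be checked directly. A data frame is a list of columns, so length() returns the number of columns rather than the number of rows, which would be consistent with the 5 and 11 rows seen above. The variable names n_iris and n_mtcars are just for illustration.

```r
# A data frame is a list of columns, so length() counts columns, not rows.
n_iris   <- length(iris)    # same value as ncol(iris)
n_mtcars <- length(mtcars)  # same value as ncol(mtcars)

c(iris = n_iris, mtcars = n_mtcars)           # column counts
c(iris = nrow(iris), mtcars = nrow(mtcars))   # row counts, for comparison
```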


@romainfrancois romainfrancois commented Mar 5, 2018

The code at fault comes from #765. The "feature" I added at the time (https://github.com/tidyverse/dplyr/blob/master/src/arrange.cpp#L48):

  • doesn't really make sense
  • no longer works
> df <- data.frame( a = 1, b = 1:10, c = 1:10 )
> df$b <- data.frame( x = 10:1, y = 1:10 )
> arrange( df, b )
 Error: Column `b` must be a 1d atomic vector or a list 
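
For readers who want the same kind of early failure in their own code, here is a minimal R-level sketch of such a check. The helper check_sort_key() is hypothetical and is not part of dplyr; the actual fix lives in dplyr's C++ code and its error message differs from the one used here.

```r
# Hypothetical helper (not a dplyr function): refuse to sort by anything
# that is not a plain one-dimensional atomic vector.
check_sort_key <- function(key, name) {
  if (is.data.frame(key) || !is.atomic(key) || !is.null(dim(key))) {
    stop("`", name, "` must be a 1d atomic vector to sort by", call. = FALSE)
  }
  invisible(key)
}

df <- data.frame(a = 1, b = 1:10, c = 1:10)
df$b <- data.frame(x = 10:1, y = 1:10)  # b is now a data frame column

check_sort_key(df$c, "c")  # an ordinary integer column passes silently

# The data frame column is rejected; capture the message instead of aborting.
msg <- tryCatch(
  check_sort_key(df$b, "b"),
  error = function(e) conditionMessage(e)
)
msg
```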

romainfrancois added a commit that referenced this issue Mar 5, 2018
krlmlr added a commit that referenced this issue Mar 7, 2018
* fail gracefully on data.frame columns in arrange. closes #3153

* + NEWS entry

@lock lock bot commented Sep 1, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 1, 2018