New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`[` mistakenly returns rows, not columns, when subsetting with i and specifying drop #307

Closed
DavisVaughan opened this Issue Sep 17, 2017 · 2 comments

Comments

Projects
None yet
2 participants
@DavisVaughan
Member

DavisVaughan commented Sep 17, 2017

I don't think this is intended behavior, and it is actually causing an issue for me in tibbletime. Here's the simple version.


In base R, specifying drop with a data frame is ignored and the column is returned.

# Iris column subsetting
head(iris[1])
#>   Sepal.Length
#> 1          5.1
#> 2          4.9
#> 3          4.7
#> 4          4.6
#> 5          5.0
#> 6          5.4

# Specifying drop still returns the column
head(iris[1, drop = FALSE])
#> Warning in `[.data.frame`(iris, 1, drop = FALSE): 'drop' argument will be
#> ignored
#>   Sepal.Length
#> 1          5.1
#> 2          4.9
#> 3          4.7
#> 4          4.6
#> 5          5.0
#> 6          5.4

In tibble, if drop is specified it is ignored but it treats i like you are wanting to subset by rows, not columns.

# iris tbl
library(tibble)
iris_tbl <- as_tibble(iris)

# This works as intended
iris_tbl[1]
#> # A tibble: 150 x 1
#>    Sepal.Length
#>           <dbl>
#>  1         5.10
#>  2         4.90
#>  3         4.70
#>  4         4.60
#>  5         5.00
#>  6         5.40
#>  7         4.60
#>  8         5.00
#>  9         4.40
#> 10         4.90
#> # ... with 140 more rows

# This returns rows not columns
iris_tbl[1, drop = FALSE]
#> # A tibble: 1 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fctr> 
#> 1         5.10        3.50         1.40       0.200 setosa

# Same thing but with two columns and drop=TRUE
iris_tbl[c(1,3)]
#> # A tibble: 150 x 2
#>    Sepal.Length Petal.Length
#>           <dbl>        <dbl>
#>  1         5.10         1.40
#>  2         4.90         1.40
#>  3         4.70         1.30
#>  4         4.60         1.50
#>  5         5.00         1.40
#>  6         5.40         1.70
#>  7         4.60         1.40
#>  8         5.00         1.50
#>  9         4.40         1.40
#> 10         4.90         1.50
#> # ... with 140 more rows

# Correctly displays warning, but returns rows
iris_tbl[c(1,3), drop = TRUE]
#> Warning: drop ignored
#> # A tibble: 2 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fctr> 
#> 1         5.10        3.50         1.40       0.200 setosa 
#> 2         4.70        3.20         1.30       0.200 setosa

The issue seems to be coming from the use of if (nargs() <= 2L) in tbl-df.R. When drop is specified, then the number of args becomes 3. Is there another way to check for what you want here? It seems like you just want something like if( !missing(i) & missing(j) ) but I don't think that is a drop in replacement by itself.

@DavisVaughan

This comment has been minimized.

Member

DavisVaughan commented Sep 18, 2017

Here is an easy solution. The problem really stems from that fact that when drop is specified, nargs() counts an extra argument. You really don't care if drop is specified or not, so just ignore drop as an argument.

  # Ignore drop as an argument
  .nargs <- if(!missing(drop)) {
    nargs() - 1
  } else {
    nargs()
  }

# Escape early if nargs() == 2L; ie, column subsetting
if (.nargs <= 2L) {
   ...
}

With this implemented, all tests still pass and the following works:

library(tibble)
iris_tbl <- as_tibble(iris)

# This works as intended
iris_tbl[1]
#> # A tibble: 150 x 1
#>    Sepal.Length
#>           <dbl>
#>  1         5.10
#>  2         4.90
#>  3         4.70
#>  4         4.60
#>  5         5.00
#>  6         5.40
#>  7         4.60
#>  8         5.00
#>  9         4.40
#> 10         4.90
#> # ... with 140 more rows

# drop is ignored, so .nargs = 2
iris_tbl[1, drop = FALSE]
#> # A tibble: 150 x 1
#>    Sepal.Length
#>           <dbl>
#>  1         5.10
#>  2         4.90
#>  3         4.70
#>  4         4.60
#>  5         5.00
#>  6         5.40
#>  7         4.60
#>  8         5.00
#>  9         4.40
#> 10         4.90
#> # ... with 140 more rows

# drop is ignored, so .nargs = 3
# same behavior as base so this is good
iris_tbl[1, , drop = FALSE]
#> # A tibble: 1 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fctr> 
#> 1         5.10        3.50         1.40       0.200 setosa

# Same thing but with two columns and drop=TRUE
iris_tbl[c(1,3)]
#> # A tibble: 150 x 2
#>    Sepal.Length Petal.Length
#>           <dbl>        <dbl>
#>  1         5.10         1.40
#>  2         4.90         1.40
#>  3         4.70         1.30
#>  4         4.60         1.50
#>  5         5.00         1.40
#>  6         5.40         1.70
#>  7         4.60         1.40
#>  8         5.00         1.50
#>  9         4.40         1.40
#> 10         4.90         1.50
#> # ... with 140 more rows

# Correctly displays warning, and now returns column
iris_tbl[c(1,3), drop = TRUE]
#> Warning: drop ignored
#> # A tibble: 150 x 2
#>    Sepal.Length Petal.Length
#>           <dbl>        <dbl>
#>  1         5.10         1.40
#>  2         4.90         1.40
#>  3         4.70         1.30
#>  4         4.60         1.50
#>  5         5.00         1.40
#>  6         5.40         1.70
#>  7         4.60         1.40
#>  8         5.00         1.50
#>  9         4.40         1.40
#> 10         4.90         1.50
#> # ... with 140 more rows

Or, more compactly,

# Ignore drop as an argument
  .nargs <- nargs() - !missing(drop)

@krlmlr krlmlr closed this in 1b53228 Oct 4, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented Oct 4, 2017

Thanks for raising this, and for suggesting a solution.

krlmlr added a commit that referenced this issue Nov 3, 2017

Merge tag 'v1.3.4.9002'
- In `glimpse()`, compute `type_sum()` from data frame for dbplyr compatibility (#328).
- `as_tibble.matrix()` repairs column names.
- Compatible with R 3.1 (#323).
- Make add_case() and alias for add_row() (#324, @LaDilettante).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Tibbles now support character subsetting (#312).
- `as_tibble()` gains `rownames` argument (#288, #289).
- Remove Rcpp dependency (#313, @patperry).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- Fixed width for word wrapping of the extra information (#301).

krlmlr added a commit that referenced this issue Dec 28, 2017

Merge tag 'v1.4.1'
New formatting
--------------

The new pillar package is now responsible for formatting tibbles. Pillar will try to display as many columns as possible, if necessary truncating or shortening the output. Colored output highlights important information and guides the eye. The vignette in the tibble package describes how to adapt custom data types for optimal display in a tibble.

New features
------------

- Make `add_case()` an alias for `add_row()` (#324, @LaDilettante).
- `as_tibble()` gains `rownames` argument (#288, #289).
- `as_tibble.matrix()` repairs column names.
- Tibbles now support character subsetting (#312).
- ``` `[.tbl_df`() ``` supports `drop = TRUE` and omits the warning if `j` is passed. The calls `df[i, j, drop = TRUE]` and `df[i, drop = TRUE]` are now compatible with data frames again (#307, #311).

Bug fixes
---------

- Improved compatibility with remote data sources for `glimpse()` (#328).
- Logical indexes are supported, a warning is raised if the length does not match the number of rows or 1 (#318).
- Fixed width for word wrapping of the extra information (#301).
- Prevent `add_column()` from dropping classes and attributes by removing the use of `cbind()`. Additionally this ensures that `add_column()` can be used with grouped data frames (#303, @DavisVaughan).
- `add_column()` to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).

Internal changes
----------------

- Reexporting `has_name()` from rlang, instead of forwarding, to avoid warning when importing both rlang and tibble.
- Compatible with R 3.1 (#323).
- Remove Rcpp dependency (#313, @patperry).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment