New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix returned number of rows while subsetting zero columns. #241

Merged
merged 3 commits into from May 5, 2017

Conversation

Projects
None yet
3 participants
@echasnovski
Contributor

echasnovski commented May 3, 2017

Hello.
Recently I experienced an odd behaviour of [.tbl_df:

iris_tbl <- tibble::as_tibble(iris)
nrow_iris <- nrow(iris_tbl)

nrow(iris_tbl[, 0]) # equals to nrow_iris

nrow(iris_tbl[, 0][1:10, ]) # equals to 10
nrow(iris_tbl[1:10, ][, 0]) # equals to 10
nrow(iris_tbl[1:10, 0]) # expected to be equal to 10 but it's 0

nrow(iris_tbl[, 0][-(1:10), ]) # equals to nrow_iris - 10
nrow(iris_tbl[-(1:10), ][, 0]) # equals to nrow_iris - 10
nrow(iris_tbl[-(1:10), 0]) # expected to be equal to nrow_iris - 10 but it's 0

I think that the two unexpected cases weren't designed to behave like this.
I think the problem is in the command nr <- length(attr(x, "row.names")[i]) of [.tbl_df which computes the resulting number of rows in case of zero columns by applying R's standard subsetting mechanism to some vector of appropriate length. But in case of zero columns attr(x, "row.names") equals to NULL at that point of function evaluation and the resulting nr equals to 0 regardless of i. Changing attr(x, "row.names") to seq_len(nr) resolves the issue.
Also added some tests for checking number of rows in case of zero columns.

R/tbl-df.r Outdated
@@ -69,7 +69,7 @@ print.tbl_df <- function(x, ..., n = NULL, width = NULL, n_extra = NULL) {
# Next, subset rows
if (!missing(i)) {
if (length(x) == 0) {
nr <- length(attr(x, "row.names")[i])
nr <- length(seq_len(nr)[i])

This comment has been minimized.

@krlmlr

krlmlr May 3, 2017

Member

Thanks. Can we keep the original code if has_rownames() is TRUE for the original data frame?

This comment has been minimized.

@echasnovski

echasnovski May 4, 2017

Contributor

I don't quite understand your request. Even if for original data frame has_rownames() is TRUE in case of zero columns after the command x <- .subset(x, j) x becomes NULL and the original
length(attr(x, "row.names")[i]) gives not what is expected.
Now I have an interesting idea. How about just change the order of subsetting in the original code: first subset rows and then columns? Seems like it will also fix the issue. The possible drawback is slightly increased computation time in the case of many columns due to
map(x, `[`, i). What do you think?

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Let's keep subsetting columns first. Can you keep the original x and use a different variable name for the result after subsetting columns?

@codecov-io

This comment has been minimized.

codecov-io commented May 4, 2017

Codecov Report

Merging #241 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #241      +/-   ##
=========================================
+ Coverage   89.48%   89.5%   +0.02%     
=========================================
  Files          21      21              
  Lines         875     877       +2     
=========================================
+ Hits          783     785       +2     
  Misses         92      92
Impacted Files Coverage Δ
R/tbl-df.r 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08af6b0...acdc690. Read the comment docs.

R/tbl-df.r Outdated
if (length(x) == 0) {
nr <- length(seq_len(nr)[i])
if (length(result) == 0) {
nr <- length(attr(x, "row.names")[i])

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Did you mean to use seq_len(nr) here?

This comment has been minimized.

@echasnovski

echasnovski May 4, 2017

Contributor

Actually no: I thought that the closer to original code the better. There is no need to use seq_len(nr) with this logic. If it is fine to use seq_len(nr) then maybe the first variant is the best.

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

What if the original DF had 0 columns?

This comment has been minimized.

@echasnovski

echasnovski May 4, 2017

Contributor

I don't see why it might be an issue. For DF with 0 columns attr(x, "row.names") returns the appropriate vector. Also the result value seems correct before the if (length(result) == 0 command.

@echasnovski

This comment has been minimized.

Contributor

echasnovski commented May 4, 2017

Also what do you think about new tests? Is there a need to change something? Because I noticed that these expectations almost duplicate previous test_that but it's somewhat incomplete without them:

expect_is(iris_tbl, "tbl_df")
expect_equal(nrow(iris_tbl[, 0]), nrow_iris)
iris_tbl <- as_tibble(iris)
nrow_iris <- nrow(iris_tbl)
expect_is(iris_tbl, "tbl_df")

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Feels like this line should go to a separate test.

This comment has been minimized.

@echasnovski

echasnovski May 4, 2017

Contributor

Well I just imitated the structure from previous test_that. This expect_is ensures that we test exactly the [.tbl_df function in current test. I actually can remove it because such check is made in previous test but I think this way feels more bulletproof. Of course it is up to you to decide.

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

This test isn't about testing the behavior of as_tibble(), so I'd rather keep only the relevant parts.

expect_equal(nrow(iris_tbl[, 0]), nrow_iris)
expect_equal(nrow(iris_tbl[, 0][1:10, ]), 10)
expect_equal(nrow(iris_tbl[1:10, ][, 0]), 10)

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Do we also need to test [0]?

This comment has been minimized.

@echasnovski

echasnovski May 4, 2017

Contributor

No need for current issue but of course I can add expect_equal(nrow(iris_tbl[0]), nrow_iris) for more reliability.

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

Accessing [0] seems to take a different code path, would you mind adding the tests (i.e., to also test [0] wherever you test [, 0]).

@krlmlr

Thanks. The code looks great, could you please adapt the tests so we can merge?

iris_tbl <- as_tibble(iris)
nrow_iris <- nrow(iris_tbl)
expect_is(iris_tbl, "tbl_df")

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

This test isn't about testing the behavior of as_tibble(), so I'd rather keep only the relevant parts.

expect_equal(nrow(iris_tbl[, 0]), nrow_iris)
expect_equal(nrow(iris_tbl[, 0][1:10, ]), 10)
expect_equal(nrow(iris_tbl[1:10, ][, 0]), 10)

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

Accessing [0] seems to take a different code path, would you mind adding the tests (i.e., to also test [0] wherever you test [, 0]).

@echasnovski

This comment has been minimized.

Contributor

echasnovski commented May 5, 2017

Updated test. How it looks now?

@krlmlr krlmlr merged commit 299839d into tidyverse:master May 5, 2017

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@krlmlr

This comment has been minimized.

Member

krlmlr commented May 5, 2017

Thanks!

krlmlr added a commit that referenced this pull request May 8, 2017

Merge tag 'v1.3.0.9002'
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- Use `rlang` functions (#244).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- The `microbenchmark` package is now used conditionaly (#245).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- `tribble()` now handles values that have a class (#237, @NikNakk).

krlmlr added a commit that referenced this pull request May 17, 2017

Merge tag 'v1.3.1'
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).

- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).

- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).

- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment