Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change initial columns selected based on first selector (fix #2275) #2289

Merged
merged 6 commits into from Jan 26, 2017

Conversation

@r2evans
Copy link
Contributor

@r2evans r2evans commented Dec 4, 2016

Fixes #2275.

Copy link
Member

@krlmlr krlmlr left a comment

Thanks. I'd like to be very sure that this PR doesn't introduce regressions.


# first-selector ----------------------------------------------------------

test_that("mid-selector fails don't clear previous selectors (issue #2264)", {
Copy link
Member

@krlmlr krlmlr Dec 5, 2016

Please double-check issue number.

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

Oops. Duh.

@@ -59,3 +59,38 @@ test_that("one_of tolerates but warns for unknown variables", {
test_that("one_of converts names to positions", {
expect_equal(one_of("a", "z", vars = letters), c(1L, 26L))
})

# first-selector ----------------------------------------------------------
Copy link
Member

@krlmlr krlmlr Dec 5, 2016

Do we still have explicit tests for #1176? We might also need to test combination with inclusion and exclusion of single columns ( data %>% select(starts_with(...), column, -column2) ).

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

Related to my comment #2275 (comment), I thought I'd go with exhaustive testing of all combinations of number and type of selectors. Currently it looks like this (rather verbose):

  cn <- setNames(colnames(mtcars), nm = colnames(mtcars))

  ### Single Column Selected
  # single columns (present), explicit
  expect_equal(select_vars(cn, mpg), cn["mpg"])
  expect_equal(select_vars(cn, -mpg), cn[ cn != "mpg" ])
  # single columns (present), matched
  expect_equal(select_vars(cn, contains("mpg")), cn["mpg"])
  expect_equal(select_vars(cn, -contains("mpg")), cn[ cn != "mpg" ])
  # single columns (not present), explicit
  expect_error(select_vars(cn, foo), "object 'foo' not found")
  expect_error(select_vars(cn, -foo), "object 'foo' not found")
  # single columns (not present), matched
  expect_named(res <- select_vars(cn, contains("foo")))
  expect_length(res, 0)
  expect_equal(select_vars(cn, -contains("foo")), cn)

  ### Multiple Columns Selected
  # explicit(present) + matched(present)
  expect_equal(select_vars(cn, mpg, contains("vs")), cn[c("mpg", "vs")])
  expect_equal(select_vars(cn, mpg, -contains("vs")), cn["mpg"])
  expect_equal(select_vars(cn, -mpg, contains("vs")), cn[ cn != "mpg" ])
  expect_equal(select_vars(cn, -mpg, -contains("vs")), cn[ ! cn %in% c("mpg", "vs") ])
  # explicit(present) + matched(not present)
  expect_equal(select_vars(cn, mpg, contains("foo")), cn["mpg"])
  expect_equal(select_vars(cn, mpg, -contains("foo")), cn["mpg"])
  expect_equal(select_vars(cn, -mpg, contains("foo")), cn[ cn != "mpg" ])
  expect_equal(select_vars(cn, -mpg, -contains("foo")), cn[ cn != "mpg" ])
  # matched(present) + explicit(present)
  expect_equal(select_vars(cn, contains("vs"), mpg), cn[c("vs", "mpg")])
  expect_equal(select_vars(cn, contains("vs"), -mpg), cn["vs"])
  expect_equal(select_vars(cn, -contains("vs"), mpg), cn[cn != "vs"])
  expect_equal(select_vars(cn, -contains("vs"), -mpg), cn[ ! cn %in% c("mpg", "vs") ])
  # matched(not present) + explicit(not present)
  expect_error(select_vars(cn, contains("foo"), bar), "object 'bar' not found")
  expect_error(select_vars(cn, contains("foo"), -bar), "object 'bar' not found")
  expect_error(select_vars(cn, -contains("foo"), bar), "object 'bar' not found")
  expect_error(select_vars(cn, -contains("foo"), -bar), "object 'bar' not found")
  # matched(present) + matched(present)
  expect_equal(select_vars(cn, contains("vs"), contains("mpg")), cn[c("vs", "mpg")])
  expect_equal(select_vars(cn, contains("vs"), -contains("mpg")), cn["vs"])
  expect_equal(select_vars(cn, -contains("vs"), contains("mpg")), cn[cn != "vs"])
  expect_equal(select_vars(cn, -contains("vs"), -contains("mpg")), cn[! cn %in% c("vs", "mpg")])
  # matched(present) + matched(not present)
  expect_equal(select_vars(cn, contains("vs"), contains("foo")), cn["vs"])
  expect_equal(select_vars(cn, contains("vs"), -contains("foo")), cn["vs"])
  expect_equal(select_vars(cn, -contains("vs"), contains("foo")), cn[cn != "vs"])
  expect_equal(select_vars(cn, -contains("vs"), -contains("foo")), cn[cn != "vs"])
  # matched(not present) + matched(present)
  expect_equal(select_vars(cn, contains("foo"), contains("mpg")), cn["mpg"])
  expect_named(res <- select_vars(cn, contains("foo"), -contains("mpg")))
  expect_length(res, 0)
  expect_equal(select_vars(cn, -contains("foo"), contains("mpg")), cn)
  expect_equal(select_vars(cn, -contains("foo"), -contains("mpg")), cn[cn != "mpg"])
  # matched(not present) + matched(not present)
  expect_named(res <- select_vars(cn, contains("foo"), contains("bar")))
  expect_length(res, 0)
  expect_named(res <- select_vars(cn, contains("foo"), -contains("bar")))
  expect_length(res, 0)
  expect_equal(select_vars(cn, -contains("foo"), contains("bar")), cn)
  expect_equal(select_vars(cn, -contains("foo"), -contains("bar")), cn)

This is rather busy/loud, but it definitely encapsulates all of the problems in both #1176 and #2275. Though some of them are relatively identical to some in test-select.r, I suggest keeping these here and together helps contrast the different expectations.

Is this too much? I can remove the "obvious" ones if need be, though there are varying levels of "obvious".

I also have a three-selector to ensure the first (match) isn't removed by the second (no match).

  expect_equal(
    select_vars(colnames(mtcars), contains("am"), contains("FOO"), contains("vs")),
    c(am = "am", vs = "vs")
  )

Copy link
Member

@krlmlr krlmlr Dec 5, 2016

I think this is a good start, but should be split in at least two test_that() calls.

@hadley: Should select_vars(cn, contains("foo")) throw an error if foo doesn't exist?

Copy link
Member

@hadley hadley Dec 5, 2016

I can't remember. I think we decided it shouldn't.

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

My opinion: select(mtcars, matches("foo")) is different from select_(mtcars, .dots = "foo"), so I think an error in the latter would not make sense. It makes sense to me that I can rely on them to return zero or more columns without erring.

ind_list <- lazyeval::lazy_eval(args, names_list)
names(ind_list) <- names2(args)
# if the first selector is exclusive (negative), start with all columns
initial_case <- if (as.character(args[[1]]$expr)[1] == "-") list(seq_along(vars)) else integer(0)
Copy link
Member

@krlmlr krlmlr Dec 5, 2016

Could you test identical(args[[1]]$expr, quote(`-`)) ?

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

debug(select_vars)
select_vars(cn, -mpg)
# ... step into select_vars_, past line 66 where args is assigned
# Browse[3]> 
args[[1]]
# <lazy>
#   expr: -mpg
#   env:  <environment: R_GlobalEnv>
as.character(args[[1]]$expr)[1]
# [1] "-"
as.character(args[[1]]$expr)[1] == "-"
# [1] TRUE
identical(args[[1]]$expr, quote(`-`))
# [1] FALSE
identical(as.character(args[[1]]$expr)[1], quote(`-`))
# [1] FALSE
identical(as.character(args[[1]]$expr)[1], "-")
# [1] TRUE
microbenchmark(
a = as.character(args[[1]]$expr)[1] == "-",
b = identical(as.character(args[[1]]$expr)[1], "-")
)
# Unit: microseconds
#  expr   min    lq   mean median    uq    max neval
#     a 2.553 2.918 3.6000  3.282 3.283 22.246   100
#     b 3.282 3.648 4.4606  4.012 4.376 45.585   100

Copy link
Member

@krlmlr krlmlr Dec 5, 2016

You're not testing the expression I gave. Besides, I think it's cleaner to test if the first result set is a positive or negative match, and act accordingly.

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

Not sure what you mean. I explicitly tested your expression and it returned false when it should have been true. (I then compared a different command for timing, but that's extraneous and probably should not have been there. I was intending to compare yours with the others, but when I saw it was false I removed it.)

By result set, do you mean the actual return value from contains() (et al)? I don't think checking that for positive/negative will work (-integer(0) ?).

(BTW: how did you get a single backtick within an inline code block? I can't get it to render.

Copy link
Member

@hadley hadley Dec 5, 2016

I think it would be identical(args[[1]]$expr[[1]], quote(`-`))

(You surround with `` if you want to include a `)

Copy link
Member

@hadley hadley Dec 5, 2016

Yeah, that's why we can't just look at the return value.

Copy link
Member

@hadley hadley Dec 5, 2016

It's probably that's what I was missing in the previous incarnation of this code

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

@hadley, I wasn't suggesting checking against -integer(0) within args. I was suggesting that checking the return value from a helper function would not be fruitful. I believe by the time select_vars_ can look at the return values from each of the args's calls, they have already lost the ability to check against quote(`-`). (The values are already realized by that time, right? Is ind_list initially full of promises not-yet-realized?)

I guess select(mtcars, -integer(0)) could work (and does with this code). Though it seems a bit obscure (when select(mtcars, everything()) is much clearer).

Copy link
Contributor Author

@r2evans r2evans Dec 5, 2016

Unfortunately @hadley, I can't get your call working:

select_vars(cn, -mpg, mpg)
### stepping into select_vars_ and once args is assigned (line 66)
# Browse[3]> 
args
# [[1]]
# <lazy>
#   expr: -mpg
#   env:  <environment: R_GlobalEnv>
# [[2]]
# <lazy>
#   expr: mpg
#   env:  <environment: R_GlobalEnv>
# attr(,"class")
# [1] "lazy_dots"
# Browse[3]> 
identical(args[[1]]$expr[[1]], quote(`-`))
# [1] TRUE
# Browse[3]> 
identical(args[[2]]$expr[[1]], quote(`-`))
# Error in args[[2]]$expr[[1]] (from select-vars.R#70) : object of type 'symbol' is not subsettable
# Browse[3]> 
args[[2]]$expr
# mpg

Is there a an alternate way to check negativity?

Copy link
Member

@hadley hadley Dec 5, 2016

I'd recommend something like:

is_negated <- function(x) {
  if (!is.call(x)) return(FALSE)
  if (length(x) != 2) return(FALSE)

  identical(x[[1]], quote(`-`))
)

is_negated(args[[1]]$expr)

If you want to test interactively, use quote to generate the language objects:

is_negated(quote(-1))
is_negated(quote(-a))
is_negated(quote(a))
is_negated(quote(a - b))

@r2evans r2evans changed the title change initial columns selected based on first selector (fix #2264) change initial columns selected based on first selector (fix #2275) Dec 5, 2016
Bill Evans added 2 commits Dec 5, 2016
- single vs multiple selectors
- first selector positive versus negative
- explicit versus matching
- present versus not-present
@r2evans
Copy link
Contributor Author

@r2evans r2evans commented Dec 8, 2016

@krlmlr, do you see any further changes that need to be implemented? (I still see the "changes requested" tag, implying I did not resolve your suggestions. I think the testing of #1176 is covered by the updates.)

Copy link
Member

@krlmlr krlmlr left a comment

Thanks, just a few minor nits.

expect_error(select_vars(cn, foo), "object 'foo' not found")
expect_error(select_vars(cn, -foo), "object 'foo' not found")
# single columns (not present), matched
expect_named(res <- select_vars(cn, contains("foo")))
Copy link
Member

@krlmlr krlmlr Dec 8, 2016

Could you please extract the assignment out of the function call? I saw other instances below.

Copy link
Contributor Author

@r2evans r2evans Dec 8, 2016

I can do both of these (setNames and res), but neither is consistent with the current coding style. All references (in dplyr) to setNames explicitly sets order. I can find no use of expect_* here where the assignment is outside of the code.

@hadley @krlmlr These changes are certainly easy to do, but they are inconsistent with the rest of the package. Are you changing styles?

Copy link
Member

@krlmlr krlmlr Dec 8, 2016

I'm not sure about @hadley's preference for setNames(), but the only instances where assignment <- is used inside an expect_() function is for expect_warning(), expect_message() and such (where there seems to be no other way to retain the value and check for warning/message).

grep -r "expect_.*[<]-" tests

Copy link
Contributor Author

@r2evans r2evans Dec 8, 2016

I see your point. My gut feeling has been that res <- expect_* implies a return value from expect which is not the case, ergo using res <- inside was more explicit. Patch coming.

I don't mean to be argumentative, honestly. I know that code style needs to be consistent with the target repo and authors, so I am trying to avoid nitnoid changes that diverge from this effort.

# first-selector ----------------------------------------------------------

test_that("initial (single) selector defaults correctly (issue #2275)", {
cn <- setNames(colnames(mtcars), nm = colnames(mtcars))
Copy link
Member

@krlmlr krlmlr Dec 8, 2016

setNames(nm = ...) would be shorter.

expect_error(select_vars(cn, foo), "object 'foo' not found")
expect_error(select_vars(cn, -foo), "object 'foo' not found")
# single columns (not present), matched
res <- expect_named(select_vars(cn, contains("foo")))
Copy link
Member

@krlmlr krlmlr Dec 8, 2016

Actually, I thought about:

res <- select_vars(...)
expect_named(res)
expect_length(res, 0)

Sorry about the confusion.

krlmlr
krlmlr approved these changes Dec 8, 2016
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Dec 8, 2016

@hadley: PTAL.

})

test_that("initial (of multiple) selectors default correctly (issue #2275)", {
cn <- setNames(nm = colnames(mtcars))
Copy link
Member

@hadley hadley Dec 8, 2016

I think this test could be simplified a bit by using cv <- c(x = "x", y = "y", z = "z"). Then could you replace then computation of the expected values with inline values.

Otherwise, looks great, and thanks for all your work on this. We are trying to move towards a more consistent dplyr style, and @krlmlr are still figuring out how collaboratively handle PRs most effectively, so we appreciate you bearing with us!

@krlmlr krlmlr merged commit f9ef30a into tidyverse:master Jan 26, 2017
1 of 2 checks passed
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Jan 26, 2017

Thanks!

@lock
Copy link

@lock lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants