select(df, colname) sometimes impersonates select_(df, .dots=colname) #2904
Comments
This is just consistent evaluation semantics: if a symbol is not found in the data frame, it is looked up in the context. However, it might make sense to make an exception for |
(John Mount here) In my opinion the "Species" name should not be given two chances to match the frame. I get that if you match the frame you go to the frame, and if not you go to the context. But what is happening is that if either the name "Species" or the contents of "Species" match the frame, the frame wins, and only then does the system look out to the context. Under this interpretation a user cannot read a code fragment and know what it does without also knowing what value (if any) "Species" may be carrying. |
That's dplyr semantics since the beginning so we're not going to change them. Scoping has always been data first, context second. With tidyeval we offer several ways of being more explicit. That said, |
It's worth noting that model formulas from base R and even S work the same way. |
Formulas in base R do not have the double lookup in the frame property. I agree looking two places (frame then environment) is good. But trying the frame twice is not good (new or not). In a base R formula For
For |
No, this is only for verbs with selection semantics. For those verbs, the columns in the data environment represent column positions, not column values. We then select based on those positions. This helps simplify the implementation of selection helpers. It also explains the behaviour we're seeing here. Again: we will consider making an exception for the scoping of symbols in selection verbs because the semantics are special. |
The semantics were reconsidered as part of tidyselect and will be incorporated into dplyr once we use tidyselect. |
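For readers who want the lookup to be unambiguous in the meantime, here is a minimal sketch of two explicit forms. It assumes df is the iris data as a tibble and that myvar is a character vector in the calling context; both are illustrative choices, not taken from the thread. The !! unquoting operator and the one_of() helper are existing dplyr/tidyeval features, but whether they are the "several ways" the maintainers had in mind is an assumption.

```r
library(dplyr)

# Assumption: iris stands in for the data frame under discussion
df <- as_tibble(iris)

# Illustrative context variable holding column names
myvar <- c("Sepal.Width", "Petal.Length")

# Unquoting with !! states explicitly that myvar is evaluated in the
# context and its value is what gets used for the selection
by_unquote <- select(df, !!myvar)

# one_of() takes a character vector of column names directly
by_helper <- select(df, one_of(myvar))

names(by_unquote)
names(by_helper)
```

Either form tells the reader at a glance that myvar is a variable carrying column names rather than a column of df.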
select(df, colname) should issue an error message when colname is not a column in df. However, if there exists a (global) colname variable which is a character vector, the columns in df corresponding to the elements in colname are instead returned. Basically, select(df, colname) works like select_(df, .dots=colname) iff there is no column named colname in df.

The last command gives the error message
#> Error in overscope_eval_next(overscope, expr): object 'myvar2' not found
The next to last line should have also given a similar error message, since there is no myvar column in df. But instead, it returns a tibble with the columns Sepal.Width and Petal.Length. This is unexpected and dangerous behaviour.

I observe this bug with dplyr 0.7.1 and the latest GitHub version (as of 2017-06-24).
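The commands the report refers to did not survive in this copy of the thread. The following is a hypothetical reconstruction of the kind of session described, not the reporter's original code: it assumes df is the iris data as a tibble (the column names Sepal.Width and Petal.Length suggest this) and that myvar is a global character vector. The error is caught with tryCatch so the sketch runs to completion; newer dplyr versions may additionally warn on the first call.

```r
library(dplyr)

# Assumption: the report's df is not shown; iris is used for illustration
df <- as_tibble(iris)

# A global character vector, as described in the report
myvar <- c("Sepal.Width", "Petal.Length")

# df has no column named myvar, so per the report the lookup falls back
# to the context and the columns named inside myvar are selected
res <- select(df, myvar)
names(res)

# myvar2 is neither a column of df nor a variable in the context,
# so this call fails; the message is captured instead of aborting
err <- tryCatch(
  select(df, myvar2),
  error = function(e) conditionMessage(e)
)
err
```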