Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify hybrid evaluation #2190

Merged
merged 45 commits into from
Dec 9, 2016
Merged

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Oct 20, 2016

Fixes #912.

Fixes #1012 (but untestable because of hadley/lazyeval#78)

Fixes #1400.

Fixes #1452.

Fixes #1719.

Fixes #2018.

Fixes #2125.

Fixes #2160.

Fixes #2223.

Fixes #2244.

@krlmlr krlmlr changed the title Simplify hybrid evaluation, with tests WIP: Simplify hybrid evaluation, with tests Oct 20, 2016
@hadley
Copy link
Member

hadley commented Oct 21, 2016

For testing, that the hybrid evaluator is actually doing something it might be useful to implement #1691, and also define an in_hybrid() function that is defined as function() FALSE in R, and but returns TRUE in the C implementation.

@krlmlr
Copy link
Member Author

krlmlr commented Nov 11, 2016

Performance suffers, a faster Rcpp_eval() would help:

![1](https://cloud.githubusercontent.com/assets/1741643/20225370/47dc2812-a843-11e6-8f69-2e10374a7210.png) ![2](https://cloud.githubusercontent.com/assets/1741643/20225371/47dc443c-a843-11e6-89de-00f3e2021268.png) ![3](https://cloud.githubusercontent.com/assets/1741643/20225373/47f1f836-a843-11e6-90d1-9143f24e1842.png) ![4](https://cloud.githubusercontent.com/assets/1741643/20225372/47f10d0e-a843-11e6-8f1a-e3f975e0ba1e.png) ![5](https://cloud.githubusercontent.com/assets/1741643/20225374/47fc0628-a843-11e6-8f39-2f7d5538d493.png)

@codecov-io
Copy link

Current coverage is 64.32% (diff: 3.57%)

Merging #2190 into master will increase coverage by 0.38%

@@             master      #2190   diff @@
==========================================
  Files           194        193     -1   
  Lines          7967       7908    -59   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           5094       5087     -7   
+ Misses         2873       2821    -52   
  Partials          0          0          

Powered by Codecov. Last update 93619cd...cb92df4

@krlmlr krlmlr force-pushed the f-1719-hybrid-with-tests branch 2 times, most recently from d197be9 to c75de70 Compare November 12, 2016 14:58
@krlmlr
Copy link
Member Author

krlmlr commented Nov 12, 2016

@hadley: Much better when we create + populate the environment with the active bindings only once (ad4f698). In the tested scenarios run time was slower by < 50%. We still can look into a faster Rcpp_eval(), but that would be orthogonal to this PR. I'll do some more local testing before suggesting a merge.

![1](https://cloud.githubusercontent.com/assets/1741643/20238931/0d223c80-a8f6-11e6-8a2a-d22797b2b927.png) ![2](https://cloud.githubusercontent.com/assets/1741643/20238932/0d349d9e-a8f6-11e6-9213-0b3b4388afd8.png) ![3](https://cloud.githubusercontent.com/assets/1741643/20238933/0d415034-a8f6-11e6-8a2e-4a908a5fb585.png) ![4](https://cloud.githubusercontent.com/assets/1741643/20238934/0d41d11c-a8f6-11e6-88f9-7cb921cf45ca.png) ![5](https://cloud.githubusercontent.com/assets/1741643/20238935/0d4248a4-a8f6-11e6-9b0d-b41a010ea94b.png)

@hadley
Copy link
Member

hadley commented Nov 12, 2016

That doesn't look too bad at all.

(Also it looks like every big change is currently with both before and after commits. Probably on the the after label is necessary)

@krlmlr
Copy link
Member Author

krlmlr commented Nov 12, 2016

Thanks. I've thought about showing only "after", but I like having "before" as a way to double-check and resolve potential ambiguities. Please also note that for the level established at 2c095 in the "windowed" plot, no "before" entry is detected before 7d918.

to properly collect C++ coverage
@krlmlr
Copy link
Member Author

krlmlr commented Dec 8, 2016

@hadley: PTAL. I'll release bindrcpp to CRAN soon, but this shouldn't affect the review. Tests now show one warning, which should go with #2298.

NEWS entry:

  • Breaking change: The column() and global() functions have been removed. They were never documented officially.
  • Expressions in verbs are now interpreted correctly in many cases that failed before (e.g., use of $, case_when(), nonstandard evaluation, ...). This is possible, because these expressions are now evaluated in a specially constructed environment that retrieves column data on demand with the help of the bindrcpp package (Simplify hybrid evaluation #2190).

@krlmlr krlmlr requested a review from hadley December 8, 2016 13:41
@krlmlr krlmlr changed the title WIP: Simplify hybrid evaluation, with tests Simplify hybrid evaluation Dec 8, 2016
@hadley
Copy link
Member

hadley commented Dec 8, 2016

Could you do another benchmark plot please?

Should the news bullets also include mention of the new .env and .data pronouns? (And if you haven't implemented them yet, can you please do so?)

@krlmlr
Copy link
Member Author

krlmlr commented Dec 8, 2016

@hadley: Will do a benchmark plot. What should happen if the data frame contains columns named .env and/or .data?

@hadley
Copy link
Member

hadley commented Dec 8, 2016

@krlmlr I think our definition should win so if you wanted to access a column named .env you'd need to do .data$.env

@krlmlr
Copy link
Member Author

krlmlr commented Dec 9, 2016

@hadley: Done. Performance is very much comparable with master, see plots below. NEWS entries:

  • Breaking change: The new .data and .env environments can be used inside all verbs that operate on data: .data$column_name accesses the column column_name, whereas .env$var accesses the external variable var. Columns or external variables named .data or .env are shadowed, use .data$... and/or .env$... to access them.
  • Breaking change: The column() and global() functions have been removed. They were never documented officially. Use the new .data and .env environments instead.
  • Expressions in verbs are now interpreted correctly in many cases that failed before (e.g., use of $, case_when(), nonstandard evaluation, ...). These expressions are now evaluated in a specially constructed environment that retrieves column data on demand with the help of the bindrcpp package (Simplify hybrid evaluation #2190).
Updated plots

1

2

3

4

5

b <- 2L

testthat::expect_equal(
filter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting of this seems a bit off

@hadley
Copy link
Member

hadley commented Dec 9, 2016

LGTM

@krlmlr
Copy link
Member Author

krlmlr commented Dec 9, 2016

@hadley: Added minor reformatting tweaks and NEWS, will squash-merge after CI has run. Thanks for the fast reviews.

@krlmlr krlmlr merged commit eef12c8 into tidyverse:master Dec 9, 2016
@krlmlr krlmlr deleted the f-1719-hybrid-with-tests branch December 9, 2016 20:45
@krlmlr
Copy link
Member Author

krlmlr commented Dec 12, 2016

@lionel-: This is the core of the new hybrid evaluator: Two new environments, one with the variables, one with .data and .env. The GroupedHybridCall class currently does too many things, splitting in #2315.

@lock
Copy link

lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants