
Cache Execution evaluations #1057

Merged
merged 2 commits into develop from cache-execution on Sep 19, 2014

Conversation

johnynek
Collaborator

closes #1055

@johnynek
Collaborator Author

@egonina Want to take a look? I think this fixes the issue you noted today. This is aggressively caching everything, which is probably appropriate for map/reduce. But if someone gets the idea to set up a server running in an infinite recursive flatMap loop, this is going to OOM. I think we can cross that bridge if we ever get a bug report.
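
As a hedged illustration of that caveat (not code from this PR; loop is hypothetical and assumes scalding's Execution.from constructor), the pathological shape would be something like:

    // Hypothetical sketch: each recursive step builds a new Execution
    // node, and since every evaluation is cached and nothing is ever
    // evicted, the cache grows on every iteration until the JVM OOMs.
    def loop(i: Int): Execution[Unit] =
      Execution.from(i + 1).flatMap(loop) // never terminates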

     * error to add something to the cache twice clearly).
     */
    def getOrElseInsert[T](ex: Execution[T],
      res: => (EvalCache, Future[(T, ExecutionCounters, EvalCache)]))
Contributor

Can you explain what the res function parameter is doing? Is it the result of the execution, in which case, should it also be a Future?

Collaborator Author

I am using the by-name parameter (: =>) the same way a mutable map does with .getOrElseUpdate: if we have not already evaluated ex, put the code to evaluate it there. This way the future actually gets into the cache before it is complete, so parallel operations (such as the zip) will not both recompute parents they need. That is why we return both the cache BEFORE the future is complete (BUT containing this execution) and the cache after it is complete, as the flatMap may cause more Executions to be evaluated along the way.

If we only had the cache after the future completed, zips would be serialized into a sequence: in the case of a diamond (b and c both depend on a, and b.zip(c) is what you are evaluating), we would still serialize. Here, a gets into the cache as soon as we start evaluating b, so c can look at the cache, see a, and wait only for a, not for b.
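
To make the mechanics concrete, here is a minimal, self-contained sketch of the idea (illustrative only: SimpleEvalCache is hypothetical, and unlike the real getOrElseInsert it uses a mutable map rather than threading an immutable EvalCache through):

    import scala.collection.mutable
    import scala.concurrent.Future

    // Hypothetical simplification: map each execution key to the Future
    // that will eventually hold its result.
    class SimpleEvalCache[K, V] {
      private val cache = mutable.Map.empty[K, Future[V]]

      // Like mutable.Map.getOrElseUpdate: `compute` is by-name, so it runs
      // only on a cache miss, and the still-incomplete Future is inserted
      // immediately. A concurrent sibling (the other arm of a zip) then
      // finds the in-flight Future and waits on it instead of recomputing.
      def getOrElseInsert(key: K)(compute: => Future[V]): Future[V] =
        synchronized { cache.getOrElseUpdate(key, compute) }
    }

In the diamond case, whichever of b or c asks for a first inserts a's still-running Future; the other arm gets the same Future back, so a runs exactly once and c waits only on a.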

Contributor

Would zips be made into a sequence here because, in the case of b.zip(c), c considers b to be a parent?

Collaborator Author

In b.zip(c), b should generally not be a parent of c, but it could be. Imagine b.zip(b.map(fn)): we would first schedule the execution of b; then, when scheduling c = b.map(fn), we would find b in the cache and reuse the future there.
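
Continuing the hypothetical SimpleEvalCache sketch above, that reuse looks like:

    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.Future

    val cache = new SimpleEvalCache[String, Int]() // from the sketch above
    def runB(): Int = 42                           // hypothetical stand-in for evaluating b

    // Scheduling b inserts its Future into the cache...
    val left = cache.getOrElseInsert("b")(Future(runB()))
    // ...so scheduling c = b.map(fn) finds b already there and reuses it;
    // the by-name argument is never evaluated on the hit.
    val right = cache.getOrElseInsert("b")(Future(runB()))
    assert(left eq right) // b is computed once and shared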

@egonina
Contributor

egonina commented Sep 18, 2014

Have you tested this?

@johnynek
Collaborator Author

I have tested it, in that the current kmeans test runs and exercises it some (and the test runs much faster). But I am working on adding some more comprehensive tests.

egonina added a commit that referenced this pull request Sep 19, 2014
@egonina egonina merged commit cafc7d9 into develop Sep 19, 2014
@egonina egonina deleted the cache-execution branch September 19, 2014 21:04

Successfully merging this pull request may close these issues.

Execution does not cache evaluations (#1055)