-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AdaptiveCache #419
AdaptiveCache #419
Conversation
extends StatefulSummer[Map[K, V]] { | ||
|
||
require(maxCapacity >= 0, "Cannot have negative capacity") | ||
var currentCapacity = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make the member's private? Especially the vars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, yep, fixed
@johnynek added a test and I think addressed your other issues |
LGTM. |
Thanks. I'm running some stress tests with it and seeing some OOMs that I'm trying to figure out, so don't merge yet; will update. |
@johnynek @reconditesea ok, it turned out that using a SoftReference for every key in the map created too much memory pressure and GC overhead dealing with the refs themselves. I've much simplified it to use a single SoftReference for the entire HashMap, and am just careful to never hold a strong reference to the map while doing anything that's likely to allocate much. It now performs well in stress tests. |
Anything else you'd like from this pre-merge? |
def isFlushed = summingCache.isFlushed | ||
|
||
private var maxReportedSentinelSize = 0 | ||
case class Stats(hits: Int, cacheGrowth: Int, sentinelGrowth: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick: call the case class CacheStats? Nothing major, just it might be confused with the Scalding Stat and these two are likely to be used together when recording these metrics to Hadoop counters.
@reconditesea re CacheStats, no problem, done. |
Seems there is a conflicts so cannot be automatically merged. |
Looks good to me. We can merge this when it is clean (not sure why this is not clean now, I don't know what has conflicted). |
@avibryant bump fyi, just blocked on non-clean merge |
6a7f7c2
to
a6f1bdf
Compare
Ping! This should merge cleanly now. |
Merged, thanks |
This is an attempt to fix the problem of sometimes having to lower the capacity used by
SummingCache
when experiencing OOM errors duringMapsideReduce
in Scalding.The general idea is that along with a
SummingCache
, we keep a second, "sentinel" cache that sums the evicted items, but holds onto these values with soft references. If we're ever experiencing memory pressure, the GC will clear these references and notify us. But if the sentinel cache has grown to size X without that happening, that's good evidence that we should be ok to grow our main SummingCache by something close to X (since, had we used that larger capacity to begin with, we empirically would have had enough memory to cache all of the values that we evicted instead).