Don't retain the head of a sequence when profiling #56
I'm not sure I understand why, but I did have success with the following approach, which uses code from `pspy`:

```clojure
(defn process
  [items]
  (let [id "process"
        t0 (System/nanoTime)]
    (try
      (reduce (fn [count item] (inc count)) 0 items)
      (finally
        (let [t-elapsed (- (System/nanoTime) t0)]
          (swap! p/*pdata* #(assoc % id (conj (% id []) t-elapsed))))))))
```

The main difference, I think, is that there is no inner anonymous function with this approach. I hope that this approach could serve as the basis for a fix.
Hi Jeff,
What makes you suspect that something's retaining a head? Just the OOM error? Depending on what you've got wrapped for profiling, keep in mind that profiling itself involves quite a bit of data collection and could contribute to (or by itself cause) an OOM if you're already operating near your memory limits and/or if the way things have been wrapped means that the data collection doesn't get a chance to terminate regularly. In particular, note that anything wrapped in a `p`/`pspy` form keeps accumulating its call times until the enclosing profiling run terminates.

An alternative would be to gather cumulative stats on each wrapped call as it completes, rather than keeping every individual time.

Or (this occurs to me now), we could strike a balance between the two and gather the cumulative stats every 10k calls, say.

I've never seen a case of this being an issue though, so we should confirm first: 1) That this is actually the cause of the OOM, and 2) That your times vectors are actually big enough to account for it.

So, yeah - try to get me a reproducible case if you can and let's go from there.

Cheers! :-)
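As an aside, here is a minimal sketch of the cumulative-stats idea above. It assumes nothing about Timbre's internals; the names (`merge-time`, `pdata`) and the shape of the stats map are made up purely for illustration:

```clojure
;; Illustrative only - not Timbre's actual implementation.
;; Instead of conj'ing every call time onto a vector, fold each new
;; time into a small fixed-size stats map, so per-id memory stays O(1).
(defn merge-time
  [{:keys [count total min-t max-t] :or {count 0 total 0}} t]
  {:count (inc count)
   :total (+ total t)
   :min-t (if min-t (min min-t t) t)
   :max-t (if max-t (max max-t t) t)})

;; e.g. inside the profiling data swap:
;; (swap! pdata update-in [id] merge-time t-elapsed)
```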
OK, here's a REPL example that demonstrates the problem:
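(The actual REPL snippet didn't survive here; the following is only a rough reconstruction of the kind of comparison being described, using the `defnp` macro mentioned elsewhere in this thread.)

```clojure
;; Hedged reconstruction - the original transcript was lost.
(require '[taoensso.timbre.profiling :as p])

;; Plain defn: reduce walks the lazy seq without retaining its head.
(defn count-plain [items]
  (reduce (fn [n _] (inc n)) 0 items))

;; Same body via defnp: the profiling wrapper closes over `items`,
;; so the realised portion of the range stays reachable while reduce runs.
(p/defnp count-profiled [items]
  (reduce (fn [n _] (inc n)) 0 items))

(count-plain (range 1e8))     ; completes fine
(count-profiled (range 1e8))  ; OutOfMemoryError with the pre-fix wrapping
```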
The only difference between the two examples is the profiling, so I think profiling is causing the issue.

I thought about the vector of times. That could also cause problems, but I think it doesn't in my case. (I have about 5 functions that are each being called about 3 million times. That's a lot, but probably not too much for 3G of heap space.) I had thought about a streaming approach to computing the statistics, but I like your "chunked" approach better. I'd love to see this change, too, although it isn't as pressing for me at the moment.

Thanks for your help, and for the library.
This is quite interesting! Have reduced it further:

```clojure
((fn [coll] (reduce (fn [n x] (inc n)) 0 coll)) (range 1e8))           ; fine (~6s)
((fn [coll] ((fn [] (reduce (fn [n x] (inc n)) 0 coll)))) (range 1e8)) ; OOMs
```

This is reproducible with (at least) Clojure 1.6.0-RC1 and 1.4.1. So it's indeed the additional fn wrapping that's causing the problem, though it's not obvious to me why that would be the case.

UPDATE: In hindsight it's actually quite simple - the inner fn is realising the full range but the calling fn is still retaining a reference which the JVM can't release. Will push a fix shortly.
You're very welcome, thank you for saying so! Appreciate the bug reports :-) Cheers!
Okay, just pushed the fix.
Thanks very much for the fix and the timeliness! This is a big help for me. I'll file an issue for the periodic stats calculations, to keep a bound on the size of the times arrays in `*pdata*`.
No problem, am happy you caught this.
Hmm - I'm on the fence about that one. Brought it up since I suspected it may have been the cause of the error you were seeing. After ruling it out, I'm not convinced it'd be a beneficial change in practice. The memory overhead we're looking at is ~8 bytes per call in an unterminated profiling run.

If the chunked stats change were completely free I'd say why not - but here it'd involve a reasonable amount of code modding and extra complication. Not sure if that tradeoff makes sense. You're welcome to open an issue (and offer a counter argument if you have one) - it's certainly worth thinking about!

What are your thoughts?
Sorry, I already filed the issue as #57. I'll reply to your comments there.
I have a large sequence that cannot fit in memory. I also have a function that takes this sequence as an argument and `reduce`s it to a value. So, with names changed to protect the innocent, my code looks like:
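(The code block from the original issue is missing here; the sketch below is a guess at its shape, with `process`/`items` as placeholder names carried over from the rest of the thread.)

```clojure
;; Hypothetical reconstruction - the original snippet was lost.
;; A function that reduces a lazy sequence (too large for memory) to a value.
(defn process
  [items]
  (reduce (fn [n _] (inc n)) 0 items))
```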
This version works fine because `reduce` consumes the sequence without retaining the head. (This claim is not obvious, but it is true, at least in my case.)

The problem is, if I replace `defn` with `defnp` as defined by `taoensso.timbre.profiling`, I now have a function that retains the head of the sequence, with expanded code equivalent to the following:
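(This code block is also missing; the sketch below only shows the shape being described - the body wrapped in an inner anonymous fn, roughly mirroring the snippet earlier in the thread - and is not the literal `defnp` macroexpansion.)

```clojure
;; Hypothetical reconstruction of the "expanded" form described above.
;; The profiling wrapper introduces an inner anonymous fn: the inner fn
;; realises the range while the calling frame still retains a reference
;; the JVM can't clear, so the head of the sequence is kept alive.
(defn process
  [items]
  (let [id "process"
        t0 (System/nanoTime)
        f  (fn [] (reduce (fn [n _] (inc n)) 0 items))]
    (try
      (f)
      (finally
        (let [t-elapsed (- (System/nanoTime) t0)]
          (swap! p/*pdata* #(assoc % id (conj (% id []) t-elapsed))))))))
```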
In this version, the `reduce` starts consuming the sequence, but the enclosing `process` function retains the head. As a result, I get an `OutOfMemoryError` instead of an answer.

For more on head retention, see this thread and this SO question.
Regarding a solution, my best guess would be to change `pspy` so that it used Clojure metadata instead of wrapping code in a function. I'm not sure this would be compatible with the other features of `taoensso.timbre.profiling`, but my guess is that there's some way to make that work.