Memory leak / hoard with XRootD 5.0.2 (in checksumming code?) #1291
I'd use tcmalloc / gperftools and dump a memory profile every time the total grows by 1 GB (HEAP_PROFILE_INUSE_INTERVAL=1073741824), then look at the difference. If it's so bad, it should be obvious ;)
See https://gperftools.github.io/gperftools/heapprofile.html; you can LD_PRELOAD libtcmalloc.so.
Thanks, that looks like something I can even try in production :-). Will do that tomorrow.
I actually did a quick run just now, doing dumps every 100 MB, and diffing an early and a later one.
Indeed, I do not see where this buffer (xrootd/src/XrdCks/XrdCksManOss.cc, lines 141 to 145 at 9fc407e)
is ever freed. But I wonder even more why this never bit us before XRootD 5... Indeed, we sometimes see (when a lot of transfers happen) that the other end claims there has been a timeout during checksumming on our end. Maybe this leads to repeated checksumming, which would amplify the problem only now.
I'm asking this for Andy, as I don't know this part of the code :) Where is this being called from? If you use the graphics output, it draws a graph where boxes show the different stack-trace locations.
I reproduced it and collected a call graph from the difference of two heap traces:
Yup, perfect ... thanks!
@abh3 and @osschar Many thanks for the quick fix! Indeed the fix seems as trivial as expected; I'm just really astonished I never hit that (or maybe just did not hit it so hard) before upgrading to 5.x (the code was there before...).
Hi Oliver,
Thank you for hunting it down. Actually, the pre-existing problem was triggered in R5. Prior to R5 that code path had to be explicitly requested. In R5 we made that code path the default.
…On Mon, 28 Sep 2020, Oliver Freyermuth wrote:
@abh3 and @osschar Many thanks for the quick fix! Indeed the fix seems as trivial as expected, I'm just really astonished I've never hit that (or maybe just did not hit it so hard) before upgrading to 5.x (the code was there before...).
But let's not worry about the past, this plugs a significant leak (if checksumming is used) and should help a lot of users :-).
Hi Andy, ah, thanks for the explanation! And I learned a bit of ...
Since upgrading from 4.12.4 to 5.0.2, we observe huge memory usage for XRootD processes even after just a few hours of runtime. Sadly, this is not easily visible in our test setup, but only with the heavy rate of incoming requests seen in production.
It seems to affect only the data transfer nodes, not the redirector. On the transfer nodes, I see RSS up to 27 GB after 4-6 hours of heavy transfers (thousands of connections). One less ugly example is:
I first wanted to ask if there's a "recommended debugging" way for these matters. Of course I know my way around valgrind and gdb, but attaching these to a production xrootd instance is not really a possibility. I will try to reproduce this in our test setup when a sufficient time slot pops up, but if there is a "best practice" or something like a dump function to dump the currently allocated memory segments and their use in XRootD, please let me know.