large client memory allocations from specific client resource #142
This is a known problem with the old client when used in the context of a proxy server. In fact, starting in SL6 it's a known problem for any kind of data server (not only xrootd). If you are using this in proxy mode then add this to the config file: pss.setopt ReadCacheSize 0 That effectively turns off the per-file cache, which can consume huge amounts of memory if many files are being opened. For any kind of server (proxy or not) you will want to add the following (shell syntax) to your sysinit script prior to starting the server: export MALLOC_ARENA_MAX=4 This limits storage fragmentation, which can become a serious issue in a multi-threaded server. Alternatively, you can install tcmalloc (from Google) or jemalloc (from Facebook), either of which largely works around the brain-dead malloc included in SL6.
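The two mitigations above can be sketched as config/startup fragments (the allocator library paths below are illustrative assumptions, not confirmed locations):

```shell
# In the xrootd config file (proxy mode only) -- turn off the per-file cache:
#   pss.setopt ReadCacheSize 0

# In the sysinit/startup script, before launching the server:
export MALLOC_ARENA_MAX=4    # cap glibc malloc arenas to limit fragmentation

# Or preload an alternative allocator instead (paths are illustrative):
# export LD_PRELOAD=/usr/lib64/libtcmalloc.so   # gperftools
# export LD_PRELOAD=/usr/lib64/libjemalloc.so   # jemalloc
echo "MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX"
```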
While the problem may be the SL6 malloc, since that's the OS we have installed, the suggestions didn't help. We're currently not running a proxy, as the access should not transit any firewalls, and setting MALLOC_ARENA_MAX didn't change the behavior. I haven't suggested tcmalloc or jemalloc to the admins yet. Any other ideas on how to debug? I still find it odd that access from one cluster with the same OS and the same xrootd client binaries has this issue while access from the other cluster doesn't. thanks
OK, I totally misunderstood your problem. Based on the latest information, there is no problem with the xrootd server; it's a problem with your application, right? If so, we need to tackle this another way. Is it true that running the identical client job encounters the problem on one cluster but not on the other? That seems to be what is being said. I assume you have logs from the client jobs. Could you post them, indicating which one worked and which one failed? Based on what I think is going on here, I would say that the two environments are not really identical in some critical way. We just need to find out what the difference is.
Thanks - I think it is a combination of problems. The malloc was indeed part of it. I preloaded jemalloc when running the client, and the memory issue represented by the "Warning: set address range perms: large range" messages and the growing utilization is gone. I can now read a large number of files without a problem as long as they all come from the same data server. However, if I access files spread over a few servers, I then get either the bad_alloc error or, more often: Xrd: PhyConnection: Can't run reader thread: out of system resources. Critical error. This seems to point to a thread limit, as the memory appears stable. I see the maxproc limit is different on the two systems: 256 for the problem system, Carver, and 1024 for the one that works, PDSF. When I tried monitoring that late yesterday, I only witnessed 5 threads being spawned. I'll double-check that and work with the admins to come up with more diagnostics.
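A quick way to compare that limit on the two systems (a sketch; on Linux, each xrootd reader thread counts against the per-user process limit, so a low value can trigger the "Can't run reader thread" error):

```shell
# Show the per-user process/thread limit for the current shell.
# On the problem system this was 256; on the working one, 1024.
nproc_limit=$(ulimit -u)
echo "max user processes/threads: $nproc_limit"
```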
So it seems that the low resource limits on the interactive nodes were ultimately causing the failures. I ran successfully via interactive batch, which has much looser limits. Overall I found that using jemalloc cut the memory usage to roughly a third for a specific test job (400MB down to 130MB). So that was indeed a useful outcome. thanks to all
Great. It would be useful to see how tcmalloc compares. Do you think you can run the same job using that? |
I tried building gperftools but it failed to find libunwind. I'll try adding that and see if I can get it to build. I'd like to find out too. thx
We got gperftools installed and I re-ran the test with tcmalloc. The job reads 12 files stored on 12 different xrootd servers, with a total size of 5.5GB. I also put those 12 files on a shared file system (GPFS). I used the LD_PRELOAD environment variable to switch between tcmalloc, jemalloc, and native malloc. Both tcmalloc and jemalloc got rid of the large address allocation warnings from valgrind.
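The allocator switching can be sketched like this (the library paths and the xrdcp invocation are placeholders, not the exact commands used in the test):

```shell
# Run a command with a chosen allocator preloaded ("" = native malloc).
run_with_alloc() {
  alloc="$1"; shift
  env LD_PRELOAD="$alloc" "$@"
}

# Example invocations (paths and URLs are illustrative):
# run_with_alloc ""                          xrdcp root://server//path/file /dev/null
# run_with_alloc /usr/lib64/libjemalloc.so   xrdcp root://server//path/file /dev/null
# run_with_alloc /usr/lib64/libtcmalloc.so   xrdcp root://server//path/file /dev/null
run_with_alloc "" echo "allocator test harness ready"
```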
Thanks for the test! Now if only standard Linux used one of the two better-performing mallocs instead of the really brain-dead one they settled on.
Thanks for the benchmarks @rjefferson! |
Hi,
We have two distinct compute clusters (PDSF & Carver) that we use to access an xrootd system that lives on one of them (PDSF). The two clusters run the same OS, and we link our software against the same client libraries. We have no problems accessing the service from the cluster where it is located. It also works from the second cluster (Carver); however, the process allocates a huge amount of memory (seemingly related to file size and the # of files read) that is never relinquished, so that if we try to read multiple files, the jobs die with:
Xrd: PhyConnection: Can't run reader thread: out of system resources. Critical error.
(or sometimes crash with a 'bad alloc' error). I can recreate the symptom using xrdcp (our xrootd version is v3.3.4). When I run with --debug #, I don't see any difference between the two systems. When I then run under valgrind, the runs are identical except that valgrind issues several warnings on Carver of the type:
==22839== Warning: set address range perms: large range [0x39431000, 0x49432000) (defined)
with an address range nearly always 0x10001000 wide. The warning seems to be just a notice from valgrind that a large address range was allocated. No such warnings appear when running from PDSF.
At this point, the only difference I see between running from the two resources is the network topology, which I believe is largely IB-connected. The admins are available to help debug with some guidance on what to target.
thanks,
Jeff