XRootD 5.4.3 memory corruption for pgRead #1743
Comments
Hi Elvin, Most of the pgread traffic comes via xcache and no one has reported crashes except when a read timeout occurs in a particular part of the code (that is being addressed as we speak). The trace does not look like anything that has been reported so far. I did reply via a separate thread to Michal, who brought this up first. The traceback looks fine; the issue is that we need to see who decided to read more than 2 MB into a properly allocated 2 MB buffer. Can you provide access to a core file I can look at?
Hi Andy, Let me have another look at this, since I might have rushed a bit - too much enthusiasm after the holidays. The problem might actually come from the EOS code. Put this aside for the moment until I confirm everything looks ok on the EOS side. Thanks,
…t to avoid head-buffer-overflow corruptions Fixes xrootd#1743
Hi @abh3, Could you please review the linked pull request? The problem is very simple to reproduce, at least inside EOS, by issuing a single pgRead request with an offset that is not aligned. For example, a pgRead with offset 1 and length 4MB triggers this memory corruption. After applying this patch, all the tests in EOS pass. Thanks,
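To illustrate why an unaligned offset is a natural trigger, here is a minimal sketch of the page arithmetic. It is not the code from the pull request. It assumes pgRead pages are 4096 bytes, aligned to file offsets, with one 4-byte checksum per page; the function names `page_count` and `naive_page_count` are hypothetical. Under those assumptions, a request at offset 1 touches one more page than the same length read from offset 0, so any buffer sized from the length alone ends up one entry short.

```python
PAGE_SIZE = 4096  # assumed pgRead page size in bytes
CRC_SIZE = 4      # assumed checksum size per page in bytes


def page_count(offset: int, length: int) -> int:
    """Pages (full or partial) touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first_page = offset // PAGE_SIZE
    last_page = (offset + length - 1) // PAGE_SIZE
    return last_page - first_page + 1


def naive_page_count(length: int) -> int:
    """Page count computed from the length only, ignoring offset alignment."""
    return (length + PAGE_SIZE - 1) // PAGE_SIZE


if __name__ == "__main__":
    length = 4 * 1024 * 1024  # the 4MB request from the reproduction above
    for offset in (0, 1):
        needed = page_count(offset, length)
        sized = naive_page_count(length)
        print(f"offset={offset}: pages needed={needed}, naive estimate={sized}, "
              f"checksum bytes needed={needed * CRC_SIZE}")
```

With an aligned offset the two counts agree; with offset 1 the range spans a short first page and a short last page, so the offset-aware count is larger by one.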
…t to avoid heap-buffer-overflow corruptions Fixes xrootd#1743
Thank you for catching that! I will merge it as soon as all the checks complete.
We are using a custom-built version of XRootD 5.4.3 with 3 extra commits to address some bugs that were affecting some of the more demanding EOS instances at CERN. The 3 extra commits are the following:
4df4cda
624daad
50da3f0
Unfortunately, using this XRootD 5.4.3++ version we see crashes (SEGV) in "random" places of the code which don't make much sense. Therefore, we deployed an ASAN-enabled version of EOS on some of the diskservers that were crashing, and it detected a memory corruption when handling pgRead operations. These operations most likely come from new xrdcp commands, which are probably the only ones that trigger the pgRead functionality on the server side.
Below you have a sample output of the ASAN report:
Has anyone experienced similar crashes with the latest XRootD 5.4.3? We assume this is not a side effect of any of the 3 extra commits that we are using.
Thanks,
Elvin