Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory corruption when using curl in tpc #1531

Closed
abh3 opened this issue Oct 9, 2021 · 15 comments
Closed

Possible memory corruption when using curl in tpc #1531

abh3 opened this issue Oct 9, 2021 · 15 comments

Comments

@abh3
Copy link
Member

abh3 commented Oct 9, 2021

Lately we have gotten many reports of heap corruption. This appears to happen when there is a heavy load using http TPC which uses curl. There have been other people reporting curl memory corruption as well, especially when not resetting options upon handle reuse. See curl/curl#4143

The weird thing is that what this bug report talks about seems apparent in the tpc code where XrdTpcTPC.cc:382 is
//curl_easy_setopt(curl, CURLOPT_BUFFERSIZE, 128*1024);

However, not resetting the buffer size may lead curl to corrupt the heap. Why was this line commented out?

@bbockelm
Copy link
Contributor

Hi Andy!

Do you have any specific reports to share on this (logfiles, traceback, core files, etc).

Don't recall why that line was commented out.

B

@abh3
Copy link
Member Author

abh3 commented Oct 11, 2021 via email

@ccaffy
Copy link
Contributor

ccaffy commented Oct 11, 2021

Hello all,

I can take a look this week. I will try to get the commit that commented out this line and try to understand what happens :)

@abh3
Copy link
Member Author

abh3 commented Oct 11, 2021 via email

@ccaffy
Copy link
Contributor

ccaffy commented Oct 11, 2021

Hi Andy,

Yes, @simonmichal suggested exactly the same thing to me :)

My pleasure :)

Cheers,

Cedric

@bbockelm
Copy link
Contributor

Cedric - a few thoughts:

  1. I think Andy's onto something with the buffer allocation he mentioned. Worth chasing down in the curl code.
  2. valgrind can sometimes be too disruptive to operations, especially if the problem is race conditions (as the extra-slow program can sometimes lose the race).
  3. Another useful trick is setting MALLOC_CONF=junk:true and using jemalloc. That causes junk values to be written to memory allocations and can be a good way to detect reliance on uninitialized data.
  4. A third trick is compiling with -fsanitize=memory. That will activate the MemorySanitizer features (I think also available in GCC these days; definitely in clang) which is able to detect most of the errors valgrind does but at far less (though still significant) runtime cost. You will need to compile both libcurl and xrootd with this to make it useful, however.

@ccaffy
Copy link
Contributor

ccaffy commented Oct 11, 2021

Hi Brian,

Ok thanks for the hints ! I will do my best ;)

@abh3
Copy link
Member Author

abh3 commented Oct 14, 2021

Any progress on finding out what is happening?

@ccaffy
Copy link
Contributor

ccaffy commented Oct 14, 2021

Not yet, I have a data challenge this week so i am a little bit busy...

Also, I have troubles with compiling xrootd with ASAN:

Here is my cmake command I run to compile with asan:

cmake3 -DCMAKE_C_FLAGS="-fsanitize=address" -DCMAKE_CXX_FLAGS="-fsanitize=address" -DCMAKE_SHARED_LINKER_FLAGS="-fsanitize=address" -DCMAKE_MAKE_PROGRAM=/bin/make -DCMAKE_C_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/gcc -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++ -DCMAKE_INSTALL_PREFIX=/opt/xrootd  -Wno-dev ..

After compilation, I get this error:

[root@xrootd-ccaffy-dev01 cmake-build-build-xrootd]# xrootd
==16886==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

I also tried this:

cmake3 -DCMAKE_C_FLAGS="-fsanitize=address" -DCMAKE_CXX_FLAGS="-fsanitize=address" -DCMAKE_SHARED_LINKER_FLAGS="-fsanitize=address -static-libasan" -DCMAKE_MAKE_PROGRAM=/bin/make -DCMAKE_C_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/gcc -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++ -DCMAKE_INSTALL_PREFIX=/opt/xrootd  -Wno-dev ..

But after compilation, I get now another error:

[root@xrootd-ccaffy-dev01 cmake-build-build-xrootd]# xrootd
xrootd: symbol lookup error: /opt/xrootd/lib64/libXrdUtils.so.3: undefined symbol: __asan_option_detect_stack_use_after_return

I tried to add the libasan.so's directory to the LD_LIBRARY_PATH variable but this does not help.

I surely do something wrong, but I do not get what :/

Do you have any idea on what is wrong ?

Thanks in advance :)

@ccaffy
Copy link
Contributor

ccaffy commented Oct 14, 2021

I am also trying to run a HTTP TPC transfer between two xrootd servers on my dev environment. But I don't succeed as well. Do I need to request a certificate to enable HTTPS and have a macaroon or X509 authentication to make this work ?
I followed the instructions here and here. But maybe it's not the way to go...
I tried to run the Python script in src/XrdTpc/xrootd-test-tpc

[root@xrootd-ccaffy-dev01 XrdTpc]# python xrootd-test-tpc -t ~/xrootd-run/token --dest-token ~/xrootd-run/token --src-token ~/xrootd-run/token --push  http://localhost:1094//tmp/test.txt http://localhost:1095//tmp/test_written.txt
400
{'connection': 'Close', 'content-length': '15'}
Request unknown

Thanks again :)

@abh3
Copy link
Member Author

abh3 commented Oct 14, 2021

Another thing has been brougght to light. The site that experiencing memory corruption is running curl 7.29.0-57. Looking at the curl changelog
2020-06-02 - Kamil Dudka kdudka@redhat.com - 7.29.0-59

  • http: free protocol-specific struct in setup_connection callback (#1836773)
    2020-03-23 - Kamil Dudka kdudka@redhat.com - 7.29.0-58
  • fix heap buffer overflow in function tftp_receive_packet() (CVE-2019-5482)
    it appears a memory corruption issue was fixed in 7.29.0-58. At least two sites running http-tpc that have never experienced memory corruption run 7.29.0-59. So this strongly suggests that the site should upgrade curl. We have requested it do so.

@bbockelm
Copy link
Contributor

bbockelm commented Nov 2, 2021

Hi - wanted to check up on things here. Did the sites having issues upgrade? Did the problems go away? Shall we close this out?

@xrootd-dev
Copy link

xrootd-dev commented Nov 2, 2021 via email

@ccaffy
Copy link
Contributor

ccaffy commented Nov 3, 2021

Hi,

I tried with curl 7.28.0, compiled with address sanitizer on my dev box. I could not see any memory corruption on my side, even with 10000 HTTP TPC pull transfers...
BUT, my local xrootd server crashes if I launch 500 parallel curl commands to trigger HTTP TPC pull transfers. The error I get is failed to mmap. I am not sure it is linked to the problem described in this ticket though.

If the problem of this ticket went away with the new curl version, we can probably close this ticket :)

@abh3
Copy link
Member Author

abh3 commented Nov 3, 2021

7.28.0 is a pretty old release (2012) as curl is now at 7.79.1. All we know is that memory corruptions was introduced somewhere after that. The current version does not have the problem and apparently neither do very old versions. As for the crash, it appears that curl simply ran out of memory. So, I am going to close this ticket.

@abh3 abh3 closed this as completed Nov 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants