Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XrdTpc] Do not modify curl handle after curl_easy_cleanup() #1449

Merged
merged 6 commits into from
May 8, 2021

Conversation

jthiltges
Copy link
Contributor

When ProcessPushReq() and ProcessPullReq() leave scope, TPC::State::~State() is called. The destructor modifies the curl handle, but the handle may already have been through curl_easy_cleanup(), and any further use may cause memory corruption.

Make the curl handle NULL after curl_easy_cleanup().

Valgrind output from xrootd-5.1.2-0.experimental.2534824.4ca38eb6.el7.cern.x86_64 and curl-7.29.0-59.el7_9.1.x86_64
-bash-4.2$ valgrind --undef-value-errors=no /usr/bin/xrootd -l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-clustered.cfg -k fifo -s /var/run/xrootd/xrootd-clustered.pid -n clustered
==20804== Memcheck, a memory error detector
==20804== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20804== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==20804== Command: /usr/bin/xrootd -l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-clustered.cfg -k fifo -s /var/run/xrootd/xrootd-clustered.pid -n clustered
==20804==
==20804== Thread 52:
==20804== Invalid write of size 8
==20804==    at 0x127A5173: Curl_setopt (url.c:1098)
==20804==    by 0x127B373A: curl_easy_setopt (easy.c:384)
==20804==    by 0x1608349D: TPC::State::~State() (XrdTpcState.cc:22)
==20804==    by 0x16095C0D: TPC::TPCHandler::ProcessPullReq(std::string const&, XrdHttpExtReq&) (XrdTpcTPC.cc:777)
==20804==    by 0x1609688D: TPC::TPCHandler::ProcessReq(XrdHttpExtReq&) (XrdTpcTPC.cc:147)
==20804==    by 0xB2FFF82: XrdHttpReq::ProcessHTTPReq() (XrdHttpReq.cc:998)
==20804==    by 0xB2F7C7C: XrdHttpProtocol::Process(XrdLink*) (XrdHttpProtocol.cc:854)
==20804==    by 0x51D6AA5: XrdLinkXeq::DoIt() (XrdLinkXeq.cc:319)
==20804==    by 0x51D31D8: XrdLink::setProtocol(XrdProtocol*, bool, bool) (XrdLink.cc:435)
==20804==    by 0x51D9DA5: XrdScheduler::Run() (XrdScheduler.cc:382)
==20804==    by 0x51D9EF8: XrdStartWorking(void*) (XrdScheduler.cc:88)
==20804==    by 0x517F846: XrdSysThread_Xeq (XrdSysPthread.cc:86)
==20804==  Address 0x2334a0f0 is 864 bytes inside a block of size 19,536 free'd
==20804==    at 0x4C2B06D: free (vg_replace_malloc.c:540)
==20804==    by 0x127A40FC: Curl_close (url.c:471)
==20804==    by 0x1609131E: TPC::TPCHandler::RunCurlWithUpdates(void*, XrdHttpExtReq&, TPC::State&, TPC::TPCHandler::TPCLogRecord&) (XrdTpcTPC.cc:514)
==20804==    by 0x16095F40: TPC::TPCHandler::ProcessPullReq(std::string const&, XrdHttpExtReq&) (XrdTpcTPC.cc:784)
==20804==    by 0x1609688D: TPC::TPCHandler::ProcessReq(XrdHttpExtReq&) (XrdTpcTPC.cc:147)
==20804==    by 0xB2FFF82: XrdHttpReq::ProcessHTTPReq() (XrdHttpReq.cc:998)
==20804==    by 0xB2F7C7C: XrdHttpProtocol::Process(XrdLink*) (XrdHttpProtocol.cc:854)
==20804==    by 0x51D6AA5: XrdLinkXeq::DoIt() (XrdLinkXeq.cc:319)
==20804==    by 0x51D31D8: XrdLink::setProtocol(XrdProtocol*, bool, bool) (XrdLink.cc:435)
==20804==    by 0x51D9DA5: XrdScheduler::Run() (XrdScheduler.cc:382)
==20804==    by 0x51D9EF8: XrdStartWorking(void*) (XrdScheduler.cc:88)
==20804==    by 0x517F846: XrdSysThread_Xeq (XrdSysPthread.cc:86)
==20804==  Block was alloc'd at
==20804==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==20804==    by 0x127A4369: Curl_open (url.c:604)
==20804==    by 0x127B3663: curl_easy_init (easy.c:358)
==20804==    by 0x160952C0: TPC::TPCHandler::ProcessPullReq(std::string const&, XrdHttpExtReq&) (XrdTpcTPC.cc:704)
==20804==    by 0x1609688D: TPC::TPCHandler::ProcessReq(XrdHttpExtReq&) (XrdTpcTPC.cc:147)
==20804==    by 0xB2FFF82: XrdHttpReq::ProcessHTTPReq() (XrdHttpReq.cc:998)
==20804==    by 0xB2F7C7C: XrdHttpProtocol::Process(XrdLink*) (XrdHttpProtocol.cc:854)
==20804==    by 0x51D6AA5: XrdLinkXeq::DoIt() (XrdLinkXeq.cc:319)
==20804==    by 0x51D31D8: XrdLink::setProtocol(XrdProtocol*, bool, bool) (XrdLink.cc:435)
==20804==    by 0x51D9DA5: XrdScheduler::Run() (XrdScheduler.cc:382)
==20804==    by 0x51D9EF8: XrdStartWorking(void*) (XrdScheduler.cc:88)
==20804==    by 0x517F846: XrdSysThread_Xeq (XrdSysPthread.cc:86)
==20804==

@jthiltges jthiltges marked this pull request as ready for review April 30, 2021 15:05
@jthiltges jthiltges marked this pull request as draft April 30, 2021 15:48
@jthiltges
Copy link
Contributor Author

Converted back to draft, as I suspect it's not a complete or correct fix:

ProcessPushReq() and ProcessPullReq() return RunCurl*() functions which call curl_easy_cleanup(). But TPC::State::~State() will then still be called after the curl_easy_cleanup()?

If modifying the curl handle in ~State() is really necessary, maybe better to store the curl handle in a unique_ptr and do the curl_easy_cleanup() there? @bbockelm

When `ProcessPushReq()` and `ProcessPullReq()` return,
`TPC::State::~State()` is called. The destructor modifies the curl handle,
but the handle may already have been through `curl_easy_cleanup()`, and any
further use may cause memory corruption.

Put the curl handle into a unique_ptr for cleanup at function exit.
…erSize() or ~MultiCurlHandler()

The curl handle comes from a State object, which is only "borrowing" a reference.
@abh3
Copy link
Member

abh3 commented May 4, 2021

This is marked as a "draft" do you have person in mind to review it?

@jthiltges
Copy link
Contributor Author

I think @bbockelm would be the most likely person.

With the most recent version of this patch, I've been unable to reproduce the crash with some limited TPC load testing at Nebraska, and also not finding any obvious memory leaks. I'm going to go ahead and mark it ready for review.

@jthiltges jthiltges marked this pull request as ready for review May 4, 2021 22:37
@bbockelm
Copy link
Contributor

bbockelm commented May 4, 2021

I get back from vacation Thursday morning, so I should be able to look at it this week.

@simonmichal
Copy link
Contributor

@jthiltges & @bbockelm : do you have an estimate on how long it will take to resolve this PR? we are currently in the release process of 5.2.0 and if this is a matter of days I am happy to wait for the patch, on the other hand if you think this might take weeks I'd rather go for 5.2.0 now and once you have the correct patch cut 5.2.1. Let me know your thoughts!

@bbockelm
Copy link
Contributor

bbockelm commented May 6, 2021

@jthiltges - I can't apply patches directly to your branch, could you do the following:

--- a/src/XrdTpc/XrdTpcMultistream.cc
+++ b/src/XrdTpc/XrdTpcMultistream.cc
@@ -59,10 +59,6 @@ public:
              it++) {
             curl_multi_remove_handle(m_handle, *it);
         }
-        for (std::vector<CURL *>::const_iterator it = m_avail_handles.begin();
-             it != m_avail_handles.end();
-             it++) {
-        }
         curl_multi_cleanup(m_handle);
     }
 
diff --git a/src/XrdTpc/XrdTpcTPC.cc b/src/XrdTpc/XrdTpcTPC.cc
index 4e7bb99..8c96df6 100644
--- a/src/XrdTpc/XrdTpcTPC.cc
+++ b/src/XrdTpc/XrdTpcTPC.cc
@@ -615,8 +615,7 @@ int TPCHandler::ProcessPushReq(const std::string & resource, XrdHttpExtReq &req)
     if (name) rec.name = name;
     logTransferEvent(LogMask::Info, rec, "PUSH_START", "Starting a push request");
 
-    auto curl_deleter = [](CURL *curl) { curl_easy_cleanup(curl); };
-    std::unique_ptr<CURL, decltype(curl_deleter)> curlPtr(curl_easy_init(), curl_deleter);
+    std::unique_ptr<CURL, decltype(&curl_easy_cleanup)> curlPtr(curl_easy_init(), &curl_easy_cleanup);
     CURL *curl = curlPtr.get();
     if (!curl) {
         char msg[] = "Failed to initialize internal transfer resources";
@@ -689,8 +688,7 @@ int TPCHandler::ProcessPullReq(const std::string &resource, XrdHttpExtReq &req)
     if (name) rec.name = name;
     logTransferEvent(LogMask::Info, rec, "PULL_START", "Starting a push request");
 
-    auto curl_deleter = [](CURL *curl) { curl_easy_cleanup(curl); };
-    std::unique_ptr<CURL, decltype(curl_deleter)> curlPtr(curl_easy_init(), curl_deleter);
+    std::unique_ptr<CURL, decltype(&curl_easy_cleanup)> curlPtr(curl_easy_init(), &curl_easy_cleanup);
     CURL *curl = curlPtr.get();
     if (!curl) {
             char msg[] = "Failed to initialize internal transfer resources";

Were you able to test multistreaming?

If the multistream case looks good, I think it's ready to mereg.

@abh3
Copy link
Member

abh3 commented May 6, 2021

@bbockelm Could you review this to make sure nothing else remains and if you are happy, I'll merge it. Though, it would be good if @jthiltges verifies this works with multi-streaming as that seems to bite us a lot.

@jthiltges
Copy link
Contributor Author

Thank you for the changes. I've committed them above.

I've focused on testing TPC with FTS transfers. If you can point me toward a way to test multistream, I'd be happy to work on it.

@jthiltges
Copy link
Contributor Author

Still not quite there with multistream. The State destructor modifies a curl handle after it's been free'd.

Valgrind output
==5729== Thread 24:
==5729== Invalid write of size 8
==5729==    at 0x127A5173: Curl_setopt (url.c:1098)
==5729==    by 0x127B373A: curl_easy_setopt (easy.c:384)
==5729==    by 0x1629095D: TPC::State::~State() (XrdTpcState.cc:22)
==5729==    by 0x1628FE77: TPC::TPCHandler::RunCurlWithStreams(XrdHttpExtReq&, TPC::State&, unsigned long, TPC::TPCHandler::TPCLogRecord&) (XrdTpcMultistream.cc:501)
==5729==    by 0x162A346E: TPC::TPCHandler::ProcessPullReq(std::string const&, XrdHttpExtReq&) (XrdTpcTPC.cc:772)
==5729==    by 0x162A3E7D: TPC::TPCHandler::ProcessReq(XrdHttpExtReq&) (XrdTpcTPC.cc:153)
==5729==    by 0xB2FFF82: XrdHttpReq::ProcessHTTPReq() (XrdHttpReq.cc:998)
==5729==    by 0xB2F7C7C: XrdHttpProtocol::Process(XrdLink*) (XrdHttpProtocol.cc:854)
==5729==    by 0x51D6B45: XrdLinkXeq::DoIt() (XrdLinkXeq.cc:319)
==5729==    by 0x51D3278: XrdLink::setProtocol(XrdProtocol*, bool, bool) (XrdLink.cc:435)
==5729==    by 0x51D9E45: XrdScheduler::Run() (XrdScheduler.cc:382)
==5729==    by 0x51D9F98: XrdStartWorking(void*) (XrdScheduler.cc:88)
==5729==  Address 0x99a4360 is 864 bytes inside a block of size 19,536 free'd

Free the state objects (calling the ~State() destructor, which modifies the curl handles) before the curl handles are free'd
@jthiltges
Copy link
Contributor Author

With the latest commit, valgrind looks OK.

Multistream testing looks OK as well. With 9x 10-stream transfers of a 4GB file, no errors were reported by FTS.

Copy link
Contributor

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bbockelm
Copy link
Contributor

bbockelm commented May 7, 2021

@jthiltges - did you observe any issues overnight? If not, I think this is probably ready to go.

@jthiltges
Copy link
Contributor Author

I didn't run this patched version overnight, but put it back onto a production server this morning. So far, it has run over three hours with no issues to note.

@abh3
Copy link
Member

abh3 commented May 7, 2021

@jthiltges OK, please give me a thumbs up at the end of the day and I'll merge it. Hopefully we will have fixed our CI problem by then.

@jthiltges
Copy link
Contributor Author

Everything looks OK on the production server. (I'll close and reopen the PR to trigger a CI rebuild.)

@jthiltges jthiltges closed this May 7, 2021
@jthiltges jthiltges reopened this May 7, 2021
@xrootd-dev
Copy link

xrootd-dev commented May 7, 2021 via email

@abh3 abh3 merged commit 70066fe into xrootd:master May 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants