Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in XrdSecProtocolgsi::AuthzFunCheck() if AuthzFun fails #595

Closed
jthiltges opened this issue Oct 2, 2017 · 12 comments
Closed

Segfault in XrdSecProtocolgsi::AuthzFunCheck() if AuthzFun fails #595

jthiltges opened this issue Oct 2, 2017 · 12 comments

Comments

@jthiltges
Copy link
Contributor

After updating from xrootd 4.6 to 4.7.0, a site using GSI authentication with LCMAPS and GUMS reported xrootd segfaults:

Program terminated with signal 11, Segmentation fault.
#0  AuthzFunCheck (this=0x7f2fe4002560, cred=<value optimized out>, parms=0x7f30497ff588, ei=0x7f30497ff590) at /usr/src/debug/xrootd-4.7.0/src/XrdSecgsi/XrdSecProtocolgsi.cc:1558

Source line: https://github.com/xrootd/xrootd/blob/771dbc31b2/src/XrdSecgsi/XrdSecProtocolgsi.cc#L1558

Looking at the core file, the variable notafter (e->buf2.buf) was null, and accessing the null pointer triggered the segfault:

(gdb) info locals
expired = false
notafter = <error reading variable notafter (Cannot access memory at address 0x0)>
st_ref = 2
ts_ref = 1506456799
to_ref = 43200
st_exp = -1 

My guess as to what's happening in XrdSecProtocolgsi::Authenticate():
buf2 is free()d and then there happens to be a failure in the AuthzFun LCMAPS callout. Authenticate() breaks out of the block and buf2 is never reassigned. A later authorization call with the same DN will retrieve the same cache entry and trigger the segfault in AuthzFunCheck().

This guess is supported by log entries of "ERROR: the authorization plug-in reported a failure for this handshake" shortly before the segfaults.

I'm not sure of the best place to fix it. Maybe just "if (e && e->buf2.buf)" in AuthzFunCheck(), similar to the recent fix in QueryProxyCheck()?

@simonmichal
Copy link
Contributor

Thanks for reporting this issue.

Can you reproduce this issue reliably, and if yes, does reverting to 4.6.1 fixes the problem?
I'm asking because in 4.6.1 we had a very similar statement without any additional guards:

int notafter = *((int *) cent->buf2.buf);

@jthiltges
Copy link
Contributor Author

I've opened this issue on behalf of CMS site T2_US_Florida. After upgrading to 4.7.0, they reported frequent segfaults at the same offset (XrdSecProtocolgsi.cc#L1558). They downgraded back to 4.6.1 and the service runs normally.

@simonmichal
Copy link
Contributor

OK, thanks :-)
Just for the record I was able to reproduce it.

@simonmichal
Copy link
Contributor

simonmichal commented Oct 6, 2017

I think that this part of code from 4.6.1:

FreeEntity((XrdSecEntity *) cent->buf1.buf);
SafeDelete(cent->buf1.buf);
SafeDelete(cent->buf2.buf);
cent->status = kPFE_disabled; // Prevent use after unlock!
pfeRef.UnLock(); // Discarding cent!
cacheAuthzFun.Remove(dn);
cent = 0;

corresponds to this one from 4.7.0:
if (!rdlock) {
if (cent->buf1.buf)
FreeEntity((XrdSecEntity *) cent->buf1.buf);
SafeDelete(cent->buf1.buf);
SafeDelete(cent->buf2.buf);
}

In 4.7.0 the lines that remove the entry after deletion from cache disappeared (also the new XrdSutCache doesn't have the Remove method ;-).

gganis added a commit to gganis/xrootd that referenced this issue Oct 10, 2017
A check on the entry validity was misplaced.
Should fix issue xrootd#595 .
@gganis
Copy link
Member

gganis commented Oct 10, 2017

Hi,
I have just submitted a pull request addressing this issue.
@jthiltges would it be possible to test it?

@simonmichal
Copy link
Contributor

The PR fixes the problem in my test-bed.

@simonmichal
Copy link
Contributor

@jthiltges : any news?

@jthiltges
Copy link
Contributor Author

Marian is working toward an OSG build of xrootd that can hopefully be tested at Florida. Thank you both.

simonmichal added a commit that referenced this issue Oct 12, 2017
Fix for segfault in cache checking (issue #595)
simonmichal pushed a commit that referenced this issue Oct 12, 2017
A check on the entry validity was misplaced.
Should fix issue #595 .
@zvada
Copy link

zvada commented Oct 17, 2017

I'd like to engage Florida to test this where we originally discovered the issue. While RC1 should cover fix for this particular issue I'd prefer to move on to OSG build from RC2 coming this week (?) which should include even more (interesting) patches. That way I save one iteration asking Florida update to RC1 and right after that to RC2. Is that acceptable we wait for RC2 or are you insisting on the testing of RC1? By the time we know results of RC1 testing there will be RC2 out of the door I believe.

@simonmichal
Copy link
Contributor

If it's a hassle for you then don't bother with RC1, as I tested it in my test-bed I have a high degree of confidence in this patch ;-) I can wait few more days for confirmation :-)

ffurano pushed a commit to ffurano/xrootd-fbx4 that referenced this issue Oct 18, 2017
A check on the entry validity was misplaced.
Should fix issue xrootd#595 .
@simonmichal
Copy link
Contributor

Any news? (preferably good news ;-)

@simonmichal
Copy link
Contributor

As far as I understood the conversation in #606, we can close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants