
Prime value used for Diffie-Hellman handshake is too small #1556

Closed
alrossi opened this issue Nov 15, 2021 · 11 comments


alrossi commented Nov 15, 2021

Because of the security fixes it contains, we (dCache) have decided to move to a more recent version of the Bouncy Castle Java security library (1.67). We are not back-porting this update for the moment, but have applied it only to our master branch.

With it, we have experienced the following issue.

When our TPC client attempts to communicate with an xrootd server source endpoint, we get this error:

Remote endpoint sent: P = 250300537505961931441816061639185717883, G = 5, L = 0,
15 Nov 2021 09:51:01 (dcatest04-5) [] Cryptographic issues encountered during cert step: unsafe p value so small specific l required
15 Nov 2021 09:51:01 (dcatest04-5) [] Unable to complete gsi authentication to fndcatemp1.fnal.gov:1095, channel ef39d639, stream 1, session 02000000A67500001900000002000000: org.dcache.xrootd.core.XrootdException: Could not complete cert step: an error occurred during cryptographic operations..

The Bouncy Castle code has become less tolerant of the prime (P) value used for the DH exchange.
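For scale, a quick check of the logged value (an illustration, not part of the original report) shows that the P here is only 128 bits:

```java
import java.math.BigInteger;

// Illustration only: the P value from the log line above is just 128 bits,
// far below the sizes generally recommended for DH moduli today.
public class PBitLength {
    public static void main(String[] args) {
        BigInteger p = new BigInteger("250300537505961931441816061639185717883");
        System.out.println(p.bitLength()); // prints 128
    }
}
```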

Recognizing a potential backward compatibility issue, however, the developers also added the possibility of overriding this check. If we set the Java security property:

org.bouncycastle.dh.allow_unsafe_p_value=true

on the pools, then we no longer get the error and the transfer proceeds normally.
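For completeness, a minimal sketch of applying the override programmatically, assuming (as with other Bouncy Castle property overrides) that it can also be supplied as a JVM system property, e.g. -Dorg.bouncycastle.dh.allow_unsafe_p_value=true on the command line:

```java
// Minimal sketch, assuming Bouncy Castle picks this up as a JVM system
// property. It must be set before any DH key agreement is attempted.
public class BcDhOverride {
    public static void main(String[] args) {
        System.setProperty("org.bouncycastle.dh.allow_unsafe_p_value", "true");
        // ... proceed with the GSI handshake as usual ...
    }
}
```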

We can of course live with overriding the safety check for the moment, but this should be understood as a workaround and not a permanent fix. The better solution is for xrootd to emit a longer/safer P value in the GSI handshake. dCache, for instance, has used a 512-bit, OpenSSL-generated 'safe' prime for quite a while now. Here is the comparable log statement from our TPC client when a dCache door is the source:

Remote endpoint sent: P = 8810252053191919589359420599637641654529526533255167910139019997260323899182669360368264455955601543214780189083983437542290860708561316883421676261825939, G = 2, L = 0
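For reference, one can confirm with plain java.math that this value is a 512-bit 'safe' prime, i.e. both P and (P-1)/2 are prime (a quick check, not dCache code):

```java
import java.math.BigInteger;

// Quick check (not dCache code): the logged P is a 512-bit safe prime.
public class SafePrimeCheck {
    public static void main(String[] args) {
        BigInteger p = new BigInteger(
            "8810252053191919589359420599637641654529526533255167910139019997260323899182669360368264455955601543214780189083983437542290860708561316883421676261825939");
        BigInteger q = p.subtract(BigInteger.ONE).shiftRight(1); // (P-1)/2
        System.out.println(p.bitLength());         // 512
        System.out.println(p.isProbablePrime(64)); // true
        System.out.println(q.isProbablePrime(64)); // true
    }
}
```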

Thanks, Al

alrossi added a commit to dCache/dcache that referenced this issue Nov 23, 2021
…ue issue

Motivation:

See "Prime value used for Diffie-Hellman handshake is too small"
xrootd/xrootd#1556

Modification:

Added the workaround property suggested by Bouncy Castle.
Added REVISIT note to remove this when the xrootd issue is resolved.

Result:

GSI authentication during third-party copy
with xrootd vanilla servers as source
does not fail.

Target: master
Requires-notes: no
Requires-book: no
Patch: https://rb.dcache.org/r/13288/
Acked-by: Paul
@simonmichal (Contributor) commented:

It should be fixed in dd54932, which is available in 5.4.0-rc3; could you give it a try?


alrossi commented Dec 1, 2021 via email


alrossi commented Dec 1, 2021

[arossi@fndcatemp1 ~]$ xrdcp5x --tpc delegate only xroot://fndcatemp1.fnal.gov:1095//data/xrootdfs/data_100mb xroot://fndcatemp2.fnal.gov:1096//pnfs/fs/usr/test/arossi/volatile/test-`date | tr ' ' '.'`
[95.37MB/95.37MB][100%][==================================================][19.07MB/s]  

Looks good!

Thank you, Michal

alrossi closed this as completed Dec 1, 2021
@simonmichal (Contributor) commented:

Thank you for testing! :-)


alrossi commented Dec 2, 2021

One question: do you plan to back-port this fix? If you don't, then dCache 8.0+ will require xrootd 5.4+. Not sure that is optimal.

alrossi reopened this Dec 2, 2021
alrossi added a commit to alrossi/dcache-server that referenced this issue Dec 2, 2021
…an p-value issue"

This reverts commit a59d2ab.

See GH xrootd/xrootd#1556 "Prime value used for Diffie-Hellman handshake is too small"

Bug fix provided by xrootd commit dd54932, which is available in 5.4.0-rc3

abh3 commented Dec 13, 2021

This is a client issue, and we don't see any reason why people would upgrade the client except for feature releases. It's an unfortunate state of affairs that people are still running the 4.x client and will continue to do so for some time. So the only solution is for dCache to accommodate old clients, as they will be with us for a while yet.


abh3 commented Jan 4, 2022

I believe this has been fixed in 5.4.

abh3 closed this as completed Jan 4, 2022
@bbockelm (Collaborator) commented:

@alrossi / @simonmichal --

@jthiltges reports (and I think he's right) that this is causing large-scale issues at Nebraska.

This appears to be due to the fact that XRootD generates a new set of DH parameters for each GSI login (it appears to do so after a cursory review of the code), causing each GSI handshake to take between 0.5 and 4 s of CPU time to establish (the time it takes to generate a set of 512-bit DH parameters).

Now,

  1. 512-bit DH parameters are insufficient on their own.
  2. ~1 CPU second on the server per connection is wildly too high.

So we're at the point where the cost of the encryption is far too high for the weak encryption it provides.

The good news, I think, is that we can do what everyone else does: use fixed parameters (either generated once on startup or read from a file) and generate a new key each time (which is cheap). It seems the parameter generation is completely server-driven via resizable buffers: shouldn't older clients be quite content with larger parameters?
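As a sketch of that pattern in JCA terms (an illustration of the approach only, not the actual xrootd C++/OpenSSL code):

```java
import java.security.AlgorithmParameterGenerator;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.spec.DHParameterSpec;

// Illustration: pay the expensive DH parameter generation once at startup,
// then derive a cheap per-connection key pair from the fixed parameters.
public class FixedDhParams {
    public static void main(String[] args) throws Exception {
        // Expensive: run once at startup (or load the parameters from a file).
        AlgorithmParameterGenerator apg =
            AlgorithmParameterGenerator.getInstance("DiffieHellman");
        apg.init(2048);
        DHParameterSpec params =
            apg.generateParameters().getParameterSpec(DHParameterSpec.class);

        // Cheap: run per connection.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("DiffieHellman");
        kpg.initialize(params);
        KeyPair sessionKey = kpg.generateKeyPair();
        System.out.println("per-connection key: " + sessionKey.getPublic().getAlgorithm());
    }
}
```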

John is working on a writeup of the issue.

@jthiltges (Contributor) commented:

On the local redirector at Nebraska, we began seeing odd spikes in system load after the upgrade to v5.4.0. Load was quite high, reaching 1000-5000, with the system being mostly unresponsive.

As Brian said, the problem appeared related to SSL negotiation overhead, where the server generates new DH parameters for each connection.

Sample backtrace:
Thread 108 (Thread 0x7f4bbb1fe700 (LWP 28534)):
#0  BN_mod_word (a=a@entry=0x7f4b0f911c00, w=w@entry=1699) at bn_word.c:102
#1  0x00007f4bf8327431 in probable_prime_dh_safe (ctx=0x7f4b6bc2eac0, rem=0x7f4b2db70a98, padd=0x7f4b2db70a80, bits=511, p=0x7f4b0f911c00) at bn_prime.c:502
#2  BN_generate_prime_ex (ret=0x7f4b0f911c00, bits=bits@entry=512, safe=safe@entry=1, add=add@entry=0x7f4b2db70a80, rem=rem@entry=0x7f4b2db70a98, cb=cb@entry=0x0) at bn_prime.c:186
#3  0x00007f4bf83635e9 in dh_builtin_genparams (ret=0x7f4bdedd1ba0, ret=0x7f4bdedd1ba0, cb=0x0, generator=5, prime_len=512) at dh_gen.c:194
#4  DH_generate_parameters_ex (ret=0x7f4bdedd1ba0, prime_len=512, generator=5, cb=0x0) at dh_gen.c:88
#5  0x00007f4bf2d5730b in XrdCryptosslCipher::XrdCryptosslCipher (this=0x7f4b75f2b060, padded=<optimized out>, bits=512, pub=0x0, lpub=<optimized out>, t=0x0) at /usr/src/debug/xrootd/xrootd/src/XrdCrypto/XrdCryptosslCipher.cc:485
#6  0x00007f4bf2d63c05 in XrdCryptosslFactory::Cipher (this=<optimized out>, b=1, p=0x0, l=0, t=0x0) at /usr/src/debug/xrootd/xrootd/src/XrdCrypto/XrdCryptosslFactory.cc:219
#7  0x00007f4bf2f7f11a in XrdSecProtocolgsi::ParseCrypto (this=this@entry=0x7f4b6bcae380, clist=...) at /usr/src/debug/xrootd/xrootd/src/XrdSecgsi/XrdSecProtocolgsi.cc:4974
#8  0x00007f4bf2f8a9f4 in XrdSecProtocolgsi::ServerDoCertreq (this=0x7f4b6bcae380, br=0x7f4b75eb5da0, bm=0x7f4bbb1fd168, cmsg=...) at /usr/src/debug/xrootd/xrootd/src/XrdSecgsi/XrdSecProtocolgsi.cc:3565
#9  0x00007f4bf2f8ae15 in XrdSecProtocolgsi::ParseServerInput (this=this@entry=0x7f4b6bcae380, br=br@entry=0x7f4b75eb5da0, bm=bm@entry=0x7f4bbb1fd168, cmsg=...) at /usr/src/debug/xrootd/xrootd/src/XrdSecgsi/XrdSecProtocolgsi.cc:3503
#10 0x00007f4bf2f8b1d8 in XrdSecProtocolgsi::Authenticate (this=0x7f4b6bcae380, cred=<optimized out>, parms=0x7f4bbb1fd3a8, ei=0x7f4bbb1fd3d0) at /usr/src/debug/xrootd/xrootd/src/XrdSecgsi/XrdSecProtocolgsi.cc:1816
#11 0x00007f4bf9ecce74 in XrdXrootdProtocol::do_Auth (this=this@entry=0x7f4bbf75c800) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdXeq.cc:201
#12 0x00007f4bf9ebf927 in XrdXrootdProtocol::Process2 (this=0x7f4bbf75c800) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdProtocol.cc:519
#13 0x00007f4bf9c03bb6 in XrdLinkXeq::DoIt (this=0x7f4bafe458c8) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLinkXeq.cc:320
#14 0x00007f4bf9c00389 in XrdLink::setProtocol (this=0x7f4bafe458c8, pp=<optimized out>, runit=<optimized out>, push=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:435
#15 0x00007f4bf9c06c8a in XrdScheduler::Run (this=0x614e60 <XrdGlobal::Sched>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:406
#16 0x00007f4bf9c06da9 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:89
#17 0x00007f4bf9b94e37 in XrdSysThread_Xeq (myargs=0x7f4b0c6130c0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#18 0x00007f4bf8cfdea5 in start_thread (arg=0x7f4bbb1fe700) at pthread_create.c:307
#19 0x00007f4bf8a269fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

When the server gets busy, it gets bogged down with key generation. I assume clients may time out and retry, making the load worse.

As a temporary workaround for our site, I switched to loading pre-generated DH parameters from a file (using 2048-bit at present). After applying the patch to our local redirector, load went back to normal. jthiltges/xrootd@v5.4.0...dhparam
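A hypothetical Java-side equivalent of that workaround, reading fixed DH parameters from a PEM file (such as one produced by openssl dhparam -out dhparam.pem 2048) with Bouncy Castle's ASN.1 classes; the actual patch linked above is in xrootd's C++/OpenSSL code:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import javax.crypto.spec.DHParameterSpec;
import org.bouncycastle.asn1.ASN1Primitive;
import org.bouncycastle.asn1.pkcs.DHParameter;

// Hypothetical sketch: load pre-generated DH parameters from a PEM file
// instead of generating new ones for every connection.
public class DhParamFile {
    public static DHParameterSpec load(String path) throws Exception {
        String pem = new String(Files.readAllBytes(Paths.get(path)));
        String b64 = pem.replace("-----BEGIN DH PARAMETERS-----", "")
                        .replace("-----END DH PARAMETERS-----", "")
                        .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(b64);
        DHParameter dh = DHParameter.getInstance(ASN1Primitive.fromByteArray(der));
        return new DHParameterSpec(dh.getP(), dh.getG());
    }
}
```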

Regards,
John


bockjoo commented Jan 17, 2022

Florida is also on 5.4.0, and I am seeing the load spike to ~850 from time to time.
At one point, it led to
Jan 11 22:27:44 cmsio7 systemd[1]: xrootd-privileged@clustered.service: main process exited, code=killed, status=11/SEGV
Could the fix be released as soon as possible?
Thanks,
Bockjoo


abh3 commented Jan 18, 2022 via email
