Xrdcp high CPU consumption on AlmaLinux9 #2162

Closed
joaopblopes opened this issue Jan 11, 2024 · 5 comments · Fixed by #2166

@joaopblopes

An increase in CPU consumption was observed on FTS transfers using xrootd on AlmaLinux 9 machines.

After an initial investigation with perf, it seems that most of the CPU time is spent on the openssl EVP_PKEY_param_check call.

$ perf record -g --call-graph dwarf xrdcp root://eospps.cern.ch:1094//eos/opstest/dteam/batistal/file.1mb root://eospublic.cern.ch:1094//eos/opstest/dteam/batistal/file.1mb_`uuidgen`
$ perf report --call-graph --stdio
.
.
.
#
# Children      Self  Command  Shared Object          Symbol                                                         
# ........  ........  .......  .....................  ...............................................................
#
    99.34%     0.00%  xrdcp    libXrdSecgsi-5.so      [.] XrdSecProtocolgsi::getCredentials
            |
            ---XrdSecProtocolgsi::getCredentials
               |          
                --98.97%--XrdSecProtocolgsi::ParseClientInput
                          |          
                          |--93.35%--XrdSecProtocolgsi::ClientDoCert
                          |          |          
                          |           --93.13%--XrdCryptosslFactory::Cipher
                          |                     XrdCryptosslCipher::XrdCryptosslCipher
                          |                     |          
                          |                      --92.30%--EVP_PKEY_param_check
                          |                                evp_pkey_param_check_combined
                          |                                try_provided_check
                          |                                evp_keymgmt_validate
                          |                                dh_validate
                          |                                DH_check_ex
                          |                                DH_check
                          |                                BN_check_prime
                          |                                ossl_bn_check_prime
                          |                                bn_is_prime_int
                          |                                |          
                          |                                 --92.17%--ossl_bn_miller_rabin_is_prime
                          |                                           |          
                          |                                            --91.94%--BN_mod_exp_mont
                          |                                                      |          
                          |                                                       --90.59%--bn_mul_mont_fixed_top
                          |                                                                 |          

The xrootd and openssl versions installed on the machine are:

[root@fts-daq-005 ~]# rpm -qa | grep -E '^(openssl|xrootd)'
openssl-libs-3.0.7-24.el9.x86_64
openssl-3.0.7-24.el9.x86_64
xrootd-libs-5.6.3-3.el9.x86_64
xrootd-client-libs-5.6.3-3.el9.x86_64
openssl-debugsource-3.0.7-24.el9.x86_64

Do you have an idea of why this is happening?

Thanks a lot!

@abh3
Member

abh3 commented Jan 11, 2024

This should also be an issue in Alma 8 where the problem was reported:
https://vulners.com/nessus/ALMA_LINUX_ALSA-2023-7877.NASL

However, the DH parameters should not be longer than 2K, so it's not clear why high CPU usage is triggered.

@joaopblopes
Author

There seem to be several performance issues with openssl 3.0:

openssl/openssl#17627

Some of them seem to have been resolved in version 3.1

https://www.openssl.org/blog/blog/2023/03/07/OpenSSL3.1Release/

@smithdh
Contributor

smithdh commented Jan 12, 2024

Hi. I've also run some tests since Joao reported this issue.
I ran some "xrdcp" copies, from a local server to a local filesystem, for a non-existing file, using gsi. (The copy itself therefore fails, because the file doesn't exist at the server.)

I was testing with a 2048-bit voms proxy, connecting to a centos 7, v5.6.4 xrootd server. My client was on an alma9 machine. I timed the xrdcp over 5 attempts and report the average time (quoting two figures, as a rough estimate). It takes 1.9s with the system openssl (3.0.7-24.el9). I rebuilt xrootd against openssl 1.0.2k (approximately the version that was on centos 7), and with that the command was much faster at 0.15s (which is about the time I also get on an actual centos 7 machine).

Using openssl 3.2.0 (last test release I think) still gave the slower 1.9s timing, even though it appears to include the fix to avoid testing extremely long DH parameters (openssl/openssl@9e0094e).

Looking for other causes, I thought it may be a change in a prime number test that openssl uses (i.e., the number of iterations of a Miller-Rabin test). It's a probabilistic primality test, where raising the number of iterations reduces the chance of identifying a composite number as prime.
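
To make the role of the iteration count concrete, here is a minimal Miller-Rabin sketch on 64-bit integers. It is not OpenSSL's implementation (which works on big numbers and is reached via BN_check_prime in the trace above); the function names are made up for illustration, and `unsigned __int128` assumes GCC or Clang. Each round costs one modular exponentiation, so the run time grows linearly with `rounds`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Overflow-free modular multiplication (relies on the GCC/Clang 128-bit extension).
static std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % m);
}

// Square-and-multiply modular exponentiation: base^exp mod m.
static std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return result;
}

// Returns false if n is certainly composite, true if n survived `rounds`
// rounds with randomly chosen bases (hypothetical helper, illustration only).
bool is_probable_prime(std::uint64_t n, int rounds, std::mt19937_64 &rng) {
    assert(rounds > 0);
    if (n < 2) return false;
    if (n < 4) return true;        // 2 and 3 are prime
    if (n % 2 == 0) return false;

    // Write n - 1 = d * 2^s with d odd.
    std::uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }

    std::uniform_int_distribution<std::uint64_t> pick(2, n - 2);
    for (int i = 0; i < rounds; ++i) {
        std::uint64_t x = pow_mod(pick(rng), d, n);   // one exponentiation per round
        if (x == 1 || x == n - 1) continue;           // this base reveals nothing
        bool witness = true;
        for (int r = 1; r < s && witness; ++r) {
            x = mul_mod(x, x, n);
            if (x == n - 1) witness = false;
        }
        if (witness) return false;                    // base proves n composite
    }
    return true;
}
```

A prime always passes, whatever the number of rounds; only a composite can slip through, and each extra round makes that less likely. Calling it with a seeded `std::mt19937_64` and, say, 2 versus 128 rounds on the same input shows the cost difference directly.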

The prime test is used when checking the DH parameters the XrdCl client receives from the server. Our server now has a fixed set of DH parameters, with a 3072-bit prime. I saw that during the DH check the prime test is called two times, on this 3072-bit number and on a 3071-bit number (p/2). The number of iterations for these tests in openssl 1.0.2k was 2, and now it is 128.
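
As a rough count from these figures, assuming each Miller-Rabin iteration costs about one modular exponentiation on a number of the tested size, the work for the two prime tests in the DH check grows by a factor of 64. The second line uses the standard worst-case bound for Miller-Rabin (a composite passes one iteration with probability at most 1/4); it is only the textbook bound, not a statement about how openssl chose its numbers.

$$
\frac{2 \times 128}{2 \times 2} = 64
$$

$$
\Pr[\text{composite passes } t \text{ iterations}] \le 4^{-t}, \qquad 4^{-128} = 2^{-256}
$$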

Concerning the change of iterations: there seems to be discussion in openssl/openssl#9272, with the last comment on that ticket referring to the paper https://eprint.iacr.org/2020/065, which I think is the motivation for the change. But most of the ticket discusses key generation, not specifically DH parameter checking. My (possibly incomplete or wrong!) take is that the lower number of iterations used in openssl 1.0.2k is based on an average case, applicable for testing numbers that a client might generate (e.g. when generating a key), whereas the larger "worst case" number is applicable for testing a number that might be "adversarially-selected". So I believe the increase is deliberate and motivated by security considerations. However, we know the client will usually be testing the same DH parameters (as the server now always offers the same set), and we know that set is safe. So one idea could be to check whether we receive exactly this set of parameters and, if so, skip the check.
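
The idea of skipping the check only for an exact match could be sketched as follows. This is not the code from the eventual fix: `DhParams`, `same_params` and `params_acceptable` are hypothetical names, the parameters are assumed to be available as normalized hex strings, and the `full_check` callback merely stands in for the expensive EVP_PKEY_param_check call.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical container for the DH parameters received from the server:
// prime p and generator g as hex strings (assumed normalized to the same case).
struct DhParams {
    std::string p_hex;
    std::string g_hex;
};

// Exact comparison of both parameters.
bool same_params(const DhParams &a, const DhParams &b) {
    return a.p_hex == b.p_hex && a.g_hex == b.g_hex;
}

// Returns true if the parameters are acceptable. The expensive check only
// runs when the received parameters differ from the well-known set, so
// anything unexpected is still fully validated.
bool params_acceptable(const DhParams &received,
                       const DhParams &well_known,
                       const std::function<bool(const DhParams &)> &full_check) {
    if (same_params(received, well_known)) return true;   // fast path
    return full_check(received);                          // slow path
}
```

The comparison has to be exact on every parameter; a near match must fall through to the full check, otherwise the shortcut would weaken the validation for parameters nobody has vetted.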

I returned to using the alma9 standard openssl, but skipped the check in the xrootd code:

--- a/src/XrdCrypto/XrdCryptosslCipher.cc
+++ b/src/XrdCrypto/XrdCryptosslCipher.cc
@@ -179,7 +179,8 @@ static int XrdCheckDH (EVP_PKEY *pkey) {
    }
 #else
    EVP_PKEY_CTX *ckctx = EVP_PKEY_CTX_new(pkey, 0);
-   rc = EVP_PKEY_param_check(ckctx);
+//   rc = EVP_PKEY_param_check(ckctx);
+   rc = 1;
    EVP_PKEY_CTX_free(ckctx);
 #endif
    return rc;

(e.g. we could skip if we detect we've received our well-known parameter set). Without the EVP_PKEY_param_check call the timing is 0.24s. As an alternative to completely disabling the check, there's EVP_PKEY_param_check_quick(), which checks some aspects of the parameters but avoids the prime test(s). I'm not sure whether it would be safe to use instead of what we have now. I believe it's probably not safe, although from the docs I looked at I wasn't sure when one might use it (e.g. https://www.openssl.org/docs/man3.0/man7/EVP_PKEY-DH.html).

Even when skipping the EVP_PKEY_param_check() call, it was still slower than openssl 1.0.2k by a factor of about 1.6. Most of this remaining slowdown seems to come from 6 other calls to the prime-testing function; these are not connected to the DH parameters but are related to the proxy I was using. (I was using a 2048-bit proxy, which means it was prime-testing 1024-bit numbers, and for these the change in iterations between openssl 1.0.2k and 3 was different: 6 to 64, I think.) I don't know whether we could do anything about these without some impact on security.

@smithdh
Contributor

smithdh commented Jan 15, 2024

I'm making a PR with ideas from above, so we can discuss some possible concrete changes; tomorrow or a little later this week.

@amadio
Member

amadio commented Jan 22, 2024

Fixed by #2166.

amadio closed this as completed Jan 22, 2024