gsi el7 clients fail to connect to alma9 MGM #2014

Closed
adriansev opened this issue May 24, 2023 · 40 comments · Fixed by #2026

@adriansev
Contributor

It seems that there is a client-side openssl problem when the connecting client is on Centos7 and the server is Alma9 (in my case an EOS Alma9 MGM).
The same connection with the same client (5.5.5 at this moment) from a Fedora38 works without problem.
The gsi debug output for both cases can be inspected here: https://asevcenc.web.cern.ch/asevcenc/gsi_dump/

Looking at the code, it seems that the problem is at this point:
https://github.com/xrootd/xrootd/blob/master/src/XrdSecgsi/XrdSecProtocolgsi.cc#L3320
(so everything else up to this point is ok)

but I'm not sure whether the implementation of the Cipher method is this one:
https://github.com/xrootd/xrootd/blob/master/src/XrdCrypto/XrdCryptosslCipher.cc#L241
or what the problem could be.

Let me know if i can enable tracing options and provide more logging.
Thanks a lot!

@amadio amadio self-assigned this May 24, 2023
@mike-leech

We are seeing exactly the same problem. Client Centos7(xrootd5.5.4-1) server Alma9(xrootd5.5.5-1) .

update-crypto-policies --set DEFAULT:SHA1 on server and restarting has no effect.

xrdcp from Alma 9 client to Alma 9 server works fine.

[leech@pplxint11 ~]$ xrdcp -f -d1 xroot://pplxwn021//tmp/zap local_zap2
230525 11:37:45 255389 secgsi_ClientDoCert: could not instantiate session cipher using cipher public info from server
[2023-05-25 11:37:45.619578 +0100][Error ][XRootDTransport ] [pplxwn021:1094.0] Auth protocol handler for gsi refuses to give us more credentials Secgsi: ErrParseBuffer: could not instantiate session cipher : kXGS_cert
[2023-05-25 11:37:45.619725 +0100][Error ][AsyncSock ] [pplxwn021:1094.0] Socket error while handshaking: [FATAL] Auth failed
[2023-05-25 11:37:45.619845 +0100][Error ][PostMaster ] [pplxwn021:1094] elapsed = 1, pConnectionWindow = 120 seconds.
[2023-05-25 11:37:45.619895 +0100][Error ][PostMaster ] [pplxwn021:1094] Unable to recover: [FATAL] Auth failed.
[2023-05-25 11:37:45.619932 +0100][Error ][XRootD ] [pplxwn021:1094] Impossible to send message kXR_open (file: /tmp/zap, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ). Trying to recover.
[0B/0B][100%][==================================================][0B/s]
Run: [FATAL] Auth failed: Secgsi: ErrParseBuffer: could not instantiate session cipher : kXGS_cert (source)

I'm also willing to test, as this is stalling our upgrade of all systems to Alma 9.

Just a thought. Major upgrade to openssl3 on EL9. Lots of legacy stuff has been dropped.

Cheers.

@abh3
Member

abh3 commented May 25, 2023 via email

@adriansev
Contributor Author

adriansev commented May 26, 2023

@abh3 Alma and RHEL are technically the same distro and, as I said, I have

[root@seau2 ~]# update-crypto-policies --show
LEGACY:1024BITS:SHA1

so the lowest possible requirements are already in place

L.E.: my bad, I did not mention this in this ticket, only in private chat

@amadio
Member

amadio commented May 26, 2023

What about doing the reverse, that is, setting the policy to "FUTURE" on the client side?

@adriansev
Contributor Author

centos7 does not have update-crypto-policies nor crypto-policies..

@mike-leech

I've also tried enabling the openssl legacy provider directly on the server. Still no luck I'm afraid.

[root@p ~]# update-crypto-policies --show
LEGACY:SHA1
[root@p ~]# openssl list -providers
Providers:
  default
    name: OpenSSL Default Provider
    version: 3.0.7
    status: active
  legacy
    name: OpenSSL Legacy Provider
    version: 3.0.7
    status: active

@abh3
Member

abh3 commented May 27, 2023 via email

@adriansev
Contributor Author

Hi @abh3 so, Alma is a Redhat clone as Centos was for EL7 .. also, see their statement:
AlmaLinux OS is a 1:1 binary compatible clone of RHEL® guided and built by the community.
So, technically RedHat Enterprise Linux, Alma Linux and Rocky Linux are the same thing.
As for the crypto policies, see above where I showed what I have:

[root@seau2 ~]# update-crypto-policies --show
LEGACY:1024BITS:SHA1

@Clarky-Bear

Hi, we're seeing the same issue at Durham. We have C7, R9 gateways running xroot-1:5.5.4 and receive cipher issues between C7 to R9 communication even with legacy ciphers turned on. We're finding R8 to be a good middle man between the two but it means we're running a third gateway to permit cross communication.

@bbockelm
Contributor

Why are we looking at SHA-1 stuff?

The error message refers to the failure of a session cipher. In the case of a RHEL7, it is exchanged via 512 bit DH, no?

@adriansev
Contributor Author

i just tried with a pmod with this content:

min_dh_size = 512
min_dsa_size = 1024
min_rsa_size = 1024

applied and rebooted and still i get (from a centos 7 client):

230531 23:21:54 965842 secgsi_ClientDoCert: could not instantiate session cipher using cipher public info from server
[FATAL] Auth failed: Secgsi: ErrParseBuffer: could not instantiate session cipher : kXGS_cert

@bbockelm
Contributor

Hi Adrian,

For crypto policies, I believe the minimum Diffie Hellman is 2048 (unless FUTURE is selected; in that case, it's 3072). Can you try that size?

Thanks,

Brian

@adriansev
Contributor Author

Hi @bbockelm erm, why would i increase something when the need is to lower everything down until the EL7 client works? the target is an Alma9 EOS MGM and the client is a Centos7 (and this is actually the problem, as a fedora 38 client works without problem)

@bbockelm
Contributor

@adriansev - I'm not suggesting it's a solution, just want to test the theory that this is where the problem is. It would be very useful to understand it's indeed in the DH settings and not elsewhere in the code.

(Memory is very hazy of my last read of this code but I believe the DH key itself is later on truncated so there's only 512 bits of security even if the buffers fed to OpenSSL are 2048 bit.... i.e., there's no effective security here unless you're using xroots. So the setting of the DH size can be made to "whatever makes OpenSSL happy")

@adriansev
Contributor Author

@bbockelm yeah, I did not get it at first, but it makes sense. So I did as you suggested, rebooted the machine, and the error is the same. For reference, the overall current policy looks like this: https://asevcenc.web.cern.ch/asevcenc/eos_config_auger/new_pol

@abh3
Member

abh3 commented May 31, 2023 via email

@adriansev
Contributor Author

Hi @abh3 see the link posted above: https://asevcenc.web.cern.ch/asevcenc/eos_config_auger/new_pol
which is the full state of current crypto policy on the server.

It is built with SHA1 on top of LEGACY, after which I added this pmod (although min_dh_size is now 2048):

min_dh_size = 512
min_dsa_size = 1024
min_rsa_size = 1024

The end result is the policy dump referenced above.
EL7/CentOS 7 has no crypto policy AFAIK, so I have no idea how to compare them.
(Also, just to stress again: Alma9 IS RHEL9 software/binary-wise.)
Also, the target MGM has the ops and dteam VOs configured, so anyone registered with these VOs can do the testing from various clients.

@abh3
Member

abh3 commented May 31, 2023

Could you

export XrdSecDEBUG=1

and try connecting again and post the log output. It may tell us what the server really expects.

@xrootd-dev

xrootd-dev commented May 31, 2023 via email

@abh3
Member

abh3 commented May 31, 2023

Ah, could you post the OpenSSL versions of the CentOS 7, Fedora 38, and Alma9 machines?

@adriansev
Contributor Author

EL7: 1.0.2k
Alma9: 3.0.7
Fedora38: 3.0.8

@abh3
Member

abh3 commented Jun 1, 2023 via email

@VipulDavda

Just to add, it works when connecting from Rocky8 client (openssl v1.1.1k).
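
The client results reported so far can be collected in one place. The sketch below only summarizes those reports; it is not an XRootD or OpenSSL API. The names `OBSERVED` and `client_likely_affected` are hypothetical, and the 1.1 cut-off is an assumption drawn from the observations in this thread, not a confirmed boundary.

```python
import re

# Client results against an Alma 9 server (OpenSSL 3.0.7), as reported in this thread.
# True means the gsi handshake succeeded.
OBSERVED = {
    "CentOS 7 (OpenSSL 1.0.2k)": False,  # "could not instantiate session cipher"
    "Rocky 8 (OpenSSL 1.1.1k)": True,
    "Alma 9 (OpenSSL 3.0.7)": True,
    "Fedora 38 (OpenSSL 3.0.8)": True,
}


def client_likely_affected(openssl_version: str) -> bool:
    """Guess whether a client is affected, assuming only pre-1.1 OpenSSL fails."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", openssl_version)
    if match is None:
        raise ValueError(f"unrecognized OpenSSL version: {openssl_version!r}")
    major, minor, _patch = (int(part) for part in match.groups())
    # Assumption: the failures seen so far are all on the 1.0.x series.
    return (major, minor) < (1, 1)


for version in ("1.0.2k", "1.1.1k", "3.0.8"):
    print(version, "affected" if client_likely_affected(version) else "ok")
```

If a pre-1.1 client turns out to work, or a newer one fails, the cut-off in `client_likely_affected` is the part to revisit.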

@abh3
Member

abh3 commented Jun 1, 2023 via email

@xrootd-dev

xrootd-dev commented Jun 1, 2023 via email

@VipulDavda

BTW, to try to make it work, we even changed the openssl config on Alma9 - https://www.practicalnetworking.net/practical-tls/openssl-3-and-legacy-providers/

@VipulDavda

Re: upgrade EL7 to use openssl 1.1.1

I'm new to xrootd but don't you have to compile xrootd to use openssl 1.1.1 on EL7?

ldd /usr/bin/xrdcp
...
libssl.so.10 => /lib64/libssl.so.10 (0x00007fb857cbb000)
..
...
rpm -qf /lib64/libssl.so.10
openssl-libs-1.0.2k-26.el7_9.x86_64
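
The same check can be scripted when many hosts need to be surveyed. This is a minimal sketch that parses `ldd` output text; `libssl_sonames` and `SONAME_TO_SERIES` are hypothetical names, and the soname-to-series mapping is an assumption based on the sonames seen in this thread plus `libssl.so.3` for OpenSSL 3.x.

```python
import re

# Assumed mapping from libssl soname to OpenSSL series.
SONAME_TO_SERIES = {
    "libssl.so.10": "OpenSSL 1.0.x (as packaged on EL7)",
    "libssl.so.1.1": "OpenSSL 1.1.x",
    "libssl.so.3": "OpenSSL 3.x",
}


def libssl_sonames(ldd_output: str) -> list:
    """Return the libssl sonames named on the left-hand side of `ldd` lines."""
    return re.findall(r"^\s*(libssl\.so[.\d]*)\s+=>", ldd_output, flags=re.MULTILINE)


# Sample line taken from the ldd output above.
sample = "libssl.so.10 => /lib64/libssl.so.10 (0x00007fb857cbb000)"
for soname in libssl_sonames(sample):
    print(soname, "->", SONAME_TO_SERIES.get(soname, "unknown series"))
```

On a real host, the text would come from running `ldd /usr/bin/xrdcp` and passing its standard output to `libssl_sonames`.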

@adriansev
Contributor Author

Thanks for that observation. So, it might be that 1.0.1 is simply not compatible with 3.0. OK, so can you upgrade your EL7 machine to 1.1.1 and change nothing else and see if it works?

Actually i do have openssl11 installed from epel:

openssl11-1.1.1k-5.el7.x86_64
openssl11-static-1.1.1k-5.el7.x86_64
openssl11-devel-1.1.1k-5.el7.x86_64
openssl11-libs-1.1.1k-5.el7.x86_64

but the EPEL-provided xrootd is built with the system openssl, not with the EPEL openssl,
so all I need is an xrootd rpm built with the EPEL openssl11,
and subsequently gfal should be rebuilt on top of this new xrootd and openssl11.

@amadio is there a possibility of a test rpm for EL7 that requires and uses openssl11?

If this works, then the next EL7 xrootd release should work but the gfal packager for epel should also be contacted to repackage gfal on the new dependencies.

@amadio
Member

amadio commented Jun 1, 2023

Sure, I will create RPMs using OpenSSL 1.1 from epel for testing. I will add a comment here when they are ready.

@bbockelm
Contributor

bbockelm commented Jun 6, 2023

I think I figured it out. The problem is this one:

openssl/openssl#9792

Basically, in 2019 OpenSSL overhauled its DH parameter generation code, which resulted in the server sending newly generated DH parameters that older clients did not like. It appears the more lenient client-side check was kept but eventually the server-side change was reverted during 1.1.1 -- but based on some GDB footwork, it's back in 3.0.0.

Now, options:

  1. Backport the more lenient client side check and copy/paste it into the XRootD source code. All prior versions of clients are still broken but newer ones on RHEL7 would work. I assume this "break everything" is not an option.
  2. Select a fixed DH group, compatible with the old client and new, and hardcode it into the server side to always be used.

I think (2) is the more viable option; hardcoding a known good group is a fairly common solution (see https://wiki.openssl.org/index.php/Diffie-Hellman_parameters).

Unfortunately, XRootD's 512-bit DH is weak enough to not be considered secure by the 1990's; therefore, there's no standardized 512-bit DH group that we can easily reuse. Instead, I'd just suggest generating any old one by hand and hardcoding that. Here's an example:

$ openssl dhparam 512 -5
Generating DH parameters, 512 bit long safe prime, generator 5
This is going to take a long time
......................................+..+...................+..+...+.............+......+................+.............+................................................................................+................................................................................+....+...........+...+.............+................+............................................................+...+.........+.............+.........+......+..............................................+....+..............+.................................................+..................+......+..............................+..+..+..........+.............+...........................+....+...+......+................+...+.+....+................+....................................+....+.+................................+............................................+..................+.............+............................................................+................++*++*++*++*++*++*
-----BEGIN DH PARAMETERS-----
MEYCQQDuCROhiIMH6R+BJGDf4OP5SlHM4pYjaODCuO02D8H9FwKopHU0T7XmOHZ7
eUxajA3EqUMqa5AY1+EzFV0JXpEfAgEF
-----END DH PARAMETERS-----

Loading that on the server side would replace the generation code:

https://github.com/xrootd/xrootd/blob/master/src/XrdCrypto/XrdCryptosslCipher.cc#L507-L518

For other sizes of DH parameters, one could simply do a lookup table. RFC 3526 covers examples up through 4096.
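
To illustrate what hardcoding a group means, here is a minimal sketch that decodes the example parameters above and runs a textbook Diffie-Hellman exchange with them. It is not the XRootD implementation: the helper names are hypothetical, the DER reader only handles short-form lengths (enough for a 512-bit group), and the exchange uses plain modular exponentiation rather than OpenSSL.

```python
import base64
import secrets

# Example parameters copied from the openssl dhparam output above.
PEM = """-----BEGIN DH PARAMETERS-----
MEYCQQDuCROhiIMH6R+BJGDf4OP5SlHM4pYjaODCuO02D8H9FwKopHU0T7XmOHZ7
eUxajA3EqUMqa5AY1+EzFV0JXpEfAgEF
-----END DH PARAMETERS-----"""


def read_der_integer(buf: bytes, pos: int):
    """Read one DER INTEGER with a short-form length; return (value, next_pos)."""
    if buf[pos] != 0x02:
        raise ValueError("expected an INTEGER tag")
    length = buf[pos + 1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = pos + 2
    return int.from_bytes(buf[start:start + length], "big"), start + length


def parse_dh_params(pem: str):
    """Return (prime, generator) from a PEM DH PARAMETERS block."""
    body = "".join(line for line in pem.splitlines() if not line.startswith("-----"))
    der = base64.b64decode(body)
    if der[0] != 0x30 or der[1] >= 0x80:
        raise ValueError("expected a short-form SEQUENCE")
    prime, pos = read_der_integer(der, 2)
    generator, _ = read_der_integer(der, pos)
    return prime, generator


p, g = parse_dh_params(PEM)
print("prime bits:", p.bit_length(), "generator:", g)

# Both sides share the fixed (p, g); each picks a private exponent in [2, p-2].
client_private = secrets.randbelow(p - 3) + 2
server_private = secrets.randbelow(p - 3) + 2
client_public = pow(g, client_private, p)
server_public = pow(g, server_private, p)

shared_client = pow(server_public, client_private, p)
shared_server = pow(client_public, server_private, p)
print("shared secrets match:", shared_client == shared_server)
```

Because the group is fixed, the client only has to accept one known set of parameters instead of whatever the server's OpenSSL version happens to generate.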

@adriansev
Contributor Author

Fantastic news that you found the problem!!! But it seems to me that there will not be a fast resolution, and I am being pressed by the beneficiary of this EOS installation to put it into production as soon as possible, so I will have to reinstall it with CentOS 7 and let the "big guys" handle this problem. Thanks a lot!!

@amadio
Member

amadio commented Jun 6, 2023

I have put CentOS 7 RPMs linking against OpenSSL 1.1 from EPEL here:
http://xrootd.cern.ch/repo/centos/7/openssl11

The file with the repository configuration is here:
http://xrootd.cern.ch/xrootd-openssl11.repo

This repository is temporary, just for testing, it may be removed after this is tested.

@amadio
Member

amadio commented Jun 6, 2023

@adriansev Could you please try to connect with the client above? For convenience (copy/paste into your terminal):

$ podman run --rm -it centos:7
$ yum install -y epel-release && yum update -y
$ curl -L http://xrootd.cern.ch/xrootd-openssl11.repo > /etc/yum.repos.d/xrootd-openssl11.repo
$ cat /etc/yum.repos.d/xrootd-openssl11.repo
$ yum update
$ yum install xrootd-* # check that it installs xrootd-*5.5.5-1.el7.openssl1.1
$ xrdcp --version
$ ldd /usr/lib64/libXrdCrypto.so | grep libssl # (links against libssl.so.1.1)

Cheers,

@bbockelm
Contributor

bbockelm commented Jun 6, 2023

@amadio - could you test #2026? I did some initial testing on my side and it restores the compatibility between RHEL7 clients and RHEL9.

Unfortunately, I'm not sure a build against OpenSSL 1.1.1 is necessary anymore. Because the issue is traced to the client - and we can't change all existing clients in one fell swoop - the fix must be server side.

@amadio
Member

amadio commented Jun 7, 2023

I've tested the client linked against OpenSSL 1.1 and it works against the unpatched server. I also tested a server with this patch on Alma 9 with a client on lxplus7 and it also works.

@abh3
Member

abh3 commented Jun 8, 2023

OK, I merged this. However, could you rerun your test on the latest merge as @bbockelm made some last minute changes. I don't see how they will affect the test but you never know.

@adriansev
Contributor Author

@amadio I apologize for the silence, returning from holiday was a little bit busy.
So, as I said, I had to reinstall the machine with CentOS 7 because of the pressure to put it into production, BUT the problem still persists!!!
I installed the xrootd provided by you and the problem is still present even though the server is now a standard CentOS 7!!! The logs with the new xrootd against the new installation are here: https://asevcenc.web.cern.ch/asevcenc/eos_config_auger/from_cent7/
Moreover, even on Fedora I get the same message!
Thanks a lot for help!!!

@amadio amadio added this to the 5.6 milestone Jun 8, 2023
@bbockelm
Contributor

bbockelm commented Jun 8, 2023

Hi @adriansev -

What you post appears to be a different problem:

Auth protocol handler for gsi refuses to give us more credentials Secgsi: ErrParseBuffer: could not instantiate digest object: kXGS_cert

The "digest" object here is used to sign messages. Looking at the code, it defaults to sha1:md5 (probably both are not enabled in your client if it's Fedora/RHEL9-based). I believe the default digest can be set on the server side with:

sec.protocol gsi [...other opts...] -md:sha256

(ref: https://xrootd.slac.stanford.edu/doc/dev56/sec_config.htm)

If that doesn't solve it, let's file a separate ticket as it's a distinct issue.
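
Since the two failures look almost identical in the client log, a small helper can tell them apart when triaging reports. This is a sketch only: `classify_gsi_error` is a hypothetical name, and the hints simply restate what has been said in this thread.

```python
# Substrings taken from the client log lines quoted in this thread.
HINTS = {
    "could not instantiate session cipher":
        "DH parameter incompatibility (the subject of this issue, see #2026)",
    "could not instantiate digest object":
        "message digest mismatch (see the -md option of sec.protocol gsi)",
}


def classify_gsi_error(log_line: str):
    """Return a hint for a known Secgsi error line, or None if unrecognized."""
    for needle, hint in HINTS.items():
        if needle in log_line:
            return hint
    return None


line = ("Auth protocol handler for gsi refuses to give us more credentials "
        "Secgsi: ErrParseBuffer: could not instantiate digest object: kXGS_cert")
print(classify_gsi_error(line))
```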

Brian

@amadio
Member

amadio commented Jun 8, 2023

I have tested the current master branch (i.e. after this has been merged) on the same Alma9 machine, and I can connect with a client on CentOS 7 to it without problems.

@adriansev
Contributor Author

@bbockelm oh, yes, sorry for the noise, as soon as i put -md:sha256:sha1:md5 it worked
Thanks a lot!!
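
For readers wondering why listing several digests helps, the sketch below shows one way a colon-separated preference list such as `sha256:sha1:md5` could be resolved against what a given side supports. This is an assumption for illustration: XRootD's actual selection logic is not shown in this thread, and `pick_digest` is a hypothetical helper.

```python
import hashlib


def pick_digest(preference: str, supported) -> str:
    """Return the first digest in a colon-separated list that is supported, else None."""
    for name in preference.split(":"):
        if name in supported:
            return name
    return None


# A side that lacks sha256 falls back to the next entry in the list.
print(pick_digest("sha256:sha1:md5", {"sha1", "md5"}))

# The digests offered by the local Python build can be passed in the same way.
print(pick_digest("sha256:sha1:md5", hashlib.algorithms_available))
```

With only `sha256` in the list, a peer that cannot use it has nothing to fall back to, which would match the behavior reported above.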
