-
Notifications
You must be signed in to change notification settings - Fork 577
Add Support for openssl v3.5 as a TLS Backend #4959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@microsoft-github-policy-service agree [company="OpenSSL"] |
@microsoft-github-policy-service agree company="OpenSSL" |
1e7e38e
to
69d5261
Compare
I do think that renaming openssl -> quictls is a good change. I have some concern with the fact that it would cause a use to go from building with quitls to building with openssl without any action on their end or warning.
|
@guhetier I think such a change would be fine, though I'd be curious to see what others thought about the version being added to the name - i.e. is doing this committing to adding an openssl4/5/6 in the future? Given that msquic builds against whatever openssl variant is defined in the associated submodule, I'm not sure thats needed, though if you want to support multiple openssl versions, it might be beneficial. |
@nhorman All these are great questions, we need to clarify how we see openssl upgrades happen in the future. A different name pattern might also be an option - my concern is mainly about avoid the re-use of the same parameter with a different meaning. Lets gather some opinions about how to handle this best. |
@guhetier I understand what you're getting at. I don't have a whole lot of skin in that game, but I'll gladly implement what you and the rest of the dev team think is best. Just to float another idea for you to consider (and I grant this is additional departure from your current build approach, and so may not be viable), but you might consider:
i.e. check in cmake to see if SSL_provide_quic_data exists, if it does, then build the quictls platform adaptation layer, if SSL_set_quic_tls_cbs exists, then build the openssl platform adaptation layer. That would disconnect you from needing to select a specific implementation manually at configure time. In the interim, I'll start looking at these CI failures. Looks like I did something wrong in the openssl/quictls migration for the workflow files |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4959 +/- ##
==========================================
- Coverage 87.38% 86.81% -0.57%
==========================================
Files 59 59
Lines 18037 18045 +8
==========================================
- Hits 15761 15666 -95
- Misses 2276 2379 +103 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
16b10e1
to
eafd672
Compare
ok, comments and CamelCase/snake_case issues fixed. Will start looking at CI failures next |
af45354
to
3cbded3
Compare
FYI, I have most of the cargo CI jobs building currently, but the windows static builds are failing, seemingly due to last nights windows-latest github runner update, which migrated from cmake 3 to cmake 4 which is having various build problems, that may be causing you trouble in other pull requests. We hit some similar problems in openssl this morning. |
Thanks for fixing that on top of everything else! Sorry for the delay in reviewing the PR, I'll get to it as soon as I can. |
don't rush, I know you all are busy. Feel free to cherry-pick any of the cmake/rust fixes to another PR if you want to prioritize those. |
src/core/crypto.c
Outdated
// Note: Nominally, if we have the write key, we should have the read key | ||
// but there might be a delay with openssl as it yields read and write | ||
// secrets independently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be abstracted from the core QUIC layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had initially attempted to do so by delaying the installation of new keys until both read and write were available, but doing so resulted in other failures when the quic stack expected keys be available on return from ProcessData. I can play with it some more, but if you have additional input here as to how best to do this, I would appreciate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why openssl would give a read key without a write key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nhorman any comments on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will eventually give both of course, but there is a difference in how quictls provides keys vs how the openssl api does. in quictls the yield secret callback derives both keys, and provides them in a single callback, so they are always updated synchronously, whereas in openssl we use the state machine call path which derives the keys independenly and provides the in two separate callbacks, so there may be a period of time when the read key is updated but the write key is not, or vice versa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm experimenting with that now (and have in the past), as well as with the idea of "cheating", I.e. assuming that keys are symmetric, and so when I get one callback in either direction, installing that as the key for both directions, and skipping subsequent calls for the same protection level.
Referencing the former, its a bit tricky, as it appears the QUIC stack expects various key transitions after various calls to CxPlatTlsProcessData, and by buffering the keys, thats not always honored.
Referencing the latter specifically, it seems to generally work, but locally I'm hitting several different failures here, most notably in TlsTest.HandshakeResumption, which fails in on lines 418 of Test.cpp indicating that The BufferKey attempting to be sent (at level 0) doesn't match the Connection Read Key Level of 0RTT (level 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nibanks I'd ask that @mattcaswell chime in again here, but just as a datapoint, If I back out the change that removed this particular assert, it occurs when the write key level is 0RTT and the read key level is HANDSHAKE.
Adding a bit of printf-ery in, it appears that we get this pattern in key callbacks When Running the Handshake/WithHandshakeArgs4.RandomLossResumeRejection (Which is whats failing in CI):
Installing secret SSL 0x7f639802a890 (Server) for Write, level 2
Installing secret SSL 0x7f639802a890 (Server) for Read, level 2
Installing secret SSL 0x7f639802a890 (Server) for Write, level 3
Installing secret SSL 0x7f63a0006d70 (Client) for Read, level 2
Installing secret SSL 0x7f63a0006d70 (Client) for Write, level 2
Installing secret SSL 0x7f63a0006d70 (Client) for Read, level 3
Installing secret SSL 0x7f63a0006d70 (Client) for Write, level 3
Installing secret SSL 0x7f639802a890 (Server) for Read, level 3
Installing secret SSL 0x7f63a000b280 (Client) for Write, level 1 <=====HERE
Installing secret SSL 0x7f639403b130 (Server) for Write, level 2
Installing secret SSL 0x7f639403b130 (Server) for Read, level 2
Installing secret SSL 0x7f639403b130 (Server) for Write, level 3
Installing secret SSL 0x7f63a000b280 (Client) for Read, level 2 <==== AND HERE
It appears that the client can receive the write key for 0RTT, then the read key for HANDSHAKE, which (maybe) makes sense? @mattcaswell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming that keys are symmetric, and so when I get one callback in either direction, installing that as the key for both directions,
This is not a valid assumption.
For all other TLS libraries we've used, here in the completion processing code for QUIC, we never expected to have a Read key that is more recent than a write key. Can you explain the scenario where OpenSSL breaks that assumption, generally in wire-format terms?
We don't make any guarantees about the order of availability of keys. On the client side I think it turns out to be true that you will never get a read key installed after the write key for any particular protection level. On the server side I don't think this holds true. For example we install the app data write keys immediately after we have written the server Finished - but we don't install the app data read keys until we have processed the client Finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't make any guarantees about the order of availability of keys
But why was this design chosen? To offer maximum flexibility for future algorithms?
We're trying to understand how it's possible to receive TLS 1.3 records such that the handshake or app keys wouldn't be available together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design is driven by when the keys are made available for "standard" TLS. It's the TLS implementation that decides to release keys (independent of whether we are doing QUIC, standard TLS or DTLS). In the example I gave above we install app data write keys immediately after we have written the server Finished because it is possible for the server to start writing app data immediately, even though the client has not been authenticated yet (so called 0.5RTT data). At least this is true for standard TLS, I don't know what the QUIC RFC has to say about it. But the app data read keys aren't released until the client Finished has been received (which may occur sometime later) and therefore the client has been authenticated.
I'm thinking that we should move away from submodules for TLS dependencies. I never likely having two for quictls/openssl v1.1 and v3, but having 3 submodules only makes this worse. I'm open to suggestions, but I'm thinking about having CMake dynamically get the submodule and put it in a well-known path (perhaps still submodules/openssl, or perhaps the build folder). Any better ideas? P.S. Depending on how we do this (likely in a separate PR first), we might be able to isolate most of the changes in this PR (all the name changes)... |
// | ||
// TODO: Remove this when the above tests are tolerant of addition of ML-KEM keyshares | ||
// | ||
SSL_set1_groups_list(TlsContext->Ssl, "secp256r1:x25519"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest trying "?*X25519:?secp256r1 / ?X448:?secp384r1:?secp521r1 / ?ffdhe2048:?ffdhe3072"
as this will set the same configuration as in default settings but without enabling the hybrid key share .
Success! Of the remaining tests that failed:
So I think at this point we're generally in good shape. At this point I think the next steps are: b) Once (a) is done, I need to update this PR to stop using my personal branch with those openssl changes, and update the openssl submodule to pull from the openssl-3.5 branch again c) I should probably rebase this PR to master, and squash all my intervening changes down to something more readable and reviewable, and run it through CI one more time to ensure that nothing new broke from that rebase effort. Please tell me if you disagree, but at that point I think we're good to go. |
I agree! Really awesome job so far! Thanks so much! @anrossi please help review the latest and let's get this merged. |
forgive the delay here, we've found some issues with openssl/openssl#27646 that I'm addressing. Hope to have them taken care of shortly. |
@mattcaswell has an alternate fix for key yielding in openssl that I'd like to try on this PR if we could please |
Note, the ubuntu/quictls/android failure: |
ok! I've cleaned up this pr, squashing all the various commits down to something logical, and updated the openssl submodule to the latest on the 3.5 branch, so as to pull in openssl/openssl#27732 It passes all the unit tests locally on my linux and windows system here, so I think we just need to re-run CI to make sure I haven't screwed up any corner cases and I think we're good to go. |
Openssl, in supporting ML-KEM, can trip over a corner case in asserting what key level we're at in this code.
Just Marking a TODO for later, in that the fuzzer should support TLS records that span multiple datagrams (which is the case when ML-KEM is in use)
BufferKey is only set at the start of the test, But some TLS implementations may update their keys while processing the data passed into this function specifically observed on openssl, Sending a buffer with a ServerHello to a client will yield handshake keys immediately, and following data will cause this to fail so only check once at the start of the test to ensure we are in the right state
one more rebase to correct the recent conflict with the main branch |
ok, I think we're ready for review here. We had 3 failures in CI. Two from code coverage due to 2 exercised lines of code, which I think can be ignored, and what appears to be a transient failure on quictls that I think is probably out of scope for this PR. Please review at your convenience @nibanks @anrossi |
Awesome! Thanks so much for your tireless work here! Really appreciated! |
no sweat, it was a good exercise! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your all your hard work here! This has been a monumental effort, and we really appreciate your work, patience, and persistence on this PR.
Thank you all! You're help on this was greatly appreciated! |
Description
This commit enhances msquic to allow it to build using the canonical openssl sources as a TLS backend.
Testing
All existing tests should cover this code as far as I can see.
I've run this code base against the quic-interop-runner and it passes all supported test cases there (save for zerortt I believe) which fails with all TLS backends at the moment I think, but we have apis to support early data, so we should be able to get that up and running shortly.
All unit tests pass under windows currently, save for WithHandshakeArgs4.RandomLossResumeRejection/4 and WithHandshakeArgs4.RandomLossResumeRejection/5 which were failing scholastically. I believe I have those fixed by adjusting some of the asserts when built in debug mode which (If I read the code base right) assume that keys are updated in lock step, which the openssl quic callback api doesn't always do, instead updating keys in independent callbacks.
Documentation
Possibly. currently this PR renames openssl support (which initially pointed to quictls), to actually be named quictls[3] and takes over the QUIC_TLS=openssl configuration to use the canonical openssl sources. That may require build doc updates to reflect that, but I've not clearly found any (yet).