Add Support for openssl v3.5 as a TLS Backend #4959

nhorman · 2025-03-31T16:20:39Z

Description

This commit enhances msquic to allow it to build using the canonical openssl sources as a TLS backend.

Testing

All existing tests should cover this code as far as I can see.

I've run this code base against the quic-interop-runner and it passes all supported test cases there (save for zerortt I believe) which fails with all TLS backends at the moment I think, but we have apis to support early data, so we should be able to get that up and running shortly.

All unit tests pass under windows currently, save for WithHandshakeArgs4.RandomLossResumeRejection/4 and WithHandshakeArgs4.RandomLossResumeRejection/5 which were failing scholastically. I believe I have those fixed by adjusting some of the asserts when built in debug mode which (If I read the code base right) assume that keys are updated in lock step, which the openssl quic callback api doesn't always do, instead updating keys in independent callbacks.

Documentation

Possibly. currently this PR renames openssl support (which initially pointed to quictls), to actually be named quictls[3] and takes over the QUIC_TLS=openssl configuration to use the canonical openssl sources. That may require build doc updates to reflect that, but I've not clearly found any (yet).

nhorman · 2025-03-31T16:23:18Z

@microsoft-github-policy-service agree [company="OpenSSL"]
@microsoft-github-policy-service agree
@microsoft-github-policy-service agree company="Microsoft"

nhorman · 2025-03-31T16:24:15Z

@microsoft-github-policy-service agree company="OpenSSL"
@microsoft-github-policy-service agree

guhetier · 2025-03-31T17:18:23Z

I do think that renaming openssl -> quictls is a good change.

I have some concern with the fact that it would cause a use to go from building with quitls to building with openssl without any action on their end or warning.

Is it something we want?
Could we find a slightly different name for the "openssl" parameter to avoid re-using the existing name?
- maybe tagging everything with the major version number, so we have quictls1, quictls3, openssl3?

nhorman · 2025-03-31T17:30:31Z

@guhetier I think such a change would be fine, though I'd be curious to see what others thought about the version being added to the name - i.e. is doing this committing to adding an openssl4/5/6 in the future? Given that msquic builds against whatever openssl variant is defined in the associated submodule, I'm not sure thats needed, though if you want to support multiple openssl versions, it might be beneficial.

guhetier · 2025-03-31T18:13:39Z

@nhorman All these are great questions, we need to clarify how we see openssl upgrades happen in the future.

A different name pattern might also be an option - my concern is mainly about avoid the re-use of the same parameter with a different meaning. Lets gather some opinions about how to handle this best.

src/platform/tls_openssl.c

nhorman · 2025-03-31T18:26:37Z

@guhetier I understand what you're getting at. I don't have a whole lot of skin in that game, but I'll gladly implement what you and the rest of the dev team think is best.

Just to float another idea for you to consider (and I grant this is additional departure from your current build approach, and so may not be viable), but you might consider:

Removing the TLS submodules from the tree entirely
Creating a cmake variable that points to an openssl-like tree
using a configure time test in cmake to ascertain what platform adaptation layer to use based on what symbols are available in the library pointed to by (2)

i.e. check in cmake to see if SSL_provide_quic_data exists, if it does, then build the quictls platform adaptation layer, if SSL_set_quic_tls_cbs exists, then build the openssl platform adaptation layer.

That would disconnect you from needing to select a specific implementation manually at configure time.

In the interim, I'll start looking at these CI failures. Looks like I did something wrong in the openssl/quictls migration for the workflow files

src/platform/tls_openssl.c

.azure/obtemplates/build-linux.yml

src/core/crypto.c

src/platform/tls_openssl.c

codecov · 2025-03-31T19:15:46Z

Codecov Report

Attention: Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.81%. Comparing base (65660c4) to head (102bc39).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/core/crypto.c	77.77%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4959      +/-   ##
==========================================
- Coverage   87.38%   86.81%   -0.57%     
==========================================
  Files          59       59              
  Lines       18037    18045       +8     
==========================================
- Hits        15761    15666      -95     
- Misses       2276     2379     +103

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

nhorman · 2025-03-31T21:07:12Z

ok, comments and CamelCase/snake_case issues fixed. Will start looking at CI failures next

nhorman · 2025-04-01T20:44:49Z

FYI, I have most of the cargo CI jobs building currently, but the windows static builds are failing, seemingly due to last nights windows-latest github runner update, which migrated from cmake 3 to cmake 4 which is having various build problems, that may be causing you trouble in other pull requests. We hit some similar problems in openssl this morning.

guhetier · 2025-04-04T01:00:49Z

Thanks for fixing that on top of everything else!
We are seeing the issue on other PR and didn't get the time to fix it yet.

Sorry for the delay in reviewing the PR, I'll get to it as soon as I can.

nhorman · 2025-04-04T01:43:00Z

don't rush, I know you all are busy. Feel free to cherry-pick any of the cmake/rust fixes to another PR if you want to prioritize those.

src/platform/tls_openssl.c

.gitmodules

Cargo.toml

scripts/build.ps1

scripts/clog.inputs

src/test/lib/ApiTest.cpp

nibanks · 2025-04-04T13:30:22Z

src/core/crypto.c

+        // Note: Nominally, if we have the write key, we should have the read key
+        // but there might be a delay with openssl as it yields read and write
+        // secrets independently.


Can this be abstracted from the core QUIC layer?

I had initially attempted to do so by delaying the installation of new keys until both read and write were available, but doing so resulted in other failures when the quic stack expected keys be available on return from ProcessData. I can play with it some more, but if you have additional input here as to how best to do this, I would appreciate it.

Can you help me understand why openssl would give a read key without a write key?

@nhorman any comments on this?

It will eventually give both of course, but there is a difference in how quictls provides keys vs how the openssl api does. in quictls the yield secret callback derives both keys, and provides them in a single callback, so they are always updated synchronously, whereas in openssl we use the state machine call path which derives the keys independenly and provides the in two separate callbacks, so there may be a period of time when the read key is updated but the write key is not, or vice versa

I'm experimenting with that now (and have in the past), as well as with the idea of "cheating", I.e. assuming that keys are symmetric, and so when I get one callback in either direction, installing that as the key for both directions, and skipping subsequent calls for the same protection level.

Referencing the former, its a bit tricky, as it appears the QUIC stack expects various key transitions after various calls to CxPlatTlsProcessData, and by buffering the keys, thats not always honored.

Referencing the latter specifically, it seems to generally work, but locally I'm hitting several different failures here, most notably in TlsTest.HandshakeResumption, which fails in on lines 418 of Test.cpp indicating that The BufferKey attempting to be sent (at level 0) doesn't match the Connection Read Key Level of 0RTT (level 1)

@nibanks I'd ask that @mattcaswell chime in again here, but just as a datapoint, If I back out the change that removed this particular assert, it occurs when the write key level is 0RTT and the read key level is HANDSHAKE.

Adding a bit of printf-ery in, it appears that we get this pattern in key callbacks When Running the Handshake/WithHandshakeArgs4.RandomLossResumeRejection (Which is whats failing in CI):

Installing secret SSL 0x7f639802a890 (Server) for Write, level 2 Installing secret SSL 0x7f639802a890 (Server) for Read, level 2 Installing secret SSL 0x7f639802a890 (Server) for Write, level 3 Installing secret SSL 0x7f63a0006d70 (Client) for Read, level 2 Installing secret SSL 0x7f63a0006d70 (Client) for Write, level 2 Installing secret SSL 0x7f63a0006d70 (Client) for Read, level 3 Installing secret SSL 0x7f63a0006d70 (Client) for Write, level 3 Installing secret SSL 0x7f639802a890 (Server) for Read, level 3 Installing secret SSL 0x7f63a000b280 (Client) for Write, level 1 <=====HERE Installing secret SSL 0x7f639403b130 (Server) for Write, level 2 Installing secret SSL 0x7f639403b130 (Server) for Read, level 2 Installing secret SSL 0x7f639403b130 (Server) for Write, level 3 Installing secret SSL 0x7f63a000b280 (Client) for Read, level 2 <==== AND HERE

It appears that the client can receive the write key for 0RTT, then the read key for HANDSHAKE, which (maybe) makes sense? @mattcaswell?

assuming that keys are symmetric, and so when I get one callback in either direction, installing that as the key for both directions,

This is not a valid assumption.

For all other TLS libraries we've used, here in the completion processing code for QUIC, we never expected to have a Read key that is more recent than a write key. Can you explain the scenario where OpenSSL breaks that assumption, generally in wire-format terms?

We don't make any guarantees about the order of availability of keys. On the client side I think it turns out to be true that you will never get a read key installed after the write key for any particular protection level. On the server side I don't think this holds true. For example we install the app data write keys immediately after we have written the server Finished - but we don't install the app data read keys until we have processed the client Finished.

We don't make any guarantees about the order of availability of keys

But why was this design chosen? To offer maximum flexibility for future algorithms?
We're trying to understand how it's possible to receive TLS 1.3 records such that the handshake or app keys wouldn't be available together.

The design is driven by when the keys are made available for "standard" TLS. It's the TLS implementation that decides to release keys (independent of whether we are doing QUIC, standard TLS or DTLS). In the example I gave above we install app data write keys immediately after we have written the server Finished because it is possible for the server to start writing app data immediately, even though the client has not been authenticated yet (so called 0.5RTT data). At least this is true for standard TLS, I don't know what the QUIC RFC has to say about it. But the app data read keys aren't released until the client Finished has been received (which may occur sometime later) and therefore the client has been authenticated.

nibanks · 2025-04-04T13:34:38Z

I'm thinking that we should move away from submodules for TLS dependencies. I never likely having two for quictls/openssl v1.1 and v3, but having 3 submodules only makes this worse. I'm open to suggestions, but I'm thinking about having CMake dynamically get the submodule and put it in a well-known path (perhaps still submodules/openssl, or perhaps the build folder). Any better ideas?

P.S. Depending on how we do this (likely in a separate PR first), we might be able to isolate most of the changes in this PR (all the name changes)...

t8m · 2025-05-23T14:48:13Z

src/platform/tls_openssl.c

+    //
+    // TODO: Remove this when the above tests are tolerant of addition of ML-KEM keyshares
+    //
+    SSL_set1_groups_list(TlsContext->Ssl, "secp256r1:x25519");


I'd suggest trying "?*X25519:?secp256r1 / ?X448:?secp384r1:?secp521r1 / ?ffdhe2048:?ffdhe3072" as this will set the same configuration as in default settings but without enabling the hybrid key share .

nhorman · 2025-05-23T14:50:41Z

Note I've created:
#5116
#5117

To track these remaining issues

nhorman · 2025-05-23T16:05:53Z

Success!

Of the remaining tests that failed:

Codecov - @nibanks I think you mentioned that this test was somewhat overly agressive, and I think in this case that holds. The patch its complaining about is a number of comments, with one extra conditional error case added, that we don't fall into nominally, so I think this can be ignored. Please tell me if you agree
The stress failure. I think this error is unrelated to this PR, as its a memory leak in QuicSentPacketPoolGetPacketMetadata, and I see this failure in other test runs (notably here https://github.com/microsoft/msquic/actions/runs/15090263875/job/42417843480)

So I think at this point we're generally in good shape.

At this point I think the next steps are:
a) We need to get the openssl changes for this support effort merged (from this openssl PR openssl/openssl#27646). Its generally ready to go, I need to add a test for it, which I'm doing today, and expect that to go in sometime next week

b) Once (a) is done, I need to update this PR to stop using my personal branch with those openssl changes, and update the openssl submodule to pull from the openssl-3.5 branch again

c) I should probably rebase this PR to master, and squash all my intervening changes down to something more readable and reviewable, and run it through CI one more time to ensure that nothing new broke from that rebase effort.

Please tell me if you disagree, but at that point I think we're good to go.

nibanks · 2025-05-23T20:31:19Z

So I think at this point we're generally in good shape.

At this point I think the next steps are: a) We need to get the openssl changes for this support effort merged (from this openssl PR openssl/openssl#27646). Its generally ready to go, I need to add a test for it, which I'm doing today, and expect that to go in sometime next week

b) Once (a) is done, I need to update this PR to stop using my personal branch with those openssl changes, and update the openssl submodule to pull from the openssl-3.5 branch again

c) I should probably rebase this PR to master, and squash all my intervening changes down to something more readable and reviewable, and run it through CI one more time to ensure that nothing new broke from that rebase effort.

Please tell me if you disagree, but at that point I think we're good to go.

I agree! Really awesome job so far! Thanks so much! @anrossi please help review the latest and let's get this merged.

src/platform/tls_openssl.c

nhorman · 2025-05-27T20:49:11Z

forgive the delay here, we've found some issues with openssl/openssl#27646 that I'm addressing. Hope to have them taken care of shortly.

nhorman · 2025-05-30T12:48:52Z

@mattcaswell has an alternate fix for key yielding in openssl that I'd like to try on this PR if we could please

nhorman · 2025-05-30T14:21:31Z

Note, the ubuntu/quictls/android failure:
https://github.com/microsoft/msquic/actions/runs/15347134043/job/43185942723?pr=4959
seems completely unrelated to this PR, likely some change in the master branch to CI that the workflow in this PR isn't handling, or a transient failure, should come out in the wash when I rebase

nhorman · 2025-06-03T18:17:05Z

ok!
openssl/openssl#27732 has been merged to the openssl repo

I've cleaned up this pr, squashing all the various commits down to something logical, and updated the openssl submodule to the latest on the 3.5 branch, so as to pull in openssl/openssl#27732

It passes all the unit tests locally on my linux and windows system here, so I think we just need to re-run CI to make sure I haven't screwed up any corner cases and I think we're good to go.

Openssl, in supporting ML-KEM, can trip over a corner case in asserting what key level we're at in this code.

Just Marking a TODO for later, in that the fuzzer should support TLS records that span multiple datagrams (which is the case when ML-KEM is in use)

BufferKey is only set at the start of the test, But some TLS implementations may update their keys while processing the data passed into this function specifically observed on openssl, Sending a buffer with a ServerHello to a client will yield handshake keys immediately, and following data will cause this to fail so only check once at the start of the test to ensure we are in the right state

nhorman · 2025-06-04T12:52:07Z

one more rebase to correct the recent conflict with the main branch

nhorman · 2025-06-04T16:30:00Z

ok, I think we're ready for review here. We had 3 failures in CI. Two from code coverage due to 2 exercised lines of code, which I think can be ignored, and what appears to be a transient failure on quictls that I think is probably out of scope for this PR. Please review at your convenience @nibanks @anrossi

nibanks · 2025-06-04T16:31:37Z

ok, I think we're ready for review here. We had 3 failures in CI. Two from code coverage due to 2 exercised lines of code, which I think can be ignored, and what appears to be a transient failure on quictls that I think is probably out of scope for this PR. Please review at your convenience @nibanks @anrossi

Awesome! Thanks so much for your tireless work here! Really appreciated!

nhorman · 2025-06-04T16:52:13Z

no sweat, it was a good exercise!

anrossi

Thank you so much for your all your hard work here! This has been a monumental effort, and we really appreciate your work, patience, and persistence on this PR.

nhorman · 2025-06-05T14:53:30Z

Thank you all! You're help on this was greatly appreciated!

nhorman requested a review from a team as a code owner March 31, 2025 16:20

nhorman force-pushed the openssl-quic-api branch from 1e7e38e to 69d5261 Compare March 31, 2025 16:39

github-advanced-security bot found potential problems Mar 31, 2025

View reviewed changes

src/platform/tls_openssl.c Fixed Show fixed Hide fixed

src/platform/tls_openssl.c Fixed Show fixed Hide fixed

src/platform/tls_openssl.c Fixed Show resolved Hide resolved

guhetier reviewed Mar 31, 2025

View reviewed changes

nhorman force-pushed the openssl-quic-api branch from 16b10e1 to eafd672 Compare March 31, 2025 21:04

nhorman force-pushed the openssl-quic-api branch 2 times, most recently from af45354 to 3cbded3 Compare April 1, 2025 20:00

anrossi reviewed Apr 4, 2025

View reviewed changes

src/platform/tls_openssl.c Outdated Show resolved Hide resolved

nibanks added TLS: OpenSSL external labels Apr 4, 2025