Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Connection Pool API #4832

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Conversation

anrossi
Copy link
Contributor

@anrossi anrossi commented Feb 20, 2025

Description

Provide an API for pooling connections across multiple RSS cores

Testing

Manual testing

Documentation

Documentation will be updated with the new API

Comment on lines +1898 to +1901
if (RssConfig == NULL) {
Status = QUIC_STATUS_INVALID_PARAMETER;
goto Error;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an internal function, we can probably assert this instead of checking it. SAL indicates this is a required output too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's part of the CxPlat interface, so if we make that library separate, it should validate this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever actually do that, then there will be plenty to clean up. For now, simpler is better IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to leave this here. Why would we want less validation? It's on the control path, so this isn't performance sensitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not more or less validation, it is hard failure vs soft failure :)
This check means the caller might do something wrong (not provide the output buffer) and it will be harder for us to realize because we migth recover. A crash would point at the error directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a DBG assert instead of a check for all internal functions, independent of if it might be exposed later.

Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 32.54717% with 143 lines in your changes missing coverage. Please review.

Project coverage is 86.28%. Comparing base (da4cbd2) to head (acd61c6).

Files with missing lines Patch % Lines
src/core/connection_pool.c 30.39% 142 Missing ⚠️
src/core/library.h 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4832      +/-   ##
==========================================
- Coverage   87.08%   86.28%   -0.80%     
==========================================
  Files          56       57       +1     
  Lines       17685    17894     +209     
==========================================
+ Hits        15401    15440      +39     
- Misses       2284     2454     +170     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anrossi anrossi force-pushed the anrossi/connection-pool-api branch from 54be6ca to 47753f0 Compare February 25, 2025 09:07
@anrossi anrossi force-pushed the anrossi/connection-pool-api branch from d22ce49 to acd61c6 Compare March 5, 2025 09:14
@anrossi anrossi marked this pull request as ready for review March 5, 2025 09:14
@anrossi anrossi requested a review from a team as a code owner March 5, 2025 09:14
| Flag | Effect |
|------|--------|
| QUIC_CONNECTION_POOL_FLAG_NONE | Nothing |
| QUIC_CONNECTION_POOL_FLAG_CLOSE_CONNECTIONS_ON_FAILURE | Tells the API to clean up any created *and started* connections in the pool if an error occurrs. **Note:** The application must be able to handle having connections suddenly closed with this flag. Without this flag, the application is expected to clean up remaining connections. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and started" -> does it means the flag causes all created connections to be closed on failure, or only started connections?

I assume the former, that is the reasonable thing to do - but that sentence can be confusing. Maybe the "started" part should be in the "Note": "The connection might have been started when it get close because of this flag, ...."

void
)
{
const uint16_t CurrentProc = (uint16_t)CxPlatProcCurrentNumber();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cast needed? The function accepts a uint32_t

@@ -73,7 +74,8 @@ QuicConnAlloc(
// partition can be updated accordingly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated.
(the "just use the current processor" part is more nuanced now).

Ugh, github is annoying, it doesn't let me comment on a line out of its diff :(

QuicTestConnectionPoolCreate(
Params->ConnPoolCreateParams.Family,
Params->ConnPoolCreateParams.NumberOfConnections,
Params->ConnPoolCreateParams.XdpSupported == TRUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the == TRUE? If this is already a "nice" boolean, there is no need for it. If it isn't yet a "nice" boolean, this could give the wrong result (and !! or != FALSE would be better).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a pattern (that seems to have been abandoned) where boolean values from user mode were converted for use in kernel mode with != FALSE (which is what I should have done).
These are "nice" booleans, so I should be able to use them directly. Let's try that

DEFINE_ENUM_FLAG_OPERATORS(QUIC_CONNECTION_POOL_FLAGS);

typedef struct QUIC_CONNECTION_POOL_CONFIG {
_In_ HQUIC Registration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_In_ SAL annotation don't mean anything on a struct, and are not used in MsQuic currently (or at least in this file). I don't think we should introduce that pattern.
_Field_* annotations should be used if there is something to specify here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll convert the ones that can be to _Field_* and remove the rest.

Comment on lines +1898 to +1901
if (RssConfig == NULL) {
Status = QUIC_STATUS_INVALID_PARAMETER;
goto Error;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not more or less validation, it is hard failure vs soft failure :)
This check means the caller might do something wrong (not provide the output buffer) and it will be harder for us to realize because we migth recover. A crash would point at the error directly.

QUIC_STATUS Status = QUIC_STATUS_SUCCESS;
uint32_t CreatedConnections = 0;

if (Config == NULL || ConnectionPool == NULL || Config->Registration == NULL ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: That if test is very hard to read.
It could easily be split in several tests and potentially in a "validateConfig" function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll break it up to make it easier to read, but this is an area where I disagree with making a helper function.

Status);
goto Error;
} else if (RssConfig->RssSecretKeyLength < CXPLAT_TOEPLITZ_KEY_SIZE_MIN) {
Status = QUIC_STATUS_INVALID_STATE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this one also be INTERNAL_ERROR?
Is there a scenario that can lead here, or is this check to be cautious but essentially an assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be an ASSERT because I don't think RSS can be configured with a smaller key size (on windows at least. No idea about Linux).
But I picked INVALID_STATE because the RSS configuration would be in an invalid state. Potentially difficult for the caller to understand that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...
INVALID_STATE is generally used when the caller did something at the wrong time or in the wrong order.

Do you know if the user could do something that affect the rss key length? If yes, INVALID_STATE can make sense.
But if it means the OS is in a wrong config (or a config ew don't expect), I think I'd keep INTERNAL_ERROR (or some other more precise error code)


for (; RetryCount < MaxCreationRetries; RetryCount++) {

uint32_t NewPort = QuicAddrGetPort(&LocalAddress) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, everything inside the body of this loop should be extracted in a "TryCreateConnection" function, or something similar.
And everything in the parent loop should also be extracted in a "CreateConnection" function or some similar name.

Both are action that are pretty independent from all the pre-processing you need to do first, it will help a lot being able to abstract them.

Comment on lines +261 to +263
$RssSeedPath = (Join-Path $SetupPath tcprssseed.exe)
if (!(Test-Path $RssSeedPath)) { Write-Error "Missing file: $RssSeedPath" }
Invoke-Expression "$RssSeedPath set aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55aa55"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put a comment to say why we need this?

HANDLE InterfaceHandle = NULL;
HRESULT Result = XdpInterfaceOpen(InterfaceIndex, &InterfaceHandle);
if (FAILED(Result)) {
QuicTraceLogError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should be using the following format for error traces (not logs) so they are collected in the low volume scenario:

        QuicTraceEvent(
            LibraryErrorStatus,
            "[ lib] ERROR, %u, %s.",
            WsaError,
            "WSAStartup");

Comment on lines +1389 to +1390
#define IOCTL_QUIC_RUN_VALIDATE_CONNECTION_POOL_CREATE \
QUIC_CTL_CODE(132, METHOD_BUFFERED, FILE_WRITE_DATA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define IOCTL_QUIC_RUN_VALIDATE_CONNECTION_POOL_CREATE \
QUIC_CTL_CODE(132, METHOD_BUFFERED, FILE_WRITE_DATA)
#define IOCTL_QUIC_RUN_VALIDATE_CONNECTION_POOL_CREATE \
QUIC_CTL_CODE(132, METHOD_BUFFERED, FILE_WRITE_DATA)

if (This->StartedConnectionCount > This->MaxConnections) {
return false;
}
This->Connections[This->StartedConnectionCount++] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if/when this is called in parallel on two threads/cpus?

_In_ size_t ServerNameLength
)
{
char* ServerNameCopy = CXPLAT_ALLOC_NONPAGED(ServerNameLength + 1, QUIC_POOL_SERVERNAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: offsets in this function are all off.

Definitions for the MsQuic Connection Pool API, which allows clients to
create a pool of connections that are spread across RSS cores.

--*/
Copy link
Member

@nibanks nibanks Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code coverage is still claiming a lot of the lines in this file are uncovered...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is because code coverage doesn't run with xdp. Probably should change that...

(Config->CibirIds != NULL && Config->CibirIdLength == 0) ||
(Config->CibirIds == NULL && Config->CibirIdLength != 0)) {
Status = QUIC_STATUS_INVALID_PARAMETER;
QuicTraceLogError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, QuicTraceLog* functions are effectively verbose level 'printf' style traces. If you want to be able to efficiently capture an error trace, you will need to use QuicTraceEvent. Perhaps this is fine for invalid parameter calls (I don't think we even have these at all for api.c), but please keep in mind for all your error logging.

return Status;
}

QuicTraceEvent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in general, for other functions in api.c we are logging api enter/exit even for INVALID_PARAMETER cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I can't dereference an untrusted pointer without validating it isn't NULL. So either I log entry after the validation, or I log entry without any parameter, which seems to be the requirement for this scenario.


Status = CxPlatDataPathRssConfigGet(InterfaceIndex, &RssConfig);
if (QUIC_FAILED(Status)) {
QuicTraceLogError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, you don't need to log an error if the function itself already logs an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants