Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(kernel): add listeners to services #258

Merged
merged 19 commits into from
Sep 3, 2023
Merged

feat(kernel): add listeners to services #258

merged 19 commits into from
Sep 3, 2023

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Aug 28, 2023

Currently, when a task (or userspace) wishes to connect to a RegisteredDriver,
it performs a registry lookup, which returns a handle for sending requests to
that service. These handles are essentially producers for a single KChannel
which is associated with the service when it's registered, and the service can
use this channel's KConsumer to receive a stream of incoming requests.

This approach works fine, but it limits what a service can do with its incoming
request stream. Because all requests arrive on the same KConsumer, the server
may not spawn a separate task to handle requests from each client. Similarly, it cannot
route requests to different worker tasks based on which resource the request is
associated with.

This branch changes the way clients communicate with servers by introducing a
handshake to this process. Rather than registering a service by giving the
registry a KProducer for a single request channel, the registry now owns a
producer for a channel of incoming connections. These connections contain a
Hello message sent by the client, of a type defined by the RegisteredDriver
implementation for the service. This may be used by the service to determine
what resource the client wants to access, or otherwise route connections to
different tasks. This handshake is completed by the service responding to the
incoming connection with a KProducer for a request channel, which is then
returned to the client.

The returned KProducer may still point to a single global request channel
for that service, allowing us to continue to implement the existing behavior.
However, the service may alternatively spawn a new task for that connection and
create a new channel for that worker. Or, it may return a different KProducer
for a different request channel depending on the Hello message, allowing the
server to have separate worker tasks for different resources managed by the same
service, and route the connection to one of those workers depending on which
resource was requested in the Hello message.

An adapter, into_request_stream, is provided, to make it easier for services
which want the current behavior of one global stream of requests to have that.
Currently, all the existing services use this, but in the future, many of our
existing services can be rewritten to use the new handshake mechanism more
effectively. I'd also like to do some additional refactoring, but I wanted to
get a working PR up first.

@hawkw hawkw changed the title WIP: start on service listeners feat(kernel): add listeners to services Sep 2, 2023
@hawkw hawkw requested a review from jamesmunns September 2, 2023 17:45
@hawkw hawkw marked this pull request as ready for review September 2, 2023 17:45
Copy link
Contributor

@jamesmunns jamesmunns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some notes, happy to merge this and follow up. Thank you!

platforms/beepy/src/i2c_puppet.rs Show resolved Hide resolved
/// data from the client, this type can be set to [`()`].
// XXX(eliza): ideally, we could default `Hello` to () and `ConnectError` to
// `Infallible`...do we want to do that? it requires a nightly feature.
type Hello: 'static;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, we might want to give this a name like "Introduction" or "ConnectDetails" or something, but we can bikeshed later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i stole the name from TLS handshakes, which start with a ClientHello message followed by a ServerHello message. But, if you don't like it, i'm happy to change it later...

source/kernel/src/registry/mod.rs Outdated Show resolved Hide resolved
NotFound,
/// The remote [`RegisteredDriver`] rejected the connection.
Rejected(D::ConnectError),
/// The remote [`RegisteredDriver`] has been registered, but the service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could check this in the registry (awaiting channel closure?), tho we can't remove services anyway yet.

UserRequest<'_>,
&ErasedKProducer,
&bbq::MpscProducer,
ServiceId,
ClientId,
) -> Result<(), UserHandlerError>;

type ErasedHandshake = unsafe fn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, if we want to just yank all of the userspace stuff for now, we've already removed the k2u/u2k rings, and this will all be redesigned to be interface-like later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my motivation for actually writing the userspace code was mostly that i wanted to validate that the new design didn't completely break our type erasure mechanism, and that it was possible to do something like this for a serialization based service. i'm happy to remove this code and add something similar back in later, but i have a slight preference for keeping it so that we can build on top of it when integrating the new serializing channel. fundamentally up to you, though.

@hawkw hawkw merged commit 25443a2 into main Sep 3, 2023
7 checks passed
@hawkw hawkw deleted the eliza/listener branch September 3, 2023 17:26
hawkw added a commit that referenced this pull request Sep 3, 2023
The registry emits a bunch of `tracing` diagnostics, but they're not
*that* helpful in most cases. In particular, when we get a service, we
log its UUID, but this isn't super readable for someone consuming the
logs --- you have to then take that UUID and look it up against the
table of potential UUIDs to figure out which service was being looked
up. Also, some of the errors introduced in #258 look pretty lousy in
`fmt::Debug` output.

This branch fixes both of those issues. I've changing the logs emitted
by the registry to also include the type name of the service, which
makes it easier to tell what's going on without having to
cross-reference UUIDs against the actual source code. Furthermore, I've
improved the readability of some of the error types.


Example log output:
<details>
<summary>Before</summary>

```rust
running 1 test
 INFO registry::tests::user_connect Registry::register{uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635}: kernel::registry: Registered uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0
 INFO registry::tests::user_connect kernel::comms::bbq: Creating new mpsc BBQueue channel capacity=256
 INFO registry::tests::user_connect kernel::comms::bbq: Channel created successfully
 INFO registry::tests::user_connect user_connect{hello=TestMessage(1)}:Registry::connect_userspace_with_hello{user_hello=[1] uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635}: kernel::registry: Got KernelHandle from Registry uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0 client_id=1
 INFO registry::tests::user_connect user_request{req=TestMessage(1)}: kernel::registry::tests: sending user request req=TestMessage(1)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=1
 INFO registry::tests::user_connect user_request{req=TestMessage(1)}: kernel::registry::tests: return=Ok(TestMessage(2))
ERROR registry::tests::user_connect user_connect{hello=TestMessage(2)}: kernel::registry::tests: error=UserConnectError::Connect(ConnectError::<kernel::registry::tests::TestService>::Rejected(TestMessage(666)))
 INFO registry::tests::user_connect user_connect{hello=TestMessage(1)}:Registry::connect_userspace_with_hello{user_hello=[1] uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635}: kernel::registry: Got KernelHandle from Registry uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0 client_id=2
 INFO registry::tests::user_connect user_request{req=TestMessage(2)}: kernel::registry::tests: sending user request req=TestMessage(2)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=2
 INFO registry::tests::user_connect user_request{req=TestMessage(2)}: kernel::registry::tests: return=Ok(TestMessage(3))
 INFO registry::tests::user_connect user_request{req=TestMessage(3)}: kernel::registry::tests: sending user request req=TestMessage(3)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=3
 INFO registry::tests::user_connect user_request{req=TestMessage(3)}: kernel::registry::tests: return=Ok(TestMessage(4))
test registry::tests::user_connect ... ok
```
</details>
<details>

<summary>After</summary>

```rust
running 1 test
 INFO registry::tests::user_connect Registry::register{svc=kernel::registry::tests::TestService}: kernel::registry: Registered svc=kernel::registry::tests::TestService uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0
 INFO registry::tests::user_connect kernel::comms::bbq: Creating new mpsc BBQueue channel capacity=256
 INFO registry::tests::user_connect kernel::comms::bbq: Channel created successfully
 INFO registry::tests::user_connect user_connect{hello=TestMessage(1)}:Registry::connect_userspace_with_hello{user_hello=[1] svc=kernel::registry::tests::TestService}: kernel::registry: Got KernelHandle from Registry svc=kernel::registry::tests::TestService uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0 client_id=1
 INFO registry::tests::user_connect user_request{req=TestMessage(1)}: kernel::registry::tests: sending user request req=TestMessage(1)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=1
 INFO registry::tests::user_connect user_request{req=TestMessage(1)}: kernel::registry::tests: return=Ok(TestMessage(2))
ERROR registry::tests::user_connect user_connect{hello=TestMessage(2)}: kernel::registry::tests: error=Connect(Rejected { error: TestMessage(666), svc: kernel::registry::tests::TestService })
 INFO registry::tests::user_connect user_connect{hello=TestMessage(1)}:Registry::connect_userspace_with_hello{user_hello=[1] svc=kernel::registry::tests::TestService}: kernel::registry: Got KernelHandle from Registry svc=kernel::registry::tests::TestService uuid=05bcd4b7-dd81-434a-a958-f18ee84f8635 service_id=0 client_id=2
 INFO registry::tests::user_connect user_request{req=TestMessage(2)}: kernel::registry::tests: sending user request req=TestMessage(2)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=2
 INFO registry::tests::user_connect user_request{req=TestMessage(2)}: kernel::registry::tests: return=Ok(TestMessage(3))
 INFO registry::tests::user_connect user_request{req=TestMessage(3)}: kernel::registry::tests: sending user request req=TestMessage(3)
 INFO registry::tests::user_connect kernel::registry::tests: received request val=3
 INFO registry::tests::user_connect user_request{req=TestMessage(3)}: kernel::registry::tests: return=Ok(TestMessage(4))
test registry::tests::user_connect ... ok
````
hawkw added a commit that referenced this pull request Sep 3, 2023
Depends on #267

Currently, the `Registry::connect_*` methods return an error immediately
if the requested service is not found in the registry. Most client types
implement a `from_registry` method which retry in a loop when the
requested service is not found in the registry. These methods will wait
for a fixed (short) amount of time and then try again until the registry
returns the service.

This approach is quite inefficient, as we have to run a bunch of retry
loops that keep trying to access a service that may not be there. This
may happen several times before the service actually is registered,
especially when registering a service requires connecting to another
service.

This branch improves the efficiency of waiting for a service to be
registered. Now, rather than retrying with a fixed-duration sleep, we
instead have the `Registry` own a `WaitCell` which is woken whenever a
new service is registered. This wakes all takes potentially waiting to
connect, allowing them to re-check whether the service they want is in
the registry. This idea was initially proposed by @jamesmunns in a
[comment] on PR #259

Connections are now established using either `Registry::connect`, which
retries whenever a new service is registered, or `Registry::try_connect`
which never retries.

Additionally, we now have the capacity to indicate that a service is not
found *and* that the registry is full, by closing the `WaitCell`. In
this case, retrying will never succeed, because the registry is full and
if the service isn't already there, it will never be added. In this
case, the retrying methods will also return an error, rather than never
completing, so we avoid a potential task leak.

In order to make this change, we need to move the `RwLock` from being
around the entire `Registry` to being inside the registry, around
`items`. This allows the `WaitCell` to be accessed regardless. It also
allows us to shorten the duration for which the lock is held. This
requires changing all methods on `Registry` to take `&self`.
Therefore, I've removed the wrapper methods on `Kernel` for connecting
and registering, since they can now just be called on `kernel.registry`
without a bunch of extra boilerplate for lock management. I've also
simplified the API surface of the registry a bit by removing the
`connect` methods that don't take a `Hello`, and just using
`Registry::connect(())` in those cases. IMO, those methods weren't
really pulling their weight, and they required us to have a method named
`Registry::try_connect_userspace_with_hello` if we were going to add a
non-retrying `connect` variant. Now, we can just have
`Registry::try_connect_userspace`, `Registry::connect_userspace`,
`Registry::connect`, and `Registry::try_connect`, which feels much less
egregious.

[comment]: #258 (comment)
hawkw added a commit that referenced this pull request Sep 3, 2023
Depends on #267

Currently, the `Registry::connect_*` methods return an error immediately
if the requested service is not found in the registry. Most client types
implement a `from_registry` method which retry in a loop when the
requested service is not found in the registry. These methods will wait
for a fixed (short) amount of time and then try again until the registry
returns the service.

This approach is quite inefficient, as we have to run a bunch of retry
loops that keep trying to access a service that may not be there. This
may happen several times before the service actually is registered,
especially when registering a service requires connecting to another
service.

This branch improves the efficiency of waiting for a service to be
registered. Now, rather than retrying with a fixed-duration sleep, we
instead have the `Registry` own a `WaitCell` which is woken whenever a
new service is registered. This wakes all takes potentially waiting to
connect, allowing them to re-check whether the service they want is in
the registry. This idea was initially proposed by @jamesmunns in a
[comment] on PR #259

Connections are now established using either `Registry::connect`, which
retries whenever a new service is registered, or `Registry::try_connect`
which never retries.

Additionally, we now have the capacity to indicate that a service is not
found *and* that the registry is full, by closing the `WaitCell`. In
this case, retrying will never succeed, because the registry is full and
if the service isn't already there, it will never be added. In this
case, the retrying methods will also return an error, rather than never
completing, so we avoid a potential task leak.

In order to make this change, we need to move the `RwLock` from being
around the entire `Registry` to being inside the registry, around
`items`. This allows the `WaitCell` to be accessed regardless. It also
allows us to shorten the duration for which the lock is held. This
requires changing all methods on `Registry` to take `&self`.
Therefore, I've removed the wrapper methods on `Kernel` for connecting
and registering, since they can now just be called on `kernel.registry`
without a bunch of extra boilerplate for lock management. I've also
simplified the API surface of the registry a bit by removing the
`connect` methods that don't take a `Hello`, and just using
`Registry::connect(())` in those cases. IMO, those methods weren't
really pulling their weight, and they required us to have a method named
`Registry::try_connect_userspace_with_hello` if we were going to add a
non-retrying `connect` variant. Now, we can just have
`Registry::try_connect_userspace`, `Registry::connect_userspace`,
`Registry::connect`, and `Registry::try_connect`, which feels much less
egregious.

[comment]: #258 (comment)
hawkw added a commit that referenced this pull request Sep 3, 2023
Depends on #267

Currently, the `Registry::connect_*` methods return an error immediately
if the requested service is not found in the registry. Most client types
implement a `from_registry` method which retry in a loop when the
requested service is not found in the registry. These methods will wait
for a fixed (short) amount of time and then try again until the registry
returns the service.

This approach is quite inefficient, as we have to run a bunch of retry
loops that keep trying to access a service that may not be there. This
may happen several times before the service actually is registered,
especially when registering a service requires connecting to another
service.

This branch improves the efficiency of waiting for a service to be
registered. Now, rather than retrying with a fixed-duration sleep, we
instead have the `Registry` own a `WaitCell` which is woken whenever a
new service is registered. This wakes all takes potentially waiting to
connect, allowing them to re-check whether the service they want is in
the registry. This idea was initially proposed by @jamesmunns in a
[comment] on PR #259

Connections are now established using either `Registry::connect`, which
retries whenever a new service is registered, or `Registry::try_connect`
which never retries.

Additionally, we now have the capacity to indicate that a service is not
found *and* that the registry is full, by closing the `WaitCell`. In
this case, retrying will never succeed, because the registry is full and
if the service isn't already there, it will never be added. In this
case, the retrying methods will also return an error, rather than never
completing, so we avoid a potential task leak.

In order to make this change, we need to move the `RwLock` from being
around the entire `Registry` to being inside the registry, around
`items`. This allows the `WaitCell` to be accessed regardless. It also
allows us to shorten the duration for which the lock is held. This
requires changing all methods on `Registry` to take `&self`.
Therefore, I've removed the wrapper methods on `Kernel` for connecting
and registering, since they can now just be called on `kernel.registry`
without a bunch of extra boilerplate for lock management. I've also
simplified the API surface of the registry a bit by removing the
`connect` methods that don't take a `Hello`, and just using
`Registry::connect(())` in those cases. IMO, those methods weren't
really pulling their weight, and they required us to have a method named
`Registry::try_connect_userspace_with_hello` if we were going to add a
non-retrying `connect` variant. Now, we can just have
`Registry::try_connect_userspace`, `Registry::connect_userspace`,
`Registry::connect`, and `Registry::try_connect`, which feels much less
egregious.

[comment]:
#258 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants