C# wrapper for `io_uring`. This library fulfills the same purpose as the native liburing, by which it is heavily inspired. The primary goal of this library is to bring `io_uring` to all systems that support it, including those without liburing pre-installed.
If `ulimit -l` returns something along the lines of 64K, adjustments should be made. It's simplest (although not smartest) to set the memlock limit to unlimited in limits.conf (e.g. Ubuntu), to set `DefaultLimitMEMLOCK=infinity` in the systemd config (e.g. Clear Linux*), or to do the equivalent for your distro.
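On distros that read limits.conf, the adjustment could look like the following sketch (raising the limit only for the user actually running the app would be the more conservative choice):

```
# /etc/security/limits.conf
*    soft    memlock    unlimited
*    hard    memlock    unlimited
```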
Experimental, managed ASP.NET Core Transport layer based on `io_uring`. This library is inspired by kestrel-linux-transport, a similar Linux-specific transport layer based on `epoll`.
This transport layer supports both server (`IConnectionListenerFactory`) and client (`IConnectionFactory`) scenarios. It can be registered with `services.AddIoUringTransport();` in the `ConfigureServices` method.
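A minimal registration could look like the following sketch; the `Startup` class and the response text are placeholders, only `AddIoUringTransport` is taken from the description above:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers the io_uring-based transport for Kestrel.
        services.AddIoUringTransport();
    }

    public void Configure(IApplicationBuilder app)
    {
        app.Run(ctx => ctx.Response.WriteAsync("Hello from io_uring"));
    }
}
```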
A configurable number of `TransportThread`s are started. Each thread opens an accept-socket on the server endpoint (IP and port) using the `SO_REUSEPORT` option. This allows all threads to `accept` inbound connections and lets the kernel load balance between the accept-sockets. The threads are also able to `connect` outbound connections.
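A rough illustration of how each thread could bind its own accept-socket with `SO_REUSEPORT` is sketched below (the constants are the Linux values for `SOL_SOCKET`/`SO_REUSEPORT`; the actual transport uses its own socket plumbing, so this is an assumption for illustration only):

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class ReusePortSocket
{
    // Linux constants: SOL_SOCKET = 1, SO_REUSEPORT = 15.
    private const int SolSocket = 1;
    private const int SoReusePort = 15;

    public static Socket CreateAcceptSocket(IPEndPoint endPoint)
    {
        var socket = new Socket(endPoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);

        // Allow several sockets (one per transport thread) to bind the same IP/port;
        // the kernel then load-balances incoming connections between them.
        Span<byte> on = stackalloc byte[sizeof(int)];
        BitConverter.TryWriteBytes(on, 1);
        socket.SetRawSocketOption(SolSocket, SoReusePort, on);

        socket.Bind(endPoint);
        socket.Listen(128);
        return socket;
    }
}
```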
All threads are provided with the writing end of the same `Channel` to write accepted connections to. This `Channel` is read from when `AcceptAsync` is invoked on the `IConnectionListener`. The `Channel` is unbounded; back-pressure to temporarily disable `accept`-ing new connections is not yet supported.
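The shape of that hand-off might look roughly like this (a sketch with assumed type and member names; only the unbounded `Channel` and `IConnectionListener.AcceptAsync` come from the description above):

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;

sealed class AcceptQueue
{
    // Unbounded: transport threads never block when handing over accepted connections.
    private readonly Channel<ConnectionContext> _channel =
        Channel.CreateUnbounded<ConnectionContext>(new UnboundedChannelOptions
        {
            SingleReader = true,  // only the IConnectionListener reads
            SingleWriter = false  // every transport thread writes
        });

    // Called by a transport thread once a connection has been accepted.
    public void Enqueue(ConnectionContext connection)
        => _channel.Writer.TryWrite(connection);

    // Backs IConnectionListener.AcceptAsync.
    public ValueTask<ConnectionContext> AcceptAsync(CancellationToken token = default)
        => _channel.Reader.ReadAsync(token);
}
```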
The `IConnectionFactory` delegates the handling of new outbound connections to a `TransportThread` in a round-robin fashion.
Each thread creates an `io_uring` to schedule IO operations and to get notified of their completion.
Each thread also creates an `eventfd` in semaphore mode (`EFD_SEMAPHORE`) with an initial value of 0 and places a `readv` operation (`IORING_OP_READV`) from that `eventfd` onto the `io_uring`. This allows us - as we shall see and use later - to unblock the thread with a normal `write` to the `eventfd` if the thread is blocked by an `io_uring_enter` syscall waiting for an IO operation to complete. The same could be achieved by sending a no-op (`IORING_OP_NOP`) through the `io_uring`, but that would require synchronizing access to the ring, as multiple threads would then be writing to it. This trick allows the event loop to be mostly lock-free (the only exceptions being data structures such as `Channel` and `ConcurrentQueue`).
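The unblocking trick can be illustrated with a small P/Invoke sketch (the `eventfd` and `write` signatures below mirror the Linux syscalls; the real transport consumes the `eventfd` via `IORING_OP_READV` rather than a blocking `read`, and the helper names are assumptions):

```csharp
using System;
using System.Runtime.InteropServices;

static class EventFdWakeup
{
    private const int EFD_SEMAPHORE = 1; // each read decrements the counter by 1

    [DllImport("libc", SetLastError = true)]
    private static extern int eventfd(uint initval, int flags);

    [DllImport("libc", SetLastError = true)]
    private static extern unsafe nint write(int fd, void* buf, nuint count);

    // Created once per transport thread with an initial value of 0.
    public static int Create() => eventfd(0, EFD_SEMAPHORE);

    // Called from any thread to wake the transport thread: completes the pending
    // readv from the eventfd, which in turn completes the blocking io_uring_enter.
    public static unsafe void Wake(int eventFd)
    {
        ulong one = 1;
        write(eventFd, &one, sizeof(ulong));
    }
}
```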
Before the event loop is started, we place the above-mentioned `readv` from the `eventfd`, as well as a `poll` (`IORING_OP_POLL_ADD`) for acceptable connections (`POLLIN`) on the accept-socket, onto the `io_uring`.
The event loop is then made up of the following actions (a simplified sketch of the whole loop follows this list):
- Check the accept-socket-queue. This `ConcurrentQueue` contains newly bound sockets for server endpoints. For each connection in this queue, a `poll` (`IORING_OP_POLL_ADD`) for "acceptability" (`POLLIN`) is added to the `io_uring`.
- Check the client-socket-queue. This `ConcurrentQueue` contains sockets to client endpoints for which a connect is in progress. For each connection in this queue, a `poll` (`IORING_OP_POLL_ADD`) for "writability" (`POLLOUT`) is added to the `io_uring`. "Writability" will indicate the completion of a `connect`.
- Check the read-poll-queue. This `ConcurrentQueue` contains connections that could be read from again, after a `FlushAsync` to the application completed asynchronously, indicating that there was a need for back-pressure. The synchronous case is handled with a fast-path below. For each connection in this queue, a `poll` (`IORING_OP_POLL_ADD`) for incoming bytes (`POLLIN`) is added to the `io_uring`.
- Check the write-poll-queue. This `ConcurrentQueue` contains connections that should be written to, after a `ReadAsync` from the application completed asynchronously. The synchronous case is handled with a fast-path below. For each connection in this queue, a `poll` (`IORING_OP_POLL_ADD`) for "writability" (`POLLOUT`) is added to the `io_uring`.
- Submit all previously prepared operations to the kernel and block until at least one operation has completed. (This involves one syscall to `io_uring_enter`.)
- Handle all completed operations. Typically, each (successfully) completed operation causes another operation to be prepared for submission in the next iteration of the event loop. Recognized types of completed operations are:
  - **eventfd poll completion**: The `poll` for the `eventfd` completed. This indicates that a `ReadAsync` from or a `FlushAsync` to the application completed asynchronously and that the corresponding connection was added to one of the above-mentioned queues. The immediate action taken is to prepare another `poll` (`IORING_OP_POLL_ADD`) for the `eventfd`, as the connection-specific `poll`s are added when handling the queues at the beginning of the next event loop iteration. This ensures that the transport thread can again be unblocked if the next `io_uring_enter` blocks.
  - **accept poll completion**: The `poll` on an accept-socket completed. This indicates that one or more connections could be `accept`ed. One connection is accepted by invoking the syscall `accept`. In a future release, this could be done via the `io_uring` (`IORING_OP_ACCEPT`) to avoid the syscall, but this feature will only be available in kernel version 5.5, which is unreleased at the time of writing. The accepted connection is added to the above-mentioned channel and two operations are triggered: a `poll` (`IORING_OP_POLL_ADD`) for incoming bytes (`POLLIN`) is added to the `io_uring`, and a `ReadAsync` from the application is started to get bytes to be sent. If the latter completes synchronously, a `poll` (`IORING_OP_POLL_ADD`) for "writability" (`POLLOUT`) is added to the `io_uring` directly. In the asynchronous case, a callback is scheduled that will register the connection with the write-poll-queue and unblock the transport thread if necessary by writing to the `eventfd`.
  - **read poll completion**: The `poll` for available data (`POLLIN`) on a socket completed. A `readv` (`IORING_OP_READV`) is added to the `io_uring` to read the data from the socket.
  - **write poll completion**: One of two things could have happened:
    - The `poll` for "writability" (`POLLOUT`) of an outbound socket completed. A `poll` (`IORING_OP_POLL_ADD`) for incoming bytes (`POLLIN`) is added to the `io_uring`. Additionally, a `ReadAsync` from the application is started, as for the write-queue items above.
    - The `poll` for "writability" (`POLLOUT`) of an inbound socket completed. A `writev` (`IORING_OP_WRITEV`) for the data previously acquired during a `ReadAsync` is added to the `io_uring`.
  - **read completion**: The `readv` previously added for the affected socket completed. The `Pipeline` is advanced past the number of bytes read and handed over to the application using `FlushAsync`. If `FlushAsync` completes synchronously, a `poll` (`IORING_OP_POLL_ADD`) for incoming bytes (`POLLIN`) is added to the `io_uring` directly. In the asynchronous case, a callback is scheduled that will register the connection with the read-poll-queue and unblock the transport thread if necessary by writing to the `eventfd`.
  - **write completion**: The `writev` previously added for the affected socket completed. The `Pipeline` is advanced past the number of bytes written and more data from the application is read using `ReadAsync`. If `ReadAsync` completes synchronously, a `poll` for "writability" (`POLLOUT`) is added to the `io_uring` directly. In the asynchronous case, a callback is scheduled that will register the connection with the write-poll-queue and unblock the transport thread if necessary by writing to the `eventfd`.
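Put together, a transport thread's loop roughly follows the shape below. This is a heavily simplified sketch with assumed member names that only mirrors the steps described above; it is not the actual implementation:

```csharp
using System.Collections.Concurrent;
using System.Net.Sockets;

// Skeleton of a transport thread's event loop; all member names are assumptions.
abstract class TransportThreadSketch
{
    protected readonly ConcurrentQueue<Socket> AcceptSocketQueue = new();
    protected readonly ConcurrentQueue<Socket> ClientSocketQueue = new();
    protected readonly ConcurrentQueue<Socket> ReadPollQueue = new();
    protected readonly ConcurrentQueue<Socket> WritePollQueue = new();

    protected abstract void PollForAccept(Socket socket);    // IORING_OP_POLL_ADD, POLLIN
    protected abstract void PollForWritable(Socket socket);  // IORING_OP_POLL_ADD, POLLOUT
    protected abstract void PollForRead(Socket socket);      // IORING_OP_POLL_ADD, POLLIN
    protected abstract void SubmitAndWait();                 // io_uring_enter, wait for >= 1 completion
    protected abstract bool TryHandleNextCompletion();       // dispatch one completion

    protected volatile bool Stopping;

    public void Loop()
    {
        while (!Stopping)
        {
            // 1. Turn queued work into poll submissions for the next io_uring_enter.
            while (AcceptSocketQueue.TryDequeue(out var acceptSocket)) PollForAccept(acceptSocket);
            while (ClientSocketQueue.TryDequeue(out var clientSocket)) PollForWritable(clientSocket);
            while (ReadPollQueue.TryDequeue(out var readable)) PollForRead(readable);
            while (WritePollQueue.TryDequeue(out var writable)) PollForWritable(writable);

            // 2. Submit everything prepared so far and block until at least one
            //    operation completes (a single io_uring_enter syscall).
            SubmitAndWait();

            // 3. Handle all completions; each typically prepares a follow-up
            //    operation for the next iteration.
            while (TryHandleNextCompletion()) { }
        }
    }
}
```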
Once an IO operation handed over to `io_uring` completes, the application needs to restore some contextual information regarding the operation that completed. This includes:
- The type of operation that completed (listed in bold above).
- The socket (and associated data) the operation was performed on.

`io_uring` allows 64 bits of user data to be provided with each submission, which are routed through to the completion of the request. The lower 32 bits of this value are set to the socket file descriptor the operation is performed on, and the high 32 bits are set to an operation indicator. This ensures context can be restored after the completion of an asynchronous operation.
The socket file descriptor is used as an index into a `Dictionary` to fetch the data associated with the socket.
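Packing and unpacking the user data could look like the following sketch. The `OperationType` names and values are assumptions for illustration; only the fd-in-the-low-32-bits / operation-in-the-high-32-bits split comes from the description above:

```csharp
// Hypothetical operation indicators; the concrete values are an assumption.
enum OperationType : uint
{
    EventFdPoll = 1,
    AcceptPoll  = 2,
    ReadPoll    = 3,
    WritePoll   = 4,
    Read        = 5,
    Write       = 6
}

static class AsyncOperation
{
    // Low 32 bits: socket file descriptor; high 32 bits: operation indicator.
    public static ulong Pack(int socketFd, OperationType operation)
        => ((ulong)(uint)operation << 32) | (uint)socketFd;

    public static (int SocketFd, OperationType Operation) Unpack(ulong userData)
        => ((int)(uint)userData, (OperationType)(uint)(userData >> 32));
}
```

On completion, the unpacked file descriptor is used to look up the per-socket state in the `Dictionary` mentioned above, while the operation type selects the handler.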
Remaining open issues and ideas for future work:
- Error handling in general. This is currently a very minimal PoC.
- Polishing in general. Again, this is currently a very minimal PoC.
- Testing with more than a simple demo app...
- Benchmark and optimize
- Enable CPU affinity
- Investigate whether the use of zero-copy options is profitable (vis-à-vis registered buffers)
- Use multi-`iovec` `readv`s if more than `_memoryPool.MaxBufferSize` bytes are readable, and ensure that the syscall to `ioctl(_fd, FIONREAD, &readableBytes)` is avoided in the typical case where one `iovec` is enough.
- Create the largest possible (and reasonable) `io_uring`s. The max number of `entries` differs between kernel versions; perform auto-sensing.
- Implement `accept`-ing new connections using `io_uring`, once supported on non-rc kernel versions (v5.5).
- Implement `connect`-ing to new connections using `io_uring`, once supported on non-rc kernel versions (v5.5).
- Profit from `IORING_FEAT_NODROP`, or implement safety measures to ensure no more than `io_uring_params->cq_entries` operations are in flight at any given moment in time.
- Profit from `IORING_FEAT_SUBMIT_STABLE`. Currently the `iovec`s are allocated and fixed per connection to ensure they don't "move" during the execution of an operation.
- Profit from `io_uring_register` and `IORING_REGISTER_BUFFERS` to speed up IO.
Add the following MyGet feed to your `nuget.config`:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="myget-tkp1n" value="https://www.myget.org/F/tkp1n/api/v3/index.json" />
  </packageSources>
</configuration>
```