
Support Windows #155

Closed · carllerche opened this issue Apr 20, 2015 · 27 comments
Labels: windows (Related to the Windows OS.)

Comments

@carllerche (Member)

Overview

Currently, Mio only supports Linux and Darwin platforms (though *BSD support could happen relatively easily). It uses epoll and kqueue respectively to provide a readiness API to consumers. Windows offers a completion-based API (completion ports) which is significantly different from epoll & kqueue. The goal would be to tweak Mio in order to support Windows while still maintaining the low overhead that Mio strives for across all platforms.

History

I have wavered a bunch on the topic of how to best support Windows. At first, I had planned to do whatever was needed to support Windows, even if the implementation was less than ideal. Then I moved towards not supporting Windows in Mio and instead providing a standalone IO library that supported Windows only. But when I started investigating the IOCP APIs in more depth and thinking about what a Windows library would look like, it turned out to be very similar to what Mio already is.

Completion Ports

There are a number of details related to using completion ports, but what matters is that instead of being notified when an operation (read, write, accept, ...) is ready to be performed and then performing the operation, an operation is submitted and then completion is signaled by reading from a queue.

For example, when reading, a byte buffer is provided to the operating system. The operating system then takes ownership of the buffer until the operation completes. When the operation completes, the application is notified by reading off of the completion status queue.
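To make that flow concrete, here is a minimal, self-contained Rust sketch of the submit/complete cycle described above. It models the queue with std types only; CompletionPort, submit_read, get_completion, and the instant fake completion are all hypothetical stand-ins for illustration, not real Windows bindings.

```rust
use std::collections::VecDeque;

/// One completed operation, handed back through the queue.
struct Completion {
    token: usize, // identifies which socket/operation completed
    buf: Vec<u8>, // the buffer the OS owned while the read was in flight
    bytes: usize, // number of bytes actually transferred
}

/// Toy stand-in for a completion port.
struct CompletionPort {
    queue: VecDeque<Completion>,
}

impl CompletionPort {
    fn new() -> Self {
        CompletionPort { queue: VecDeque::new() }
    }

    /// Submit a read: the port takes ownership of `buf` until it completes.
    /// A real IOCP read returns immediately and the OS fills the buffer
    /// asynchronously; here we fake an instant 3-byte completion.
    fn submit_read(&mut self, token: usize, mut buf: Vec<u8>) {
        buf[..3].copy_from_slice(b"abc");
        self.queue.push_back(Completion { token, buf, bytes: 3 });
    }

    /// Drain one completion (the GetQueuedCompletionStatus step).
    fn get_completion(&mut self) -> Option<Completion> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut port = CompletionPort::new();
    // Ownership of the 4KB buffer moves to the port for the read's duration.
    port.submit_read(0, vec![0u8; 4096]);
    if let Some(c) = port.get_completion() {
        println!("token {}: read {} bytes: {:?}", c.token, c.bytes, &c.buf[..c.bytes]);
    }
}
```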

Strategy

The strategy would be to, on Windows, internally manage a pool of buffers. When a socket is registered with the event loop with readable interest, a system read would be initiated, supplying an available buffer. When the read completes, the internal buffer is now full. The event loop would then notify readiness, and the user would be able to read from the socket. That read would copy data from the internal buffer to the user's buffer, completing the operation.

On write, the user's data would be copied to an internal buffer immediately, and then the internal buffer would be submitted to the OS for the system write call.
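As a rough illustration of the read path under this strategy, here is a small self-contained Rust sketch of the per-socket staging buffer; StagingBuf and its methods are invented names, not mio API, and the write path would simply be the mirror image (copy in, then submit).

```rust
use std::cmp;

/// Hypothetical per-socket staging buffer: completed system reads land in
/// `data`, and a user-level read just copies bytes out of it.
struct StagingBuf {
    data: Vec<u8>, // bytes delivered by a completed system read
    pos: usize,    // how much the user has consumed so far
}

impl StagingBuf {
    /// Called when the OS hands back a filled buffer: the socket is now
    /// "readable" from the event loop's point of view.
    fn on_read_complete(&mut self, filled: Vec<u8>) {
        self.data = filled;
        self.pos = 0;
    }

    /// User-facing read: copy from the staging buffer into the user's buffer.
    fn read(&mut self, out: &mut [u8]) -> usize {
        let n = cmp::min(out.len(), self.data.len() - self.pos);
        out[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        n
    }
}

fn main() {
    let mut staging = StagingBuf { data: Vec::new(), pos: 0 };
    staging.on_read_complete(b"hello".to_vec()); // a system read completed
    let mut user_buf = [0u8; 3];
    let n = staging.read(&mut user_buf);
    assert_eq!(&user_buf[..n], b"hel"); // 3 bytes copied out, 2 remain staged
}
```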

Mio API changes

In order to implement the above strategy, Mio would not be able to rely on IO types from std::net anymore. As such, I propose to bring back TcpStream and TcpListener, implemented in mio::net. Since Mio will then own all IO types, there will be no more need for the NonBlock wrapper. Also, it seems that NonBlock can be confusing (see #154). So, all IO types in mio will always be non-blocking.

I believe that this will be the only required API change.

@retep998

So effectively Mio would read in the background whenever the buffers for a handle are empty? Would it also read when the buffers are only partially empty? Would there be a cutoff level for when it reads more?

@carllerche (Member Author)

@retep998 I think there will be some experimentation around that and possibly some tunable parameters. I was thinking of defaulting to 4KB per socket. If oneshot is requested, then only one read will be queued. Otherwise, it will attempt to keep the buffer full. There should probably be a heuristic for when to start a new read if the buffer was only partially consumed. For example, if the 4KB buffer is full and the user reads 1 byte, it probably doesn't make sense to attempt another system read of 1 byte. Maybe a good default would be 1KB? So, once the internal buffer has 1KB available, Mio attempts to fill it.
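Sketched as code, such a heuristic might look like the following; the constants and the function are illustrative guesses at the tunables being discussed, not anything mio actually exposes.

```rust
/// Illustrative defaults from the discussion above (both would be tunable).
const BUF_SIZE: usize = 4096;         // per-socket staging buffer size
const REFILL_THRESHOLD: usize = 1024; // minimum free space worth a new read

/// Decide whether to queue another system read. Avoids issuing, say, a
/// 1-byte read just because the user consumed a single byte of a full buffer.
fn should_issue_read(bytes_staged: usize, oneshot: bool, read_in_flight: bool) -> bool {
    if read_in_flight {
        return false; // one outstanding read per socket in this model
    }
    if oneshot && bytes_staged > 0 {
        return false; // oneshot: a single queued read, no continuous refill
    }
    BUF_SIZE - bytes_staged >= REFILL_THRESHOLD
}

fn main() {
    assert!(should_issue_read(0, false, false));     // empty buffer: read
    assert!(!should_issue_read(4095, false, false)); // only 1 byte free: wait
    assert!(!should_issue_read(0, false, true));     // already reading: wait
}
```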

carllerche added a commit that referenced this issue May 8, 2015
Also gets rid of NonBlock. This is in preparation for windows support as
described in #155.

tl;dr: Windows support requires mio to have ownership of the tcp & udp types
so the std versions cannot be used.
@jnicholls

The idea of internal buffer management is a good one for sticking with the current Stream API. But the data copies would make Windows performance second-class unnecessarily. In truth, IOCP and RIO are actually superior models to epoll/kqueue that reduce CPU usage and syscalls. It would be a shame to have to incur an extra copy of the data.

I think it would be wise to keep the (Try)Read interface on top of the Windows streams, but also offer callers direct access to the internal buffer, with the option to take ownership of it if desired. When they take ownership, Mio will allocate a replacement. I'd like to think that is a far more efficient model. It does diverge the API surface, but that's where we are.

The only way to keep the API surface consistent while not sacrificing performance is to emulate the IOCP model on top of epoll/kqueue, which is easy to do. The reverse is not. E.g. the Handler would send buffers of data received or return buffers of data to write, rather than notifications of readiness.

I personally am not concerned about Windows support, but many others are. I think this will be an important decision for mio. If we go with the copy model, I would expect there to be a fork of mio or a second project spun up to compete.

@jnicholls

It may be worth doing some more research into Windows RIO (Registered IO), with an eye to explicitly targeting that instead of overlapped IO. RIO supports different notification mechanisms (polled, evented, and IOCP) and thus it might fit more nicely into the current Mio readiness model. If that is the case, I honestly don't see a problem with only supporting Windows 8+. It is the future, and this is a green-field lib :)

@retep998

Perhaps a feature flag to distinguish between an older, more compatible but less efficient IOCP model and the fancier RIO model?

@jnicholls

Yeah definitely, if RIO ends up making sense to support. I'll dig in and get back with some details.

@dpc (Contributor)

dpc commented Jul 19, 2015

It seems to me that having Windows support (even performance-impaired) is a good thing, so that any software written on top of mio (even software that is not really performance-critical) is portable. So the plan looks OK for the things I'm interested in.

Given the fundamental differences between asynchronous I/O handling models, it might be a job for higher-level libraries and software to switch between different mio-like libraries for different platforms. Unless of course someone figures out a really neat unified model.

@jnicholls

Yeah, that's fair. I do believe it will inspire another Rust I/O library that has the unified model. If the Mio maintainers are okay with that, then this plan is the easiest path forward.

There is a unified model; the question is whether Mio wants to break its current model, or not.

@dpc (Contributor)

dpc commented Jul 20, 2015

Is there really a unified model? It seems to me that it's not really possible; e.g. RIO requires registering memory beforehand, which does not seem to work well with Unix-like IO.

@jnicholls

RIO wouldn't be a part of a unified model. RIO is an optimization over traditional overlapped IO where the buffers are registered ahead of time, like you said. This reduces syscalls & allocs, and thus CPU.

A unified model wouldn't need RIO. IOCP is fine, and it's already a superior model to epoll/kqueue as it stands (zero-copy, fewer syscalls, etc.). Given that, a unified model absolutely exists: the IOCP model. Emulating IOCP on top of epoll/kqueue is absolutely doable; the reverse is not. Said plainly, the model is asynchronous I/O: the caller invokes an IO action like read/write, we take ownership of the buffer and perform the action, and we notify the caller when the action has completed. This is exactly what libuv does to abstract async IO over IOCP, epoll, and kqueue.
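For illustration, here is a hypothetical Rust sketch of that completion-style shape (the libuv model, not mio's actual API); the trait and function names are invented, and the "completion" is faked synchronously.

```rust
/// Hypothetical completion-style handler: the caller hands over a buffer and
/// is called back when the operation finishes (the IOCP/libuv shape).
trait CompletionHandler {
    /// Invoked when a read finishes; the buffer comes back with `n` bytes.
    fn on_read(&mut self, buf: Vec<u8>, n: usize);
}

struct PrintHandler;

impl CompletionHandler for PrintHandler {
    fn on_read(&mut self, buf: Vec<u8>, n: usize) {
        println!("read {} bytes: {:?}", n, &buf[..n]);
    }
}

/// On Windows this maps 1:1 onto IOCP. On epoll/kqueue it would be emulated:
/// wait for readiness, perform the read into `buf`, then fire the handler.
fn start_read(handler: &mut dyn CompletionHandler, mut buf: Vec<u8>) {
    buf[..2].copy_from_slice(b"ok"); // fake an immediate completion
    handler.on_read(buf, 2);
}

fn main() {
    start_read(&mut PrintHandler, vec![0u8; 16]);
}
```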

Doing the above would be a big change for Mio. It would require state management of buffers, fds, requests, and callbacks (unless a single trait-based handler is sufficient for the notifications). Mio is currently a minimal abstraction with zero allocations and no state management. Windows support will change that one way or another.

@carllerche (Member Author)

I don't know anything about RIO.

Mio's goal is to be a zero-cost abstraction over epoll, with as lightweight a normalization of epoll's semantics to other platforms as possible. So, Mio is going to need internal buffer management for the Windows implementation. That being said, I believe that the Windows implementation should end up being very close to (if not on par with) any "higher level" abstraction like libuv.

On top of that, it is entirely possible to expose non-portable, Windows-specific APIs in a Windows-specific module in mio that would be a zero-cost abstraction on top of IOCP (or something close).

The problem, though, is that I am not a skilled Windows dev and I am figuring out IOCP as I go...

@jnicholls

> On top of that, it is entirely possible to expose non-portable, Windows-specific APIs in a Windows-specific module in mio that would be a zero-cost abstraction on top of IOCP (or something close).

The current plan would then be sufficient, if the user is allowed to take ownership of the internal buffer that Mio used to perform the overlapped IO request. It would let the caller choose whether to take the optimization (take ownership of the buffer used) or leave it and maintain compatible code (read/copy from the Mio-owned buffer into the caller's buffer). Though I don't write much Windows code nowadays and thus have little dog in this fight, I'd say that's a good compromise. There are still a bunch of Windows-specific things that will have to go into that API anyway, e.g. buffer sizes and thresholds. That's why the libuv model is so simple: the user initiates the action and provides the buffers for said action, which is 1:1 with IOCP and easily emulated on top of the Unix non-blocking APIs.
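A hypothetical sketch of that compromise in Rust: the portable path copies out of the staging buffer, while the opt-in path takes ownership of the buffer outright and lets the event loop allocate a replacement. The Stream type and both methods are invented for illustration.

```rust
/// Hypothetical stream with a staging buffer that was filled by a completed
/// overlapped read (names invented for illustration, not mio API).
struct Stream {
    staged: Vec<u8>,
}

impl Stream {
    /// Portable path: copy out of the internal buffer (the one-copy cost).
    fn read(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.staged.len());
        out[..n].copy_from_slice(&self.staged[..n]);
        self.staged.drain(..n);
        n
    }

    /// Optimization path: take ownership of the staged buffer, zero copies.
    /// The event loop would allocate a replacement for the next system read.
    fn take_buf(&mut self) -> Vec<u8> {
        std::mem::replace(&mut self.staged, Vec::with_capacity(4096))
    }
}

fn main() {
    let mut s = Stream { staged: b"payload".to_vec() };
    let mut out = [0u8; 3];
    let n = s.read(&mut out);
    assert_eq!(&out[..n], b"pay");       // copied path: 3 bytes out
    let owned = s.take_buf();            // zero-copy path: caller owns the rest
    assert_eq!(owned, b"load".to_vec());
}
```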

@carllerche (Member Author)

I think I would also consider a portable API that allows you to take ownership of the buffer as well, but that will come in the future, after initial Windows support.

I specifically did not pick the libuv strategy for the reason you described. It's closer to the Windows model and requires overhead to emulate on *nix systems. Mio is going to favor epoll & *nix platforms primarily for the cross-platform APIs.

@verysimplenick

@carllerche We really need Windows support; even an ugly implementation would be better than nothing.

@Diggsey

Diggsey commented Aug 17, 2015

I don't believe Windows support is possible with the mio model: not all reads have the same behaviour, so mio can't predict what read the user will perform until it is actually performed.

For example, if you wait for data on a TCP socket and then read 1 byte, everything's fine on Linux. On Windows, that would only work if mio had predicted that you only wanted 1 byte; otherwise it could have started a much larger read, which would never complete (as only 1 byte is ever sent).

Even worse, reads are not the only possible operation: the user may instead decide to close the socket, or do some completely different operation. Predicting a read in that situation would result in visibly different behaviour.

I think a better approach is to leave mio as-is, and implement a higher level library on top, which presents the IOCP model, using mio on linux and native IOCP (/RIO if available) on windows.

To be honest, I think @carllerche is somewhat overstating the performance overhead of implementing IOCP on top of a readiness model. While it's true that you have to allocate buffers up-front, several factors narrow the margin to practically nothing: a decently fast allocator (such as jemalloc), the ability to pool buffers if necessary, the fact that the OS need not actually commit physical memory to the buffer until data is written into it, and the fact that received data has to be stored somewhere anyway; in the readiness model it just happens to be in buffers owned by the kernel instead of the application.

That's not to say I don't think mio has a place - just that it may not be ideal for using directly from application code (apart from anything else, the IOCP model is much easier for people to get to grips with).

@jnicholls

I agree.

@carllerche (Member Author)

@Diggsey

I think you are probably overstating the performance overhead of implementing the readiness model on top of IOCP 😉 My take is that 90% of the time, the overhead of a completion / readiness model does not matter, as the largest overhead will come from the code using Mio.

However, I also care about that last 10%. The largest group of users who care about that last 10% ship on Linux rather than Windows, so it makes sense to optimize for them.

Regarding your specific example with a 1-byte read, I'm not sure I follow what you see as the problem.

Basically, on Linux, the kernel manages a set of staging buffers to hold data arriving from the socket before the user reads. On Windows, the event loop will manage this set of staging buffers. I have been doing a lot of experiments; the overhead will be minimal, maybe slightly larger than implementing completion on Linux.

Finally, after initial Windows support, the goal will be to add further Windows-specific APIs to try to reduce cost at the expense of a little bit of non-portable code.

So, in short: the cost of implementing readiness on Windows will be, at worst, slightly larger than what it would cost to implement completion on Linux, while the win of providing a readiness model on Linux, the most common server platform, is significant.

@jnicholls

> Basically, on Linux, the kernel manages a set of staging buffers to hold data arriving from the socket before the user reads. On Windows, the event loop will manage this set of staging buffers.

I think the issue is starvation. If you send a buffer to read 64KiB, IOCP will not notify that the operation is complete until 64KiB has been read in. During that time, the application will not be able to read smaller chunks that it may want as soon as possible. Unless there is a means to time out the I/O completion sooner and get partial data?

@retep998

@jnicholls What are you talking about? When you do a read using something like WSARecv with IOCP, it will fire a completion notification as soon as any data is read at all. It will not wait until the full 64 KiB has been read.

@carllerche (Member Author)

@jnicholls That has not been my experience during my IOCP experiments. I would even say that such behavior would make writing network applications impossible. There are many cases in network protocols where the number of bytes to read is unknown.

@jnicholls

Just testing you guys... just testing...

@Diggsey

Diggsey commented Aug 18, 2015

My mistake, I didn't realise that WSARecv did not require you to specify the number of bytes to read up-front.

However, my second point still stands: multiple operations are possible on a socket, and starting an operation, then waiting for the result may cause visibly different behaviour from waiting for a notification, and then being able to decide whether or not to even perform the operation. Pre-emptively initiating every possible operation seems like it will quickly get out of hand.

> I think you are probably overstating the performance overhead of implementing the readiness model on top of IOCP

I'll believe it when I see it... Maybe you'll be able to support non-blocking reads with not unreasonable overhead, but I'm not convinced about things like accept.

Why are you trying to emulate readiness via IOCP anyway? There are several methods which look like they provide essentially the same readiness-style API as exists on Linux: select, WSAEventSelect, WSAEnumNetworkEvents?

@retep998

@Diggsey Because those techniques are not designed to scale up to tens of thousands of sockets the way IOCP is.

@Diggsey

Diggsey commented Aug 18, 2015

@retep998 Oh? This technique seems very scalable:

  • Create a hidden window for its message queue (this is not uncommon on windows)
  • For each created socket, call WSAAsyncSelect with the hidden window

You can then read and react to messages posted to the window. lparam/wparam tell you the socket and the type of event.

@alexcrichton (Contributor)

cc #239, an initial stab at Windows TCP/UDP, but with much room to expand!

alexcrichton added a commit to alexcrichton/mio that referenced this issue Aug 25, 2015
These commits add preliminary support for the TCP/UDP API of mio, built on top
of IOCP using some raw Rust bindings plus some networking extensions as the
foundational support. This support is definitely still experimental as there are
likely to be a number of bugs and kinks to work out.

I haven't yet done much benchmarking as there are still a number of places I
would like to improve the implementation in terms of performance. I've also been
focusing on getting "hello world" and the in-tree tests working ASAP to start
getting some broader usage and feedback. High level docs are available in the
src/sys/windows/mod.rs file and the TCP/UDP implementations are quite similar in
terms of how they're implemented.

Not many new tests were added, but all tests (other than those using unix
sockets) are passing on Windows and an appveyor.yml file was also added to
enable AppVeyor CI support to ensure this doesn't regress.

cc tokio-rs#155
@carllerche carllerche added the windows Related to the Windows OS. label Aug 25, 2015
@alexcrichton (Contributor)

cc #246, #245, #244, #243, #242, and #241

@carllerche (Member Author)

Closing this as the bulk of the initial work has landed. Further work will happen as individual issues / PRs.
