
Support Windows #155

Closed · carllerche opened this issue Apr 20, 2015 · 27 comments
Labels: windows (Related to the Windows OS.)

Comments

@carllerche (Member)

Overview

Currently, Mio only supports Linux and Darwin platforms (though *BSD support could happen relatively easily). It uses epoll and kqueue respectively to provide a readiness API to consumers. Windows offers a completion-based API (completion ports) which is significantly different from epoll & kqueue. The goal would be to tweak Mio in order to support Windows while still maintaining the low overhead that Mio strives for across all platforms.

History

I have wavered a bunch on the topic of how to best support Windows. At first, I had planned to do whatever was needed to support Windows, even if the implementation was less than ideal. Then I moved towards not supporting Windows in Mio and instead providing a standalone IO library that supported Windows only. But when I started investigating the IOCP APIs in more depth and thinking about what a Windows library would look like, it turned out to be very similar to what Mio already is.

Completion Ports

There are a number of details related to using completion ports, but what matters is that instead of being notified when an operation (read, write, accept, ...) is ready to be performed and then performing the operation, an operation is submitted and then completion is signaled by reading from a queue.

For example, when reading, a byte buffer is provided to the operating system. The operating system then takes ownership of the buffer until the operation completes. When the operation completes, the application is notified by reading off of the completion status queue.
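To make that flow concrete, here is a minimal, self-contained Rust sketch of the submit/complete cycle described above. It models the queue with std types only; CompletionPort, submit_read, get_completion, and the instant fake completion are all hypothetical stand-ins for illustration, not real Windows bindings.

```rust
use std::collections::VecDeque;

/// One completed operation, handed back through the queue.
struct Completion {
    token: usize, // identifies which socket/operation completed
    buf: Vec<u8>, // the buffer the OS owned while the read was in flight
    bytes: usize, // number of bytes actually transferred
}

/// Toy stand-in for a completion port.
struct CompletionPort {
    queue: VecDeque<Completion>,
}

impl CompletionPort {
    fn new() -> Self {
        CompletionPort { queue: VecDeque::new() }
    }

    /// Submit a read: the port takes ownership of `buf` until it completes.
    /// A real IOCP read returns immediately and the OS fills the buffer
    /// asynchronously; here we fake an instant 3-byte completion.
    fn submit_read(&mut self, token: usize, mut buf: Vec<u8>) {
        buf[..3].copy_from_slice(b"abc");
        self.queue.push_back(Completion { token, buf, bytes: 3 });
    }

    /// Drain one completion (the GetQueuedCompletionStatus step).
    fn get_completion(&mut self) -> Option<Completion> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut port = CompletionPort::new();
    // Ownership of the 4KB buffer moves to the port for the read's duration.
    port.submit_read(0, vec![0u8; 4096]);
    if let Some(c) = port.get_completion() {
        println!("token {}: read {} bytes: {:?}", c.token, c.bytes, &c.buf[..c.bytes]);
    }
}
```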

Strategy

The strategy would be to, on Windows, internally manage a pool of buffers. When a socket is registered with the event loop with readable interest, a system read would be initiated, supplying an available buffer. When the read completes, the internal buffer is now full. The event loop would then notify readiness, and the user would be able to read from the socket. That read would copy data from the internal buffer to the user's buffer, completing the operation.

On write, the user's data would be copied to an internal buffer immediately, and then the internal buffer would be submitted to the OS for the system write call.
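As a rough illustration of the read path under this strategy, here is a small self-contained Rust sketch of the per-socket staging buffer; StagingBuf and its methods are invented names, not mio API, and the write path would simply be the mirror image (copy in, then submit).

```rust
use std::cmp;

/// Hypothetical per-socket staging buffer: completed system reads land in
/// `data`, and a user-level read just copies bytes out of it.
struct StagingBuf {
    data: Vec<u8>, // bytes delivered by a completed system read
    pos: usize,    // how much the user has consumed so far
}

impl StagingBuf {
    /// Called when the OS hands back a filled buffer: the socket is now
    /// "readable" from the event loop's point of view.
    fn on_read_complete(&mut self, filled: Vec<u8>) {
        self.data = filled;
        self.pos = 0;
    }

    /// User-facing read: copy from the staging buffer into the user's buffer.
    fn read(&mut self, out: &mut [u8]) -> usize {
        let n = cmp::min(out.len(), self.data.len() - self.pos);
        out[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        n
    }
}

fn main() {
    let mut staging = StagingBuf { data: Vec::new(), pos: 0 };
    staging.on_read_complete(b"hello".to_vec()); // a system read completed
    let mut user_buf = [0u8; 3];
    let n = staging.read(&mut user_buf);
    assert_eq!(&user_buf[..n], b"hel"); // 3 bytes copied out, 2 remain staged
}
```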

Mio API changes

In order to implement the above strategy, Mio would not be able to rely on IO types from std::net anymore. As such, I propose to bring back TcpStream and TcpListener, implemented in mio::net. Since Mio will then own all IO types, there will be no more need for the NonBlock wrapper. Also, it seems that NonBlock can be confusing (see #154). So, all IO types in mio will always be non-blocking.

I believe that this will be the only required API change.

@retep998

So effectively Mio would read in the background whenever the buffers for a handle are empty? Would it also read when the buffers are only partially empty? Would there be a cutoff level for when it reads more?

@carllerche (Member Author)

@retep998 I think there will be some experimentation around that and possibly some tunable parameters. I was thinking of defaulting to 4KB per socket. If oneshot is requested, then only one read will be queued. Otherwise, it will attempt to keep the buffer full. There should probably be a heuristic for when to start a new read if the buffer was only partially consumed. For example, if the 4KB buffer is full and the user reads 1 byte, it probably doesn't make sense to attempt another system read of 1 byte. Maybe a good default would be 1KB? So, once the internal buffer has 1KB available, Mio attempts to fill it.
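Sketched as code, such a heuristic might look like the following; the constants and the function are illustrative guesses at the tunables being discussed, not anything mio actually exposes.

```rust
/// Illustrative defaults from the discussion above (both would be tunable).
const BUF_SIZE: usize = 4096;         // per-socket staging buffer size
const REFILL_THRESHOLD: usize = 1024; // minimum free space worth a new read

/// Decide whether to queue another system read. Avoids issuing, say, a
/// 1-byte read just because the user consumed a single byte of a full buffer.
fn should_issue_read(bytes_staged: usize, oneshot: bool, read_in_flight: bool) -> bool {
    if read_in_flight {
        return false; // one outstanding read per socket in this model
    }
    if oneshot && bytes_staged > 0 {
        return false; // oneshot: a single queued read, no continuous refill
    }
    BUF_SIZE - bytes_staged >= REFILL_THRESHOLD
}

fn main() {
    assert!(should_issue_read(0, false, false));     // empty buffer: read
    assert!(!should_issue_read(4095, false, false)); // only 1 byte free: wait
    assert!(!should_issue_read(0, false, true));     // already reading: wait
}
```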

carllerche added a commit that referenced this issue May 8, 2015
Also gets rid of NonBlock. This is in preparation for windows support as
described in #155.

tl;dr: Windows support requires mio to have ownership of the tcp & udp types
so the std versions cannot be used.
@jnicholls

The idea of internal buffer management is a good one for sticking with the current Stream API. But the data copies would make Windows performance second-class unnecessarily. In truth, IOCP and RIO are actually superior models to epoll/kqueue that reduce CPU usage and syscalls. It would be a shame to have to incur an extra copy of the data.

I think it would be wise to keep the (Try)Read interface on top of the Windows streams, but also offer callers direct access to the internal buffer, with the option to take ownership of it if desired. When they take ownership, Mio will allocate a replacement. I'd like to think that is a far more efficient model. It does diverge the API surface, but that's where we are.

The only way to keep the API surface consistent while not sacrificing performance is to emulate the IOCP model on top of epoll/kqueue, which is easy to do. The reverse is not. E.g. the Handler would send buffers of data received or return buffers of data to write, rather than notifications of readiness.

I personally am not concerned about Windows support, but many others are. I think this will be an important decision for mio. If we go with the copy model, I would expect there to be a fork of mio or a second project spun up to compete.

@jnicholls

It may be worth doing some more research into Windows RIO (Registered IO), with an eye to explicitly targeting that instead of overlapped IO. RIO supports different notification mechanisms (polled, evented, and IOCP) and thus it might fit more nicely into the current Mio readiness model. If that is the case, I honestly don't see a problem with only supporting Windows 8+. It is the future, and this is a green-field lib :)

@retep998

Perhaps a feature flag to distinguish between an older, more compatible but less efficient IOCP model and the fancier RIO model?

@jnicholls

Yeah definitely, if RIO ends up making sense to support. I'll dig in and get back with some details.

@dpc (Contributor)

dpc commented Jul 19, 2015

It seems to me that having Windows support (even performance-impaired) is a good thing, so that any software written on top of mio (even software that is not really performance-critical) is portable. So the plan looks OK for the things I'm interested in.

Given the fundamental differences between asynchronous I/O handling models, it might be a job for higher-level libraries and software to switch between different mio-like libraries for different platforms. Unless of course someone figures out a really neat unified model.

@jnicholls

Yeah, that's fair. I do believe it will inspire another Rust I/O library that has the unified model. If the Mio maintainers are okay with that, then this plan is the easiest path forward.

There is a unified model; the question is whether Mio wants to break its current model, or not.

@dpc (Contributor)

dpc commented Jul 20, 2015

Is there really a unified model? It seems to me that it's not really possible; e.g. RIO requires registering memory beforehand, which does not seem to work well with Unix-like IO.

@jnicholls

RIO wouldn't be a part of a unified model. RIO is an optimization over traditional overlapped IO where the buffers are registered ahead of time, like you said. This reduces syscalls & allocs, and thus CPU.

A unified model wouldn't need RIO. IOCP is fine, and it's already a superior model to epoll/kqueue as it stands (zero-copy, fewer syscalls, etc.). Given that, a unified model absolutely exists: the IOCP model. Emulating IOCP on top of epoll/kqueue is absolutely doable; the reverse is not. Said plainly, the model is asynchronous I/O: the caller invokes an IO action like read/write, we take ownership of the buffer and perform the action, and we notify the caller when the action has completed. This is exactly what libuv does to abstract async IO over IOCP, epoll, and kqueue.
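For illustration, here is a hypothetical Rust sketch of that completion-style shape (the libuv model, not mio's actual API); the trait and function names are invented, and the "completion" is faked synchronously.

```rust
/// Hypothetical completion-style handler: the caller hands over a buffer and
/// is called back when the operation finishes (the IOCP/libuv shape).
trait CompletionHandler {
    /// Invoked when a read finishes; the buffer comes back with `n` bytes.
    fn on_read(&mut self, buf: Vec<u8>, n: usize);
}

struct PrintHandler;

impl CompletionHandler for PrintHandler {
    fn on_read(&mut self, buf: Vec<u8>, n: usize) {
        println!("read {} bytes: {:?}", n, &buf[..n]);
    }
}

/// On Windows this maps 1:1 onto IOCP. On epoll/kqueue it would be emulated:
/// wait for readiness, perform the read into `buf`, then fire the handler.
fn start_read(handler: &mut dyn CompletionHandler, mut buf: Vec<u8>) {
    buf[..2].copy_from_slice(b"ok"); // fake an immediate completion
    handler.on_read(buf, 2);
}

fn main() {
    start_read(&mut PrintHandler, vec![0u8; 16]);
}
```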

Doing the above would be a big change for Mio. It would require state management of buffers, fds, requests, and callbacks (unless a single trait-based handler is sufficient for the notifications). Mio is currently a minimal abstraction with zero allocations and no state management. Windows support will change that one way or another.

@carllerche (Member Author)

I don't know anything about RIO.

Mio's goal is to be a zero-cost abstraction over epoll, with as lightweight a normalization of epoll's semantics to other platforms as possible. So, Mio is going to need internal buffer management for the Windows implementation. That being said, I believe that the Windows implementation should end up being very close to (if not on par with) any "higher level" abstraction like libuv.

On top of that, it is entirely possible to expose non-portable, Windows-specific APIs in a Windows-specific module in mio that would be a zero-cost abstraction on top of IOCP (or something close).

The problem, though, is that I am not a skilled Windows dev and I am figuring out IOCP as I go...

@jnicholls

> On top of that, it is entirely possible to expose non-portable, Windows-specific APIs in a Windows-specific module in mio that would be a zero-cost abstraction on top of IOCP (or something close).

The current plan would then be sufficient, if the user is allowed to take ownership of the internal buffer that Mio used to perform the overlapped IO request. It would let the caller choose whether to take the optimization (take ownership of the buffer used) or leave it and maintain compatible code (read/copy from the Mio-owned buffer into the caller's buffer). Though I don't write much Windows code nowadays and thus have little dog in this fight, I'd say that's a good compromise. There are still a bunch of Windows-specific things that will have to go into that API anyway, e.g. buffer sizes and thresholds. That's why the libuv model is so simple: the user initiates the action and provides the buffers for said action, which is 1:1 with IOCP and easily emulated on top of the Unix non-blocking APIs.
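A hypothetical sketch of that compromise in Rust: the portable path copies out of the staging buffer, while the opt-in path takes ownership of the buffer outright and lets the event loop allocate a replacement. The Stream type and both methods are invented for illustration.

```rust
/// Hypothetical stream with a staging buffer that was filled by a completed
/// overlapped read (names invented for illustration, not mio API).
struct Stream {
    staged: Vec<u8>,
}

impl Stream {
    /// Portable path: copy out of the internal buffer (the one-copy cost).
    fn read(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.staged.len());
        out[..n].copy_from_slice(&self.staged[..n]);
        self.staged.drain(..n);
        n
    }

    /// Optimization path: take ownership of the staged buffer, zero copies.
    /// The event loop would allocate a replacement for the next system read.
    fn take_buf(&mut self) -> Vec<u8> {
        std::mem::replace(&mut self.staged, Vec::with_capacity(4096))
    }
}

fn main() {
    let mut s = Stream { staged: b"payload".to_vec() };
    let mut out = [0u8; 3];
    let n = s.read(&mut out);
    assert_eq!(&out[..n], b"pay");       // copied path: 3 bytes out
    let owned = s.take_buf();            // zero-copy path: caller owns the rest
    assert_eq!(owned, b"load".to_vec());
}
```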

@carllerche (Member Author)

I think I would also consider a portable API that allows you to take ownership of the buffer as well, but that will come in the future, after initial Windows support.

I specifically did not pick the libuv strategy for the reason you described. It's closer to the Windows model and requires overhead to emulate on *nix systems. Mio is going to favor epoll & *nix platforms primarily for the cross-platform APIs.

@verysimplenick

@carllerche We really need Windows support; even an ugly implementation would be better than nothing.

@Diggsey

Diggsey commented Aug 17, 2015

I don't believe Windows support is possible with the mio model: not all reads have the same behaviour, so mio can't predict what read the user will perform until it is actually performed.

For example, if you wait for data on a TCP socket and then read 1 byte, everything's fine on Linux. On Windows, that would only work if mio had predicted that you only wanted 1 byte; otherwise it could have started a much larger read, which would never complete (as only 1 byte is ever sent).

Even worse, reads are not the only possible operation: the user may instead decide to close the socket, or do some completely different operation. Predicting a read in that situation would result in visibly different behaviour.

I think a better approach is to leave mio as-is, and implement a higher level library on top, which presents the IOCP model, using mio on linux and native IOCP (/RIO if available) on windows.

To be honest, I think @carllerche is somewhat overstating the performance overhead of implementing IOCP on top of a readiness model. While it's true that you have to allocate buffers up-front, several factors narrow the margin to practically nothing: a decently fast allocator (such as jemalloc), the ability to pool buffers if necessary, the fact that the OS need not actually commit physical memory to the buffer until data is written into it, and the fact that received data has to be stored somewhere anyway; in the readiness model it just happens to be in buffers owned by the kernel instead of the application.

That's not to say I don't think mio has a place - just that it may not be ideal for using directly from application code (apart from anything else, the IOCP model is much easier for people to get to grips with).

@jnicholls

I agree.

@carllerche (Member Author)

@Diggsey

I think you are probably overstating the performance overhead of implementing the readiness model on top of IOCP 😉 My take is that 90% of the time, the overhead of a completion / readiness model does not matter, as the largest overhead will come from the code using Mio.

However, I also care about that last 10%. The largest group of users who care about that last 10% ship on Linux rather than Windows, so it makes sense to optimize for them.

Regarding your specific example with a 1-byte read, I'm not sure I follow what you see as the problem.

Basically, on Linux, the kernel manages a set of staging buffers to hold data arriving from the socket before the user reads. On Windows, the event loop will manage this set of staging buffers. I have been doing a lot of experiments; the overhead will be minimal, maybe slightly larger than implementing completion on Linux.

Finally, after initial Windows support, the goal will be to add further Windows-specific APIs to try to reduce cost at the expense of a little bit of non-portable code.

So, in short: the cost of implementing readiness on Windows will be, at worst, slightly larger than what it would cost to implement completion on Linux, while the win of providing a readiness model on Linux, the most common server platform, is significant.

@jnicholls

> Basically, on Linux, the kernel manages a set of staging buffers to hold data arriving from the socket before the user reads. On Windows, the event loop will manage this set of staging buffers.

I think the issue is starvation. If you send a buffer to read 64KiB, IOCP will not notify that the operation is complete until 64KiB has been read in. During that time, the application will not be able to read smaller chunks that it may want as soon as possible. Unless there is a means to time out the I/O completion sooner and get partial data?

@retep998

@jnicholls What are you talking about? When you do a read using something like WSARecv with IOCP, it will fire a completion notification as soon as any data is read at all. It will not wait until the full 64 KiB has been read.

@carllerche (Member Author)

@jnicholls That has not been my experience during my IOCP experiments. I would even say that such behavior would make writing network applications impossible. There are many cases in network protocols where the number of bytes to read is unknown.

@jnicholls

Just testing you guys... just testing...

@Diggsey

Diggsey commented Aug 18, 2015

My mistake, I didn't realise that WSARecv did not require you to specify the number of bytes to read up-front.

However, my second point still stands: multiple operations are possible on a socket, and starting an operation, then waiting for the result may cause visibly different behaviour from waiting for a notification, and then being able to decide whether or not to even perform the operation. Pre-emptively initiating every possible operation seems like it will quickly get out of hand.

> I think you are probably overstating the performance overhead of implementing the readiness model on top of IOCP

I'll believe it when I see it... Maybe you'll be able to support non-blocking reads with not unreasonable overhead, but I'm not convinced about things like accept.

Why are you trying to emulate readiness via IOCP anyway? There are several methods which look like they provide essentially the same readiness-style API as exists on Linux: select, WSAEventSelect, WSAEnumNetworkEvents?

@retep998

@Diggsey Because those techniques are not designed to scale up to tens of thousands of sockets the way IOCP is.

@Diggsey

Diggsey commented Aug 18, 2015

@retep998 Oh? This technique seems very scalable:

  • Create a hidden window for its message queue (this is not uncommon on windows)
  • For each created socket, call WSAAsyncSelect with the hidden window

You can then read and react to messages posted to the window. lparam/wparam tell you the socket and the type of event.

@alexcrichton (Contributor)

cc #239, an initial stab at Windows TCP/UDP, but with much room to expand!

alexcrichton added a commit to alexcrichton/mio that referenced this issue Aug 25, 2015
These commits add preliminary support for the TCP/UDP API of mio, built on top
of IOCP using some raw Rust bindings plus some networking extensions as the
foundational support. This support is definitely still experimental as there are
likely to be a number of bugs and kinks to work out.

I haven't yet done much benchmarking as there are still a number of places I
would like to improve the implementation in terms of performance. I've also been
focusing on getting "hello world" and the in-tree tests working ASAP to start
getting some broader usage and feedback. High level docs are available in the
src/sys/windows/mod.rs file and the TCP/UDP implementations are quite similar in
terms of how they're implemented.

Not many new tests were added, but all tests (other than those using unix
sockets) are passing on Windows and an appveyor.yml file was also added to
enable AppVeyor CI support to ensure this doesn't regress.

cc tokio-rs#155
@carllerche carllerche added the windows Related to the Windows OS. label Aug 25, 2015
@alexcrichton (Contributor)

cc #246, #245, #244, #243, #242, and #241

@carllerche (Member Author)

Closing this as the bulk of the initial work has landed. Further work will happen as individual issues / PRs.
