Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tokio_core::reactor::current() #84

Closed
wants to merge 1 commit into from

Conversation

rushmorem
Copy link

Fixes #79.

@alexcrichton
Copy link
Contributor

Thanks for the PR! Correct me if I'm wrong though, @carllerche, but were you thinking of this being a global function rather than a function of Remote?

@carllerche
Copy link
Member

Yeah, i was thinking a global fn: tokio_core::reactor::current() or something like that.

This would be to avoid having to thread the handle through code

@rushmorem rushmorem force-pushed the handle branch 2 times, most recently from 5642f53 to 6d86b0d Compare October 26, 2016 22:26
@rushmorem rushmorem changed the title Add current_reactor() method on Remote Add tokio_core::reactor::current() Oct 26, 2016
@rushmorem
Copy link
Author

OK then. I have made it a global function. Note however, that the return type is Option<Handle> rather than Option<&Handle> as proposed by @carllerche. I couldn't figure out how to return a borrowed reference from within a closure. I will be happy to change this if you show me how to do it.

rushmorem added a commit to rushmorem/tokio-core that referenced this pull request Oct 31, 2016
Implements `std::os::unix::io::FromRawFd` for `net::TcpStreamNew`.
This should make it easier to utilise futures on existing connections.

This pull request depends on
tokio-rs#84 so it should be merged
first.
rushmorem added a commit to rushmorem/tokio-core that referenced this pull request Oct 31, 2016
Implements `std::os::unix::io::FromRawFd` for `net::TcpStreamNew`.
This should make it easier to utilise futures on existing connections.

This pull request builds on top of tokio-rs#84
so if both are desirable you can just merge this one and close
tokio-rs#84.
@rushmorem rushmorem changed the title Add tokio_core::reactor::current() WIP: Add tokio_core::reactor::current() Oct 31, 2016
@rushmorem
Copy link
Author

I'm obviously missing something here. This method never seems to return the handle. I expect the following to work:-

let mut l = Core::new().unwrap();
let handle = reactor::current().unwrap();

Seems like this method is always returning None.

@alexcrichton
Copy link
Contributor

Yes this is one of my worries about this method, it only works if you're currently in the reactor itself.

I'm also very worried about enabling functionality like #87 as it seems like it's very difficult to handle the None case and it's rarely expected, causing confusion when it actually arises.

I'd personally prefer to avoid adding this method for now and see how far we can get passing handles around. In general we've avoided in the standard library and Rust in general these sorts of global state functions unless they're infallible, passing around contexts and such, and it's worked out so far. This may be a niche though where it breaks down perhaps?

@rushmorem
Copy link
Author

I agree. Having it as a global function is probably too fragile. I think a nice compromise though would be to implement it on a Remote like in the initial implementation. That way we can avoid the trouble of having to run in a closure and use a oneshot or channel to communicate with the outside world. This is what @carllerche and I essentially want to do. See #75 (comment). In @carllerche's example in that comment, we are only calling Remote::spawn because we want access to a handle.

Having a Remote::handle that returns a Handle will simplify things a lot and is consistent with the rest of the library. I mean, we have Handle::remote and Core::remote methods that convert to a Remote that we can pass through threads. Having a Remote::handle method that can give us a handle we can work with in the current thread would be rather nice, don't you think?

While I think the current implementation in #87 is broken because of it's reliance on this pull request which is also currently broken, I think it's still important and relatively safe to enable that functionality. See #87 (comment). Also note that both std::net::TcpStream and mio::tcp::TcpStream have it.

@rushmorem
Copy link
Author

Alternatively, if possible I think it will even be nicer if we can use Handle and Remote interchangeably. I mean they are both handles right? How about making them implement the same trait and require that trait instead of the concrete Handle everywhere. What do you think about this?

@alexcrichton
Copy link
Contributor

Unfortunately we can't make Handle and Remote interchangeable because of their difference in Send vs not. I do like the idea, though, of Remote::handle (perhaps try_handle) to optimistically attempt to upgrade to a handle if it's possible. That is, it's a performance optimization, but if it returns None you have the recourse of calling spawn to run on the event loop.

@aajtodd
Copy link

aajtodd commented Nov 3, 2016

Generally there is one event loop per thread (if more than one at all). I believe Wangle uses thread local storage and a singleton approach.

Or rather folly does to wrap libevent which Wangle is built from:
https://github.com/facebook/folly/blob/master/folly/io/async/EventBaseManager.h#L54
https://github.com/facebook/folly/blob/master/folly/io/async/EventBaseManager.cpp#L23

Just throwing out ideas. Not going to argue about the merits or downfalls of singletons :)

It would be nice to have access to the current thread's reactor but obviously plumbing it through everywhere works too just less ergonomic.

@alexcrichton
Copy link
Contributor

@aajtodd it's true yeah, and tokio-core definitely has the architecture that you'd want one event loop per thread if you wanted multiple threads doing I/O work. We don't currently have a "fetch the current thread's event loop" function as global state is generally frowned on in Rust, but we could certainly add one! (and probably will at some point)

@rushmorem want to switch this for now to an optional Remote -> Handle upgrade?

@tailhook
Copy link
Contributor

Well, I'd like to support this pull request. I've written a lot of code that passes handle today, and I have to admit it looks useless :)

Recently python has fixed similar issue. And I think we should fix it too, for two reasons:

  1. It makes API ugly, you need to pass handle to Timeout::new(dur, handle), which is an example of ugly API and also if you have timeouts anywhere you need to accept handle too, just to propagate it to a Timeout. Also these kind of hacks are especially ugly.

  2. They occupy memory. A couple of copies of handle in each future can add up when you have a million of connections.

So basically I have two options:

  1. Publish this function as a separate crate, just to see how useful it might be
  2. Benchmark a real-world application and show how much memory is used by handles, to back up my point (2) above.

While they are not mutually exclusive, more code I'm writing more tedious it becomes to write. And also it's not easy to actually keep two codebases (with and without handle) just for the benchmarks.

So what are the downsides of exposing current loop in a thread local? And will things above be helpful in decision?


If talking about the API: I think everybody will use the function this way:

current().expect("running on tokio-core loop")

So I think providing function that panics is good idea to avoid different spelling of this very same message (especially an unwrap variant).

So I propose the following signatures

fn current() -> Handle;
fn is_running() -> bool;
/// And the helper to avoid panic of `current()` and:
///   thread 'foo' panicked at 'no Task is currently running'
fn run<F>(f: F) -> F::Output {
  let mut lp = Core::new().expect("create loop");
  lp.run(futures::lazy(f))
}

@ishitatsuyuki
Copy link

I agree with @tailhook. It's good time to improve the API. Passing the Handle around is definitely too expensive, thus they should be avoided.

@alexcrichton
Copy link
Contributor

I personally feel like an API as proposed in this PR and #79 does not belong in this crate. I've always felt that if there's a notion of a "thread local event loop" then all APIs would want to reflect such a design decision. For example if we were to add such a function then we would also likely want to add mirroring APIs for TCP, UDP, etc.

I feel as if patterns such as this belong in an external crate, if at all. If an external crate doesn't have the tools it needs to manufacture such a pattern then we can try to update the APIs here, but short of that I feel like it's conflating paradigms to support both "thread local" and non-thread-local operation in one crate.

@tailhook you mentioned that memory consumption is a concern, and a PR such as this I believe would not solve that problem. All I/O objects embed a Remote, which is contributing to that memory bloat. To solve that I know of no other way but to basically rewrite the whole crate to assume nothing can cross threads and there's only ever one event loop.

@rushmorem
Copy link
Author

@rushmorem want to switch this for now to an optional Remote -> Handle upgrade?

I'm sorry for such a late response. I have been busy with other things. The fact that we can't always get the handle makes me less enthusiastic about this pull request. I don't think it will improve user experience at all if they have to fallback to calling Remote::spawn as usual. If anything, it worsens it since they will have to write more code. They might as well just call Remote::spawn and be done with it.

@tailhook
Copy link
Contributor

tailhook commented Jan 4, 2017

@tailhook you mentioned that memory consumption is a concern, and a PR such as this I believe would not solve that problem.

Just quick showcase. I'm tuning example from minihttp to use buffer_unordered(1_000_000). (Which is approved way to handle a lot of connections right?) It takes 807 MiB RSS from the start. Basically, it's a million futures preallocated in a vector in BufferUnordered. If I comment out handle field in the Buffered Codec (which only disables websockets for now), it renders 713 MiB RSS. I.e. 10% save on hello world example (actually same hello_world in master takes ~400 MiB, so something in my state machine became much larger, I might figure it out and fix it and it'll make savings percentage even larger).

I plan to find some time to test more real-life examples later.

All I/O objects embed a Remote, which is contributing to that memory bloat.

That's true. But the point is: you need a handle at every other layer of your implementation. I.e. TcpStream needs a handle, protocol needs a handle, codec needs a handle, timeout needs a handle and user's future needs another handle. I understand why handle is needed in io object and in timer itself, but I also need another duplicate of a handle to create a timeout object, io object, or anything

Another point: it's unclear why the object is so large? It needs to be 8 bytes (a size of an Arc or Weak), but it's different topic to explore elsewhere.

To solve that I know of no other way but to basically rewrite the whole crate to assume nothing can cross threads and there's only ever one event loop.

Well, I don't propose this way :)

I feel as if patterns such as this belong in an external crate, if at all

While it's fine as a showcase in an external crate it won't work in the long term for two reasons:

  1. No memory savings unless every library actually uses current() instead of passing handle
  2. There is no way to find actually running loop in external library (comparing to putting it to a global variable manually, which can fail sometimes)

@ishitatsuyuki
Copy link

The Handle has two purpose:

  1. Interacting with the Poll objects
  2. Pushing jobs to the queue

This PR directly makes the second easier. Instead of extracting from a closure capture, we can directly use the thread local to spawn a new task. The closure one is this: https://github.com/tokio-rs/tokio-core/blob/master/examples/echo.rs#L115

Any reason that this is marked as WIP? This is probably enough to merge, although the source branch is way behind.

@rushmorem
Copy link
Author

@ishitatsuyuki The problem with the current implementation is that it's not always guaranteed to return a handle. If you are not in the reactor it returns None.

@ishitatsuyuki
Copy link

@rushmorem It's OK to do that. If you feel it ugly, feel free to unwrap it internally. (I'm afraid some people want to handle it though)

@rushmorem
Copy link
Author

@ishitatsuyuki That's not the problem. The current behavior is not very intuitive. I would expect the following code to work but it always returns None:-

let mut l = Core::new().unwrap();
let handle = reactor::current().unwrap();

@ishitatsuyuki
Copy link

@rushmorem as the usecase I mentioned above, the main purpose is to provide a easy way to inject things into queue within a handler.

@carllerche
Copy link
Member

@alexcrichton That's true, if there was a reactor::current, the handle argument wouldn't make much sense... but that's something that sounds nice to me :)

@rushmorem
Copy link
Author

I agree with @carllerche. Losing the handle argument would make the API more consistent with the std lib, which is nice.

@alexcrichton
Copy link
Contributor

@tailhook benchmarking with 1M or so connections and looking at memory I've found to be not too useful, we can just do the math and see what the actual result would be. Also to be clear a combinator like buffer_unordered shouldn't allocate so much space up front, it should do so lazily.

Right now on a 64-bit system a Remote is 32 bytes and a Handle is 24 bytes. That means a millions remotes is 32MB and a million handles is 32MB.

I do agree that it's a problem if there's lots of shared components that are all tied to the same remote so they store them all separately. This can of course be optimized today by adding accessor methods (if they don't already exist) and everything that builds on a TcpStream just uses the same remote from the TcpStream itself.

I'm certainly not saying that this shouldn't be somewhere at all. There are definitely situations where the memory savings are crucial. My argument is that if this is the case then having any handles anywhere is unacceptable. In that case we're looking at basically an entirely separate library from tokio-core almost, and I'm not sure how feasible that is.


To be clear I think there's two issues at play here:

  1. A fundamental issue of how many handles are stored in memory.
  2. The ergonomics of passing handles around

For the first I'm not sure how we'd solve that in tokio-core itself. For the latter I would propose a separate crate and that such ergonomics don't belong in tokio-core itself.

@dwrensha
Copy link
Contributor

dwrensha commented Jan 6, 2017

Right now on a 64-bit system a Remote is 32 bytes and a Handle is 24 bytes. That means a millions remotes is 32MB and a million handles is 32MB.

Nitpick: it looks like you got this backwards:

    println!("Remote size = {}, Handle size = {}.",
             ::std::mem::size_of::<::tokio_core::reactor::Remote>(),
             ::std::mem::size_of::<::tokio_core::reactor::Handle>());

prints Remote size = 24, Handle size = 32..

@carllerche carllerche added this to the 0.2 release milestone Jan 6, 2017
@carllerche
Copy link
Member

Going to punt this to 0.2

@tailhook
Copy link
Contributor

tailhook commented Jan 6, 2017

Okay, I've put together tk-easyloop. While there is still not too much incentive to use it in libraries (because it will produce code that can't be used without starting loop by tk_easyloop:run()), let's see how appealing it will be.

@rushmorem rushmorem changed the title WIP: Add tokio_core::reactor::current() Add tokio_core::reactor::current() Aug 31, 2017
@rushmorem
Copy link
Author

rushmorem commented Aug 31, 2017

Async/await got me excited about this pull request again. I have rebased it and added more documentation. I have also made the function return io::Result<Handle> instead of an option. This way, if someone unwraps it and it panics, it will produce a better error message (which I hope addresses @tailhook's earlier concern while not resorting to an explicit panic).

I have been playing around with this both in combinators (see included example) and using async/await. I have to say that it feels much better than passing handles around.

That server() function in the included example using an async macro would start as follows:-

#[async]
fn server() -> Result<()> {
    // Get a handle to the current reactor
    let handle = reactor::current()?;
    // Bind the server's socket
    let addr = "127.0.0.1:12345".parse()?;
    let listener = TcpListener::bind(&addr, &handle)?;
    // ...
}

I will confess that I came to this issue earlier today with an intention to close it. However, before doing so, I decided to play around with it and see how it would work out. After seeing these results, I've got to say I'm really glad I decided to give it another shot.

If this is OK with you guys, I propose we merge this. What do you think?

@carllerche
Copy link
Member

We have been thinking a lot about this. We're working on an RFC that should cover such functionality. So, stay tuned :)

@rushmorem
Copy link
Author

This is now obsolete in the upcoming v0.2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants