
[PoC] have a go at making goose async #8

Closed
wants to merge 3 commits

Conversation

alsuren
Contributor

@alsuren alsuren commented May 12, 2020

Hi. I read your blog post and found it really interesting. I wondered to myself what kind of speed-boost you would get from making things async. I expect that if you're already CPU-bound on most of your interesting workloads then you might not see any difference, but I put this together anyway.

It's a bit of a shitshow because of lifetimes, and the
fact that async functions can't be used as function
pointers (because the return value is not sized
predictably in a dynamic context).

There's quite a lot of boxing and dynamic dispatch going on. I think that's fine though, because at least it will keep compile times down.

This thread was really helpful to me; I have stolen the macro from it:
https://users.rust-lang.org/t/how-to-store-async-function-pointer/38343/4
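For reference, the pattern from that thread can be sketched with no external crates: a plain fn pointer returning a boxed future, plus a throwaway no-op waker to poll it. Everything below (TaskFn, double, poll_ready) is a toy illustration, not Goose's actual types:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An "async function pointer": a plain `fn` returning a boxed future.
// Each `async fn` has its own anonymous return type, so async fns can't be
// stored as fn pointers directly; boxing gives them one fixed, sized type.
type TaskFn = fn(u32) -> Pin<Box<dyn Future<Output = u32> + Send>>;

async fn double(x: u32) -> u32 {
    x * 2
}

// Adapter with a fixed, boxed return type (roughly what the macro generates).
fn double_boxed(x: u32) -> Pin<Box<dyn Future<Output = u32> + Send>> {
    Box::pin(double(x))
}

// Minimal no-op waker + poll, just enough to drive a future that never pends.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn poll_ready<F: Future + ?Sized>(mut fut: Pin<Box<F>>) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("future was not immediately ready"),
    }
}

fn main() {
    // The whole point: async fns stored in a homogeneous collection.
    let tasks: Vec<TaskFn> = vec![double_boxed];
    let result = poll_ready(tasks[0](21));
    println!("{}", result); // 42
}
```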

All that's left to do is:

  • Fix the doctests
  • Actually try out the examples and see if they are
    still working/performant. (I haven't tried to replicate the test setup from your blog post)
  • Go hunting for places where explicit threads are used,
    which could be turned into tasks.
  • Think a bit more about ergonomics. I wonder whether Rust should have some kind of dyn async fn syntax that would give you a fn that can be stored as a function pointer. In the meantime, it might be possible to craft our own attribute macro that is a little better suited to our needs.

At the end of the day, if the speed improvements here are outweighed by the ergonomic problems, then it's probably best to close the PR.

@jeremyandrews
Member

Thanks for the contribution! I appreciate the effort you put into this!

I'm new to async in rust, so a proper review is going to take me a while.

I did check out your branch on my dev server and try it out, unfortunately it's not making any HTTP requests. It compiles and runs, but there's no traffic hitting the web server, and when it completes Goose shows no statistics activity. This suggests we're not actually getting to [goose_send](https://github.com/alsuren/goose/blob/async/src/goose.rs#L1008).

I hope to find time to dig into this more soon, and certainly welcome any further work in this direction! As you say, I'm curious to test and compare performance, to better understand what gains are possible.

Note, you can test this against any web server to validate basic functionality, worst case you get a bunch of 404's. To confirm requests are actually being made, try something like this:

cargo run --example simple -- --host http://127.0.0.1 -v -t10 --print-stats

@jeremyandrews
Member

Tracing through the code, I see we get as far as invoking the task function, but it never actually runs. So perhaps a change is needed here?
https://github.com/alsuren/goose/blob/async/src/client.rs#L69

@alsuren
Contributor Author

alsuren commented May 13, 2020

Good spot. Sorry about that. In my defence, it was midnight at that point, and "I'll just spend an hour mechanically translating this to be async while I watch my housemates play computer games" had already turned into 4 hours of trying to fix that lifetime issue.

(I think I was expecting the function's return type to be auto-annotated with must_use or something. I suppose we'd have to do that manually if we thought it was likely to cause regressions.)

I suspect that there will be a bunch more places where the thing gets wedged. The fix is pretty mechanical:

  • .await on all futures
  • if you're not already in an async fn then add async to the top of your function
  • walk up the function-call chain and repeat the process

If we can prove that there is a worthwhile performance gain then I will fix the doctests over the weekend.

@jeremyandrews
Member

Excellent, this got GET requests working. But POST requests still aren't working. Possibly something more needs to happen to goose_post?
https://github.com/alsuren/goose/blob/async/src/goose.rs#L885

For example, this task is not properly executing:
https://github.com/alsuren/goose/blob/async/examples/simple.rs#L48

@jeremyandrews
Member

I agree with starting with some performance tests to see if this is worth the effort. Assuming it is, my preference would be to support both async and blocking task functions, not just either or.

@alsuren
Contributor Author

alsuren commented May 13, 2020

my preference would be to support both async and blocking task functions, not just either or.

This will need a bit of investigation. When I was developing, I noticed that reqwest::blocking throws a fit if you try to use it inside a task spawned by tokio::runtime::Runtime::block_on() or tokio::spawn(). There might be a way to wrap it (spawn_blocking() or something?) so that it plays nice.

  • If so, we could make GooseTask::function into an enum, and make a GooseTask::new() and GooseTask::new_async(), and do the wrapping dance only for sync functions.
  • If not then we'd need to make GooseState::execute() which takes a GooseTaskSet and a completely separate GooseState::execute_async() that takes a GooseAsyncTaskSet or something, all the way down, which doesn't sound very fun.

@jeremyandrews
Member

Sorry, I meant Goose should support both, but it would be fine if we have to choose either/or for a given load test. Trying to combine them both in the same load test doesn't seem immediately useful to me.

@jeremyandrews
Member

A large commit just landed (adding Gaggles, distributed load testing) -- I can help rebase this code once we get it working, no need to chase HEAD atm. Currently this is still blocked as all POST tasks are not running. Once fixed, I'm eager to do some load testing comparisons between the blocking and the async versions.

@jeremyandrews
Member

jeremyandrews commented May 16, 2020

I got a chance to review this closer. There's no problem with the POST functions; instead there were two missed .await calls for on_start and on_stop tasks -- this was preventing the authenticated load test from logging in, which in turn was preventing it from posting comments. Fixed locally, going to collect some comparison numbers now.

For reference, the two lines I had to edit:

@jeremyandrews
Member

I'm seeing ~25Mbit/sec of traffic with the Async code, versus ~35Mbit/sec with the Blocking code. Any thoughts as to why Async is generating so much less traffic? Any thoughts on how we could optimize it?

[graph: async-vs-blocking]

Even with slower performance, I still want to add async support to Goose. I'm going to see about how to do so in an optional way.

@jeremyandrews jeremyandrews mentioned this pull request May 16, 2020
@jeremyandrews
Member

@alsuren I rebased your PR with current head here: #13 . If you're interested in continuing to work on this, you can rebase your PR on this one to easily pull in these changes. I'd appreciate any help 1) figuring out how to cleanly support async without losing blocking (there's no need to mix both in the same load test), and 2) figuring out if we can optimize async load tests.

@jeremyandrews
Member

Doing a little more testing, I'm happy to have tracked this down. In short: async is slow when running with a non-optimized build with debugging information, whereas blocking is quite a bit faster.

However, running both with an optimized build w/o debugging information, async is much faster:

[graph: async-vs-block-2]

@alsuren
Contributor Author

alsuren commented May 16, 2020

I think I enabled the option for you to force push to this branch if you want. Sorry for dumping the sketch PR on you and then wandering off. Congratulations on getting it working.

  1. I had some thoughts about using macros for your public API/callbacks, and a feature flag to make the macros generate thing_blocking(arg) or thing_async(arg).await. This sounds kind of mental though. I think that the maintenance burden of maintaining the two approaches will be huge, and I would only do it if I had a large user base already using each approach.

  2. The first thing I would try when debugging performance is running it under a sampling profiler and getting a flame graph out. There is a tool called cargo-flamegraph that should make this easy. Once you have the graphs you will be in a better position to make things better.

There are a bunch of synchronization primitives, like locks and channels, that have async versions. If your graphs say that you're spending all of your time blocking on those then you can go around and fix them.

I don't remember what the default tokio executor does in terms of threads per core. You might be able to tune this? (I expect that the default is already tuned for our workload though)

@alsuren
Contributor Author

alsuren commented May 16, 2020

... oh. I suppose that's not hugely surprising. What is the network card on that box? 1Gbps or 10Gbps? What is the unit on the graph? MBps or Mbps? (By which I mean: are you close to maxing out the line speed on your network?)

@jeremyandrews
Member

It is not letting me force push to your repo.

The graphs are Mbit/sec and the interface is 1G, so we're not hitting wire speed yet. But this is a fantastic improvement. (At this point the web server is the bottleneck.)

My main interest in continuing to support both blocking and async, is writing blocking load tests is simpler, making the tool more accessible to people. I did consider a cargo feature, but I'm thinking we can do it cleanly without. I'll be exploring this more over the next few days.

@alsuren
Contributor Author

alsuren commented May 16, 2020

It is not letting me force push to your repo.

Hrm. I guess that's not a thing. Not sure what made me think it was. Probably best to use your branch as the main feature branch and I will make PRs against that if I have any interesting ideas. That way I won't slow down development too much (I'm a bit low on energy at the moment, and the initial hacking session for this PR has wiped me out). [Edit: I have subscribed to that PR, so feel free to close this one whenever you want]

I agree that the async version is a bit fiddly and complicated at the moment. I think that a bunch of that is because of the &mut lifetimes. I wonder if we could sidestep a bunch of that fiddliness by wrapping everything in Arc<Mutex<>> or somesuch, or having a global tracking API and passing around IDs of the things we're interested in rather than &mut references. I'm assuming that the structure of context objects in your API is intended to mirror locust, but I've only touched locust very briefly, years ago, so I'm not sure how far we can bend things without confusing people.

The other thing that makes things fiddly is that the return type of async callbacks must remain generic (impl Future rather than Box; this is something that is easy to solve without macros, though).

It is also worth looking into how tower/warp do things. This is on my to-do list of things to learn anyway. Maybe there are some interesting ideas there.

@jeremyandrews
Member

Internal structures can change. I’m not worried about the code powering Goose looking like Locust. I just want the writing of load tests to be as simple as possible. Ultimately I’m hoping to simplify it further with macros.

If we can give the writing of async load tests the same or simpler ergonomics than what currently exists, then there’s no need to support both blocking and async.

I’m studying async now and will continue to pursue this in my PR. I’ve started splitting the blocking versus async code with a feature flag, but as you noted this is adding a maintenance overhead.

@jeremyandrews
Member

Closing in favor of #13 . Will continue the conversation there.
