-
Hi, Stakker looks fascinating! I'm wondering whether you plan to publish benchmarks that compare Stakker with other options. I think that could help with the marketing of the crate (if the results are good... 😄). Also: is my assumption correct that basically everything has to be rewritten for Stakker (HTTP servers/clients, etc.)? Or would it be easy to reuse crates that are written on top of "pure" Mio, for example?
Replies: 8 comments 12 replies
-
Yes, I did some benchmarking but it isn't complete enough to publish yet. I tested doing a calculation that stresses both the data and instruction caches and then passing the resulting value on to the next actor, and so on around a ring of actors. The number of actors and the lengths of the data part and instruction part of the calculation could be varied, as could the number of rings running in parallel. I tested Stakker against a solution based on `Rc<RefCell>`, and also against actors running in separate threads connected with crossbeam channels. The conclusions so far are: `Rc<RefCell>` once you've added code to handle…
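To make the setup concrete, here is a minimal std-only sketch of the threads-and-channels variant of this ring benchmark. It is not the actual benchmark code: `std::sync::mpsc` stands in for crossbeam channels, `work()` is a placeholder for the cache-stressing calculation, and the actor/lap counts in `main` are arbitrary.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

/// Placeholder for the cache-stressing calculation described above; the
/// real benchmark varies the data and instruction footprint of this step.
fn work(v: u64) -> u64 {
    v.wrapping_mul(6364136223846793005).wrapping_add(1)
}

/// Pass a value `laps` times around a ring of `actors` threads, returning
/// the total number of messages sent. Each thread receives from the
/// previous one, does some work, and sends to the next.
fn run_ring(actors: usize, laps: u64) -> u64 {
    let (first_tx, mut prev_rx) = mpsc::channel::<u64>();
    let mut handles = Vec::new();
    for _ in 0..actors {
        let (tx, rx) = mpsc::channel::<u64>();
        let in_rx = prev_rx;
        prev_rx = rx;
        handles.push(thread::spawn(move || {
            for v in in_rx {
                if tx.send(work(v)).is_err() {
                    break;
                }
            }
        }));
    }
    let mut v = 1u64;
    for _ in 0..laps {
        first_tx.send(v).unwrap();
        v = prev_rx.recv().unwrap(); // one full lap completed
    }
    let _ = v;
    drop(first_tx); // close the ring so the worker threads exit
    for h in handles {
        h.join().unwrap();
    }
    actors as u64 * laps
}

fn main() {
    let start = Instant::now();
    let msgs = run_ring(8, 100_000);
    println!("{} messages in {:?}", msgs, start.elapsed());
}
```

A real comparison would repeat this many times and plot the distribution, since (as noted above) small code changes shift the numbers.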
So this backs up what I claimed in the blog post. However, trying to nail down these figures more precisely gets difficult, as there is variation up and down with small changes to the code, I guess due to cache effects. I would need to do a lot of runs and plot them to see the trend. It would also be better to test against other channel implementations such as flume and the channel implementation inside Tokio.

Regarding interfacing with the existing ecosystem, there is no ready-made "glue" available as yet. If a crate was written to just do protocol and not talk directly to an I/O layer (i.e. not hardcoded to one solution), then interfacing to it should be relatively easy. (It's possible that some crates, even if coded for Tokio for example, may also provide another interface which just does protocol, which we could interface to.) If something is coded to talk directly to mio, that's probably just as unhelpful as it being hardcoded to talk to anything else, although it depends on the exact way they interface (maybe they could be adapted more easily).

It may be possible someday to support running external async/await code, i.e. act as an executor. Then maybe we can pull in some of the ecosystem that way. It needs investigating. Otherwise you can run an existing executor (Tokio/async-std/etc.) in another thread and communicate with the Stakker thread using channels or something. Whether this is efficient enough depends on how central that aspect is to your application. If you or anyone else has better ideas about how to bring in some part of the ecosystem with relatively small effort, I'd be interested to hear them.
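The "separate executor thread plus channels" fallback mentioned above can be sketched with nothing but the standard library. This is an illustrative skeleton, not Stakker or Tokio code: the `Request`/`Response` types and `fetch_via_bridge` are hypothetical, and a real bridge would drive an async runtime on the I/O thread and poll the response channel from the Stakker event loop instead of blocking.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message types: requests go out to the executor thread,
/// responses come back to the actor-runtime side.
enum Request {
    Fetch(String),
    Shutdown,
}

struct Response {
    body: String,
}

/// Send one request to a worker thread and wait for the reply. A real
/// bridge would keep the thread alive for the life of the application
/// and would not block on `recv()` from inside the event loop.
fn fetch_via_bridge(url: &str) -> String {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<Response>();

    // This thread stands in for the Tokio/async-std side: in a real
    // setup it would own the runtime and perform the async I/O.
    let io_thread = thread::spawn(move || {
        for req in req_rx {
            match req {
                Request::Fetch(url) => {
                    // ...async work would happen here...
                    let body = format!("contents of {}", url);
                    resp_tx.send(Response { body }).unwrap();
                }
                Request::Shutdown => break,
            }
        }
    });

    req_tx.send(Request::Fetch(url.to_string())).unwrap();
    let resp = resp_rx.recv().unwrap();
    req_tx.send(Request::Shutdown).unwrap();
    io_thread.join().unwrap();
    resp.body
}

fn main() {
    println!("got: {}", fetch_via_bridge("example.com/data"));
}
```

Whether the extra channel hop matters depends, as said above, on how central that I/O path is to the application.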
-
@uazu: Very interesting – thanks for the detailed answer!
I'd be interested in benchmarks of an "end product". An HTTP/WebSocket client/server, for example.
Would tungstenite (a WebSocket client/server crate) be decoupled enough for Stakker? They write: …
I think a list of crates that work well together with Stakker would be very useful. Very excited to see how Stakker develops!
-
That sounds great.
Yes, unfortunately, they do not really recommend using …
There are some WebSocket load-testing tools and services, it seems (though I don't have any experience with them). Here is a repo with benchmark code and instructions for different programming languages (including Rust); it hasn't been updated in a few years, but it might still be useful. Here is a WebSocket load-testing tool written in Rust. I found other projects written in Python and JavaScript, for example, but I assume they would be too slow (on the other hand, ideally a benchmark would be distributed, so these tools might still be useful). Here is an article about stress testing WebSockets with Artillery (JavaScript/Node.js).
Do you mean what is normally implemented via WebSockets? The most common use case is probably maintaining a connection between a web browser and an HTTP server. Hotwire and Phoenix LiveView send HTML (and other data) from the server to the client, for example. Database/API connections should also be pretty common; Apollo's GraphQL server would be an example of that. Besides that, I've also seen people mention that WebSockets are used for game servers and even stock-trading APIs.
Yes, on the web the client would send a protocol upgrade request to the HTTP server (which can be load-balanced, as you wrote).
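For reference, that upgrade exchange looks roughly like this, using the example key/accept pair from RFC 6455 (the server derives `Sec-WebSocket-Accept` by appending a fixed GUID to the client's key, SHA-1 hashing it, and base64-encoding the result):

```http
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the `101` response, the TCP connection stops being HTTP and carries WebSocket frames in both directions.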
So, I'd love to do this. Unfortunately, I'm pretty busy with other projects as well, and I also doubt I'd be the ideal person to do it (I don't have a background in systems programming; I do mostly web development). That being said, if nobody beats me to it, I might give it a try when I have more time, especially once I have an actual use case for a high-performance WebSocket server (one project I'm interested in would need this).

One problem regarding my potential use case (in regard to …): these components all use …

Something that would be useful for people like me (without a background in systems programming) would be a short guide that explains what to look out for when searching for crates that play nicely with Stakker.
Out of curiosity, what do you use Stakker for?
-
Ah, okay. I'm not really sure; there are many different use cases. So I'd use a combination of small/medium/big requests/responses in the data formats that are supported (text and binary), and measure maximum throughput. But I'm not really experienced with benchmarking/stress testing.
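As a starting point, the "measure maximum throughput for various message sizes" idea can be sketched in-process with the standard library. The `throughput` helper below is hypothetical and purely illustrative: a real WebSocket benchmark would go over TCP with an actual client and server rather than an in-process channel.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

/// Send `count` messages of `size` bytes to a consumer thread and return
/// the measured rate in messages per second. Illustrative only: the
/// channel stands in for a real network connection.
fn throughput(size: usize, count: usize) -> f64 {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // Consumer tallies the bytes it receives so we can verify delivery.
    let consumer = thread::spawn(move || rx.iter().map(|m| m.len()).sum::<usize>());

    let start = Instant::now();
    for _ in 0..count {
        tx.send(vec![0u8; size]).unwrap();
    }
    drop(tx); // close the channel so the consumer finishes
    let total = consumer.join().unwrap();
    assert_eq!(total, size * count); // every byte arrived
    count as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    // Small / medium / big payloads, as suggested above.
    for size in [64, 4 * 1024, 256 * 1024] {
        println!("{:>7} B: {:.0} msg/s", size, throughput(size, 10_000));
    }
}
```

Reporting one figure per payload size (and per format, text vs. binary) would match the mix of request/response shapes described above.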
You are most likely right here.
I like the second approach. At some point, traits for runtime-agnostic crates will be available, so this seems like the most future-proof approach (I'm sure you are aware of all this).
That's fantastic, thanks!
Nice! For a few projects I'm working on, I'm interested in a TUI, so I'll have a look at your TUI crate when the time comes.
-
Okay, thanks. Are you planning to add more functionality when you are ready to work on your editor, or should the crate remain low-level?
-
Thanks for answering all my questions! I'll be on the lookout for a project that …
-
Hi @uazu :) In another thread you wrote "I'm not sure how to promote [Stakker] further when so much energy and enthusiasm is devoted to other runtimes". I think showing a benefit in performance or resource usage (if there is one) would help a lot to promote Stakker, because that is probably the most common reason why someone would want to use something other than async Rust. Unfortunately, I still haven't had time to play with Stakker, or even to create a small benchmark (however, one of my current projects could probably benefit from it, so maybe I can build some components with Stakker).

Recently, there was a post on r/rust that linked to some benchmarks of gRPC implementations. The results show that the Rust implementation (based on Tonic and Tokio) is significantly faster than all other implementations in single-threaded mode. However, the multi-threaded version is less performant (still fast, but slower than several other implementations). I'm wondering if such a benchmark would be a good fit to see whether Stakker is more performant than the alternatives. I'd imagine that if a Stakker-based implementation turns out to be significantly faster than other Rust-based implementations, that would convince some people in the community to have a look at Stakker (which, I strongly feel, would deserve such attention). However, I don't know much about gRPC, or how much work it would be to build 'stakker-grpc' and implement the benchmark code (it seems there are some crates that could be used or adapted). Besides that, 'stakker-grpc' would probably be an asset in its own right. gRPC is something I'm probably going to look at for my current project, so maybe, if nobody else wants to do it, I could give it a try.

Something else that would help, I believe, would be a community-curated list of crates that can be used with Stakker. For example, I was looking for an HTTP client that can be used with Stakker, and found mio_httpc, which looks promising.
-
Thank you for the detailed description :) This shouldn't be too difficult to implement. My plan now is to first research a bit how to properly benchmark network applications, after which I'll ask the Rust community for feedback (and how they would implement such a benchmark). Then I'll start to build implementations and ask for a review of the code (here and on r/rust or the message board). The last step would be to run the benchmark and publish the results. I'll start with Tokio and Stakker, but I'm also interested in other runtimes (e.g. Mio on its own, message-io, Glommio, and a multithreaded version), which I might implement as well if the above doesn't take too much effort. Does this all make sense? I basically only have time on some evenings, so this project could take a while to complete.