
us_loop_on_sample_load and better multithreading #17

Open
ghost opened this issue Nov 7, 2018 · 18 comments

Comments

@ghost

ghost commented Nov 7, 2018

Various threading features are required: reuse port, a master listener with slave workers, etc.

@ghost ghost added the enhancement label Nov 7, 2018
@victorstewart

A forking take on parallelism might be a better idea on Linux? I believe child processes can be more performant than threads. Certainly a MUCH simpler memory model, with no coordination between "threads" to be concerned with.

@ghost
Author

ghost commented Dec 14, 2018

The lib is only threaded per loop, so it's essentially the same as what you describe. No data is shared; you run individual us_loops isolated to one thread each. Sharing data between threads is always slow and is not the purpose here.

@ghost
Author

ghost commented Dec 14, 2018

The only thing you want to (somewhat) share between the loops is the listener, some sort of common entry point for connections. Some kind of master/slave setup.

@victorstewart

victorstewart commented Dec 14, 2018

I've been flirting with the idea of jumping ship to uWebSockets, but would need a "worker-per-logical-core" setup to do so. I was thinking of simply writing a fork-per-logical-core loop before initializing uWS::SSLApp, and setting SO_REUSEPORT. Seems like the least-friction way. Do you have any warnings though?
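For reference, the kernel side of that plan can be checked in isolation. Here is a minimal sketch (Linux 3.9+; the port number and the `bind_reuseport`/`demo_reuseport` names are just for illustration) showing that two sockets can bind the same port once both set SO_REUSEPORT, which is what lets each forked child run its own independent accept loop:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to the given loopback port with SO_REUSEPORT set. */
static int bind_reuseport(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0) {
        close(fd);
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Two listeners on one port: without SO_REUSEPORT the second bind would fail
 * with EADDRINUSE; with it, each forked child can listen on the same port and
 * the kernel load balances incoming connections between them. */
int demo_reuseport(void) {
    int a = bind_reuseport(18311);
    int b = bind_reuseport(18311);
    int ok = (a >= 0 && b >= 0) ? 0 : -1;
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return ok;
}
```

In the fork-per-core plan, each child would call `bind_reuseport` (or the equivalent through the library) after `fork()`, one child per `sysconf(_SC_NPROCESSORS_ONLN)` core.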

(P.S. now that the HTTP implementation is complete, you should consider submitting it to the techempower benchmarks ;) )

@ghost
Author

ghost commented Dec 14, 2018

Techempower are incompetent morons. And that's putting it lightly.

You already have this support in 0.14

@victorstewart

victorstewart commented Dec 14, 2018

haha. think you might be my computer soulmate.

I see what you mean with multithreaded_echo.cpp

@victorstewart

This cleared up for me how there's essentially zero performance difference between threads and processes inside the kernel:

https://stackoverflow.com/questions/807506/threads-vs-processes-in-linux

@ghost
Author

ghost commented Dec 16, 2018

I don't think it makes any difference whether you have a fixed number of threads or of processes, each basically acting as a wrapper around a CPU core. But I've always preferred threads because they allow sharing without syscalls as a messenger, and threads are standardized in the language while processes are not.

@ghost ghost mentioned this issue Dec 22, 2018
@ghost
Author

ghost commented Dec 26, 2018

A simple way that many use, and that was available in v0.14, is something like a us_socket_transfer(us_socket *, us_loop *) kind of deal:

Very simple: you listen and accept from one thread like usual and get your usual on_open event on the main thread, then you call us_socket_transfer and the socket goes over to the slave thread. Very simple interface, and it lets you choose how to load balance things.

That would be step 1; the most simple way.

Forking may become necessary later on to play well with JavaScript environments that don't have threads, where you also need to support fork-based solutions.
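Usage of the proposed call could look like the sketch below. This is pseudocode against a hypothetical API: `us_socket_transfer`, `pick_worker`, and `is_on_worker_thread` are illustrative names, not anything uSockets ships today.

```c
/* Hypothetical sketch -- not part of the current uSockets API. */
struct us_loop *workers[NUM_WORKERS]; /* one us_loop per worker thread */

void on_open(struct us_socket *s) {
    if (!is_on_worker_thread()) {
        /* accept happened on the main thread; hand the socket to a
           worker loop of our choosing (this is the load balancing hook) */
        us_socket_transfer(s, pick_worker(workers));
        return; /* on_open fires again, on the worker's thread */
    }
    /* normal per-connection setup runs on the worker */
}
```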

@ghost
Author

ghost commented Dec 26, 2018

Lwan works like that: they listen and accept on one thread, then transfer the FD to another thread via a fast FIFO queue plus an eventfd the receiving thread epoll_waits on. That's simple.
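The mechanism can be sketched in a few lines. This is a deliberately minimal version (a one-slot "queue" guarded by a mutex instead of a real FIFO, and a pipe standing in for an accepted socket); the `demo_transfer` name and structure are illustrative, not Lwan's or uSockets' actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* The accepting thread parks an fd here and pokes the worker's eventfd;
 * the worker (which would normally sit in epoll_wait with that eventfd
 * registered) wakes up and adopts the fd into its own loop. */
static int parked_fd = -1;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct worker { int efd; int adopted_fd; };

static void *worker_main(void *arg) {
    struct worker *w = arg;
    uint64_t n;
    read(w->efd, &n, sizeof n);        /* blocks until the acceptor signals */
    pthread_mutex_lock(&lock);
    w->adopted_fd = parked_fd;         /* take ownership of the transferred fd */
    parked_fd = -1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int demo_transfer(void) {
    int fds[2];
    if (pipe(fds) < 0) return -1;      /* stand-in for an accepted socket */
    struct worker w = { eventfd(0, 0), -1 };
    pthread_t t;
    pthread_create(&t, NULL, worker_main, &w);
    pthread_mutex_lock(&lock);
    parked_fd = fds[0];                /* enqueue the fd... */
    pthread_mutex_unlock(&lock);
    uint64_t one = 1;
    write(w.efd, &one, sizeof one);    /* ...and wake the worker */
    pthread_join(t, NULL);
    int ok = (w.adopted_fd == fds[0]) ? 0 : -1;
    close(fds[0]);
    close(fds[1]);
    close(w.efd);
    return ok;
}
```

In a real event loop the worker would not block in `read`; the eventfd is just one more pollable fd in its epoll set.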

@ghost
Author

ghost commented Dec 26, 2018

You might want something like an on_opening event where you get to transfer the FD to another thread early, before any SSL work is done; essentially it would be called immediately after accept. Then on_open is called on the slave thread. Simple.

@ghost ghost changed the title Threading examples & features Slave & master threading support, us_socket_context_adopt and co. Dec 26, 2018
@ghost
Author

ghost commented Dec 29, 2018

us_socket_context_on_distribute: a callback that returns the us_socket_context where on_open should happen.
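As a sketch, the proposed callback might be registered and implemented like this. Again pseudocode: `us_socket_context_on_distribute` is the proposal itself and `least_loaded_context` is an illustrative name.

```c
/* Hypothetical sketch -- proposed API, not current uSockets. */
struct us_socket_context *on_distribute(struct us_socket_context *listen_ctx) {
    /* called right after accept; return the (per-thread) context
       whose loop should receive the on_open event */
    return least_loaded_context();
}
```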

@ghost
Author

ghost commented Feb 8, 2021

It already works well enough. And if anything, all you need is a way to stop listening on the context with the most connections. That's all, really.

And this can be done from any thread, so any thread can stop polling for accept for some other thread. In short - no need for a master.

@ghost
Author

ghost commented Feb 8, 2021

If you have 4 CPUs, then the 3 with the fewest connections should poll for accept. Whenever you accept, check whether you are the one with the most connections; if so, stop polling for accept and start polling for accept on the one that previously did not. It's that simple, and you can set the granularity to 50 so that the switching does not trigger all the time.

And if you have 16 CPUs, then maybe it is enough to poll for accept on only 8 cores. Same rule, only with a limit.
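The rotation rule above can be written as a small pure function (the `rebalance` name and array representation are just for illustration; a real implementation would act on us_socket_contexts):

```c
#include <assert.h>

/* After every accept (and close), decide which contexts poll for accept:
 * with n contexts, the busiest poller hands its listening duty to the
 * idlest non-poller. `granularity` (e.g. 50) adds hysteresis so the
 * switch does not flap on every single accept. */
void rebalance(const int *conns, int *polls, int n, int granularity) {
    int max_i = -1, min_i = -1;
    for (int i = 0; i < n; i++) {
        /* busiest context that is currently polling */
        if (polls[i] && (max_i < 0 || conns[i] > conns[max_i])) max_i = i;
        /* idlest context that is currently not polling */
        if (!polls[i] && (min_i < 0 || conns[i] < conns[min_i])) min_i = i;
    }
    if (max_i < 0 || min_i < 0) return;                    /* nothing to rotate */
    if (conns[max_i] - conns[min_i] < granularity) return; /* hysteresis */
    polls[max_i] = 0; /* the busiest poller stops accepting    */
    polls[min_i] = 1; /* the idlest non-poller takes its place */
}

/* 4 contexts, 3 polling: context 0 is busiest, context 3 idle and not
 * polling, spread 80 >= granularity 50, so they swap roles. */
int demo_rebalance(void) {
    int conns[4] = {120, 10, 30, 40};
    int polls[4] = {1, 1, 1, 0};
    rebalance(conns, polls, 4, 50);
    return (polls[0] == 0 && polls[1] == 1 && polls[2] == 1 && polls[3] == 1)
               ? 0 : -1;
}
```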

@ghost
Author

ghost commented Feb 8, 2021

And this switching should simply trigger on every accept and every close. That's it

@ghost
Author

ghost commented Feb 8, 2021

Fixing this should also enable the support on macOS (and maybe Windows). Once we have this enforcement, all platforms should work with it.

@ghost
Author

ghost commented Feb 8, 2021

I forgot libuv is made by ogres and is not thread safe even though the underlying kernel is 🤦

So the rule has to be triggered by every thread, on their 4-second timers:

  1. wait for 4-second timeout
  2. lock a common list shared by all threads
  3. if we hold a higher metric than the other threads AND there is at least one other poller, WE stop polling for accept
  4. if we hold a lower metric than the other threads, WE start polling for accept
  5. unlock list
  6. goto 1

With this, the metric can be the number of connections OR any metric returned by a callback (such as memory usage, CPU-time usage, etc.). This callback will ship with the number of connections as the default metric, but it should be possible to override.

This way load balancing is "re-routed" every 4 seconds, which is not so often as to cause extra load, yet not so infrequent as to be irrelevant. It is just right.
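The six steps above can be sketched as the body of each thread's timer callback (minus the 4-second wait, so a single iteration is checkable; the `balancer`/`on_timer` names and the fixed-size arrays are assumptions for the sketch):

```c
#include <pthread.h>

/* Shared "common list" of per-thread samples. metric() defaults to the
 * connection count but is meant to be overridable (memory, CPU time...). */
struct balancer {
    pthread_mutex_t lock;
    int n;            /* number of threads               */
    long metric[8];   /* latest sample per thread        */
    int polling[8];   /* is thread i polling for accept? */
};

/* Steps 2-5 of the rule, run by thread `me` on its 4-second timeout. */
void on_timer(struct balancer *b, int me, long my_metric) {
    pthread_mutex_lock(&b->lock);                 /* 2. lock the common list */
    b->metric[me] = my_metric;
    int highest = 1, lowest = 1, other_pollers = 0;
    for (int i = 0; i < b->n; i++) {
        if (b->metric[i] > b->metric[me]) highest = 0;
        if (b->metric[i] < b->metric[me]) lowest = 0;
        if (i != me && b->polling[i]) other_pollers++;
    }
    if (highest && other_pollers >= 1)
        b->polling[me] = 0;                       /* 3. busiest stops accepting */
    else if (lowest)
        b->polling[me] = 1;                       /* 4. idlest keeps/starts accepting */
    pthread_mutex_unlock(&b->lock);               /* 5. unlock */
}

/* Thread 0 reports the highest metric and stops polling; thread 1 is
 * neither highest nor lowest and keeps its state. */
int demo_timer_rule(void) {
    struct balancer b = { PTHREAD_MUTEX_INITIALIZER, 3, {0, 0, 0}, {1, 1, 1} };
    on_timer(&b, 0, 100);
    on_timer(&b, 1, 10);
    return (b.polling[0] == 0 && b.polling[1] == 1 && b.polling[2] == 1)
               ? 0 : -1;
}
```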

@ghost ghost changed the title Slave & master threading support, us_socket_context_adopt and co. us_loop_on_sample_load and better multithreading Feb 8, 2021
@ghost
Author

ghost commented Feb 8, 2021

Really, you can just make it:

us_loop_on_sample_load as a callback that returns the time from CLOCK_THREAD_CPUTIME_ID, sampled since the last timeout. Then you have load balancing based on actual per-thread CPU-time usage, which is probably the most accurate metric.
