
us_loop_on_sample_load and better multithreading #17

Open
ghost opened this issue Nov 7, 2018 · 18 comments

Comments

@ghost

ghost commented Nov 7, 2018

Various threading features are required: reuse port, a master listener with slave workers, etc.

@ghost ghost added the enhancement label Nov 7, 2018
@victorstewart

A forking take on parallelism might be a better idea on Linux? I believe child processes can be more performant than threads. Certainly a MUCH simpler memory model, with no coordination between "threads" to be concerned with.

@ghost
Author

ghost commented Dec 14, 2018

The lib is only threaded per loop, so it's essentially the same as what you describe. No data is shared; you run individual us_loops isolated to one thread each. Sharing data between threads is always slow and is not the purpose here.

@ghost
Author

ghost commented Dec 14, 2018

The only thing you want to (somewhat) share between the loops is the listener, some sort of common entry point for connections. Some kind of master/slave setup.

@victorstewart

victorstewart commented Dec 14, 2018

I've been flirting with the idea of jumping ship to uWebSockets, but would need a "worker-per-logical-core" setup to do so. I was thinking of simply writing a fork-per-logical-core loop before initializing uWS::SSLApp, and setting SO_REUSEPORT. Seems like the least-friction way. Do you have any warnings though?
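For reference, the kernel side of that plan can be checked in isolation. Here is a minimal sketch (Linux 3.9+; the port number and the `bind_reuseport`/`demo_reuseport` names are just for illustration) showing that two sockets can bind the same port once both set SO_REUSEPORT, which is what lets each forked child run its own independent accept loop:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to the given loopback port with SO_REUSEPORT set. */
static int bind_reuseport(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0) {
        close(fd);
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Two listeners on one port: without SO_REUSEPORT the second bind would fail
 * with EADDRINUSE; with it, each forked child can listen on the same port and
 * the kernel load balances incoming connections between them. */
int demo_reuseport(void) {
    int a = bind_reuseport(18311);
    int b = bind_reuseport(18311);
    int ok = (a >= 0 && b >= 0) ? 0 : -1;
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return ok;
}
```

In the fork-per-core plan, each child would call `bind_reuseport` (or the equivalent through the library) after `fork()`, one child per `sysconf(_SC_NPROCESSORS_ONLN)` core.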

(P.S. now that the HTTP implementation is complete, you should consider submitting it to the techempower benchmarks ;) )

@ghost
Author

ghost commented Dec 14, 2018

Techempower are incompetent morons. And that's putting it lightly.

You already have this support in 0.14

@victorstewart

victorstewart commented Dec 14, 2018

haha. think you might be my computer soulmate.

I see what you mean with multithreaded_echo.cpp

@victorstewart

This cleared up for me how there's essentially zero performance difference between threads and processes inside the kernel:

https://stackoverflow.com/questions/807506/threads-vs-processes-in-linux

@ghost
Author

ghost commented Dec 16, 2018

I don't think it makes any difference whether you have a fixed number of threads or of processes, each basically acting as a wrapper around a CPU core. But I've always preferred threads because they allow sharing without syscalls as a messenger, and threads are standardized in the language while processes are not.

@ghost ghost mentioned this issue Dec 22, 2018
@ghost
Author

ghost commented Dec 26, 2018

A simple way that many use, and that was available in v0.14, is something like a us_socket_transfer(us_socket *, us_loop *) kind of deal:

Very simple: you listen and accept from one thread like usual and get your usual on_open event on the main thread, then you call us_socket_transfer and the socket goes over to the slave thread. Very simple interface, and it lets you choose how to load balance things.

That would be step 1; the most simple way.

Forking may become necessary later on to play well with JavaScript environments that don't have threads, where you also need to support fork-based solutions.
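Usage of the proposed call could look like the sketch below. This is pseudocode against a hypothetical API: `us_socket_transfer`, `pick_worker`, and `is_on_worker_thread` are illustrative names, not anything uSockets ships today.

```c
/* Hypothetical sketch -- not part of the current uSockets API. */
struct us_loop *workers[NUM_WORKERS]; /* one us_loop per worker thread */

void on_open(struct us_socket *s) {
    if (!is_on_worker_thread()) {
        /* accept happened on the main thread; hand the socket to a
           worker loop of our choosing (this is the load balancing hook) */
        us_socket_transfer(s, pick_worker(workers));
        return; /* on_open fires again, on the worker's thread */
    }
    /* normal per-connection setup runs on the worker */
}
```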

@ghost
Author

ghost commented Dec 26, 2018

Lwan works like that: they listen and accept on one thread, then transfer the FD to another thread via a fast FIFO queue plus an eventfd the receiving thread epoll_waits on. That's simple.
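The mechanism can be sketched in a few lines. This is a deliberately minimal version (a one-slot "queue" guarded by a mutex instead of a real FIFO, and a pipe standing in for an accepted socket); the `demo_transfer` name and structure are illustrative, not Lwan's or uSockets' actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* The accepting thread parks an fd here and pokes the worker's eventfd;
 * the worker (which would normally sit in epoll_wait with that eventfd
 * registered) wakes up and adopts the fd into its own loop. */
static int parked_fd = -1;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct worker { int efd; int adopted_fd; };

static void *worker_main(void *arg) {
    struct worker *w = arg;
    uint64_t n;
    read(w->efd, &n, sizeof n);        /* blocks until the acceptor signals */
    pthread_mutex_lock(&lock);
    w->adopted_fd = parked_fd;         /* take ownership of the transferred fd */
    parked_fd = -1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int demo_transfer(void) {
    int fds[2];
    if (pipe(fds) < 0) return -1;      /* stand-in for an accepted socket */
    struct worker w = { eventfd(0, 0), -1 };
    pthread_t t;
    pthread_create(&t, NULL, worker_main, &w);
    pthread_mutex_lock(&lock);
    parked_fd = fds[0];                /* enqueue the fd... */
    pthread_mutex_unlock(&lock);
    uint64_t one = 1;
    write(w.efd, &one, sizeof one);    /* ...and wake the worker */
    pthread_join(t, NULL);
    int ok = (w.adopted_fd == fds[0]) ? 0 : -1;
    close(fds[0]);
    close(fds[1]);
    close(w.efd);
    return ok;
}
```

In a real event loop the worker would not block in `read`; the eventfd is just one more pollable fd in its epoll set.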

@ghost
Author

ghost commented Dec 26, 2018

You might want something like an on_opening event where you get to transfer the FD to another thread early, before any SSL work is done; essentially it would be called immediately after accept. Then on_open is called on the slave thread. Simple.

@ghost ghost changed the title Threading examples & features Slave & master threading support, us_socket_context_adopt and co. Dec 26, 2018
@ghost
Author

ghost commented Dec 29, 2018

us_socket_context_on_distribute: a callback that returns the us_socket_context where on_open should happen.
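As a sketch, the proposed callback might be registered and implemented like this. Again pseudocode: `us_socket_context_on_distribute` is the proposal itself and `least_loaded_context` is an illustrative name.

```c
/* Hypothetical sketch -- proposed API, not current uSockets. */
struct us_socket_context *on_distribute(struct us_socket_context *listen_ctx) {
    /* called right after accept; return the (per-thread) context
       whose loop should receive the on_open event */
    return least_loaded_context();
}
```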

@ghost
Author

ghost commented Feb 8, 2021

It already works well enough. And if anything, all you need is a way to stop listening on the context with the most connections. That's all, really.

And this can be done from any thread, so any thread can stop polling for accept for some other thread. In short - no need for a master.

@ghost
Author

ghost commented Feb 8, 2021

If you have 4 CPUs, then the 3 with the fewest connections should poll for accept. Whenever you accept, check whether you are the one with the most connections; if so, stop polling for accept and start polling for accept on the one that previously did not. It's that simple, and you can set the granularity to 50 so that the switching does not trigger all the time.

And if you have 16 CPUs, then maybe it is enough to poll for accept on only 8 cores. Same rule, only with a limit.
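The rotation rule above can be written as a small pure function (the `rebalance` name and array representation are just for illustration; a real implementation would act on us_socket_contexts):

```c
#include <assert.h>

/* After every accept (and close), decide which contexts poll for accept:
 * with n contexts, the busiest poller hands its listening duty to the
 * idlest non-poller. `granularity` (e.g. 50) adds hysteresis so the
 * switch does not flap on every single accept. */
void rebalance(const int *conns, int *polls, int n, int granularity) {
    int max_i = -1, min_i = -1;
    for (int i = 0; i < n; i++) {
        /* busiest context that is currently polling */
        if (polls[i] && (max_i < 0 || conns[i] > conns[max_i])) max_i = i;
        /* idlest context that is currently not polling */
        if (!polls[i] && (min_i < 0 || conns[i] < conns[min_i])) min_i = i;
    }
    if (max_i < 0 || min_i < 0) return;                    /* nothing to rotate */
    if (conns[max_i] - conns[min_i] < granularity) return; /* hysteresis */
    polls[max_i] = 0; /* the busiest poller stops accepting    */
    polls[min_i] = 1; /* the idlest non-poller takes its place */
}

/* 4 contexts, 3 polling: context 0 is busiest, context 3 idle and not
 * polling, spread 80 >= granularity 50, so they swap roles. */
int demo_rebalance(void) {
    int conns[4] = {120, 10, 30, 40};
    int polls[4] = {1, 1, 1, 0};
    rebalance(conns, polls, 4, 50);
    return (polls[0] == 0 && polls[1] == 1 && polls[2] == 1 && polls[3] == 1)
               ? 0 : -1;
}
```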

@ghost
Author

ghost commented Feb 8, 2021

And this switching should simply trigger on every accept and every close. That's it

@ghost
Author

ghost commented Feb 8, 2021

Fixing this should also enable the support on macOS (and maybe Windows). Once we have this enforcement, all platforms should work with it.

@ghost
Author

ghost commented Feb 8, 2021

I forgot libuv is made by ogres and is not thread safe even though the underlying kernel is 🤦

So the rule has to be triggered by every thread, on their 4-second timers:

  1. wait for 4-second timeout
  2. lock a common list shared by all threads
  3. if we hold a higher metric than the other threads AND there is at least one other poller, WE stop polling for accept
  4. if we hold a lower metric than the other threads, WE start polling for accept
  5. unlock list
  6. goto 1

With this, the metric can be the number of connections OR any metric returned by a callback (such as memory usage, CPU-time usage, etc.). This callback will ship with the number of connections as the default metric, but it should be possible to override.

This way load balancing is "re-routed" every 4 seconds, which is not so often as to cause extra load, yet not so infrequent as to be irrelevant. It is just right.
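The six steps above can be sketched as the body of each thread's timer callback (minus the 4-second wait, so a single iteration is checkable; the `balancer`/`on_timer` names and the fixed-size arrays are assumptions for the sketch):

```c
#include <pthread.h>

/* Shared "common list" of per-thread samples. metric() defaults to the
 * connection count but is meant to be overridable (memory, CPU time...). */
struct balancer {
    pthread_mutex_t lock;
    int n;            /* number of threads               */
    long metric[8];   /* latest sample per thread        */
    int polling[8];   /* is thread i polling for accept? */
};

/* Steps 2-5 of the rule, run by thread `me` on its 4-second timeout. */
void on_timer(struct balancer *b, int me, long my_metric) {
    pthread_mutex_lock(&b->lock);                 /* 2. lock the common list */
    b->metric[me] = my_metric;
    int highest = 1, lowest = 1, other_pollers = 0;
    for (int i = 0; i < b->n; i++) {
        if (b->metric[i] > b->metric[me]) highest = 0;
        if (b->metric[i] < b->metric[me]) lowest = 0;
        if (i != me && b->polling[i]) other_pollers++;
    }
    if (highest && other_pollers >= 1)
        b->polling[me] = 0;                       /* 3. busiest stops accepting */
    else if (lowest)
        b->polling[me] = 1;                       /* 4. idlest keeps/starts accepting */
    pthread_mutex_unlock(&b->lock);               /* 5. unlock */
}

/* Thread 0 reports the highest metric and stops polling; thread 1 is
 * neither highest nor lowest and keeps its state. */
int demo_timer_rule(void) {
    struct balancer b = { PTHREAD_MUTEX_INITIALIZER, 3, {0, 0, 0}, {1, 1, 1} };
    on_timer(&b, 0, 100);
    on_timer(&b, 1, 10);
    return (b.polling[0] == 0 && b.polling[1] == 1 && b.polling[2] == 1)
               ? 0 : -1;
}
```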

@ghost ghost changed the title Slave & master threading support, us_socket_context_adopt and co. us_loop_on_sample_load and better multithreading Feb 8, 2021
@ghost
Author

ghost commented Feb 8, 2021

Really, you can just make it:

us_loop_on_sample_load as a callback that returns the time from CLOCK_THREAD_CPUTIME_ID, sampled since the last timeout. Then you have load balancing based on actual per-thread CPU-time usage, which is probably the most accurate metric.
