Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifold-scheduler-pool can be a bottleneck on higher loads #479

alexander-yakushev opened this issue Feb 8, 2019 · 1 comment


Copy link

@alexander-yakushev alexander-yakushev commented Feb 8, 2019

Feel free to mark this as a minor issue, and I don't have a generic solution right now, just throwing around some ideas.

manifold-scheduler-pool that is used whenever manifold.time/in is used (and transitively for all manifold.deferred/timeout!) is based on Java's ScheduledThreadPoolExecutor. That executor uses blocking queues and is not terribly efficient when facing 100k+/sec scheduling rate and grows a sizeable tail in the queue. Aleph is quite trigger-happy about creating timeouts, especially when you use a client and set :pool-timeout, :connection-timeout, and the other similar parameters.

The immediate solution I came up with is replacing the default scheduler with Netty's HashedTimerWheel, which uses JCTools' lockless MPSC queue under the hood. No extra dependencies are needed since Netty is already there; however, this approach cannot be blindly applied to Manifold itself as people might use it without Aleph (and Netty). Perhaps, we could detect on pool initialization whether Netty is present and create a more efficient executor then? I don't know.

Anyway, here's a hack for those who might run into a similar problem:

(def ^:private hashed-timer-clock
  "A replacement for the default manifold.time clock that is used for scheduling
  timeouts. We do many timeouts, need something faster. Netty's HashedWheelTimer
  is a good replacement, based on JCTools' MPSC lockless queue."
  (let [;; 10 milliseconds is the minimal resolution - should be enough.
        timer (HashedWheelTimer.
               (thread-factory (fn [] "manifold-timeout-scheduler"))
               10 TimeUnit/MILLISECONDS 1024)
        ;; We also create a regular ScheduledExecutor to serve IClock's `every`
        ;; functionality since WheelTimer doesn't do it. Yet, those periodic
        ;; schedulings are less frequent, so we can afford using the default.
        periodic-clock (mtime/scheduled-executor->clock
                         (thread-factory (fn [] "manifold-periodic-scheduler"))))]
    (reify IClock
      (in [_ interval f]
        (.newTimeout timer (reify TimerTask (run [_ _] (f)))
                     interval TimeUnit/MILLISECONDS))
      (every [_ delay period f]
        (.every ^IClock periodic-clock delay period f)))))

;; And now, hack into the root of a dynamic var. Kids, don't try this at home.
(alter-var-root #'mtime/*clock* (constantly hashed-timer-clock))

This comment has been minimized.

Copy link

@kachayev kachayev commented Feb 8, 2019

I've been looking into HashedWheelTimer for quite some time to replace manifold timeouts. I did some experiments using AsyncHttpClient as an example.

Netty does not use this timer internally tho', it turns out that in practice it's better to leverage the same tasks queue that's used for all I/O tasks. I think that it would be a decent improvement to move connection/request etc timeouts to Netty's executor. The same way it's done for WebSocket handshake timeout handling, here.

The harder part is testing & benchmarking. If you're volunteering to measure performance improvements on 100k+/sec, - I'm more than glad to help with the implementation.

Regarding Manifold itself, I assume that the scheduler there aims to be a general purpose scheduler. It's hard to make a general purpose implementation that works well in I/O bounded context, tailored solution might be ar order of magnitude better. So, I would say that "better peformance for Aleph timeouts" and "better Manifold scheduler" are 2 separate tasks. Having a scheduler with a single thread to execute callbacks might be tough not only from performance/thoughtput point of view, one of the problems we've spotted a while ago in our projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
2 participants
You can’t perform that action at this time.