Dispatch

Dispatch is an alternative highly-concurrent library for Java, providing a set of dispatchers for parallelization of work on multicore-CPU architectures.

For a more thorough understanding, please see this post

Author: Ivan Voroshilin

email: vibneiro@gmail.com

## How to build:

git clone https://github.com/vibneiro/dispatching.git
cd dispatching
mvn clean package

The jar files are located under:

  • \dispatching\dispatch-java-7\target\dispatch-7.1.0-SNAPSHOT.jar
  • \dispatching\dispatch-java-8\target\dispatch-8.1.0-SNAPSHOT.jar

See the test examples to get started.

## Dispatchers

### Dispatcher.java

The main interface implemented by all dispatchers.

Each task has a corresponding dispatchId. Tasks with the same dispatchId are processed sequentially (synchronously), which allows tasks to be executed in order.

All dispatchers have an option to schedule tasks on an Executor of your choice; by default, ForkJoinPool is used.
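To make the contract concrete, here is a minimal sketch of what such a dispatcher interface might look like. The names `DispatcherSketch`, `dispatch` and `setExecutor` are assumptions for illustration, not the actual Dispatcher.java API:

```java
import java.util.concurrent.Executor;

// Hypothetical sketch of the dispatcher contract (names are assumptions,
// not the actual Dispatcher.java API).
public interface DispatcherSketch {

    // Submit a task; tasks sharing the same dispatchId run in FIFO order,
    // while tasks with different dispatchIds may run in parallel.
    void dispatch(String dispatchId, Runnable task);

    // Optionally supply a custom Executor; ForkJoinPool is the default.
    void setExecutor(Executor executor);
}
```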

Types of dispatchers

### WorkStealingDispatcher.java

When to use:

  1. Unbalanced tasks cause inefficient CPU utilization; the goal is to use CPU cores more efficiently.
  2. Tasks are not blocked by I/O and are reasonably small to process. This comes in handy especially for event-driven asynchronous processing.

Algorithm:

The main idea of this dispatcher is to decouple the queue from the worker thread; FIFO semantics are retained for tasks with the same dispatchId. Any free thread can pick up a task for execution.

For tasks that differ in execution time, some dispatch queues might be more active than others, causing an unfair balance among workers (threads). Even for equal tasks, this scales much better than the standard Executors, as the benchmark results below show. For this reason, ForkJoinPool is used under the hood by default: work is spread out more efficiently by virtue of work-stealing and reduced contention compared to the standard Executor implementations.

Pruning of the map happens only for entries whose futures have completed, and is done atomically upon reaching cache capacity via WeakReference values. tryLock is used for optimistic cache eviction (an idea derived from the Guava/Caffeine projects).
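For illustration, here is a minimal sketch of the optimistic-eviction idea, assuming a ConcurrentHashMap of per-dispatchId futures and a hypothetical `MAX_CAPACITY` threshold. It only shows the tryLock plus completed-futures pruning; the WeakReference part and the library's actual bookkeeping are omitted:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of optimistic eviction: at most one thread prunes
// entries whose futures have already completed; threads that fail to
// acquire the lock simply skip eviction instead of blocking.
public class EvictionSketch {

    private static final int MAX_CAPACITY = 32_768; // assumed capacity

    private final ConcurrentMap<String, CompletableFuture<Void>> cache = new ConcurrentHashMap<>();
    private final ReentrantLock evictionLock = new ReentrantLock();

    void maybeEvict() {
        if (cache.size() < MAX_CAPACITY || !evictionLock.tryLock()) {
            return; // no need to evict yet, or another thread is already evicting
        }
        try {
            Iterator<Map.Entry<String, CompletableFuture<Void>>> it = cache.entrySet().iterator();
            while (it.hasNext()) {
                if (it.next().getValue().isDone()) {
                    it.remove(); // only completed chains are safe to drop
                }
            }
        } finally {
            evictionLock.unlock();
        }
    }
}
```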

There are 2 versions of this dispatcher; performance differs significantly, favoring the JDK 8 enhancements:

  • JDK 7 and later: based on Guava's ListenableFuture.
  • JDK 8 and later: based on CompletableFuture.
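To illustrate the core chaining idea of the JDK 8 flavor, here is a hedged sketch that chains tasks per dispatchId on CompletableFuture over a ForkJoinPool. The class and method names (`WorkStealingSketch`, `dispatch`) are assumptions for illustration, not the library's actual code, and pruning of completed chains (see the eviction sketch above) is omitted:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: tasks with the same dispatchId are chained onto one
// CompletableFuture, so they run in FIFO order, while any ForkJoinPool worker
// may steal and execute the next stage.
public class WorkStealingSketch {

    private final ConcurrentMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor executor = ForkJoinPool.commonPool();

    public CompletableFuture<Void> dispatch(String dispatchId, Runnable task) {
        // Atomically append the task to the tail of the per-id chain.
        return tails.compute(dispatchId, (id, tail) ->
                tail == null
                        ? CompletableFuture.runAsync(task, executor)
                        : tail.thenRunAsync(task, executor));
    }

    public static void main(String[] args) {
        WorkStealingSketch dispatcher = new WorkStealingSketch();
        dispatcher.dispatch("user-42", () -> System.out.println("first"));
        dispatcher.dispatch("user-42", () -> System.out.println("second")) // runs after "first"
                  .join();
    }
}
```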

### ThreadBoundDispatcher.java

When to use:

  1. Each dispatchId must be strictly pinned to a particular thread. This comes in handy for low-latency systems where context switching is unacceptable (CPU affinity can additionally be exploited).
  2. Tasks must not differ much in computation size.

Algorithm: Each dispatchId is strictly pinned to its own thread. Each worker thread has a separate ConcurrentBlockingQueue and processes tasks in FIFO order.
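A minimal sketch of this pinning scheme follows, using hypothetical names (`ThreadBoundSketch`, `dispatch`) and a plain LinkedBlockingQueue instead of the library's queue implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of thread-bound dispatching: the dispatchId's hash picks
// a worker, and each worker drains its own FIFO queue on a dedicated thread.
public class ThreadBoundSketch {

    private final BlockingQueue<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    public ThreadBoundSketch(int workers) {
        queues = new BlockingQueue[workers];
        for (int i = 0; i < workers; i++) {
            BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
            queues[i] = queue;
            Thread worker = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        queue.take().run();   // FIFO order per worker
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "dispatch-worker-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void dispatch(String dispatchId, Runnable task) {
        // The same dispatchId always hashes to the same worker thread.
        int index = Math.floorMod(dispatchId.hashCode(), queues.length);
        queues[index].offer(task);
    }
}
```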

## MicroBenchmarks

Benchmarks were written with the JMH framework for JDK 7 and 8 separately and run on an iMac with a Core i5 CPU @ 2.50GHz (4 cores) and 8 GB RAM, running OS X Yosemite. An empty synthetic Runnable task is used to mitigate side effects.

Source-code for JDK7 Benchmarks

Source-code for JDK8 Benchmarks

Benchmark mode: Throughput, ops/time

3 test-cases:

  1. A single dispatch queue: new tasks are always put to the same dispatchId.
  2. Counting dispatchId: a one-off queue of size 1 per task, since the dispatchId is always incremented by 1.
  3. A randomly filled set of queues with size = 32768. TODO: try 1024

The following params are used for JMH benchmarking:

  • { Bounded, Unbounded } caches;
    • Purpose: analyze the impact of eviction time on the overall performance.
  • 2 types of ExecutorService { ThreadPoolExecutor, ForkJoinPool };
    • Purpose: analyze the impact of 2 different executors on throughput.
  • 32 user threads for all 3 tests;
    • Purpose: analyze contention impact on concurrent data-structures.
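As an illustration of how one of these cases could be expressed, here is a hedged JMH skeleton for the "counting dispatchId" scenario, reusing the hypothetical `WorkStealingSketch` from above; the real benchmark sources are linked earlier:

```java
import java.util.concurrent.atomic.AtomicLong;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

// Hypothetical JMH skeleton for the "counting dispatchId" case: every call
// uses a fresh dispatchId, so each task gets its own one-off queue.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Threads(32) // 32 user threads, as in the benchmarks above
public class CountingDispatchIdBenchmark {

    private final AtomicLong counter = new AtomicLong();
    private final WorkStealingSketch dispatcher = new WorkStealingSketch(); // hypothetical sketch from above

    @Benchmark
    public void dispatchEmptyTask() {
        // An empty Runnable keeps the measurement focused on dispatch overhead.
        dispatcher.dispatch(Long.toString(counter.incrementAndGet()), () -> { });
    }
}
```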

## How to run the benchmark

(For JDK 7, just replace the digit 8 with 7 where appropriate in the commands below):

git clone https://github.com/vibneiro/dispatching.git
cd dispatching
mvn clean package
cd benchmarks-java-8
  • CaffeinedDispatcherBenchmark:
java -server -Xms5G -Xmx5G -jar target/benchmarks-java-8.jar CaffeinedDispatcherBenchmark -p cacheType="Bounded, Unbounded" -wi 5 -i 5
  • WorkStealingDispatcherBenchmark:
java -server -Xms5G -Xmx5G -jar target/benchmarks-java-8.jar WorkStealingDispatcherBenchmark -p cacheType="Unbounded" -p threadPoolType="ForkJoinPool,FixedThreadPool" -wi 5 -i 5
  • ThreadBoundHashDispatcher:
java -server -Xms5G -Xmx5G -jar target/benchmarks-java-8.jar ThreadBoundHashDispatcherBenchmark -wi 10 -i 5

Benchmark graphs:

Important note: As can be seen, after the significant updates in Java 8 (including the ConcurrentHashMap changes), ForkJoinPool scales much better than in JDK 7.

#### JDK 8: 1.8.0_45

#### Bounded Caching:

Random dispatchIds from a fixed set

Single dispatchId

Unique dispatchIds

#### Unbounded Caching:

Random dispatchIds from a fixed set

Single dispatchId

Unique dispatchIds

#### JDK 7: jdk1.7.0_71

#### Unbounded Caching:

Random dispatchIds from a fixed set

Single dispatchId

Unique dispatchIds
