refactor: rewrite section
npepinpe committed Jan 21, 2024
1 parent 4284ff6 commit dd013f3
Showing 1 changed file with 14 additions and 4 deletions.
18 changes: 14 additions & 4 deletions chaos-days/blog/2024-01-19-Job-Activation-Latency/index.md
Original file line number Diff line number Diff line change
@@ -54,7 +54,6 @@ Grossly simplified, the implementation worked like this:
- Whenever jobs are received from a partition, it forwards them to the client
- When all partitions are exhausted, or the maximum number of jobs have been activated, the request is closed
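The polling flow above can be sketched as a minimal Java model of how a gateway might serve a single activation request. This is an illustration under stated assumptions, not Zeebe's code: `Partition`, `PollingGateway`, and `activateJobs` are hypothetical names, jobs are plain strings, and partitions are simply visited in list order.

```java
import java.util.List;
import java.util.function.Consumer;

class PollingDemo {
    public static void main(String[] args) {
        // Two hypothetical partitions: one holds two jobs, the other one job.
        Partition a = (type, max) -> List.of("a1", "a2").subList(0, Math.min(2, max));
        Partition b = (type, max) -> List.of("b1").subList(0, Math.min(1, max));
        PollingGateway gateway = new PollingGateway(List.of(a, b));

        int count = gateway.activateJobs("payment", 2, job -> System.out.println("client got " + job));
        System.out.println("activated " + count);
    }
}

// Hypothetical stand-in for a broker partition; not Zeebe's actual API.
interface Partition {
    // Activates up to maxJobs available jobs of the given type and returns them.
    List<String> activate(String jobType, int maxJobs);
}

class PollingGateway {
    private final List<Partition> partitions;

    PollingGateway(List<Partition> partitions) {
        this.partitions = partitions;
    }

    // Serves one activation request. Each partition is asked in turn (one
    // gateway-to-broker request each); whatever it returns is forwarded to the
    // client immediately. The request closes once maxJobs jobs are activated
    // or every partition has been asked. Returns the number of jobs activated.
    int activateJobs(String jobType, int maxJobs, Consumer<String> client) {
        int activated = 0;
        for (Partition partition : partitions) {
            if (activated >= maxJobs) {
                break; // maximum reached: no need to ask the remaining partitions
            }
            List<String> jobs = partition.activate(jobType, maxJobs - activated);
            jobs.forEach(client); // forward as soon as this partition responds
            activated += jobs.size();
        }
        return activated; // partitions exhausted or maximum reached: request closed
    }
}
```

Each loop iteration stands for one gateway-to-broker round trip, which is where the per-request delay noted in the next paragraph accumulates.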


Already we can infer certain performance bottle necks based on the following:

- Every request, whether client to gateway or gateway to broker, adds delay to the activation latency
@@ -93,8 +92,21 @@ However, there are still some issues:

In order to solve these issues, the team decided to implement [a push-based approach to job activation](https://github.com/camunda/zeebe/issues/11231).

Essentially, we added a new `StreamActivatedJobs` RPC to our gRPC protocol, a so-called [server streaming RPC](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc). In our case, the stream is meant to be long-lived: the call completes only when the client terminates it or when the server shuts down.

The stream itself has the following lifecycle:

![Job push](./job-push.png)

- The client initiates the stream by sending a job activation request, much like with the `ActivateJobs` RPC.
  - Since the stream is meant to be long-lived, however, there is no upper bound on the number of jobs to activate.
- The gateway registers the new stream with all brokers in the cluster.
  - Note that there is no direct connection between brokers and the client; the gateway acts as a proxy for the client.
- When jobs are available for activation (e.g. on creation, on timeout, on backoff, etc.), the broker activates the job and pushes it to the gateway.
- The gateway forwards the job to the client.
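The lifecycle above can be sketched as an in-memory Java model. This is an illustration under simplifying assumptions, not the actual gateway or broker code: `Broker`, `Gateway`, `GatewayStream`, `jobAvailable`, `closeStream`, and `shutdown` are hypothetical names, there is no networking, and a broker simply pushes to the first registered stream for a job type instead of applying any real selection strategy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class PushDemo {
    public static void main(String[] args) {
        Broker broker1 = new Broker();
        Broker broker2 = new Broker();
        Gateway gateway = new Gateway(List.of(broker1, broker2));
        // The client opens one long-lived stream; the gateway registers it with every broker.
        GatewayStream stream = gateway.openStream("payment", job -> System.out.println("client got " + job));
        broker1.jobAvailable("payment", "job-1");  // pushed right away
        broker2.jobAvailable("payment", "job-2");  // pushed right away, from another broker
        broker1.jobAvailable("shipping", "job-3"); // no stream for this type: not pushed
        gateway.closeStream(stream);               // client terminates the stream
        broker1.jobAvailable("payment", "job-4");  // nobody listening anymore: not pushed
    }
}

// Gateway-side handle for one client stream (hypothetical). Brokers only talk
// to this proxy; they have no direct connection to the client.
class GatewayStream {
    final String jobType;
    private final Consumer<String> client;

    GatewayStream(String jobType, Consumer<String> client) {
        this.jobType = jobType;
        this.client = client;
    }

    // Called by a broker; the gateway forwards the job to the client.
    void push(String job) {
        client.accept(job);
    }
}

class Broker {
    // Streams registered by gateways, grouped by job type.
    private final Map<String, List<GatewayStream>> streams = new HashMap<>();

    void register(GatewayStream stream) {
        streams.computeIfAbsent(stream.jobType, type -> new ArrayList<>()).add(stream);
    }

    void unregister(GatewayStream stream) {
        List<GatewayStream> forType = streams.get(stream.jobType);
        if (forType != null) {
            forType.remove(stream);
        }
    }

    // Called whenever a job becomes activatable (creation, timeout, backoff, ...).
    // Activates and pushes the job if a stream exists; returns whether it did.
    // Simplification: always picks the first registered stream for the type.
    boolean jobAvailable(String jobType, String job) {
        List<GatewayStream> forType = streams.getOrDefault(jobType, List.of());
        if (forType.isEmpty()) {
            return false; // nobody is listening: the job stays available
        }
        forType.get(0).push(job);
        return true;
    }
}

class Gateway {
    private final List<Broker> brokers;
    private final List<GatewayStream> openStreams = new ArrayList<>();

    Gateway(List<Broker> brokers) {
        this.brokers = brokers;
    }

    // Opens a long-lived stream with no upper bound on jobs and registers it with every broker.
    GatewayStream openStream(String jobType, Consumer<String> client) {
        GatewayStream stream = new GatewayStream(jobType, client);
        brokers.forEach(broker -> broker.register(stream));
        openStreams.add(stream);
        return stream;
    }

    // Completes a stream, e.g. because the client terminated it.
    void closeStream(GatewayStream stream) {
        brokers.forEach(broker -> broker.unregister(stream));
        openStreams.remove(stream);
    }

    // When the server shuts down, every remaining stream is completed.
    void shutdown() {
        for (GatewayStream stream : List.copyOf(openStreams)) {
            closeStream(stream);
        }
    }
}
```

Nothing happens in this model while no job becomes available: an open but idle stream issues no requests at all, in contrast to the polling sketch where every activation request fans out to the partitions.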

[You can read more about the implementation in our docs.](https://docs.camunda.io/docs/components/concepts/job-workers/#how-it-works)

This solved most, if not all, of the problems listed above:

- Brokers push jobs out immediately as they become available, removing the need for a gateway-to-broker request.
@@ -104,11 +116,9 @@ This solved most, if not all, of the problems listed above:
- Scaling out your clients adds little to no load to the system, as idle clients simply do nothing.
- Even with a large number of jobs, in the average case the broker never has to iterate over them; instead, it pushes each job out on creation.

### Implementation

### Tests, results, and comparisons

## Future work and documentation
