
Conversation


@Spikhalskiy Spikhalskiy commented Sep 25, 2022

Why

WorkflowExecutorCache has a global lock (shared by all workflow task processors) that comes into play especially when forced evictions from the cache are needed.
The only purpose of this lock is to update the cache and the inProcessing collection in a thread-safe manner.
The cache's inProcessing collection is not needed: it duplicates the functionality and state of WorkflowRunLockManager, which already tracks all RunIds currently in flight on the worker and does so efficiently in a highly concurrent environment.

What

This PR removes the global lock and makes the cache use the efficient WorkflowRunLockManager as the source of truth for in-flight RunIds.
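
For illustration, a minimal sketch of the kind of ref-counted, ConcurrentHashMap-backed per-run locking that WorkflowRunLockManager provides. The method names and the internals of RefCountedLock below are assumptions for the sketch, not the actual SDK implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: method names and RefCountedLock internals are assumptions,
// not the actual SDK implementation.
final class WorkflowRunLockManagerSketch {
  private static final class RefCountedLock {
    final ReentrantLock lock = new ReentrantLock();
    int refCount; // mutated only inside compute closures for this key
  }

  private final ConcurrentHashMap<String, RefCountedLock> runIdLock = new ConcurrentHashMap<>();

  /** Registers interest in runId and blocks until its per-run lock is acquired. */
  void lock(String runId) {
    RefCountedLock entry =
        runIdLock.compute(
            runId,
            (id, existing) -> {
              RefCountedLock l = (existing == null) ? new RefCountedLock() : existing;
              l.refCount++; // side effect: relies on ConcurrentHashMap.compute running the closure once
              return l;
            });
    entry.lock.lock();
  }

  /** Releases the per-run lock and drops the map entry once no task references it. */
  void unlock(String runId) {
    runIdLock.compute(
        runId,
        (id, existing) -> {
          existing.lock.unlock();
          return (--existing.refCount == 0) ? null : existing;
        });
  }

  /** The map itself is the registry of in-flight RunIds; no separate inProcessing set is needed. */
  boolean isInFlight(String runId) {
    return runIdLock.containsKey(runId);
  }
}
```

The point of the change is visible in the last method: the lock manager's map already knows which RunIds are in flight, so the cache no longer needs its own inProcessing set or a global lock to guard it.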

@Spikhalskiy Spikhalskiy force-pushed the rework-workflow-cache-1 branch from 2623657 to 08915f9 on September 25, 2022, 16:47
@Spikhalskiy Spikhalskiy (Contributor, Author) commented:

Bypassing review requirements to get this PR into testing. Code review will be done retroactively, and comments will be addressed in a separate PR before the next release of the Java SDK.

@Spikhalskiy Spikhalskiy merged commit e620a9f into temporalio:master Sep 25, 2022
@Spikhalskiy Spikhalskiy deleted the rework-workflow-cache-1 branch September 25, 2022 17:35
@cretz cretz (Member) left a comment:

LGTM


- final class WorkflowRunLockManager {
+ public final class WorkflowRunLockManager {
    private final Map<String, RefCountedLock> runIdLock = new ConcurrentHashMap<>();
A reviewer (Member) commented on the runIdLock declaration:
I know it's not changed in this PR, but I would change the declared type to ConcurrentMap for documentation/expectation purposes. It confused me at first read.

In fact, after review, maybe the documented type here should be ConcurrentHashMap, or at least the expectations around compute should be documented: Map.compute obviously isn't thread-safe, and ConcurrentMap.compute can run the closure multiple times (which is bad for us because the closure has side effects there). But ConcurrentHashMap.compute seems to guarantee a single execution.
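
A hypothetical illustration of that point (the class, field, and method names below are invented for the example, not the SDK's code): declaring the field as ConcurrentHashMap, rather than Map or ConcurrentMap, documents that the code relies on compute invoking its side-effecting remapping function atomically and at most once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example, not the SDK's actual code.
class ComputeContractExample {
  // Declaring the field as ConcurrentHashMap (rather than Map or ConcurrentMap) documents that
  // the code relies on ConcurrentHashMap.compute, which performs the update atomically and
  // invokes the remapping function at most once per call.
  private final ConcurrentHashMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

  int register(String runId) {
    // The closure has a side effect (incrementing the counter). That is safe here only because
    // ConcurrentHashMap.compute guarantees a single invocation; ConcurrentMap's default compute
    // may retry the function, and Map's default compute is not atomic at all.
    return inFlight
        .compute(runId, (id, counter) -> counter == null ? new AtomicInteger(1) : increment(counter))
        .get();
  }

  private static AtomicInteger increment(AtomicInteger counter) {
    counter.incrementAndGet();
    return counter;
  }
}
```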

@Sushisource Sushisource (Member) left a comment:

Makes sense; Core has per-run locks.

It's a much bigger change, but the bit of code here where multiple WFTs for the same Run ID serialize by waiting on the lock could likely be simplified by stuffing them into a per-run queue, which is roughly how Core operates (it uses streams/channels, but it's the same idea).

That's obviously a bigger change; food for thought.
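
For illustration only, one hypothetical shape of that per-run queue idea in Java; none of these names correspond to actual Core or Java SDK code. Tasks for a run are appended to that run's queue and drained one at a time, so work for a single run serializes without worker threads blocking on a lock:

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical sketch of the per-run queue idea; names are invented for illustration.
final class PerRunTaskQueues {
  private final ConcurrentHashMap<String, ArrayDeque<Runnable>> queues = new ConcurrentHashMap<>();
  private final Executor workerThreads;

  PerRunTaskQueues(Executor workerThreads) {
    this.workerThreads = workerThreads;
  }

  /** Enqueues a workflow task for a run; starts a drainer only if the run was idle. */
  void submit(String runId, Runnable workflowTask) {
    boolean[] wasIdle = new boolean[1];
    queues.compute(
        runId,
        (id, q) -> {
          wasIdle[0] = (q == null);
          ArrayDeque<Runnable> queue = (q == null) ? new ArrayDeque<>() : q;
          queue.add(workflowTask);
          return queue;
        });
    if (wasIdle[0]) {
      workerThreads.execute(() -> drain(runId));
    }
  }

  /** Runs queued tasks for one run, one at a time, then retires when the queue is empty. */
  private void drain(String runId) {
    while (true) {
      Runnable[] next = new Runnable[1];
      // Take the next task but keep the map entry so concurrent submits don't start a second drainer.
      queues.computeIfPresent(
          runId,
          (id, q) -> {
            next[0] = q.poll();
            return q;
          });
      if (next[0] != null) {
        next[0].run();
      }
      boolean[] done = new boolean[1];
      // Atomically retire: remove the entry only if no task arrived while we were running.
      queues.computeIfPresent(
          runId,
          (id, q) -> {
            done[0] = q.isEmpty();
            return done[0] ? null : q;
          });
      if (done[0]) {
        return;
      }
    }
  }
}
```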
