-
Notifications
You must be signed in to change notification settings - Fork 99
💥 C Bridge: Custom slot supplier rework #1025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Appreciate the comments a lot. Only one comment here that's important about if we should panic in an extra spot
drop(unsafe { Arc::from_raw(completion_ctx) }); | ||
true | ||
} | ||
SlotReserveOperationState::Pending => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docstring seems like this should also be a panic? It does seem wrong to call this if the cancel didn't happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it could panic, there is no reason to panic as no invariants have been broken. Safely returning false here may make it easier to make the implementation safe. The other branch panics because it potentially reads from freed memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised the docstring to make the intention clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet!
What was changed
💥 BREAKING CHANGE: Redesigned the code flow around custom slot supplier in C bridge to ensure memory safety in case of cancellation. From Lang's point of view it works mostly the same as before, with the following differences:
user_data
argument (not used by .Net implementation but very useful for other languages).reserve
, thectx
argument is short-lived, andcompletion_ctx
replacessender
.token_source
is removed fromSlotReserveCtx
and functiontemporal_core_set_reserve_cancel_target
is removed.cancel_reserve
takescompletion_ctx
argument instead oftoken_source
.temporal_core_complete_async_reserve
is safe to call after cancellation, making it possible to avoid race condition. This function now returns bool indicating whether the reservation was completed or cancelled.temporal_core_complete_async_cancel_reserve
that has to be called after cancellation to clean up resources.Additionally, implemented
available_slots
functionality, and renamed the callback type aliases to make their relation toCustomSlotSupplierCallbacks
more obvious.There is a matching PR in .Net SDK that implements the new API: temporalio/sdk-dotnet#532
Why?
The previous implementation had memory corruption bugs, see temporalio/sdk-dotnet#458
Checklist
Part of [Bug] Issues with ReserveCtxFromBridge sdk-dotnet#458
How was this tested:
There is no test for the original bug. Due to the nature of this bug, it's impossible to write a test that reliably triggers the problematic conditions. The fix can be verified over time by the lack of transient CI failures in .Net SDK caused by this bug.
General tests of the custom slot supplier exercising these APIs are done inside .Net SDK, see temporalio/sdk-dotnet#532