Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability for certain task failure types to fail workflow #205

Merged
merged 2 commits into from
Mar 21, 2024

Conversation

cretz
Copy link
Member

@cretz cretz commented Mar 18, 2024

What was changed

  • Added TemporalWorkerOptions.WorkflowFailureExceptionTypes to allow configuring failure exception types at a worker level
  • Added WorkflowAttribute.FailureExceptionTypes to allow configuring failure exception types at a workflow level
  • Update core to get Enable failing WF on nondeterminism errors sdk-core#702
  • Abstracted InvalidWorkflowOperationException into separate exceptions for InvalidWorkflowSchedulerException and WorkflowNondeterminismException
    • This exception used to be used for both, but it's abstracted now so the latter can be referenced as a failure type
  • Updated bridge code to include worker-level and workflow-type-level non-determinism-as-fail based on the presence of WorkflowNondeterminismException (or super type) in worker/workflow options
  • Documented in README, API doc, and marked experimental
  • Several tests for several scenarios

Checklist

  1. Closes [Feature Request] Ability for certain task failure types to fail workflow #166

@cretz cretz requested a review from a team March 18, 2024 17:23
/// this exception can be used as a failure exception type to have non-deterministic exceptions
/// fail workflows.
/// </remarks>
public class WorkflowNondeterminismException : InvalidWorkflowOperationException
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Java capitalizes the D here but core does not, open to suggestions

Comment on lines 361 to 365
bool nondeterminism_as_workflow_fail;
/**
* This is expected to be a newline-delimited list
*/
struct ByteArrayRef nondeterminism_as_workflow_fail_for_types;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have these specially for nondeterminism rather than passing through the types and converting that to the Rust types?

If/when we add more types there will need to be two fields for every future type rather than just adding a branch to the converter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a private FFI boundary and I only expose the most I need. I am trying to avoid overcomplicating the FFI code until there is a second type. I would not be surprised if that enum stays as only one item forever, heh, but it's much easier for Rust to build those abstractions than C.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but, you've got to do the conversion either way. It's either the same amount of complicated, or if we ever add anything else, less complicated. You can just pass through an an enum or something, super simple, and less work to extend later.

Copy link
Member Author

@cretz cretz Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more complicated to do C arrays (not "super simple") and I have been lucky to avoid them so far because I can make expectations about how I can split single-value fields. But it sounds like my luck is running out, ug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added string arrays in this commit and removed the newline restriction

opt.nondeterminism_as_workflow_fail_for_types
.to_option_str()
.map(|s| {
s.split('\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make the to map helper for this generic and use it here.

I'm still not a fan of parsing on newlines like this, but it's not hugely important to change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have a map helper for metadata (and it's newline delimited too), but I admit being lazy, because getting collections of strings right in a non-annoying way over FFI boundary requires basically building your own structures (or saying something like \0 delimited) and I was hoping to avoid it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added string arrays in this commit and removed the newline restriction

Comment on lines +227 to +228
/// <see cref="Exceptions.FailureException" /> + cancellation and suspend via task failure
/// all others. But this default may change in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <see cref="Exceptions.FailureException" /> + cancellation and suspend via task failure
/// all others. But this default may change in the future.
/// <see cref="Exceptions.FailureException" />. Other exception types will cause either cancellation or suspension via task failure. This default may change in the future.

Copy link
Member Author

@cretz cretz Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessarily true with how the code is today. I trap cancellation errors and return workflow fail even when cancellation was not requested because of how common they appear and that it's unfair to have already-started timer cancellation be a wf fail but not-yet-started timer cancellation be a task fail.

Comment on lines +60 to +61
/// <see cref="Exceptions.FailureException" /> + cancellation and suspend via task failure
/// all others. But this default may change in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <see cref="Exceptions.FailureException" /> + cancellation and suspend via task failure
/// all others. But this default may change in the future.
/// <see cref="Exceptions.FailureException" />. Other exception types will cause either cancellation or suspension via task failure. This default may change in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment on similar change

Comment on lines 156 to 159
if (name != null && name.Contains('\n'))
{
errs.Add("Workflow name cannot have a newline");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good example why the newline thing scares me.

I mean, it's dumb to have a newline in a workflow name anyway, but, still.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already occurs with metadata today, but I am preventing the newline explicitly. I don't think this use case is worth me building my own C string vec representation. But I can if I must.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well does that really mean we need to build our own? I've got to imagine this comes up fairly often...

And yeah, I know it's happening with metadata but the fact that we'll sometimes have to deal with edge cases like this as we add new things says to me it probably is worth a little bit of effort to just deal with that problem for good.

Copy link
Member Author

@cretz cretz Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C what you have to do is accept a pointer to the first element and then a length. And each element needs to know its size. And with ownership rules over FFI, it means each element needs to remain owned on the lang side. So it's a pointer to an owned array of pointers to owned strings. But this is totally doable, I've just avoided it because I've never needed it and it's ugly/dangerous. Didn't seem worth it just to keep newlines in workflow types, but I will try to add it now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added string arrays in this commit and removed the newline restriction

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Yeah, that looked pretty easy and worth it.

@cretz cretz merged commit 67a52ff into temporalio:main Mar 21, 2024
6 checks passed
@cretz cretz deleted the workflow-failure-exception-types branch March 21, 2024 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Ability for certain task failure types to fail workflow
2 participants