Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MultiOperationExecutionFailure #398

Merged
merged 8 commits into from
Apr 24, 2024

Conversation

stephanos
Copy link
Contributor

@stephanos stephanos commented Apr 19, 2024

What changed?

Added MultiOperationExecutionFailure and updated comments.

Usage can be seen in temporalio/api-go#158

Why?

The current MultiOperation error response isn't easy to parse; by adding this explicit type it can be parsed unamigously.

Breaking changes

Changes the error behavior of the experimental MultiOperation API. (not in use yet)

Server PR

Not needed.

@stephanos stephanos marked this pull request as ready for review April 20, 2024 00:38
@stephanos stephanos requested review from a team as code owners April 20, 2024 00:38
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting we don't merge this until we understand the ramifications of including Google gRPC protos into SDKs separately from gRPC libraries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little worried about the inclusion of this object with our SDKs and how they code gen. Up until now we have only included third party protos that are either part of standard protobuf (so are never gen'd, they come with the protobuf library of that language) or are only used for annotations (so can be ignored at runtime).

To now include a third party proto for use at runtime that also ships with gRPC libraries of languages (some SDKs don't include gRPC dependencies) may prevent some challenges/conflicts (you could get duplicate type registration errors when they do add a gRPC dependency for their own use).

If we want to go this route, I think we need to confirm for every SDK that it does not clash with that language's gRPC library use. An alternative would be to create your own message that mimics the gRPC status message.

Copy link
Contributor Author

@stephanos stephanos Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. What do you think about this alternative:

// One `google.grpc.Status` for each requested operation from the failed MultiOperation. The failed
// operation(s) have the same error details as if it was executed separately. All other operations have the
// status code `Aborted` and `MultiOperationExecutionAborted` is added to the details field.
repeated google.protobuf.Any statuses = 1;

Creating a copy of the gRPC status message, like you suggested, could also work ofc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we create our own status message type that mimics the GRPC one; that will make integrating new languages easier as we don't know if any future language's GRPC library will have this conflict.

It also gives us full control of the object, so we could annotate it with additional details like which operation each failure is for (which the current approach doesn't do, as far as I can tell).

Copy link
Contributor Author

@stephanos stephanos Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

it also gives us full control of the object, so we could annotate it with additional details like which operation each failure is for

My intent was/is to use the exact same Status message so it can be re-used (after a conversion step) by the existing SDK code. Not knowing the other SDKs, at least in Go it unlocks straight-forward code re-use that makes the integration of this error type easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using google.rpc.Status inside an Any has the same problems. People have to have that proto to use it. Some of our SDKs don't have this proto by default and can't have their own version because it'd clash in the global type registry if a user did have gRPC w/ its form of the proto w/ the same qualified name (Any just contains the type URL as a qualified name of the message it usually looks up in a global type registry).

Not knowing the other SDKs, at least in Go it unlocks straight-forward code re-use that makes the integration of this error type easier.

This is one of those (rare) cases where the different SDKs make a difference. The only reason it works in Go is because we depend on https://pkg.go.dev/google.golang.org/grpc which transitively depends on https://pkg.go.dev/google.golang.org/genproto/googleapis/rpc/status. Some other SDKs do not have a representation of the gRPC status proto because they do not depend on the gRPC library in lang (because Rust Core does their gRPC, not the lang themselves).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing the SDK perspective, that's very helpful 👍

Comment on lines 195 to 194
// NOTE: `RpcStatus` is modelled after
// [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto).
//
// The `RpcStatus` type defines a logical error model that is suitable for
// different programming environments, including REST APIs and RPC APIs. It is
// used by [gRPC](https://github.com/grpc). Each `RpcStatus` message contains
// three pieces of data: error code, error message, and error details.
//
// (-- api-linter: core::0146::any=disabled
// aip.dev/not-precedent: detail is meant to hold arbitrary payloads. --)
message RpcStatus {
// The status code, which should be an enum value of
// [google.rpc.Code][google.rpc.Code].
int32 code = 1;

// A developer-facing error message, which should be in English. Any
// user-facing error message should be localized and sent in the
// [google.rpc.Status.details][google.rpc.Status.details] field, or localized
// by the client.
string message = 2;

// A list of messages that carry the error details. There is a common set of
// message types for APIs to use.
repeated google.protobuf.Any details = 3;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating between embedding it in MultiOperationExecutionFailure and a standalone message. Let me know if you think it should be embedded instead. I don't expect re-use of this message, but at the same time, I don't see any MultiOperation-specific needs either.

Copy link
Member

@cretz cretz Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think embedded. It's compatible to move a message later, might as well put this as a nested message in the only place that needs it for now IMO

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we didn't have to have whole gRPC statuses, but I can see no better way

repeated RpcStatus statuses = 1;

// NOTE: `RpcStatus` is modelled after
// [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to copy the docs though, just pointing to that is enough. It's confusing to see some Google proto docs on the fields here. But no big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's helpful feedback 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like word Rpc here. It is "status of the step which is a single operation". OperationStatus looks better to me.

@stephanos stephanos merged commit 269455f into temporalio:master Apr 24, 2024
3 checks passed
@stephanos stephanos deleted the multiops-errors branch April 24, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants