Callbacks as a plugin part 1 #5446

bergundy · 2024-02-21T22:45:59Z

Notes for reviewers

After talking to @yycptt, I decided to go with his approach for state and task staleness checks. That will be detailed in a later PR; for now I'm staying with the original design.
This PR's base is the nexus branch
I don't consider this a final approach, but I do think it's a step in the right direction. We need to model more state machines on top of this to form a more solid API.
This PR is part one of two or maybe three for this refactoring work
This PR does not compile: I cherry-picked the hsm and plugins directories from the WIP sub-state-machines branch.

What changed?

Modified the statemachines abstraction to be a bit more generic
Rewrote callbacks as a plugin using this framework

Why?

This centralizes most of the callback code in the plugin directory instead of having it spread out across the entire project. It moves common concerns such as staleness checks, task generation, and (in the future) replication into a framework, which should generally help speed up feature delivery and improve maintainability.

I plan to leverage this framework when implementing Nexus operations.

How did you test it?

Existing tests from the feature branch and added unit tests.

pdoerner · 2024-02-23T00:59:19Z

service/history/plugins/callbacks/executors/executors.go

+	destination string
+
+	url        string
+	completion nexus.OperationCompletion


I'm a little confused. Is this callback implementation intended to be generic? It seems to be coupled to some Nexus concepts. If I were to add a new callback variant, would I need to define a new executable for that or would I modify this one?

For now this is what we support, we can extend this same code to support more types of callbacks.

pdoerner · 2024-02-23T00:59:22Z

service/history/plugins/callbacks/config.go

+)
+
+// InvocationTaskTimeout is the timeout for executing a single callback invocation task.
+var InvocationTaskTimeout = "plugin.callback.invocation.taskTimeout"


I'm a little nervous about setting a precedent of putting dynamic config keys somewhere other than dynamicconfig/constants. The current organization definitely is not perfect, but I worry that without a defined best practice it will get confusing.

I understand the concern, but I think this is a step in the right direction for our codebase.
I even want the proto definitions for the callback state machine to be included in the plugin directory.

pdoerner · 2024-02-23T00:59:25Z

proto/internal/temporal/server/api/persistence/v1/executions.proto

-message CallbackInfo {
-    // The namespace failover version at the time this callback info was updated.
+// State-machine information.
+message StateMachine {


Maybe StateMachineInfo to keep the naming scheme consistent?

I considered this and that's what I had before changing this name.
I like just calling this state machine; I don't think info adds much. It's also not used in exactly the same way as the other Info messages.

pdoerner · 2024-02-23T00:59:32Z

proto/internal/temporal/server/api/persistence/v1/executions.proto

+    string id = 2;
+    // Namespace failover version on the corresponding state machine object, used for staleness detection when global
+    // namespaces are enabled.
+    int64 version = 3;


Maybe namespace_failover_version to keep the variable name consistent with other places we reference this type of version?

That would be fine with me but in most other places in the codebase it's called version. Maybe we need to change all of the new names to version for consistency?

Ah I didn't realize those other versions were also for namespace failover. I like the more descriptive name, but not too picky.

I'm taking your suggestion.

pdoerner · 2024-02-23T01:01:14Z

service/history/plugins/callbacks/tasks.go

+	return nil, nil
+}
+
+func RegisterTaskSerializer(reg *statemachines.Registry) error {


I'm a little curious why registering a state machine, executor, and task serializer are all separate steps. Is it possible to have one without the others?

Yeah, I considered adding a RegisterPlugin method and having a more "well defined" concept but for now I'd rather have the flexibility before solidifying that interface.
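
To make the trade-off concrete, here is a minimal sketch of what a combined RegisterPlugin helper could look like next to the three separate steps. Everything in it is hypothetical: the Registry type is a toy stand-in rather than the real statemachines.Registry, the method names and string keys are assumptions for illustration, and errors.Join assumes Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is a toy stand-in for a plugin registry (hypothetical, not the
// real statemachines.Registry). It only tracks which names were registered.
type Registry struct {
	machines    map[string]bool
	executors   map[string]bool
	serializers map[string]bool
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		machines:    map[string]bool{},
		executors:   map[string]bool{},
		serializers: map[string]bool{},
	}
}

// registerOnce records name in m and rejects duplicate registrations.
func registerOnce(m map[string]bool, kind, name string) error {
	if m[name] {
		return fmt.Errorf("%s %q already registered", kind, name)
	}
	m[name] = true
	return nil
}

// The three separate steps, each usable on its own.
func (r *Registry) RegisterStateMachine(name string) error {
	return registerOnce(r.machines, "state machine", name)
}

func (r *Registry) RegisterExecutor(name string) error {
	return registerOnce(r.executors, "executor", name)
}

func (r *Registry) RegisterTaskSerializer(name string) error {
	return registerOnce(r.serializers, "task serializer", name)
}

// RegisterPlugin is a hypothetical convenience wrapper that runs all three
// steps and joins any errors. errors.Join returns nil when every step succeeds.
func RegisterPlugin(r *Registry, name string) error {
	return errors.Join(
		r.RegisterStateMachine(name),
		r.RegisterExecutor(name),
		r.RegisterTaskSerializer(name),
	)
}

func main() {
	reg := NewRegistry()
	fmt.Println("first registration error:", RegisterPlugin(reg, "callbacks"))
	fmt.Println("second registration error:", RegisterPlugin(reg, "callbacks"))
}
```

Keeping the three methods public while layering a wrapper on top preserves the flexibility mentioned above: a plugin that only needs a task serializer can still call that step alone.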

MichaelSnowden · 2024-02-22T19:42:51Z

service/history/plugins/callbacks/executors/executors_test.go

+			name: "non-retryable-error",
+			caller: func(r *http.Request) (*http.Response, error) {
+				return &http.Response{StatusCode: 500}, nil
+			},
+			assertOutcome: func(t *testing.T, cb callbacks.Callback) {
+				require.Equal(t, enumspb.CALLBACK_STATE_BACKING_OFF, cb.PublicInfo.State)
+			},


I think the returned error should be a non-retryable 4xx, and the expected state should be something terminally failed.
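
As a rough illustration of the distinction, here is a sketch of one way to classify response status codes. The outcome names and the exact policy are assumptions, not the PR's implementation: this sketch treats 2xx as success, 408 and 429 as retryable, all other 4xx as terminal, and everything else (including 5xx) as retryable.

```go
package main

import (
	"fmt"
	"net/http"
)

// outcome is a hypothetical stand-in for the callback states discussed here.
type outcome string

const (
	outcomeSucceeded  outcome = "SUCCEEDED"
	outcomeBackingOff outcome = "BACKING_OFF" // retryable failure
	outcomeFailed     outcome = "FAILED"      // terminal failure
)

// classify maps an HTTP status code to an outcome.
// Assumed policy: 2xx succeeds; 408 and 429 are retried; other 4xx are
// terminal; anything else (for example 5xx) is retried.
func classify(status int) outcome {
	switch {
	case status >= 200 && status < 300:
		return outcomeSucceeded
	case status == http.StatusRequestTimeout, status == http.StatusTooManyRequests:
		return outcomeBackingOff
	case status >= 400 && status < 500:
		return outcomeFailed
	default:
		return outcomeBackingOff
	}
}

func main() {
	for _, s := range []int{200, 400, 429, 500} {
		fmt.Printf("%d -> %s\n", s, classify(s))
	}
}
```

Under this assumed policy, a test case named non-retryable-error would return something like a 400 and assert the terminal state, while a 500 would belong in a separate retryable case that asserts the backing-off state.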

MichaelSnowden · 2024-02-22T20:52:06Z

service/history/plugins/callbacks/fx/module.go

+var Module = fx.Module(
+	"plugin.callbacks",
+	fx.Provide(callbacks.ConfigProvider),
+	fx.Invoke(callbacks.RegisterTaskSerializer),


FYI, Invoke is variadic in case you want to use that style instead

MichaelSnowden · 2024-02-23T04:42:33Z

service/history/statemachines/tasks.go

+}
+
+// TaskKind represents the possible set of kinds for a task.
+// Each kind is mapped to a concrete [tasks.Task] instance and is backed by specific protobuf message; for example,


It's mapped to a concrete [tasks.Task] implementation, not instance, right?

Yes, thanks.

MichaelSnowden · 2024-02-23T05:11:12Z