
Proto and persistence definitions for replicated versioning #4172

Merged
merged 6 commits into temporalio:worker-versioning from task-queue-user-data on Apr 20, 2023

Conversation

@bergundy (Member):

While designing replication for the worker versioning feature, we realized we needed to make some schema changes and add a new table to track user-provided data for a task queue.
When a new cluster comes online, we'll need to seed it with data from the task_queue_user_data table: the table will be scanned and a replication task will be generated from each entry, roughly as sketched below.
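For illustration, the seeding pass described above amounts to a paginated table scan that emits one replication task per row. A minimal Go sketch follows; the store and queue interfaces, method names, and entry shape are hypothetical stand-ins, not the server's actual persistence API.

```go
package seeding

import "context"

// TaskQueueUserDataEntry mirrors one row of the task_queue_user_data table
// (hypothetical shape for this sketch).
type TaskQueueUserDataEntry struct {
	NamespaceID   string
	TaskQueueName string
	Data          []byte // serialized TaskQueueUserData blob
	Version       int64
}

// UserDataStore and ReplicationQueue are assumed interfaces for this sketch.
type UserDataStore interface {
	// ListTaskQueueUserData returns one page of rows and a token for the next page.
	ListTaskQueueUserData(ctx context.Context, pageToken []byte) ([]TaskQueueUserDataEntry, []byte, error)
}

type ReplicationQueue interface {
	EnqueueUserDataReplicationTask(ctx context.Context, entry TaskQueueUserDataEntry) error
}

// seedReplicationTasks scans the whole table page by page and generates a
// replication task from each entry, as the PR description outlines.
func seedReplicationTasks(ctx context.Context, store UserDataStore, queue ReplicationQueue) error {
	var token []byte
	for {
		entries, next, err := store.ListTaskQueueUserData(ctx, token)
		if err != nil {
			return err
		}
		for _, e := range entries {
			if err := queue.EnqueueUserDataReplicationTask(ctx, e); err != nil {
				return err
			}
		}
		if len(next) == 0 {
			return nil
		}
		token = next
	}
}
```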

@bergundy requested a review from a team as a code owner April 14, 2023 21:54
@bergundy self-assigned this Apr 14, 2023
go.mod Outdated
@@ -142,3 +142,8 @@ require (
modernc.org/strutil v1.1.3 // indirect
modernc.org/token v1.1.0 // indirect
)

replace (
@bergundy (Member, Author):

NOTE that I'm merging this PR into the worker-versioning feature branch; we should keep these replacements until this Go SDK change is published and we can merge the API changes.

@bergundy (Member, Author):

I know I'll need to provide a schema upgrade script for adding these new tables, but for now let's just make sure that the definitions are correct.

// If the request's hash matched, this variant is set (and will be true).
bool matched_req_hash = 2;
temporal.server.api.persistence.v1.TaskQueueUserData user_data = 1;
int64 user_data_version = 2;
@bergundy (Member, Author):

I need to change this to a bool

@bergundy (Member, Author):

I'll remove this file from the PR; I'll add it in a follow-up PR though.


// Container for all persistent user provided data for a task queue.
message TaskQueueUserData {
// Updated every time a new timestamp is generated to ensure the semantics of the Hybrid Logical Clock.
Member:

A bit confusing. "every time a new timestamp is generated" for what purpose?

Something like "Updated whenever any contained data is modified" is more straightforward, unless that's inaccurate in some way I don't follow.

task_queue_name text,
data binary, -- temporal.server.api.persistence.v1.TaskQueueUserData as proto binary blob
version int64, -- Version of this row, used for optimistic concurrency
updated_at timestamp,
Member:

Why do we need the timestamp here if it's in the blob? For sorting or something?

Including the version column, it's duplicative of the HLC in the blob. Might we only have it here, and not need it in the blob?

@bergundy (Member, Author):

It can't really be used for sorting.
I initially wanted to use this when seeding a new cluster with replication tasks, to verify the table isn't updated in the meantime, but I think it could also be useful as general metadata on the table.

Member:

Might we remove it from the blob itself then?

@bergundy (Member, Author):

It's not the same time: here it's a wall clock, and on the struct it's a cluster-local HLC.
Also note that the HLC only progresses when values are generated due to user interaction, not when applying replication events.

Member:

I'm also confused about why there are both and what the function of each one is.

@bergundy (Member, Author):

I'll just leave this out and see if we need it later.
I was thinking we could use this timestamp to verify that no new items are added after we seed the replication queue when a new cluster comes online.
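For context on the version column discussed in this thread: it drives optimistic concurrency, where a write only applies if the row still carries the version the writer originally read. Below is a minimal sketch against a SQL backend using Go's standard database/sql; the table matches this PR's schema, but the function and error value are purely illustrative.

```go
package userdata

import (
	"context"
	"database/sql"
	"errors"
)

// ErrVersionConflict signals that another writer bumped the row's version
// since we read it; the caller should re-read and retry.
var ErrVersionConflict = errors.New("task queue user data was updated concurrently")

// updateUserData overwrites the data blob only if the stored version still
// matches the version the caller read (optimistic concurrency).
func updateUserData(ctx context.Context, db *sql.DB, namespaceID, taskQueueName string, data []byte, readVersion int64) error {
	res, err := db.ExecContext(ctx,
		`UPDATE task_queue_user_data
		 SET data = ?, version = version + 1
		 WHERE namespace_id = ? AND task_queue_name = ? AND version = ?`,
		data, namespaceID, taskQueueName, readVersion)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		// Someone else updated the row since we read it.
		return ErrVersionConflict
	}
	return nil
}
```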

@dnr (Member) left a comment:

there's lots of unrelated stuff in the diff... can you rebase the branch manually (push without a PR) and then rebase this PR on top of that to have only the versioning-related stuff?

temporal.server.api.clock.v1.HybridLogicalClock state_update_timestamp = 3;
}

// An internal representation of https://github.com/temporalio/api/blob/9206f1556469515d8e5a82c4d4bf21f2dd9730ce/temporal/api/taskqueue/v1/message.proto#L92
Member:

write the fully qualified name instead of linking to a random commit?

@bergundy (Member, Author):

Yeah, will do.

// An internal representation of https://github.com/temporalio/api/blob/9206f1556469515d8e5a82c4d4bf21f2dd9730ce/temporal/api/taskqueue/v1/message.proto#L92
message CompatibleVersionSet {
// Set IDs are used internally by matching.
// The first set ID for a given set is the first added build ID.
Member:

no it's not, or at least not directly. just leave this out? it's not required to describe the semantics

@bergundy (Member, Author):

Sure, I'll just leave it out.
I think it should be the first build ID, but let's discuss off PR.

Member:

it can be a hash of the first build id, perhaps. but no need to describe that here

@bergundy (Member, Author):

Removed this line

// HLC timestamp representing when the state was updated or when the build ID was originally inserted.
// (-- api-linter: core::0142::time-field-type=disabled
// aip.dev/not-precedent: Using HLC instead of wall clock. --)
temporal.server.api.clock.v1.HybridLogicalClock state_update_timestamp = 3;
Member:

we don't use these timestamps for anything other than delete, right? should we consider having this be a dual-purpose field: "if absent, id is active. if present, it's deleted with this timestamp." ?

@bergundy (Member, Author):

No, because if a cluster rejects the deletion it may "resurrect" the build ID by setting a newer timestamp and state to ACTIVE.

Member:

ah, okay. I guess you can do it by having two fields for an update timestamp and a delete timestamp, but there's no real advantage of that over this
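To make the merge semantics in this exchange concrete, here is a minimal Go sketch of the implied last-writer-wins rule: the copy of a build ID with the later state_update_timestamp wins, so a newer ACTIVE write can override an older tombstone. The struct shapes and the comparison order (wall clock, then logical counter, then cluster ID as a tiebreaker) are assumptions for illustration, not the server's actual types or merge code.

```go
package versioning

// hlc is a stand-in for temporal.server.api.clock.v1.HybridLogicalClock.
type hlc struct {
	WallClock int64
	Logical   int32
	ClusterID int64
}

// after reports whether a is strictly later than b, comparing wall clock
// first, then the logical counter, then the cluster ID (assumed order).
func (a hlc) after(b hlc) bool {
	if a.WallClock != b.WallClock {
		return a.WallClock > b.WallClock
	}
	if a.Logical != b.Logical {
		return a.Logical > b.Logical
	}
	return a.ClusterID > b.ClusterID
}

type buildIDState int

const (
	stateActive buildIDState = iota
	stateDeleted
)

type buildID struct {
	ID                   string
	State                buildIDState
	StateUpdateTimestamp hlc
}

// mergeBuildID resolves a conflict between local and remote copies of the
// same build ID: the record with the later timestamp wins. This is why a
// bare tombstone can't carry the semantics: a cluster that rejects a
// deletion can write ACTIVE with a newer timestamp and "resurrect" the ID.
func mergeBuildID(local, remote buildID) buildID {
	if remote.StateUpdateTimestamp.after(local.StateUpdateTimestamp) {
		return remote
	}
	return local
}
```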

// Updated whenever user data is directly updated due to a user action but not when applying replication events.
// The clock is referenced when new timestamps are generated to ensure it produces monotonically increasing
// timestamps.
temporal.server.api.clock.v1.HybridLogicalClock clock = 1;
Member:

what exactly is this one used for again? I thought the timestamps within versioning_data are enough to merge

@bergundy (Member, Author):

This keeps the last clock in order to produce a new clock on update.
We need this to ensure that the clock is monotonically increasing in a single-cluster context.

Member:

oh right. when reading the above I was wondering where it stores the last used timestamp to avoid duplicates, but I didn't connect this to that. to be nitpicky: the comment above should say it's the last recorded cluster-local HLC timestamp for this task queue. another task queue might have a later one recorded (due to clock skew), but we never compare them across task queues so it doesn't matter.
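For illustration, a minimal sketch of "producing a new clock on update" as this thread describes it: take the max of the current wall clock and the stored clock, bumping a logical counter when the wall clock hasn't advanced, so timestamps per task queue are strictly increasing. It reuses the hlc struct from the sketch above; milliseconds since the Unix epoch are an assumption here (the units/epoch question comes up later in the review).

```go
package versioning

import "time"

// next produces a timestamp strictly greater than the stored clock, even if
// the wall clock stalled or went backwards since the last user update.
func next(last hlc, clusterID int64) hlc {
	now := time.Now().UnixMilli() // assumed units: milliseconds since the Unix epoch
	if now > last.WallClock {
		return hlc{WallClock: now, Logical: 0, ClusterID: clusterID}
	}
	// Wall clock hasn't moved past the stored value: keep it and bump the
	// logical counter so the new timestamp still compares strictly greater.
	return hlc{WallClock: last.WallClock, Logical: last.Logical + 1, ClusterID: clusterID}
}
```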

CREATE TABLE task_queue_user_data (
namespace_id uuid,
task_queue_name text,
data binary, -- temporal.server.api.persistence.v1.TaskQueueUserData as proto binary blob
Member:

add data_encoding and change the above comment to not say it's a proto

@bergundy (Member, Author):

Did this for the SQL tables and forgot Cassandra, thanks!
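For context on the data/data_encoding pair: readers dispatch on the encoding column instead of assuming the blob is always proto3. A minimal sketch of that read path follows; the generated-package import path and the "Proto3" tag value are assumptions, while proto.Unmarshal is the standard protobuf-go call.

```go
package userdata

import (
	"fmt"

	"google.golang.org/protobuf/proto"

	// Assumed path of the generated Go package for this PR's proto definitions.
	persistencespb "go.temporal.io/server/api/persistence/v1"
)

// decodeUserData interprets the blob according to data_encoding rather than
// hard-coding proto3, which is the point of the review comment above.
func decodeUserData(data []byte, encoding string) (*persistencespb.TaskQueueUserData, error) {
	switch encoding {
	case "Proto3": // assumed encoding tag value
		ud := &persistencespb.TaskQueueUserData{}
		if err := proto.Unmarshal(data, ud); err != nil {
			return nil, err
		}
		return ud, nil
	default:
		return nil, fmt.Errorf("unknown encoding for task queue user data: %q", encoding)
	}
}
```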

@bergundy (Member, Author):

> there's lots of unrelated stuff in the diff... can you rebase the branch manually (push without a PR) and then rebase this PR on top of that to have only the versioning-related stuff?

Yeah, I updated the PR branch and forgot to update the base branch... first time trying this method...


data binary, -- temporal.server.api.persistence.v1.TaskQueueUserData
data_encoding text, -- Encoding type used for serialization, in practice this should always be proto3
version int64, -- Version of this row, used for optimistic concurrency
-- task_queue_name is not a part of the partitioning key to allow iterating all task queues in a single namespace.
Member:

nitpick: even if it was part of the partition key, we could still iterate. it would just be more expensive

Suggested change:
- -- task_queue_name is not a part of the partitioning key to allow iterating all task queues in a single namespace.
+ -- task_queue_name is not a part of the partitioning key to allow cheaply iterating all task queues in a single namespace.

message HybridLogicalClock {
// Wall clock - A single time source MUST guarantee that 2 consecutive timestamps are monotonically non-decreasing.
// e.g. by storing the last wall clock and returning max(gettimeofday(), lastWallClock).
int64 wall_clock = 1;
Member:

do you want to document the units and epoch or does it not matter?

@bergundy (Member, Author):

Addressed all outstanding comments and also made a few schema changes after running integration tests and finding a few flaws.

@bergundy (Member, Author):

I'm merging this.
@dnr, you may want to take a look at the SQL tables; those changed a bit to use VARCHAR(255) for task_queue_name instead of the task_queue_id used in the previous commit.
The other matching tables use task_queue_id, which is a combination of namespace ID, task queue name, and task queue type, but I figured it's better to have separate columns for namespace and task queue.
If there are any comments, I'll address them in the next PR.

@bergundy merged commit 57b6718 into temporalio:worker-versioning Apr 20, 2023
4 of 5 checks passed
@bergundy deleted the task-queue-user-data branch April 20, 2023 05:53
dnr pushed a commit that referenced this pull request May 26, 2023
dnr pushed a commit to dnr/temporal that referenced this pull request May 26, 2023 (…io#4172)
dnr pushed two further commits that referenced this pull request May 26, 2023
Note: these commits came from a feature branch and are not expected to build.