Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix partial namespace creation #4267

Merged

Conversation

samanbarghi
Copy link
Contributor

@samanbarghi samanbarghi commented May 2, 2023

What changed?
Do not return NamespaceAlreadyExists when it is exists in first table but not in second table, a follow up attempt to create the namespace should resolve the problem by adding the missing record to the second table. Alternatively, if the namespace with the same name and ID exists in the second table, return a NamespaceAlreadyExists error.

Also, now that it's possible to get to a point inCreateNamespaceInV2Table to get a row with the same namespace and id as the request, only delete the entry in namespace_by_id only if the Ids do not match.

Fixes an existing bug with checking the id in the returned entry from the namespace table.

Why?
For Cassandra implementation,
Namespace creation put 1 record into namespaces_by_id table, and subsequently another record into namespaces table.
If the first step succeed, but second step failed, we would end up in an unrecoverable situation.
Retry to create namespace will return NamespaceAlreadyExists error, but when trying to access the namespace will return NotFound error.

How did you test it?
Existing tests

Potential risks

Is hotfix candidate?
No

@samanbarghi samanbarghi requested a review from a team as a code owner May 2, 2023 19:56
@samanbarghi samanbarghi force-pushed the bug/fix-partial-namespace-creation branch from 4c39976 to a177c25 Compare May 5, 2023 05:18
common/persistence/cassandra/metadata_store.go Outdated Show resolved Hide resolved

if id, ok := previous["Id"].([]byte); ok {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug here: it should be previous["id"] instead of previous["Id"]. Also, its type is goql.UUID which cannot be casted to []byte. This code path would never trigger, it would fall back on retuning the same error after this which was correct but hides the issue.

if id != request.ID {
// Namespace with the given name already exists with a different ID.
// Delete orphan namespace record before returning back to user
deleteOrphanNamespace()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are calling this method if an entry with the same id and name exists in namespace_by_id table, if we find out that an entry with same name but different id exists, it means a new row was added to namespace_by_id table and should be removed, otherwise having an existing row is expected.

m.logger.Warn("Unable to delete orphan namespace record. Error", tag.Error(errDelete))
}
}

if err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to delete orphan row from namespace_by_id table, since we cannot be sure if it's an orphan row or the corresponding entry exists in namepsaces. This should be cleaned up later if CreateNamespace is called with the same namespace but a different Id if it's an orphan entry.

// If namespace does not exist already and applied is false,
// notification_version does not match our expectations and it's conditional failure.
// Delete orphan namespace record before returning back to user
deleteOrphanNamespace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not required since first table can handle retry now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if retry does not come right away and later the namespace with the same name but different id is added? If that's a possibility, there will be an orphan entry in the first table. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I am not against doing this, mainly want to confirm my understanding. Please feel free to land PR.
  • From guarantee perspective, the deletion can also fail, so there's always a possibility there will be an orphan entry in the first table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, both approaches are fine I guess. I am a fan of cleaning up and leave things as they were in case of error, so let me keep it as is then.

@samanbarghi samanbarghi merged commit 3cbab42 into temporalio:master May 11, 2023
10 checks passed
MichaelSnowden added a commit that referenced this pull request May 12, 2023
MichaelSnowden added a commit that referenced this pull request May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants