[YSQL] Committed transactions lost when the cluster config placement was invalid #16402

Closed
1 task done
FranckPachot opened this issue Mar 12, 2023 · 3 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@FranckPachot
Contributor

FranckPachot commented Mar 12, 2023

Jira Link: DB-5811

Description

If we create a cluster without a valid cluster-level data placement config, which is the default with `yugabyted start`, we cannot create tables and get the error `Not enough tablet servers in the requested placements`.

However, I can set the placement at the tablespace level (for example, all leaders in one region and followers in the other regions) and then create tables and run transactions.

In this case, it seems that the transaction table is not replicated (because of the invalid cluster-level config). This means that if one region fails, we can no longer read data (ERROR: Query error: GetTransactionStatus RPC) until it is back, and if it never comes back, the committed changes are lost.

This is easy to reproduce by following all the steps in https://dev.to/yugabyte/simulate-network-latency-in-a-yugabytedb-cluster-on-a-docker-lab-264a except `yugabyted configure data_placement`. In this case, the COMMIT (END; in PgBench) doesn't wait for the other nodes to acknowledge, and taking down the first node (where the leaders are) makes the committed changes unreadable.
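To illustrate why a single-replica transaction table matters, here is a toy majority-quorum model in Python. It is only a sketch: the region names are made up, and the layout (user tablet spread over three regions by the tablespace, transaction status tablet left with one replica in the first region) is an assumption based on the behavior described above, not something read from the cluster.

```python
def has_quorum(replica_regions, failed_regions):
    """Return True if a strict majority of the tablet's replicas is still alive."""
    alive = sum(1 for region in replica_regions if region not in failed_regions)
    return alive > len(replica_regions) // 2


# Hypothetical layout: the user table is placed by the tablespace in three regions,
# while the transaction status tablet is assumed to have a single replica.
user_tablet = ["region1", "region2", "region3"]
txn_status_tablet = ["region1"]

failed = {"region1"}
print("user tablet available:", has_quorum(user_tablet, failed))
print("transaction status tablet available:", has_quorum(txn_status_tablet, failed))
```

In this model the user data survives the loss of region1, but the transaction status cannot be resolved, which matches the GetTransactionStatus error seen when reading.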

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@nchandrappa
Contributor

@FranckPachot For a multi-zone/multi-region setup using yugabyted, it's mandatory to run the `configure data_placement` command so that the right cluster-level config gets applied. Yugabyted computes the data placement constraint based on the values provided in cloud_location.

Currently, in yugabyted, when a 3rd node joins the cluster, we automatically apply the data placement with the default placement constraint cloud1.datacenter1.rack1 and change the RF to 3. I believe we should update this behavior when a cloud_location value is provided at start.

Proposed change

During the cluster creation, when a 3rd node joins the cluster, apply the data placement constraint based on the cloud_location value.

  • This solves the issue when someone is trying RF-3 on a single zone
  • For multi-zone/multi-region, it's still mandatory to run the `configure data_placement` command
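
One possible reading of the proposed change, sketched in Python. The function name `placement_on_join`, the `Placement` tuple, and the choice to return `None` for multi-zone clusters are all hypothetical and are not yugabyted's actual code; only the default `cloud1.datacenter1.rack1` and the RF of 3 come from the comment above.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

DEFAULT_LOCATION = "cloud1.datacenter1.rack1"  # default constraint mentioned above


class Placement(NamedTuple):
    blocks: Tuple[str, ...]  # distinct cloud.region.zone values
    replication_factor: int


def placement_on_join(cloud_locations: Sequence[Optional[str]]) -> Optional[Placement]:
    """Return the placement to apply automatically once a 3rd node has joined.

    Returns None when nothing should be applied automatically: fewer than
    three nodes, or nodes in several zones (explicit configuration needed).
    """
    if len(cloud_locations) < 3:
        return None
    # Nodes started without cloud_location fall back to the default.
    locations = [loc or DEFAULT_LOCATION for loc in cloud_locations]
    for loc in locations:
        if len(loc.split(".")) != 3:
            raise ValueError(f"expected cloud.region.zone, got {loc!r}")
    zones = tuple(sorted(set(locations)))
    if len(zones) == 1:
        # Single zone: use the zone the user gave instead of the default.
        return Placement(zones, 3)
    # Multi-zone/multi-region: leave it to `yugabyted configure data_placement`.
    return None


print(placement_on_join(["aws.us-east-1.us-east-1a"] * 3))
```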

@yugabyte-ci yugabyte-ci added area/docdb YugabyteDB core features and removed area/ysql Yugabyte SQL (YSQL) labels Mar 28, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Mar 28, 2023
@bmatican bmatican assigned nchandrappa and unassigned es1024 Mar 30, 2023
gargsans-yb added a commit that referenced this issue Apr 4, 2023
Summary:
Changes to set the cluster config including data placement policy and rf to 3 as soon as
the 3rd node joins through start command. `yugabyted configure data_placement` is still necessary
for setting up a multi-zone/region cluster.

Test Plan: Manual Testing

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: sgarg-yb

Differential Revision: https://phabricator.dev.yugabyte.com/D23971
gargsans-yb added a commit that referenced this issue Apr 4, 2023
…he 3rd node joins

Summary:
Changes to set the cluster config including data placement policy and rf to 3 as soon as
the 3rd node joins through start command. `yugabyted configure data_placement` is still necessary
for setting up a multi-zone/region cluster.

Test Plan: Manual Testing

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: sgarg-yb

Differential Revision: https://phabricator.dev.yugabyte.com/D24133
@gargsans-yb
Contributor

Landed the changes on master and backported them to 2.17.3.

premkumr pushed a commit to premkumr/yugabyte-db that referenced this issue Apr 10, 2023
…e joins

Summary:
Changes to set the cluster config including data placement policy and rf to 3 as soon as
the 3rd node joins through start command. `yugabyted configure data_placement` is still necessary
for setting up a multi-zone/region cluster.

Test Plan: Manual Testing

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: sgarg-yb

Differential Revision: https://phabricator.dev.yugabyte.com/D23971
@FranckPachot
Contributor Author

@gargsans-yb I see the changes for yugabyted to avoid this misconfiguration, but shouldn't we raise an error in DocDB when the transaction table cannot be created correctly?
A user may make the same misconfiguration manually and will lose data one day.
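
To make the suggestion concrete, here is a minimal Python sketch of such a check. It is purely illustrative and not DocDB code: `PlacementError`, `check_replication`, and the per-zone tserver counts are hypothetical names and inputs. It only reuses the error text already quoted in the description.

```python
class PlacementError(Exception):
    """Raised when a table cannot get the replicas its placement requires."""


def check_replication(table_name, required_rf, live_tservers_by_zone, placement_zones):
    """Fail loudly instead of creating an under-replicated table."""
    usable = sum(live_tservers_by_zone.get(zone, 0) for zone in placement_zones)
    if usable < required_rf:
        raise PlacementError(
            f"{table_name}: Not enough tablet servers in the requested placements "
            f"(need {required_rf}, found {usable})"
        )
    return usable


# Hypothetical cluster: three nodes in three zones, but the cluster-level
# placement still points at a zone where only one of them lives.
live = {"cloud.region1.zone1": 1, "cloud.region2.zone1": 1, "cloud.region3.zone1": 1}
try:
    check_replication("transaction status table", 3, live, ["cloud.region1.zone1"])
except PlacementError as err:
    print(err)
```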


5 participants