Eventually-consistent reads from a tablet replica without any committed records will time out indefinitely #124

Open
mbautin opened this issue Mar 23, 2018 · 1 comment
Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)

Comments

@mbautin
Collaborator

mbautin commented Mar 23, 2018

Jira Link: DB-1875
Found while investigating #120. To serve a read, we need to obtain a safe time value, and we currently don't allow that safe time to be the minimum possible hybrid time. The replica therefore waits until it either receives the propagated safe time from the leader or learns that an operation with a certain hybrid time has been committed in Raft. On a replica without any committed records, neither of these may ever happen, so the read times out.
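
To make the failure mode concrete, here is a minimal toy model in Python. It is not YugabyteDB code: `FollowerReplica`, `MIN_HYBRID_TIME`, `try_read_safe_time`, and the `allow_min` flag are all hypothetical names invented for this illustration, and hybrid times are modeled as plain integers. It only shows why a replica with no propagated safe time and no known committed operation never gets a usable safe time under the restriction described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the minimum possible hybrid time.
MIN_HYBRID_TIME = 0


@dataclass
class FollowerReplica:
    """Toy model of a follower tablet replica (illustration only)."""

    # Last safe time propagated by the leader, if any was received.
    propagated_safe_time: Optional[int] = None
    # Hybrid time of the latest operation this replica knows to be committed.
    last_committed_ht: Optional[int] = None

    def safe_time(self) -> int:
        """Best safe time known locally, or the minimum if nothing is known."""
        known = [
            t
            for t in (self.propagated_safe_time, self.last_committed_ht)
            if t is not None
        ]
        return max(known, default=MIN_HYBRID_TIME)

    def try_read_safe_time(self, allow_min: bool = False) -> Optional[int]:
        """Return a usable safe time, or None if the reader must keep waiting.

        `allow_min` is a hypothetical switch used only to show the effect of
        the restriction; it is not a proposed fix.
        """
        st = self.safe_time()
        if st == MIN_HYBRID_TIME and not allow_min:
            return None  # the reader keeps waiting until its deadline expires
        return st
```

In this model, `FollowerReplica().try_read_safe_time()` keeps returning `None` for as long as the leader is unreachable and nothing is committed, which corresponds to the read timing out.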

@mbautin mbautin self-assigned this Mar 23, 2018
yugabyte-ci pushed a commit that referenced this issue Mar 24, 2018
…wait until servers converge on committed OpId as opposed to just last received

Summary:
LinkedListTest.TestLoadWhileOneServerDownAndVerify brings one tablet server down, loads some data,
then brings the server up and waits for it to catch up, then shuts down the two other servers and
verifies it can still read the data from the first server. It used to fail once in a while as
pointed out by @lumedar in https://forum.yugabyte.com/t/why-does-the-unit-test-fail-to-pass/130.
The issue turned out to be in how we wait for the server that was originally down to catch up
with the rest. We waited for last received OpIds to be the same in all servers' Raft logs, but
that is not sufficient: only the leader can tell a follower that some operation is committed, and
if we shut down the two servers that originally loaded the data too early, the third server might
actually end up with a zero committed id. Filed #124 to track that remaining issue separately, and
fixed the test to wait for all servers' committed OpIds to also match their last received OpIds.
With RF=3, all six of these OpIds must be equal before the test proceeds.
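
The difference between the old and the new wait condition can be sketched as follows. This is a hedged Python illustration, not the actual test code: `ServerOpIds`, `received_converged`, and `fully_converged` are hypothetical names, and an OpId is assumed to be a `(term, index)` pair.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumption for this sketch: an OpId is a (term, index) pair.
OpId = Tuple[int, int]


@dataclass(frozen=True)
class ServerOpIds:
    """OpIds reported by one tablet server (illustration only)."""

    last_received: OpId
    committed: OpId


def received_converged(servers: Sequence[ServerOpIds]) -> bool:
    """Old condition: every server has the same last received OpId."""
    return len({s.last_received for s in servers}) == 1


def fully_converged(servers: Sequence[ServerOpIds]) -> bool:
    """New condition: all last received and committed OpIds are identical.

    With three servers this compares six OpIds in total.
    """
    ids = {s.last_received for s in servers} | {s.committed for s in servers}
    return len(ids) == 1


# The restarted server has received everything but has not yet been told
# by the leader that anything is committed.
servers = [
    ServerOpIds(last_received=(1, 10), committed=(1, 10)),
    ServerOpIds(last_received=(1, 10), committed=(1, 10)),
    ServerOpIds(last_received=(1, 10), committed=(0, 0)),
]
```

Here `received_converged(servers)` is true while `fully_converged(servers)` is false, which is the window in which shutting down the other two servers leaves the third with a zero committed id.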

Test Plan: ybd linked_list-test --gtest_filter LinkedListTest.TestLoadWhileOneServerDownAndVerify -n 100

Reviewers: bogdan, sergei, bharat, kannan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4454
@rkarthik007 rkarthik007 added this to To Do in Fault Tolerance via automation Apr 3, 2018
@rthallamko3 rthallamko3 added the area/docdb YugabyteDB core features label Mar 12, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 9, 2022
@yugabyte-ci yugabyte-ci assigned rthallamko3 and unassigned mbautin Jul 27, 2022
@rthallamko3
Contributor

@amitanandaiyer, it seems like bc3bc7d fixes this. Can we close this issue?


4 participants