[YSQL] Investigate why the first request picks a local limit only when the read time is not set. #22158

Open
1 task done
pao214 opened this issue Apr 25, 2024 · 0 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Jira Link: DB-11085

Description

Context

The read time is either picked on docdb:

Status ReadQuery::DoPickReadTime(server::Clock* clock) {
...
  const auto read_time_was_empty = !read_time_;
  if (read_time_was_empty) {
    safe_ht_to_read_ = VERIFY_RESULT(abstract_tablet_->SafeTime(require_lease_));
    // If the read time is not specified, then it is a single-shard read.
    // So we should restart it in server in case of failure.
    read_time_.read = safe_ht_to_read_;
    if (transactional()) {
      read_time_.global_limit = clock->MaxGlobalNow();
      read_time_.local_limit = std::min(safe_ht_to_read_, read_time_.global_limit);

      VLOG(1) << "Read time: " << read_time_.ToString();
    } else {
      read_time_.local_limit = read_time_.read;
      read_time_.global_limit = read_time_.read;
    }

or it is provided to docdb in the request:

Status ReadQuery::DoPickReadTime(server::Clock* clock) {
...
  const auto read_time_was_empty = !read_time_;
  if (read_time_was_empty) {
...
  } else {
    ...
    safe_ht_to_read_ =
        (current_safe_time > read_time_.read
             ? current_safe_time
             : VERIFY_RESULT(abstract_tablet_->SafeTime(
                   require_lease_, read_time_.read, context_.GetClientDeadline())));
  }

In the latter case, the local limit takes effect only from the second RPC to the tablet within the txn, because the tablet returns it in the response to the first RPC:

Status ReadQuery::Complete() {
...
    const auto result = VERIFY_RESULT(DoRead());
...
    read_time_ = result;
    // If read was successful, then restart time is invalid. Finishing.
    // (If a read restart was requested, then read_time would be set to the time at which we have
    // to restart.)
    if (!read_time_) {
      // allow_retry means that the read time was not set in the request and therefore we can
      // retry read restarts on the tablet server.
      if (!allow_retry_) {
        auto local_limit = std::min(safe_ht_to_read_, used_read_time_.global_limit);
        resp_->set_local_limit_ht(local_limit.ToUint64());
      }
      break;
    }

This behavior is described in commit c784595:

It starts off as global_limit for the first request to a tablet as part of the transaction, but then, for second and later requests to that tablet, is set to the safe time on that tablet returned to the YQL engine by the response to the first request.

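However, no reason was given for why the local limit is not used for the first request. Moreover, this claim is not even accurate when the read time is picked on the tablet: in that case DoPickReadTime already sets local_limit to the minimum of the safe time and global_limit for a transactional read.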

Motivation

The local limit is most useful when the original time of the scanned intent is higher than this limit. In that case, we can skip checking the uncertainty window because we have just established a causal relation: the read operation happens before the provisional write op. Similar logic applies to fast path writes that bypass the provisional write mechanism.
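
To make the role of the local limit concrete, here is a minimal sketch of the uncertainty check. It is a simplified model, not the actual docdb iterator: `HybridTimeValue`, `ReadLimits`, `Visibility`, and `ClassifyRecord` are hypothetical names, hybrid times are reduced to plain integers, and a single record time stands in for both regular records and intents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a hybrid time (the real type carries more state).
using HybridTimeValue = uint64_t;

// Simplified model of a read time with its two uncertainty limits.
struct ReadLimits {
  HybridTimeValue read;
  HybridTimeValue local_limit;
  HybridTimeValue global_limit;
};

enum class Visibility { kVisible, kRestartRequired, kIgnored };

// Classifies a record written at record_ht relative to the read.
Visibility ClassifyRecord(const ReadLimits& limits, HybridTimeValue record_ht) {
  assert(limits.read <= limits.global_limit);
  if (record_ht <= limits.read) {
    return Visibility::kVisible;  // Written at or before the read time.
  }
  // The uncertainty window is (read, min(local_limit, global_limit)].
  const HybridTimeValue effective_limit =
      std::min(limits.local_limit, limits.global_limit);
  if (record_ht > effective_limit) {
    return Visibility::kIgnored;  // Known to come after the read.
  }
  return Visibility::kRestartRequired;  // Ambiguous: inside the window.
}
```

In this model, a tighter local limit shrinks the window in which a record forces a read restart; with local_limit equal to global_limit, the whole clock-skew window is treated as uncertain.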

This local limit logic should be independent of whether the read time is picked on the tablet or provided to the tablet.

NOTE: The local limit is still picked as safe time of the tablet and not provided to docdb. So, the causal relation still holds.
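
The proposal can be sketched as follows for a transactional read. This is only an illustration under stated assumptions: `ProvidedReadTime`, `ReadLimits`, and `PickReadLimits` are hypothetical names, hybrid times are plain integers, and the tablet safe time is assumed to have already been waited on so that it is not below a provided read time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

using HybridTimeValue = uint64_t;  // Hypothetical stand-in for a hybrid time.

// Hypothetical: the part of the read time a caller may send with the request.
struct ProvidedReadTime {
  HybridTimeValue read;
  HybridTimeValue global_limit;
};

struct ReadLimits {
  HybridTimeValue read;
  HybridTimeValue local_limit;
  HybridTimeValue global_limit;
};

// Picks limits so that local_limit always derives from the tablet safe time,
// whether or not the caller provided a read time.
ReadLimits PickReadLimits(const std::optional<ProvidedReadTime>& provided,
                          HybridTimeValue tablet_safe_time,
                          HybridTimeValue max_global_now) {
  ReadLimits limits{};
  if (provided) {
    // Assumption: the safe time has caught up with the provided read time.
    assert(provided->read <= tablet_safe_time);
    limits.read = provided->read;
    limits.global_limit = provided->global_limit;
  } else {
    // Read time picked on the tablet, as in the first excerpt above.
    limits.read = tablet_safe_time;
    limits.global_limit = max_global_now;
  }
  // Same rule in both branches: never above the global limit.
  limits.local_limit = std::min(tablet_safe_time, limits.global_limit);
  return limits;
}
```

The local limit is never taken from the request here; it is always computed from the tablet's own safe time, which is what keeps the causal argument valid.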

Scenario 1

  • Read RPC arrived at tablet 1 from conn1 with a read time and no local_limit (first RPC).
  • A provisional intent from conn2 is replicated after the read RPC arrived.
  • This provisional intent is applied to regular docdb with intent_ht as the hybrid time of the provisional write.
  • The read RPC observed the regular record but no local limit was set. This leads to a read restart error.

The read clearly happened before the provisional write was Raft-replicated. The operations are concurrent, implying no uncertainty.

Scenario 2

  • Read RPC arrived at tablet 1 from conn1 with a read time and no local_limit (first RPC).
  • Fast path insert happened from conn2 at tablet 1 after the read RPC arrived at tablet 1.
  • The read RPC observed the fast path insert and no local limit was set. This leads to a read restart error.

The read clearly happened before the write operation and there is no "real" uncertainty.

Objective

  1. Add a test case to guard against the above scenarios raising read restarts.
  2. Set the local limit irrespective of whether the read time is picked on the tablet or provided to it.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pao214 pao214 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 25, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 25, 2024