Skip to content

2.27.0.0-b112

@myang2021 myang2021 tagged this 16 May 13:53
Summary:
The symptom of the CHECK failure:

```
F20250514 00:37:44 /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang19-release/yugabyte-db/src/yb/tserver/tablet_server.cc:1650] Check failed: db_message_lists[i-1].first < db_message_lists[i].first (23 vs. 23) 1 6
```

This CHECK was added in recent commit e2fe6ee4a5184e7233179d05570bce8cac1cb658.

The db_message_lists should be a queue (implemented as a std::deque) of
(catalog_version, message_list) pairs sorted by catalog_version. The check
failure log indicates the queue has 6 pairs like

(23, m1), (23, m2), (24, m3), (25, m4), (26, m5)

The first two pairs have the same catalog version (we should have m1 == m2),
this isn't expected.

After debugging, I found it is because of the use of std::upper_bound where in
fact std::lower_bound should be used.

```
  // The queue isn't empty, insert the new pair to the right position. Because db_message_lists
  // is sorted, we can use std::upper_bound with a custom comparator function to find the right
  // insertion point.
  auto comp = [](uint64_t current_version,
                 const std::pair<uint64_t, std::optional<std::string>>& p) {
                return current_version < p.first;
              };
  auto it3 = std::upper_bound(db_message_lists->begin(), db_message_lists->end(),
                              new_catalog_version, comp);
  if (it3 != db_message_lists->end()) {
    db_message_lists->insert(it3, std::make_pair(new_catalog_version, message_list));
  } else {
    // We can reach here if the last version in the queue is the same as new_catalog_version.
  }
```

If the current queue has
```
(23, m1), (24, m3), (25, m4), (26, m5)
```

and we are trying to insert a new version pair (23, m2), it3 will point to (24,
m3), and we will insert new pair before (24, m3) so we end up with

```
(23, m1), (23, m2), (24, m3), (25, m4), (26, m5)
```

I tested with the following program to see the difference between upper_bound
and lower_bound:

```
tmp$ cat ub.cc
using namespace std;

int main() {
    deque<std::pair<uint64_t, std::optional<std::string>>> v = {{23, nullopt}, {24, nullopt}};
    auto comp = [](uint64_t current_version,
                   const std::pair<uint64_t, std::optional<std::string>>& p) {
                  return current_version < p.first;
                };
    // Finding upper bound for value 23 in deque v
    auto it = upper_bound(v.begin(), v.end(), 23, comp);
    if (it == v.end()) {
      cout << "upper bound for 23 is not found" << endl;
    } else {
      cout << "upper bound for 23 is " << it->first << endl;
    }
    return 0;
}
tmp$ /opt/rh/gcc-toolset-11/root/bin/g++ -std=c++20 -o ub ub.cc
tmp$ cat lb.cc
using namespace std;

int main() {
    deque<std::pair<uint64_t, std::optional<std::string>>> v = {{23, nullopt}, {24, nullopt}};
    auto comp = [](const std::pair<uint64_t, std::optional<std::string>>& p,
                   uint64_t current_version) {
                  return p.first < current_version;
                };
    // Finding lower bound for value 23 in deque v
    auto it = lower_bound(v.begin(), v.end(), 23, comp);
    if (it == v.end()) {
      cout << "lower bound for 23 is not found" << endl;
    } else {
      cout << "lower bound for 23 is " << it->first << endl;
    }
    return 0;
}
tmp$ /opt/rh/gcc-toolset-11/root/bin/g++ -std=c++20 -o lb lb.cc
tmp$ ./ub
upper bound for 23 is 24
tmp$ ./lb
lower bound for 23 is 23
```
We can see for the same deque, lower_bound finds 23, upper_bound finds 24.
Using lower_bound allows detection of duplication of 23.

I fixed the bug by using std::lower_bound instead of std::upper_bound.

Added a new unit test which would fail with the error before the diff (with
yb_test_delay_set_local_tserver_inval_message_ms added to force a repro of the
bug):

```
[ts-1] F0515 00:43:49.812948 924097 tablet_server.cc:1660] Check failed: db_message_lists[i-1].first < db_message_lists[i].first (2 vs. 2) 1 3
```
Jira: DB-16655

Test Plan: YB_EXTRA_TSERVER_FLAGS="--log_ysql_catalog_versions=true --vmodule=tablet_server=2" ./yb_build.sh debug --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.InvalMessageDuplicateVersion

Reviewers: kfranz, sanketh, mihnea

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D44000
Assets 2
Loading