Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] tests using test fixture PgMiniLargeClockSkewTest cause memory corruption, which sometimes makes them fail #16501

Closed
1 task done
mdbridge opened this issue Mar 20, 2023 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@mdbridge
Copy link
Contributor

mdbridge commented Mar 20, 2023

Jira Link: DB-5902

Description

Tests based on test fixture PgMiniLargeClockSkewTest have a heap overflow bug that corrupts the heap and sometimes causes them to spuriously fail. It is also unclear if they actually are testing large clock skew.

In particular, the following code used by this test fixture:

/home/mdbridge/code/yugabyte-db/src/yb/integration-tests/mini_cluster.cc:747:
server::SkewedClockDeltaChanger JumpClock(
    server::RpcServerBase* server, std::chrono::milliseconds delta) {
  auto* hybrid_clock = down_cast<server::HybridClock*>(server->clock());
  return server::SkewedClockDeltaChanger(
      delta, std::static_pointer_cast<server::SkewedClock>(hybrid_clock->physical_clock()));
}

Assumes that the clock the mini cluster has been initialized with is a SkewedClock, but the fixture set up does not specify this:

/home/mdbridge/code/yugabyte-db/src/yb/yql/pgwrapper/pg_mini-test.cc:615:
class PgMiniLargeClockSkewTest : public PgMiniTest {
 public:
  void SetUp() override {
    SetAtomicFlag(250000ULL, &FLAGS_max_clock_skew_usec);
    PgMiniTestBase::SetUp();
  }
};

Because of this bug, SkewedClockDeltaChanger attempts to change the delta of the underlying clock by mutating an unallocated location in the heap. :-(

To resolve this issue, the fixture needs to specify that time source and code should be added to fail explicitly if the correct time source is missing instead of just silently corrupting memory.

To reproduce this issue, you can run ASAN for the test PgMiniTest.ReadRestartSnapshot; note that ASAN is disabled for these tests so you will have to manually enable them by changing the source code first.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@mdbridge mdbridge added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Mar 20, 2023
@mdbridge mdbridge self-assigned this Mar 20, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Mar 20, 2023
@hari90 hari90 removed the status/awaiting-triage Issue awaiting triage label Mar 20, 2023
premkumr pushed a commit to premkumr/yugabyte-db that referenced this issue Apr 10, 2023
…bug in test fixture PgMiniLargeClockSkewTest

Summary:
See GitHub issue yugabyte#16501

Here I have have added checks to fail quickly if there are issues with
the clocks:
```
~/code/yugabyte-db/src/yb/integration-tests/mini_cluster.cc:747:
server::SkewedClockDeltaChanger JumpClock(
    server::RpcServerBase* server, std::chrono::milliseconds delta) {
  auto* hybrid_clock = down_cast<server::HybridClock*>(server->clock());
  DCHECK(hybrid_clock);
  auto skewed_clock =
      std::dynamic_pointer_cast<server::SkewedClock>(hybrid_clock->physical_clock());
  DCHECK(skewed_clock)
      << ": Server physical clock is not a SkewedClock; did you forget to set --time_source?";
  return server::SkewedClockDeltaChanger(delta, skewed_clock);
}
```

And selected the correct time source:
```
~/code/yugabyte-db/src/yb/yql/pgwrapper/pg_mini-test.cc:617:
class PgMiniLargeClockSkewTest : public PgMiniTest {
 public:
  void SetUp() override {
    server::SkewedClock::Register();
    FLAGS_time_source = server::SkewedClock::kName;
    SetAtomicFlag(250000ULL, &FLAGS_max_clock_skew_usec);
    PgMiniTestBase::SetUp();
  }
};
```

This also fixes GitHub issue yugabyte#14166

Fixes yugabyte#16501
Fixes yugabyte#14166
Fixes yugabyte#16520

Test Plan:
I have verified this fixes the problem for the commit and build type
that originally had the test failing:
```
git checkout f0fceca
yb_build.sh fastdebug --cxx-test pgwrapper_pg_mini-test --gtest_filter '*ReadRestartSnapshot' --gcc11 |& tee /tmp/generic.mdl.log
```

I have also verified when I use ASAN that the test still passes (it was
failing before)

I verified that failing to pick the correct time source gives a useful
error:
```
F0320 20:35:24.715245 1538550 mini_cluster.cc:753] Check failed: skewed_clock : Server physical clock is not a SkewedClock; did you forget to set --time_source?
```

Reviewers: hsunder, sergei

Reviewed By: hsunder, sergei

Subscribers: ybase, hsunder, bogdan, jason

Differential Revision: https://phabricator.dev.yugabyte.com/D23758
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants