[DocDB] UndefinedBehavior in TwoDCTestParams/TwoDCYsqlTest.DeleteTableChecks and TwoDCTestParams/TwoDCTest.DeleteTableChecksCQL #13929

Closed
bmatican opened this issue Sep 8, 2022 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/high High Priority xCluster Label for xCluster related issues/improvements

Comments

@bmatican
Contributor

bmatican commented Sep 8, 2022

Jira Link: DB-3426

Description

It seems we end up with a null tablet peer while calling GetChanges, so there is probably some shutdown or cleanup logic that is not thread safe.

https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/1258/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-integration-tests__twodc-test/TwoDCTestParams__TwoDCTest_DeleteTableChecksCQL__3.log

https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/1255/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-integration-tests__twodc_ysql-test/TwoDCTestParams__TwoDCYsqlTest_DeleteTableChecks__1.log

../../ent/src/yb/cdc/cdc_service.cc:658:16: runtime error: member call on null pointer of type 'yb::tablet::TabletPeer'
    #0 0x7f825c51e3e6 in yb::cdc::(anonymous namespace)::IsTabletPeerLeader(std::__1::shared_ptr<yb::tablet::TabletPeer> const&) ${BUILD_ROOT}/../../ent/src/yb/cdc/cdc_service.cc:658:16
    #1 0x7f825c522447 in yb::cdc::CDCServiceImpl::GetChanges(yb::cdc::GetChangesRequestPB const*, yb::cdc::GetChangesResponsePB*, yb::rpc::RpcContext) ${BUILD_ROOT}/../../ent/src/yb/cdc/cdc_service.cc:1272:31
...
@bmatican bmatican added kind/failing-test Tests and testing infra area/docdb YugabyteDB core features priority/high High Priority xCluster Label for xCluster related issues/improvements status/awaiting-triage Issue awaiting triage labels Sep 8, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug and removed status/awaiting-triage Issue awaiting triage labels Sep 8, 2022
@lingamsandeep
Contributor

This seems to be a regression from commit d4df777.

With this commit, we consistently see ~10 failures out of 25 runs of the test. Without it, we see 0.

@lingamsandeep lingamsandeep assigned spolitov and unassigned hulien22 Sep 15, 2022
spolitov added a commit that referenced this issue Sep 19, 2022
Summary:
The CDC service has rather odd logic for handling tablet peer state.
It ignores all failures except IsNotFound.
Because the tablet peer was set before D19068/d4df77709fc39c3cf2fca4d907c6a094898ff2df, this did not crash.

But this logic is inherently wrong: we cannot rely on the tablet peer being non-null in case of other failures.
So after D19068 it started to crash.

Fixed to check for any failure, not just IsNotFound.

Test Plan: ybd debug --cxx-test twodc-test --gtest_filter TwoDCTestParams/TwoDCTest.DeleteTableChecksCQL/2 -n 28

Reviewers: slingam

Reviewed By: slingam

Subscribers: rahuldesirazu, nicolas, bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D19622
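
For readers following the summary above, the failure mode can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual cdc_service.cc code: `Code`, `LookupResult`, `OldLogicWouldDereferenceNull`, `HandleLookup`, and the simplified `TabletPeer` are all hypothetical stand-ins, and the real fix in D19622 may be structured differently.

```cpp
#include <memory>

// Hypothetical stand-in for a status code; not the real yb::Status.
enum class Code { kOk, kNotFound, kOtherFailure };

// Hypothetical, heavily simplified stand-in for yb::tablet::TabletPeer.
class TabletPeer {
 public:
  explicit TabletPeer(bool leader) : leader_(leader) {}
  bool IsLeader() const { return leader_; }

 private:
  bool leader_;
};

// Hypothetical result of looking up a tablet peer.
// The peer may be null whenever the lookup did not succeed.
struct LookupResult {
  Code code;
  std::shared_ptr<TabletPeer> peer;
};

// Models the old pattern without invoking it: only kNotFound was treated as
// a failure, so any other failure fell through to a member call on the peer.
// Returns true when that fall-through would have hit a null pointer.
bool OldLogicWouldDereferenceNull(const LookupResult& lookup) {
  const bool treated_as_failure = (lookup.code == Code::kNotFound);
  return !treated_as_failure && lookup.peer == nullptr;
}

enum class Outcome { kLeader, kNotLeader, kFailed };

// Models the corrected pattern: every non-OK result is a failure, and the
// peer is only dereferenced once it is known to be non-null.
Outcome HandleLookup(const LookupResult& lookup) {
  if (lookup.code != Code::kOk || lookup.peer == nullptr) {
    return Outcome::kFailed;
  }
  return lookup.peer->IsLeader() ? Outcome::kLeader : Outcome::kNotLeader;
}
```

In this sketch, a lookup that fails with anything other than `kNotFound` and leaves the peer null is exactly the case the old pattern missed; `HandleLookup` reports it as a failure instead of calling into the peer.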