[DocDB] [YCQL] Failed to load sys catalog after incorrect packing of liveness columns #18157
Closed
1 task done
Labels
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/high
High Priority
Comments
zlareb1-yb
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Jul 10, 2023
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Jul 10, 2023
@zlareb1-yb, can you clarify which version you were upgrading from?
@rthallamko3 Universe upgrade was done from
yugabyte-ci
added
priority/high
High Priority
and removed
status/awaiting-triage
Issue awaiting triage
priority/medium
Medium priority issue
labels
Jul 25, 2023
spolitov
added a commit
that referenced
this issue
Oct 15, 2023
Summary: A packed row is interpreted as a row with a liveness column: if all columns of such a row are set to NULL, the row still exists and consists of NULLs. In YCQL, however, a row can be inserted without a liveness column, and setting all of its columns to NULL should delete the row. During compaction we could generate a packed row for such a row, which is incorrect. This diff fixes the issue.
Jira: DB-7197
Test Plan: BackupTxnTest.DeleteWithCompaction, CqlPackedRowTest.CompactWithoutLivenessColumn
Reviewers: bogdan
Reviewed By: bogdan
Subscribers: rthallam, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D29102
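The semantics described in the commit summary above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Row`, `exists`, `pack`, and `null_all` are hypothetical names invented for illustration and do not correspond to DocDB code. It shows why packing a row that has no liveness column changes what happens when all of its columns are later set to NULL.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Row:
    """Toy row: column values plus a flag for the liveness column (hypothetical model)."""
    columns: Dict[str, Optional[int]]
    has_liveness: bool


def exists(row: Row) -> bool:
    # A row is visible if it has a liveness column or at least one non-NULL column.
    return row.has_liveness or any(v is not None for v in row.columns.values())


def pack(row: Row) -> Row:
    # Per the summary, a packed row is read as if a liveness column were present.
    return Row(dict(row.columns), has_liveness=True)


def null_all(row: Row) -> Row:
    # Set every column to NULL, leaving the liveness flag unchanged.
    return Row({name: None for name in row.columns}, row.has_liveness)


# A YCQL-style row inserted without a liveness column.
original = Row({"a": 1, "b": 2}, has_liveness=False)

# Unpacked: setting all columns to NULL deletes the row.
deleted = not exists(null_all(original))

# Packed during compaction: the same update leaves a row of NULLs behind.
lingering = exists(null_all(pack(original)))
```

In this model `deleted` and `lingering` are both true, which is the mismatch the summary describes: packing a row without a liveness column silently turns a delete into a row of NULLs.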
rthallamko3
changed the title
[After adding ignore_null_sys_catalog_entries] Failed to initialize client: Timed out (yb/rpc/rpc.cc:220): Could not locate the leader master: GetLeaderMasterRpc
[DocDB] [YCQL] Failed to load sys catalog after incorrect packing of liveness columns
Oct 16, 2023
spolitov
added a commit
that referenced
this issue
Oct 17, 2023
Summary: A packed row is interpreted as a row with a liveness column: if all columns of such a row are set to NULL, the row still exists and consists of NULLs. In YCQL, however, a row can be inserted without a liveness column, and setting all of its columns to NULL should delete the row. During compaction we could generate a packed row for such a row, which is incorrect. This diff fixes the issue.
Jira: DB-7197
Original commit: 4d5c482/D29102
Test Plan: BackupTxnTest.DeleteWithCompaction, CqlPackedRowTest.CompactWithoutLivenessColumn
Reviewers: bogdan, rthallam
Reviewed By: bogdan, rthallam
Subscribers: ybase, rthallam
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D29386
spolitov
added a commit
that referenced
this issue
Oct 20, 2023
Summary: A packed row is interpreted as a row with a liveness column: if all columns of such a row are set to NULL, the row still exists and consists of NULLs. In YCQL, however, a row can be inserted without a liveness column, and setting all of its columns to NULL should delete the row. During compaction we could generate a packed row for such a row, which is incorrect. This diff fixes the issue.
Jira: DB-7197
Original commit: 4d5c482/D29102
Test Plan: BackupTxnTest.DeleteWithCompaction, CqlPackedRowTest.CompactWithoutLivenessColumn
Reviewers: bogdan, rthallam
Reviewed By: rthallam
Subscribers: ybase, rthallam
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D29375
Jira Link: DB-7197
Description
The upgrade from 2.19.1.0-b203 to 2.19.1.0-b203 failed with the error below:
org.yb.client.NonRecoverableException: Too many attempts: YRpc(method=GetMasterClusterConfig, service=yb.master.MasterService, tablet=null, attempt=22, maxAttempts=100, maxTimeoutMs=120000, elapsedTimeMs=116544). Master config (10.150.0.207:7100,10.150.0.208:7100,10.150.0.36:7100) has no leader.. Exceptions received: org.yb.client.ConnectionResetException: [Peer YB Master - 10.150.0.36:7100] Connection reset on [id: 0xd57c27e6, L:null ! R:/10.150.0.36:7100],org.yb.client.ConnectionResetException: [Peer YB Master - 10.150.0.208:7100] Connection reset on [id: 0xdc84ea0e, L:null ! R:/10.150.0.208:7100].
Error in master:
F0705 10:03:09.221561 37220 catalog_manager.cc:1143] T 00000000000000000000000000000000 P 7f806906d6f54b07a59dce6962ba59ee: Failed to load sys catalog: Corruption (yb/master/sys_catalog_writer.cc:71): Failed while visiting snapshots in sys catalog: System catalog snapshot is corrupted or built using different build type: Unexpected value type for metadata: 0, row: { 0 => { value: int8_value: 7 ttl_seconds: 0 write_time: kUninitializedWriteTime } 1 => { value: binary_value: "}~\212\312\324\361D\\\231\233\303s\321\255\315h" ttl_seconds: 0 write_time: kUninitializedWriteTime } 2 => { value: ttl_seconds: -1 write_time: 1687711964992311 } }, type: 7, id: 7D7E8ACAD4F1445C999BC373D1ADCD68
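The `binary_value` in the master error above and the `id` at the end of the same line appear to be the same 16 bytes in two encodings. A minimal sketch to check this, assuming the log prints the bytes with C-style escapes (three-digit octal for non-printable bytes, `\\` for a backslash); `unescape_c` is a hypothetical helper written for this note, not part of any YugabyteDB tooling:

```python
import re

# Single-character C escapes that may appear next to octal escapes.
_SIMPLE = {"n": 0x0A, "t": 0x09, "r": 0x0D, '"': 0x22, "'": 0x27, "\\": 0x5C}


def unescape_c(text: str) -> bytes:
    """Decode a C-style escaped string (octal and simple escapes) into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
            continue
        octal = re.match(r"[0-7]{3}", text[i + 1:i + 4])
        if octal:
            # Three octal digits encode one byte, e.g. \212 -> 0x8A.
            out.append(int(octal.group(), 8))
            i += 4
        else:
            # Simple escape such as \\ or \"; unknown escapes raise KeyError.
            out.append(_SIMPLE[text[i + 1]])
            i += 2
    return bytes(out)


# binary_value copied from the master log line.
escaped = r'}~\212\312\324\361D\\\231\233\303s\321\255\315h'
decoded_hex = unescape_c(escaped).hex().upper()
```

Comparing `decoded_hex` with the `id` printed in the log is a quick way to confirm that column 1 of the row holds the snapshot id, which helps when matching corrupted sys catalog entries to specific snapshots.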
To work around this, the upgrade was retried after manually adding the gflag ignore_null_sys_catalog_entries, set to true, to the master flags.
After applying this gflag, the master service started working on n1, but it is still failing on the other 2 nodes.
Logs observed:
E0706 09:56:20.952315 106781 async_initializer.cc:95] Failed to initialize client: Timed out (yb/rpc/rpc.cc:220): Could not locate the leader master: GetLeaderMasterRpc(addrs: [10.150.0.207:7100, 10.150.0.208:7100, 10.150.0.36:7100, 10.150.0.207:7100, 10.150.0.208:7100, 10.150.0.36:7100], num_attempts: 53) passed its deadline 90239.702s (passed: 1.540s): Network error (yb/util/net/socket.cc:534): recvmsg got EOF from remote (system error 108)
Slack Discussion - https://yugabyte.slack.com/archives/C01CB38CZHU/p1688639798730339?thread_ts=1688552470.326609&cid=C01CB38CZHU
cc: @kripasreenivasan @Arjun-yb @renjith-yb
Warning: Please confirm that this issue does not contain any sensitive information