2024.2.10.0-b50

@hkimura hkimura tagged this 10 Jun 04:33
Summary:
Backport note: two conflicts, both from newer master code absent on 2024.2.
In yb-admin_cli.cc, dropped the `unsafe_release_object_locks_global` command (its
args, action, and REGISTER_COMMAND_HIDDEN) -- pre-existing object-locking master
code, not part of this change, and ReleaseObjectLocksGlobal does not exist on this
branch. Kept this change's DecodeHexPartitionKey helper, the new includes, and the
key-range args. The yb-admin-test.cc conflict was this change's new tests landing
next to branch-specific context; took the tests as-is. No other changes.

get_table_hash hashes a whole table. To narrow a detected inconsistency (e.g.
during xCluster verification) down to where the data diverged, allow the scan to
be restricted to a logical partition-key sub-range.

Add optional start_key / end_key arguments (raw partition keys, hex-encoded on
the command line -- the same encoding shown as partition_key_start /
partition_key_end by list_tablets). start_key is inclusive, end_key is exclusive;
an empty bound means unbounded on that side. The range is logical, so it is
cluster-independent: each cluster resolves it to whatever tablets it owns, which
is what makes it usable for cross-cluster comparison even when tablet boundaries
differ.

- DumpTabletDataRequestPB gains start_key and end_key.
- tablet::DumpTabletData builds each table's encoded bound as
  [table prefix][encoded partition key]: the table prefix (cotable_id /
  colocation_id bytes; empty for a non-colocated table) places the bound in this
  table's slice of the tablet, and the encoded partition key narrows within it.
  An empty user bound leaves that side at the iterator's natural table boundary.
- yb-admin's client skips tablets that do not overlap the requested range and
  forwards the bounds unchanged to every overlapping tablet.
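
The client-side tablet filtering described above can be sketched as a half-open interval overlap test. This is an illustrative Python model, not the yb-admin C++ code: `range_overlaps` and the sample tablet boundaries are hypothetical, and it assumes partition keys compare as raw byte strings with an empty bound meaning unbounded on that side.

```python
def range_overlaps(tablet_start: bytes, tablet_end: bytes,
                   start_key: bytes, end_key: bytes) -> bool:
    """Return True if tablet [tablet_start, tablet_end) intersects the
    requested [start_key, end_key). An empty bound is unbounded."""
    # The requested range ends at or before the tablet begins.
    if end_key and end_key <= tablet_start:
        return False
    # The tablet ends at or before the requested range begins.
    if tablet_end and tablet_end <= start_key:
        return False
    return True


# Hypothetical three-tablet layout for a hash-partitioned table.
tablets = [
    (b"", bytes.fromhex("5555")),
    (bytes.fromhex("5555"), bytes.fromhex("aaaa")),
    (bytes.fromhex("aaaa"), b""),
]

# Request the upper half of the hash space: [0x8000, unbounded).
start_key, end_key = bytes.fromhex("8000"), b""
selected = [t for t in tablets if range_overlaps(t[0], t[1], start_key, end_key)]
```

Because the same logical bounds are forwarded unchanged to every selected tablet, a second cluster with a different tablet layout would select different tablets but cover the same rows.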

A key range scopes a single table, so it requires a concrete table_id. Combined
with a colocation parent id (which hashes every table in the tablet) a key range
would be ambiguous, so that combination is rejected with InvalidArgument. Bad
input is rejected up front rather than silently hashing the wrong range: the CLI
rejects malformed hex and an empty or inverted range (start_key >= end_key), and
the server rejects a bound that is not a 2-byte hash for a hash-partitioned table.
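
A minimal sketch of these up-front checks, in Python for illustration. The function names are hypothetical stand-ins (the real CLI helper is the C++ DecodeHexPartitionKey), and the sketch assumes the inverted-range check applies only when both bounds are non-empty, which the text above does not spell out.

```python
def decode_hex_partition_key(text: str) -> bytes:
    """Decode a hex-encoded partition key; empty text means unbounded."""
    try:
        return bytes.fromhex(text)
    except ValueError as exc:
        raise ValueError(f"malformed hex partition key: {text!r}") from exc


def parse_key_range(start_hex: str, end_hex: str) -> tuple[bytes, bytes]:
    """CLI-side checks: valid hex and a non-empty, non-inverted range."""
    start_key = decode_hex_partition_key(start_hex)
    end_key = decode_hex_partition_key(end_hex)
    # Assumption: an empty bound is unbounded, so it is never "inverted".
    if start_key and end_key and start_key >= end_key:
        raise ValueError("start_key must be less than end_key")
    return start_key, end_key


def check_hash_bounds(start_key: bytes, end_key: bytes) -> None:
    """Server-side check for a hash-partitioned table: each non-empty
    bound must be exactly a 2-byte hash value."""
    for name, bound in (("start_key", start_key), ("end_key", end_key)):
        if bound and len(bound) != 2:
            raise ValueError(f"{name} must be a 2-byte hash, got {len(bound)} bytes")
```

Splitting the checks this way mirrors the description: syntax and ordering can be validated before any RPC is sent, while the 2-byte requirement depends on the table's partitioning scheme, which only the server side knows for certain.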

Builds on #31952 (D53893, landed) and composes with its per-table scoping:
pass a child colocated table id to hash one colocated table over a key range.

For #31951.

**Upgrade/Rollback safety:**
No persistent or on-disk format change, and no AutoFlag is needed. start_key and
end_key are optional fields on DumpTabletDataRequestPB, an on-demand admin RPC
used only by `yb-admin get_table_hash`; they are not written to disk, the WAL, or
sys.catalog, and are absent from any consensus/replication path. Mixed-version
behavior is safe: an old yb-admin sends no bounds, so a new tserver scans the full
table (unchanged); a new yb-admin's bounds are ignored by an old tserver, which
scans the full table rather than erroring -- run a version-matched yb-admin when
using start_key/end_key. No state is persisted, so rollback has nothing to undo.

Original commit: 208a0f88102d23f40365f4ba3d2c5005eddee88e / D53900

Test Plan:
AdminCliTest.TestGetTableXorHashKeyRange (non-colocated YCQL hash table): explicit
empty bounds reproduce the full-table totals, and a complementary 0x8000 split
partitions the rows so counts sum and hashes XOR back to the full totals.
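
The recombination property this test relies on can be illustrated with a small Python model. Everything here is a stand-in: `partition_key`, `row_hash`, and `scan` are hypothetical, and the MD5/SHA-256 digests merely simulate a 2-byte partition hash and a per-row hash; they are not the functions YugabyteDB uses.

```python
import hashlib

SPLIT = bytes.fromhex("8000")  # midpoint of the 2-byte hash space


def partition_key(row: bytes) -> bytes:
    """Stand-in 2-byte partition hash for a row."""
    return hashlib.md5(row).digest()[:2]


def row_hash(row: bytes) -> int:
    """Stand-in 64-bit per-row hash."""
    return int.from_bytes(hashlib.sha256(row).digest()[:8], "big")


def scan(rows, start_key: bytes = b"", end_key: bytes = b""):
    """Return (row count, XOR of row hashes) over [start_key, end_key).
    An empty bound is unbounded on that side."""
    count, xor = 0, 0
    for row in rows:
        key = partition_key(row)
        if start_key and key < start_key:
            continue
        if end_key and key >= end_key:
            continue
        count += 1
        xor ^= row_hash(row)
    return count, xor


rows = [f"row-{i}".encode() for i in range(1000)]
full = scan(rows)
low = scan(rows, end_key=SPLIT)     # [unbounded, 0x8000)
high = scan(rows, start_key=SPLIT)  # [0x8000, unbounded)

# Disjoint, exhaustive segments recombine to the full-table totals.
assert low[0] + high[0] == full[0]
assert low[1] ^ high[1] == full[1]
```

XOR is associative and commutative, so the combined hash does not depend on scan order or on how the rows are grouped into segments; this is why per-range results from clusters with different tablet boundaries remain comparable.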

PgLibPqTest.TestGetTableXorHashColocatedKeyRange (colocated, range-only table):
derives real mid-data split keys for id=4 and id=8 from the server's own
partitioning (a throwaway non-colocated SPLIT AT VALUES ((4), (8)) table), then --
passing the child colocated table id -- splits the table into three disjoint
segments [-inf, key(4)) / [key(4), key(8)) / [key(8), +inf) (the middle one
specifies both bounds), verifying exact per-segment row counts (3/4/3) and that
the segments recombine (counts sum, hashes XOR) to the full totals. Also asserts
that a key range against the colocation parent table id is rejected.

Verified locally (debug/clang21): AdminCliTest.TestGetTableXorHashKeyRange passes.
Relying on CSI for the full suite (incl. PgLibPqTest.TestGetTableXorHashColocatedKeyRange).

Reviewers: jhe

Reviewed By: jhe

Differential Revision: https://phorge.dev.yugabyte.com/D54293