Skip to content

2.31.0.0-b150

@andrei-mart andrei-mart tagged this 09 Jun 20:11
Summary:
The only case where Postgres currently requests backward scan on a hash
relation is if all the hash key columns are specified, and only the
order of the range column values matters. In that case the scan is
"partition change safe", the feature particularly important for the
backward scan. However, there are plans to remove this restriction
(GHI #30927), so Postgres will have cases of backward scan which are
not "partition change safe". This diff is to make sure those cases will
work as expected.

The main reason why it did not work was the discrepancy in the target
partition identification between the forward and the backward scans.
The identifier for the target partition (partition_key) is the tablet's lower
bound, or, in fact, any key higher than that, but still strictly less than the
next tablet's lower bound, as the lower tablet bound is implicitly inclusive.
The partition_key is initialized and maintained by the PgGate as a field of
the request. It is updated from the paging state returned from the DocDB.
In the forward scan case initialization and maintenance are straightforward.
For initialization the request's lower bound is used. In many cases the lower
bound matches the tablet bound. If it is not, the lower bound works better
in the case of tablet split. If the cached tablet bound is used as the partition
bound, there's a chance that tablet splits between the partition_key and the
lower bound, so scan will have to immediately move on to the next tablet,
making the request unnecessary. For maintenance, PgGate just copies the
partition_key from the paging state. The next partition's lower bound remains
valid even if target tablet splits.

In backward scan things are not so straightforward. For initialization it is
critical to use the upper bound as the partition key. If the cached lower
tablet bound is used as the partition key, tablet split between the partition
key and the request upper bound will skip a part of the scan, which is not
just a performance, but a correctness problem. For maintenance, if the
partition key is set based on the paging data and the cached partition list,
the split of the target tablet would make t-server to skip the scan of the
upper half of the tablet. To solve this problem the PgGate maintained
the upper_bound instead of the partition key. Because of that the same
functionality was implemented twice for the forward and backward cases.
Some cases were never implemented as not possible. In particular, backward
hash scans were always within one hash code, there were no need to change
partition key or support partition split.

This diff unifies handling for forward and backward cases by maintaining
the request's partition_key as the lower partition bound in forward scan,
or the upper partition bound in backward scan. The request's lower and
upper bounds are invariant, as well as the hash code bounds. The variable
request's fields between the pages are the paging_state and the partition_key.
The initialization of the partition key works off of the request's lower or upper
bound, depending on the scan direction. It is important to note that the
partition's upper bound in implicitly exclusive, while request's upper bound
may be inclusive, so we have to "increment" the request's upper bound to
make an "inclusive" partition's upper bound. For maintenance, the
partition_key is simply copied from the paging_state, either from next_row_key,
if specified, or from the next_partition_key. The paging_state is now
always copied from the response to the request by the PgGate as is, no more
modifications.

The request's partition_key should be distinguished from the partition_key
returned to the t-server by by YBOperation::GetPartitionKey function.
While the former may be based on cached partition list and somewhat
inaccurate, as discussed above, the t-server provides up to date partition list,
and expects the returned partition_key to be the lower bound of the target
partition.

Other notable changes:
- `yb_op->SetHashCode(...)` is removed from the batcher, as well as
  `YBOperation::SetHashCode` as no longer used. The batcher is
  overstepping its responsibilities in this case, and breaking in
  the case of hash backward scan, as the hash_code limits the scan
  scope. Instead, batcher expects that yb_op->GetPartitionKey(...)
  makes sure all required fields on the request are properly set.
  In some cases GetPartitionKey has already been doing so, in some
  cases hash_code and others were set earlier. This diff updates
  GetPartitionKey in other classes to satisfy that expectation.
- Removed `IsNonIndexBackwardScan`, as well as
  `IsPartitionsChangeDependantBackwardScan`. The "Index" means the
  index_request field in the request, which implies colocated scan,
  which is not partitioned and therefore "partition change safe".
  The `IsPartitionsChangeDependantBackwardScan` also assumed a hash
  scan as "partition change safe", which no longer true. Actually,
  the "partition change safe" matter less than it used to, or thought
  to be, so we just trivially check the scan direction.

Test Plan: ./yb_build.sh --cxx-test pggate_pggate_test_select --gtest_filter 'PggateTestBackwardScanSelect.HashBackwardScan*'

Reviewers: aagrawal, arybochkin, dmitry

Reviewed By: aagrawal, arybochkin

Subscribers: yql, ybase

Tags: #jenkins-ready, #jenkins-trigger

Differential Revision: https://phorge.dev.yugabyte.com/D51890
Assets 2
Loading