2.27.0.0-b445

@andrei-mart andrei-mart tagged this 13 Aug 21:54
Summary:
Unify forward and backward scan logic, make it more straightforward.

From now on, regardless of scan direction:
- if the scan has been completed, paging state is NOT returned
- if scan of a tablet has not been finished because some limit was hit, DocDB returns paging state with next_row_key set to resume scan from the same point later, see PgsqlReadOperation::SetPagingState
- otherwise Tablet::CreatePagingStateForRead returns paging state with next_partition_key set to the next partition to scan
- in all cases when paging state is returned, next_tablet_bound is added to the paging state if the current tablet's bound in the scan direction (upper for a forward scan, lower for a backward scan) lies within the scan's scope defined by hash_code/max_hash_code and lower_bound/upper_bound.
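
The rules above can be read as one decision procedure. The following is a minimal Python sketch of that procedure, not the DocDB implementation: the `PagingState` class, the `make_paging_state` function and all of its parameters are hypothetical names used only for illustration, and the "is the tablet bound within the scan's scope" check is reduced to a precomputed boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PagingState:
    # Field names mirror the ones mentioned in the summary; the class itself
    # is a simplified stand-in, not the real protobuf message.
    next_row_key: Optional[bytes] = None
    next_partition_key: Optional[bytes] = None
    next_tablet_bound: Optional[bytes] = None


def make_paging_state(
    scan_completed: bool,
    limit_hit: bool,
    resume_row_key: Optional[bytes] = None,
    next_partition_key: Optional[bytes] = None,
    tablet_bound: Optional[bytes] = None,
    bound_in_scope: bool = False,
) -> Optional[PagingState]:
    """Model of the unified rules; the same for forward and backward scans."""
    if scan_completed:
        # Completed scan: no paging state at all.
        return None
    state = PagingState()
    if limit_hit:
        # The tablet scan stopped early: resume from the same point later.
        state.next_row_key = resume_row_key
    else:
        # The tablet is exhausted: continue with the next partition.
        state.next_partition_key = next_partition_key
    if bound_in_scope:
        # Tablet bound in scan direction lies within the request's scope.
        state.next_tablet_bound = tablet_bound
    return state
```

In this model the scan direction only affects which tablet bound the caller passes as `tablet_bound` (upper for forward, lower for backward); the branching itself is direction-agnostic.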

The term "scan has been completed" above needs some explanation. A scan may target specific rows (ybctids are specified). Such a scan is completed
when all ybctids in the request are processed. A request may have bounds, and it is complete once the respective bound (upper in the case
of a forward scan, lower in the case of a backward scan) is reached. Finally, an unbounded request is completed when the last tablet is processed.
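
As a rough illustration of the three cases, here is a Python sketch of a completion predicate. It is a simplified model under stated assumptions: the function `is_scan_completed` and its parameters are hypothetical, keys are compared as plain byte strings, an absent bound is represented by `None`, and `reached_key` stands for the point the scan has advanced to in its direction.

```python
from typing import Optional, Sequence


def is_scan_completed(
    *,
    forward: bool,
    remaining_ybctids: Optional[Sequence[bytes]] = None,
    lower_bound: Optional[bytes] = None,
    upper_bound: Optional[bytes] = None,
    reached_key: Optional[bytes] = None,
    last_tablet_done: bool = False,
) -> bool:
    # Case 1: scan for specific rows. Done when every ybctid is processed.
    if remaining_ybctids is not None:
        return len(remaining_ybctids) == 0

    # Case 2: bounded request. Only the bound in the scan direction matters:
    # upper for a forward scan, lower for a backward scan.
    bound = upper_bound if forward else lower_bound
    if bound is not None:
        if reached_key is None:
            return False
        return reached_key >= bound if forward else reached_key <= bound

    # Case 3: unbounded request. Done when the last tablet is processed.
    return last_tablet_done
```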

The next_tablet_bound is not used at this point. It is added for future improvements in PgGate:
1. Early request split. In many cases PgGate sets request bounds to match tablet bounds. But tablets may split. Consider a request to some tablet with bounds a and b, which has split at point c. PgGate would send a scan with bounds a and b, which would be executed in parallel with requests to other tablets. The request may paginate multiple times, and without next_tablet_bound PgGate would only learn about the split after receiving the paging state with next_partition_key set to c, and would then switch the scan to the tablet [c, b]. With next_tablet_bound PgGate would know about the split after the first page. With that knowledge it would have the option to create one more parallel scan of tablet [c, b] right away. For the next page of the current scan it should update the scan bound to c.

2. Merge sort in PgGate (D43357) requires all scans to return sorted results. This holds if the scan stays within one tablet. If a request paginates, PgGate does not know whether the scan would eventually switch to another tablet, so the Merge has to fetch the pages until the end. But if the target node is known to return next_tablet_bound, a paging state without next_tablet_bound means the scan won't switch, so the fetch can be postponed. Combined with 1., if next_tablet_bound is set, PgGate can create a new Merge sort scan from the point c and postpone the current scan.
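
Since neither improvement exists yet, the following Python sketch only models how a client could use next_tablet_bound in the two scenarios above. Everything here is hypothetical: `split_scan`, `merge_fetch_action`, the action names, the dict-based paging state and the `server_sets_bound` flag are illustrative, and only the forward-scan case with byte-string bounds is shown.

```python
from typing import Dict, Optional, Tuple

Bounds = Tuple[bytes, bytes]  # (lower, upper) of a forward scan


def split_scan(
    bounds: Bounds, next_tablet_bound: Optional[bytes]
) -> Tuple[Bounds, Optional[Bounds]]:
    """Scenario 1: narrow the current scan and create a parallel one.

    Returns (bounds for the current scan, bounds for an extra scan or None).
    """
    lower, upper = bounds
    if next_tablet_bound is None or not (lower < next_tablet_bound < upper):
        # No split point strictly inside the request: keep the scan as is.
        return bounds, None
    # Tablet [a, b] has split at c: current scan becomes [a, c], and a new
    # parallel scan of [c, b] can start right away.
    return (lower, next_tablet_bound), (next_tablet_bound, upper)


def merge_fetch_action(
    paging_state: Optional[Dict[str, bytes]], server_sets_bound: bool
) -> str:
    """Scenario 2: decide what a merge sort should do after a page."""
    if paging_state is None:
        return "done"  # scan completed, nothing left to fetch
    if not server_sets_bound:
        # A node that never sets next_tablet_bound gives no guarantee that
        # the scan stays in one tablet, so all pages must be fetched.
        return "fetch_all"
    if paging_state.get("next_tablet_bound") is None:
        return "postpone"  # scan stays within the tablet, order is kept
    return "split_then_postpone"  # start a new scan from c, postpone this one
```

For example, `split_scan((b"10", b"90"), b"50")` yields the narrowed scan `(b"10", b"50")` and the extra scan `(b"50", b"90")`.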

**Upgrade/Rollback safety:**
The new field next_tablet_bound is for future optimization; nothing depends on it at this time.
The diff changes the logic that adds the paging state in backward scans.
The old code did not check request bounds and always put the paging state into the response, expecting PgGate to check the bounds and not send the next request if they were crossed.
The new semantics are the same regardless of the direction.
The earlier boundary check makes the similar logic in PgGate redundant, so it can be removed in the future.
The diff changes only the TServer side of the logic, and regression tests confirm safety.
Jira: DB-17337

Test Plan: The change is mostly a refactor; make sure existing tests don't break

Reviewers: arybochkin, dmitry, sergei, timur, #db-approvers

Reviewed By: timur, #db-approvers

Subscribers: smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D45071