Apply doc-count batching policy to transactions before pipelining #1808

Merged: 1 commit, Aug 30, 2022

@wotbrew wotbrew commented Aug 26, 2022

Problem

In #1762 pipelining was introduced during indexing from the golden stores. This improves IO resource utilisation and delivers a substantial improvement to ingest throughput.

However, one implication is that document stores can receive a much larger set of ids to fetch at a time, because the documents for many transactions are now fetched at once. There is some concern (#1800) that the increased concurrent request volume may be triggering errors or breaching request-per-second limits.

Solution

This PR is an early mitigation: it still allows some transaction batching, but limits the number of fetches a document store is asked to perform in a single eager operation.

The mechanism batches transactions according to the number of docs they reference, so that many small transactions can still be batched together heavily, while larger transactions are issued in smaller batches.

Note that a single transaction can reference any number of documents, so for large enough transactions this PR will not help. That issue, however, predates #1762.
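
To make the batching policy concrete, here is a minimal sketch in Python (not the Clojure in `core/src/xtdb/tx.clj`). It assumes a transaction can be modelled as a list of the doc ids it references, and that a batch is closed as soon as adding the next transaction would push it past the preferred doc count. The name `batch_by_doc_count` and the exact cut-off rule are illustrative assumptions, not the PR's implementation.

```python
from typing import Iterable, Iterator, List

Tx = List[str]  # hypothetical model: a transaction is the list of doc ids it references


def batch_by_doc_count(txs: Iterable[Tx], preferred_doc_count: int) -> Iterator[List[Tx]]:
    """Group consecutive transactions so each batch references at most
    `preferred_doc_count` docs, except where a single transaction alone exceeds it."""
    batch: List[Tx] = []
    batch_docs = 0
    for tx in txs:
        n = len(tx)
        # Close the current batch if this transaction would overflow it.
        if batch and batch_docs + n > preferred_doc_count:
            yield batch
            batch, batch_docs = [], 0
        # A transaction is never split, so an oversized one forms its own batch.
        batch.append(tx)
        batch_docs += n
    if batch:
        yield batch


# Three small transactions, one large one (10 docs), then another small one.
txs = [["a"], ["b", "c"], ["d"], [f"big-{i}" for i in range(10)], ["z"]]
batches = list(batch_by_doc_count(txs, preferred_doc_count=4))
docs_per_batch = [sum(len(tx) for tx in b) for b in batches]
# docs_per_batch == [4, 10, 1]
```

In this sketch the three small transactions share one fetch of 4 docs, while the 10-doc transaction still produces a single fetch of 10 docs. That second batch is the case described above, where batching by doc count cannot help.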

This is a tactical, speculative change and may not resolve the S3 / R2 problem raised in #1800. Further benchmarks are needed to determine what impact the batching policy has on overall ingest throughput, though since there was no batching at all prior to #1762, I assume no performance regression.

Configuration

A configuration variable, :batch-preferred-doc-count, is available on the ingester to vary the ideal batch doc-count.
I would consider this variable a temporary tuning parameter for use early on while testing, not part of the configuration surface of XTDB.

Partition transactions according to number of referenced docs, so that many small transactions can benefit from batching without overwhelming existing fetch impls or exhausting memory/cpu resources.

Relates to xtdb#1800 - may fix