feat: QuerySet::iterator(chunk_size) — chunked streaming (closes #23)#113
Merged
Conversation
ujeenet
added a commit
that referenced
this pull request
May 15, 2026
Self-review tweaks on PR #113: 1. `iterator(chunk_size)` now asserts `chunk_size > 0`. Silently producing an immediately-exhausted iterator on a 0 or negative value was a footgun for `iterator(unchecked_input as i64)` style callers. Three new no-DB validation tests pin the panic message + the positive-path success. 2. Doc-comment + cookbook now flag two concurrency hazards: - **Concurrent-write hazard**: each chunk is its own query, so INSERT/DELETE between chunks can skip or duplicate rows. Recommend a snapshot-isolation transaction (REPEATABLE READ) on write-concurrent tables. - **select_for_update doesn't propagate**: row locks held by a `.select_for_update()` queryset are released between chunks because each runs in its own implicit transaction. Redirects to `pool.begin()` + `.fetch_on(&mut *tx)` for full-drain locking. 3. Extracted shared fetch path into private `fetch_next_chunk(pool)`. `next_chunk` + `next_row` both call it instead of duplicating the clone-query / set-limit / set-offset / check-exhaustion block. Caller-side `seen` accounting stays per-method because the counting semantics differ (next_chunk counts the whole chunk immediately; next_row counts rows as they pop out of the buffer). 4. Live test `mixed_next_row_and_next_chunk_preserve_order` now asserts `rows_seen() == 50` mid-iteration after 50 next_row calls on a chunk_size=100 iterator — pins the buffer-aware accounting. All gates green: 5 live + 3 validation tests pass, all-features build clean, sqlite-tenancy litmus clean.
ujeenet
added a commit
that referenced
this pull request
May 15, 2026
Second polish pass on PR #113. The first polish (commit 8c58dca) added concurrency caveats but referenced a `next_chunk_on(&mut *tx)` method that doesn't exist — a reader copying the cookbook snippet verbatim hits a compile error. Three fixes: 1. Cookbook concurrent-write snippet replaced with an honest hand-rolled `select_rows_on` loop inside the snapshot-isolation transaction. Explicitly states "ChunkedIter takes `&Pool`, not a `&mut Transaction`, so the chunker API can't be used inside the tx directly." 2. Cookbook `select_for_update` redirect rewritten with the trade-off pair: `.fetch_on(&mut *tx)` for in-memory results, hand-rolled LIMIT/OFFSET for streamed ones. Both call out that you're outside the ChunkedIter API. 3. Same correction in the `.iterator()` rustdoc — points at [`select_rows_on`] as the actual primitive for tx-scoped chunked reads. Notes the future `iterator_on` companion as out-of-scope for issue #23. No behavior change. All gates green: 5 live + 3 validation tests pass, build clean across tri-feature + sqlite-tenancy litmus.
Django parity for `QuerySet.iterator(chunk_size=2000)`. Returns a
`ChunkedIter<T>` that fetches `chunk_size` rows at a time via
`LIMIT N OFFSET M`, never buffering the full result set. Right tool
for million-row exports / ETL / batch processors that would OOM with
`fetch_pool` returning `Vec<T>`.
```rust
let mut iter = Post::objects()
.order_by(&[("id", false)])
.iterator(2_000)?;
// Whole-chunk loop:
while let Some(chunk) = iter.next_chunk(&pool).await? {
for post in chunk { /* … */ }
}
// Row-by-row loop (buffer one chunk internally via VecDeque):
while let Some(post) = iter.next_row(&pool).await? { /* … */ }
```
- `ChunkedIter<T>` struct in `sql/executor.rs` — holds the compiled
`SelectQuery`, rotating `OFFSET`, exhaustion flag, and a
`VecDeque<T>` buffer for the row-by-row path.
- `QuerySet::iterator(chunk_size)` entry point. Compiles eagerly so
schema-validation errors surface here rather than mid-stream.
- `next_chunk(&pool)` — `Option<Vec<T>>`. Returns `None` when
exhausted. Drains internal buffer first if mixed with `next_row`.
- `next_row(&pool)` — `Option<T>`. O(1) per row via `VecDeque::pop_front`.
- `rows_seen()` / `is_exhausted()` — progress + termination state.
Works on every backend — LIMIT/OFFSET is the portable common
denominator. Documented trade-off: O(n²) total work on deep
pagination (each chunk reseeks); for truly streaming PG reads,
callers can drop into `transaction()` + raw `sqlx::query(...).fetch`
Stream directly.
5 live PG tests against a 1000-row fixture:
- `next_chunk_yields_all_rows_in_order` — 5 chunks of 200, row order
preserved across chunks.
- `next_row_yields_every_row_with_internal_buffering` — chunk_size
that doesn't divide evenly (150 → 7 chunks: 6×150 + 100).
- `mixed_next_row_and_next_chunk_preserve_order` — partial drain via
next_row, then `next_chunk` yields buffered remainder before
refetching.
- `iterator_respects_where_clause_and_short_final_chunk` — narrowed
filter, short final chunk (4 chunks: 30, 30, 30, 10).
- `empty_result_yields_none_first_call` — no DB hit on subsequent
calls after exhaustion.
Cookbook section added under "Row-level locks".
Self-review tweaks on PR #113: 1. `iterator(chunk_size)` now asserts `chunk_size > 0`. Silently producing an immediately-exhausted iterator on a 0 or negative value was a footgun for `iterator(unchecked_input as i64)` style callers. Three new no-DB validation tests pin the panic message + the positive-path success. 2. Doc-comment + cookbook now flag two concurrency hazards: - **Concurrent-write hazard**: each chunk is its own query, so INSERT/DELETE between chunks can skip or duplicate rows. Recommend a snapshot-isolation transaction (REPEATABLE READ) on write-concurrent tables. - **select_for_update doesn't propagate**: row locks held by a `.select_for_update()` queryset are released between chunks because each runs in its own implicit transaction. Redirects to `pool.begin()` + `.fetch_on(&mut *tx)` for full-drain locking. 3. Extracted shared fetch path into private `fetch_next_chunk(pool)`. `next_chunk` + `next_row` both call it instead of duplicating the clone-query / set-limit / set-offset / check-exhaustion block. Caller-side `seen` accounting stays per-method because the counting semantics differ (next_chunk counts the whole chunk immediately; next_row counts rows as they pop out of the buffer). 4. Live test `mixed_next_row_and_next_chunk_preserve_order` now asserts `rows_seen() == 50` mid-iteration after 50 next_row calls on a chunk_size=100 iterator — pins the buffer-aware accounting. All gates green: 5 live + 3 validation tests pass, all-features build clean, sqlite-tenancy litmus clean.
Second polish pass on PR #113. The first polish (commit 8c58dca) added concurrency caveats but referenced a `next_chunk_on(&mut *tx)` method that doesn't exist — a reader copying the cookbook snippet verbatim hits a compile error. Three fixes: 1. Cookbook concurrent-write snippet replaced with an honest hand-rolled `select_rows_on` loop inside the snapshot-isolation transaction. Explicitly states "ChunkedIter takes `&Pool`, not a `&mut Transaction`, so the chunker API can't be used inside the tx directly." 2. Cookbook `select_for_update` redirect rewritten with the trade-off pair: `.fetch_on(&mut *tx)` for in-memory results, hand-rolled LIMIT/OFFSET for streamed ones. Both call out that you're outside the ChunkedIter API. 3. Same correction in the `.iterator()` rustdoc — points at [`select_rows_on`] as the actual primitive for tx-scoped chunked reads. Notes the future `iterator_on` companion as out-of-scope for issue #23. No behavior change. All gates green: 5 live + 3 validation tests pass, build clean across tri-feature + sqlite-tenancy litmus.
4dbd581 to
108ad74
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #23. Django parity for
QuerySet.iterator(chunk_size=2000). Returns aChunkedIter<T>that fetcheschunk_sizerows at a time viaLIMIT N OFFSET M— never buffers the full result set. Right tool for million-row exports / ETL pipelines / batch processors that would OOM withfetch_poolreturningVec<T>.```rust
let mut iter = Post::objects()
.where_(Post::published.eq(true))
.order_by(&[("id", false)])
.iterator(2_000)?;
// 1. Whole-chunk loop:
while let Some(chunk) = iter.next_chunk(&pool).await? {
for post in chunk { /* … */ }
}
// 2. Row-by-row loop (internal VecDeque buffer, O(1) per row):
let mut iter = Post::objects().order_by(&[("id", false)]).iterator(2_000)?;
while let Some(post) = iter.next_row(&pool).await? { /* … */ }
```
API
QuerySet::iterator(chunk_size)Result<ChunkedIter<T>>ChunkedIter::next_chunk(&pool)Result<Option<Vec<T>>>None= doneChunkedIter::next_row(&pool)Result<Option<T>>ChunkedIter::rows_seen()i64ChunkedIter::is_exhausted()boolMixing
next_chunkandnext_rowon the same iterator is safe — the buffer drains in row order before any new DB fetch.Design notes
transaction()+ rawsqlx::query(...).fetchStream directly.pop_frontper row; aVec::remove(0)would be O(n).chunk_size(or empty), the iterator marks itself exhausted and every subsequent call returnsOk(None)without re-querying.Streamimpl viafutures-corebut went with the simplernext_chunk/next_rowasync-method API — zero new deps, two ergonomic styles, no Pin/Unpin gymnastics in user code.Tri-dialect
Works on every backend — LIMIT/OFFSET is the portable common denominator. Cookbook flags the deep-pagination trade-off + the PG escape hatch.
What landed
Tests
Live tests cover:
Verification
```
cargo test -p rustango --features tenancy,mysql,sqlite --tests → all passing
DATABASE_URL=... cargo test --features tenancy --test iterator_live → 5 passed
cargo test --workspace --all-features --no-run → clean
cargo build --no-default-features --features sqlite,tenancy → clean
```
Test plan
rows_seen()tracks correctly across both paths.is_exhausted()becomes true once drained.--all-features --no-rungate passes.