feat: QuerySet::iterator(chunk_size) — chunked streaming (closes #23) by ujeenet · Pull Request #113 · ujeenet/rustango

ujeenet · 2026-05-15T22:07:02Z

Summary

Closes #23. Django parity for QuerySet.iterator(chunk_size=2000). Returns a ChunkedIter<T> that fetches chunk_size rows at a time via LIMIT N OFFSET M — never buffers the full result set. Right tool for million-row exports / ETL pipelines / batch processors that would OOM with fetch_pool returning Vec<T>.

```rust
let mut iter = Post::objects()
.where_(Post::published.eq(true))
.order_by(&[("id", false)])
.iterator(2_000)?;

// 1. Whole-chunk loop:
while let Some(chunk) = iter.next_chunk(&pool).await? {
for post in chunk { /* … */ }
}

// 2. Row-by-row loop (internal VecDeque buffer, O(1) per row):
let mut iter = Post::objects().order_by(&[("id", false)]).iterator(2_000)?;
while let Some(post) = iter.next_row(&pool).await? { /* … */ }
```

API

Method	Returns	Description
`QuerySet::iterator(chunk_size)`	`Result<ChunkedIter<T>>`	Compiles eagerly — schema errors surface here, not mid-stream
`ChunkedIter::next_chunk(&pool)`	`Result<Option<Vec<T>>>`	Whole-chunk; `None` = done
`ChunkedIter::next_row(&pool)`	`Result<Option<T>>`	Row-by-row; buffers one chunk internally
`ChunkedIter::rows_seen()`	`i64`	Cumulative count (progress reporting)
`ChunkedIter::is_exhausted()`	`bool`	Post-drain flag

Mixing next_chunk and next_row on the same iterator is safe — the buffer drains in row order before any new DB fetch.

Design notes

LIMIT/OFFSET chunker, not a server-side cursor. Portable across PG / MySQL / SQLite with no transaction overhead. Trade-off documented in the cookbook: O(n²) total work on deep pagination (each chunk reseeks). For truly streaming PG reads, drop into transaction() + raw sqlx::query(...).fetch Stream directly.
VecDeque buffer for row-by-row path. O(1) pop_front per row; a Vec::remove(0) would be O(n).
Exhaustion is sticky. Once a chunk comes back smaller than chunk_size (or empty), the iterator marks itself exhausted and every subsequent call returns Ok(None) without re-querying.
No new dependencies. Considered exposing a Stream impl via futures-core but went with the simpler next_chunk / next_row async-method API — zero new deps, two ergonomic styles, no Pin/Unpin gymnastics in user code.

Tri-dialect

Works on every backend — LIMIT/OFFSET is the portable common denominator. Cookbook flags the deep-pagination trade-off + the PG escape hatch.

What landed

`src/sql/executor.rs` — `ChunkedIter` struct + `QuerySet::iterator(chunk_size)` + 4 methods on `ChunkedIter`.
`tests/iterator_live.rs` — 5 live PG tests against a 1000-row fixture.
`docs/orm.md` — cookbook section under "Row-level locks".

Tests

Suite	Count	Status
`tests/iterator_live.rs` (live PG, 1000-row fixture)	5	passing
`cargo test -p rustango --features tenancy,mysql,sqlite --tests` (full suite)	all	passing

Live tests cover:

next_chunk_yields_all_rows_in_order — 5 chunks of 200, row order preserved.
next_row_yields_every_row_with_internal_buffering — chunk_size that doesn't divide evenly (150 → 7 chunks: 6×150 + 100).
mixed_next_row_and_next_chunk_preserve_order — partial drain via next_row, then next_chunk yields buffered remainder before refetching.
iterator_respects_where_clause_and_short_final_chunk — narrowed filter, short final chunk (4 chunks: 30, 30, 30, 10).
empty_result_yields_none_first_call — no DB hit on subsequent calls after exhaustion.

Verification

```
cargo test -p rustango --features tenancy,mysql,sqlite --tests → all passing
DATABASE_URL=... cargo test --features tenancy --test iterator_live → 5 passed
cargo test --workspace --all-features --no-run → clean
cargo build --no-default-features --features sqlite,tenancy → clean
```

Test plan

Self-review tweaks on PR #113: 1. `iterator(chunk_size)` now asserts `chunk_size > 0`. Silently producing an immediately-exhausted iterator on a 0 or negative value was a footgun for `iterator(unchecked_input as i64)` style callers. Three new no-DB validation tests pin the panic message + the positive-path success. 2. Doc-comment + cookbook now flag two concurrency hazards: - **Concurrent-write hazard**: each chunk is its own query, so INSERT/DELETE between chunks can skip or duplicate rows. Recommend a snapshot-isolation transaction (REPEATABLE READ) on write-concurrent tables. - **select_for_update doesn't propagate**: row locks held by a `.select_for_update()` queryset are released between chunks because each runs in its own implicit transaction. Redirects to `pool.begin()` + `.fetch_on(&mut *tx)` for full-drain locking. 3. Extracted shared fetch path into private `fetch_next_chunk(pool)`. `next_chunk` + `next_row` both call it instead of duplicating the clone-query / set-limit / set-offset / check-exhaustion block. Caller-side `seen` accounting stays per-method because the counting semantics differ (next_chunk counts the whole chunk immediately; next_row counts rows as they pop out of the buffer). 4. Live test `mixed_next_row_and_next_chunk_preserve_order` now asserts `rows_seen() == 50` mid-iteration after 50 next_row calls on a chunk_size=100 iterator — pins the buffer-aware accounting. All gates green: 5 live + 3 validation tests pass, all-features build clean, sqlite-tenancy litmus clean.

Second polish pass on PR #113. The first polish (commit 8c58dca) added concurrency caveats but referenced a `next_chunk_on(&mut *tx)` method that doesn't exist — a reader copying the cookbook snippet verbatim hits a compile error. Three fixes: 1. Cookbook concurrent-write snippet replaced with an honest hand-rolled `select_rows_on` loop inside the snapshot-isolation transaction. Explicitly states "ChunkedIter takes `&Pool`, not a `&mut Transaction`, so the chunker API can't be used inside the tx directly." 2. Cookbook `select_for_update` redirect rewritten with the trade-off pair: `.fetch_on(&mut *tx)` for in-memory results, hand-rolled LIMIT/OFFSET for streamed ones. Both call out that you're outside the ChunkedIter API. 3. Same correction in the `.iterator()` rustdoc — points at [`select_rows_on`] as the actual primitive for tx-scoped chunked reads. Notes the future `iterator_on` companion as out-of-scope for issue #23. No behavior change. All gates green: 5 live + 3 validation tests pass, build clean across tri-feature + sqlite-tenancy litmus.

Django parity for `QuerySet.iterator(chunk_size=2000)`. Returns a `ChunkedIter<T>` that fetches `chunk_size` rows at a time via `LIMIT N OFFSET M`, never buffering the full result set. Right tool for million-row exports / ETL / batch processors that would OOM with `fetch_pool` returning `Vec<T>`. ```rust let mut iter = Post::objects() .order_by(&[("id", false)]) .iterator(2_000)?; // Whole-chunk loop: while let Some(chunk) = iter.next_chunk(&pool).await? { for post in chunk { /* … */ } } // Row-by-row loop (buffer one chunk internally via VecDeque): while let Some(post) = iter.next_row(&pool).await? { /* … */ } ``` - `ChunkedIter<T>` struct in `sql/executor.rs` — holds the compiled `SelectQuery`, rotating `OFFSET`, exhaustion flag, and a `VecDeque<T>` buffer for the row-by-row path. - `QuerySet::iterator(chunk_size)` entry point. Compiles eagerly so schema-validation errors surface here rather than mid-stream. - `next_chunk(&pool)` — `Option<Vec<T>>`. Returns `None` when exhausted. Drains internal buffer first if mixed with `next_row`. - `next_row(&pool)` — `Option<T>`. O(1) per row via `VecDeque::pop_front`. - `rows_seen()` / `is_exhausted()` — progress + termination state. Works on every backend — LIMIT/OFFSET is the portable common denominator. Documented trade-off: O(n²) total work on deep pagination (each chunk reseeks); for truly streaming PG reads, callers can drop into `transaction()` + raw `sqlx::query(...).fetch` Stream directly. 5 live PG tests against a 1000-row fixture: - `next_chunk_yields_all_rows_in_order` — 5 chunks of 200, row order preserved across chunks. - `next_row_yields_every_row_with_internal_buffering` — chunk_size that doesn't divide evenly (150 → 7 chunks: 6×150 + 100). - `mixed_next_row_and_next_chunk_preserve_order` — partial drain via next_row, then `next_chunk` yields buffered remainder before refetching. - `iterator_respects_where_clause_and_short_final_chunk` — narrowed filter, short final chunk (4 chunks: 30, 30, 30, 10). - `empty_result_yields_none_first_call` — no DB hit on subsequent calls after exhaustion. Cookbook section added under "Row-level locks".

Self-review tweaks on PR #113: 1. `iterator(chunk_size)` now asserts `chunk_size > 0`. Silently producing an immediately-exhausted iterator on a 0 or negative value was a footgun for `iterator(unchecked_input as i64)` style callers. Three new no-DB validation tests pin the panic message + the positive-path success. 2. Doc-comment + cookbook now flag two concurrency hazards: - **Concurrent-write hazard**: each chunk is its own query, so INSERT/DELETE between chunks can skip or duplicate rows. Recommend a snapshot-isolation transaction (REPEATABLE READ) on write-concurrent tables. - **select_for_update doesn't propagate**: row locks held by a `.select_for_update()` queryset are released between chunks because each runs in its own implicit transaction. Redirects to `pool.begin()` + `.fetch_on(&mut *tx)` for full-drain locking. 3. Extracted shared fetch path into private `fetch_next_chunk(pool)`. `next_chunk` + `next_row` both call it instead of duplicating the clone-query / set-limit / set-offset / check-exhaustion block. Caller-side `seen` accounting stays per-method because the counting semantics differ (next_chunk counts the whole chunk immediately; next_row counts rows as they pop out of the buffer). 4. Live test `mixed_next_row_and_next_chunk_preserve_order` now asserts `rows_seen() == 50` mid-iteration after 50 next_row calls on a chunk_size=100 iterator — pins the buffer-aware accounting. All gates green: 5 live + 3 validation tests pass, all-features build clean, sqlite-tenancy litmus clean.

Second polish pass on PR #113. The first polish (commit 8c58dca) added concurrency caveats but referenced a `next_chunk_on(&mut *tx)` method that doesn't exist — a reader copying the cookbook snippet verbatim hits a compile error. Three fixes: 1. Cookbook concurrent-write snippet replaced with an honest hand-rolled `select_rows_on` loop inside the snapshot-isolation transaction. Explicitly states "ChunkedIter takes `&Pool`, not a `&mut Transaction`, so the chunker API can't be used inside the tx directly." 2. Cookbook `select_for_update` redirect rewritten with the trade-off pair: `.fetch_on(&mut *tx)` for in-memory results, hand-rolled LIMIT/OFFSET for streamed ones. Both call out that you're outside the ChunkedIter API. 3. Same correction in the `.iterator()` rustdoc — points at [`select_rows_on`] as the actual primitive for tx-scoped chunked reads. Notes the future `iterator_on` companion as out-of-scope for issue #23. No behavior change. All gates green: 5 live + 3 validation tests pass, build clean across tri-feature + sqlite-tenancy litmus.

ujeenet added 3 commits May 15, 2026 19:45

ujeenet force-pushed the issue-23-iterator branch from 4dbd581 to 108ad74 Compare May 15, 2026 22:48

ujeenet merged commit 67baee1 into main May 15, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: QuerySet::iterator(chunk_size) — chunked streaming (closes #23)#113

feat: QuerySet::iterator(chunk_size) — chunked streaming (closes #23)#113
ujeenet merged 3 commits into
mainfrom
issue-23-iterator

ujeenet commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ujeenet commented May 15, 2026

Summary

API

Design notes

Tri-dialect

What landed

Tests

Verification

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant