Release v0.1.4 · thanhtham010891/agora-etl

Runtime Hardening

Hardened declarative pipeline assembly so configs without sinks now fail fast instead of silently falling back to StdoutSink
Fixed SQLite DLQ access patterns so offloaded SQLite I/O no longer depends on implicit thread affinity
Hardened file-backed source shutdown paths so early-stop and failure cases do not leave producer threads blocked on bounded queues
Restored file-source prefetch control so queue_maxsize once again maps to runtime prefetch depth instead of being silently ignored
Changed container singleton-factory startup to fail fast when eager resolution breaks instead of continuing in a partially initialized state
Added middleware startup rollback so previously started middleware hooks are stopped if a later on_start() fails during pipeline boot
Switched ParquetSource cleanup back to the public pyarrow close path instead of relying on an internal reader handle
Prevented WorkerPool from duplicating metrics and lease-release observers across repeated run() calls
Ensured distributed coordinators are stopped after clean once-pipeline completion
Hardened buffered finalization so drain/flush failures now abort pending buffered work instead of leaving in-flight tasks running behind a failed pipeline
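
The middleware startup rollback described above can be pictured with a minimal sketch. This is not Agora's actual implementation: the `Middleware` protocol, the `on_stop()` hook name, the `start_with_rollback` helper, and the `Recorder` class are all hypothetical names chosen for illustration. Only `on_start()` is named in these notes.

```python
from typing import Iterable, List, Protocol


class Middleware(Protocol):
    """Hypothetical lifecycle interface; only on_start() appears in the release notes."""

    def on_start(self) -> None: ...
    def on_stop(self) -> None: ...


def start_with_rollback(middlewares: Iterable[Middleware]) -> List[Middleware]:
    """Start hooks in order; if one fails, stop the already-started ones in reverse."""
    started: List[Middleware] = []
    try:
        for mw in middlewares:
            mw.on_start()
            started.append(mw)  # recorded only after on_start() succeeded
    except Exception:
        for mw in reversed(started):
            try:
                mw.on_stop()
            except Exception:
                pass  # best-effort rollback: keep stopping the remaining hooks
        raise  # surface the original startup failure to the caller
    return started


class Recorder:
    """Toy middleware that logs lifecycle calls and can be told to fail on start."""

    def __init__(self, name: str, log: list, fail: bool = False) -> None:
        self.name, self.log, self.fail = name, log, fail

    def on_start(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to start")
        self.log.append(("start", self.name))

    def on_stop(self) -> None:
        self.log.append(("stop", self.name))


log: list = []
try:
    start_with_rollback(
        [Recorder("a", log), Recorder("b", log), Recorder("c", log, fail=True)]
    )
except RuntimeError:
    log.append(("boot", "failed"))
```

In this sketch the third hook fails, so `b` and then `a` are stopped before the error propagates. Stopping in reverse start order is a design assumption here, chosen so that later hooks that depend on earlier ones are torn down first.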

Performance And Benchmarking

Added local benchmark tooling under benchmarks/ for generating sample data and measuring end-to-end throughput across CSV, JSONL, and Parquet scenarios
Added a standalone benchmark matrix CLI under benchmarks/run.py so source, sink, and middleware scenarios can be compared without adding benchmark helpers to Agora's built-in runtime surface
Added benchmark-specific extras in packaging so the standalone matrix can be installed with agora-etl[benchmark]
Reduced per-record protocol and capability checks in buffered execution and sink fan-out hot paths by caching runtime invariants
Switched JSONL file handling to prefer orjson when available while preserving the previous default=str behavior for unknown values
Improved JsonLinesSink flush behavior by batching serialized output into a single write chunk and preserving buffered records on flush failure
Improved CsvSink throughput by keeping its writer open across flushes instead of reopening the file on every batch
Improved ParquetSink write throughput by batching rows into columnar pyarrow tables and reusing the initial schema across later flushes
Reduced extra Python copying in ParquetSource batch reads so PyArrow materialization is used more directly
Kept higher file-source prefetch tuning scoped to benchmark scenarios instead of changing the default runtime behavior for all file sources
Clarified benchmark memory reporting so the table now describes Python heap usage rather than implying full process RSS
Added per-scenario repeat/median reporting plus Markdown summaries for source cost, sink cost, and buffered-runtime overhead
Added best-effort environment reporting to benchmark exports so docs snapshots capture the host OS, CPU label, RAM, and Python version used for the run
Added explicit adaptive-backpressure coverage for slow writer flush paths so buffered-stage limits scale down under real writer pressure, not only slow checkpoint persistence
Added benchmark matrix rendering and export paths for Rich table output and Markdown docs snapshots
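
As a rough illustration of the JSONL changes above (prefer orjson when installed, keep the default=str fallback for unknown values, write one chunk per flush, and keep buffered records if the write fails), here is a minimal sketch. The helper names `dumps_line` and `flush` are hypothetical and are not taken from Agora's code; only `orjson.dumps(..., default=...)` and the standard library `json.dumps(..., default=...)` are real APIs.

```python
import io
import json
from decimal import Decimal
from typing import Any, Callable, List

try:
    import orjson  # optional accelerator; the sketch works without it
except ImportError:
    orjson = None


def dumps_line(record: Any) -> bytes:
    """Serialize one record as a JSONL line, stringifying unknown value types."""
    if orjson is not None:
        # orjson returns bytes and calls `default` for types it cannot encode.
        return orjson.dumps(record, default=str) + b"\n"
    text = json.dumps(record, default=str, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") + b"\n"


def flush(buffer: List[Any], write: Callable[[bytes], Any]) -> None:
    """Write all buffered records as one chunk; clear the buffer only on success."""
    if not buffer:
        return
    chunk = b"".join(dumps_line(record) for record in buffer)
    write(chunk)    # if this raises, the buffer is left untouched for a retry
    buffer.clear()  # reached only after the write succeeded


out = io.BytesIO()
pending = [{"id": 1, "price": Decimal("1.50")}, {"id": 2, "price": Decimal("2.00")}]
flush(pending, out.write)
```

Decimal is used here because neither serializer encodes it natively, so both paths go through default=str and produce the same string value.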

Security And Operator Clarity

Hardened built-in health endpoint responses with Cache-Control: no-store, X-Content-Type-Options: nosniff, and explicit Bearer auth challenges on 401 responses
Surfaced trusted-config warnings and plan output for declarative import references so operators can see when a config will import project Python objects
Expanded docs guidance for trusted config imports, replay modes, checkpoint fail-closed behavior, and the built-in health server's private-network scope
Tightened agora dlq replay failure handling so backend replay-metadata and acknowledge errors are reported as replay failures instead of crashing the command mid-run
Added explicit runtime guarantees and non-guarantees to architecture docs so ordering, fail-closed behavior, replay acknowledgment, and support boundaries are documented as contract-level expectations
Added a dedicated benchmarking guide so the standalone matrix workflow is documented alongside the core docs
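
The health endpoint hardening above can be sketched as a small pure function that builds the response. This is an assumption-laden illustration rather than Agora's handler: the `health_response` function, its parameters, and the JSON bodies are hypothetical. The header names and values (Cache-Control: no-store, X-Content-Type-Options: nosniff, and a Bearer challenge on 401) are the ones listed in these notes.

```python
import hmac
import json
from typing import Dict, Optional, Tuple


def health_response(
    auth_header: Optional[str], expected_token: str
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a token-protected health check."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-store",          # never cache health state
        "X-Content-Type-Options": "nosniff",  # disable MIME sniffing
    }
    supplied = ""
    if auth_header and auth_header.startswith("Bearer "):
        supplied = auth_header[len("Bearer "):]

    # Constant-time comparison; an empty configured token never authorizes.
    authorized = bool(expected_token) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_token.encode("utf-8")
    )
    if not authorized:
        headers["WWW-Authenticate"] = "Bearer"  # explicit challenge on 401
        return 401, headers, json.dumps({"error": "unauthorized"}).encode("utf-8")
    return 200, headers, json.dumps({"status": "ok"}).encode("utf-8")


denied = health_response(None, "s3cret")
allowed = health_response("Bearer s3cret", "s3cret")
```

Keeping the response construction separate from the HTTP server makes the header contract easy to cover in unit tests without opening a socket.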

Test Coverage

Added regression coverage for SQLite DLQ cross-thread access, file-source early-stop shutdown behavior, file-source prefetch bounds, config import warning/plan output, repeated worker start/stop observer wiring, buffered drain-failure cancellation, and DLQ replay backend failure boundaries
Added preservation coverage for buffered ordering, checkpoint fail-closed behavior, and DLQ replay lifecycle semantics
