Release v0.1.4

thanhtham010891 released this 25 May 17:10 · 17 commits to main since this release

Runtime Hardening

  • Hardened declarative pipeline assembly so configs without sinks now fail fast instead of silently falling back to StdoutSink
  • Fixed SQLite DLQ access patterns so offloaded SQLite I/O no longer depends on implicit thread affinity
  • Hardened file-backed source shutdown paths so early-stop and failure cases do not leave producer threads blocked on bounded queues
  • Restored file-source prefetch control so queue_maxsize once again maps to runtime prefetch depth instead of being silently ignored
  • Changed container singleton-factory startup to fail fast when eager resolution breaks instead of continuing in a partially initialized state
  • Added middleware startup rollback so previously started middleware hooks are stopped if a later on_start() fails during pipeline boot
  • Switched ParquetSource cleanup back to the public pyarrow close path instead of relying on an internal reader handle
  • Prevented WorkerPool from duplicating metrics and lease-release observers across repeated run() calls
  • Ensured distributed coordinators are stopped after clean once-pipeline completion
  • Hardened buffered finalization so drain/flush failures now abort pending buffered work instead of leaving in-flight tasks running behind a failed pipeline
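
The middleware startup rollback described above can be illustrated with a minimal sketch. This is not Agora's implementation: `start_middleware`, the `Recorder` class, and the `on_stop()` hook name are assumptions made for illustration; only `on_start()` is named in these notes. The idea is that hooks which already started are stopped in reverse order when a later one fails, and the original error is re-raised.

```python
class Recorder:
    """Hypothetical middleware that records its lifecycle calls in a shared log."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self.log = log
        self.fail = fail

    def on_start(self):
        self.log.append(f"start:{self.name}")
        if self.fail:
            raise RuntimeError(f"{self.name} failed to start")

    def on_stop(self):  # assumed hook name, not confirmed by the release notes
        self.log.append(f"stop:{self.name}")


def start_middleware(middlewares):
    """Start hooks in order; on failure, stop the started ones in reverse."""
    started = []
    try:
        for mw in middlewares:
            mw.on_start()
            started.append(mw)  # only counted once on_start() returned
    except Exception:
        for mw in reversed(started):
            try:
                mw.on_stop()
            except Exception:
                pass  # best-effort rollback; keep the original error
        raise
    return started
```

With four hooks where the third fails, the first two are stopped in reverse order and the fourth is never started.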

Performance And Benchmarking

  • Added local benchmark tooling under benchmarks/ for generating sample data and measuring end-to-end throughput across CSV, JSONL, and Parquet scenarios
  • Added a standalone benchmark matrix CLI under benchmarks/run.py so source, sink, and middleware scenarios can be compared without adding benchmark helpers to Agora's built-in runtime surface
  • Added benchmark-specific extras in packaging so the standalone matrix can be installed with agora-etl[benchmark]
  • Reduced per-record protocol and capability checks in buffered execution and sink fan-out hot paths by caching runtime invariants
  • Switched JSONL file paths to prefer orjson when available while preserving the previous default=str behavior for unknown values
  • Improved JsonLinesSink flush behavior by batching serialized output into a single write chunk and preserving buffered records on flush failure
  • Improved CsvSink throughput by keeping its writer open across flushes instead of reopening the file on every batch
  • Improved ParquetSink write throughput by batching rows into columnar pyarrow tables and reusing the initial schema across later flushes
  • Reduced extra Python copying in ParquetSource batch reads so PyArrow materialization is used more directly
  • Kept higher file-source prefetch tuning scoped to benchmark scenarios instead of changing the default runtime behavior for all file sources
  • Clarified benchmark memory reporting so the table now describes Python heap usage rather than implying full process RSS
  • Added per-scenario repeat/median reporting plus Markdown summaries for source cost, sink cost, and buffered-runtime overhead
  • Added best-effort environment reporting to benchmark exports so docs snapshots capture the host OS, CPU label, RAM, and Python version used for the run
  • Added explicit adaptive-backpressure coverage for slow writer flush paths so buffered-stage limits scale down under real writer pressure, not only slow checkpoint persistence
  • Added benchmark matrix rendering and export paths for Rich table output and Markdown docs snapshots
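
As an illustration of the orjson preference and the single-chunk flush mentioned above, here is a minimal sketch. The helper names `dumps_line` and `serialize_batch` are hypothetical and are not part of Agora's API; the sketch only assumes that `orjson.dumps` and `json.dumps` both accept a `default` callable, which they do.

```python
import json

try:
    import orjson  # optional accelerator; the stdlib json module is the fallback
except ImportError:
    orjson = None


def dumps_line(record) -> bytes:
    """Serialize one record as a JSONL line, falling back to str() for unknown values."""
    if orjson is not None:
        return orjson.dumps(record, default=str) + b"\n"
    return json.dumps(record, default=str).encode("utf-8") + b"\n"


def serialize_batch(records) -> bytes:
    """Join all serialized lines so a flush can issue a single write call."""
    return b"".join(dumps_line(r) for r in records)
```

One caveat worth keeping in mind: orjson serializes some types natively (for example `datetime`), so `default=str` is only consulted for values it does not already understand, and the exact text for such types can differ from what `json.dumps(..., default=str)` emits.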

Security And Operator Clarity

  • Hardened built-in health endpoint responses with Cache-Control: no-store, X-Content-Type-Options: nosniff, and explicit Bearer auth challenges on 401 responses
  • Surfaced trusted-config warnings and plan output for declarative import references so operators can see when a config will import project Python objects
  • Expanded docs guidance for trusted config imports, replay modes, checkpoint fail-closed behavior, and the built-in health server's private-network scope
  • Tightened agora dlq replay failure handling so backend replay-metadata and acknowledge errors are reported as replay failures instead of crashing the command mid-run
  • Added explicit runtime guarantees and non-guarantees to architecture docs so ordering, fail-closed behavior, replay acknowledgment, and support boundaries are documented as contract-level expectations
  • Added a dedicated benchmarking guide so the standalone matrix workflow is documented alongside the core docs
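
The health endpoint hardening above can be sketched as a small pure function. The function name, its parameters, and the token-checking logic are assumptions for illustration and do not reflect Agora's actual health server code; the header names and values are the ones listed in these notes, with the Bearer challenge sent via the standard `WWW-Authenticate` header.

```python
import hmac


def health_response_headers(authorization, expected_token):
    """Return (status, headers) for a health request.

    `authorization` is the raw Authorization header value, or None if absent.
    """
    headers = {
        "Cache-Control": "no-store",           # health state must not be cached
        "X-Content-Type-Options": "nosniff",   # disable MIME sniffing
    }
    prefix = "Bearer "
    token = ""
    if authorization and authorization.startswith(prefix):
        token = authorization[len(prefix):]
    # Constant-time comparison on bytes avoids leaking the token via timing.
    authorized = bool(token) and hmac.compare_digest(
        token.encode("utf-8"), expected_token.encode("utf-8")
    )
    if not authorized:
        headers["WWW-Authenticate"] = "Bearer"  # explicit challenge on 401
        return 401, headers
    return 200, headers
```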

Test Coverage

  • Added regression coverage for SQLite DLQ cross-thread access, file-source early-stop shutdown behavior, file-source prefetch bounds, config import warning/plan output, repeated worker start/stop observer wiring, buffered drain-failure cancellation, and DLQ replay backend failure boundaries
  • Added preservation coverage for buffered ordering, checkpoint fail-closed behavior, and DLQ replay lifecycle semantics
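
For readers curious what SQLite access without implicit thread affinity can look like, the sketch below opens a short-lived connection inside whichever thread performs the call, so no connection object ever crosses a thread boundary. `DlqStore`, its table layout, and its method names are hypothetical and are not Agora's DLQ backend; this is one possible pattern, not necessarily the one used in the fix.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


class DlqStore:
    """Hypothetical file-backed dead-letter store that is safe to call from any thread."""

    def __init__(self, path):
        self._path = path
        self._run("CREATE TABLE IF NOT EXISTS dlq (payload TEXT NOT NULL)")

    def _run(self, sql, params=()):
        # Connect in the calling thread; sqlite3 connections refuse cross-thread
        # use by default (check_same_thread=True), so none is ever shared.
        conn = sqlite3.connect(self._path)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def push(self, payload):
        self._run("INSERT INTO dlq (payload) VALUES (?)", (payload,))

    def count(self):
        return self._run("SELECT COUNT(*) FROM dlq")[0][0]


def demo():
    """Create the store on the main thread, then write and read from pool threads."""
    path = os.path.join(tempfile.mkdtemp(), "dlq.sqlite3")
    store = DlqStore(path)
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(store.push, ["a", "b", "c", "d"]))
        return pool.submit(store.count).result()
```

Opening a connection per call trades some throughput for simplicity; a per-thread connection cache would be a reasonable alternative under the same constraint.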