chore(codecs): centralize events_dropped emission for batch encoding errors#25199
…rors Move events_dropped emission from individual internal events inside serializers to a single wrapper in (Transformer, BatchEncoder)::encode_input. This ensures all batch encoding error paths (Arrow IPC and Parquet) consistently emit events_dropped without requiring each new error path to remember to add it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers the build_record_batch ArrowJsonDecode error path where a schema expects int64 but the event contains a string value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
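The two commits above can be illustrated with a minimal, self-contained sketch. It does not use Vector's real types: `Metrics`, `EncodeError`, `BatchEncoder`, `Int64Encoder`, and this free-function `encode_input` are all hypothetical stand-ins, chosen only to show the idea of counting dropped events once in a wrapper instead of inside each encoder error path.

```rust
/// Hypothetical metrics sink; stands in for the real events_dropped accounting.
#[derive(Debug, Default, PartialEq)]
struct Metrics {
    events_dropped: u64,
}

/// Hypothetical error type; the real codec errors are richer.
#[derive(Debug, PartialEq)]
enum EncodeError {
    TypeMismatch { field: &'static str },
}

/// Hypothetical batch encoder interface (assumption, not Vector's trait).
trait BatchEncoder {
    fn encode_batch(&self, events: &[&str]) -> Result<Vec<u8>, EncodeError>;
}

/// Requires every event to parse as i64, mimicking a schema with an int64 field.
struct Int64Encoder;

impl BatchEncoder for Int64Encoder {
    fn encode_batch(&self, events: &[&str]) -> Result<Vec<u8>, EncodeError> {
        let mut out = Vec::new();
        for event in events {
            // A string value where an integer is expected fails the whole batch.
            let value: i64 = event
                .parse()
                .map_err(|_| EncodeError::TypeMismatch { field: "value" })?;
            out.extend_from_slice(&value.to_le_bytes());
        }
        Ok(out)
    }
}

/// Single wrapper: whatever error the encoder returns, the drop is counted here,
/// so individual error paths do not need to remember to do it.
fn encode_input<E: BatchEncoder>(
    encoder: &E,
    events: &[&str],
    metrics: &mut Metrics,
) -> Result<Vec<u8>, EncodeError> {
    encoder.encode_batch(events).map_err(|err| {
        metrics.events_dropped += events.len() as u64;
        err
    })
}
```

In this sketch a new error variant added to `EncodeError` would be covered automatically, because the accounting lives in the wrapper and not in the encoder.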
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2caa7428ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…nting Replace EncoderWriteError with a direct ComponentEventsDropped emission in the batch encode wrapper. EncoderWriteError was incrementing component_errors_total and logging "Failed writing bytes." which double-counted errors (codec-specific events already increment component_errors_total) and was misleading (the failure was encoding, not writing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0def5bf25
Move ComponentEventsDropped and UNINTENTIONAL imports inside the cfg(feature = "codecs-arrow") impl block to avoid unused import errors when the feature is disabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e66904204b
The new EncoderRecordBatchError fires from build_record_batch's RecordBatchCreation and ArrowJsonDecode paths, so type-mismatch and decoder-build failures emit a granular component_errors_total counter at stage="sending" with a specific error_code, instead of relying solely on the downstream SinkRequestBuildError at stage="processing". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
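A rough sketch of the accounting this commit describes: one labelled error counter increment per failure, plus a separate drop count emitted once by the wrapper. Everything here is an assumption made for illustration. `Registry`, `RecordBatchFailure`, `record_failed_batch`, and the `error_code` strings are hypothetical; the PR text does not show the real label values.

```rust
use std::collections::HashMap;

/// Hypothetical failure kinds mirroring the two paths named in the commit message.
#[derive(Debug, Clone, Copy)]
enum RecordBatchFailure {
    RecordBatchCreation,
    ArrowJsonDecode,
}

impl RecordBatchFailure {
    /// Hypothetical label values (assumption; not taken from the PR).
    fn error_code(self) -> &'static str {
        match self {
            Self::RecordBatchCreation => "record_batch_creation",
            Self::ArrowJsonDecode => "arrow_json_decode",
        }
    }
}

/// Hypothetical in-memory stand-in for the two counters named in the PR.
#[derive(Default)]
struct Registry {
    /// Keyed by (error_code, stage).
    component_errors_total: HashMap<(&'static str, &'static str), u64>,
    component_discarded_events_total: u64,
}

impl Registry {
    /// Codec-level event: one error, labelled with a specific code at stage "sending".
    fn emit_record_batch_error(&mut self, failure: RecordBatchFailure) {
        *self
            .component_errors_total
            .entry((failure.error_code(), "sending"))
            .or_insert(0) += 1;
    }

    /// Wrapper-level event: counts dropped events only, without touching the error counter.
    fn emit_events_dropped(&mut self, count: u64) {
        self.component_discarded_events_total += count;
    }
}

/// Models one failed batch: the codec reports the error, the wrapper reports the drop.
fn record_failed_batch(registry: &mut Registry, failure: RecordBatchFailure, batch_len: u64) {
    registry.emit_record_batch_error(failure);
    registry.emit_events_dropped(batch_len);
}
```

Keeping the drop emission separate from the error emission is what avoids the double-counting of `component_errors_total` mentioned in the earlier commit that replaced `EncoderWriteError`.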
…atch Drives (Transformer, BatchEncoder)::encode_input through ArrowStreamSerializer with an Int64 schema field and a string-valued event to trigger the ArrowJsonDecode path in build_record_batch. Asserts both EncoderRecordBatchError and ComponentEventsDropped are recorded so a regression in either emission fails the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Motivation
After #25156 (Parquet encoding), events_dropped emission across batch encoding error paths was inconsistent. Parquet covered most paths but missed build_record_batch failures (e.g. type mismatch); Arrow IPC covered almost none. Every new error path required the author to remember to add emission manually, which is easy to miss.
What this changes
- Moves events_dropped emission to a single wrapper at (Transformer, BatchEncoder)::encode_input so all current and future codec error paths get drop accounting "for free".
- Adds an EncoderRecordBatchError codec event so build_record_batch failures (RecordBatchCreation, ArrowJsonDecode) report component_errors_total with a granular error_code at stage="sending", instead of relying solely on the downstream SinkRequestBuildError at stage="processing".
Notes
- The non_log_count partial drop in Parquet (non-log events filtered before encoding) can double-count if the batch also fails to encode. This is a rare intersection and is acceptable.
- ParquetSerializer::encode still has no dedicated codec event. Tracking as a follow-up so this PR stays scoped to drop-emission centralization plus the build_record_batch granularity gap.
How did you test this PR?
- make check-clippy: clean
- cargo nextest run -p codecs --features arrow,parquet: 278 passed
- cargo nextest run --package vector --lib sinks::util::encoding: 11 passed (includes new test_encode_batch_arrow_emits_record_batch_error_on_type_mismatch)
- … healthchecks.enabled=false, confirming component_discarded_events_total and component_errors_total increment as expected for each.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
… no-changelog label to this PR.
References