Skip to content

chore(codecs): centralize events_dropped emission for batch encoding errors#25199

Merged
pront merged 12 commits intomasterfrom
pavlos/centralize-events-dropped-batch-encoding
Apr 28, 2026
Merged

chore(codecs): centralize events_dropped emission for batch encoding errors#25199
pront merged 12 commits intomasterfrom
pavlos/centralize-events-dropped-batch-encoding

Conversation

@pront
Copy link
Copy Markdown
Member

@pront pront commented Apr 15, 2026

Motivation

After #25156 (Parquet encoding), events_dropped emission across batch encoding error paths was inconsistent. Parquet covered most paths but missed build_record_batch failures (e.g. type mismatch); Arrow IPC covered almost none. Every new error path required the author to remember to add emission manually, which is easy to miss.

What this changes

  • Move events_dropped emission to a single wrapper at (Transformer, BatchEncoder)::encode_input so all current and future codec error paths drop accounting "for free".
  • Add a new EncoderRecordBatchError codec event so build_record_batch failures (RecordBatchCreation, ArrowJsonDecode) report component_errors_total with a granular error_code at stage="sending", instead of relying solely on the downstream SinkRequestBuildError at stage="processing".

Notes

  • The non_log_count partial drop in Parquet (non-log events filtered before encoding) can double-count if the batch also fails to encode. Rare intersection; acceptable.
  • Strict-mode rejection in ParquetSerializer::encode still has no dedicated codec event. Tracking as a follow-up so this PR stays scoped to drop-emission centralization plus the build_record_batch granularity gap.

How did you test this PR?

  • make check-clippy clean
  • cargo nextest run -p codecs --features arrow,parquet 278 passed
  • cargo nextest run --package vector --lib sinks::util::encoding 11 passed (includes new test_encode_batch_arrow_emits_record_batch_error_on_type_mismatch)
  • Local end-to-end run of three error scenarios (strict-mode rejection, null constraint, type mismatch) against an aws_s3 sink with healthchecks.enabled=false, confirming component_discarded_events_total and component_errors_total increment as expected for each.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

pront and others added 2 commits April 15, 2026 12:21
…rors

Move events_dropped emission from individual internal events inside
serializers to a single wrapper in (Transformer, BatchEncoder)::encode_input.
This ensures all batch encoding error paths (Arrow IPC and Parquet) consistently
emit events_dropped without requiring each new error path to remember to add it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the domain: sinks Anything related to the Vector's sinks label Apr 15, 2026
@pront pront added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Apr 15, 2026
Covers the build_record_batch ArrowJsonDecode error path where a schema
expects int64 but the event contains a string value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pront pront marked this pull request as ready for review April 15, 2026 16:51
@pront pront requested a review from a team as a code owner April 15, 2026 16:51
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2caa7428ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sinks/util/encoding.rs Outdated
pront and others added 2 commits April 15, 2026 13:18
…nting

Replace EncoderWriteError with a direct ComponentEventsDropped emission
in the batch encode wrapper. EncoderWriteError was incrementing
component_errors_total and logging "Failed writing bytes." which
double-counted errors (codec-specific events already increment
component_errors_total) and was misleading (the failure was encoding,
not writing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0def5bf25

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sinks/util/encoding.rs Outdated
pront and others added 2 commits April 15, 2026 14:24
Move ComponentEventsDropped and UNINTENTIONAL imports inside the
cfg(feature = "codecs-arrow") impl block to avoid unused import
errors when the feature is disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pront pront changed the title fix(codecs): centralize events_dropped emission for batch encoding errors chore(codecs): centralize events_dropped emission for batch encoding errors Apr 15, 2026
@pront pront changed the title chore(codecs): centralize events_dropped emission for batch encoding errors chore(codecs): centralize events_dropped emission for batch encoding errors Apr 15, 2026
@pront pront enabled auto-merge April 28, 2026 14:04
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e66904204b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sinks/util/encoding.rs
@pront pront disabled auto-merge April 28, 2026 15:51
pront and others added 4 commits April 28, 2026 11:54
The new EncoderRecordBatchError fires from build_record_batch's
RecordBatchCreation and ArrowJsonDecode paths, so type-mismatch and
decoder-build failures emit a granular component_errors_total counter at
stage="sending" with a specific error_code, instead of relying solely on
the downstream SinkRequestBuildError at stage="processing".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atch

Drives (Transformer, BatchEncoder)::encode_input through ArrowStreamSerializer
with an Int64 schema field and a string-valued event to trigger the
ArrowJsonDecode path in build_record_batch. Asserts both EncoderRecordBatchError
and ComponentEventsDropped are recorded so a regression in either emission
fails the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pront pront enabled auto-merge April 28, 2026 19:35
@pront pront added this pull request to the merge queue Apr 28, 2026
Merged via the queue into master with commit ce6ca43 Apr 28, 2026
81 checks passed
@pront pront deleted the pavlos/centralize-events-dropped-batch-encoding branch April 28, 2026 20:13
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

domain: sinks Anything related to the Vector's sinks no-changelog Changes in this PR do not need user-facing explanations in the release changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants