chore(datadog_metrics sink): switch to v2 endpoint #24842
vladimir-dd merged 15 commits into master from
Conversation
pront
left a comment
Leaving some early feedback; this looks good overall.
It would be interesting to run some stress tests and compare against v1.
There's an existing regression experiment here:
thanks, was about to ask for some guidance on performance testing here 🙏
Force-pushed from 8148cb3 to 9bb9b2b
Add regression test to validate datadog_metrics sink v2 endpoint performance under realistic high-throughput DogStatsD load.

Test configuration:
- Load: default lading dogstatsd settings (realistic ~2KB messages)
- Throughput: 500 Mb/s → ~250k events/sec
- Batch: default settings (100k max_events, 2s timeout)
- Validates batch splitting when payloads exceed v2 size limits

This test ensures the v2 endpoint correctly handles batch splitting with realistic high-cardinality DogStatsD metrics under load.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Force-pushed from 9bb9b2b to 0cb05d2
Different series endpoints have different uncompressed payload limits (v2's is 12x smaller than v1's). This ensures each batch fits in a single HTTP request without splitting, reducing memory overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Note: will unsubscribe from this PR until it's "ready for review". When it's ready, we will prioritize reviewing it over other PRs.
… config

- Remove references to DatadogMetricsCompression and request_compression from encoder.rs tests (those symbols don't exist in the current codebase; they belong to an unmerged compression-options branch)
- Fix the batcher_user_max_bytes_is_preserved test to avoid struct update syntax with private PhantomData fields in BatchConfig

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…TRICS_SERIES_V2_API env var

The old opt-in env var is now a no-op since v2 is the default. Emit a one-time warning so existing users know they can safely remove it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 55877a4 to 8ecf755
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ecf75526a
… and Sketches

Series v2 has a 5 MiB uncompressed payload limit while Sketches allows 60 MiB. Previously both used the same (Series-derived) cap, which over-fragmented sketch-heavy workloads into many small requests.

To support per-partition batch configuration, `PartitionedBatcher` now passes the partition key to the batch config factory closure (`Fn(&Key) -> C` instead of `Fn() -> C`). The explicit `timeout: Duration` parameter replaces the previous extraction via `settings().timeout()`. All existing callers are updated mechanically.

The `datadog_metrics` sink uses the new capability to select the appropriate byte size limit per endpoint partition, keeping Series at 5 MiB and Sketches at 60 MiB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
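As a rough illustration of the new factory shape, a per-partition config selector might look like the sketch below. `Endpoint` and `BatchSettings` are hypothetical stand-ins, not Vector's actual types; only the `Fn(&Key) -> C` shape and the 5 MiB / 60 MiB limits come from the commit message.

```rust
// Sketch only: illustrates a `Fn(&Key) -> C` batch-config factory.
// Types and names here are illustrative, not Vector's real identifiers.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Endpoint {
    Series,   // series v2: 5 MiB uncompressed payload limit
    Sketches, // sketches: 60 MiB uncompressed payload limit
}

#[derive(Debug, PartialEq)]
struct BatchSettings {
    max_bytes: usize,
    max_events: usize,
}

const MIB: usize = 1024 * 1024;

// The old factory shape was `Fn() -> C`: one config for every partition.
// The new shape receives the partition key, so each endpoint partition
// can get its own byte cap.
fn batch_settings_for(endpoint: &Endpoint) -> BatchSettings {
    let max_bytes = match endpoint {
        Endpoint::Series => 5 * MIB,
        Endpoint::Sketches => 60 * MIB,
    };
    BatchSettings {
        max_bytes,
        max_events: 100_000,
    }
}
```

The closure the batcher would actually receive is then just `|key: &Endpoint| batch_settings_for(key)`.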
Force-pushed from 104855e to 365d824
…refactor

- Add a `Batch = B` constraint to the `with_timer` test constructor so the compiler can infer `B` from the batch config type
- Replace `Box::new(move |_| ...)` with unboxed `move |_: &u8| ...` in tests to avoid HRTB inference issues with `Fn(&Key) -> C`
- Run `make fmt` to fix formatting in several sink files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 474cced to 3988235
pront
left a comment
The implementation looks good to me. I left a comment about a pre-existing issue; I'd be interested to hear your thoughts.
pront
left a comment
Thanks, excited about this change.
…zation strips interval_ms

`into_absolute()` always sets `interval_ms: None`, so the encoded interval for gauges is always 0. The test expectation of 10 was dead code until v2 became the default endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
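A minimal sketch of the behavior this commit describes. The types and field names below are illustrative stand-ins, not Vector's actual `Metric` API; only the "normalization drops the interval, so the encoder writes 0" behavior is from the commit message.

```rust
// Sketch: converting a metric to an absolute value does not carry the
// interval over, so the encoded interval for gauges ends up 0.
// Types here are hypothetical, not Vector's real `Metric`.
#[derive(Clone, Debug, PartialEq)]
struct Metric {
    value: f64,
    interval_ms: Option<u32>,
}

impl Metric {
    // Mirrors the described `into_absolute()` behavior: the rebuilt
    // absolute metric always has `interval_ms: None`.
    fn into_absolute(self) -> Metric {
        Metric {
            value: self.value,
            interval_ms: None,
        }
    }
}

// The encoder emits 0 when no interval is present.
fn encoded_interval(metric: &Metric) -> u32 {
    metric.interval_ms.unwrap_or(0)
}
```

With this model, a gauge constructed with `interval_ms: Some(10)` still encodes an interval of 0 after `into_absolute()`, which is why the old test expectation of 10 could never fire.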
pront
left a comment
Check failed: https://github.com/vectordotdev/vector/actions/runs/23143552898/job/67226322004
We need to update the DD metrics E2E tests: https://github.com/vectordotdev/vector/blob/master/tests/e2e/datadog-metrics/config/test.yaml#L14
I would add a series_api_version: ['v1', 'v2'] entry to the matrix above.
@codex review
Codex Review: Didn't find any major issues. Breezy!
Force-pushed from b72c6d2 to 8635670
…in e2e tests

- Add a series_api_version matrix dimension ['v1', 'v2'] to test.yaml, expanding coverage from 3 to 6 environments
- Pass CONFIG_SERIES_API_VERSION through the docker compose environment to the Vector container
- Parameterize series_api_version in vector.toml from the env var
- Refactor series::validate() to dispatch to the v1 or v2 fetch function based on CONFIG_SERIES_API_VERSION
- Extract assertions into a compare_intakes() helper (no new assertion logic)
- Replace blocking std::thread::sleep with async tokio::time::sleep in the test entry point

Rationale: Vector now supports a configurable series_api_version (v1/v2) in the datadog_metrics sink. The e2e test previously hardcoded v1 for the vector pipeline. This change runs the same assertions against both API versions via the CI matrix, ensuring correctness is validated for each version independently without duplicating test logic.
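The version dispatch described above can be sketched as follows. The function names are illustrative, not the test's real helpers; the `/api/v1/series` and `/api/v2/series` paths are the standard Datadog series submission endpoints, but treat them as assumptions here.

```rust
// Sketch: dispatch on the CONFIG_SERIES_API_VERSION env var, defaulting
// to v1 when unset. Names are illustrative, not the e2e test's helpers.
fn series_api_version() -> String {
    std::env::var("CONFIG_SERIES_API_VERSION").unwrap_or_else(|_| "v1".to_string())
}

// Map the configured version to the intake path to fetch/validate against.
fn series_fetch_path(version: &str) -> &'static str {
    match version {
        "v2" => "/api/v2/series",
        _ => "/api/v1/series",
    }
}
```

Keeping the mapping in one small function lets the same `compare_intakes()`-style assertions run unchanged against either version, which is the point of the matrix approach.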
Force-pushed from 8635670 to e7b77ca
LGTM - feel free to enqueue whenever you want
This PR switches the Datadog metrics sink to use the v2 series endpoint by default, automatically caps the batcher per endpoint to fix a memory issue and improve throughput, and adds a statsd_to_datadog_metrics (high-load, metrics-only pipeline) regression test (run).

Memory: Though this problem existed before on v1, the switch to v2's smaller payload limit (5 MiB vs 60 MiB) lowers the threshold at which unbounded batching causes memory pressure to much lower loads. The fix automatically caps the batcher to each endpoint's payload limit so batches never need splitting, improving memory efficiency for both v1 and v2. Full benchmark results: #24874.
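The capping logic amounts to a min over the user's configured byte limit and the endpoint's payload limit. A minimal sketch, assuming the 5 MiB (v2) and 60 MiB (v1) limits stated above; the constant and function names are hypothetical, not Vector's identifiers:

```rust
// Sketch: cap the configured batch byte limit to the endpoint's
// uncompressed payload limit, so a full batch always fits in one
// request and never needs splitting.
const MIB: usize = 1024 * 1024;
const SERIES_V2_LIMIT: usize = 5 * MIB; // v2 series uncompressed cap
const SERIES_V1_LIMIT: usize = 60 * MIB; // v1 series uncompressed cap

fn capped_batch_max_bytes(user_max_bytes: Option<usize>, endpoint_limit: usize) -> usize {
    match user_max_bytes {
        // A user-configured limit is honored but never exceeds the endpoint cap.
        Some(user) => user.min(endpoint_limit),
        // With no user setting, the endpoint cap is the batch limit.
        None => endpoint_limit,
    }
}
```

Because the cap is applied per endpoint, the same logic helps both v1 and v2: batches are sized so they fit in a single HTTP request, avoiding the memory spike from buffering an oversized batch and then splitting it.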
Throughput improvement after the memory fix, as detected by the regression tests:

Regression tests before the fix (switching from v1 to v2) showed no difference.
Correctness: End-to-end validated against the real Datadog API (#24879): all metric types (counter, gauge, set, distribution, aggregated histogram, aggregated summary) pass for both v1 and v2, and v1 and v2 produce identical aggregated values.