
[Benchmarks] Migrate linux throughput tests #6765

Merged: 32 commits merged into master on Mar 17, 2025

Conversation

faydef
Contributor

@faydef faydef commented Mar 13, 2025

Summary of changes

This PR enables running the Linux based throughput tests through the Benchmarking platform on Gitlab.
The tests are set up to run:

  • On commit to the master branch
  • On schedule if BENCHMARK_RUN is set to "true" for the scheduled pipeline.
  • On manual trigger (keep in mind that it requires build artifacts from the Azure Pipeline)
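These trigger conditions correspond to GitLab `rules:` along these lines. This is an abridged sketch based on the pipeline snippet quoted later in the review, and it covers only the first two triggers (the manual case goes through manually starting the child pipeline):

```yaml
rules:
  # Run on every push to the default branch
  - if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"'
    when: always
  # Run on schedules only when the scheduled pipeline opts in
  - if: '$CI_PIPELINE_SOURCE == "schedule" && $BENCHMARK_RUN == "true"'
```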

The results can be checked on the following dashboard

Reason for change

  • The throughput tests previously running on AzDo have been decommissioned.
  • The Benchmarking platform (BP) produces more repeatable results

Implementation details

  • Removed the dependency on Crank
  • Changed the load generator from Bombardier (via Crank) to k6
  • Using dedicated CPU cores for the load tester yields repeatable results as opposed to running the load tester and system under test on the same set of cores.
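The dedicated-core isolation in the last bullet can be illustrated with Linux scheduler affinity. This is a hedged sketch of the general idea only: the `pin_to_cpus` helper is hypothetical, and the real pipeline isolates cores at the CI job / runner level rather than with code like this.

```python
import os

# Illustrative sketch (Linux-only): keep the load generator and the system
# under test on disjoint CPU sets so they never compete for the same cores.
# This is NOT the pipeline's actual mechanism; it only demonstrates the idea.

def pin_to_cpus(pid: int, cpus: set[int]) -> set[int]:
    """Restrict `pid` (0 = calling process) to `cpus`; return the effective set."""
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    first_core = min(original)
    # Pin the current process (standing in for the load generator) to one core...
    print(pin_to_cpus(0, {first_core}))
    # ...then restore the original affinity.
    pin_to_cpus(0, original)
```

Pinning each side to its own cores removes scheduler interference between the load generator and the system under test, which is where the repeatability gain comes from.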

@faydef faydef requested a review from a team as a code owner March 13, 2025 16:18
@andrewlock
Member

andrewlock commented Mar 13, 2025

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to capture one-off costs. Cases where the execution time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only differences greater than both 5% and 5 ms are considered.
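A minimal sketch of how that comparison rule could be applied. The `is_regression` helper is hypothetical, not the actual report generator, and it uses a normal approximation in place of the full Welch's t-test (a real implementation would use Student's t with Welch–Satterthwaite degrees of freedom):

```python
from statistics import mean, variance, NormalDist

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

def is_regression(base_ms, pr_ms, alpha=0.05, min_rel=0.05, min_abs_ms=5.0):
    """Flag a regression only if the test is significant at `alpha` AND the
    mean difference exceeds both 5% and 5 ms (hypothetical helper)."""
    # Two-sided critical value from the normal approximation, valid for the
    # large sample counts these runs produce (~1.96 for alpha = 0.05).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = mean(pr_ms) - mean(base_ms)
    significant = abs(welch_t(pr_ms, base_ms)) > crit
    return significant and diff > min_abs_ms and diff / mean(base_ms) > min_rel
```

Requiring both a relative (5%) and an absolute (5 ms) difference keeps statistically significant but practically irrelevant noise out of the report.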

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
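Under a normality assumption, such a p99 interval can be derived from the run's mean and StdDev like this (a sketch; the report's exact method isn't shown in this thread):

```python
from statistics import NormalDist

def p99_interval(mean: float, std_dev: float) -> tuple[float, float]:
    """Central 99% interval under a normal assumption: mean ± z(0.995) * std."""
    z = NormalDist().inv_cdf(0.995)  # ≈ 2.576
    return mean - z * std_dev, mean + z * std_dev

# e.g. a run with mean 70 ms and StdDev 1.2 ms, similar to the Baseline below
lo, hi = p99_interval(70.0, 1.2)
```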

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (70ms)  : 66, 73
     .   : milestone, 70,
    master - mean (69ms)  : 66, 72
     .   : milestone, 69,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (1,007ms)  : 987, 1028
     .   : milestone, 1007,
    master - mean (1,024ms)  : 958, 1090
     .   : milestone, 1024,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (102ms)  : 100, 105
     .   : milestone, 102,
    master - mean (105ms)  : 101, 110
     .   : milestone, 105,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (685ms)  : 665, 706
     .   : milestone, 685,
    master - mean (699ms)  : 644, 755
     .   : milestone, 699,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (89ms)  : 86, 91
     .   : milestone, 89,
    master - mean (90ms)  : 87, 93
     .   : milestone, 90,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (641ms)  : 626, 656
     .   : milestone, 641,
    master - mean (645ms)  : 623, 667
     .   : milestone, 645,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (190ms)  : 186, 195
     .   : milestone, 190,
    master - mean (190ms)  : 187, 193
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (1,108ms)  : 1080, 1137
     .   : milestone, 1108,
    master - mean (1,113ms)  : 1070, 1155
     .   : milestone, 1113,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (269ms)  : 265, 274
     .   : milestone, 269,
    master - mean (269ms)  : 266, 273
     .   : milestone, 269,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (873ms)  : 848, 898
     .   : milestone, 873,
    master - mean (877ms)  : 843, 911
     .   : milestone, 877,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (262ms)  : 258, 266
     .   : milestone, 262,
    master - mean (261ms)  : 257, 266
     .   : milestone, 261,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (862ms)  : 831, 893
     .   : milestone, 862,
    master - mean (861ms)  : 827, 895
     .   : milestone, 861,

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 bot commented Mar 13, 2025

Datadog Report

Branch report: fayssal/migrate-linux-throughput-tests
Commit report: 05304e4
Test service: dd-trace-dotnet

✅ 0 Failed, 247051 Passed, 2398 Skipped, 19h 41m 56.82s Total Time

@andrewlock
Member

andrewlock commented Mar 13, 2025

Benchmarks Report for tracer 🐌

Benchmarks for #6765 compared to master:

  • 1 benchmark is faster, with geometric mean 1.125
  • 2 benchmarks are slower, with geometric mean 1.178
  • All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only differences greater than both 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.
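A sketch of this comparison (the helpers are hypothetical, not the benchmark harness's own code; the p-value uses the normal approximation to the Mann–Whitney U distribution, with average ranks for ties but no tie correction in the variance):

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_p(a, b):
    """Two-sided Mann–Whitney U p-value via the normal approximation."""
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied group
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    u1 = sum(rank[:n1]) - n1 * (n1 + 1) / 2
    mu, sigma = n1 * n2 / 2, sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs((u1 - mu) / sigma)))

def speed_changed(base_ns, pr_ns, alpha=0.05, min_rel=0.10, min_abs_ns=0.3):
    """Apply the 5% significance plus 10% / 0.3 ns thresholds described above."""
    base_mean = sum(base_ns) / len(base_ns)
    diff = abs(sum(pr_ns) / len(pr_ns) - base_mean)
    return (mann_whitney_p(base_ns, pr_ns) < alpha
            and diff > min_abs_ns
            and diff / base_mean > min_rel)
```

The rank-based test makes no normality assumption, which suits the skewed, sometimes multimodal timing distributions microbenchmarks produce.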

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 8.02μs 43.4ns 288ns 0.0162 0.00812 0 5.61 KB
master StartStopWithChild netcoreapp3.1 10.3μs 53.8ns 280ns 0.0204 0.0102 0 5.8 KB
master StartStopWithChild net472 16.1μs 36.2ns 135ns 1.06 0.32 0.104 6.21 KB
#6765 StartStopWithChild net6.0 8μs 44.4ns 291ns 0.015 0.00749 0 5.61 KB
#6765 StartStopWithChild netcoreapp3.1 10.6μs 60.1ns 425ns 0.0202 0.0101 0 5.8 KB
#6765 StartStopWithChild net472 17.3μs 87.5ns 401ns 1.02 0.296 0.0896 6.2 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 484μs 1.18μs 4.43μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 657μs 475ns 1.84μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 868μs 1.3μs 5.02μs 0.428 0 0 3.3 KB
#6765 WriteAndFlushEnrichedTraces net6.0 524μs 619ns 2.4μs 0 0 0 2.7 KB
#6765 WriteAndFlushEnrichedTraces netcoreapp3.1 664μs 963ns 3.73μs 0 0 0 2.7 KB
#6765 WriteAndFlushEnrichedTraces net472 843μs 738ns 2.86μs 0.419 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 132μs 368ns 1.42μs 0.132 0 0 14.47 KB
master SendRequest netcoreapp3.1 147μs 179ns 647ns 0.218 0 0 17.27 KB
master SendRequest net472 0.000776ns 0.000338ns 0.00131ns 0 0 0 0 b
#6765 SendRequest net6.0 130μs 538ns 2.08μs 0.192 0 0 14.47 KB
#6765 SendRequest netcoreapp3.1 151μs 132ns 510ns 0.224 0 0 17.27 KB
#6765 SendRequest net472 0.000168ns 0.000117ns 0.000421ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 551μs 2.39μs 8.96μs 0.571 0 0 41.59 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 725μs 4.21μs 38.3μs 0.347 0 0 41.9 KB
master WriteAndFlushEnrichedTraces net472 857μs 3.99μs 15.5μs 8.08 2.55 0.425 53.29 KB
#6765 WriteAndFlushEnrichedTraces net6.0 563μs 2.94μs 14.4μs 0.556 0 0 41.46 KB
#6765 WriteAndFlushEnrichedTraces netcoreapp3.1 690μs 3.84μs 33.5μs 0.324 0 0 41.7 KB
#6765 WriteAndFlushEnrichedTraces net472 843μs 4.19μs 17.8μs 8.33 2.5 0.417 53.33 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.21μs 1.27ns 4.94ns 0.0141 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.71μs 2.26ns 8.76ns 0.0138 0 0 1.02 KB
master ExecuteNonQuery net472 2.22μs 2.88ns 11.1ns 0.156 0.00111 0 987 B
#6765 ExecuteNonQuery net6.0 1.3μs 1.74ns 6.73ns 0.0144 0 0 1.02 KB
#6765 ExecuteNonQuery netcoreapp3.1 1.86μs 2.06ns 8ns 0.0139 0 0 1.02 KB
#6765 ExecuteNonQuery net472 2.04μs 3.15ns 12.2ns 0.156 0.00103 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6765

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.133 1,217.99 1,379.73

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.15μs 0.603ns 2.34ns 0.0138 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.58μs 0.916ns 3.3ns 0.0133 0 0 976 B
master CallElasticsearch net472 2.59μs 1.09ns 4.09ns 0.157 0 0 995 B
master CallElasticsearchAsync net6.0 1.22μs 0.568ns 2.13ns 0.0135 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.63μs 1.23ns 4.78ns 0.014 0 0 1.02 KB
master CallElasticsearchAsync net472 2.77μs 1.45ns 5.41ns 0.167 0 0 1.05 KB
#6765 CallElasticsearch net6.0 1.22μs 0.483ns 1.87ns 0.0134 0 0 976 B
#6765 CallElasticsearch netcoreapp3.1 1.55μs 0.433ns 1.68ns 0.0131 0 0 976 B
#6765 CallElasticsearch net472 2.64μs 1.97ns 7.38ns 0.158 0 0 995 B
#6765 CallElasticsearchAsync net6.0 1.38μs 0.894ns 3.46ns 0.0131 0 0 952 B
#6765 CallElasticsearchAsync netcoreapp3.1 1.64μs 0.844ns 3.04ns 0.014 0 0 1.02 KB
#6765 CallElasticsearchAsync net472 2.77μs 1.15ns 4.46ns 0.166 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.29μs 0.676ns 2.53ns 0.0135 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.7μs 0.973ns 3.64ns 0.0126 0 0 952 B
master ExecuteAsync net472 1.76μs 0.488ns 1.89ns 0.145 0 0 915 B
#6765 ExecuteAsync net6.0 1.28μs 0.988ns 3.7ns 0.0134 0 0 952 B
#6765 ExecuteAsync netcoreapp3.1 1.63μs 1.02ns 3.52ns 0.0132 0 0 952 B
#6765 ExecuteAsync net472 1.78μs 0.381ns 1.43ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.5μs 1.36ns 4.91ns 0.0315 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.39μs 2.4ns 8.97ns 0.0377 0 0 2.85 KB
master SendAsync net472 7.46μs 6.43ns 24.9ns 0.495 0 0 3.12 KB
#6765 SendAsync net6.0 4.37μs 1.29ns 4.82ns 0.0328 0 0 2.31 KB
#6765 SendAsync netcoreapp3.1 5.25μs 2.28ns 8.84ns 0.0366 0 0 2.85 KB
#6765 SendAsync net472 7.54μs 1.51ns 5.65ns 0.493 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #6765

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net6.0 1.125 1,653.43 1,470.33

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.65μs 0.846ns 3.28ns 0.0231 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.21μs 6.26ns 24.3ns 0.0225 0 0 1.64 KB
master EnrichedLog net472 2.52μs 0.951ns 3.68ns 0.25 0 0 1.57 KB
#6765 EnrichedLog net6.0 1.47μs 3.31ns 12.8ns 0.0233 0 0 1.64 KB
#6765 EnrichedLog netcoreapp3.1 2.28μs 2.71ns 10.5ns 0.0227 0 0 1.64 KB
#6765 EnrichedLog net472 2.52μs 7.98ns 30.9ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 112μs 135ns 522ns 0.0556 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 116μs 170ns 657ns 0.0578 0 0 4.28 KB
master EnrichedLog net472 148μs 139ns 539ns 0.668 0.223 0 4.46 KB
#6765 EnrichedLog net6.0 111μs 120ns 463ns 0.0549 0 0 4.28 KB
#6765 EnrichedLog netcoreapp3.1 117μs 154ns 574ns 0.0582 0 0 4.28 KB
#6765 EnrichedLog net472 151μs 146ns 564ns 0.68 0.227 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.85μs 1.02ns 3.81ns 0.0314 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.16μs 3.1ns 11.6ns 0.0292 0 0 2.2 KB
master EnrichedLog net472 4.8μs 1.56ns 6.02ns 0.319 0 0 2.02 KB
#6765 EnrichedLog net6.0 3.03μs 2.7ns 10.5ns 0.0305 0 0 2.2 KB
#6765 EnrichedLog netcoreapp3.1 4.05μs 2.21ns 8.28ns 0.0286 0 0 2.2 KB
#6765 EnrichedLog net472 4.99μs 1.23ns 4.6ns 0.32 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.33μs 3.38ns 13.1ns 0.0165 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.78μs 1.14ns 4.25ns 0.0151 0 0 1.14 KB
master SendReceive net472 2.01μs 0.995ns 3.85ns 0.183 0 0 1.16 KB
#6765 SendReceive net6.0 1.34μs 0.85ns 3.29ns 0.0159 0 0 1.14 KB
#6765 SendReceive netcoreapp3.1 1.76μs 1.96ns 7.58ns 0.0159 0 0 1.14 KB
#6765 SendReceive net472 2.12μs 0.866ns 3.36ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.83μs 1.12ns 4.2ns 0.0227 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.91μs 1.93ns 7.48ns 0.0215 0 0 1.65 KB
master EnrichedLog net472 4.38μs 2.81ns 10.5ns 0.322 0 0 2.04 KB
#6765 EnrichedLog net6.0 2.83μs 1.47ns 5.68ns 0.0226 0 0 1.6 KB
#6765 EnrichedLog netcoreapp3.1 4.01μs 3.74ns 14.5ns 0.0219 0 0 1.65 KB
#6765 EnrichedLog net472 4.43μs 3.3ns 12.4ns 0.323 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6765

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 1.224 396.89 485.96

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 397ns 0.131ns 0.508ns 0.00815 0 0 576 B
master StartFinishSpan netcoreapp3.1 626ns 0.27ns 1.01ns 0.00782 0 0 576 B
master StartFinishSpan net472 675ns 0.129ns 0.466ns 0.0915 0 0 578 B
master StartFinishScope net6.0 495ns 0.172ns 0.645ns 0.00984 0 0 696 B
master StartFinishScope netcoreapp3.1 684ns 0.331ns 1.28ns 0.00957 0 0 696 B
master StartFinishScope net472 846ns 0.448ns 1.68ns 0.104 0 0 658 B
#6765 StartFinishSpan net6.0 486ns 0.465ns 1.8ns 0.00807 0 0 576 B
#6765 StartFinishSpan netcoreapp3.1 572ns 0.682ns 2.64ns 0.00793 0 0 576 B
#6765 StartFinishSpan net472 688ns 0.14ns 0.541ns 0.0916 0 0 578 B
#6765 StartFinishScope net6.0 523ns 1.56ns 6.03ns 0.00966 0 0 696 B
#6765 StartFinishScope netcoreapp3.1 738ns 0.394ns 1.48ns 0.00924 0 0 696 B
#6765 StartFinishScope net472 932ns 0.369ns 1.43ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 590ns 0.24ns 0.929ns 0.00974 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 929ns 0.658ns 2.55ns 0.00923 0 0 696 B
master RunOnMethodBegin net472 1.16μs 0.39ns 1.51ns 0.104 0 0 658 B
#6765 RunOnMethodBegin net6.0 641ns 0.237ns 0.919ns 0.00961 0 0 696 B
#6765 RunOnMethodBegin netcoreapp3.1 980ns 0.265ns 0.99ns 0.00934 0 0 696 B
#6765 RunOnMethodBegin net472 1.1μs 0.48ns 1.86ns 0.104 0 0 658 B

Member

@andrewlock andrewlock left a comment


Thanks so much!

My main questions are that most of the logic for building the .NET images and pulling the artifacts from the consolidated-pipeline appears to be in the benchmarking repo (even though the code lives in our repo)... is that... normal? 😄

Additionally, in the dashboard, I only see baseline jobs? 🤔 Also, what are the jobs called, e.g. baseline-matrix--i3?

Thanks for working on this, it can't come soon enough 😄

rules:
- if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"'
when: always
- if: '$CI_PIPELINE_SOURCE == "schedule" && $BENCHMARK_RUN == "true"'
Member


Just to clarify

  • AFAIK, we don't have a scheduled gitlab run currently (just azure devops)
  • Presumably, if we want to add one, we could add it to this file? I don't think it's necessary if we're testing against master, unless we want to also schedule tests against older branches for comparison🤔
  • If we do a manual run, this will always run?

Contributor Author


we don't have a scheduled gitlab run currently (just azure devops)
Presumably, if we want to add one, we could add it to this file?

I will add the scheduled Gitlab pipeline run once the PR is merged; the other Gitlab schedules for the dd-trace-dotnet repo live here.

If we do a manual run, this will always run?

You'll need to manually trigger the child pipeline.

- benchmarks

variables:
MACROBENCHMARKS_CI_IMAGE: 486234852809.dkr.ecr.us-east-1.amazonaws.com/ci/benchmarking-platform:dotnet-throughput-2
Member


Where/how is this image created? 🤔 How would we go about changing it if we need to?

Contributor Author


It is created from the benchmarking-platform repo.
In order to change it, you just need to change the underlying Dockerfile and rerun the job to build the image!

  • btw, this is a great starting point for all things Benchmarking-platform related: Dogfluence doc.

stage: check-azure-pipeline
image: $MACROBENCHMARKS_CI_IMAGE
script:
- git clone --branch dd-trace-dotnet/macro https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.ddbuild.io/DataDog/benchmarking-platform platform && cd platform
Member


will we always be working in this branch? Or will this need to pull main in the future?

Contributor Author

@faydef faydef Mar 14, 2025


Each team has a dedicated benchmarking branch in the benchmarking-platform repo; dd-trace-dotnet/macro is the branch hosting the 'prod' version of the macrobenchmarks, so these benchmarks will always use that one.

image: $MACROBENCHMARKS_CI_IMAGE
script:
- git clone --branch dd-trace-dotnet/macro https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.ddbuild.io/DataDog/benchmarking-platform platform && cd platform
- ./wait-for-pipeline.sh
Member


Personally I think this script should be in this repository, not in the benchmarking platform repo 🤔 If we change anything about our pipeline (which we 100% plan to do, a lot), then we have to try to synchronize the changes in the benchmarking repo too, which could be difficult

Contributor Author


  • The general consensus is that benchmarks (and their helper scripts) live in the benchmarking-platform repo, as is the case for all other teams.
  • Changes to the helper scripts might require changing the benchmarks themselves, which is why they need to live in the same place.
  • IIRC, long term there is a plan to move everything to Gitlab, so this part of the benchmark is more of a temporary setup.

Contributor


If we change anything about our pipeline (which we 100% plan to do, a lot), then we have to try to synchronize the changes in the benchmarking repo too, which could be difficult

👋 Hi Andrew,

Thanks for the feedback! We don't have any hard blockers on the BP repo, and you have write access in case you would like to move things. The current structure is easier for our intended setup, but we of course won't block you if you decide to change things, as this is something you are going to own. The permissions we have for dd-trace-dotnet allow a lot of flexibility, up to moving the benchmarks completely from BP to the dd-trace-dotnet repo, though I wouldn't recommend that for macrobenchmarks: we plan to set up a black-box testing environment in the mid-term, where the same test cases can be used across multiple language teams, and having everything in one place would simplify setup for us all.

Contributor

@ddyurchenko ddyurchenko Mar 14, 2025


Maybe we should agree more clearly on the ownership of the different parts of the benchmarking CI pipeline 🤔
For example, I don't think it makes sense for you to own the bp-runner parts needed for running benchmarks, as we can scale those across teams.

Member


If this is how all the other teams have it, then we should definitely stick with that 😄 I think my main feedback is that you could benefit from a little more separation for parts that are dependent on the dd-trace-dotnet CI pipeline.

As a concrete example, there's the libdatadog-build one-pipeline, which handles building the OCI images from binaries. The one-pipeline definition doesn't have concrete knowledge of each CI image, rather, it expects that the CI pipeline will run a stage to ensure the artifacts are placed in a specific location.

I think this separation would be nice - there's still a contract to maintain (the one-pipeline requires the artifacts to be in a known location) - but the contract is all about outputs, not the details of how you obtain those outputs. Currently the BMP design is highly intertwined.

I don't think any of this is a blocker, especially as this is undoubtedly better than what we currently have; it's more a suggestion for improvements in maintainability 🙂

FWIW, I think it would benefit you too in that regard, as it gives a much clearer delineation of responsibilities - the BMP repo is purely responsible for taking some artifacts and running them, whereas currently it's also responsible for obtaining them too 🙂

Contributor

@ddyurchenko ddyurchenko Mar 17, 2025


I think this separation would be nice - there's still a contract to maintain (the one-pipeline requires the artifacts to be in a known location) - but the contract is all about outputs, not the details of how you obtain those outputs. Currently the BMP design is highly intertwined.

Hey,

Every team has slightly different preferences, e.g. the dd-trace-py folks want to move the shell scripts from the BP repo directly into dd-trace-py, as you do. At the same time dd-trace-java, dd-trace-js, dd-trace-php ... all the other folks are happy with the current structure and don't ask to change it.

So while some teams want things structured differently, we still enforce keeping everything in one place (the benchmarking-platform repo). This is not ideal for some workflows; as you said, convention over configuration might serve us better.

I see benefits in moving most things into the language repos directly, but we need to do it in an organized way. In the meantime, the structure we have for dd-trace-dotnet is the same as for the other repos.

  1. In dd-trace-dotnet: a thin layer of Gitlab jobs that specify the different configuration options for benchmarks.
  2. In the benchmarking-platform repo: the "meat" of the benchmarking pipeline, i.e. the scripts that do the actual work.

The benefits of having things in benchmarking-platform repo:

  1. We hide all details of interactions with AWS SSM, as originally suggested by security.
  2. Changes to benchmarking scripts are applied across all pipelines, regardless of the state of main branch in dd-trace-dotnet (so if something breaks, e.g. Azure Pipelines API changes, we don't need to ask users to rebase dev branches in order to have latest fix).

Downsides:

  1. The repos are tightly coupled; some teams would prefer having the scripts directly in their language repository instead of the separate benchmarking-platform repo, mostly because they don't want to edit multiple repos when they need to change something in the CI pipeline.

As for the ownership contract that we currently have with other teams (and the one we expect with the .NET team):

  1. Macrobenchmark application code, microbenchmark code, and Gitlab configuration options for CI jobs (such as whether the tracer is enabled, how much RPS is sent, etc.) are maintained by you after the initial setup.
  2. bp-runner, the PR commenter, benchmark-analyzer, the runner infrastructure, and CI monitoring are owned by us.
  3. Dashboard and alert setups are owned by you, but we provide tooling to streamline / automate the setup.
  4. Some CI shell scripts are owned by both us and you, meaning if something breaks, either we will reach out to you asap or you should reach out to us and we will help.

As for the artifact-downloading script, ownership is indeed fuzzy (until now we only needed something like it for dd-trace-php, due to its involved Circle CI setup, and at some point for dd-trace-java, but no longer since its build jobs migrated to Gitlab), and it falls into the territory of shared ownership. I would say you are in a better position to fix it if it breaks, so PRs are welcome, but you can also ping us and we will look into it (as we set it up initially).

Member


Yep, I totally understand, thanks 🙂

I think my main assertion is that the scripts to download the artifacts, and to build the throughput app, should be owned by us. And because of that, it makes sense to have that defined in our repo, not in the benchmarking platform repo.

There are 2 reasons for that

  • Making changes to these scripts with the current design (e.g. due to changes in our pipeline) is hard to do without breaking everyone 😄 Doing it "properly" requires 3 separate PRs instead of 1
  • Remembering that there are external scripts basically won't happen, and as we don't test on PRs, this will break master builds. I practically guarantee it 😉

Anyway, that's all non-blocking stuff that can be updated in the future, just my feedback on the setup 🙂

@faydef
Contributor Author

faydef commented Mar 14, 2025

My main questions are that most of the logic for building the .NET images and pulling the artifacts from the consolidated-pipeline appears to be in the benchmarking repo (even though the code lives in our repo)... is that... normal? 😄

Yes, the rest of the benchmarking scripts are intertwined with the script that does that, so changing it would probably mean changing the benchmarking scripts themselves! 😄

Additionally, in the dashboard, I only see baseline jobs? 🤔 And what are the jobs called e.g. baseline-matrix--i3 (for example)

These are remnants from test runs (which shouldn't happen all the time 👀 )

variables:
FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY: "true"

K6_OPTIONS_WARMUP_RATE: 30
Contributor


Just a comment: K6_OPTIONS_WARMUP_RATE seems too low; I would suggest raising it to somewhere between K6_OPTIONS_NORMAL_OPERATION_RATE and K6_OPTIONS_HIGH_LOAD_RATE (400 to 600).

We will need to tweak RPS after there is some data in dashboards, so no action is needed right now.

Member

@andrewlock andrewlock left a comment


I had some minor maintainability questions, but I don't think any of those are blockers.

Very minor point (unrelated to this PR): the data is all sent to staging, right? We were recently asked to move all our CI data to org2; should the BMP be doing the same? 🤔

Just to clarify, this PR does 1-4 of the docs, right? Do we need to do 5 + 6, or is that already done?

Thanks again for all your work on this!

@faydef
Contributor Author

faydef commented Mar 17, 2025

I had some minor maintainability questions, but I don't think any of those are blockers.

Sure, let's discuss them tomorrow on the sync! As Dima said, most considerations here are recommendations and we allow a high level of flexibility (but we still need to take into account easy setup for future initiatives that require a certain degree of consistency across teams)

Very minor point (unrelated to this PR) - the data is all sent to staging, right? we recently were asked to move all our CI data to org2 - should the BMP be doing the same? 🤔

Yes, results from the runs are being sent to staging, as you can see from the dashboard! 😉

Just to clarify, this PR does 1-4 of the docs, right? Do we need to do 5 + 6, or is that already done?

We might need to tweak some of the test parameters depending on the results from the nightly runs, but that is easily done by changing some CI variables!

Thanks again for all your work on this!

Thank you 🙇

@faydef faydef force-pushed the fayssal/migrate-linux-throughput-tests branch from eafd4de to 05304e4 Compare March 17, 2025 13:42
@faydef
Contributor Author

faydef commented Mar 17, 2025

/merge

@dd-devflow

dd-devflow bot commented Mar 17, 2025

View all feedbacks in Devflow UI.
2025-03-17 15:47:18 UTC ℹ️ Start processing command /merge


2025-03-17 15:47:26 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 0s (p90).


2025-03-17 16:08:57 UTC ⚠️ MergeQueue: This merge request build was cancelled

fayssal.defaa@datadoghq.com cancelled this merge request build

@faydef faydef changed the title Fayssal/migrate linux throughput tests [Benchmarks] Migrate linux throughput tests Mar 17, 2025
@faydef
Contributor Author

faydef commented Mar 17, 2025

/merge --cancel

@dd-devflow

dd-devflow bot commented Mar 17, 2025

View all feedbacks in Devflow UI.
2025-03-17 16:08:49 UTC ℹ️ Start processing command /merge --cancel

@faydef faydef merged commit d514fef into master Mar 17, 2025
128 of 131 checks passed
@faydef faydef deleted the fayssal/migrate-linux-throughput-tests branch March 17, 2025 16:09
@github-actions github-actions bot added this to the vNext-v3 milestone Mar 17, 2025