
[Benchmarks] Migrate linux throughput tests #6765

Merged: 32 commits merged into master on Mar 17, 2025

Conversation

faydef
Contributor

@faydef faydef commented Mar 13, 2025

Summary of changes

This PR enables running the Linux based throughput tests through the Benchmarking platform on Gitlab.
The tests are set up to run:

  • On commit to the master branch
  • On schedule if BENCHMARK_RUN is set to "true" for the scheduled pipeline.
  • On manual trigger (keep in mind that it requires build artifacts from the Azure Pipeline)
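These trigger conditions correspond to GitLab `rules:` along these lines. This is an abridged sketch based on the pipeline snippet quoted later in the review, and it covers only the first two triggers (the manual case goes through manually starting the child pipeline):

```yaml
rules:
  # Run on every push to the default branch
  - if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"'
    when: always
  # Run on schedules only when the scheduled pipeline opts in
  - if: '$CI_PIPELINE_SOURCE == "schedule" && $BENCHMARK_RUN == "true"'
```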

The results can be checked on the following dashboard

Reason for change

  • The throughput tests previously running on AzDo have been decommissioned.
  • The Benchmarking platform (BP) produces more repeatable results

Implementation details

  • Removed the dependency on Crank
  • Changed the load generator from Bombardier (via Crank) to k6
  • Using dedicated CPU cores for the load tester yields repeatable results as opposed to running the load tester and system under test on the same set of cores.
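The dedicated-core isolation in the last bullet can be illustrated with Linux scheduler affinity. This is a hedged sketch of the general idea only: the `pin_to_cpus` helper is hypothetical, and the real pipeline isolates cores at the CI job / runner level rather than with code like this.

```python
import os

# Illustrative sketch (Linux-only): keep the load generator and the system
# under test on disjoint CPU sets so they never compete for the same cores.
# This is NOT the pipeline's actual mechanism; it only demonstrates the idea.

def pin_to_cpus(pid: int, cpus: set[int]) -> set[int]:
    """Restrict `pid` (0 = calling process) to `cpus`; return the effective set."""
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    first_core = min(original)
    # Pin the current process (standing in for the load generator) to one core...
    print(pin_to_cpus(0, {first_core}))
    # ...then restore the original affinity.
    pin_to_cpus(0, original)
```

Pinning each side to its own cores removes scheduler interference between the load generator and the system under test, which is where the repeatability gain comes from.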

@faydef faydef requested a review from a team as a code owner March 13, 2025 16:18
@andrewlock
Member

andrewlock commented Mar 13, 2025

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to capture one-off costs. Cases where the execution time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only differences greater than both 5% and 5 ms are considered.
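A minimal sketch of how that comparison rule could be applied. The `is_regression` helper is hypothetical, not the actual report generator, and it uses a normal approximation in place of the full Welch's t-test (a real implementation would use Student's t with Welch–Satterthwaite degrees of freedom):

```python
from statistics import mean, variance, NormalDist

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

def is_regression(base_ms, pr_ms, alpha=0.05, min_rel=0.05, min_abs_ms=5.0):
    """Flag a regression only if the test is significant at `alpha` AND the
    mean difference exceeds both 5% and 5 ms (hypothetical helper)."""
    # Two-sided critical value from the normal approximation, valid for the
    # large sample counts these runs produce (~1.96 for alpha = 0.05).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = mean(pr_ms) - mean(base_ms)
    significant = abs(welch_t(pr_ms, base_ms)) > crit
    return significant and diff > min_abs_ms and diff / mean(base_ms) > min_rel
```

Requiring both a relative (5%) and an absolute (5 ms) difference keeps statistically significant but practically irrelevant noise out of the report.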

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
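Under a normality assumption, such a p99 interval can be derived from the run's mean and StdDev like this (a sketch; the report's exact method isn't shown in this thread):

```python
from statistics import NormalDist

def p99_interval(mean: float, std_dev: float) -> tuple[float, float]:
    """Central 99% interval under a normal assumption: mean ± z(0.995) * std."""
    z = NormalDist().inv_cdf(0.995)  # ≈ 2.576
    return mean - z * std_dev, mean + z * std_dev

# e.g. a run with mean 70 ms and StdDev 1.2 ms, similar to the Baseline below
lo, hi = p99_interval(70.0, 1.2)
```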

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (70ms)  : 66, 73
     .   : milestone, 70,
    master - mean (69ms)  : 66, 72
     .   : milestone, 69,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (1,007ms)  : 987, 1028
     .   : milestone, 1007,
    master - mean (1,024ms)  : 958, 1090
     .   : milestone, 1024,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (102ms)  : 100, 105
     .   : milestone, 102,
    master - mean (105ms)  : 101, 110
     .   : milestone, 105,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (685ms)  : 665, 706
     .   : milestone, 685,
    master - mean (699ms)  : 644, 755
     .   : milestone, 699,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (89ms)  : 86, 91
     .   : milestone, 89,
    master - mean (90ms)  : 87, 93
     .   : milestone, 90,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (641ms)  : 626, 656
     .   : milestone, 641,
    master - mean (645ms)  : 623, 667
     .   : milestone, 645,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (190ms)  : 186, 195
     .   : milestone, 190,
    master - mean (190ms)  : 187, 193
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (1,108ms)  : 1080, 1137
     .   : milestone, 1108,
    master - mean (1,113ms)  : 1070, 1155
     .   : milestone, 1113,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (269ms)  : 265, 274
     .   : milestone, 269,
    master - mean (269ms)  : 266, 273
     .   : milestone, 269,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (873ms)  : 848, 898
     .   : milestone, 873,
    master - mean (877ms)  : 843, 911
     .   : milestone, 877,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6765) - mean (262ms)  : 258, 266
     .   : milestone, 262,
    master - mean (261ms)  : 257, 266
     .   : milestone, 261,

    section CallTarget+Inlining+NGEN
    This PR (6765) - mean (862ms)  : 831, 893
     .   : milestone, 862,
    master - mean (861ms)  : 827, 895
     .   : milestone, 861,

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 bot commented Mar 13, 2025

Datadog Report

Branch report: fayssal/migrate-linux-throughput-tests
Commit report: 05304e4
Test service: dd-trace-dotnet

✅ 0 Failed, 247051 Passed, 2398 Skipped, 19h 41m 56.82s Total Time

@andrewlock
Member

andrewlock commented Mar 13, 2025

Benchmarks Report for tracer 🐌

Benchmarks for #6765 compared to master:

  • 1 benchmark is faster, with geometric mean 1.125
  • 2 benchmarks are slower, with geometric mean 1.178
  • All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only differences greater than both 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.
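A sketch of this comparison (the helpers are hypothetical, not the benchmark harness's own code; the p-value uses the normal approximation to the Mann–Whitney U distribution, with average ranks for ties but no tie correction in the variance):

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_p(a, b):
    """Two-sided Mann–Whitney U p-value via the normal approximation."""
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied group
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    u1 = sum(rank[:n1]) - n1 * (n1 + 1) / 2
    mu, sigma = n1 * n2 / 2, sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs((u1 - mu) / sigma)))

def speed_changed(base_ns, pr_ns, alpha=0.05, min_rel=0.10, min_abs_ns=0.3):
    """Apply the 5% significance plus 10% / 0.3 ns thresholds described above."""
    base_mean = sum(base_ns) / len(base_ns)
    diff = abs(sum(pr_ns) / len(pr_ns) - base_mean)
    return (mann_whitney_p(base_ns, pr_ns) < alpha
            and diff > min_abs_ns
            and diff / base_mean > min_rel)
```

The rank-based test makes no normality assumption, which suits the skewed, sometimes multimodal timing distributions microbenchmarks produce.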

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 8.02μs 43.4ns 288ns 0.0162 0.00812 0 5.61 KB
master StartStopWithChild netcoreapp3.1 10.3μs 53.8ns 280ns 0.0204 0.0102 0 5.8 KB
master StartStopWithChild net472 16.1μs 36.2ns 135ns 1.06 0.32 0.104 6.21 KB
#6765 StartStopWithChild net6.0 8μs 44.4ns 291ns 0.015 0.00749 0 5.61 KB
#6765 StartStopWithChild netcoreapp3.1 10.6μs 60.1ns 425ns 0.0202 0.0101 0 5.8 KB
#6765 StartStopWithChild net472 17.3μs 87.5ns 401ns 1.02 0.296 0.0896 6.2 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 484μs 1.18μs 4.43μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 657μs 475ns 1.84μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 868μs 1.3μs 5.02μs 0.428 0 0 3.3 KB
#6765 WriteAndFlushEnrichedTraces net6.0 524μs 619ns 2.4μs 0 0 0 2.7 KB
#6765 WriteAndFlushEnrichedTraces netcoreapp3.1 664μs 963ns 3.73μs 0 0 0 2.7 KB
#6765 WriteAndFlushEnrichedTraces net472 843μs 738ns 2.86μs 0.419 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 132μs 368ns 1.42μs 0.132 0 0 14.47 KB
master SendRequest netcoreapp3.1 147μs 179ns 647ns 0.218 0 0 17.27 KB
master SendRequest net472 0.000776ns 0.000338ns 0.00131ns 0 0 0 0 b
#6765 SendRequest net6.0 130μs 538ns 2.08μs 0.192 0 0 14.47 KB
#6765 SendRequest netcoreapp3.1 151μs 132ns 510ns 0.224 0 0 17.27 KB
#6765 SendRequest net472 0.000168ns 0.000117ns 0.000421ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 551μs 2.39μs 8.96μs 0.571 0 0 41.59 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 725μs 4.21μs 38.3μs 0.347 0 0 41.9 KB
master WriteAndFlushEnrichedTraces net472 857μs 3.99μs 15.5μs 8.08 2.55 0.425 53.29 KB
#6765 WriteAndFlushEnrichedTraces net6.0 563μs 2.94μs 14.4μs 0.556 0 0 41.46 KB
#6765 WriteAndFlushEnrichedTraces netcoreapp3.1 690μs 3.84μs 33.5μs 0.324 0 0 41.7 KB
#6765 WriteAndFlushEnrichedTraces net472 843μs 4.19μs 17.8μs 8.33 2.5 0.417 53.33 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.21μs 1.27ns 4.94ns 0.0141 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.71μs 2.26ns 8.76ns 0.0138 0 0 1.02 KB
master ExecuteNonQuery net472 2.22μs 2.88ns 11.1ns 0.156 0.00111 0 987 B
#6765 ExecuteNonQuery net6.0 1.3μs 1.74ns 6.73ns 0.0144 0 0 1.02 KB
#6765 ExecuteNonQuery netcoreapp3.1 1.86μs 2.06ns 8ns 0.0139 0 0 1.02 KB
#6765 ExecuteNonQuery net472 2.04μs 3.15ns 12.2ns 0.156 0.00103 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6765

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.133 1,217.99 1,379.73

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.15μs 0.603ns 2.34ns 0.0138 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.58μs 0.916ns 3.3ns 0.0133 0 0 976 B
master CallElasticsearch net472 2.59μs 1.09ns 4.09ns 0.157 0 0 995 B
master CallElasticsearchAsync net6.0 1.22μs 0.568ns 2.13ns 0.0135 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.63μs 1.23ns 4.78ns 0.014 0 0 1.02 KB
master CallElasticsearchAsync net472 2.77μs 1.45ns 5.41ns 0.167 0 0 1.05 KB
#6765 CallElasticsearch net6.0 1.22μs 0.483ns 1.87ns 0.0134 0 0 976 B
#6765 CallElasticsearch netcoreapp3.1 1.55μs 0.433ns 1.68ns 0.0131 0 0 976 B
#6765 CallElasticsearch net472 2.64μs 1.97ns 7.38ns 0.158 0 0 995 B
#6765 CallElasticsearchAsync net6.0 1.38μs 0.894ns 3.46ns 0.0131 0 0 952 B
#6765 CallElasticsearchAsync netcoreapp3.1 1.64μs 0.844ns 3.04ns 0.014 0 0 1.02 KB
#6765 CallElasticsearchAsync net472 2.77μs 1.15ns 4.46ns 0.166 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.29μs 0.676ns 2.53ns 0.0135 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.7μs 0.973ns 3.64ns 0.0126 0 0 952 B
master ExecuteAsync net472 1.76μs 0.488ns 1.89ns 0.145 0 0 915 B
#6765 ExecuteAsync net6.0 1.28μs 0.988ns 3.7ns 0.0134 0 0 952 B
#6765 ExecuteAsync netcoreapp3.1 1.63μs 1.02ns 3.52ns 0.0132 0 0 952 B
#6765 ExecuteAsync net472 1.78μs 0.381ns 1.43ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.5μs 1.36ns 4.91ns 0.0315 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.39μs 2.4ns 8.97ns 0.0377 0 0 2.85 KB
master SendAsync net472 7.46μs 6.43ns 24.9ns 0.495 0 0 3.12 KB
#6765 SendAsync net6.0 4.37μs 1.29ns 4.82ns 0.0328 0 0 2.31 KB
#6765 SendAsync netcoreapp3.1 5.25μs 2.28ns 8.84ns 0.0366 0 0 2.85 KB
#6765 SendAsync net472 7.54μs 1.51ns 5.65ns 0.493 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #6765

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net6.0 1.125 1,653.43 1,470.33

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.65μs 0.846ns 3.28ns 0.0231 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.21μs 6.26ns 24.3ns 0.0225 0 0 1.64 KB
master EnrichedLog net472 2.52μs 0.951ns 3.68ns 0.25 0 0 1.57 KB
#6765 EnrichedLog net6.0 1.47μs 3.31ns 12.8ns 0.0233 0 0 1.64 KB
#6765 EnrichedLog netcoreapp3.1 2.28μs 2.71ns 10.5ns 0.0227 0 0 1.64 KB
#6765 EnrichedLog net472 2.52μs 7.98ns 30.9ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 112μs 135ns 522ns 0.0556 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 116μs 170ns 657ns 0.0578 0 0 4.28 KB
master EnrichedLog net472 148μs 139ns 539ns 0.668 0.223 0 4.46 KB
#6765 EnrichedLog net6.0 111μs 120ns 463ns 0.0549 0 0 4.28 KB
#6765 EnrichedLog netcoreapp3.1 117μs 154ns 574ns 0.0582 0 0 4.28 KB
#6765 EnrichedLog net472 151μs 146ns 564ns 0.68 0.227 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.85μs 1.02ns 3.81ns 0.0314 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.16μs 3.1ns 11.6ns 0.0292 0 0 2.2 KB
master EnrichedLog net472 4.8μs 1.56ns 6.02ns 0.319 0 0 2.02 KB
#6765 EnrichedLog net6.0 3.03μs 2.7ns 10.5ns 0.0305 0 0 2.2 KB
#6765 EnrichedLog netcoreapp3.1 4.05μs 2.21ns 8.28ns 0.0286 0 0 2.2 KB
#6765 EnrichedLog net472 4.99μs 1.23ns 4.6ns 0.32 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.33μs 3.38ns 13.1ns 0.0165 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.78μs 1.14ns 4.25ns 0.0151 0 0 1.14 KB
master SendReceive net472 2.01μs 0.995ns 3.85ns 0.183 0 0 1.16 KB
#6765 SendReceive net6.0 1.34μs 0.85ns 3.29ns 0.0159 0 0 1.14 KB
#6765 SendReceive netcoreapp3.1 1.76μs 1.96ns 7.58ns 0.0159 0 0 1.14 KB
#6765 SendReceive net472 2.12μs 0.866ns 3.36ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.83μs 1.12ns 4.2ns 0.0227 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.91μs 1.93ns 7.48ns 0.0215 0 0 1.65 KB
master EnrichedLog net472 4.38μs 2.81ns 10.5ns 0.322 0 0 2.04 KB
#6765 EnrichedLog net6.0 2.83μs 1.47ns 5.68ns 0.0226 0 0 1.6 KB
#6765 EnrichedLog netcoreapp3.1 4.01μs 3.74ns 14.5ns 0.0219 0 0 1.65 KB
#6765 EnrichedLog net472 4.43μs 3.3ns 12.4ns 0.323 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6765

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 1.224 396.89 485.96

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 397ns 0.131ns 0.508ns 0.00815 0 0 576 B
master StartFinishSpan netcoreapp3.1 626ns 0.27ns 1.01ns 0.00782 0 0 576 B
master StartFinishSpan net472 675ns 0.129ns 0.466ns 0.0915 0 0 578 B
master StartFinishScope net6.0 495ns 0.172ns 0.645ns 0.00984 0 0 696 B
master StartFinishScope netcoreapp3.1 684ns 0.331ns 1.28ns 0.00957 0 0 696 B
master StartFinishScope net472 846ns 0.448ns 1.68ns 0.104 0 0 658 B
#6765 StartFinishSpan net6.0 486ns 0.465ns 1.8ns 0.00807 0 0 576 B
#6765 StartFinishSpan netcoreapp3.1 572ns 0.682ns 2.64ns 0.00793 0 0 576 B
#6765 StartFinishSpan net472 688ns 0.14ns 0.541ns 0.0916 0 0 578 B
#6765 StartFinishScope net6.0 523ns 1.56ns 6.03ns 0.00966 0 0 696 B
#6765 StartFinishScope netcoreapp3.1 738ns 0.394ns 1.48ns 0.00924 0 0 696 B
#6765 StartFinishScope net472 932ns 0.369ns 1.43ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 590ns 0.24ns 0.929ns 0.00974 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 929ns 0.658ns 2.55ns 0.00923 0 0 696 B
master RunOnMethodBegin net472 1.16μs 0.39ns 1.51ns 0.104 0 0 658 B
#6765 RunOnMethodBegin net6.0 641ns 0.237ns 0.919ns 0.00961 0 0 696 B
#6765 RunOnMethodBegin netcoreapp3.1 980ns 0.265ns 0.99ns 0.00934 0 0 696 B
#6765 RunOnMethodBegin net472 1.1μs 0.48ns 1.86ns 0.104 0 0 658 B

Member

@andrewlock andrewlock left a comment


Thanks so much!

My main questions are that most of the logic for building the .NET images and pulling the artifacts from the consolidated-pipeline appears to be in the benchmarking repo (even though the code lives in our repo)... is that... normal? 😄

Additionally, in the dashboard, I only see baseline jobs? 🤔 Also, what are the jobs called, e.g. baseline-matrix--i3?

Thanks for working on this, it can't come soon enough 😄

rules:
- if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"'
when: always
- if: '$CI_PIPELINE_SOURCE == "schedule" && $BENCHMARK_RUN == "true"'
Member


Just to clarify

  • AFAIK, we don't have a scheduled gitlab run currently (just azure devops)
  • Presumably, if we want to add one, we could add it to this file? I don't think it's necessary if we're testing against master, unless we want to also schedule tests against older branches for comparison🤔
  • If we do a manual run, this will always run?

Contributor Author


we don't have a scheduled gitlab run currently (just azure devops)
Presumably, if we want to add one, we could add it to this file?

I will add the scheduled Gitlab pipeline run once the PR is merged; the other Gitlab schedules for the dd-trace-dotnet repo live here.

If we do a manual run, this will always run?

You'll need to manually trigger the child pipeline.

- benchmarks

variables:
MACROBENCHMARKS_CI_IMAGE: 486234852809.dkr.ecr.us-east-1.amazonaws.com/ci/benchmarking-platform:dotnet-throughput-2
Member


Where/how is this image created? 🤔 How would we go about changing it if we need to?

Contributor Author


It is created from the benchmarking-platform repo.
In order to change it, you just need to change the underlying Dockerfile and rerun the job to build the image!

  • btw, this is a great starting point for all things Benchmarking-platform related: Dogfluence doc.

stage: check-azure-pipeline
image: $MACROBENCHMARKS_CI_IMAGE
script:
- git clone --branch dd-trace-dotnet/macro https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.ddbuild.io/DataDog/benchmarking-platform platform && cd platform
Member


will we always be working in this branch? Or will this need to pull main in the future?

Contributor Author

@faydef faydef Mar 14, 2025


Each team has a dedicated benchmarking branch in the benchmarking-platform repo; dd-trace-dotnet/macro is the branch hosting the 'prod' version of the macrobenchmarks, so these benchmarks will always use that one.

image: $MACROBENCHMARKS_CI_IMAGE
script:
- git clone --branch dd-trace-dotnet/macro https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.ddbuild.io/DataDog/benchmarking-platform platform && cd platform
- ./wait-for-pipeline.sh
Member


Personally I think this script should be in this repository, not in the benchmarking platform repo 🤔 If we change anything about our pipeline (which we 100% plan to do, a lot), then we have to try to synchronize the changes in the benchmarking repo too, which could be difficult

Contributor Author


  • The general consensus is that benchmarks (and their helper scripts) live in the benchmarking-platform repo, as is the case for all other teams.
  • Changes to the helper scripts might require changing the benchmarks themselves, which is why they need to live in the same place.
  • IIRC, long term there is a plan to move everything to Gitlab, so this part of the benchmark is more of a temporary setup.

Contributor


If we change anything about our pipeline (which we 100% plan to do, a lot), then we have to try to synchronize the changes in the benchmarking repo too, which could be difficult

👋 Hi Andrew,

Thanks for the feedback! We don't have any hard blockers on the BP repo, and you have write access in case you would like to move things. The current structure is easier for our intended setup, but we of course won't block you if you decide to change things, as this is something you are going to own. The permissions we have for dd-trace-dotnet allow a lot of flexibility, up to moving the benchmarks completely from BP to the dd-trace-dotnet repo, though I wouldn't recommend that for macrobenchmarks: we plan to set up a black-box testing environment in the mid-term, where the same test cases can be used across multiple language teams, and having everything in one place would simplify setup for us all.

Contributor

@ddyurchenko ddyurchenko Mar 14, 2025


Maybe we should agree more clearly on the ownership of the different parts of the benchmarking CI pipeline 🤔
For example, I don't think it makes sense for you to own the bp-runner parts needed for running benchmarks, as we can scale those across teams.

Member


If this is how all the other teams have it, then we should definitely stick with that 😄 I think my main feedback is that you could benefit from a little more separation for parts that are dependent on the dd-trace-dotnet CI pipeline.

As a concrete example, there's the libdatadog-build one-pipeline, which handles building the OCI images from binaries. The one-pipeline definition doesn't have concrete knowledge of each CI image, rather, it expects that the CI pipeline will run a stage to ensure the artifacts are placed in a specific location.

I think this separation would be nice - there's still a contract to maintain (the one-pipeline requires the artifacts to be in a known location) - but the contract is all about outputs, not the details of how you obtain those outputs. Currently the BMP design is highly intertwined.

I don't think any of this is a blocker, especially as this is undoubtedly better than what we currently have; it's more a suggestion for improvements in maintainability 🙂

FWIW, I think it would benefit you too in that regard, as it gives a much clearer delineation of responsibilities - the BMP repo is purely responsible for taking some artifacts and running them, whereas currently it's also responsible for obtaining them too 🙂

Contributor

@ddyurchenko ddyurchenko Mar 17, 2025


I think this separation would be nice - there's still a contract to maintain (the one-pipeline requires the artifacts to be in a known location) - but the contract is all about outputs, not the details of how you obtain those outputs. Currently the BMP design is highly intertwined.

Hey,

Every team has slightly different preferences, e.g. the dd-trace-py folks want to move the shell scripts from the BP repo directly into dd-trace-py, as you do. At the same time dd-trace-java, dd-trace-js, dd-trace-php ... all the other folks are happy with the current structure and don't ask to change it.

So while some teams want things structured differently, we still enforce keeping everything in one place (the benchmarking-platform repo). This is not ideal for some workflows; as you said, convention over configuration might serve us better.

I see benefits in moving most things into the language repos directly, but we need to do it in an organized way. In the meantime, the structure we have for dd-trace-dotnet is the same as for the other repos.

  1. In dd-trace-dotnet: a thin layer of Gitlab jobs that specify the different configuration options for benchmarks.
  2. In the benchmarking-platform repo: the "meat" of the benchmarking pipeline, i.e. the scripts that do the actual work.

The benefits of having things in benchmarking-platform repo:

  1. We hide all details of interactions with AWS SSM, as originally suggested by security.
  2. Changes to benchmarking scripts are applied across all pipelines, regardless of the state of main branch in dd-trace-dotnet (so if something breaks, e.g. Azure Pipelines API changes, we don't need to ask users to rebase dev branches in order to have latest fix).

Downsides:

  1. The repos are tightly coupled; some teams would prefer having the scripts directly in their language repository instead of the separate benchmarking-platform repo, mostly because they don't want to edit multiple repos when they need to change something in the CI pipeline.

As for the ownership contract that we currently have with other teams (and the one we expect with the .NET team):

  1. Macrobenchmark application code, microbenchmark code, and Gitlab configuration options for CI jobs (such as whether the tracer is enabled, how much RPS is sent, etc.) are maintained by you after the initial setup.
  2. bp-runner, the PR commenter, benchmark-analyzer, the runner infrastructure, and CI monitoring are owned by us.
  3. Dashboard and alert setups are owned by you, but we provide tooling to streamline / automate the setup.
  4. Some CI shell scripts are owned by both us and you, meaning if something breaks, either we will reach out to you asap or you should reach out to us and we will help.

As for the artifact-downloading script, ownership is indeed fuzzy (until now we only needed something like it for dd-trace-php, due to its involved Circle CI setup, and at some point for dd-trace-java, but no longer since its build jobs migrated to Gitlab), and it falls into the territory of shared ownership. I would say you are in a better position to fix it if it breaks, so PRs are welcome, but you can also ping us and we will look into it (as we set it up initially).

Member


Yep, I totally understand, thanks 🙂

I think my main assertion is that the scripts to download the artifacts, and to build the throughput app, should be owned by us. And because of that, it makes sense to have that defined in our repo, not in the benchmarking platform repo.

There are 2 reasons for that

  • Making changes to these scripts with the current design (e.g. due to changes in our pipeline) is hard to do without breaking everyone 😄 Doing it "properly" requires 3 separate PRs instead of 1
  • Remembering that there are external scripts basically won't happen, and as we don't test on PRs, this will break master builds. I practically guarantee it 😉

Anyway, that's all non-blocking stuff that can be updated in the future, just my feedback on the setup 🙂

@faydef
Contributor Author

faydef commented Mar 14, 2025

My main questions are that most of the logic for building the .NET images and pulling the artifacts from the consolidated-pipeline appears to be in the benchmarking repo (even though the code lives in our repo)... is that... normal? 😄

Yes, the rest of the benchmarking scripts are intertwined with the script that does that, so changing it would probably mean changing the benchmarking scripts themselves! 😄

Additionally, in the dashboard, I only see baseline jobs? 🤔 And what are the jobs called e.g. baseline-matrix--i3 (for example)

These are remnants from test runs (which shouldn't happen all the time 👀 )

variables:
FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY: "true"

K6_OPTIONS_WARMUP_RATE: 30
Contributor


Just a comment: K6_OPTIONS_WARMUP_RATE seems too low; I would suggest raising it to somewhere between K6_OPTIONS_NORMAL_OPERATION_RATE and K6_OPTIONS_HIGH_LOAD_RATE (400 to 600).

We will need to tweak RPS after there is some data in dashboards, so no action is needed right now.

Member

@andrewlock andrewlock left a comment


I had some minor maintainability questions, but I don't think any of those are blockers.

Very minor point (unrelated to this PR): the data is all sent to staging, right? We were recently asked to move all our CI data to org2; should the BMP be doing the same? 🤔

Just to clarify, this PR does 1-4 of the docs, right? Do we need to do 5 + 6, or is that already done?

Thanks again for all your work on this!

@faydef
Contributor Author

faydef commented Mar 17, 2025

I had some minor maintainability questions, but I don't think any of those are blockers.

Sure, let's discuss them tomorrow on the sync! As Dima said, most considerations here are recommendations and we allow a high level of flexibility (but we still need to take into account easy setup for future initiatives that require a certain degree of consistency across teams)

Very minor point (unrelated to this PR) - the data is all sent to staging, right? we recently were asked to move all our CI data to org2 - should the BMP be doing the same? 🤔

Yes, results from the runs are being sent to staging, as you can see from the dashboard! 😉

Just to clarify, this PR does 1-4 of the docs, right? Do we need to do 5 + 6, or is that already done?

We might need to tweak some of the test parameters depending on the results from the nightly runs, but that is easily done by changing some CI variables!

Thanks again for all your work on this!

Thank you 🙇

@faydef faydef force-pushed the fayssal/migrate-linux-throughput-tests branch from eafd4de to 05304e4 Compare March 17, 2025 13:42
@faydef
Contributor Author

faydef commented Mar 17, 2025

/merge

@dd-devflow

dd-devflow bot commented Mar 17, 2025

View all feedbacks in Devflow UI.
2025-03-17 15:47:18 UTC ℹ️ Start processing command /merge


2025-03-17 15:47:26 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 0s (p90).


2025-03-17 16:08:57 UTC ⚠️ MergeQueue: This merge request build was cancelled

fayssal.defaa@datadoghq.com cancelled this merge request build

@faydef faydef changed the title Fayssal/migrate linux throughput tests [Benchmarks] Migrate linux throughput tests Mar 17, 2025
@faydef
Contributor Author

faydef commented Mar 17, 2025

/merge --cancel

@dd-devflow

dd-devflow bot commented Mar 17, 2025

View all feedbacks in Devflow UI.
2025-03-17 16:08:49 UTC ℹ️ Start processing command /merge --cancel

@faydef faydef merged commit d514fef into master Mar 17, 2025
128 of 131 checks passed
@faydef faydef deleted the fayssal/migrate-linux-throughput-tests branch March 17, 2025 16:09
@github-actions github-actions bot added this to the vNext-v3 milestone Mar 17, 2025