feat(ci): add analysis from smp to regression workflow #15063
Conversation
All of the `smp` subcommands take a `--team-id` argument between `smp` and `job`, but this flag is missing from the `smp job cancel` invocation in the Vector Regression workflow, so this commit adds it.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
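As a sketch of the fix described above (the step name and environment variable are illustrative, not the actual workflow contents), the `--team-id` flag sits between `smp` and `job`:

```yaml
# Illustrative sketch only: step name, variable name, and any trailing
# arguments are assumptions, not the actual Regression workflow file.
- name: Cancel SMP job
  run: |
    smp --team-id "${SMP_TEAM_ID}" job cancel
```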
Flags for the `smp job submit` step have changed: as a result of passing in image tags to specify the baseline and comparison variants of a target in regression testing, `smp` no longer has access to Git commit SHA metadata via the `--baseline-image` and `--comparison-image` flag parameters. To enable `smp` to report this information during its analysis step, the `--baseline-sha` and `--comparison-sha` flags were added to `smp job submit`. This commit adds those flags and the corresponding commit hashes to the job submission step in the regression workflow.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
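A sketch of the amended submission step, assuming illustrative environment variable names; the four flags themselves are the ones named in the commit message above:

```yaml
# Sketch only: variable names are placeholders. The --baseline-sha and
# --comparison-sha flags carry the Git commit metadata that the
# image-tag flags no longer encode.
- name: Submit SMP job
  run: |
    smp --team-id "${SMP_TEAM_ID}" job submit \
      --baseline-image "${BASELINE_IMAGE_TAG}" \
      --comparison-image "${COMPARISON_IMAGE_TAG}" \
      --baseline-sha "${BASELINE_SHA}" \
      --comparison-sha "${COMPARISON_SHA}"
```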
To add regression reports from `smp`, this commit uncomments and renames the analysis job, making minor tweaks so that the resulting job:

* Downloads an `smp` binary with the right version.
* Uses that binary to download an HTML report analyzing regression experiments on a Vector PR.
* Posts that HTML report into a comment on that PR.
* Uploads the HTML report to workflow run artifacts.

Hat-tip and thanks to Brian L Troutwine & George Hahn for making these changes easy.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
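The four bullets above could map onto a job shaped roughly like this; the job and step names, the download location, the `smp` subcommand, and the action versions are all assumptions for illustration, not the actual workflow:

```yaml
# Hypothetical outline of the analysis job; not the actual workflow file.
analyze:
  needs: [submit-job]              # must run after the smp job submission
  steps:
    - name: Download smp binary    # pinned to the version used at submit time
      run: |
        aws s3 cp "s3://smp-releases/${SMP_VERSION}/smp" ./smp
        chmod +x ./smp
    - name: Fetch analysis report  # HTML report for this PR's experiments
      run: ./smp --team-id "${SMP_TEAM_ID}" ...   # exact subcommand elided
    - name: Post report as PR comment
      run: gh pr comment "${PR_NUMBER}" --body-file report.html
    - name: Upload report to artifacts
      uses: actions/upload-artifact@v3
      with:
        name: smp-report
        path: report.html
```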
Looks great to me. All that remains to be seen is whether the analysis posts, in which I have high confidence.

One thing is a bit unclear to me: this doesn't address the secret access issue yet, right?
This commit adds a missing task dependency to the analysis job so that it downloads the correct version of `smp`.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
GitHub Actions CI emits warnings about the `aws-actions/configure-aws-credentials@v1` action because it uses Node.js 12, and Node.js 12 actions are deprecated with removal set for Summer 2023. Migrating to Node.js-16-based GitHub Actions is the suggested mitigation, and `aws-actions/configure-aws-credentials@v1-node16` is suggested as a replacement, so this commit makes that change.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
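In workflow terms, the migration described above is a one-line version bump on each `uses:` reference to the action:

```yaml
# Before: runs on deprecated Node.js 12
- uses: aws-actions/configure-aws-credentials@v1
# After: runs on Node.js 16
- uses: aws-actions/configure-aws-credentials@v1-node16
```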
Should be able to fix the Node.js version warning here by changing
@jszwedko @spencergilbert once this is merged we'll have two sets of
That's correct. In order to address that, this workflow would need to be split up, with the code-building portion running per-PR and the job submission etc. running in a triggered workflow sourced from
In the Regression task, the `build-comparison` job, which builds a Vector container for the comparison variant of a regression experiment, uses the display name "Build baseline Vector container". This name confused me when I first saw the output in GitHub Actions because I thought I might have introduced a bug into the CI Regression workflow, but it appears that the `build-comparison` job is correctly configured aside from having a misleading name. To avoid this sort of confusion in the future, this commit updates the display name of the `build-comparison` job in the Regression workflow to reflect more accurately what that job does.

Signed-off-by: Geoffrey M. Oxberry <geoffrey.oxberry@datadoghq.com>
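Assuming the workflow uses the usual `jobs.<job_id>.name` key for display names, the rename amounts to (surrounding keys sketched from the description above, not copied from the workflow):

```yaml
jobs:
  build-comparison:
    # Was "Build baseline Vector container", which misdescribed the job.
    name: Build comparison Vector container
```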
Soak Test Results

Baseline: f7f4b64

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed, and to what degree, by a pull request. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that baseline is faster, positive that comparison is. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.

Fine details of change detection per experiment.
Hmm, it looks like
Soak Test Results

Baseline: 373e1c8

(Explanation identical to the soak test comment above.) No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%. Fine details of change detection per experiment.
Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
It turns out that `lading` will crash on the `splunk_hec_route_s3` experiment because it expects acks. We have to intentionally inform Vector to ack.

Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
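One way to "inform Vector to ack", sketched against Vector's TOML config: the `splunk_hec` source supports HEC indexer acknowledgements via an `acknowledgements` block. The source name, address, and exact key placement below are illustrative and may differ by Vector version:

```toml
# Illustrative fragment: enables acknowledgements on the Splunk HEC
# source so that lading's load generator receives the acks it expects.
[sources.splunk_hec_in]
type = "splunk_hec"
address = "0.0.0.0:8088"
acknowledgements.enabled = true
```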
One question, but otherwise looks reasonable! I'll wait to see what the results look like from this.
Regression Test Results

Baseline: 0e68650

Explanation

A regression test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs.

No interesting changes in throughput were observed.

Fine details of change detection per experiment.
Signed-off-by: Brian L. Troutwine <brian.troutwine@datadoghq.com>
Soak Test Results

Baseline: 0e68650

(Explanation identical to the soak test comment above.) No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%. Fine details of change detection per experiment.
```
@@ -36,6 +36,7 @@ address = "0.0.0.0:9090"
type = "aws_s3"
inputs = ["container_type.sidecar"]
region = "us-west-2"
```
Did you find this to be needed too? Just trying to avoid the test cases falling out of sync. I think ideally we'd just have one set rather than this set in addition to `soaks/tests`. Maybe we could just symlink the test case directory for now?
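The symlink suggestion might look like this; the paths are hypothetical placeholders, since the real directory names would come from the repository layout:

```shell
# Hypothetical layout: share one set of test cases by pointing the
# regression test-case directory at the existing soak test cases.
mkdir -p soaks/tests regression
ln -s ../soaks/tests regression/cases
```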
Yes, we run in a more restricted environment in the new setup, with no AWS data for Vector to pick up. With regard to things falling out of sync, my hope is that the soak tests won't last much longer, TBH. Also, if we did want to symlink, we could do that, but `lading` would need to be updated on the soak side.
This pull request updates the Regression workflow to analyze results from regression experiments using `smp`. To do so, it makes the following changes:

* Updates the `smp` crate version from 0.3.0 to 0.5.0.
* Updates the `smp job submit` and `smp job cancel` invocations.

h/t and thanks go to @blt & @GeorgeHahn for making the setup process easy, and many thanks to y'all for your patience and for being early adopters!