Integration Test: upgrade Integrations Server in ECH #8417
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
0308d56 to 6ad7a18
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
What changes in a PR of ours would cause an integrations server upgrade to fail? We have broken integrations server starting up plenty of times but I've never seen us break upgrading it, which I think is just replacing the container version. Something should test this, but we may want it to happen on a schedule. This should probably just be driven by snapshots?
Yeah, good point, there's no need to run this test on every PR or even every commit to
This makes a lot of sense if we can set it up. I also suspect that there might not be much of a difference between Integrations Server starting up for the first time and Integrations Server upgrading in terms of what needs to work for it to succeed. TBH I don't know enough about how it works. Elastic Agent is stateless with no data dependencies, Fleet Server does depend on some Elasticsearch indices but it doesn't own the index templates or mappings for them (Fleet does). If we somehow broke the .fleet-* system indices I think Integrations Server would still upgrade just fine and Fleet wouldn't work.
Hello folks, sorry for chiming in but I like what you discuss here 🙂 So, the PRs that bump the .upgrade-test-agent-versions.yml run daily and an automated PR opens against each active branch, given that there are actual changes. Maybe we could introduce a PR label that such PRs will have?! In this way, if we wanna test this under a normal PR we could just add the label and have it tested, wdyt? 🙂
reviewed only CI part, lgtm
Thanks for chiming in, @pkoutsovasilis.
I'm not sure we will need to run this test on normal PRs (see #8417 (comment) and #8417 (comment)) but, sure, your label idea would give us the option to do that if/when we wanted. Adding that special label on the PRs that bump the .upgrade-test-agent-versions.yml, so that our BK pipeline could check for its existence (how is TBD) and run this test if it exists, makes sense to me. So now the next question is: how can our BK pipeline get access to labels for the PR it's running CI for? Is this something we already know how to do? Maybe @v1v @oakrizan @pazone could shed some light here?
We can access the labels in the pipelines, e.g. https://github.com/elastic/beats/blob/main/.buildkite%2Fauditbeat%2Fauditbeat-pipeline.yml#L266
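For readers who want to see what such a label check could look like outside the pipeline YAML, here is a minimal Go sketch. It assumes the CI environment exports the PR's labels as a comma-separated list in an environment variable; the variable name GITHUB_PR_LABELS and the helper hasLabel are assumptions for illustration, not something confirmed in this thread. The label name is the one the PR description uses.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasLabel reports whether want appears in a comma-separated list of PR labels.
// Surrounding whitespace around each label is ignored.
func hasLabel(labelsCSV, want string) bool {
	for _, l := range strings.Split(labelsCSV, ",") {
		if strings.TrimSpace(l) == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumption: CI exposes the PR labels in this variable, comma-separated.
	labels := os.Getenv("GITHUB_PR_LABELS")
	const label = "Testing:run:TestUpgradeIntegrationsServer"

	if hasLabel(labels, label) {
		fmt.Println("label present: run the ECH upgrade test")
	} else {
		fmt.Println("label absent: skip the ECH upgrade test")
	}
}
```

Matching whole labels rather than substrings avoids triggering the test when another label merely contains the same text.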
Thanks @oakrizan for the quick response and link! I'll update this PR with changes to the BK pipeline to conditionally run Now to decide what to name our special label. 🙂 We have a
This pull request is now in conflicts. Could you fix it? 🙏
81654a1 to 895d397
fedd6cb to 6353428
d097b07 to 97ebfc3
PR for cleaning up these env vars: #8710
lgtm for ci part
58b8f29 to 5928603
💚 Build Succeeded
cc @ycombinator
LGTM 🙂
lgtm for ci part
* WIP: Implementing UpgradeDeployment method
* Adding AvailableVersions method to StatefulProvisioner
* WIP: adding test for upgrading integration server in ECH
* Ensure test runs against FRH ECH region
* WIP
* WIP
* Add test for generating upgrade deployment request
* Flesh out test some more
* Fixing filename in reference
* Make DeploymentStatus contain status and health
* Cleanup deployment after test
* Add error handling
* Make linter happy
* Fix typo
* Fix check
* Adding logging
* Filter by min/max versions
* Fix call
* Running non-sudo upgrade group tests
* Running upgrade FIPS tests
* Remove non-sudo upgrade group from regular integration tests
* Fix env var for ECH region and default it to gcp-us-west2
* Use env var for ECH API key
* Run ECH deployment upgrade test if PR label is present
* Fix missing key check
* Add label on PR that bumps integration test versions
* Export EC_API_KEY
* WIP: debugging
* Append query string to URL after joining paths
* Fix up logic in doGet to construct URL without escaping ?
* WIP: Remove some debug logging
* Add bounds check
* Fix start index math
* Removing debug logs
* WIP: don't clean up deployment so we can investigate
* Add doPut method on client
* Remove debugging statements
* Fix build tag
* Change group for testing in PR
* Cleanup
* Add comment
* Cleanup
* Update deployment version after upgrade
* Test group in PR
* Fix unit test
* Undo unintentional changes

(cherry picked from commit 4344a11)

# Conflicts:
# .ci/updatecli/updatecli-bump-stack-version.yml
# .mergify.yml
# test_infra/ess/readme.md
… in ECH (#8725)

* Integration Test: upgrade Integrations Server in ECH (#8417)
* WIP: Implementing UpgradeDeployment method
* Adding AvailableVersions method to StatefulProvisioner
* WIP: adding test for upgrading integration server in ECH
* Ensure test runs against FRH ECH region
* WIP
* WIP
* Add test for generating upgrade deployment request
* Flesh out test some more
* Fixing filename in reference
* Make DeploymentStatus contain status and health
* Cleanup deployment after test
* Add error handling
* Make linter happy
* Fix typo
* Fix check
* Adding logging
* Filter by min/max versions
* Fix call
* Running non-sudo upgrade group tests
* Running upgrade FIPS tests
* Remove non-sudo upgrade group from regular integration tests
* Fix env var for ECH region and default it to gcp-us-west2
* Use env var for ECH API key
* Run ECH deployment upgrade test if PR label is present
* Fix missing key check
* Add label on PR that bumps integration test versions
* Export EC_API_KEY
* WIP: debugging
* Append query string to URL after joining paths
* Fix up logic in doGet to construct URL without escaping ?
* WIP: Remove some debug logging
* Add bounds check
* Fix start index math
* Removing debug logs
* WIP: don't clean up deployment so we can investigate
* Add doPut method on client
* Remove debugging statements
* Fix build tag
* Change group for testing in PR
* Cleanup
* Add comment
* Cleanup
* Update deployment version after upgrade
* Test group in PR
* Fix unit test
* Undo unintentional changes

(cherry picked from commit 4344a11)

# Conflicts:
# .ci/updatecli/updatecli-bump-stack-version.yml
# .mergify.yml
# test_infra/ess/readme.md

* Fixing conflicts

---------

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
…in ECH (#8745)

* Integration Test: upgrade Integrations Server in ECH (#8417)
* WIP: Implementing UpgradeDeployment method
* Adding AvailableVersions method to StatefulProvisioner
* WIP: adding test for upgrading integration server in ECH
* Ensure test runs against FRH ECH region
* WIP
* WIP
* Add test for generating upgrade deployment request
* Flesh out test some more
* Fixing filename in reference
* Make DeploymentStatus contain status and health
* Cleanup deployment after test
* Add error handling
* Make linter happy
* Fix typo
* Fix check
* Adding logging
* Filter by min/max versions
* Fix call
* Running non-sudo upgrade group tests
* Running upgrade FIPS tests
* Remove non-sudo upgrade group from regular integration tests
* Fix env var for ECH region and default it to gcp-us-west2
* Use env var for ECH API key
* Run ECH deployment upgrade test if PR label is present
* Fix missing key check
* Add label on PR that bumps integration test versions
* Export EC_API_KEY
* WIP: debugging
* Append query string to URL after joining paths
* Fix up logic in doGet to construct URL without escaping ?
* WIP: Remove some debug logging
* Add bounds check
* Fix start index math
* Removing debug logs
* WIP: don't clean up deployment so we can investigate
* Add doPut method on client
* Remove debugging statements
* Fix build tag
* Change group for testing in PR
* Cleanup
* Add comment
* Cleanup
* Update deployment version after upgrade
* Test group in PR
* Fix unit test
* Undo unintentional changes

(cherry picked from commit 4344a11)

* ci: revert deployment_csp_configuration.yaml to create_deployment_csp_configuration.yaml (#8746)

(cherry picked from commit 2252fdd)

---------

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
What does this PR do?
This PR adds an integration test, TestUpgradeIntegrationsServer, that spins up an ECH deployment of at least version 8.19.0 and upgrades it to a later, randomly chosen version available in that deployment's ECH region.

The test will be run with FIPS-capable artifacts against the FRH ECH environment (when it's ready). The test will not run on every commit but only on PRs that carry a special label, Testing:run:TestUpgradeIntegrationsServer. This label will be automatically added to PRs that bump the https://github.com/elastic/elastic-agent/blob/main/testing/integration/testdata/.upgrade-test-agent-versions.yml file, thereby ensuring the test runs whenever new versions of the stack are available for testing.

Why is it important?
To ensure that Integrations Server in ECH (which runs Elastic Agent as a Fleet Server) can be upgraded without problems.
Checklist
* I have made corresponding changes to the documentation
* I have made corresponding changes to the default configuration files
* I have added an entry in ./changelog/fragments using the changelog tool

Disruptive User Impact
None; this PR adds a new integration test.