Integration Test: upgrade Integrations Server in ECH #8417

Merged

Contributor

@ycombinator ycombinator commented Jun 9, 2025

What does this PR do?

This PR adds an integration test, TestUpgradeIntegrationsServer, that spins up an ECH deployment running version 8.19.0 or later and upgrades it to a later, randomly chosen version available in that deployment's ECH region.

The test will be run with FIPS-capable artifacts against the FRH ECH environment (when it's ready). The test will not run on every commit but only on PRs that have a special label on them, Testing:run:TestUpgradeIntegrationsServer. This label will be automatically added to PRs that bump the https://github.com/elastic/elastic-agent/blob/main/testing/integration/testdata/.upgrade-test-agent-versions.yml file, thereby ensuring the test runs whenever new versions of the stack are available for testing.

Why is it important?

To ensure that Integrations Server in ECH (which runs Elastic Agent as a Fleet Server) can be upgraded without problems.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None; this PR adds a new integration test.

Contributor

mergify bot commented Jun 9, 2025

This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-\d.\d is the label that automatically backports to the matching \d.\d branch, where \d stands for a number (for example, backport-8.19 backports to the 8.19 branch).
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@ycombinator ycombinator added the skip-changelog and backport-8.19 (Automated backport to the 8.19 branch) labels Jun 9, 2025
@ycombinator ycombinator force-pushed the it-fips-upgrade-integrations-server branch from 0308d56 to 6ad7a18 Compare June 10, 2025 16:38
@ycombinator ycombinator marked this pull request as ready for review June 10, 2025 16:38
@ycombinator ycombinator requested a review from a team as a code owner June 10, 2025 16:38
@ycombinator ycombinator added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Jun 10, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator requested review from a team as code owners June 10, 2025 23:33
@cmacknz
Member

cmacknz commented Jun 11, 2025

> To ensure that Integrations Server in ECH (which runs Elastic Agent as a Fleet Server) can be upgraded without problems.

What changes in a PR of ours would cause an integrations server upgrade to fail? We have broken integrations server starting up plenty of times but I've never seen us break upgrading it, which I think is just replacing the container version.

Something should test this, but we may want it to happen on a schedule. This should probably just be driven by snapshots?

@ycombinator
Contributor Author

> > To ensure that Integrations Server in ECH (which runs Elastic Agent as a Fleet Server) can be upgraded without problems.
>
> What changes in a PR of ours would cause an integrations server upgrade to fail? We have broken integrations server starting up plenty of times but I've never seen us break upgrading it, which I think is just replacing the container version.
>
> Something should test this, but we may want it to happen on a schedule. This should probably just be driven by snapshots?

Yeah, good point, there's no need to run this test on every PR or even every commit to main. Perhaps we could run it as part of the same automation that updates https://github.com/elastic/elastic-agent/blob/main/testing/integration/testdata/.upgrade-test-agent-versions.yml?

@cmacknz
Member

cmacknz commented Jun 12, 2025

> Yeah, good point, there's no need to run this test on every PR or even every commit to main. Perhaps we could run it as part of the same automation that updates https://github.com/elastic/elastic-agent/blob/main/testing/integration/testdata/.upgrade-test-agent-versions.yml?

This makes a lot of sense if we can set it up.

I also suspect that there might not be much of a difference between Integrations Server starting up for the first time and Integrations Server upgrading, in terms of what needs to work for it to succeed. TBH I don't know enough about how it works.

Elastic Agent is stateless with no data dependencies. Fleet Server does depend on some Elasticsearch indices, but it doesn't own the index templates or mappings for them (Fleet does). If we somehow broke the .fleet-* system indices, I think Integrations Server would still upgrade just fine and Fleet wouldn't work.

@pkoutsovasilis
Contributor

Hello folks, sorry for chiming in but I like what you discuss here 🙂 So, the automation that bumps .upgrade-test-agent-versions.yml runs daily and opens an automated PR against each active branch whenever there are actual changes. Maybe we could introduce a PR label that such PRs will have?! In this way, if we wanna test this under a normal PR we could just add the label and have it tested, wdyt? 🙂

oakrizan previously approved these changes Jun 13, 2025
Contributor

@oakrizan oakrizan left a comment

reviewed only CI part, lgtm

@ycombinator
Contributor Author

Thanks for chiming in, @pkoutsovasilis

> Maybe we could introduce a PR label that such PRs will have?! In this way, if we wanna test this under a normal PR we could just add the label and have it tested, wdyt? 🙂

I'm not sure we will need to run this test on normal PRs (see #8417 (comment) and #8417 (comment)) but, sure, your label idea would give us the option to do that if and when we wanted. Adding that special label to the PRs that bump .upgrade-test-agent-versions.yml, so that our BK pipeline can check for its existence (how is TBD) and run this test if it exists, makes sense to me. So now the next question is: how can our BK pipeline get access to labels for the PR it's running CI for? Is this something we already know how to do? Maybe @v1v @oakrizan @pazone could shed some light here?

@oakrizan
Contributor

oakrizan commented Jun 13, 2025

> how can our BK pipeline get access to labels for the PR it's running CI for

We can access the labels in the pipelines, e.g. https://github.com/elastic/beats/blob/main/.buildkite%2Fauditbeat%2Fauditbeat-pipeline.yml#L266

@ycombinator
Contributor Author

ycombinator commented Jun 13, 2025

Thanks @oakrizan for the quick response and link! I'll update this PR with changes to the BK pipeline to conditionally run TestUpgradeIntegrationsServer only when the special label is present and to also add the special label automatically on PRs that update .upgrade-test-agent-versions.yml.

Now to decide what to name our special label. 🙂

We have a Testing label in this repo already. How about "extending" it to indicate we want CI to run the test in this PR, i.e. we introduce a new label named Testing:run:TestUpgradeIntegrationsServer? Too verbose? I'm open to other suggestions.
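
As a rough sketch of the gating idea, the check boils down to testing whether the proposed label appears in the PR's label list. The example below assumes the CI system hands the labels over as a comma-separated string in an environment variable; the variable name BUILDKITE_PULL_REQUEST_LABELS and the hasLabel helper are assumptions for illustration, and the actual pipeline may well express this condition declaratively in its pipeline definition rather than in Go.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// upgradeTestLabel is the label name proposed in this conversation.
const upgradeTestLabel = "Testing:run:TestUpgradeIntegrationsServer"

// hasLabel reports whether want appears in a comma-separated list of labels.
// Surrounding whitespace around each label is ignored; matching is exact
// and case-sensitive.
func hasLabel(labels, want string) bool {
	for _, l := range strings.Split(labels, ",") {
		if strings.TrimSpace(l) == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumption: the PR labels are exposed as a comma-separated string
	// in this environment variable.
	labels := os.Getenv("BUILDKITE_PULL_REQUEST_LABELS")
	if hasLabel(labels, upgradeTestLabel) {
		fmt.Println("label present: run TestUpgradeIntegrationsServer")
		return
	}
	fmt.Println("label absent: skip TestUpgradeIntegrationsServer")
}
```

Matching whole labels instead of using a substring search avoids accidentally triggering on a longer label that merely contains the same text.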

Contributor

mergify bot commented Jun 13, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b it-fips-upgrade-integrations-server upstream/it-fips-upgrade-integrations-server
git merge upstream/main
git push upstream it-fips-upgrade-integrations-server

@ycombinator ycombinator force-pushed the it-fips-upgrade-integrations-server branch 2 times, most recently from 81654a1 to 895d397 Compare June 17, 2025 02:43
Contributor

mergify bot commented Jun 17, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b it-fips-upgrade-integrations-server upstream/it-fips-upgrade-integrations-server
git merge upstream/main
git push upstream it-fips-upgrade-integrations-server

@ycombinator ycombinator force-pushed the it-fips-upgrade-integrations-server branch 2 times, most recently from fedd6cb to 6353428 Compare June 17, 2025 20:31
@ycombinator ycombinator force-pushed the it-fips-upgrade-integrations-server branch from d097b07 to 97ebfc3 Compare June 27, 2025 10:13
@ycombinator
Contributor Author

> ... since we have to have one either way to remove redundant env vars from FIPS-related CI steps

PR for cleaning up these env vars: #8710

oakrizan previously approved these changes Jun 27, 2025
Contributor

@oakrizan oakrizan left a comment

lgtm for ci part

@elasticmachine
Collaborator

💚 Build Succeeded

cc @ycombinator

Contributor

@pkoutsovasilis pkoutsovasilis left a comment

LGTM 🙂

Contributor

@oakrizan oakrizan left a comment

lgtm for ci part

@ycombinator ycombinator merged commit 4344a11 into elastic:main Jun 27, 2025
20 checks passed
@ycombinator ycombinator deleted the it-fips-upgrade-integrations-server branch June 27, 2025 20:42
mergify bot pushed a commit that referenced this pull request Jun 27, 2025
* WIP: Implementing UpgradeDeployment method

* Adding AvailableVersions method to StatefulProvisioner

* WIP: adding test for upgrading integration server in ECH

* Ensure test runs against FRH ECH region

* WIP

* WIP

* Add test for generating upgrade deployment request

* Flesh out test some more

* Fixing filename in reference

* Make DeploymentStatus contain status and health

* Cleanup deployment after test

* Add error handling

* Make linter happy

* Fix typo

* Fix check

* Adding logging

* Filter by min/max versions

* Fix call

* Running non-sudo upgrade group tests

* Running upgrade FIPS tests

* Remove non-sudo upgrade group from regular integration tests

* Fix env var for ECH region and default it to gcp-us-west2

* Use env var for ECH API key

* Run ECH deployment upgrade test if PR label is present

* Fix missing key check

* Add label on PR that bumps integration test versions

* Export EC_API_KEY

* WIP: debugging

* Append query string to URL after joining paths

* Fix up logic in doGet to construct URL without escaping ?

* WIP: Remove some debug logging

* Add bounds check

* Fix start index math

* Removing debug logs

* WIP: don't clean up deployment so we can investigate

* Add doPut method on client

* Remove debugging statements

* Fix build tag

* Change group for testing in PR

* Cleanup

* Add comment

* Cleanup

* Update deployment version after upgrade

* Test group in PR

* Fix unit test

* Undo unintentional changes

(cherry picked from commit 4344a11)

# Conflicts:
#	.ci/updatecli/updatecli-bump-stack-version.yml
#	.mergify.yml
#	test_infra/ess/readme.md
ycombinator added a commit that referenced this pull request Jun 27, 2025
… in ECH (#8725)

* Integration Test: upgrade Integrations Server in ECH (#8417)

(cherry picked from commit 4344a11)

# Conflicts:
#	.ci/updatecli/updatecli-bump-stack-version.yml
#	.mergify.yml
#	test_infra/ess/readme.md

* Fixing conflicts

---------

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
@pkoutsovasilis pkoutsovasilis added the backport-9.1 (Automated backport to the 9.1 branch) label Jun 30, 2025
mergify bot pushed a commit that referenced this pull request Jun 30, 2025
(cherry picked from commit 4344a11)
ycombinator added a commit that referenced this pull request Jul 1, 2025
(cherry picked from commit 4344a11)
ycombinator added a commit that referenced this pull request Jul 2, 2025
…in ECH (#8745)

* Integration Test: upgrade Integrations Server in ECH (#8417)

(cherry picked from commit 4344a11)

* ci: revert deployment_csp_configuration.yaml to create_deployment_csp_configuration.yaml (#8746)

(cherry picked from commit 2252fdd)

---------

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
Labels
  • backport-8.19: Automated backport to the 8.19 branch
  • backport-9.1: Automated backport to the 9.1 branch
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
  • Testing:run:TestUpgradeIntegrationsServer
7 participants