Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QE: Add step to wait for salt jobs to finish on buildhost as part of cleanup #7082

Conversation

vandabarata
Copy link
Contributor

@vandabarata vandabarata commented Jun 1, 2023

What does this PR change?

One of the likely reasons for issues on the flaky tests on the buildhost, is that we're trying to do a build action while an image profile is still being deleted, as evidenced by the presence of this in the rhn_web_ui logs of the server

[salt-event-thread-6] WARN  com.suse.manager.utils.SaltUtils - Image Profile ID not found while performing: Image Build suse_real_key in handleImageBuildData
[salt-event-thread-6] WARN  com.suse.manager.utils.SaltUtils - Image Profile ID not found while performing: Image Build suse_real_key in handleImageBuildData

These happened after the supposed cleanup and image deletion was done, and we were already trying to build the auth_registry_profile image, in the feature that ran after this.
Looking at the code, it gives us a hint of what is going on

            Long imageProfileId = details.getImageProfileId();
            if (imageProfileId == null) { // It happens when the image profile is deleted during a build action
                LOG.error("Image Profile ID not found while performing: {} in handleImageBuildData", action.getName());
                return;
            }

This means that the image profile deletion wasn't done in time, likely due to performance issues. Instead of increasing the timeout, we are opting to wait for all salt jobs to be done on the buildhost before moving to the next features, to avoid issues from one feature to another. (Discussion can be seen in this slack thread)

Links

Changelogs

Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Re-run a test

If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:

  • Re-run test "changelog_test"
  • Re-run test "backend_unittests_pgsql"
  • Re-run test "java_pgsql_tests"
  • Re-run test "schema_migration_test_pgsql"
  • Re-run test "susemanager_unittests"
  • Re-run test "javascript_lint"
  • Re-run test "spacecmd_unittests"

@vandabarata vandabarata self-assigned this Jun 1, 2023
@vandabarata vandabarata requested a review from a team as a code owner June 1, 2023 15:35
@vandabarata vandabarata force-pushed the qe-vb-improve-idempotency-buildhost-tests branch from 6735ea7 to 411f999 Compare June 1, 2023 15:50
@vandabarata vandabarata changed the title QE: Add step to stop all salt jobs on buildhost as part of cleanup QE: Add step to wait for salt jobs to finish on buildhost as part of cleanup Jun 1, 2023
Copy link
Member

@nodeg nodeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Thank you Vanda. LGTM.

@srbarrios
Copy link
Member

I think we need to find a way to identify exactly the Salt job we want to monitor, not just all the Salt jobs.
If we wait for all of them, that will fit on a sequential pass of all the tests, but that will not work when we try to run multiple tests in parallel even if there is no relation between the components in use and it can work at same time in a real use case.

@vandabarata
Copy link
Contributor Author

I think we need to find a way to identify exactly the Salt job we want to monitor, not just all the Salt jobs. If we wait for all of them, that will fit on a sequential pass of all the tests, but that will not work when we try to run multiple tests in parallel even if there is no relation between the components in use and it can work at same time in a real use case.

but do we use the buildhost for anything else other than this purpose? I think it's fine to do it like this in this case exactly because the buildhost only has one purpose 🤔

@nodeg
Copy link
Member

nodeg commented Jun 2, 2023

I think we need to find a way to identify exactly the Salt job we want to monitor, not just all the Salt jobs. If we wait for all of them, that will fit on a sequential pass of all the tests, but that will not work when we try to run multiple tests in parallel even if there is no relation between the components in use and it can work at same time in a real use case.

Couldn't we take care of all the parallelization tasks and tests in a future PR? I mean we still do a lot of sequential things and this one here clearly does not work as it should. So for now we have a working solution although it is not yet future proof with respect to more parallelization. Furthermore I think there are not that much other things running (in parallel) on the buildhost, yet.

@srbarrios
Copy link
Member

srbarrios commented Jun 2, 2023

@nodeg @vandabarata We has been always postponing coding our tests thinking in parallel tests, that's the main reason we still not support it. And trying to do all in once is impossible, I already tried, and that's an Odissey.
That's why I think that if we have the chance to code everything we do in a way that will be more precise, and not go the "all jobs, all processes, all whatever" strategy.

@mcalmer Gave a smarter approach in the thread that Vanda shared in the description, where you can know the state of salt jobs, and we can even try to identify the correct id for that salt job.

So, if we can do it better today... why to wait doing better next year?

 salt-run jobs.last_run metadata='{"suma-action-id": 71}'

@vandabarata vandabarata force-pushed the qe-vb-improve-idempotency-buildhost-tests branch from 411f999 to 59f3047 Compare June 2, 2023 09:39
@vandabarata vandabarata merged commit 268ea8a into uyuni-project:master Jun 2, 2023
7 checks passed
@vandabarata vandabarata deleted the qe-vb-improve-idempotency-buildhost-tests branch June 2, 2023 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants