Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdk-control-service: fix race condition in test #1227

Merged
merged 5 commits into from
Oct 17, 2022

Conversation

murphp15
Copy link
Collaborator

@murphp15 murphp15 commented Oct 7, 2022

Why?

There is an intermittent failure in the integration tests for the control plane.
The stacktrace can be found here #1129.
It happens because the job watcher sees a job in kubernetes before a row exists in the database.
So it creates a row from nothing instead of updating the existing row.
When the watcher creates a row from scratch it doesn't have the startedBy value.
Because deploying the job and updating the database can't be an atomic operation we need to make sure that the job isn't submitted unless the row exists in the database.

What

We now add the row into the database before deploying to k8s to prevent the race condition.
If there is a failure during the deployment we roll back the commit to the database.

How has this been tested?

I added a test to make sure that we rollback correctly.
I also added back in the assertion to check that the started by is set correctly.

What type of change are you making?

bug fix.

Signed-off-by: murphp15 murphp15@tcd.ie

murphp15 and others added 2 commits October 7, 2022 10:24
Signed-off-by: murphp15 <murphp15@tcd.ie>
@antoniivanov
Copy link
Collaborator

I will try to provide a review by end of the day or Monday morning.

But please fix the title of the PR to follow the recommended format: https://github.com/vmware/versatile-data-kit/blob/main/support/git-commit-template.txt (add component and move the link to the ticket somewhere else).

That format is used later when making release documentation so it's kinda important.

@murphp15 murphp15 changed the title fix race condition in test. closes: https://github.com/vmware/versatile-data-kit/issues/1129 vdk-control-service: fix race condition in test Oct 7, 2022
@murphp15
Copy link
Collaborator Author

murphp15 commented Oct 7, 2022

But please fix the title of the PR to follow the recommended format: https://github.com/vmware/versatile-data-kit/blob/main/support/git-commit-template.txt (add component and move the link to the ticket somewhere else).

Done.

@mivanov1988
Copy link
Contributor

Good catch.

@antoniivanov antoniivanov linked an issue Oct 10, 2022 that may be closed by this pull request
Copy link
Collaborator

@antoniivanov antoniivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me. If you can explain the unit test.

… flaxey tests.

Signed-off-by: murphp15 <murphp15@tcd.ie>
@murphp15 murphp15 enabled auto-merge (squash) October 10, 2022 09:27
@murphp15 murphp15 force-pushed the person/murphp15/fix_race_condition_in_test branch from e022304 to 3254b21 Compare October 17, 2022 10:28
@murphp15 murphp15 enabled auto-merge (squash) October 17, 2022 10:28
@murphp15 murphp15 merged commit 60eb9ea into main Oct 17, 2022
@murphp15 murphp15 deleted the person/murphp15/fix_race_condition_in_test branch October 17, 2022 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

control-service: Investigate flaky integration test
4 participants