-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdk-control-service: fix race condition in test #1227
Conversation
Signed-off-by: murphp15 <murphp15@tcd.ie>
I will try to provide a review by end of the day or Monday morning. But please fix the title of the PR to follow the recommended format: https://github.com/vmware/versatile-data-kit/blob/main/support/git-commit-template.txt (add component and move the link to the ticket somewhere else). That format is used later when making release documentation so it's kinda important. |
Done. |
...s_control_service/src/main/java/com/vmware/taurus/service/execution/JobExecutionService.java
Show resolved
Hide resolved
Good catch. |
..._control_service/src/test/java/com/vmware/taurus/datajobs/DataJobsExecutionControllerIT.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me. If you can explain the unit test.
… flaxey tests. Signed-off-by: murphp15 <murphp15@tcd.ie>
e022304
to
3254b21
Compare
Why?
There is an intermittent failure in the integration tests for the control plane.
The stacktrace can be found here #1129.
It happens because the job watcher sees a job in kubernetes before a row exists in the database.
So it creates a row from nothing instead of updating the existing row.
When the watcher creates a row from scratch it doesn't have the startedBy value.
Because deploying the job and updating the database can't be an atomic operation we need to make sure that the job isn't submitted unless the row exists in the database.
What
We now add the row into the database before deploying to k8s to prevent the race condition.
If there is a failure during the deployment we roll back the commit to the database.
How has this been tested?
I added a test to make sure that we rollback correctly.
I also added back in the assertion to check that the started by is set correctly.
What type of change are you making?
bug fix.
Signed-off-by: murphp15 murphp15@tcd.ie