Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: testground instability due to race condition on outcomes #1406

Conversation

laurentsenta
Copy link
Contributor

@laurentsenta laurentsenta commented Jul 27, 2022

When the docker runner starts a test run it does a few things:

  1. Wait for the instance containers to complete,
  2. Setup a "pretty printer" that forwards instance's stdout, BUT ALSO tries to decode events from stdout and update the outcomes,
  3. Listen to the events sent by test instances and update the outcomes on success.

When the containers exits (1) testground stops everything related to collecting the outcomes (3).
This is fine-ish with the go-sdk because it dumps events to stdout as well, so the outcomes will be read thanks to (2).

Logging events on stdout and trying to decode the JSON looked like a legacy hack: this was not implemented in the rust-sdk. See my notes on the pretty printer. The rust-sdk triggers a race condition:

  1. test instance sends "success event" to the sync service
  2. test instance exits
  3. Testgound shuts down the test (and the outcome collection part)
  4. Testground receives the "success event", but does not listen to it anymore.

I believe we've seen this error on go-sdk, less often, see this comment.

This PR rework the local docker runner so that a test is complete if and only if:

  • All the instances have exited AND all the outcomes have been received
  • OR the test timeout has been reached.

See runs:

Todo

  • Spike the patch,
  • Verify that the patch doesn't break other components,
  • Go through all other TODOs (dealing with errors, etc),
  • Double check the outcome outputs and the integration tests results,
  • Look into the testing situation.

@laurentsenta laurentsenta marked this pull request as draft July 27, 2022 16:29
@laurentsenta
Copy link
Contributor Author

Closing in favor of #1407 (moved to the testground repo to get CircleCI's feedbacks).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant