-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swap order of starting up RELAY/SOCKET processes in multi-master tests #66
Conversation
Hm, I put the tests in this order in 96b6712 on purpose. Maybe we need a second set of tests that boots the servers in the other order? |
|
The only tricky thing with checking the exit code of both processes is that you don't know which master will run the failing test. If the central master runs the failing test, then the remote master will exit 0. If the remote master runs the failing test, the remote master will exit 1. |
f54c406
to
e17718f
Compare
Modified to test in both orders. |
export TEST_QUEUE_RELAY_TOKEN=$(date | cksum | cut -d' ' -f1) | ||
TEST_QUEUE_RELAY=0.0.0.0:12345 bundle exec minitest-queue ./test/samples/sample_minitest5.rb || true & | ||
TEST_QUEUE_RELAY=0.0.0.0:12345 bundle exec minitest-queue ./test/samples/sample_minitest5.rb 2>&1 | tee /tmp/out.txt || true & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to leave this debug output in?
TEST_QUEUE_RELAY=0.0.0.0:12345 run bundle exec minitest-queue ./test/samples/sample_minitest5.rb | ||
wait | ||
|
||
assert_status 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will sometimes fail due to the issue I mentioned. There's no guarantee that the remote master will run the failing test. If it doesn't, its exit code will be 0 (and it won't contain any failure output). We need a way to ensure that the right master fails. Something like this might work inside a test:
def test_foo
whoami =
if ENV["TEST_QUEUE_SOCKET"]
"central"
elsif ENV["TEST_QUEUE_RELAY"]
"remote"
end
refute ENV["MULTI_MASTER_FAIL"] == whoami
end
I.e., if you set MULTI_MASTER_FAIL=central
, the central master will fail. If you set MULTI_MASTER_FAIL=remote
, the remote master will fail. If you put this new test inside the MiniTestSleep
suites then it will be highly likely to run in both masters and fail or pass appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ben suggested an alternate approach, sleeping the order of the masters to induce the desired one to pick up the test. Pushed that version.
6ae459d
to
87a836b
Compare
sleep 5 | ||
elsif ENV['SLEEP_AS_MASTER'] && !relay? | ||
sleep 5 | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will guarantee that a particular master's workers will run the failing suite. around_filter
gets called inside each worker just after receiving the suite to run, so it seems like workers will be handed an initial suite to run and then will sleep. I.e., imagine the following execution when we're trying to get the remote master to run the failing suite:
SLEEP_AS_MASTER=1
- Central master boots and spawns workers
- Central master hands the failing suite to one of its own workers
- Central master's workers sleep
- Remote master boots and connects to central master
- Central master hands all the rest of the suites to the remote master's workers
- Remote master's workers run those suites and remote master exits with success
- Central master's workers wake up and run their suites, including the failing one
For the sleeping to be effective I think it needs to happen before suites are assigned, i.e., before the workers send their first POP
messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sleeping to be effective I think it needs to happen before suites are assigned, i.e., before the workers send their first POP messages.
Nuts, this makes total sense. My bad.
Do we need to make changes to the runner itself to give us a place to inject this behavior for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the sleep in Runner#after_fork
should do the trick I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. 😅 Swapped.
87a836b
to
303d356
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Thank you!
CI is red, but that is what we want. #65 will fix it. |
The tests should fail here since they're missing the fix from #65. cc @aroben