
Swap order of starting up RELAY/SOCKET processes in multi-master tests #66

Merged: 1 commit into tmm1:master, Dec 13, 2016

Conversation

mistydemeo (Contributor)

The tests should fail here since they're missing the fix from #65. cc @aroben

aroben (Collaborator) commented Dec 8, 2016

Hm, I put the tests in this order in 96b6712 on purpose. Maybe we need a second set of tests that boots the servers in the other order?

mistydemeo (Contributor, Author) commented Dec 8, 2016

After looking at it more in #65 (comment), I was going to say we should make sure we're checking the exit status/stdout of both processes, but sorry, that's exactly what the commit you linked addressed. So, yeah, we might want to add extra tests to cover both cases.

aroben (Collaborator) commented Dec 8, 2016

The only tricky thing with checking the exit code of both processes is that you don't know which master will run the failing test. If the central master runs the failing test, then the remote master will exit 0. If the remote master runs the failing test, the remote master will exit 1.

mistydemeo (Contributor, Author)

Modified to test in both orders.

export TEST_QUEUE_RELAY_TOKEN=$(date | cksum | cut -d' ' -f1)
TEST_QUEUE_RELAY=0.0.0.0:12345 bundle exec minitest-queue ./test/samples/sample_minitest5.rb || true &
TEST_QUEUE_RELAY=0.0.0.0:12345 bundle exec minitest-queue ./test/samples/sample_minitest5.rb 2>&1 | tee /tmp/out.txt || true &

Collaborator

Did you mean to leave this debug output in?

TEST_QUEUE_RELAY=0.0.0.0:12345 run bundle exec minitest-queue ./test/samples/sample_minitest5.rb
wait

assert_status 1

Collaborator

I think this will sometimes fail due to the issue I mentioned. There's no guarantee that the remote master will run the failing test. If it doesn't, its exit code will be 0 (and it won't contain any failure output). We need a way to ensure that the right master fails. Something like this might work inside a test:

def test_foo
  whoami =
    if ENV["TEST_QUEUE_SOCKET"]
      "central"
    elsif ENV["TEST_QUEUE_RELAY"]
      "remote"
    end
  refute ENV["MULTI_MASTER_FAIL"] == whoami
end

I.e., if you set MULTI_MASTER_FAIL=central, the central master will fail. If you set MULTI_MASTER_FAIL=remote, the remote master will fail. If you put this new test inside the MiniTestSleep suites then it will be highly likely to run in both masters and fail or pass appropriately.

mistydemeo (Contributor, Author)

Ben suggested an alternate approach: sleeping one of the masters to induce the desired one to pick up the failing test. Pushed that version.

@mistydemeo force-pushed the fix_relay_in_multi-master branch 2 times, most recently from 6ae459d to 87a836b on December 13, 2016 00:50
sleep 5
elsif ENV['SLEEP_AS_MASTER'] && !relay?
sleep 5
end

Collaborator

I don't think this will guarantee that a particular master's workers will run the failing suite. around_filter gets called inside each worker just after receiving the suite to run, so it seems like workers will be handed an initial suite to run and then will sleep. I.e., imagine the following execution when we're trying to get the remote master to run the failing suite:

  1. SLEEP_AS_MASTER=1
  2. Central master boots and spawns workers
  3. Central master hands the failing suite to one of its own workers
  4. Central master's workers sleep
  5. Remote master boots and connects to central master
  6. Central master hands all the rest of the suites to the remote master's workers
  7. Remote master's workers run those suites and remote master exits with success
  8. Central master's workers wake up and run their suites, including the failing one

For the sleeping to be effective I think it needs to happen before suites are assigned, i.e., before the workers send their first POP messages.

Collaborator

> For the sleeping to be effective I think it needs to happen before suites are assigned, i.e., before the workers send their first POP messages.

Nuts, this makes total sense. My bad.

Do we need to make changes to the runner itself to give us a place to inject this behavior for testing?

Collaborator

Putting the sleep in Runner#after_fork should do the trick I think.
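
For illustration, here is a minimal sketch of that suggestion, assuming test-queue's runner exposes an overridable after_fork(num) hook (run in each worker right after it forks, before it requests its first suite) and the relay? predicate used in the excerpt above. The subclass name is hypothetical, the SLEEP_AS_MASTER knob is reused from the excerpt, and in practice you would subclass the framework-specific runner; this is not the exact change that was merged.

require 'test_queue'

# Hypothetical runner subclass, only to illustrate where the sleep could go.
class SleepyRunner < TestQueue::Runner
  # Runs in each worker after the fork. Sleeping here delays the worker's
  # first POP, so the other master's workers are handed the suites first.
  def after_fork(num)
    sleep 5 if ENV['SLEEP_AS_MASTER'] && !relay?
    # A matching branch for the relay side (as in the excerpt above) would
    # let the harness choose which master holds back.
  end
end

Gating on an environment variable keeps the delay out of normal runs while letting the shell test script pick which master's workers wait.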

mistydemeo (Contributor, Author)

Good point. 😅 Swapped.

aroben (Collaborator) left a comment

This is great. Thank you!

aroben (Collaborator) commented Dec 13, 2016

CI is red, but that is what we want. #65 will fix it.

@aroben merged commit be472d8 into tmm1:master on Dec 13, 2016
@mistydemeo deleted the fix_relay_in_multi-master branch on December 13, 2016 17:53