Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disables / improves several integration tests #1225

Merged
merged 16 commits into from Oct 9, 2019

Conversation

@dposada
Copy link
Member

commented Oct 3, 2019

Changes proposed in this PR

  • marking several tests with xfail
  • in trigger_preemption, moving the share- and quota-lowering to after the "large job" is running

Why are we making these changes?

The reason the preemption tests, test_pool_scheduling, and test_queue_quota_filtering are being marked xfail is that we're running into an issue in some test environments where Mesos is not killing tasks that Cook asks it to. This results in the job being marked Failed, but the task staying in the Running state for hours. When this happens, jobs in these tests don't get scheduled because the "stuck" running tasks are eating up the test user's quota and share. Until we can get to the bottom of that Mesos issue, these tests will need to be allowed to fail.

For the trigger_preemption change, when the "large job" is subject to the already-lowered share, it's less likely to get scheduled in a congested cluster.

@dposada dposada self-assigned this Oct 3, 2019
@dposada dposada requested a review from pschorf Oct 7, 2019
with self.user_factory.admin():
# Reset the user's share and quota
util.reset_limit(self.cook_url, 'share', user.name, reason=self.current_name(), pool=pool)
util.reset_limit(self.cook_url, 'quota', user.name, reason=self.current_name(), pool=pool)

This comment has been minimized.

Copy link
@pschorf

pschorf Oct 7, 2019

Contributor

Can we assert on these responses as well?

This comment has been minimized.

Copy link
@dposada

dposada Oct 7, 2019

Author Member

Good call, done

This comment has been minimized.

Copy link
@dposada

dposada Oct 8, 2019

Author Member

Turns out that if you try to delete share for a user that doesn't have it defined, you get a 500. Will need to revisit this tomorrow.

This comment has been minimized.

Copy link
@pschorf

pschorf Oct 8, 2019

Contributor

Sigh

This comment has been minimized.

Copy link
@dposada

dposada Oct 8, 2019

Author Member

I changed this to set the limit to the default limit instead of deleting it.

This comment has been minimized.

Copy link
@dposada

dposada Oct 9, 2019

Author Member

I also ended up marking the preemption tests as xfail, see the PR description

@dposada dposada requested a review from pschorf Oct 7, 2019
@pschorf
pschorf approved these changes Oct 7, 2019
@pschorf

This comment has been minimized.

Copy link
Contributor

commented Oct 7, 2019

Feel free to merge when the build is green.

dposada added 2 commits Oct 8, 2019
@dposada dposada added wip and removed internal-green labels Oct 8, 2019
dposada added 7 commits Oct 8, 2019
@dposada dposada changed the title Makes trigger_preemption more reliable Disables / improves several integration tests Oct 9, 2019
@pschorf
pschorf approved these changes Oct 9, 2019
@pschorf pschorf merged commit 244bc07 into twosigma:master Oct 9, 2019
1 of 2 checks passed
1 of 2 checks passed
Mergeable Mergeable run returned Status ***FAIL***
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dposada dposada deleted the dposada:trigger-preemption branch Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.