-
Notifications
You must be signed in to change notification settings - Fork 1.3k
test concurrent setCurrent/setRamping requests #7512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/worker_deployment_test.go
Outdated
| }) | ||
| } | ||
|
|
||
| func (s *WorkerDeploymentSuite) TestSetCurrent_NoDeploymentVersion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just moved this test to an area which had all other setCurrent tests - no need to review this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you had a chance to test these new tests for flakiness? (ie. by running it through the CI action that runs it 50x in cassandra or something)
carlydf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of the ticket, I think that these tests should not override the MaxInFlightUpdates -- we should test against the default because that's what customers will have. Ideally that will only cause throttling or timeout errors and nothing "weird." If any other error type comes up, that's a bug that should be addressed.
tests/worker_deployment_test.go
Outdated
|
|
||
| for i := 0; i < versions; i++ { | ||
| err := <-errChan | ||
| s.NoError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of these requests might be expected to get throttling error or context timeout error. If you decrease the # of WorkflowExecutionMaxInFlightUpdates to the default (which is 10), do you see anything weird, or just throttling + timeout errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of s.NoError you could keep the rest of this test the same, and just assert that the error is either:
- nil
- timeout
- throttling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just bringing this up since the ticket said "confirm that concurrent updates with different inputs don't result in any weird errors, only throttling or timeout or success"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there is one important point to consider - when a user sends in a setCurrent request (for example), there are going to be 2 in-flight updates (inflight updates are those updates that have been admitted but not completed). For each set request (1 in-flight update since it's not completed), we have a sync call request (another in-flight update). Thus, for 10 setCurrent requests, we have to have the limit set to 20. So, setting the limit to 10 shall give you ResourceExhaustedErrors.
Looking at the ticket again, I took a step back and tried understanding what it is asking us to test - It seems like the main motivation behind the ticket is that when we do send in 10 concurrent requests, we should not get any "weird" errors - however, I think we should slightly change what the ticket is asking to achieve it's main purpose.
We should be testing that given enough time for our system to process these 10 concurrent updates with the system being ready to handle these 10 concurrent updates, does anything weird at all happen? This makes it easy to catch for things we wanted to catch from the get-go: any weird errors.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I do see the other side of things now too - this ticket could just be asking how the system reacts to these large number of concurrent requests if we don't change the original state of the system! (like increasing the number of concurrent updates)
In that case, a user should not be seeing any of those "weird" errors except ResourceExhaustedErrors or Timeout.
I think this is what you had meant earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it the same way as you put it in your second comment! Are you on board with writing the test that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on board and have written them like this! also tested each one of them in the CI and they all pass.
tests/worker_deployment_test.go
Outdated
| } | ||
|
|
||
| func (s *WorkerDeploymentSuite) TestSetCurrentVersion_Concurrent_DifferentVersions() { | ||
| s.OverrideDynamicConfig(dynamicconfig.WorkflowExecutionMaxInFlightUpdates, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test should use the default value for this (which is 10). What do you think?
tests/worker_deployment_test.go
Outdated
| } | ||
|
|
||
| func (s *WorkerDeploymentSuite) TestSetCurrentVersion_Concurrent_SameVersion() { | ||
| s.OverrideDynamicConfig(dynamicconfig.WorkflowExecutionMaxInFlightUpdates, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, we should be testing this with default config
tests/worker_deployment_test.go
Outdated
|
|
||
| for i := 0; i < 10; i++ { | ||
| err := <-errChan | ||
| s.NoError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, throttling or timeout is acceptable
|
❌ We are unable to process any of the uploaded JUnit XML files. Please ensure your files are in the right format. |
carlydf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth stating that your override is simply ensuring that the MaxUpdates value is the default for these tests! Other than that looks great
What changed?
Why?
How did you test it?
Potential risks
Documentation
Is hotfix candidate?