Oh, and don't worry about this failing on Travis... I think it's due to network issues. Sometimes it takes 4 minutes on Travis, sometimes it takes 30 minutes and times out left and right. Note that locally it only takes 90 s to run.
routes/v1.js
Would we not want to publish to task-exception for a particular run that was caused by a worker shutdown?
Not if we schedule another run to retry. At the moment we don't publish task-exception if the claim expired and we still have retries left; instead we schedule a new run with reasonCreated: 'retry'. I think the same logic applies here, i.e. we do what we normally do when a claim expires.
Note, IMO task-completed, task-exception, and task-failed should each come only once per task: specifically, when the queue is done processing the task (no more retries, etc.). If some day we move auto-reruns into the queue, it should do the same.
If we need run-level events, let's add them later as run-exception, etc...
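The claim-expiration logic described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual queue code; the function and field names (`resolveExpiredClaim`, `retriesLeft`) are assumptions for the example:

```javascript
// Sketch of the decision the queue makes when a claim expires:
// retry if retries remain, otherwise resolve the task as exception
// and publish a task-exception message. Names are hypothetical.
function resolveExpiredClaim(task) {
  if (task.retriesLeft > 0) {
    // Schedule a new run instead of publishing task-exception.
    return {
      action: 'schedule-run',
      reasonCreated: 'retry',
      retriesLeft: task.retriesLeft - 1,
    };
  }
  // No retries left: the task is done, so publish task-exception.
  return {
    action: 'publish',
    exchange: 'task-exception',
    reasonResolved: 'claim-expired',
  };
}

// A task with retries left gets a new run, not a task-exception message.
console.log(resolveExpiredClaim({retriesLeft: 1}).action);   // 'schedule-run'
console.log(resolveExpiredClaim({retriesLeft: 0}).exchange); // 'task-exception'
```

This matches the "resolution events fire once per task" rule: task-exception is only published on the branch where the queue is truly done with the task.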
Makes sense, thanks for clearing that up. Right now, in a few different places, we rely on those pulse messages to indicate that a run of a task is complete, rather than as information about the task itself. Having a way to listen for when a run has completed could be useful later, but at least I know this is the behavior for now. Thanks.
Well, a run being completed also results in the task being resolved as completed. We don't schedule retries for successful tasks yet -- I'm sure someone will ask for that feature later, he he :)
Okay, the resolvers/reapers still have the potential timeout issues you mentioned. I think we should solve it at the Azure library level with timeouts. Note, https://github.com/gluwer/azure-table-node, the library we use for table storage, already defaults to 30 s timeouts. IMO, we should still set this explicitly somewhere and make sure it holds, and implement a producer/consumer pattern for better, more stable performance...
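The producer/consumer idea could look something like this: the resolver produces a queue of pending table operations, and a small fixed pool of consumers drains it, so one slow Azure call only stalls a single consumer instead of the whole reaper loop. This is a hedged sketch, not the actual implementation; `processQueue`, `handler`, and the concurrency level are illustrative:

```javascript
// Illustrative producer/consumer sketch: a fixed pool of consumers
// drains a shared work queue concurrently. Names are hypothetical,
// not the real queue/reaper code.
async function processQueue(items, handler, concurrency = 4) {
  const queue = items.slice(); // producer side: the pending work
  const results = [];
  // Each consumer pulls the next item and processes it until
  // the queue is empty; slow items only block that one consumer.
  async function consumer() {
    while (queue.length > 0) {
      const item = queue.shift();
      results.push(await handler(item));
    }
  }
  // Start `concurrency` consumers and wait for all of them to finish.
  await Promise.all(Array.from({length: concurrency}, () => consumer()));
  return results;
}

// Usage: process four (stand-in) table operations, two at a time.
processQueue([1, 2, 3, 4], async (x) => x * 2, 2)
  .then((results) => console.log(results.sort((a, b) => a - b)));
```

In the real thing, `handler` would be the per-entity resolve/expire operation against table storage, with the library's timeout bounding how long any single consumer can be stuck.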
This is a pretty big rewrite... I don't recommend trying to read the diff; just look at the result.
Please don't hit the merge button... We shouldn't land this before we've ported both the provisioning logic and docker-worker. I'll backport non-deprecated APIs so that we can do that before we deploy this.
Anyways, please leave comments if you have questions... Or if anything looks wrong or sketchy, just ask...