-
Notifications
You must be signed in to change notification settings - Fork 130
Add error path test #1469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error path test #1469
Conversation
NB This test isn't expected to work until #1465 & tensorlakeai/tensorlake#175 land. |
gm = graph.metadata() | ||
self.assertEqual("Ready", gm.state.status) | ||
im = graph.invocation_metadata(invocation_id) | ||
self.assertEqual("Failed", im.state.status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is no difference in Invocation State between an invocation failed by raising the InvocationError
exception and a runtime Exception
- how does the user know the difference when they are getting the invocation response? In the former case the graph work as expected, it handled bad input and raised an exception, and in the later case the invocation failed because of a runtime error (which is not expected).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception info coming back makes it clear that it was an InvocationError, and the behavior is different (we won't retry the task, and it doesn't count as either an invocation success or a failure for the graph breakage check).
Is there anything we can/should add to make things more clear?
self.assertEqual("Failed", im.state.status) | ||
|
||
def test_permanent_graph_failure(self): | ||
graph = Graph( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of this feature needs to change - we want the graph to be in a broken state if we couldn't load it. Invocations can be failing all the time, but we wouldn't still mark the graph as failed.
This feature is a contract between the platform and the Graph developers to not ship graphs that the platform can't load. Once the graph is loaded and the platform can call it, if invocation fails they are retried unless the exception type is InvocationException
(as described by the test - test_permanent_invocation_failure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right way to write this test would be to -
- Create a tensorlake compute class which throws an exception in the constructor
- Create a Tensorlake compute class which sleeps for 5 minutes and so the health check by executor fails
Show that the graph gets to broken state after we retry for 3 times.
99e1036
to
033d7c7
Compare
fyi InvocationError testing was implemented in this PR https://github.com/tensorlakeai/tensorlake/pull/205/files#diff-981a21c39fc2ff52556d3ba0c2de2a4ccccbd0fc52b6ca02f642ab5640ff87e0 |
Context
Adds a test for the graph error path logic
Contribution Checklist
make fmt
in the package directory.make fmt
inserver/
.