[codex] add flame system tests#466
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of system and integration tests for the Flame platform. Key changes include the addition of opt-in stress, longevity, and runner tests in a new test_system.py file, supported by new Makefile targets and pytest markers. The BasicTestService was enhanced to simulate failures and delays via request flags and common data, enabling the consolidation of error-handling tests into test_core.py and the removal of dedicated error services. Feedback from the reviewer highlights an opportunity to improve efficiency by using direct session lookups instead of list filtering and advises against accessing protected SDK internals for cluster snapshots, suggesting the implementation of public SDK methods instead.
| sessions = flamepy.list_sessions() | ||
| session_status = next((s for s in sessions if s.id == session.id), None) |
There was a problem hiding this comment.
Fetching the entire list of sessions and filtering it in a loop is inefficient, especially in clusters with many active sessions. Since the session ID is already known, it is better to use flamepy.get_session() to retrieve the specific session's status directly.
session_status = flamepy.get_session(session.id)| nodes_response = conn._frontend.ListNodes(ListNodesRequest()) | ||
| executors_response = conn._frontend.ListExecutor(ListExecutorRequest()) |
There was a problem hiding this comment.
Accessing the protected _frontend attribute and using low-level gRPC request objects directly bypasses the SDK's abstraction layer. This creates a tight coupling with internal implementation details that may change. If the flamepy SDK does not yet provide public methods for listing nodes and executors, consider adding them to the SDK rather than accessing internals in the test suite.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary
Validation
Notes