[codex] add flame system tests #466

Merged: k82cn merged 1 commit into xflops:main from k82cn:flm_sys_test on May 20, 2026

@k82cn k82cn commented May 20, 2026

Summary

  • add opt-in Python system tests for stress, longevity, and Runner workloads
  • fuzz session/task input, expected output, common data, payload sizes, service sleep, and session shapes
  • generate structured stress, longevity, and Runner reports with Flame cluster node/executor snapshots
  • consolidate failure recovery coverage into test_core and reuse basic_svc for failure modes
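The fuzzing described above can be pictured with a small seeded generator. This is only a minimal sketch, not the PR's fuzzer: `FuzzCase`, `make_cases`, and the field names are hypothetical, and the real tests may shape their inputs differently. The point it illustrates is that drawing every value from one seeded RNG lets a failing run be replayed from its seed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzCase:
    """One randomized task (hypothetical fields, not the PR's schema)."""
    task_input: bytes
    common_data: bytes
    sleep_ms: int


def _random_bytes(rng, max_len):
    # Random length in [0, max_len], then that many random bytes.
    length = rng.randint(0, max_len)
    return bytes(rng.getrandbits(8) for _ in range(length))


def make_cases(seed, count, max_payload=1024):
    """Return `count` fuzz cases derived deterministically from `seed`."""
    rng = random.Random(seed)  # private RNG so a failing seed can be replayed
    return [
        FuzzCase(
            task_input=_random_bytes(rng, max_payload),
            common_data=_random_bytes(rng, max_payload),
            sleep_ms=rng.randint(0, 50),
        )
        for _ in range(count)
    ]
```

Recording the seed in the generated report would be one way to make a failing stress run reproducible.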

Validation

  • uv run --extra dev ruff check src/e2e/api.py src/e2e/basic_svc.py src/e2e/helpers.py tests/test_core.py tests/test_system.py
  • uv run --extra dev ruff format --check src/e2e/api.py src/e2e/basic_svc.py src/e2e/helpers.py tests/test_core.py tests/test_system.py
  • python3 -m py_compile e2e/src/e2e/api.py e2e/src/e2e/basic_svc.py e2e/src/e2e/helpers.py e2e/tests/test_core.py e2e/tests/test_system.py
  • uv run --extra dev --with ../sdk/python pytest --collect-only tests/test_core.py tests/test_system.py -q
  • uv run --extra dev --with ../sdk/python pytest tests/test_system.py -q
  • direct fuzzer validation for stress and Runner workloads
  • make -n e2e-py-system-runner
  • make -n e2e-py-system-local E2E_SYSTEM_PROFILE=runner E2E_SYSTEM_PYTEST_ARGS='-m runner -q'

Notes

  • live opt-in system tests were not executed locally because they require an active Flame cluster; Runner also requires the flmrun template app and cache/package storage
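Because the system tests are opt-in, something has to decide whether a given profile was requested. The sketch below shows one possible gate. It assumes the `E2E_SYSTEM_PROFILE` value passed to make also reaches the test process as an environment variable and may hold a comma-separated list; neither is confirmed by this PR, and `enabled_profiles` is a hypothetical helper.

```python
import os

# Profile names taken from the workloads listed in the summary.
KNOWN_PROFILES = {"stress", "longevity", "runner"}


def enabled_profiles(env=None):
    """Return the set of system-test profiles the caller opted into.

    Assumption: profiles arrive in an E2E_SYSTEM_PROFILE environment
    variable. Unknown names are ignored rather than treated as errors.
    """
    env = os.environ if env is None else env
    raw = env.get("E2E_SYSTEM_PROFILE", "")
    requested = {p.strip().lower() for p in raw.split(",") if p.strip()}
    return requested & KNOWN_PROFILES
```

A test module could then pass `"runner" not in enabled_profiles()` to `pytest.mark.skipif` so that Runner tests are skipped unless explicitly requested.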


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive suite of system and integration tests for the Flame platform. Key changes include the addition of opt-in stress, longevity, and runner tests in a new test_system.py file, supported by new Makefile targets and pytest markers. The BasicTestService was enhanced to simulate failures and delays via request flags and common data, enabling the consolidation of error-handling tests into test_core.py and the removal of dedicated error services. Feedback from the reviewer highlights an opportunity to improve efficiency by using direct session lookups instead of list filtering and advises against accessing protected SDK internals for cluster snapshots, suggesting the implementation of public SDK methods instead.

Comment thread e2e/tests/test_core.py Outdated
Comment on lines +980 to +981
sessions = flamepy.list_sessions()
session_status = next((s for s in sessions if s.id == session.id), None)

medium

Fetching the entire list of sessions and filtering it in a loop is inefficient, especially in clusters with many active sessions. Since the session ID is already known, it is better to use flamepy.get_session() to retrieve the specific session's status directly.

                session_status = flamepy.get_session(session.id)
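To make the difference between the two patterns concrete, here is a sketch against an in-memory stand-in. `FakeClient` and `Session` are hypothetical and are not flamepy; they only mirror the two call names that appear in this thread.

```python
from dataclasses import dataclass


@dataclass
class Session:
    id: str
    state: str


class FakeClient:
    """In-memory stand-in for the SDK (not flamepy itself)."""

    def __init__(self, sessions):
        self._by_id = {s.id: s for s in sessions}

    def list_sessions(self):
        return list(self._by_id.values())

    def get_session(self, session_id):
        return self._by_id[session_id]


def status_by_scan(client, session_id):
    # Original pattern: fetch every session, then filter client-side.
    return next((s for s in client.list_sessions() if s.id == session_id), None)


def status_by_lookup(client, session_id):
    # Suggested pattern: request the one session directly.
    return client.get_session(session_id)
```

One behavioral difference is worth checking when switching: the scan yields `None` for a missing session, while a direct lookup may raise instead (the stand-in raises `KeyError`; what the real SDK does is not stated here).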

Comment thread e2e/tests/test_system.py Outdated
Comment on lines +203 to +204
nodes_response = conn._frontend.ListNodes(ListNodesRequest())
executors_response = conn._frontend.ListExecutor(ListExecutorRequest())

medium

Accessing the protected _frontend attribute and using low-level gRPC request objects directly bypasses the SDK's abstraction layer. This creates a tight coupling with internal implementation details that may change. If the flamepy SDK does not yet provide public methods for listing nodes and executors, consider adding them to the SDK rather than accessing internals in the test suite.
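Until such public methods exist, one way to limit the coupling is to confine the private access to a single adapter and have the report code depend only on plain callables. The following is a sketch under that assumption; `ClusterSnapshot` and `take_snapshot` are hypothetical names and are not part of flamepy.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSnapshot:
    """Point-in-time view of the cluster used in a test report."""
    nodes: list = field(default_factory=list)
    executors: list = field(default_factory=list)


def take_snapshot(list_nodes, list_executors):
    """Build a snapshot from two caller-supplied callables.

    The callables hide how the data is fetched, so only one adapter
    needs to change if public SDK methods are added later.
    """
    return ClusterSnapshot(
        nodes=list(list_nodes()),
        executors=list(list_executors()),
    )
```

With this shape, the test suite's reporting code never touches `_frontend` directly, and swapping in public SDK calls later is a change to the adapter alone.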


codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@k82cn k82cn merged commit be8aaee into xflops:main May 20, 2026
6 of 7 checks passed
@k82cn k82cn deleted the flm_sys_test branch May 20, 2026 20:05