Conversation
BREAKING CHANGES:
- Room creation is now separate from joining:
- POST /api/rooms creates a new room (returns 201)
- POST /api/rooms/{id}/join joins existing rooms only (returns 404 if not exists)
- Password hashing upgraded from SHA-256 to Argon2id (existing passwords invalidated)
- JWT tokens now have 7-day expiration
Key changes:
- Add ensure_user_exists() to UserService for idempotent user creation
- Users are created on socket connect from JWT claims, not during login
- JWT is now the single source of truth for identity
- Simplified login endpoint - guests get JWT without Redis entry until socket connect
- Added argon2-cffi dependency for secure password hashing
- Frontend: Removed redundant role state, use getRoleFromToken() only
- Updated all tests to use create-then-join pattern
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
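The create/join split described above can be illustrated with a minimal in-memory sketch. The `RoomStore` class, method names, and ID scheme here are hypothetical stand-ins; only the status-code semantics (201 on create, 404 when joining a missing room) come from the changelog:

```python
# Hypothetical in-memory model of the create-then-join contract.
# Only the status codes are taken from the changelog; everything
# else (class name, ID scheme, response shapes) is illustrative.
class RoomStore:
    def __init__(self) -> None:
        self._rooms: dict[str, set[str]] = {}
        self._next_id = 0

    def create(self) -> tuple[int, str]:
        """POST /api/rooms -- always creates a new room, returns 201."""
        self._next_id += 1
        room_id = f"room-{self._next_id}"
        self._rooms[room_id] = set()
        return 201, room_id

    def join(self, room_id: str, user: str) -> int:
        """POST /api/rooms/{id}/join -- joins existing rooms only, 404 otherwise."""
        if room_id not in self._rooms:
            return 404
        self._rooms[room_id].add(user)
        return 200
```

The point of the split is that joining can never implicitly create a room, so a typo in a room ID fails loudly instead of silently spawning an empty room.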
This refactoring enforces strict single responsibility per endpoint:
- /api/user/register: ONLY place for user creation (guest or registered)
- /api/login: ONLY authenticates users (never creates)
- Socket connect and room join validate user exists

Key changes:
- Add password validation (minimum 8 characters)
- Add get_authoritative_role() for Redis-based role checks
- Add is_admin_username() to AdminService for proper error messages
- Replace deprecated datetime.utcnow() with time utilities
- Remove redundant USERNAME_KEY from frontend localStorage
- Update Python client to use register-then-login flow

All users must now register before login. Guest flow:
1. POST /api/user/register (creates user)
2. POST /api/login (authenticates, returns JWT)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
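The minimum-length password rule above might look like the following sketch. The 8-character minimum and the `PasswordValidationError` name both appear in the commit messages; the function body and constant name are assumptions:

```python
class PasswordValidationError(ValueError):
    """Raised when a password fails validation (exception name from a later commit)."""


MIN_PASSWORD_LENGTH = 8  # "minimum 8 characters" per the commit message


def validate_password(password: str) -> None:
    # Reject anything shorter than the documented minimum.
    if len(password) < MIN_PASSWORD_LENGTH:
        raise PasswordValidationError(
            f"Password must be at least {MIN_PASSWORD_LENGTH} characters"
        )
```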
Security fixes:
- Unify all auth error messages to "Authentication failed" to prevent username enumeration attacks
- Add username validation (1-64 chars, alphanumeric + underscore/hyphen) to prevent path injection, XSS, and DoS attacks

Bug fixes:
- Fix 409 handling in frontend auth.ts - now properly handles case where registerData.userName is undefined on conflict response

Code cleanup:
- Remove unused get_authoritative_role() function (YAGNI)
- Update get_current_user_role() docstring to reference AdminService

Test updates:
- Update tests to expect generic "Authentication failed" message
- Fix test usernames to comply with new validation (John Doe → john-doe)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
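The username rule above (1-64 characters, alphanumeric plus underscore/hyphen) can be written as a single regex. This is an inferred sketch rather than the project's actual pattern; the "1 mandatory char + 0-63 more" structure mirrors the clarifying comment mentioned in a later commit:

```python
import re

# 1 mandatory char + 0-63 more = 1-64 chars total, alphanumeric plus
# underscore and hyphen. Pattern inferred from the commit text, not
# copied from the codebase.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_-][A-Za-z0-9_-]{0,63}$")


def is_valid_username(name: str) -> bool:
    return USERNAME_RE.fullmatch(name) is not None
```

Note that spaces are rejected, which is why the test username "John Doe" had to become "john-doe".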
- Upgrade argon2-cffi from >=23.1.0 to >=25.1.0 (current stable)
- Use datetime.timezone.utc instead of datetime.UTC for Python 3.11.x compatibility
- Replace deprecated datetime.utcnow() with utc_now() utility in tests
- Add clarifying comment for username regex pattern (1 + 0-63 = 1-64 chars)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
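The `utc_now()` utility mentioned above presumably wraps the timezone-aware API; a minimal sketch (the body is an assumption, only the name and intent come from the commit):

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware 'now' in UTC, replacing deprecated datetime.utcnow().

    Per the commit message, timezone.utc is used instead of datetime.UTC
    for Python 3.11.x compatibility.
    """
    return datetime.now(timezone.utc)
```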
- Sanitize worker_id to valid username by replacing ':' with '-'
(celery:{uuid} format violates username pattern)
- Remove unused 'public' parameter from _get_extension_class (YAGNI)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
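The sanitization above is a one-line transform; sketched here with an assumed function name:

```python
def sanitize_worker_id(worker_id: str) -> str:
    """Map a Celery worker id (celery:{uuid}) to a valid username.

    ':' is not in the allowed username character set, so it is replaced
    with '-'. Function name is an assumption; the transform is from the
    commit message.
    """
    return worker_id.replace(":", "-")
```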
- Move PasswordValidationError import to module level
- Remove redundant exception object from log.exception calls
- Add specific error message for 409 Conflict in room creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change PID file from single `~/.zndraw/server.pid` to port-specific
`~/.zndraw/server-{port}.pid` files
- Remove `--force-new-server` flag (no longer needed with port-based detection)
- Add `find_running_server(port)` function for auto-discovery:
- If port specified, only checks that port
- If port is None, checks default port (5000) first, then smallest port
- Update all dependent code to use new API:
- cli.py: Use find_running_server for server detection
- zndraw.py: Use find_running_server for auto-discovery
- zndraw-mcp: Use find_running_server for status checks
- tests: Remove --force-new-server, pass port to remove_server_info
This enables running multiple ZnDraw servers simultaneously on different
ports, each with their own PID file for proper lifecycle management.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 Walkthrough

Adds port-aware server discovery and management with per-port PID files under `~/.zndraw`, `DEFAULT_PORT` (5000), new port-scoped `server_manager` APIs (`find_running_server`, `get_server_status`, `shutdown_server`), and updates to the CLI, ZnDraw MCP, and tests to use port semantics.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant ServerMgr as Server Manager
    participant FS as Filesystem\n(~/.zndraw)
    participant Process
    participant Server
    note over CLI,ServerMgr: Auto-discovery (port-aware)
    CLI->>ServerMgr: find_running_server(port=None)
    alt DEFAULT_PORT file exists (server-5000.pid)
        ServerMgr->>FS: read server-5000.pid
        FS-->>ServerMgr: ServerInfo or None
        alt ServerInfo found
            ServerMgr->>Process: is PID alive?
            Process-->>ServerMgr: alive / dead
            alt alive
                ServerMgr-->>CLI: ServerInfo (port, pid, url)
            else dead
                ServerMgr->>FS: remove server-5000.pid
                FS-->>ServerMgr: removed
                ServerMgr->>FS: list_all_pid_files()
                FS-->>ServerMgr: [server-<port>.pid...]
            end
        else no PID
            ServerMgr->>FS: list_all_pid_files()
            FS-->>ServerMgr: [server-<port>.pid...]
            ServerMgr->>Process: check smallest port PID
        end
    else no DEFAULT_PORT file
        ServerMgr->>FS: list_all_pid_files()
        FS-->>ServerMgr: [server-<port>.pid...]
        ServerMgr->>Process: check smallest port PID
    end
    CLI->>Server: connect to server at server_info.port
```

```mermaid
sequenceDiagram
    participant CLI
    participant ServerMgr as Server Manager
    participant Server
    participant FS as Filesystem\n(~/.zndraw)
    participant Process
    note over CLI,ServerMgr: Shutdown flow (port-aware)
    CLI->>ServerMgr: find_running_server(port=specified)
    alt ServerInfo found
        ServerMgr-->>CLI: ServerInfo
        CLI->>ServerMgr: shutdown_server(server_info)
        ServerMgr->>Server: POST /shutdown (include token if present)
        Server-->>ServerMgr: ack / no ack
        ServerMgr->>Process: wait for process exit
        Process-->>ServerMgr: exited / timeout
        alt exited
            ServerMgr->>FS: remove server-{port}.pid
            FS-->>ServerMgr: removed
            ServerMgr-->>CLI: success
        else timeout/failure
            ServerMgr-->>CLI: failure
        end
    else no server
        ServerMgr-->>CLI: no server found
    end
```
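The auto-discovery priority in the first diagram (default port first, then the smallest remaining port) reduces to a simple ordering over PID filenames. This standalone sketch covers only the ordering, not the liveness checks or stale-file cleanup shown in the diagram; the function name is illustrative:

```python
DEFAULT_PORT = 5000


def discovery_order(pid_files: list[str]) -> list[int]:
    """Order ports for probing: DEFAULT_PORT first if present,
    then the remaining ports ascending (smallest first).
    """
    ports = sorted(
        int(name.removeprefix("server-").removesuffix(".pid"))
        for name in pid_files
    )
    if DEFAULT_PORT in ports:
        return [DEFAULT_PORT] + [p for p in ports if p != DEFAULT_PORT]
    return ports
```

Checking the default port first keeps the common single-server case fast, while the ascending fallback makes discovery deterministic when several servers are running.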
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Pull request overview
This PR implements port-based server isolation by replacing a single shared PID file with port-specific PID files, enabling multiple ZnDraw servers to run simultaneously on different ports. The --force-new-server flag is removed as unique ports now naturally ensure new server instances.
- Port-specific PID files replace single shared PID file (e.g., `server-5000.pid` instead of `server.pid`)
- Server discovery logic updated with `find_running_server()`, which checks the default port first, then scans all PID files
- Removed `--force-new-server` flag and updated `--port` flag behavior for better server management
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/zndraw/server_manager.py | Core refactoring to support port-specific PID files; adds find_running_server() for auto-discovery, updates all functions to require port parameter |
| src/zndraw/cli.py | Removes --force-new-server flag, updates server discovery logic to use find_running_server(), improves help text for --port, --status, and --shutdown |
| src/zndraw/zndraw.py | Updates auto-discovery to use new find_running_server() function |
| src/zndraw-mcp/init.py | Updates MCP server integration to use find_running_server() instead of get_server_status() |
| tests/conftest.py | Updates test fixtures to call remove_server_info(port) with port parameter |
| tests/test_zndraw_admin.py | Updates test fixtures to remove --force-new-server flag and call remove_server_info(port) |
| tests/test_auto_discovery.py | Updates tests to mock find_running_server() instead of get_server_status() |
```diff
  server_info = find_running_server()
  if server_info is None:
      return (
          False,
-         f"Cannot upload: ZnDraw server is not running. Status: {status_message}",
+         "Cannot upload: ZnDraw server is not running.",
```
The error message "Cannot upload: ZnDraw server is not running. Status: {status_message}" was changed to "Cannot upload: ZnDraw server is not running." While this is more concise, the removal of the status_message information reduces debuggability. If the server status check failed for a specific reason (e.g., server exists but not responsive), that context is lost. Consider retaining useful diagnostic information in error messages.
```python
    file_port = int(port_str)
except (IndexError, ValueError):
    continue
```
The find_running_server function may check the default port twice when it's the only server running. On line 295, it checks get_server_status(DEFAULT_PORT). Then in the loop starting at line 301, if a PID file for the default port exists, it will call get_server_status again for that same port on line 308. This results in redundant process checks and health endpoint requests. Consider tracking which ports have already been checked or skipping the default port in the loop iteration.
Suggested change:

```python
# Skip default port here; it was already checked explicitly above
if file_port == DEFAULT_PORT:
    continue
```
```diff
-def get_current_state() -> tuple[bool, ServerInfo | None, str]:
+def get_current_state() -> tuple[bool, ServerInfo | None]:
     """
     Returns the current status of the ZnDraw server.
```
The function signature changed from returning a tuple of three values (bool, ServerInfo | None, str) to a tuple of two values (bool, ServerInfo | None). However, the function's docstring still says "Returns the current status of the ZnDraw server" without documenting the updated return type. The return type annotation should be added and the docstring should clarify what the boolean represents (whether a server is running).
Suggested change:

```diff
-    Returns the current status of the ZnDraw server.
+    Get the current status of the ZnDraw server.
+
+    Returns:
+        tuple[bool, ServerInfo | None]: A pair ``(is_running, server_info)`` where
+        - ``is_running`` is ``True`` if a ZnDraw server is currently running,
+          otherwise ``False``.
+        - ``server_info`` is the corresponding :class:`ServerInfo` if a server
+          is running, otherwise ``None``.
```
```python
        timeout=5.0,
    )
    return response.status_code == 200
except (requests.RequestException, requests.ConnectionError):
```
Catching both requests.RequestException and requests.ConnectionError is redundant because ConnectionError is a subclass of RequestException. The except clause should only catch requests.RequestException to avoid confusion.
Suggested change:

```diff
-except (requests.RequestException, requests.ConnectionError):
+except requests.RequestException:
```
Codecov Report: ❌ Patch coverage is

```
@@           Coverage Diff            @@
##             main     #832    +/-  ##
=======================================
+ Coverage   78.88%   79.42%   +0.53%
=======================================
  Files         154      156       +2
  Lines       18818    19075     +257
=======================================
+ Hits        14845    15150     +305
+ Misses       3973     3925      -48
```

☔ View full report in Codecov by Sentry.
- Fix redundant exception handling: remove ConnectionError (subclass of RequestException)
- Skip default port in find_running_server loop to avoid double-checking
- Update docstring for get_current_state with proper return type docs
- Improve error messages with actionable guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add unit tests for server_manager.py (22 tests):
  - PID file path generation (port-specific paths)
  - Read/write/remove server info operations
  - list_all_pid_files discovery
  - find_running_server auto-discovery logic
  - get_server_status with stale file cleanup
- Add CLI tests for --status and --shutdown flags (14 tests):
  - Mock-based tests for CLI behavior
  - Port-specific vs auto-discovery behavior
  - Integration tests with real server fixture

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
@coderabbitai help
@coderabbitai review

✅ Actions performed: Review triggered.

@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @tests/test_cli_port_features.py:
- Around line 167-174: The help output is missing the --port option; update the
CLI command definition (the function registered with @app.command() in your CLI
module, e.g., the function that builds the main command where app is defined) to
include a port parameter declared like port: int = typer.Option(..., help="...")
and provide concise help text mentioning auto-discovery behavior; ensure the
parameter name is exactly port so Typer generates the --port flag and that the
help string explains the auto-discovery behavior so the test asserting "--port"
in help passes.
🧹 Nitpick comments (1)
src/zndraw/cli.py (1)
437-447: Consider adding `strict=True` to `zip()` for explicit length checking. While the current code is correct (since `room_names` is derived from `path` with a 1:1 mapping), adding `strict=True` would make the length invariant explicit and catch potential bugs if the implementation changes in the future.

🛡️ Defensive improvement

```diff
- for p, room_name in zip(path, room_names):
+ for p, room_name in zip(path, room_names, strict=True):
```

Apply the same change to other `zip()` calls in the file (lines 370, 533).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `src/zndraw-mcp/__init__.py`
- `src/zndraw/cli.py`
- `src/zndraw/server_manager.py`
- `src/zndraw/zndraw.py`
- `tests/conftest.py`
- `tests/test_auto_discovery.py`
- `tests/test_cli_port_features.py`
- `tests/test_server_manager.py`
- `tests/test_zndraw_admin.py`
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import `typing as t` if necessary, but use `list[int|float] | None` instead of `t.Optional[t.List[int|float]]`
Imports should always be at the top of the file
Files:
`tests/conftest.py`, `src/zndraw-mcp/__init__.py`, `tests/test_auto_discovery.py`, `tests/test_zndraw_admin.py`, `src/zndraw/zndraw.py`, `tests/test_cli_port_features.py`, `src/zndraw/server_manager.py`, `tests/test_server_manager.py`, `src/zndraw/cli.py`
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Use `pytest.mark.parametrize` to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
`tests/test_auto_discovery.py`, `tests/test_zndraw_admin.py`, `tests/test_cli_port_features.py`, `tests/test_server_manager.py`
🧬 Code graph analysis (8)
tests/conftest.py (1)
src/zndraw/server_manager.py (1)
remove_server_info(120-131)
src/zndraw-mcp/__init__.py (1)
src/zndraw/server_manager.py (2)
`ServerInfo` (18-46), `find_running_server` (273-320)
tests/test_auto_discovery.py (2)
src/zndraw/zndraw.py (1)
`ZnDraw` (383-1836)
tests/conftest.py (1)
server(300-306)
tests/test_zndraw_admin.py (1)
src/zndraw/server_manager.py (1)
remove_server_info(120-131)
src/zndraw/zndraw.py (1)
src/zndraw/server_manager.py (1)
find_running_server(273-320)
tests/test_cli_port_features.py (2)
tests/conftest.py (2)
`app` (179-189), `get_free_port` (155-157)
src/zndraw/server_manager.py (1)
ServerInfo(18-46)
tests/test_server_manager.py (1)
src/zndraw/server_manager.py (9)
`ServerInfo` (18-46), `find_running_server` (273-320), `get_pid_file_path` (63-77), `get_server_status` (236-270), `get_zndraw_dir` (49-60), `list_all_pid_files` (134-144), `read_server_info` (80-103), `remove_server_info` (120-131), `write_server_info` (106-117)
src/zndraw/cli.py (1)
src/zndraw/server_manager.py (3)
`ServerInfo` (18-46), `find_running_server` (273-320), `remove_server_info` (120-131)
🪛 GitHub Actions: tests
tests/test_cli_port_features.py
[error] 173-173: test_port_help_text failed: the help output does not contain the string "--port" as expected.
🪛 Ruff (0.14.10)
tests/test_cli_port_features.py
193-193: Unused function argument: server
(ARG001)
tests/test_server_manager.py
39-39: Unused function argument: mock_zndraw_dir
(ARG001)
51-51: Unused function argument: mock_zndraw_dir
(ARG001)
53-53: Possible hardcoded password assigned to argument: "shutdown_token"
(S106)
61-61: Possible hardcoded password assigned to: "shutdown_token"
(S105)
64-64: Unused function argument: mock_zndraw_dir
(ARG001)
100-100: Unused function argument: mock_zndraw_dir
(ARG001)
109-109: Unused function argument: mock_zndraw_dir
(ARG001)
115-115: Unused function argument: mock_zndraw_dir
(ARG001)
149-149: Unused function argument: mock_zndraw_dir
(ARG001)
155-155: Unused function argument: mock_zndraw_dir
(ARG001)
170-170: Unused function argument: mock_zndraw_dir
(ARG001)
187-187: Unused function argument: mock_zndraw_dir
(ARG001)
194-194: Unused function argument: pid
(ARG001)
198-198: Unused function argument: port
(ARG001)
217-217: Unused function argument: mock_zndraw_dir
(ARG001)
243-243: Unused function argument: mock_zndraw_dir
(ARG001)
269-269: Unused function argument: mock_zndraw_dir
(ARG001)
286-286: Unused function argument: mock_zndraw_dir
(ARG001)
src/zndraw/cli.py
437-437: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (36)
src/zndraw/zndraw.py (1)
36-36: LGTM! Auto-discovery updated to use port-aware API. The changes correctly replace `get_server_status` with `find_running_server()` and construct the URL from `server_info.port`. The error message properly guides users when no server is found.

Also applies to: 453-464
tests/conftest.py (1)
246-246: LGTM! Teardown updated for port-based cleanup. The comment accurately reflects that unique ports ensure fresh servers without requiring `--force-new-server`. The teardown correctly passes the port to `remove_server_info(port)`, aligning with the per-port PID management introduced in this PR.

Also applies to: 296-296
tests/test_zndraw_admin.py (1)
36-36: LGTM! Admin test fixtures updated for port-based management. Comments and teardown logic properly updated to use `remove_server_info(port)`, consistent with the per-port cleanup pattern introduced across the test suite.

Also applies to: 72-72, 93-93, 129-129
tests/test_auto_discovery.py (2)
22-29: LGTM! Test correctly verifies explicit URL bypasses discovery. The patch target and assertions are properly updated to use `find_running_server` instead of the deprecated `get_server_status`.

34-49: LGTM! Tests properly mock the new discovery API. Both no-server and stale-server scenarios correctly mock `find_running_server` returning None, which aligns with the new port-aware discovery behavior where stale PID files are automatically cleaned up.

tests/test_cli_port_features.py (3)
19-76: LGTM! Status tests are comprehensive and well-structured. The tests properly cover all scenarios: no server, running server, port-specific checks, and auto-discovery behavior. Mocking is correctly applied and assertions verify both exit codes and output messages.

83-160: LGTM! Shutdown tests comprehensively cover all scenarios. The tests properly verify shutdown behavior across multiple scenarios including success, failure, port-specific operations, and auto-discovery. Mock assertions ensure the correct functions are called with expected arguments.

181-201: LGTM! Integration tests verify real server behavior. The tests properly verify status checks against a real running server, including checking both the correct port and an incorrect port.

Note: The static analysis warning about the unused `server` argument on line 193 is a false positive. The `server` fixture is intentionally used to ensure a server is running on one port while the test verifies that checking a different port correctly reports no server. This validates that port-specific checks work as intended.

src/zndraw-mcp/__init__.py (3)
53-64: LGTM! Clean migration to port-aware discovery. The function correctly uses `find_running_server()` and derives the running state from the returned `ServerInfo`. The simplified return signature (removing the status string) aligns with the PR's goal of using port-aware discovery.

71-84: LGTM! Error handling is clear and concise. The migration to `find_running_server()` is correct, and the error message clearly communicates the issue to the user.

88-149: LGTM! Consistent with the port-aware API migration. The function correctly checks for a running server using `find_running_server()` and provides a clear error message when none is found.

tests/test_server_manager.py (7)
23-27: LGTM! Proper use of pytest fixtures for isolation. The fixture correctly uses `monkeypatch` to override the directory path, ensuring tests are isolated and don't affect the user's actual `~/.zndraw` directory.

33-45: LGTM! Clear and focused tests. These tests properly validate that different ports produce different PID file paths, which is critical for the port-based isolation feature.

51-103: LGTM! Comprehensive coverage of file operations. The tests thoroughly cover the happy path, error cases, and edge cases for server info persistence. The error handling tests ensure robustness against invalid data.

109-143: LGTM! Tests validate sorting and filtering. The tests correctly verify that the function returns sorted results and properly filters non-PID files, which is important for reliable server discovery.

149-237: LGTM! Thorough testing of discovery logic. These tests comprehensively validate the server discovery behavior, including the priority order (default port first, then smallest non-default port) and automatic cleanup of stale PID files. The use of mocking is appropriate for isolating the discovery logic.

243-299: LGTM! Comprehensive status checking tests. The tests properly validate different server states, including the important distinction between a process running but the server being unresponsive. The automatic cleanup of stale PID files is correctly tested.

305-307: LGTM! Documents the expected default port. While simple, this test documents the default port value and would catch unintentional changes.
src/zndraw/server_manager.py (9)
14-14: LGTM! Well-chosen default port. Port 5000 is a common default for development servers and is well-documented across the codebase.

49-60: LGTM! Standard config directory pattern. The function correctly creates the directory if needed and uses `exist_ok=True` for idempotency.

63-77: LGTM! Port-specific file naming ensures isolation. The `server-{port}.pid` naming scheme allows multiple servers to run simultaneously without conflicts.

80-103: LGTM! Robust error handling. The function correctly handles all potential JSON parsing and deserialization errors, including `TypeError` for structural mismatches.

106-131: LGTM! Consistent port-aware file operations. Both functions correctly use the port to identify the target file, maintaining consistency across the API.

134-144: LGTM! Sorted results enable deterministic discovery. The sorted return ensures consistent behavior when multiple servers are running, which is critical for the "smallest port" fallback logic in `find_running_server()`.

236-270: LGTM! Port-aware status checking with automatic cleanup. The function correctly reports port-specific status and automatically cleans up stale PID files, which is important for maintaining accurate state.
273-320: LGTM! Well-designed discovery algorithm. The function implements a sensible discovery strategy:
- If a specific port is requested, check only that port
- Otherwise, check the default port first (most common case)
- Fall back to the smallest non-default port
The port extraction from filenames (line 303) is correct and protected by error handling. Automatic cleanup of stale PID files during discovery helps maintain system hygiene.
323-359: LGTM! Graceful shutdown with cleanup. The function correctly sends a shutdown request with optional token authentication, polls for process exit, and cleans up the PID file on success. The timeout logic (5 seconds) is reasonable, and not cleaning up the PID file on timeout is the right choice since the process may still be running.
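The polling described here (25 attempts at 0.2 s, roughly the 5-second timeout) can be sketched as a generic wait loop. The helper name and the callback interface are assumptions for illustration:

```python
import time


def wait_for_exit(is_alive, attempts: int = 25, interval: float = 0.2) -> bool:
    """Poll until the process reports dead; 25 attempts * 0.2 s ~= 5 s.

    `is_alive` is any zero-argument callable returning bool; in the real
    code this would wrap a PID liveness check.
    """
    for _ in range(attempts):
        if not is_alive():
            return True  # process exited within the timeout
        time.sleep(interval)
    return False  # timed out; caller should not remove the PID file
```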
src/zndraw/cli.py (9)
17-24: LGTM! Clean API migration. The CLI now uses the simplified `find_running_server()` API instead of lower-level server status functions, which makes the code cleaner and more maintainable.

174-179: LGTM! Improved user documentation. The expanded help text clearly explains the port-driven behavior and auto-discovery logic, which will help users understand how to use the CLI effectively.

223-232: LGTM! Consistent help documentation. The help text for status and shutdown options correctly references the port parameter and auto-discovery behavior, maintaining consistency with the `--port` option documentation.

259-270: LGTM! Clear documentation of CLI behavior. The docstring accurately describes the port-aware discovery behavior and provides helpful guidance for users.

297-312: LGTM! Clear status reporting. The status command correctly uses `find_running_server(port)` and provides appropriate messages based on whether a specific port was requested or auto-discovery was used.

315-333: LGTM! Consistent shutdown flow. The shutdown command correctly discovers the server and provides appropriate feedback, with port-aware messaging that matches the status command.

395-459: LGTM! Consistent port-aware connection logic. The server discovery and connection flow correctly uses `find_running_server()` and consistently constructs URLs using `server_info.port` throughout. The version compatibility warning is a nice touch.

469-527: LGTM! Clear startup flow with port awareness. The new server startup logic correctly determines the effective port (explicit or default), creates the server info, and provides informative logging with PID and port details.

556-572: LGTM! Proper cleanup on shutdown. The cleanup correctly calls `remove_server_info(config.server_port)` with the port parameter, ensuring the correct PID file is removed when the server stops.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/zndraw/cli.py (1)
438-438: Consider adding `strict=True` to `zip()` for explicitness. While the code is safe (since `room_names = get_room_names(path)` always returns a list of the same length as `path`), adding `strict=True` would make the invariant explicit and catch any future refactoring mistakes.

♻️ Proposed fix

```diff
- for p, room_name in zip(path, room_names):
+ for p, room_name in zip(path, room_names, strict=True):
```

Note: Apply the same change to lines 371, 438, and 534 for consistency.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `AGENTS.md`
- `src/zndraw/cli.py`
- `src/zndraw/server_manager.py`
✅ Files skipped from review due to trivial changes (1)
- AGENTS.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import `typing as t` if necessary, but use `list[int|float] | None` instead of `t.Optional[t.List[int|float]]`
Imports should always be at the top of the file
Files:
`src/zndraw/cli.py`, `src/zndraw/server_manager.py`
🧬 Code graph analysis (1)
src/zndraw/cli.py (1)
src/zndraw/server_manager.py (5)
`ServerInfo` (18-46), `find_running_server` (273-320), `shutdown_server` (323-359), `write_server_info` (106-117), `remove_server_info` (120-131)
🪛 Ruff (0.14.10)
src/zndraw/cli.py
438-438: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest (3.11, ubuntu-latest)
- GitHub Check: pytest (3.13, ubuntu-latest)
- GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (14)
src/zndraw/cli.py (7)
18-24: LGTM! Imports align with new port-aware API. The import changes correctly bring in `DEFAULT_PORT` and `find_running_server` from the updated `server_manager` module, supporting the port-based server isolation feature.

174-180: LGTM! Clear and comprehensive port option help text. The help text effectively explains the three behaviors:

- Connect to existing server on specified port
- Start new server on specified port if none exists
- Auto-discover when port is not specified (default port first, then smallest)

298-313: LGTM! Status command correctly uses port-aware discovery. The status command properly handles both explicit port and auto-discovery cases, with clear messaging that distinguishes between port-specific and general discovery failures.

316-334: LGTM! Shutdown command correctly uses port-aware discovery. The shutdown flow properly discovers the target server (port-specific or auto-discovery) and includes port information in all user-facing messages.

396-461: LGTM! Server discovery and connection logic is correct. The flow properly:

- Uses `find_running_server(port)` for discovery
- Checks version compatibility and warns on mismatch
- Handles file uploads to existing server with port-specific URLs
- Opens browser at correct port-based URLs

The version compatibility warning is helpful for preventing subtle bugs when CLI and server versions diverge.
470-528: LGTM! Server startup logic correctly uses effective port. The startup flow properly:
- Determines effective port (explicit or DEFAULT_PORT)
- Passes effective port to config builder
- Creates and writes ServerInfo with correct port
- Includes port in user-facing messages
572-572: LGTM! Cleanup correctly uses port-aware removal. The cleanup in the finally block properly removes the port-specific PID file using `remove_server_info(config.server_port)`.

src/zndraw/server_manager.py (7)
14-14: LGTM! DEFAULT_PORT constant enables consistent default behavior.Setting the default port to 5000 provides a well-known starting point for auto-discovery and is consistent with common web server conventions.
49-78: LGTM! Filesystem helpers correctly implement port-specific paths.The helpers properly:
- Create
~/.zndrawdirectory on demand- Generate port-specific PID file paths (
server-{port}.pid)- Use consistent naming convention
80-104: LGTM! Defensive error handling improved.Adding
TypeErrorto the exception handling at line 102 is good defensive coding that handles edge cases where the JSON structure might be valid but the data types are incorrect.
106-145: LGTM! Port-aware CRUD operations are correct.The write, remove, and list operations properly handle port-specific PID files:
write_server_infoderives port fromserver_info.portremove_server_infotakes port parameterlist_all_pid_filesreturns sorted list for deterministic scanning
236-271: LGTM! Port-specific status check includes cleanup.

The status function properly:
- Checks port-specific PID file
- Cleans up stale PID files when process is not running
- Validates server responsiveness
- Includes port information in status messages
273-321: LGTM! Auto-discovery logic is well-designed.

The discovery function implements the correct priority:
- If port specified, check only that port
- If None, check DEFAULT_PORT first (fastest path for typical case)
- Scan all other PID files (excluding DEFAULT_PORT to avoid duplicate check)
- Return server with smallest port number (deterministic selection)
The port parsing at lines 303-306 includes proper error handling for malformed filenames.
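The priority order can be illustrated with a small sketch (`is_server_alive` and `known_ports` are stand-ins for the real PID-file scan and responsiveness check; not the actual signature):

```python
DEFAULT_PORT = 5000


def find_running_server(port, known_ports, is_server_alive):
    """Return the port of a live server, or None.

    Priority: an explicitly requested port, then DEFAULT_PORT,
    then the smallest other known port (deterministic selection).
    """
    if port is not None:
        # Explicit port: check only that port
        return port if is_server_alive(port) else None
    if DEFAULT_PORT in known_ports and is_server_alive(DEFAULT_PORT):
        # Fastest path for the typical case
        return DEFAULT_PORT
    # Scan remaining ports in ascending order, skipping DEFAULT_PORT
    for candidate in sorted(p for p in known_ports if p != DEFAULT_PORT):
        if is_server_alive(candidate):
            return candidate
    return None
```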
323-359: LGTM! Shutdown flow includes proper cleanup and polling.

The shutdown function properly:
- Checks process existence before attempting shutdown
- Includes shutdown token in request headers when available
- Polls for process exit (25 attempts × 0.2s = 5s total)
- Cleans up port-specific PID file on successful shutdown
- Handles request exceptions gracefully
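The exit-polling step alone might look like this (a sketch of the 25 attempts × 0.2 s pattern; `process_exists` stands in for the real liveness check):

```python
import time


def poll_for_exit(process_exists, attempts: int = 25, interval: float = 0.2) -> bool:
    """Poll until the process disappears; True if it exited within ~attempts*interval."""
    for _ in range(attempts):
        if not process_exists():
            return True
        time.sleep(interval)
    return False
```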
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @tests/test_cli_port_features.py:
- Around line 196-204: The test function test_status_wrong_port_with_real_server
has an unused fixture parameter server; remove server from the function
signature so it only accepts get_free_port, e.g., change def
test_status_wrong_port_with_real_server(server, get_free_port): to def
test_status_wrong_port_with_real_server(get_free_port): and run tests to ensure
no other references rely on server in that test.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/test_cli_port_features.py
- tests/test_job_errors.py
- tests/test_vis_figures.py
- tests/test_workers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import `typing as t` if necessary, but use `list[int | float] | None` instead of `t.Optional[t.List[int | float]]`
Imports should always be at the top of the file
Files:
- tests/test_job_errors.py
- tests/test_workers.py
- tests/test_vis_figures.py
- tests/test_cli_port_features.py
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Use `pytest.mark.parametrize` to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
- tests/test_job_errors.py
- tests/test_workers.py
- tests/test_vis_figures.py
- tests/test_cli_port_features.py
🧬 Code graph analysis (3)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
- run (1644-1689)
- wait (917-928)

src/zndraw/job.py (2)
- wait (91-119)
- status (55-65)
tests/test_workers.py (2)
src/zndraw/zndraw.py (1)
- wait (917-928)

src/zndraw/job.py (1)
- wait (91-119)
tests/test_cli_port_features.py (2)
tests/conftest.py (2)
- app (179-189)
- get_free_port (155-157)

src/zndraw/server_manager.py (1)
- ServerInfo (18-46)
🪛 Ruff (0.14.10)
tests/test_cli_port_features.py
196-196: Unused function argument: server
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest (3.12, ubuntu-latest)
- GitHub Check: pytest (3.11, ubuntu-latest)
- GitHub Check: pytest (3.13, ubuntu-latest)
🔇 Additional comments (9)
tests/test_workers.py (2)
873-874: Timeout increase addresses CI flakiness.

The timeout increase from 5 to 15 seconds aligns with the failing test mentioned in the PR comments. This is reasonable for concurrent job execution with multiple workers in CI environments.
885-887: LGTM - consistent timeout adjustment.

The 15-second timeout is applied consistently across all three jobs in the concurrent execution scenario.
tests/test_job_errors.py (2)
21-21: Test refactored to use client API.

The removal of the `get_jwt_auth_headers` parameter and the switch to the client API (instead of direct HTTP calls) simplifies the test. This aligns with the Job.wait() pattern used throughout the test suite.
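At its core, a `Job.wait()`-style helper polls an in-memory status until completion or a deadline; a minimal sketch (the `get_status` callback is illustrative, not the actual `Job` API):

```python
import time


def wait_for_status(get_status, target: str = "completed",
                    timeout: float = 10.0, interval: float = 0.05) -> str:
    """Poll get_status() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == target:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job did not reach {target!r} within {timeout}s")
```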
30-40: Improved test reliability with Job.wait().

The refactored test flow using `client.run()` and `job.wait(timeout=10)` addresses the flaky test behavior mentioned in the PR comments where jobs remained in "assigned" status. The in-memory status check (`job.status == "completed"`) is more robust than HTTP polling.

tests/test_vis_figures.py (1)
319-322: More robust wait pattern addresses test failure.

Replacing the fixed 4-second sleep with `job.wait(timeout=30)` addresses the failing test mentioned in PR comments where "Properties2D" was not present in vis.figures. The new approach waits for actual job completion rather than assuming a fixed duration.

tests/test_cli_port_features.py (4)
1-12: Well-structured test file for port-based CLI features.

The test file provides comprehensive coverage of the new port-aware CLI behavior introduced in this PR, including status checks, shutdown operations, and help text verification.
19-76: Thorough coverage of --status flag scenarios.

The tests cover all critical scenarios: no server, server running, explicit port, and auto-discovery. The use of mocking for `find_running_server` enables isolated unit testing of the CLI logic.
83-160: Comprehensive --shutdown flag testing.

The shutdown tests mirror the structure of the status tests, covering both success and failure scenarios with proper exit code verification.
167-177: CI-compatible help text verification.

The `TERM=dumb` and `COLUMNS=200` environment variables ensure consistent Rich output in CI environments where a TTY may not be available. This addresses the test failure mentioned in PR comments.
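The same env-pinning idea in a generic, self-contained form (using `python -c` as a stand-in for the real CLI under test; the helper name is hypothetical):

```python
import os
import subprocess
import sys


def run_cli(argv: list[str]) -> subprocess.CompletedProcess:
    """Invoke a CLI subprocess with deterministic terminal settings for CI."""
    env = {**os.environ, "TERM": "dumb", "COLUMNS": "200"}
    return subprocess.run(argv, capture_output=True, text=True, env=env)


# python -c stands in for the CLI whose help output is being verified
result = run_cli([sys.executable, "-c", "print('usage: zndraw [--port PORT]')"])
```

With Typer/Click's `CliRunner`, the equivalent is passing `env={"TERM": "dumb", "COLUMNS": "200"}` to `invoke()`.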
```python
def test_status_wrong_port_with_real_server(server, get_free_port):
    """--status --port with wrong port shows no server."""
    # Get a different port that's not running
    other_port = get_free_port()

    result = runner.invoke(app, ["--status", "--port", str(other_port)])

    assert result.exit_code == 1
    assert f"No ZnDraw server running on port {other_port}" in result.output
```
Remove unused server fixture parameter.
The server parameter is unused in this test function. The test only needs get_free_port to obtain an alternative port for verification.
🔧 Proposed fix

```diff
-def test_status_wrong_port_with_real_server(server, get_free_port):
+def test_status_wrong_port_with_real_server(get_free_port):
     """--status --port with wrong port shows no server."""
     # Get a different port that's not running
     other_port = get_free_port()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_status_wrong_port_with_real_server(get_free_port):
    """--status --port with wrong port shows no server."""
    # Get a different port that's not running
    other_port = get_free_port()

    result = runner.invoke(app, ["--status", "--port", str(other_port)])

    assert result.exit_code == 1
    assert f"No ZnDraw server running on port {other_port}" in result.output
```
🧰 Tools
🪛 Ruff (0.14.10)
196-196: Unused function argument: server
(ARG001)
🤖 Prompt for AI Agents
In @tests/test_cli_port_features.py around lines 196 - 204, The test function
test_status_wrong_port_with_real_server has an unused fixture parameter server;
remove server from the function signature so it only accepts get_free_port,
e.g., change def test_status_wrong_port_with_real_server(server, get_free_port):
to def test_status_wrong_port_with_real_server(get_free_port): and run tests to
ensure no other references rely on server in that test.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/test_workers.py (2)
863-893: Same timeout issue as test_submit_task_twice_via_vis_run.

This test exhibits the same pattern: increased timeouts from 5s to 15s and an added registration sleep. Given the failure in the previous test with similar logic, this test may also be at risk of flakiness.
The underlying job pickup issue needs to be resolved rather than relying on longer timeouts. Based on coding guidelines, the tests should verify specific behavior quickly and reliably.
895-910: Same timeout pattern - part of systemic issue.

This test follows the same pattern of increased timeouts and added sleeps. As with the other tests, the root cause of job pickup failure should be investigated rather than masking it with longer timeouts.
🤖 Fix all issues with AI agents
In @tests/test_workers.py:
- Line 840: Replace the fixed vis1.socket.sio.sleep(0.5) pauses with a polling
helper that queries the room extension workers endpoint until the test client
appears as an idle worker; create a utility (e.g., wait_for_registration(vis,
extension_name, category, timeout=5.0)) that repeatedly GETs
f"{vis.url}/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers",
checks response.json() for vis.sid in "idleWorkers", sleeps briefly between
polls, and raises on timeout, then call wait_for_registration(vis1,
<extension_name>, <category>) in place of the hard sleep (apply same change for
occurrences at the other lines mentioned).
🧹 Nitpick comments (3)
tests/test_dual_namespace_extensions.py (1)
177-177: Consider replacing arbitrary sleep with explicit synchronization.

The 0.5-second sleep after registration suggests that extension registration is asynchronous without a proper completion signal. This can lead to flaky tests if the duration is insufficient under varying conditions.
Consider implementing one of the following:
- Add a synchronous confirmation endpoint or mechanism to verify registration completion
- Poll the schema endpoint until the extension appears (with timeout)
- Make `register_extension` block until registration is confirmed

💡 Example: Poll for registration completion

```python
# Instead of arbitrary sleep, poll until registration is confirmed
vis.register_extension(DualExtension, public=False)
vis.register_extension(DualExtension, public=True)

# Poll for registration completion
import time

timeout = 5
start = time.time()
while time.time() - start < timeout:
    response = requests.get(f"{server}/api/rooms/test_room/schema/modifiers")
    if response.status_code == 200:
        schemas = response.json()
        extension_names = [ext.get("name") for ext in schemas]
        if extension_names.count("DualExtension") >= 2:  # Both registrations complete
            break
    vis.socket.sio.sleep(0.1)
else:
    raise TimeoutError("Extension registration did not complete in time")
```

tests/test_vis_figures.py (1)
319-322: LGTM! Consider adding timeout error handling.

The switch to `job.wait(timeout=30)` properly leverages the new asynchronous Job API, and the 30-second timeout is appropriate for this integration test. However, consider wrapping the wait in a try-except to provide clearer test failure messages if the timeout is reached.
♻️ Optional: Add explicit timeout handling

```diff
 job = vis.run(analysis)
 # Wait for celery worker to process the job using Job.wait()
-job.wait(timeout=30)
+try:
+    job.wait(timeout=30)
+except TimeoutError as e:
+    pytest.fail(f"Job failed to complete in time. Status: {job.status}, Error: {e}")
```

tests/test_workers.py (1)
837-893: Optional: Consider splitting complex test scenarios.

While these tests follow pytest best practices with parametrization, they verify multiple scenarios in a single test (e.g., submitting once, twice, then checking both completions). Per coding guidelines, tests should be very specific and test only one thing.

Consider splitting into focused tests:
- test_submit_single_job_via_vis_run
- test_submit_failed_job_via_vis_run
- test_submit_multiple_jobs_queuing

This improves:
- Test failure diagnosis (clearer which scenario failed)
- Test isolation and reliability
- Readability and maintainability
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/test_dual_namespace_extensions.py
- tests/test_job_errors.py
- tests/test_vis_figures.py
- tests/test_workers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import `typing as t` if necessary, but use `list[int | float] | None` instead of `t.Optional[t.List[int | float]]`
Imports should always be at the top of the file
Files:
- tests/test_job_errors.py
- tests/test_vis_figures.py
- tests/test_dual_namespace_extensions.py
- tests/test_workers.py
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Use `pytest.mark.parametrize` to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
- tests/test_job_errors.py
- tests/test_vis_figures.py
- tests/test_dual_namespace_extensions.py
- tests/test_workers.py
🧬 Code graph analysis (4)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
- run (1644-1689)
- wait (917-928)

src/zndraw/job.py (2)
- wait (91-119)
- status (55-65)
tests/test_vis_figures.py (1)
src/zndraw/zndraw.py (2)
- run (1644-1689)
- wait (917-928)
tests/test_dual_namespace_extensions.py (1)
src/zndraw/job.py (1)
- wait (91-119)
tests/test_workers.py (2)
src/zndraw/job.py (2)
- wait (91-119)
- status (55-65)

app/src/types/jobs.ts (1)
- JobStatus (4-9)
🪛 GitHub Actions: tests
tests/test_job_errors.py
[error] 36-36: TimeoutError: Job 0a8b98dd-8969-4ce6-aef3-e3786dd76b0a did not complete within 10 seconds
tests/test_workers.py
[error] 845-845: TimeoutError: Job fb2d6e1c-e0b6-4baf-a897-9373a2bc53d6 did not complete within 15 seconds
🔇 Additional comments (4)
tests/test_dual_namespace_extensions.py (1)
185-185: Investigate root cause of timeout increase.

The timeout was tripled from 5 to 15 seconds. Given that the PR introduces port-based server isolation and there are 2 other failing tests with similar timeout issues (noted in PR comments), this suggests a potential performance regression or race conditions introduced by the changes.
Verify whether:
- The port-based isolation introduces legitimate processing overhead that requires longer waits
- There are race conditions in server discovery/startup that aren't properly synchronized
- 15 seconds is sufficient or if tests could still be flaky under load
If the increased timeout is purely due to timing issues in the test environment rather than actual processing time, consider adding explicit synchronization or event-based waiting instead of relying on longer timeouts.
tests/test_vis_figures.py (1)
363-366: Same pattern as test_properties_2d_analysis.

The implementation correctly uses the new Job API. The same optional suggestion applies: consider adding explicit timeout error handling for clearer test diagnostics.
tests/test_job_errors.py (1)
21-42: Verify whether the test is actually failing in CI and provide error logs.

The review references a TimeoutError with a job stuck in the "assigned" state, but provides no CI logs or error messages as evidence. Before attributing the failure to specific causes like insufficient registration time or auto-pickup issues, confirm:
- The actual CI failure output and error message
- Whether this test consistently fails or is intermittent
- The actual duration of the timeout (verify it's really ~10 seconds)
- Whether other tests with similar registration patterns also fail
Once you provide the CI logs and implementation context, the specific causes (registration timing, race conditions, or auto-pickup mechanism issues) can be properly diagnosed.
tests/test_workers.py (1)
840-860: The review comment is based on incorrect assumptions about the code changes.

The test file is newly created rather than modified, and timeout values were set to 15 seconds from the start, not increased from 5s. Without evidence of actual CI failures or job assignment issues, the claims about systemic problems in the auto-pickup job flow cannot be substantiated. The existing implementation appears sound: the job assignment mechanism properly emits socket events, workers fetch job details via REST API, and status transitions are correctly validated.

Likely an incorrect or invalid review comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @tests/test_job_errors.py:
- Around line 28-29: The test has a race where
vis.register_extension(TestExtension) is followed by only
vis.socket.sio.sleep(0.5), so the worker may not be ready before the job is
submitted; increase the wait to a longer duration (e.g., 2-3 seconds) or replace
the fixed sleep with a polling readiness check that inspects the extension
registration state (e.g., loop until TestExtension appears in vis.extensions or
a vis.socket readiness flag is true) before submitting the job, ensuring
Job.wait() no longer times out.
In @tests/test_workers.py:
- Line 840: The test uses hardcoded vis1.socket.sio.sleep(0.5) to wait for
extension worker registration (and similar sleeps at other places) which is
flaky; implement a polling helper (e.g., wait_for_registration(vis,
extension_name, category, timeout=...)) that repeatedly GETs the
/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers endpoint
via vis.url until response.status_code == 200 and
response.json()["totalWorkers"] > 0 (or timeout), then replace calls to
vis1.socket.sio.sleep(...) after register_extension with calls to
wait_for_registration(vis1, "<extension_name>", "<category>") and apply the same
replacement for other sleeps (tests referencing vis, register_extension, and the
workers endpoint).
🧹 Nitpick comments (2)
tests/test_job_errors.py (2)
7-7: Remove unused import.

The `requests` module is no longer used after refactoring from HTTP POST to the `client.run()` pattern.

♻️ Remove the unused import

```diff
-import requests
-
 from zndraw import ZnDraw
```
34-42: Add proper cleanup on timeout.

If `job.wait()` raises a `TimeoutError`, the disconnect calls won't execute, potentially leaving connections open. Consider using a try/finally block or pytest fixtures to ensure cleanup happens even when assertions fail.

♻️ Add try/finally for reliable cleanup

```diff
 # Submit and wait for job using the Job.wait() pattern
-job = client.run(TestExtension(parameter=42))
-job.wait(timeout=30)
-
-# Check job completed
-assert job.status == "completed"
-
-client.disconnect()
-vis.disconnect()
+try:
+    job = client.run(TestExtension(parameter=42))
+    job.wait(timeout=30)
+
+    # Check job completed
+    assert job.status == "completed"
+finally:
+    client.disconnect()
+    vis.disconnect()
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tests/test_dual_namespace_extensions.py
- tests/test_job_errors.py
- tests/test_workers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_dual_namespace_extensions.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import `typing as t` if necessary, but use `list[int | float] | None` instead of `t.Optional[t.List[int | float]]`
Imports should always be at the top of the file
Files:
- tests/test_job_errors.py
- tests/test_workers.py
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/test_*.py: Use `pytest.mark.parametrize` to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class
Files:
- tests/test_job_errors.py
- tests/test_workers.py
🧬 Code graph analysis (2)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
- run (1644-1689)
- wait (917-928)

src/zndraw/job.py (2)
- wait (91-119)
- status (55-65)
tests/test_workers.py (1)
src/zndraw/job.py (2)
- wait (91-119)
- status (55-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest (3.13, ubuntu-latest)
- GitHub Check: pytest (3.11, ubuntu-latest)
- GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (2)
tests/test_job_errors.py (1)
17-18: Verify that the empty `run()` method is intentional.

The `run()` method currently does nothing, which means the test only verifies that the job framework completes without checking whether the extension logic actually executes. If this is intentional for testing the framework itself, consider adding a comment to clarify. Otherwise, consider adding minimal work (e.g., setting a flag or modifying state) to verify execution.

tests/test_workers.py (1)
845-845: Investigate why timeout values were increased to 30 seconds instead of addressing root causes.

The timeout increases to 30s at lines 845, 849, 856, 858, 875-876, 887-889, 907, and 909 need investigation to determine whether they represent a lasting solution or a temporary workaround. Without clear documentation of what failures prompted these changes, it's difficult to assess whether:
- The underlying cause (job assignment delays, worker pickup issues, etc.) has been addressed
- These timeouts will cause test suite slowdown or mask performance regressions
- A more targeted fix to the job coordination mechanism would be more appropriate
Please clarify the specific test failures that necessitated increasing timeouts and confirm that the root cause has been identified and addressed.
```python
vis.register_extension(TestExtension)
vis.socket.sio.sleep(0.5)  # Give registration time to complete
```
Address the race condition causing test failures.
This test is currently failing in CI with "job remained in assigned state; Job.wait() timed out after 10s" (per PR comments). The 0.5-second sleep after registration is likely insufficient to ensure the extension is fully registered and ready to pick up jobs. The job is being submitted before the worker is ready to process it.
Consider one of these solutions:
- Increase the sleep duration significantly (e.g., 2-3 seconds)
- Implement a polling check that waits for the extension to be confirmed as registered
- Add an explicit ready/health check endpoint to verify worker readiness
💡 Suggested fix: Increase sleep duration

```diff
 vis.register_extension(TestExtension)
-vis.socket.sio.sleep(0.5)  # Give registration time to complete
+vis.socket.sio.sleep(3.0)  # Give registration time to complete and worker to be ready
```

Note: This is a temporary fix. A more robust solution would be to implement an explicit readiness check.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
vis.register_extension(TestExtension)
vis.socket.sio.sleep(3.0)  # Give registration time to complete and worker to be ready
```
🤖 Prompt for AI Agents
In @tests/test_job_errors.py around lines 28 - 29, The test has a race where
vis.register_extension(TestExtension) is followed by only
vis.socket.sio.sleep(0.5), so the worker may not be ready before the job is
submitted; increase the wait to a longer duration (e.g., 2-3 seconds) or replace
the fixed sleep with a polling readiness check that inspects the extension
registration state (e.g., loop until TestExtension appears in vis.extensions or
a vis.socket readiness flag is true) before submitting the job, ensuring
Job.wait() no longer times out.
```python
def test_submit_task_twice_via_vis_run(server, public):
    vis1 = ZnDraw(url=server, room="testroom", user="testuser", auto_pickup_jobs=True)
    vis1.register_extension(RaiseOnParameterExtension, public=public)
    vis1.socket.sio.sleep(0.5)  # Give registration time to complete
```
Replace hardcoded sleeps with proper synchronization.
The hardcoded sleep(0.5) calls after extension registration are a test anti-pattern that can lead to flaky tests and unnecessarily slow test execution. If registration is now asynchronous due to port-based server isolation, the tests should poll for registration completion rather than rely on arbitrary delays.
Consider implementing a helper that polls the /api/rooms/{room}/extensions/{category}/{extension}/workers endpoint until the worker appears, or expose a registration completion event through the socket connection.
♻️ Suggested polling-based approach
```python
def wait_for_registration(vis, extension_name, category, timeout=5.0):
    """Wait for extension to be registered with an available worker."""
    import time
    start = time.time()
    while time.time() - start < timeout:
        response = requests.get(
            f"{vis.url}/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers"
        )
        if response.status_code == 200 and response.json()["totalWorkers"] > 0:
            return
        time.sleep(0.1)
    raise TimeoutError(f"Extension {extension_name} did not register within {timeout}s")
```

Then replace the sleep calls:
```diff
 vis1.register_extension(RaiseOnParameterExtension, public=public)
-vis1.socket.sio.sleep(0.5)  # Give registration time to complete
+wait_for_registration(vis1, "RaiseOnParameterExtension", "modifiers")
```

Note: The registration sleep is applied inconsistently. Line 823 (test_submit_task_via_vis_run) also calls register_extension but has no sleep, suggesting these changes were made reactively rather than systematically.
Also applies to: 868-868, 900-900
🤖 Prompt for AI Agents
In @tests/test_workers.py at line 840, The test uses hardcoded
vis1.socket.sio.sleep(0.5) to wait for extension worker registration (and
similar sleeps at other places) which is flaky; implement a polling helper
(e.g., wait_for_registration(vis, extension_name, category, timeout=...)) that
repeatedly GETs the
/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers endpoint
via vis.url until response.status_code == 200 and
response.json()["totalWorkers"] > 0 (or timeout), then replace calls to
vis1.socket.sio.sleep(...) after register_extension with calls to
wait_for_registration(vis1, "<extension_name>", "<category>") and apply the same
replacement for other sleeps (tests referencing vis, register_extension, and the
workers endpoint).
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests