
Feat/port based server isolation #832

Merged
PythonFZ merged 22 commits into main from feat/port-based-server-isolation
Jan 8, 2026

Conversation

@PythonFZ
Member

@PythonFZ PythonFZ commented Jan 7, 2026

Summary by CodeRabbit

  • New Features

    • Port-aware server discovery with a default port, per-port server tracking, and CLI that shows PID/port and connects to specified ports.
  • Bug Fixes

    • Simplified, consistent error/status messages and improved cleanup of stale per-port server state.
  • Chores

    • Removed force-new-server CLI flag; prefer explicit port selection.
  • Tests

    • Added comprehensive port/CLI and server-manager tests; updated tests for async jobs and increased timeouts.


PythonFZ and others added 10 commits January 7, 2026 19:29
BREAKING CHANGES:
- Room creation is now separate from joining:
  - POST /api/rooms creates a new room (returns 201)
  - POST /api/rooms/{id}/join joins existing rooms only (returns 404 if not exists)
- Password hashing upgraded from SHA-256 to Argon2id (existing passwords invalidated)
- JWT tokens now have 7-day expiration

Key changes:
- Add ensure_user_exists() to UserService for idempotent user creation
- Users are created on socket connect from JWT claims, not during login
- JWT is now the single source of truth for identity
- Simplified login endpoint - guests get JWT without Redis entry until socket connect
- Added argon2-cffi dependency for secure password hashing
- Frontend: Removed redundant role state, use getRoleFromToken() only
- Updated all tests to use create-then-join pattern
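The Argon2id upgrade above can be sketched with argon2-cffi's default hasher. The helper names here are illustrative stand-ins, not ZnDraw's actual code; argon2-cffi's `PasswordHasher` defaults to the Argon2id variant:

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()  # argon2-cffi defaults to Argon2id

def hash_password(password: str) -> str:
    # Returns an encoded hash string including salt and parameters
    return ph.hash(password)

def verify_password(stored_hash: str, password: str) -> bool:
    try:
        return ph.verify(stored_hash, password)
    except VerifyMismatchError:
        return False

h = hash_password("at-least-8-chars")
ok = verify_password(h, "at-least-8-chars")
bad = verify_password(h, "wrong-password")
```

Because the encoded format embeds its own salt and parameters, pre-Argon2 SHA-256 hashes cannot be verified against it, which is why existing passwords are invalidated.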

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This refactoring enforces strict single responsibility per endpoint:
- /api/user/register: ONLY place for user creation (guest or registered)
- /api/login: ONLY authenticates users (never creates)
- Socket connect and room join validate user exists

Key changes:
- Add password validation (minimum 8 characters)
- Add get_authoritative_role() for Redis-based role checks
- Add is_admin_username() to AdminService for proper error messages
- Replace deprecated datetime.utcnow() with time utilities
- Remove redundant USERNAME_KEY from frontend localStorage
- Update Python client to use register-then-login flow

All users must now register before login. Guest flow:
1. POST /api/user/register (creates user)
2. POST /api/login (authenticates, returns JWT)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security fixes:
- Unify all auth error messages to "Authentication failed" to prevent
  username enumeration attacks
- Add username validation (1-64 chars, alphanumeric + underscore/hyphen)
  to prevent path injection, XSS, and DoS attacks
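A sketch of that username rule (1-64 characters, alphanumeric plus underscore/hyphen); the exact character class and pattern in ZnDraw may differ:

```python
import re

# 1 required char + up to 63 more = 1-64 chars total
USERNAME_RE = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_-]{0,63}")

def is_valid_username(name: str) -> bool:
    # fullmatch anchors the pattern to the whole string
    return USERNAME_RE.fullmatch(name) is not None

assert is_valid_username("john-doe")
assert not is_valid_username("John Doe")  # space rejected
assert not is_valid_username("")          # too short
assert not is_valid_username("a" * 65)    # too long
assert not is_valid_username("../etc")    # path-injection attempt
```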

Bug fixes:
- Fix 409 handling in frontend auth.ts - now properly handles case where
  registerData.userName is undefined on conflict response

Code cleanup:
- Remove unused get_authoritative_role() function (YAGNI)
- Update get_current_user_role() docstring to reference AdminService

Test updates:
- Update tests to expect generic "Authentication failed" message
- Fix test usernames to comply with new validation (John Doe → john-doe)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Upgrade argon2-cffi from >=23.1.0 to >=25.1.0 (current stable)
- Use datetime.timezone.utc instead of datetime.UTC for Python 3.11.x compatibility
- Replace deprecated datetime.utcnow() with utc_now() utility in tests
- Add clarifying comment for username regex pattern (1 + 0-63 = 1-64 chars)
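A sketch of the `utc_now()` replacement for the deprecated `datetime.utcnow()`; `datetime.timezone.utc` works across supported Python versions, whereas the `datetime.UTC` alias is newer, as the commit notes:

```python
from datetime import datetime, timezone

def utc_now() -> datetime:
    # Unlike utcnow(), this returns a timezone-aware datetime
    return datetime.now(timezone.utc)

now = utc_now()
```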

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Sanitize worker_id to valid username by replacing ':' with '-'
  (celery:{uuid} format violates username pattern)
- Remove unused 'public' parameter from _get_extension_class (YAGNI)
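The worker_id sanitization is a one-liner; `celery:{uuid}` contains `:`, which the username pattern rejects, so it is replaced with `-` (function name here is illustrative):

```python
def sanitize_worker_id(worker_id: str) -> str:
    """Make a celery worker id fit the username pattern."""
    return worker_id.replace(":", "-")

sanitized = sanitize_worker_id("celery:4f9c2b")
```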

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move PasswordValidationError import to module level
- Remove redundant exception object from log.exception calls
- Add specific error message for 409 Conflict in room creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change PID file from single `~/.zndraw/server.pid` to port-specific
  `~/.zndraw/server-{port}.pid` files
- Remove `--force-new-server` flag (no longer needed with port-based detection)
- Add `find_running_server(port)` function for auto-discovery:
  - If port specified, only checks that port
  - If port is None, checks default port (5000) first, then smallest port
- Update all dependent code to use new API:
  - cli.py: Use find_running_server for server detection
  - zndraw.py: Use find_running_server for auto-discovery
  - zndraw-mcp: Use find_running_server for status checks
  - tests: Remove --force-new-server, pass port to remove_server_info

This enables running multiple ZnDraw servers simultaneously on different
ports, each with their own PID file for proper lifecycle management.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Adds port-aware server discovery and management with per-port PID files under ~/.zndraw, DEFAULT_PORT (5000), new port-scoped server_manager APIs (find_running_server, get_server_status, shutdown_server), and updates to CLI, ZnDraw MCP, and tests to use port semantics.

Changes

Cohort / File(s) Summary
Server manager (port-aware core)
src/zndraw/server_manager.py
Adds DEFAULT_PORT (5000); per-port PID/file helpers (get_zndraw_dir(), get_pid_file_path(port), read_server_info(port), write_server_info(), remove_server_info(port), list_all_pid_files()); implements get_server_status(port), find_running_server(port=None), and shutdown_server(server_info); refactors status/shutdown to be port-scoped and improves JSON/error handling.
CLI and high-level callers
src/zndraw/cli.py, src/zndraw/zndraw.py
Replace get_server_status usages with find_running_server; import/use DEFAULT_PORT; remove --force-new-server; make status/startup/shutdown flows, URL construction, server-info read/write, and cleanup port-aware; update runtime/help messages to include port and PID.
ZnDraw MCP integration
src/zndraw-mcp/__init__.py
Change get_current_state() signature to return `tuple[bool, ServerInfo | None]`, dropping the status string.
Tests — startup, mocks, teardown updates
tests/conftest.py, tests/test_auto_discovery.py, tests/test_zndraw_admin.py
Replace mocks/assertions targeting get_server_status with find_running_server; remove --force-new-server from server startup commands; update cleanup calls to remove_server_info(port) and adjust test comments/fixtures for per-port startup.
Tests — new port-focused coverage
tests/test_cli_port_features.py, tests/test_server_manager.py
Add extensive CLI port tests (status, shutdown, explicit/auto port behavior, help) and unit/integration tests for server_manager (per-port PID lifecycle, read/write/remove, stale-PID cleanup, DEFAULT_PORT semantics, discovery logic).
Tests — job/run behavior & timeouts
tests/test_job_errors.py, tests/test_vis_figures.py, tests/test_workers.py, tests/test_dual_namespace_extensions.py
Update flows to use Job objects (job.wait(timeout=30)); remove some auth header usage; add short registration sleeps after extension registration; increase various wait timeouts from 5s to 30s.
Misc / Docs / Policies
AGENTS.md, various tests
Minor textual policy addition and small test signature/teardown updates reflecting API signature changes.

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant ServerMgr as Server Manager
    participant FS as Filesystem\n(~/.zndraw)
    participant Process
    participant Server

    note over CLI,ServerMgr: Auto-discovery (port-aware)
    CLI->>ServerMgr: find_running_server(port=None)
    alt DEFAULT_PORT file exists (server-5000.pid)
        ServerMgr->>FS: read server-5000.pid
        FS-->>ServerMgr: ServerInfo or None
        alt ServerInfo found
            ServerMgr->>Process: is PID alive?
            Process-->>ServerMgr: alive / dead
            alt alive
                ServerMgr-->>CLI: ServerInfo (port, pid, url)
            else dead
                ServerMgr->>FS: remove server-5000.pid
                FS-->>ServerMgr: removed
                ServerMgr->>FS: list_all_pid_files()
                FS-->>ServerMgr: [server-<port>.pid...]
            end
        else no PID
            ServerMgr->>FS: list_all_pid_files()
            FS-->>ServerMgr: [server-<port>.pid...]
            ServerMgr->>Process: check smallest port PID
        end
    else no DEFAULT_PORT file
        ServerMgr->>FS: list_all_pid_files()
        FS-->>ServerMgr: [server-<port>.pid...]
        ServerMgr->>Process: check smallest port PID
    end
    CLI->>Server: connect to server at server_info.port
sequenceDiagram
    participant CLI
    participant ServerMgr as Server Manager
    participant Server
    participant FS as Filesystem\n(~/.zndraw)
    participant Process

    note over CLI,ServerMgr: Shutdown flow (port-aware)
    CLI->>ServerMgr: find_running_server(port=specified)
    alt ServerInfo found
        ServerMgr-->>CLI: ServerInfo
        CLI->>ServerMgr: shutdown_server(server_info)
        ServerMgr->>Server: POST /shutdown (include token if present)
        Server-->>ServerMgr: ack / no ack
        ServerMgr->>Process: wait for process exit
        Process-->>ServerMgr: exited / timeout
        alt exited
            ServerMgr->>FS: remove server-{port}.pid
            FS-->>ServerMgr: removed
            ServerMgr-->>CLI: success
        else timeout/failure
            ServerMgr-->>CLI: failure
        end
    else no server
        ServerMgr-->>CLI: no server found
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through ports with a curious stare,

PID files snug in ~/.zndraw's care,
No more force-new, just pick a port,
Auto-find servers, report by report,
A tiny carrot cheer for the cleaner affair!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feat/port based server isolation' clearly and specifically describes the main change: implementing port-based server isolation for ZnDraw, which is the core feature added across server_manager.py, cli.py, and related modules.
Docstring Coverage ✅ Passed Docstring coverage is 90.28% which is sufficient. The required threshold is 80.00%.



@PythonFZ PythonFZ changed the base branch from main to feat/auth-refactoring January 7, 2026 22:19
@PythonFZ PythonFZ requested a review from Copilot January 7, 2026 22:19

Copilot AI left a comment


Pull request overview

This PR implements port-based server isolation by replacing a single shared PID file with port-specific PID files, enabling multiple ZnDraw servers to run simultaneously on different ports. The --force-new-server flag is removed as unique ports now naturally ensure new server instances.

  • Port-specific PID files replace single shared PID file (e.g., server-5000.pid instead of server.pid)
  • Server discovery logic updated with find_running_server() that checks default port first, then scans all PID files
  • Removed --force-new-server flag and updated --port flag behavior for better server management

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/zndraw/server_manager.py Core refactoring to support port-specific PID files; adds find_running_server() for auto-discovery, updates all functions to require port parameter
src/zndraw/cli.py Removes --force-new-server flag, updates server discovery logic to use find_running_server(), improves help text for --port, --status, and --shutdown
src/zndraw/zndraw.py Updates auto-discovery to use new find_running_server() function
src/zndraw-mcp/__init__.py Updates MCP server integration to use find_running_server() instead of get_server_status()
tests/conftest.py Updates test fixtures to call remove_server_info(port) with port parameter
tests/test_zndraw_admin.py Updates test fixtures to remove --force-new-server flag and call remove_server_info(port)
tests/test_auto_discovery.py Updates tests to mock find_running_server() instead of get_server_status()


Comment thread src/zndraw-mcp/__init__.py Outdated
Comment on lines +67 to +71
server_info = find_running_server()
if server_info is None:
return (
False,
f"Cannot upload: ZnDraw server is not running. Status: {status_message}",
"Cannot upload: ZnDraw server is not running.",

Copilot AI Jan 7, 2026


The error message "Cannot upload: ZnDraw server is not running. Status: {status_message}" was changed to "Cannot upload: ZnDraw server is not running." While this is more concise, the removal of the status_message information reduces debuggability. If the server status check failed for a specific reason (e.g., server exists but not responsive), that context is lost. Consider retaining useful diagnostic information in error messages.

file_port = int(port_str)
except (IndexError, ValueError):
continue


Copilot AI Jan 7, 2026


The find_running_server function may check the default port twice when it's the only server running. On line 295, it checks get_server_status(DEFAULT_PORT). Then in the loop starting at line 301, if a PID file for the default port exists, it will call get_server_status again for that same port on line 308. This results in redundant process checks and health endpoint requests. Consider tracking which ports have already been checked or skipping the default port in the loop iteration.

Suggested change
# Skip default port here; it was already checked explicitly above
if file_port == DEFAULT_PORT:
continue

Comment thread src/zndraw-mcp/__init__.py Outdated
def get_current_state() -> tuple[bool, ServerInfo | None, str]:
def get_current_state() -> tuple[bool, ServerInfo | None]:
"""
Returns the current status of the ZnDraw server.

Copilot AI Jan 7, 2026


The function signature changed from returning a tuple of three values (bool, ServerInfo | None, str) to a tuple of two values (bool, ServerInfo | None). However, the function's docstring still says "Returns the current status of the ZnDraw server" without documenting the updated return type. The return type annotation should be added and the docstring should clarify what the boolean represents (whether a server is running).

Suggested change
Returns the current status of the ZnDraw server.
Get the current status of the ZnDraw server.
Returns:
tuple[bool, ServerInfo | None]: A pair ``(is_running, server_info)`` where
- ``is_running`` is ``True`` if a ZnDraw server is currently running,
otherwise ``False``.
- ``server_info`` is the corresponding :class:`ServerInfo` if a server
is running, otherwise ``None``.

Comment thread src/zndraw/server_manager.py Outdated
timeout=5.0,
)
return response.status_code == 200
except (requests.RequestException, requests.ConnectionError):

Copilot AI Jan 7, 2026


Catching both requests.RequestException and requests.ConnectionError is redundant because ConnectionError is a subclass of RequestException. The except clause should only catch requests.RequestException to avoid confusion.

Suggested change
except (requests.RequestException, requests.ConnectionError):
except requests.RequestException:

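The redundancy Copilot flags holds because exception classes form a hierarchy: `requests.ConnectionError` subclasses `requests.RequestException`, so one `except` clause covers both. The same pattern can be demonstrated with stdlib exceptions, where `ConnectionError` subclasses `OSError`:

```python
# ConnectionError is a subclass of OSError, so listing both is redundant
assert issubclass(ConnectionError, OSError)

def fetch() -> str:
    raise ConnectionError("connection refused")

result = None
try:
    fetch()
except OSError:  # already covers ConnectionError
    result = "handled"
```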
@codecov-commenter

codecov-commenter commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 86.71875% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.42%. Comparing base (1165a1a) to head (bb95004).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/zndraw/cli.py 25.49% 38 Missing ⚠️
src/zndraw/server_manager.py 75.00% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
+ Coverage   78.88%   79.42%   +0.53%     
==========================================
  Files         154      156       +2     
  Lines       18818    19075     +257     
==========================================
+ Hits        14845    15150     +305     
+ Misses       3973     3925      -48     

☔ View full report in Codecov by Sentry.

Base automatically changed from feat/auth-refactoring to main January 8, 2026 09:54
PythonFZ and others added 4 commits January 8, 2026 10:54
- Fix redundant exception handling: remove ConnectionError (subclass of
  RequestException)
- Skip default port in find_running_server loop to avoid double-checking
- Update docstring for get_current_state with proper return type docs
- Improve error messages with actionable guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add unit tests for server_manager.py (22 tests):
  - PID file path generation (port-specific paths)
  - Read/write/remove server info operations
  - list_all_pid_files discovery
  - find_running_server auto-discovery logic
  - get_server_status with stale file cleanup

- Add CLI tests for --status and --shutdown flags (14 tests):
  - Mock-based tests for CLI behavior
  - Port-specific vs auto-discovery behavior
  - Integration tests with real server fixture

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@PythonFZ
Member Author

PythonFZ commented Jan 8, 2026

@coderabbitai help

@coderabbitai

coderabbitai Bot commented Jan 8, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@PythonFZ
Member Author

PythonFZ commented Jan 8, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jan 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@PythonFZ
Member Author

PythonFZ commented Jan 8, 2026

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jan 8, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @tests/test_cli_port_features.py:
- Around line 167-174: The help output is missing the --port option; update the
CLI command definition (the function registered with @app.command() in your CLI
module, e.g., the function that builds the main command where app is defined) to
include a port parameter declared like port: int = typer.Option(..., help="...")
and provide concise help text mentioning auto-discovery behavior; ensure the
parameter name is exactly port so Typer generates the --port flag and that the
help string explains the auto-discovery behavior so the test asserting "--port"
in help passes.
🧹 Nitpick comments (1)
src/zndraw/cli.py (1)

437-447: Consider adding strict=True to zip() for explicit length checking.

While the current code is correct (since room_names is derived from path with a 1:1 mapping), adding strict=True would make the length invariant explicit and catch potential bugs if the implementation changes in the future.

🛡️ Defensive improvement
-            for p, room_name in zip(path, room_names):
+            for p, room_name in zip(path, room_names, strict=True):

Apply the same change to other zip() calls in the file (lines 370, 533).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1165a1a and 85b812a.

📒 Files selected for processing (9)
  • src/zndraw-mcp/__init__.py
  • src/zndraw/cli.py
  • src/zndraw/server_manager.py
  • src/zndraw/zndraw.py
  • tests/conftest.py
  • tests/test_auto_discovery.py
  • tests/test_cli_port_features.py
  • tests/test_server_manager.py
  • tests/test_zndraw_admin.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import typing as t if necessary, but use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file

Files:

  • tests/conftest.py
  • src/zndraw-mcp/__init__.py
  • tests/test_auto_discovery.py
  • tests/test_zndraw_admin.py
  • src/zndraw/zndraw.py
  • tests/test_cli_port_features.py
  • src/zndraw/server_manager.py
  • tests/test_server_manager.py
  • src/zndraw/cli.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_auto_discovery.py
  • tests/test_zndraw_admin.py
  • tests/test_cli_port_features.py
  • tests/test_server_manager.py
🧬 Code graph analysis (8)
tests/conftest.py (1)
src/zndraw/server_manager.py (1)
  • remove_server_info (120-131)
src/zndraw-mcp/__init__.py (1)
src/zndraw/server_manager.py (2)
  • ServerInfo (18-46)
  • find_running_server (273-320)
tests/test_auto_discovery.py (2)
src/zndraw/zndraw.py (1)
  • ZnDraw (383-1836)
tests/conftest.py (1)
  • server (300-306)
tests/test_zndraw_admin.py (1)
src/zndraw/server_manager.py (1)
  • remove_server_info (120-131)
src/zndraw/zndraw.py (1)
src/zndraw/server_manager.py (1)
  • find_running_server (273-320)
tests/test_cli_port_features.py (2)
tests/conftest.py (2)
  • app (179-189)
  • get_free_port (155-157)
src/zndraw/server_manager.py (1)
  • ServerInfo (18-46)
tests/test_server_manager.py (1)
src/zndraw/server_manager.py (9)
  • ServerInfo (18-46)
  • find_running_server (273-320)
  • get_pid_file_path (63-77)
  • get_server_status (236-270)
  • get_zndraw_dir (49-60)
  • list_all_pid_files (134-144)
  • read_server_info (80-103)
  • remove_server_info (120-131)
  • write_server_info (106-117)
src/zndraw/cli.py (1)
src/zndraw/server_manager.py (3)
  • ServerInfo (18-46)
  • find_running_server (273-320)
  • remove_server_info (120-131)
🪛 GitHub Actions: tests
tests/test_cli_port_features.py

[error] 173-173: test_port_help_text failed: the help output does not contain the string "--port" as expected.

🪛 Ruff (0.14.10)
tests/test_cli_port_features.py

193: Unused function argument: server (ARG001)

tests/test_server_manager.py

39, 51, 64, 100, 109, 115, 149, 155, 170, 187, 217, 243, 269, 286: Unused function argument: mock_zndraw_dir (ARG001)
53: Possible hardcoded password assigned to argument: "shutdown_token" (S106)
61: Possible hardcoded password assigned to: "shutdown_token" (S105)
194: Unused function argument: pid (ARG001)
198: Unused function argument: port (ARG001)

src/zndraw/cli.py

437: zip() without an explicit strict= parameter (B905)

🔇 Additional comments (36)
src/zndraw/zndraw.py (1)

36-36: LGTM! Auto-discovery updated to use port-aware API.

The changes correctly replace get_server_status with find_running_server() and construct the URL from server_info.port. The error message properly guides users when no server is found.

Also applies to: 453-464

tests/conftest.py (1)

246-246: LGTM! Teardown updated for port-based cleanup.

The comment accurately reflects that unique ports ensure fresh servers without requiring --force-new-server. The teardown correctly passes the port to remove_server_info(port), aligning with the per-port PID management introduced in this PR.

Also applies to: 296-296

tests/test_zndraw_admin.py (1)

36-36: LGTM! Admin test fixtures updated for port-based management.

Comments and teardown logic properly updated to use remove_server_info(port), consistent with the per-port cleanup pattern introduced across the test suite.

Also applies to: 72-72, 93-93, 129-129

tests/test_auto_discovery.py (2)

22-29: LGTM! Test correctly verifies explicit URL bypasses discovery.

The patch target and assertions are properly updated to use find_running_server instead of the deprecated get_server_status.


34-49: LGTM! Tests properly mock the new discovery API.

Both no-server and stale-server scenarios correctly mock find_running_server returning None, which aligns with the new port-aware discovery behavior where stale PID files are automatically cleaned up.

tests/test_cli_port_features.py (3)

19-76: LGTM! Status tests are comprehensive and well-structured.

The tests properly cover all scenarios: no server, running server, port-specific checks, and auto-discovery behavior. Mocking is correctly applied and assertions verify both exit codes and output messages.


83-160: LGTM! Shutdown tests comprehensively cover all scenarios.

The tests properly verify shutdown behavior across multiple scenarios including success, failure, port-specific operations, and auto-discovery. Mock assertions ensure the correct functions are called with expected arguments.


181-201: LGTM! Integration tests verify real server behavior.

The tests properly verify status checks against a real running server, including checking both the correct port and an incorrect port.

Note: The static analysis warning about unused server argument on line 193 is a false positive. The server fixture is intentionally used to ensure a server is running on one port while the test verifies that checking a different port correctly reports no server. This validates that port-specific checks work as intended.

src/zndraw-mcp/__init__.py (3)

53-64: LGTM! Clean migration to port-aware discovery.

The function correctly uses find_running_server() and derives the running state from the returned ServerInfo. The simplified return signature (removing the status string) aligns with the PR's goal of using port-aware discovery.


71-84: LGTM! Error handling is clear and concise.

The migration to find_running_server() is correct, and the error message clearly communicates the issue to the user.


88-149: LGTM! Consistent with the port-aware API migration.

The function correctly checks for a running server using find_running_server() and provides a clear error message when none is found.

tests/test_server_manager.py (7)

23-27: LGTM! Proper use of pytest fixtures for isolation.

The fixture correctly uses monkeypatch to override the directory path, ensuring tests are isolated and don't affect the user's actual ~/.zndraw directory.


33-45: LGTM! Clear and focused tests.

These tests properly validate that different ports produce different PID file paths, which is critical for the port-based isolation feature.


51-103: LGTM! Comprehensive coverage of file operations.

The tests thoroughly cover the happy path, error cases, and edge cases for server info persistence. The error handling tests ensure robustness against invalid data.


109-143: LGTM! Tests validate sorting and filtering.

The tests correctly verify that the function returns sorted results and properly filters non-PID files, which is important for reliable server discovery.


149-237: LGTM! Thorough testing of discovery logic.

These tests comprehensively validate the server discovery behavior, including the priority order (default port first, then smallest non-default port) and automatic cleanup of stale PID files. The use of mocking is appropriate for isolating the discovery logic.


243-299: LGTM! Comprehensive status checking tests.

The tests properly validate different server states, including the important distinction between a process running but the server being unresponsive. The automatic cleanup of stale PID files is correctly tested.


305-307: LGTM! Documents the expected default port.

While simple, this test documents the default port value and would catch unintentional changes.

src/zndraw/server_manager.py (9)

14-14: LGTM! Well-chosen default port.

Port 5000 is a common default for development servers and is well-documented across the codebase.


49-60: LGTM! Standard config directory pattern.

The function correctly creates the directory if needed and uses exist_ok=True for idempotency.


63-77: LGTM! Port-specific file naming ensures isolation.

The server-{port}.pid naming scheme allows multiple servers to run simultaneously without conflicts.
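The `server-{port}.pid` scheme described above can be sketched as follows. This is a minimal illustration, not the actual `server_manager.py` code: the helper names and the base-directory parameter are assumptions made so the snippet is self-contained and testable.

```python
from pathlib import Path


def get_config_dir(base: Path) -> Path:
    """Return the config directory, creating it on demand (idempotent)."""
    base.mkdir(parents=True, exist_ok=True)
    return base


def get_pid_file_path(port: int, base: Path) -> Path:
    """One PID file per port, so multiple servers never collide."""
    return get_config_dir(base) / f"server-{port}.pid"
```

Because each port maps to a distinct file, two servers on ports 5000 and 5001 read and write independent state, which is the isolation property this PR relies on.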


80-103: LGTM! Robust error handling.

The function correctly handles all potential JSON parsing and deserialization errors, including TypeError for structural mismatches.


106-131: LGTM! Consistent port-aware file operations.

Both functions correctly use the port to identify the target file, maintaining consistency across the API.


134-144: LGTM! Sorted results enable deterministic discovery.

The sorted return ensures consistent behavior when multiple servers are running, which is critical for the "smallest port" fallback logic in find_running_server().


236-270: LGTM! Port-aware status checking with automatic cleanup.

The function correctly reports port-specific status and automatically cleans up stale PID files, which is important for maintaining accurate state.


273-320: LGTM! Well-designed discovery algorithm.

The function implements a sensible discovery strategy:

  1. If a specific port is requested, check only that port
  2. Otherwise, check the default port first (most common case)
  3. Fall back to the smallest non-default port

The port extraction from filenames (line 303) is correct and protected by error handling. Automatic cleanup of stale PID files during discovery helps maintain system hygiene.


323-359: LGTM! Graceful shutdown with cleanup.

The function correctly sends a shutdown request with optional token authentication, polls for process exit, and cleans up the PID file on success. The timeout logic (5 seconds) is reasonable, and not cleaning up the PID file on timeout is the right choice since the process may still be running.
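The poll-for-exit step can be isolated as a tiny helper. The name and the injectable `is_running` callable are assumptions for testability; the real `shutdown_server()` checks the process directly and uses the same 25 × 0.2 s = 5 s budget noted in this review.

```python
import time
from typing import Callable


def poll_until_exit(is_running: Callable[[], bool],
                    attempts: int = 25,
                    interval: float = 0.2) -> bool:
    """Poll until the process is gone; True on exit, False on timeout."""
    for _ in range(attempts):
        if not is_running():
            return True
        time.sleep(interval)
    return False
```

Returning `False` on timeout, rather than deleting state, mirrors the reviewed behavior: the PID file is only cleaned up once the process is confirmed dead.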

src/zndraw/cli.py (9)

17-24: LGTM! Clean API migration.

The CLI now uses the simplified find_running_server() API instead of lower-level server status functions, which makes the code cleaner and more maintainable.


174-179: LGTM! Improved user documentation.

The expanded help text clearly explains the port-driven behavior and auto-discovery logic, which will help users understand how to use the CLI effectively.


223-232: LGTM! Consistent help documentation.

The help text for status and shutdown options correctly references the port parameter and auto-discovery behavior, maintaining consistency with the --port option documentation.


259-270: LGTM! Clear documentation of CLI behavior.

The docstring accurately describes the port-aware discovery behavior and provides helpful guidance for users.


297-312: LGTM! Clear status reporting.

The status command correctly uses find_running_server(port) and provides appropriate messages based on whether a specific port was requested or auto-discovery was used.


315-333: LGTM! Consistent shutdown flow.

The shutdown command correctly discovers the server and provides appropriate feedback, with port-aware messaging that matches the status command.


395-459: LGTM! Consistent port-aware connection logic.

The server discovery and connection flow correctly uses find_running_server() and consistently constructs URLs using server_info.port throughout. The version compatibility warning is a nice touch.


469-527: LGTM! Clear startup flow with port awareness.

The new server startup logic correctly determines the effective port (explicit or default), creates the server info, and provides informative logging with PID and port details.


556-572: LGTM! Proper cleanup on shutdown.

The cleanup correctly calls remove_server_info(config.server_port) with the port parameter, ensuring the correct PID file is removed when the server stops.

Comment thread tests/test_cli_port_features.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/zndraw/cli.py (1)

438-438: Consider adding strict=True to zip() for explicitness.

While the code is safe (since room_names = get_room_names(path) always returns a list of the same length as path), adding strict=True would make the invariant explicit and catch any future refactoring mistakes.

♻️ Proposed fix
-            for p, room_name in zip(path, room_names):
+            for p, room_name in zip(path, room_names, strict=True):

Note: Apply the same change to lines 371, 438, and 534 for consistency.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85b812a and d98b6ce.

📒 Files selected for processing (3)
  • AGENTS.md
  • src/zndraw/cli.py
  • src/zndraw/server_manager.py
✅ Files skipped from review due to trivial changes (1)
  • AGENTS.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import typing as t if necessary, but use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file

Files:

  • src/zndraw/cli.py
  • src/zndraw/server_manager.py
🧬 Code graph analysis (1)
src/zndraw/cli.py (1)
src/zndraw/server_manager.py (5)
  • ServerInfo (18-46)
  • find_running_server (273-320)
  • shutdown_server (323-359)
  • write_server_info (106-117)
  • remove_server_info (120-131)
🪛 Ruff (0.14.10)
src/zndraw/cli.py

438-438: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest (3.11, ubuntu-latest)
  • GitHub Check: pytest (3.13, ubuntu-latest)
  • GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (14)
src/zndraw/cli.py (7)

18-24: LGTM! Imports align with new port-aware API.

The import changes correctly bring in DEFAULT_PORT and find_running_server from the updated server_manager module, supporting the port-based server isolation feature.


174-180: LGTM! Clear and comprehensive port option help text.

The help text effectively explains the three behaviors:

  1. Connect to existing server on specified port
  2. Start new server on specified port if none exists
  3. Auto-discover when port is not specified (default port first, then smallest)

298-313: LGTM! Status command correctly uses port-aware discovery.

The status command properly handles both explicit port and auto-discovery cases, with clear messaging that distinguishes between port-specific and general discovery failures.


316-334: LGTM! Shutdown command correctly uses port-aware discovery.

The shutdown flow properly discovers the target server (port-specific or auto-discovery) and includes port information in all user-facing messages.


396-461: LGTM! Server discovery and connection logic is correct.

The flow properly:

  • Uses find_running_server(port) for discovery
  • Checks version compatibility and warns on mismatch
  • Handles file uploads to existing server with port-specific URLs
  • Opens browser at correct port-based URLs

The version compatibility warning is helpful for preventing subtle bugs when CLI and server versions diverge.


470-528: LGTM! Server startup logic correctly uses effective port.

The startup flow properly:

  • Determines effective port (explicit or DEFAULT_PORT)
  • Passes effective port to config builder
  • Creates and writes ServerInfo with correct port
  • Includes port in user-facing messages

572-572: LGTM! Cleanup correctly uses port-aware removal.

The cleanup in the finally block properly removes the port-specific PID file using remove_server_info(config.server_port).

src/zndraw/server_manager.py (7)

14-14: LGTM! DEFAULT_PORT constant enables consistent default behavior.

Setting the default port to 5000 provides a well-known starting point for auto-discovery and is consistent with common web server conventions.


49-78: LGTM! Filesystem helpers correctly implement port-specific paths.

The helpers properly:

  • Create ~/.zndraw directory on demand
  • Generate port-specific PID file paths (server-{port}.pid)
  • Use consistent naming convention

80-104: LGTM! Defensive error handling improved.

Adding TypeError to the exception handling at line 102 is good defensive coding that handles edge cases where the JSON structure might be valid but the data types are incorrect.


106-145: LGTM! Port-aware CRUD operations are correct.

The write, remove, and list operations properly handle port-specific PID files:

  • write_server_info derives port from server_info.port
  • remove_server_info takes port parameter
  • list_all_pid_files returns sorted list for deterministic scanning

236-271: LGTM! Port-specific status check includes cleanup.

The status function properly:

  • Checks port-specific PID file
  • Cleans up stale PID files when process is not running
  • Validates server responsiveness
  • Includes port information in status messages

273-321: LGTM! Auto-discovery logic is well-designed.

The discovery function implements the correct priority:

  1. If port specified, check only that port
  2. If None, check DEFAULT_PORT first (fastest path for typical case)
  3. Scan all other PID files (excluding DEFAULT_PORT to avoid duplicate check)
  4. Return server with smallest port number (deterministic selection)

The port parsing at lines 303-306 includes proper error handling for malformed filenames.


323-359: LGTM! Shutdown flow includes proper cleanup and polling.

The shutdown function properly:

  • Checks process existence before attempting shutdown
  • Includes shutdown token in request headers when available
  • Polls for process exit (25 attempts × 0.2s = 5s total)
  • Cleans up port-specific PID file on successful shutdown
  • Handles request exceptions gracefully


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @tests/test_cli_port_features.py:
- Around line 196-204: The test function test_status_wrong_port_with_real_server
has an unused fixture parameter server; remove server from the function
signature so it only accepts get_free_port, e.g., change def
test_status_wrong_port_with_real_server(server, get_free_port): to def
test_status_wrong_port_with_real_server(get_free_port): and run tests to ensure
no other references rely on server in that test.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d98b6ce and 70d590d.

📒 Files selected for processing (4)
  • tests/test_cli_port_features.py
  • tests/test_job_errors.py
  • tests/test_vis_figures.py
  • tests/test_workers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import typing as t if necessary, but use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file

Files:

  • tests/test_job_errors.py
  • tests/test_workers.py
  • tests/test_vis_figures.py
  • tests/test_cli_port_features.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_job_errors.py
  • tests/test_workers.py
  • tests/test_vis_figures.py
  • tests/test_cli_port_features.py
🧬 Code graph analysis (3)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
  • run (1644-1689)
  • wait (917-928)
src/zndraw/job.py (2)
  • wait (91-119)
  • status (55-65)
tests/test_workers.py (2)
src/zndraw/zndraw.py (1)
  • wait (917-928)
src/zndraw/job.py (1)
  • wait (91-119)
tests/test_cli_port_features.py (2)
tests/conftest.py (2)
  • app (179-189)
  • get_free_port (155-157)
src/zndraw/server_manager.py (1)
  • ServerInfo (18-46)
🪛 Ruff (0.14.10)
tests/test_cli_port_features.py

196-196: Unused function argument: server

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest (3.12, ubuntu-latest)
  • GitHub Check: pytest (3.11, ubuntu-latest)
  • GitHub Check: pytest (3.13, ubuntu-latest)
🔇 Additional comments (9)
tests/test_workers.py (2)

873-874: Timeout increase addresses CI flakiness.

The timeout increase from 5 to 15 seconds aligns with the failing test mentioned in the PR comments. This is reasonable for concurrent job execution with multiple workers in CI environments.


885-887: LGTM - consistent timeout adjustment.

The 15-second timeout is applied consistently across all three jobs in the concurrent execution scenario.

tests/test_job_errors.py (2)

21-21: Test refactored to use client API.

The removal of get_jwt_auth_headers parameter and switch to using the client API (instead of direct HTTP calls) simplifies the test. This aligns with the Job.wait() pattern used throughout the test suite.


30-40: Improved test reliability with Job.wait().

The refactored test flow using client.run() and job.wait(timeout=10) addresses the flaky test behavior mentioned in the PR comments where jobs remained in "assigned" status. The in-memory status check (job.status == "completed") is more robust than HTTP polling.

tests/test_vis_figures.py (1)

319-322: More robust wait pattern addresses test failure.

Replacing the fixed 4-second sleep with job.wait(timeout=30) addresses the failing test mentioned in PR comments where "Properties2D" was not present in vis.figures. The new approach waits for actual job completion rather than assuming a fixed duration.

tests/test_cli_port_features.py (4)

1-12: Well-structured test file for port-based CLI features.

The test file provides comprehensive coverage of the new port-aware CLI behavior introduced in this PR, including status checks, shutdown operations, and help text verification.


19-76: Thorough coverage of --status flag scenarios.

The tests cover all critical scenarios: no server, server running, explicit port, and auto-discovery. The use of mocking for find_running_server enables isolated unit testing of the CLI logic.


83-160: Comprehensive --shutdown flag testing.

The shutdown tests mirror the structure of status tests, covering both success and failure scenarios with proper exit code verification.


167-177: CI-compatible help text verification.

The TERM=dumb and COLUMNS=200 environment variables ensure consistent Rich output in CI environments where TTY may not be available. This addresses the test failure mentioned in PR comments.

Comment on lines +196 to +204
def test_status_wrong_port_with_real_server(server, get_free_port):
"""--status --port with wrong port shows no server."""
# Get a different port that's not running
other_port = get_free_port()

result = runner.invoke(app, ["--status", "--port", str(other_port)])

assert result.exit_code == 1
assert f"No ZnDraw server running on port {other_port}" in result.output


⚠️ Potential issue | 🟡 Minor

Remove unused server fixture parameter.

The server parameter is unused in this test function. The test only needs get_free_port to obtain an alternative port for verification.

🔧 Proposed fix
-def test_status_wrong_port_with_real_server(server, get_free_port):
+def test_status_wrong_port_with_real_server(get_free_port):
     """--status --port with wrong port shows no server."""
     # Get a different port that's not running
     other_port = get_free_port()

🤖 Prompt for AI Agents
In @tests/test_cli_port_features.py around lines 196 - 204, The test function
test_status_wrong_port_with_real_server has an unused fixture parameter server;
remove server from the function signature so it only accepts get_free_port,
e.g., change def test_status_wrong_port_with_real_server(server, get_free_port):
to def test_status_wrong_port_with_real_server(get_free_port): and run tests to
ensure no other references rely on server in that test.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/test_workers.py (2)

863-893: Same timeout issue as test_submit_task_twice_via_vis_run.

This test exhibits the same pattern: increased timeouts from 5s to 15s and added registration sleep. Given the failure in the previous test with similar logic, this test may also be at risk of flakiness.

The underlying job pickup issue needs to be resolved rather than relying on longer timeouts. Based on coding guidelines, the tests should verify specific behavior quickly and reliably.


895-910: Same timeout pattern - part of systemic issue.

This test follows the same pattern of increased timeouts and added sleeps. As with the other tests, the root cause of job pickup failure should be investigated rather than masking with longer timeouts.

🤖 Fix all issues with AI agents
In @tests/test_workers.py:
- Line 840: Replace the fixed vis1.socket.sio.sleep(0.5) pauses with a polling
helper that queries the room extension workers endpoint until the test client
appears as an idle worker; create a utility (e.g., wait_for_registration(vis,
extension_name, category, timeout=5.0)) that repeatedly GETs
f"{vis.url}/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers",
checks response.json() for vis.sid in "idleWorkers", sleeps briefly between
polls, and raises on timeout, then call wait_for_registration(vis1,
<extension_name>, <category>) in place of the hard sleep (apply same change for
occurrences at the other lines mentioned).
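The polling helper proposed in the fix prompt above could look like the sketch below. The name `wait_for_registration` comes from the prompt itself; the injectable `get_workers` callable is an assumption added here so the helper can be unit-tested without a live server (the real version would issue the `requests.get` against the workers endpoint internally).

```python
import time
from typing import Callable


def wait_for_registration(get_workers: Callable[[], dict],
                          sid: str,
                          timeout: float = 5.0,
                          interval: float = 0.1) -> None:
    """Poll the workers payload until `sid` appears as an idle worker.

    `get_workers` returns the decoded JSON of the
    /api/rooms/{room}/extensions/{category}/{name}/workers endpoint.
    Raises TimeoutError instead of silently continuing like a fixed sleep.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        payload = get_workers()
        if sid in payload.get("idleWorkers", []):
            return
        time.sleep(interval)
    raise TimeoutError(f"worker {sid} did not register within {timeout}s")
```

In a test this replaces `vis1.socket.sio.sleep(0.5)` with a bounded, condition-based wait, which is what removes the flakiness the comment describes.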
🧹 Nitpick comments (3)
tests/test_dual_namespace_extensions.py (1)

177-177: Consider replacing arbitrary sleep with explicit synchronization.

The 0.5-second sleep after registration suggests that extension registration is asynchronous without a proper completion signal. This can lead to flaky tests if the duration is insufficient under varying conditions.

Consider implementing one of the following:

  • Add a synchronous confirmation endpoint or mechanism to verify registration completion
  • Poll the schema endpoint until the extension appears (with timeout)
  • Make register_extension block until registration is confirmed
💡 Example: Poll for registration completion
# Instead of arbitrary sleep, poll until registration is confirmed
vis.register_extension(DualExtension, public=False)
vis.register_extension(DualExtension, public=True)

# Poll for registration completion
import time
timeout = 5
start = time.time()
while time.time() - start < timeout:
    response = requests.get(f"{server}/api/rooms/test_room/schema/modifiers")
    if response.status_code == 200:
        schemas = response.json()
        extension_names = [ext.get("name") for ext in schemas]
        if extension_names.count("DualExtension") >= 2:  # Both registrations complete
            break
    vis.socket.sio.sleep(0.1)
else:
    raise TimeoutError("Extension registration did not complete in time")
tests/test_vis_figures.py (1)

319-322: LGTM! Consider adding timeout error handling.

The switch to job.wait(timeout=30) properly leverages the new asynchronous Job API. The 30-second timeout is appropriate for this integration test.

However, consider wrapping the wait in a try-except to provide clearer test failure messages if the timeout is reached.

♻️ Optional: Add explicit timeout handling
     job = vis.run(analysis)
 
-    # Wait for celery worker to process the job using Job.wait()
-    job.wait(timeout=30)
+    # Wait for celery worker to process the job using Job.wait()
+    try:
+        job.wait(timeout=30)
+    except TimeoutError as e:
+        pytest.fail(f"Job failed to complete in time. Status: {job.status}, Error: {e}")
tests/test_workers.py (1)

837-893: Optional: Consider splitting complex test scenarios.

While these tests follow pytest best practices with parametrization, they verify multiple scenarios in a single test (e.g., submitting once, twice, then checking both completions). Per coding guidelines, tests should be very specific and test only one thing.

Consider splitting into focused tests:

  • test_submit_single_job_via_vis_run
  • test_submit_failed_job_via_vis_run
  • test_submit_multiple_jobs_queuing

This improves:

  1. Test failure diagnosis (clearer which scenario failed)
  2. Test isolation and reliability
  3. Readability and maintainability
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e13256 and 162360e.

📒 Files selected for processing (4)
  • tests/test_dual_namespace_extensions.py
  • tests/test_job_errors.py
  • tests/test_vis_figures.py
  • tests/test_workers.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import typing as t if necessary, but use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file

Files:

  • tests/test_job_errors.py
  • tests/test_vis_figures.py
  • tests/test_dual_namespace_extensions.py
  • tests/test_workers.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_job_errors.py
  • tests/test_vis_figures.py
  • tests/test_dual_namespace_extensions.py
  • tests/test_workers.py
🧬 Code graph analysis (4)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
  • run (1644-1689)
  • wait (917-928)
src/zndraw/job.py (2)
  • wait (91-119)
  • status (55-65)
tests/test_vis_figures.py (1)
src/zndraw/zndraw.py (2)
  • run (1644-1689)
  • wait (917-928)
tests/test_dual_namespace_extensions.py (1)
src/zndraw/job.py (1)
  • wait (91-119)
tests/test_workers.py (2)
src/zndraw/job.py (2)
  • wait (91-119)
  • status (55-65)
app/src/types/jobs.ts (1)
  • JobStatus (4-9)
🪛 GitHub Actions: tests
tests/test_job_errors.py

[error] 36-36: TimeoutError: Job 0a8b98dd-8969-4ce6-aef3-e3786dd76b0a did not complete within 10 seconds

tests/test_workers.py

[error] 845-845: TimeoutError: Job fb2d6e1c-e0b6-4baf-a897-9373a2bc53d6 did not complete within 15 seconds

🔇 Additional comments (4)
tests/test_dual_namespace_extensions.py (1)

185-185: Investigate root cause of timeout increase.

The timeout was tripled from 5 to 15 seconds. Given that the PR introduces port-based server isolation and there are 2 other failing tests with similar timeout issues (noted in PR comments), this suggests potential performance regression or race conditions introduced by the changes.

Verify whether:

  • The port-based isolation introduces legitimate processing overhead that requires longer waits
  • There are race conditions in server discovery/startup that aren't properly synchronized
  • 15 seconds is sufficient or if tests could still be flaky under load

If the increased timeout is purely due to timing issues in the test environment rather than actual processing time, consider adding explicit synchronization or event-based waiting instead of relying on longer timeouts.

tests/test_vis_figures.py (1)

363-366: Same pattern as test_properties_2d_analysis.

The implementation correctly uses the new Job API. Same optional suggestion applies: consider adding explicit timeout error handling for clearer test diagnostics.

tests/test_job_errors.py (1)

21-42: Verify if test is actually failing in CI and provide error logs.

The review references a TimeoutError with a job stuck in "assigned" state, but provides no CI logs or error messages as evidence. Before attributing the failure to specific causes like insufficient registration time or auto-pickup issues, confirm:

  1. The actual CI failure output and error message
  2. Whether this test consistently fails or is intermittent
  3. The actual duration of the timeout (verify it's really ~10 seconds)
  4. Whether other tests with similar registration patterns also fail

Once you provide the CI logs and implementation context, the specific causes (registration timing, race conditions, or auto-pickup mechanism issues) can be properly diagnosed.

tests/test_workers.py (1)

840-860: The review comment is based on incorrect assumptions about the code changes. The test file is newly created rather than modified, and the timeout values were set to 15 seconds from the start, not increased from 5s. Without evidence of actual CI failures or job assignment issues, the claims about systemic problems in the auto-pickup job flow cannot be substantiated. The existing implementation appears sound: the job assignment mechanism properly emits socket events, workers fetch job details via REST API, and status transitions are correctly validated.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @tests/test_job_errors.py:
- Around line 28-29: The test has a race where
vis.register_extension(TestExtension) is followed by only
vis.socket.sio.sleep(0.5), so the worker may not be ready before the job is
submitted; increase the wait to a longer duration (e.g., 2-3 seconds) or replace
the fixed sleep with a polling readiness check that inspects the extension
registration state (e.g., loop until TestExtension appears in vis.extensions or
a vis.socket readiness flag is true) before submitting the job, ensuring
Job.wait() no longer times out.

In @tests/test_workers.py:
- Line 840: The test uses hardcoded vis1.socket.sio.sleep(0.5) to wait for
extension worker registration (and similar sleeps at other places) which is
flaky; implement a polling helper (e.g., wait_for_registration(vis,
extension_name, category, timeout=...)) that repeatedly GETs the
/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers endpoint
via vis.url until response.status_code == 200 and
response.json()["totalWorkers"] > 0 (or timeout), then replace calls to
vis1.socket.sio.sleep(...) after register_extension with calls to
wait_for_registration(vis1, "<extension_name>", "<category>") and apply the same
replacement for other sleeps (tests referencing vis, register_extension, and the
workers endpoint).
🧹 Nitpick comments (2)
tests/test_job_errors.py (2)

7-7: Remove unused import.

The requests module is no longer used after refactoring from HTTP POST to the client.run() pattern.

♻️ Remove the unused import
-import requests
-
 from zndraw import ZnDraw

34-42: Add proper cleanup on timeout.

If job.wait() raises a TimeoutError, the disconnect calls won't execute, potentially leaving connections open. Consider using a try/finally block or pytest fixtures to ensure cleanup happens even when assertions fail.

♻️ Add try/finally for reliable cleanup
 # Submit and wait for job using the Job.wait() pattern
-job = client.run(TestExtension(parameter=42))
-job.wait(timeout=30)
-
-# Check job completed
-assert job.status == "completed"
-
-client.disconnect()
-vis.disconnect()
+try:
+    job = client.run(TestExtension(parameter=42))
+    job.wait(timeout=30)
+    
+    # Check job completed
+    assert job.status == "completed"
+finally:
+    client.disconnect()
+    vis.disconnect()
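An equivalent guarantee without restructuring each test body is a small context manager whose cleanup runs in `finally`; a sketch under the assumption that clients expose `disconnect()` (as `client` and `vis` do above):

```python
from contextlib import contextmanager


@contextmanager
def managed_clients(*clients):
    """Disconnect every client on exit, even if the test body raises."""
    try:
        yield clients
    finally:
        for client in reversed(clients):
            client.disconnect()
```

A test would wrap its body in `with managed_clients(client, vis):`, so a failing assertion still triggers both disconnects.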
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 162360e and bb95004.

📒 Files selected for processing (3)
  • tests/test_dual_namespace_extensions.py
  • tests/test_job_errors.py
  • tests/test_workers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_dual_namespace_extensions.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: If sensible, implement collections.abc interfaces for classes, such as MutableMapping or MutableSequence
Use numpy style docstrings
Docstrings must be concise and to the point
Use type hints wherever possible. Import typing as t if necessary, but use list[int|float] | None instead of t.Optional[t.List[int|float]]
Imports should always be at the top of the file

Files:

  • tests/test_job_errors.py
  • tests/test_workers.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/test_*.py: Use pytest.mark.parametrize to avoid code duplication in tests
Tests should be very specific and test only one thing
Avoid complex test setups
Each test must be a function, not a method of a class

Files:

  • tests/test_job_errors.py
  • tests/test_workers.py
🧬 Code graph analysis (2)
tests/test_job_errors.py (2)
src/zndraw/zndraw.py (2)
  • run (1644-1689)
  • wait (917-928)
src/zndraw/job.py (2)
  • wait (91-119)
  • status (55-65)
tests/test_workers.py (1)
src/zndraw/job.py (2)
  • wait (91-119)
  • status (55-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest (3.13, ubuntu-latest)
  • GitHub Check: pytest (3.11, ubuntu-latest)
  • GitHub Check: pytest (3.12, ubuntu-latest)
🔇 Additional comments (2)
tests/test_job_errors.py (1)

17-18: Verify that the empty run() method is intentional.

The run() method currently does nothing, which means the test only verifies that the job framework completes without checking whether the extension logic actually executes. If this is intentional for testing the framework itself, consider adding a comment to clarify. Otherwise, consider adding minimal work (e.g., setting a flag or modifying state) to verify execution.
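If execution should be observable, a class-level flag is the lightest option; a sketch with a plain class standing in for the real extension base, and assuming the worker executes in the same process as the test:

```python
class FlagExtension:
    """Records that run() was actually invoked by the worker."""

    executed = False

    def run(self, **kwargs) -> None:
        # Set on the class so the test can read it without holding the instance.
        type(self).executed = True
```

The test can then `assert FlagExtension.executed` after `job.wait()` in addition to checking `job.status`.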

tests/test_workers.py (1)

845-845: Investigate why timeout values were increased to 30 seconds instead of addressing root causes.

The timeout increases to 30s at lines 845, 849, 856, 858, 875-876, 887-889, and 907, 909 need investigation to determine if they represent a lasting solution or a temporary workaround. Without clear documentation of what failures prompted these changes, it's difficult to assess whether:

  • The underlying cause (job assignment delays, worker pickup issues, etc.) has been addressed
  • These timeouts will cause test suite slowdown or mask performance regressions
  • A more targeted fix to the job coordination mechanism would be more appropriate

Please clarify the specific test failures that necessitated increasing timeouts and confirm that the root cause has been identified and addressed.

Comment thread tests/test_job_errors.py
Comment on lines 28 to +29
vis.register_extension(TestExtension)
vis.socket.sio.sleep(0.5) # Give registration time to complete

⚠️ Potential issue | 🟠 Major

Address the race condition causing test failures.

This test is currently failing in CI with "job remained in assigned state; Job.wait() timed out after 10s" (per PR comments). The 0.5-second sleep after registration is likely insufficient to ensure the extension is fully registered and ready to pick up jobs. The job is being submitted before the worker is ready to process it.

Consider one of these solutions:

  1. Increase the sleep duration significantly (e.g., 2-3 seconds)
  2. Implement a polling check that waits for the extension to be confirmed as registered
  3. Add an explicit ready/health check endpoint to verify worker readiness
💡 Suggested fix: Increase sleep duration
 vis.register_extension(TestExtension)
-vis.socket.sio.sleep(0.5)  # Give registration time to complete
+vis.socket.sio.sleep(3.0)  # Give registration time to complete and worker to be ready

Note: This is a temporary fix. A more robust solution would be to implement an explicit readiness check.


Comment thread tests/test_workers.py
def test_submit_task_twice_via_vis_run(server, public):
vis1 = ZnDraw(url=server, room="testroom", user="testuser", auto_pickup_jobs=True)
vis1.register_extension(RaiseOnParameterExtension, public=public)
vis1.socket.sio.sleep(0.5) # Give registration time to complete

⚠️ Potential issue | 🟠 Major

Replace hardcoded sleeps with proper synchronization.

The hardcoded sleep(0.5) calls after extension registration are a test anti-pattern that can lead to flaky tests and unnecessarily slow test execution. If registration is now asynchronous due to port-based server isolation, the tests should poll for registration completion rather than rely on arbitrary delays.

Consider implementing a helper that polls the /api/rooms/{room}/extensions/{category}/{extension}/workers endpoint until the worker appears, or expose a registration completion event through the socket connection.

♻️ Suggested polling-based approach
import time

import requests


def wait_for_registration(vis, extension_name, category, timeout=5.0):
    """Wait for extension to be registered with an available worker."""
    start = time.time()
    while time.time() - start < timeout:
        response = requests.get(
            f"{vis.url}/api/rooms/{vis.room}/extensions/{category}/{extension_name}/workers"
        )
        if response.status_code == 200 and response.json()["totalWorkers"] > 0:
            return
        time.sleep(0.1)
    raise TimeoutError(f"Extension {extension_name} did not register within {timeout}s")

Then replace the sleep calls:

 vis1.register_extension(RaiseOnParameterExtension, public=public)
-vis1.socket.sio.sleep(0.5)  # Give registration time to complete
+wait_for_registration(vis1, "RaiseOnParameterExtension", "modifiers")

Note: The registration sleep is applied inconsistently. Line 823 (test_submit_task_via_vis_run) also calls register_extension but has no sleep, suggesting these changes were made reactively rather than systematically.

Also applies to: 868-868, 900-900


@PythonFZ PythonFZ merged commit 3f80b07 into main Jan 8, 2026
6 checks passed
@PythonFZ PythonFZ deleted the feat/port-based-server-isolation branch January 8, 2026 16:00
@coderabbitai coderabbitai Bot mentioned this pull request Mar 11, 2026