Skip to content

fix(bot): 端口冲突预检 + /health 上报真实版本#1566

Merged
chenjw merged 1 commit intomainfrom
fix/bot-port-in-use-and-version
Apr 19, 2026
Merged

fix(bot): 端口冲突预检 + /health 上报真实版本#1566
chenjw merged 1 commit intomainfrom
fix/bot-port-in-use-and-version

Conversation

@ZaynJarvis
Copy link
Copy Markdown
Collaborator

@ZaynJarvis ZaynJarvis commented Apr 18, 2026

修两个排障时被坑的点

1. --with-bot 端口被占时不报错

bootstrap.py 启动 vikingbot 子进程后只 sleep(2) 看进程死没死。但 vikingbot 要 ~5s 才走到 uvicorn bind,2s 时还在 import,进程当然是活的 → openviking-server 以为成功了,HTTP 代理转发到 18790,那里旧的 vikingbot 还在响应。从外面看一切正常,实际升级根本没生效。

修复:fork 前用 socket.connect 试一下 18790,被占就直接 exit 1 + 打 lsof 提示。两层都加(bootstrap.pyvikingbot gateway)。

2. /health 永远返回 version: "0.1.3"

vikingbot/__init__.py 里写死的。改成从 importlib.metadata.version("openviking") 动态读。

测试

  • 端口空闲 → 静默通过
  • 端口被占 → 毫秒级 exit 1,错误信息带 lsof 提示
  • /health 正确反映已安装版本

🤖 Generated with Claude Code

- bootstrap.py / vikingbot gateway: abort with a clear message when
  18790 is already held, before fork / before uvicorn startup.
  Avoids the silent fail-open caused by the existing 2s subprocess
  poll firing before uvicorn's bind attempt (~5s).
- vikingbot/__init__.py: read __version__ from openviking package
  metadata instead of a hardcoded "0.1.3" that lied in /health.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 80
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix version reporting for vikingbot

Relevant files:

  • bot/vikingbot/init.py

Sub-PR theme: Add port conflict pre-check for vikingbot

Relevant files:

  • bot/vikingbot/cli/commands.py
  • openviking/server/bootstrap.py

⚡ Recommended focus areas for review

Port Check Limitation

The bot port pre-check uses a hardcoded default port (18790) instead of the actual configured bot port. This will miss conflicts if the bot is running on a non-default port.

VIKINGBOT_DEFAULT_PORT = 18790


def _abort_if_port_in_use(port: int, label: str) -> None:
    """Exit with a clear message if anything is already listening on ``port``.

    Without this, ``--with-bot`` would spawn a vikingbot subprocess that
    silently fails to bind while a stale process keeps serving traffic —
    the operator believes they upgraded but the old binary still answers.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        try:
            s.connect(("127.0.0.1", port))
            in_use = True
        except (ConnectionRefusedError, socket.timeout, OSError):
            in_use = False
    if in_use:
        print(
            f"Error: {label} port {port} is already in use.\n"
            f"  A previous process is still bound — refusing to start a duplicate.\n"
            f"  Identify it:  lsof -nP -iTCP:{port} -sTCP:LISTEN\n"
            f"  Kill it, then retry.",
            file=sys.stderr,
        )
        sys.exit(1)
Code Duplication

The _abort_if_port_in_use function is duplicated in both commands.py and bootstrap.py. This could lead to inconsistent behavior if the function is updated in one place but not the other.

def _abort_if_port_in_use(port: int, label: str) -> None:
    """Exit with a clear message if anything is already listening on ``port``.

    Without this check, a stale process holding the port keeps serving
    traffic while a freshly-started gateway silently fails to bind — the
    operator believes they upgraded but the old (potentially unpatched)
    binary is still answering requests.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        try:
            s.connect(("127.0.0.1", port))
            in_use = True
        except (ConnectionRefusedError, socket.timeout, OSError):
            in_use = False
    if in_use:
        print(
            f"Error: {label} port {port} is already in use.\n"
            f"  A previous process is still bound — refusing to start a duplicate.\n"
            f"  Identify it:  lsof -nP -iTCP:{port} -sTCP:LISTEN\n"
            f"  Kill it, then retry.",
            file=sys.stderr,
        )
        sys.exit(1)

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

@qin-ctx
Copy link
Copy Markdown
Collaborator

qin-ctx commented Apr 18, 2026

api 测试流水线有报错

@chenjw chenjw merged commit 1e51c54 into main Apr 19, 2026
4 of 6 checks passed
@chenjw chenjw deleted the fix/bot-port-in-use-and-version branch April 19, 2026 07:39
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project Apr 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants