feat(voice): move composed session prompt off WS query string (first-message or REST pre-handoff) #802

@heavygee


Summary

The current Gemini Live and Qwen Realtime voice session implementations pass the composed system prompt as a base64url query parameter in the WebSocket upgrade URL:

wss://hub/api/voice/gemini-ws?token=...&systemPrompt=<base64url-encoded-prompt>

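The current implementation bounds the URL size with truncatePromptForProxy(), which caps the raw prompt at 9,000 bytes before encoding. Since base64url turns every 3 bytes into 4 characters, the encoded systemPrompt value stays at or below 12,000 characters, keeping the URL within practical limits. However, the architecture has two theoretical weaknesses: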

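  1. URL length limits: if the cap is ever raised, the encoded prompt could exceed the request-line limits of proxies between the browser and the hub
  2. URL logging: the full prompt appears in any access log that records request URLs, since base64url is an encoding, not encryption (not a concern for Tailscale-only hubs, but worth noting for multi-tenant deployments)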
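To make the size arithmetic concrete, here is a minimal TypeScript sketch of the current URL transport under stated assumptions: the prompt is capped in UTF-8 bytes without splitting a character, encoded as unpadded base64url (RFC 4648, section 5), and placed in the systemPrompt parameter. capUtf8Bytes, base64url, encodedLength and buildVoiceWsUrl are hypothetical names; this is not the hub's actual truncatePromptForProxy().

```typescript
// Hypothetical sketch of the current URL transport (names are illustrative,
// not the hub's real code): cap the prompt in UTF-8 bytes, base64url-encode
// it, and carry it in the WebSocket upgrade URL.

const MAX_PROMPT_BYTES = 9_000; // cap quoted in this issue

// Truncate to at most maxBytes of UTF-8 without splitting a multi-byte character.
function capUtf8Bytes(prompt: string, maxBytes: number): Uint8Array {
  const bytes = new TextEncoder().encode(prompt);
  if (bytes.length <= maxBytes) return bytes;
  let end = maxBytes;
  // Continuation bytes match 0b10xxxxxx; back up to the start of the split character.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return bytes.subarray(0, end);
}

const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Unpadded base64url (RFC 4648, section 5): each 3-byte group becomes 4 characters.
function base64url(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const n = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += ALPHABET[(n >> 18) & 63] + ALPHABET[(n >> 12) & 63];
    if (has1) out += ALPHABET[(n >> 6) & 63];
    if (has2) out += ALPHABET[n & 63];
  }
  return out;
}

// Length of the unpadded encoding for n input bytes.
function encodedLength(n: number): number {
  const rem = n % 3;
  return 4 * Math.floor(n / 3) + (rem === 0 ? 0 : rem + 1);
}

// Builds a URL of the shape shown in the summary above.
function buildVoiceWsUrl(hub: string, token: string, prompt: string): string {
  const url = new URL("/api/voice/gemini-ws", hub);
  url.searchParams.set("token", token);
  url.searchParams.set("systemPrompt", base64url(capUtf8Bytes(prompt, MAX_PROMPT_BYTES)));
  return url.toString();
}
```

At the 9,000-byte cap the systemPrompt value alone is 12,000 characters, which is the figure to compare against the request-line limit of whatever proxy sits in front of the hub.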

Proposed improvement

Move the composed prompt off the URL onto a dedicated transport:

Option A (first WebSocket message): The browser sends the prompt as the first message once ws.onopen fires, and the hub buffers it before proceeding with its own setup. This requires a new hub-side message type and careful ordering relative to the existing setup gate.
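Option A's ordering can be sketched as a small state machine with no I/O of its own: the hub passes it client frames and the upstream "setup complete" event, then performs the actions it returns. This is a minimal TypeScript sketch under assumptions: the prompt arrives as a JSON frame of a hypothetical type session.prompt, frames received while upstream setup is in flight are queued, and the close codes 4400/4408 (application range) and 1009 (Message Too Big, RFC 6455) are illustrative choices. PromptGate and its methods are hypothetical.

```typescript
// Hypothetical sans-I/O gate for Option A. The hub feeds it client frames and
// upstream events; it returns the actions to perform. "session.prompt" is an
// illustrative message type, not an existing one.

type Action =
  | { kind: "startUpstream"; prompt: string } // build the upstream setup with this prompt
  | { kind: "forward"; raw: string } // relay a client frame upstream
  | { kind: "close"; code: number; reason: string };

type State = "awaitingPrompt" | "settingUp" | "ready" | "closed";

const MAX_PROMPT_CHARS = 200_000; // illustrative cap; the URL limit no longer applies

// Returns the prompt if raw is a well-formed session.prompt frame, else null.
function parsePromptFrame(raw: string): string | null {
  try {
    const msg = JSON.parse(raw) as { type?: unknown; prompt?: unknown } | null;
    if (msg && msg.type === "session.prompt" && typeof msg.prompt === "string") return msg.prompt;
  } catch {
    // Not JSON: fall through and reject.
  }
  return null;
}

class PromptGate {
  private state: State = "awaitingPrompt";
  private queued: string[] = [];

  onMessage(raw: string): Action[] {
    switch (this.state) {
      case "awaitingPrompt": {
        const prompt = parsePromptFrame(raw);
        if (prompt === null) return this.fail(4400, "first frame must be session.prompt");
        if (prompt.length > MAX_PROMPT_CHARS) return this.fail(1009, "prompt too large");
        this.state = "settingUp";
        return [{ kind: "startUpstream", prompt }];
      }
      case "settingUp":
        this.queued.push(raw); // hold frames until the upstream session is configured
        return [];
      case "ready":
        return [{ kind: "forward", raw }];
      default:
        return []; // closed: drop everything
    }
  }

  // Call once the upstream setup has been acknowledged; flushes held frames in order.
  onUpstreamReady(): Action[] {
    if (this.state !== "settingUp") return [];
    this.state = "ready";
    const flushed = this.queued.map((raw): Action => ({ kind: "forward", raw }));
    this.queued = [];
    return flushed;
  }

  // Call from a hub-side timer if no prompt arrives in time.
  onPromptTimeout(): Action[] {
    return this.state === "awaitingPrompt" ? this.fail(4408, "no prompt received") : [];
  }

  private fail(code: number, reason: string): Action[] {
    this.state = "closed";
    this.queued = [];
    return [{ kind: "close", code, reason }];
  }
}
```

On the browser side the change would be a single ws.send(JSON.stringify({ type: "session.prompt", prompt })) inside ws.onopen. Because a WebSocket connection delivers messages in order, that frame always reaches the hub before any audio the browser sends afterwards, so the gate only has to hold frames that arrive while upstream setup is still running.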

Option B (REST pre-handoff): The browser sends the prompt in a POST /api/voice/prompts request, receives a short-lived promptId, and includes only that ID in the WS URL. The hub looks up the stored prompt on upgrade. This is more robust for large prompts, at the cost of an extra round-trip and ephemeral server-side storage.
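For Option B, the ephemeral storage can be as small as a map with a TTL. A minimal sketch, assuming a single hub process (so an in-memory Map suffices), a 30-second TTL, single-use IDs generated with Node's crypto.randomUUID(), and binding each ID to the token of the request that created it; PromptStore, put, take and ownerToken are hypothetical names.

```typescript
// Hypothetical in-memory store for Option B: the POST handler stores the
// prompt and returns a short-lived promptId; the WS upgrade redeems it once.
// A multi-instance hub would need shared storage instead of a Map.
import { randomUUID } from "node:crypto";

interface StoredPrompt {
  prompt: string;
  ownerToken: string; // hypothetical: only the creating session may redeem the ID
  expiresAt: number; // epoch milliseconds
}

class PromptStore {
  private readonly entries = new Map<string, StoredPrompt>();

  constructor(
    private readonly ttlMs = 30_000,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  // Called by the POST handler; returns the promptId to put in the WS URL.
  put(prompt: string, ownerToken: string): string {
    this.sweep();
    const promptId = randomUUID();
    this.entries.set(promptId, { prompt, ownerToken, expiresAt: this.now() + this.ttlMs });
    return promptId;
  }

  // Called on WS upgrade. Single use: the entry is deleted whether or not it matches.
  take(promptId: string, ownerToken: string): string | null {
    const entry = this.entries.get(promptId);
    this.entries.delete(promptId);
    if (!entry || entry.expiresAt <= this.now() || entry.ownerToken !== ownerToken) return null;
    return entry.prompt;
  }

  get size(): number {
    return this.entries.size;
  }

  // Drop expired entries so abandoned prompts do not accumulate.
  private sweep(): void {
    const t = this.now();
    for (const [id, entry] of this.entries) {
      if (entry.expiresAt <= t) this.entries.delete(id);
    }
  }
}
```

The POST /api/voice/prompts handler would call put() and return the ID; the upgrade handler would call take() and refuse the upgrade when it returns null. Since only the opaque, short-lived ID travels in the URL, access logs no longer contain the prompt, and a logged URL cannot be replayed once the ID has been redeemed or has expired.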

Context

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests