Add agora-moss: Moss semantic search for Agora Conversational AI via MCP #174
ashvathsureshkumar merged 29 commits into main
Conversation
…_uids
- greeting_message so the agent speaks first when users join
- system_messages for concise voice-assistant behavior
- LLM_MODEL env var (optional) so providers that require a model field work
- remote_rtc_uids from AGORA_REMOTE_RTC_UIDS (defaults to '2001'), which fixes the 'remote_rtc_uids must not be empty' rejection from ConvoAI (see the sketch below)
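For orientation, here is a hedged sketch of how the env-driven values above might be gathered before `start_agent.py` places them into the ConvoAI join body. The exact nesting of the real payload follows Agora's ConvoAI schema and is not reproduced here, and the message shape used for `system_messages` is an assumption, not the PR's actual code.

```python
import os

# Values described in the commit above; shapes are illustrative, not the actual start_agent.py.
remote_rtc_uids = os.environ.get("AGORA_REMOTE_RTC_UIDS", "2001").split(",")
llm_model = os.environ.get("LLM_MODEL")  # optional; only forwarded when the provider needs it
greeting_message = "Hi! Ask me anything about the docs."  # agent speaks first when users join
system_messages = [
    {"role": "system", "content": "You are a concise voice assistant. Keep answers short."}
]
```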
… tunnels
- MCP_ALLOW_ALL_HOSTS=1 disables FastMCP's DNS-rebinding protection so
a public tunnel hostname (ngrok/cloudflared) can reach /mcp
- llm_proxy mounted at /llm/chat/completions when enabled; cleans Agora's
ConvoAI request body before forwarding to any OpenAI-compatible upstream:
* injects top-level 'model' if LLM_MODEL is set (required by some
providers whose chat/completions endpoint expects it in the body)
* strips non-spec fields Agora adds (turn_id, timestamp, interruptable,
metadata; 'strict' at tools[0])
* auto-decodes gzipped upstream responses
This makes the demo work with OpenAI-compat upstreams that are strict
about the request schema, without the agora-moss package owning any
provider-specific logic. (A rough sketch of this body-cleaning step follows below.)
- fastapi added as a dev dep for the proxy
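A minimal sketch of the request-body cleaning described above, assuming the incoming payload is already parsed into a dict. The field names come from this PR; the helper name and exact handling are illustrative rather than the actual llm_proxy.py, and the gzip auto-decoding happens on the response side, which is not shown here.

```python
import os

NON_SPEC_FIELDS = ("turn_id", "timestamp", "interruptable", "metadata")

def clean_convoai_body(body: dict) -> dict:
    """Strip Agora-added non-spec fields and optionally inject a top-level model."""
    cleaned = {k: v for k, v in body.items() if k not in NON_SPEC_FIELDS}

    # Some OpenAI-compatible providers require 'model' in the request body.
    model = os.environ.get("LLM_MODEL")
    if model and "model" not in cleaned:
        cleaned["model"] = model

    # Drop the non-spec 'strict' flag from the first tool definition, if present.
    tools = cleaned.get("tools")
    if tools and isinstance(tools[0], dict):
        tools[0].pop("strict", None)

    return cleaned
```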
- test_mcp_client.py calls search_knowledge_base against the local MCP server; useful for verifying the tool works without any Agora setup (see the client sketch below)
- test_mcp_client_remote.py does the same over the public tunnel URL
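A local smoke test along these lines might look like the sketch below. It is a hedged example, not the actual test_mcp_client.py: it assumes the official `mcp` Python SDK's streamable-HTTP client, a server at http://localhost:8000/mcp, and a `query` tool argument, none of which are confirmed by the PR text.

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    # URL and the "query" argument name are assumptions for this sketch.
    async with streamablehttp_client("http://localhost:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_knowledge_base", {"query": "How do I create a Moss index?"}
            )
            print(result.content)

if __name__ == "__main__":
    asyncio.run(main())
```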
Pull request overview
Adds a new Moss→Agora Conversational AI integration (agora-moss) plus a runnable demo app that exposes Moss semantic search as a single MCP tool over streamable HTTP.
Changes:
- Introduces packages/agora-moss: `MossAgoraSearch` adapter + `create_mcp_app()` FastMCP server exposing `search_knowledge_base`.
- Adds apps/agora-moss: demo MCP server, index seeder, ConvoAI join-body starter, and an optional OpenAI-compat LLM proxy.
- Updates repo documentation to list the new Agora integration.
Reviewed changes
Copilot reviewed 19 out of 23 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| packages/agora-moss/uv.lock | Locks Python dependencies for the new agora-moss package. |
| packages/agora-moss/src/agora_moss/search.py | Implements the Moss adapter and FastMCP tool exposure. |
| packages/agora-moss/src/agora_moss/__init__.py | Exposes the package’s public API via `__all__`. |
| packages/agora-moss/pyproject.toml | Defines package metadata, dependencies, and tooling config. |
| packages/agora-moss/README.md | Package-level install/quickstart and API documentation. |
| packages/agora-moss/LICENSE | Adds BSD-2-Clause license file for the package. |
| packages/agora-moss/tests/__init__.py | Initializes the package test module. |
| packages/agora-moss/tests/test_search.py | Unit + env-gated integration test coverage for search + MCP tool. |
| apps/agora-moss/pyproject.toml | Demo app dependencies and tooling config. |
| apps/agora-moss/server.py | ASGI entrypoint wiring FastMCP server (and proxy mount). |
| apps/agora-moss/llm_proxy.py | Optional OpenAI-compat proxy for strict upstreams. |
| apps/agora-moss/create_index.py | Seeds a Moss index with sample docs for the demo. |
| apps/agora-moss/start_agent.py | Mints RTC token and posts ConvoAI join body wiring MCP/ASR/TTS/LLM. |
| apps/agora-moss/env.example | Documents required environment variables for demo usage. |
| apps/agora-moss/moss_docs.json | Sample documents used by the seeding script. |
| apps/agora-moss/README.md | End-to-end demo walkthrough and Docker usage instructions. |
| apps/agora-moss/Dockerfile | Builds a container image for serving the MCP server demo. |
| apps/agora-moss/test_mcp_client.py | Local script to call the MCP tool via streamable HTTP. |
| apps/agora-moss/test_mcp_client_remote.py | Remote script to call a deployed MCP endpoint. |
| apps/agora-moss/tests/__init__.py | Initializes the demo app test module. |
| apps/agora-moss/tests/test_start_agent.py | Tests join-body construction and MCP server naming constraints. |
| README.md | Adds the Agora demo and integration row to the repo’s top-level docs. |
```python
try:
    return JSONResponse(r.json(), status_code=r.status_code)
except Exception:
    return JSONResponse({"error": "non-json upstream"}, status_code=500)
```
If the upstream response isn’t JSON, the proxy returns a generic 500 and discards the upstream status/body. This makes debugging upstream schema errors difficult. Prefer returning the upstream status code and raw body (and ideally content-type) instead of forcing a 500.
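A minimal sketch of that fallback, assuming `r` is the upstream `httpx.Response`; the helper name is illustrative, not the PR's code.

```python
import httpx
from starlette.responses import JSONResponse, Response

def forward_upstream(r: httpx.Response):
    """Return upstream JSON when possible, otherwise pass the raw body/status through."""
    try:
        return JSONResponse(r.json(), status_code=r.status_code)
    except Exception:
        # Keep the upstream status, body, and content-type so schema errors stay debuggable.
        return Response(
            content=r.content,
            status_code=r.status_code,
            media_type=r.headers.get("content-type", "text/plain"),
        )
```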
```python
async def gen():
    client = httpx.AsyncClient(timeout=60)
    try:
```
The streaming path creates an httpx.AsyncClient inside the async generator without a context manager. On client disconnect/cancellation, cleanup can be fragile and may leak connections under load. Prefer using async with httpx.AsyncClient(...) (with appropriate cancellation shielding) or a shared client managed by app lifespan.
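One way to address this, sketched under the assumption that the proxy is a plain Starlette app: own a single `httpx.AsyncClient` in the application lifespan and let handlers reuse it instead of constructing one inside the generator. The `proxy_app` name here is illustrative.

```python
import contextlib

import httpx
from starlette.applications import Starlette

@contextlib.asynccontextmanager
async def lifespan(app: Starlette):
    # One shared client for the whole process; closed cleanly on shutdown.
    async with httpx.AsyncClient(timeout=60) as client:
        app.state.http = client
        yield

proxy_app = Starlette(lifespan=lifespan)
# Handlers then use request.app.state.http instead of creating a client per stream.
```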
```python
URL = "https://attributable-marni-electrosurgical.ngrok-free.dev/mcp"
```
This script hard-codes a specific ngrok URL. That will quickly go stale and can unintentionally leak an endpoint in the repo. Consider reading the URL from an env var/CLI arg (with a placeholder default) or moving this to a clearly labeled local-only example file.
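A possible shape for that, with `MCP_REMOTE_URL` as a hypothetical variable name (it is not defined anywhere in this PR):

```python
import os
import sys

# Prefer an explicit CLI argument, fall back to an env var, and use an obvious placeholder default.
URL = (
    sys.argv[1]
    if len(sys.argv) > 1
    else os.environ.get("MCP_REMOTE_URL", "https://your-tunnel-host.example/mcp")
)
```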
```python
from llm_proxy import app as proxy_app  # noqa: E402
from starlette.routing import Mount  # noqa: E402

app.router.routes.insert(0, Mount("/llm", app=proxy_app))
```
llm_proxy is imported and mounted unconditionally. Because apps/agora-moss/llm_proxy.py reads LLM_PROXY_UPSTREAM at import time, the MCP server will crash on startup when that env var is unset (even if the proxy isn’t needed). Consider only importing/mounting the proxy when LLM_PROXY_UPSTREAM is present (or making llm_proxy safe to import when unset).
Suggested change:

```diff
-from llm_proxy import app as proxy_app  # noqa: E402
-from starlette.routing import Mount  # noqa: E402
-app.router.routes.insert(0, Mount("/llm", app=proxy_app))
+if os.environ.get("LLM_PROXY_UPSTREAM"):
+    from llm_proxy import app as proxy_app  # noqa: E402
+    from starlette.routing import Mount  # noqa: E402
+    app.router.routes.insert(0, Mount("/llm", app=proxy_app))
```
```dockerfile
COPY apps/agora-moss/pyproject.toml apps/agora-moss/
COPY apps/agora-moss/README.md apps/agora-moss/
COPY apps/agora-moss/server.py apps/agora-moss/
```
The image copies server.py but not apps/agora-moss/llm_proxy.py. Since server.py imports llm_proxy on startup, the container will crash with ModuleNotFoundError. Copy llm_proxy.py into the image or make the proxy import/mount optional.
Suggested change:

```diff
 COPY apps/agora-moss/server.py apps/agora-moss/
+COPY apps/agora-moss/llm_proxy.py apps/agora-moss/
```
```python
UPSTREAM = os.environ["LLM_PROXY_UPSTREAM"]
UPSTREAM_KEY = os.environ.get("LLM_API_KEY", "")
MODEL = os.environ.get("LLM_MODEL")
```
UPSTREAM = os.environ["LLM_PROXY_UPSTREAM"] is evaluated at import time, which makes importing this module crash when the env var is unset (even if the proxy route isn’t mounted/used). Use a lazy lookup (e.g., inside the handler) or os.environ.get with a clear runtime error/disabled behavior.
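A sketch of the deferred lookup this comment suggests, with an illustrative handler name; the follow-up commit further down describes the behavior the PR actually adopted (a 503 when the upstream is unset).

```python
import os

from starlette.requests import Request
from starlette.responses import JSONResponse

async def chat_completions(request: Request):
    # Read configuration at request time so importing the module never raises.
    upstream = os.environ.get("LLM_PROXY_UPSTREAM")
    if not upstream:
        return JSONResponse({"error": "LLM_PROXY_UPSTREAM is not set"}, status_code=503)
    upstream_key = os.environ.get("LLM_API_KEY", "")  # looked up the same lazy way
    model = os.environ.get("LLM_MODEL")
    # ... clean the body and forward it to `upstream` (omitted in this sketch) ...
    return JSONResponse({"detail": "forwarding omitted in this sketch"}, status_code=501)
```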
```python
if r.status_code >= 400:
    err = b"".join([chunk async for chunk in r.aiter_bytes()])
    print(f"UPSTREAM err body: {err.decode('utf-8', 'replace')[:2000]}", flush=True)
    yield err
    return
```
When the upstream returns an error (>=400) during streaming, the generator yields the error body but StreamingResponse still returns HTTP 200 with text/event-stream. Consider checking r.status_code before returning a streaming response and, on error, returning a non-streaming response that preserves the upstream status/body.
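One way to implement that, sketched with httpx's `send(..., stream=True)` plus Starlette's `StreamingResponse` and a `BackgroundTask` to close the upstream stream. The parameter names (`client`, `upstream_url`, `body`, `headers`) are assumed from context rather than taken from llm_proxy.py.

```python
import httpx
from starlette.background import BackgroundTask
from starlette.responses import Response, StreamingResponse

async def stream_or_error(client: httpx.AsyncClient, upstream_url: str, body: dict, headers: dict):
    """Open the upstream stream first; only wrap it in a StreamingResponse on success."""
    req = client.build_request("POST", upstream_url, json=body, headers=headers)
    r = await client.send(req, stream=True)

    if r.status_code >= 400:
        # Surface the upstream error as-is instead of hiding it behind a 200 SSE stream.
        err = await r.aread()
        await r.aclose()
        return Response(
            content=err,
            status_code=r.status_code,
            media_type=r.headers.get("content-type", "application/json"),
        )

    return StreamingResponse(
        r.aiter_bytes(),
        media_type="text/event-stream",
        background=BackgroundTask(r.aclose),
    )
```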
```markdown
## Prerequisites

- Python 3.10+ and [uv](https://docs.astral.sh/uv/).
- Moss project credentials: `MOSS_PROJECT_ID`, `MOSS_PROJECT_KEY`.
```
…ling

Addresses review feedback on PR #174:
- server.py: only import/mount llm_proxy when LLM_PROXY_UPSTREAM is set, so the server starts cleanly out of the box (previously server.py crashed at startup with KeyError because llm_proxy.py read the env var at import time).
- llm_proxy.py: look up env vars inside the handler instead of at module load, so importing the module never crashes and a missing upstream produces a 503 JSON body instead.
- llm_proxy.py: one module-level httpx.AsyncClient reused across requests (previously a new client was created inside the streaming generator and could leak on cancellation).
- llm_proxy.py: propagate upstream status and body in all paths. A streaming error now returns a non-streaming response with the upstream status code instead of HTTP 200 text/event-stream wrapping a 4xx body. Non-streaming success passes through the upstream content-type.
- Dockerfile: COPY llm_proxy.py into the image alongside server.py so the conditional mount works inside the container too.
…nel URL)

The script hardcoded a developer-specific ephemeral ngrok URL that wouldn't work for anyone else. test_mcp_client.py (localhost) covers the same smoke-test use case; remote testing can trivially be done by pointing that script at any public URL.
Addresses @HarshaNalluru's review comment on PR #174.
…MCP (usemoss#174)
Summary
- **New `packages/agora-moss`**: Python library exposing a single MCP tool (`search_knowledge_base`) backed by a Moss index over streamable HTTP. Plugs into Agora ConvoAI's `llm.mcp_servers` join-body field with zero LLM-side plumbing. Sibling to `vapi-moss` / `elevenlabs-moss`.
- **New `apps/agora-moss`**: runnable end-to-end demo: FastMCP streamable-HTTP server (`server.py`), `create_index.py` seeder, `start_agent.py` that mints an RTC token and POSTs a full ConvoAI `join` body wiring Moss MCP + Deepgram ASR + Cartesia TTS + any OpenAI-compatible LLM. Dockerfile targeting `ghcr.io/usemoss/agora-moss`.
- **Docs**: library README with quickstart + API table, full walkthrough README in the demo app, and a row for `agora-moss` in the integrations table.
- **LLM-compat proxy (optional)**: `apps/agora-moss/llm_proxy.py`, mounted at `/llm/chat/completions` when `LLM_PROXY_UPSTREAM` is set. Strips non-OpenAI-spec fields Agora injects into the chat/completions body (`turn_id`, `timestamp`, `interruptable`, `strict` on tool defs), optionally injects top-level `model`, auto-decompresses gzipped upstream responses. Lets the demo work against upstreams that strictly enforce the OpenAI schema.
- **Dev flag `MCP_ALLOW_ALL_HOSTS=1`**: disables FastMCP's DNS-rebinding protection so a public tunnel host (ngrok / cloudflared) can reach `/mcp` during development.

Verified end-to-end on a live ConvoAI voice agent: browser mic → Deepgram ASR → OpenAI-compat LLM (via the proxy) → MCP tool call → in-memory Moss query → LLM final answer → Cartesia TTS → audio published back into the channel.
Test plan
- [x] `packages/agora-moss`: `uv run pytest` passes (12 unit tests + 1 env-gated integration test)
- [x] `apps/agora-moss`: `uv run pytest` passes (6 unit tests)
- [x] Local MCP smoke test: direct client call to `search_knowledge_base` returns ranked docs
- [x] Public-tunnel MCP smoke test: same call over ngrok
- [x] Full ConvoAI voice roundtrip including `search_knowledge_base` tool call and spoken answer
- [ ] Follow-up: `LLM_PROXY_UPSTREAM` read at import time crashes the server if unset, even when the proxy is unused (minor; guard the import or defer the lookup into the handler)
- [ ] Follow-up: streaming `httpx.AsyncClient` created inside an async generator; teardown on abnormal disconnect relies on `aclose()` being awaited; consider a module-level client to avoid connection-leak risk under load