
Add agora-moss: Moss semantic search for Agora Conversational AI via MCP #174

Merged

ashvathsureshkumar merged 29 commits into main from moss-agora-llm-compat on Apr 22, 2026
Conversation

ashvathsureshkumar (Contributor) commented Apr 22, 2026

Summary

  • New packages/agora-moss — Python library exposing a single MCP tool (search_knowledge_base) backed by a Moss index over streamable HTTP. Plugs into Agora ConvoAI's llm.mcp_servers join-body field with zero LLM-side plumbing. Sibling to vapi-moss / elevenlabs-moss.
  • New apps/agora-moss — runnable end-to-end demo: FastMCP streamable-HTTP server (server.py), create_index.py seeder, start_agent.py that mints an RTC token and POSTs a full ConvoAI join body wiring Moss MCP + Deepgram ASR + Cartesia TTS + any OpenAI-compatible LLM. Dockerfile targeting ghcr.io/usemoss/agora-moss.
  • Docs — library README with quickstart + API table, full walkthrough README in the demo app, and a row for agora-moss in the integrations table.
  • LLM-compat proxy (optional) — apps/agora-moss/llm_proxy.py, mounted at /llm/chat/completions when LLM_PROXY_UPSTREAM is set. Strips non-OpenAI-spec fields Agora injects into the chat/completions body (turn_id, timestamp, interruptable, strict on tool defs), optionally injects top-level model, auto-decompresses gzipped upstream responses. Lets the demo work against upstreams that strictly enforce the OpenAI schema.
  • Dev flag MCP_ALLOW_ALL_HOSTS=1 — disables FastMCP's DNS-rebinding protection so a public tunnel host (ngrok / cloudflared) can reach /mcp during development.

Verified end-to-end on a live ConvoAI voice agent: browser mic → Deepgram ASR → OpenAI-compat LLM (via the proxy) → MCP tool call → in-memory Moss query → LLM final answer → Cartesia TTS → audio published back into the channel.
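
For concreteness, a rough sketch of the kind of join body start_agent.py builds is shown below. Only llm.mcp_servers, greeting_message, system_messages, and remote_rtc_uids are field names taken from this PR; the endpoint URL, the auth header, the env var names other than AGORA_REMOTE_RTC_UIDS, and the ASR/TTS sub-structures are placeholders rather than Agora's actual schema (see start_agent.py for the real body).

```python
import os

import httpx

# Placeholder: public base URL of the demo server (e.g. an ngrok/cloudflared tunnel).
base = os.environ.get("PUBLIC_BASE_URL", "https://<your-tunnel>")

join_body = {
    "greeting_message": "Hi! Ask me anything about the knowledge base.",
    "system_messages": ["You are a concise voice assistant."],
    # ConvoAI rejects an empty remote_rtc_uids; the demo defaults it to '2001'.
    "remote_rtc_uids": os.environ.get("AGORA_REMOTE_RTC_UIDS", "2001").split(","),
    "llm": {
        # Point at the optional proxy mount, or directly at any OpenAI-compatible endpoint.
        "url": f"{base}/llm/chat/completions",
        "mcp_servers": [{"url": f"{base}/mcp"}],  # exact server-entry shape is an assumption
    },
    "asr": {"vendor": "deepgram"},  # placeholder sub-structure
    "tts": {"vendor": "cartesia"},  # placeholder sub-structure
}

# Placeholder endpoint and credential; the real join URL and auth scheme are Agora's.
resp = httpx.post(
    os.environ["AGORA_CONVOAI_JOIN_URL"],
    json=join_body,
    headers={"Authorization": os.environ["AGORA_AUTH_HEADER"]},
)
resp.raise_for_status()
print(resp.json())
```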

Test plan

  • packages/agora-moss: uv run pytest passes (12 unit tests + 1 env-gated integration test)
  • apps/agora-moss: uv run pytest passes (6 unit tests)
  • Local MCP smoke test: direct client call to search_knowledge_base returns ranked docs
  • Public-tunnel MCP smoke test: same call over ngrok
  • Full ConvoAI voice roundtrip including search_knowledge_base tool call and spoken answer
  • Follow-up: LLM_PROXY_UPSTREAM read at import time — crashes server if unset even when proxy unused (minor; guard the import or defer the lookup into the handler)
  • Follow-up: streaming httpx.AsyncClient created inside an async generator — teardown on abnormal disconnect relies on aclose() being awaited; consider a module-level client to avoid connection-leak risk under load


…_uids

- greeting_message so the agent speaks first when users join
- system_messages for concise voice-assistant behavior
- LLM_MODEL env var (optional) so providers that require a model field work
- remote_rtc_uids from AGORA_REMOTE_RTC_UIDS (defaults to '2001') — fixes
  'remote_rtc_uids must not be empty' rejection from ConvoAI
… tunnels

- MCP_ALLOW_ALL_HOSTS=1 disables FastMCP's DNS-rebinding protection so
  a public tunnel hostname (ngrok/cloudflared) can reach /mcp
- llm_proxy mounted at /llm/chat/completions when enabled; cleans Agora's
  ConvoAI request body before forwarding to any OpenAI-compatible upstream:
    * injects top-level 'model' if LLM_MODEL is set (required by some
      providers whose chat/completions endpoint expects it in the body)
    * strips non-spec fields Agora adds (turn_id, timestamp, interruptable,
      metadata; 'strict' at tools[0])
    * auto-decodes gzipped upstream responses
  This makes the demo work with OpenAI-compat upstreams that are strict
  about the request schema, without the agora-moss package owning any
  provider-specific logic (a rough sketch of the cleaning step follows this list).
- fastapi added as a dev dep for the proxy
- test_mcp_client.py calls search_knowledge_base against the local MCP
  server; useful for verifying the tool works without any Agora setup
- test_mcp_client_remote.py does the same over the public tunnel URL
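
The body-cleaning step referenced in the item above could look roughly like this. It is a sketch, not the proxy's actual code: only the field names (turn_id, timestamp, interruptable, metadata, strict, model) come from this PR, the helper name is made up, and where the PR strips strict only from tools[0], the sketch strips it from every tool entry.

```python
import os
from typing import Any

# Non-OpenAI-spec fields Agora adds to the chat/completions body (per the commit message).
NON_SPEC_FIELDS = ("turn_id", "timestamp", "interruptable", "metadata")


def clean_convoai_body(body: dict[str, Any]) -> dict[str, Any]:
    """Normalize a ConvoAI-originated body for a strict OpenAI-compatible upstream."""
    cleaned = {k: v for k, v in body.items() if k not in NON_SPEC_FIELDS}

    # Drop the non-spec 'strict' key on tool definitions.
    for tool in cleaned.get("tools") or []:
        if isinstance(tool, dict):
            tool.pop("strict", None)

    # Inject a top-level model for providers that require it in the body.
    model = os.environ.get("LLM_MODEL")
    if model and "model" not in cleaned:
        cleaned["model"] = model

    return cleaned
```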
Copilot AI review requested due to automatic review settings April 22, 2026 23:24

Copilot AI left a comment


Pull request overview

Adds a new Moss→Agora Conversational AI integration (agora-moss) plus a runnable demo app that exposes Moss semantic search as a single MCP tool over streamable HTTP.

Changes:

  • Introduces packages/agora-moss: MossAgoraSearch adapter + create_mcp_app() FastMCP server exposing search_knowledge_base.
  • Adds apps/agora-moss: demo MCP server, index seeder, ConvoAI join-body starter, and an optional OpenAI-compat LLM proxy.
  • Updates repo documentation to list the new Agora integration.

Reviewed changes

Copilot reviewed 19 out of 23 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| `packages/agora-moss/uv.lock` | Locks Python dependencies for the new agora-moss package. |
| `packages/agora-moss/src/agora_moss/search.py` | Implements the Moss adapter and FastMCP tool exposure. |
| `packages/agora-moss/src/agora_moss/__init__.py` | Exposes the package's public API via `__all__`. |
| `packages/agora-moss/pyproject.toml` | Defines package metadata, dependencies, and tooling config. |
| `packages/agora-moss/README.md` | Package-level install/quickstart and API documentation. |
| `packages/agora-moss/LICENSE` | Adds BSD-2-Clause license file for the package. |
| `packages/agora-moss/tests/__init__.py` | Initializes the package test module. |
| `packages/agora-moss/tests/test_search.py` | Unit + env-gated integration test coverage for search + MCP tool. |
| `apps/agora-moss/pyproject.toml` | Demo app dependencies and tooling config. |
| `apps/agora-moss/server.py` | ASGI entrypoint wiring FastMCP server (and proxy mount). |
| `apps/agora-moss/llm_proxy.py` | Optional OpenAI-compat proxy for strict upstreams. |
| `apps/agora-moss/create_index.py` | Seeds a Moss index with sample docs for the demo. |
| `apps/agora-moss/start_agent.py` | Mints RTC token and posts ConvoAI join body wiring MCP/ASR/TTS/LLM. |
| `apps/agora-moss/env.example` | Documents required environment variables for demo usage. |
| `apps/agora-moss/moss_docs.json` | Sample documents used by the seeding script. |
| `apps/agora-moss/README.md` | End-to-end demo walkthrough and Docker usage instructions. |
| `apps/agora-moss/Dockerfile` | Builds a container image for serving the MCP server demo. |
| `apps/agora-moss/test_mcp_client.py` | Local script to call the MCP tool via streamable HTTP. |
| `apps/agora-moss/test_mcp_client_remote.py` | Remote script to call a deployed MCP endpoint. |
| `apps/agora-moss/tests/__init__.py` | Initializes the demo app test module. |
| `apps/agora-moss/tests/test_start_agent.py` | Tests join-body construction and MCP server naming constraints. |
| `README.md` | Adds the Agora demo and integration row to the repo's top-level docs. |


Comment thread apps/agora-moss/llm_proxy.py Outdated
Comment on lines +121 to +124
    try:
        return JSONResponse(r.json(), status_code=r.status_code)
    except Exception:
        return JSONResponse({"error": "non-json upstream"}, status_code=500)

Copilot AI Apr 22, 2026


If the upstream response isn’t JSON, the proxy returns a generic 500 and discards the upstream status/body. This makes debugging upstream schema errors difficult. Prefer returning the upstream status code and raw body (and ideally content-type) instead of forcing a 500.
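
One way to preserve the upstream response instead, sketched here with an illustrative helper name rather than code from this PR (r is the httpx response from the snippet above):

```python
import httpx
from starlette.responses import Response


def passthrough(r: httpx.Response) -> Response:
    # Hand back the upstream body, status, and content-type unchanged so callers
    # can see the real schema error instead of a generic 500.
    return Response(
        content=r.content,
        status_code=r.status_code,
        media_type=r.headers.get("content-type", "application/octet-stream"),
    )
```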

Comment thread apps/agora-moss/llm_proxy.py Outdated
Comment on lines +99 to +101
    async def gen():
        client = httpx.AsyncClient(timeout=60)
        try:

Copilot AI Apr 22, 2026


The streaming path creates an httpx.AsyncClient inside the async generator without a context manager. On client disconnect/cancellation, cleanup can be fragile and may leak connections under load. Prefer using async with httpx.AsyncClient(...) (with appropriate cancellation shielding) or a shared client managed by app lifespan.
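
A lifespan-managed shared client along those lines could be sketched as follows (names and structure are illustrative, not necessarily the fix that later landed in this PR):

```python
from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # One pooled client for the whole app; connections are closed on shutdown
    # rather than relying on an async generator's teardown.
    app.state.http = httpx.AsyncClient(timeout=60)
    try:
        yield
    finally:
        await app.state.http.aclose()


app = FastAPI(lifespan=lifespan)
```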

Comment thread apps/agora-moss/test_mcp_client_remote.py
Comment on lines +6 to +7
URL = "https://attributable-marni-electrosurgical.ngrok-free.dev/mcp"


Copilot AI Apr 22, 2026


This script hard-codes a specific ngrok URL. That will quickly go stale and can unintentionally leak an endpoint in the repo. Consider reading the URL from an env var/CLI arg (with a placeholder default) or moving this to a clearly labeled local-only example file.

Comment thread apps/agora-moss/server.py Outdated
Comment on lines +51 to +54
from llm_proxy import app as proxy_app # noqa: E402
from starlette.routing import Mount # noqa: E402

app.router.routes.insert(0, Mount("/llm", app=proxy_app))

Copilot AI Apr 22, 2026


llm_proxy is imported and mounted unconditionally. Because apps/agora-moss/llm_proxy.py reads LLM_PROXY_UPSTREAM at import time, the MCP server will crash on startup when that env var is unset (even if the proxy isn’t needed). Consider only importing/mounting the proxy when LLM_PROXY_UPSTREAM is present (or making llm_proxy safe to import when unset).

Suggested change:

-from llm_proxy import app as proxy_app  # noqa: E402
-from starlette.routing import Mount  # noqa: E402
-app.router.routes.insert(0, Mount("/llm", app=proxy_app))
+if os.environ.get("LLM_PROXY_UPSTREAM"):
+    from llm_proxy import app as proxy_app  # noqa: E402
+    from starlette.routing import Mount  # noqa: E402
+
+    app.router.routes.insert(0, Mount("/llm", app=proxy_app))


Comment thread apps/agora-moss/Dockerfile

COPY apps/agora-moss/pyproject.toml apps/agora-moss/
COPY apps/agora-moss/README.md apps/agora-moss/
COPY apps/agora-moss/server.py apps/agora-moss/

Copilot AI Apr 22, 2026


The image copies server.py but not apps/agora-moss/llm_proxy.py. Since server.py imports llm_proxy on startup, the container will crash with ModuleNotFoundError. Copy llm_proxy.py into the image or make the proxy import/mount optional.

Suggested change:

 COPY apps/agora-moss/server.py apps/agora-moss/
+COPY apps/agora-moss/llm_proxy.py apps/agora-moss/

Comment thread apps/agora-moss/llm_proxy.py Outdated
Comment on lines +27 to +29
UPSTREAM = os.environ["LLM_PROXY_UPSTREAM"]
UPSTREAM_KEY = os.environ.get("LLM_API_KEY", "")
MODEL = os.environ.get("LLM_MODEL")

Copilot AI Apr 22, 2026


UPSTREAM = os.environ["LLM_PROXY_UPSTREAM"] is evaluated at import time, which makes importing this module crash when the env var is unset (even if the proxy route isn’t mounted/used). Use a lazy lookup (e.g., inside the handler) or os.environ.get with a clear runtime error/disabled behavior.
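
A deferred lookup inside the handler, roughly matching the 503 behavior described in the follow-up commit further down, might look like this (the route path is inferred from the /llm/chat/completions mount; everything else is illustrative):

```python
import os

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


@app.post("/chat/completions")
async def chat_completions(body: dict):
    # Looked up per request, so importing the module never crashes when unset.
    upstream = os.environ.get("LLM_PROXY_UPSTREAM")
    if not upstream:
        return JSONResponse({"error": "LLM_PROXY_UPSTREAM is not configured"}, status_code=503)
    ...  # forward `body` to `upstream` and return its response
```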

Comment thread apps/agora-moss/llm_proxy.py Outdated
Comment on lines +104 to +108
            if r.status_code >= 400:
                err = b"".join([chunk async for chunk in r.aiter_bytes()])
                print(f"UPSTREAM err body: {err.decode('utf-8', 'replace')[:2000]}", flush=True)
                yield err
                return

Copilot AI Apr 22, 2026


When the upstream returns an error (>=400) during streaming, the generator yields the error body but StreamingResponse still returns HTTP 200 with text/event-stream. Consider checking r.status_code before returning a streaming response and, on error, returning a non-streaming response that preserves the upstream status/body.
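
One way to do that with httpx, sketched with illustrative names rather than this PR's code, is to open the upstream stream first and only commit to a StreamingResponse once the status is known to be healthy:

```python
import httpx
from starlette.background import BackgroundTask
from starlette.responses import Response, StreamingResponse


async def forward(client: httpx.AsyncClient, url: str, payload: dict, headers: dict):
    req = client.build_request("POST", url, json=payload, headers=headers)
    r = await client.send(req, stream=True)
    if r.status_code >= 400:
        # Drain and close: return the error with its real status, not a 200 SSE wrapper.
        body = await r.aread()
        await r.aclose()
        return Response(content=body, status_code=r.status_code,
                        media_type=r.headers.get("content-type"))
    # Healthy upstream: stream through and close the response when the client finishes.
    return StreamingResponse(r.aiter_bytes(), status_code=r.status_code,
                             media_type=r.headers.get("content-type", "text/event-stream"),
                             background=BackgroundTask(r.aclose))
```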

Comment thread apps/agora-moss/README.md Outdated
## Prerequisites

- Python 3.10+ and [uv](https://docs.astral.sh/uv/).
- Moss project credentials: `MOSS_PROJECT_ID`, `MOSS_PROJECT_KEY`.
Contributor


link?

…ling

Addresses review feedback on PR #174:

- server.py: only import/mount llm_proxy when LLM_PROXY_UPSTREAM is set,
  so the server starts cleanly out of the box (previously server.py crashed
  at startup with KeyError because llm_proxy.py read the env var at import
  time).
- llm_proxy.py: look up env vars inside the handler instead of at module
  load, so importing the module never crashes and a missing upstream
  produces a 503 JSON body instead.
- llm_proxy.py: one module-level httpx.AsyncClient reused across requests
  (previously a new client was created inside the streaming generator and
  could leak on cancellation).
- llm_proxy.py: propagate upstream status and body in all paths. A streaming
  error now returns a non-streaming response with the upstream status code
  instead of HTTP 200 text/event-stream wrapping a 4xx body. Non-streaming
  success passes through the upstream content-type.
- Dockerfile: COPY llm_proxy.py into the image alongside server.py so the
  conditional mount works inside the container too.
…nel URL)

The script hardcoded a developer-specific ephemeral ngrok URL that
wouldn't work for anyone else. test_mcp_client.py (localhost) covers
the same smoke-test use case; remote testing can trivially be done by
pointing that script at any public URL.
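
For reference, a localhost smoke test in the spirit of test_mcp_client.py might look roughly like this (the actual script may differ; the port, the query argument name, and the use of the official MCP Python SDK client are assumptions):

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

URL = "http://localhost:8000/mcp"  # local FastMCP streamable-HTTP endpoint; port is an assumption


async def main() -> None:
    async with streamablehttp_client(URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_knowledge_base", {"query": "What does Moss do?"}
            )
            print(result)


asyncio.run(main())
```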
vercel Bot commented Apr 22, 2026

The latest updates on your projects:

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| moss_nextjs | Ready | Preview, Comment | Apr 22, 2026 11:49pm |

ashvathsureshkumar merged commit 9479301 into main Apr 22, 2026
20 checks passed
Atulsingh1155 pushed a commit to Atulsingh1155/moss that referenced this pull request Apr 30, 2026
…MCP (usemoss#174)

