An information access platform that automatically downloads, converts, vectorizes, and provides search/chat APIs for NVIDIA DeepStream documentation.
DeepstreamMCP automatically downloads the NVIDIA DeepStream SDK documentation, converts it to text, indexes it into a vector database, and provides natural language search/chat capabilities. The goal is to enable fast and intelligent querying of technical documentation.
- Automatically downloads and updates DeepStream documentation
- Converts HTML documents to readable plain text
- Indexes all texts into a vector database (ChromaDB)
- Natural language search and sample document retrieval
- Smart chatbot interface integrated with Gemini LLM
- Tool-based API support via the MCP protocol
- download_docs.py: Downloads DeepStream documentation from the web (HTML).
- html2txt.py: Converts downloaded HTML files to readable plain text.
- vectorize_docs.py: Vectorizes all text files and adds them to ChromaDB.
- mcp_server.py: Provides search and sample document APIs over the vector database (via MCP protocol).
- client.py: Interactive client to connect to the MCP server and test tools.
- gemini_chatbot.py: Smart chatbot interface integrated with Gemini LLM and document search.
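As a rough sketch of how mcp_server.py might expose a search tool (the tool name, the stubbed keyword search, and the tiny in-memory corpus are illustrative; the real server queries the ChromaDB index):

```python
def search_docs(query: str, n_results: int = 3) -> list[str]:
    """Return the documents most relevant to the query.

    Stubbed with keyword overlap here; the real server queries ChromaDB.
    """
    corpus = [
        "DeepStream pipelines are built from GStreamer elements.",
        "nvinfer runs TensorRT inference inside a DeepStream pipeline.",
        "Gst-nvstreammux batches frames from multiple input sources.",
    ]
    scored = sorted(
        corpus,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:n_results]


def main() -> None:
    # MCP wiring, assuming the official Python SDK's FastMCP helper.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("deepstream_docs")
    server.tool()(search_docs)  # register search_docs as an MCP tool
    server.run()  # stdio transport, matching the mcp.json config below


if __name__ == "__main__":
    main()
```

Registering a plain function this way keeps the search logic testable without a running MCP session.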
docs (downloaded HTML) → docs_txt (plain text) → chroma_db (vector DB)
- Install the required dependencies:
uv sync
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv pip install sentence_transformers
- Download the documentation:
uv run python download_docs.py
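A minimal sketch of what the download step does (the filename scheme and the `download` helper are illustrative assumptions; see download_docs.py for the actual logic):

```python
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Derive a flat local filename from a documentation URL."""
    path = urlparse(url).path.strip("/")
    return path.replace("/", "_") or "index.html"


def download(url: str, out_dir: str = "docs") -> Path:
    """Fetch one page and save it under out_dir (requires requests)."""
    import requests  # listed in the project dependencies

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    target = Path(out_dir) / url_to_filename(url)
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    target.write_bytes(resp.content)
    return target
```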
- Convert HTML to text:
uv run python html2txt.py
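Roughly what the conversion step does; this sketch uses only the standard library, whereas html2txt.py relies on beautifulsoup4 and readability-lxml per the dependency list:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```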
- Build the vector database:
uv run python vectorize_docs.py
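A rough sketch of what vectorize_docs.py might do: split each text file into overlapping chunks, embed them, and add them to a ChromaDB collection. The chunk sizes, embedding model, and collection name are assumptions, not taken from the script:

```python
from pathlib import Path


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with some overlap for context."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks


def build_index(txt_dir: str = "docs_txt", db_dir: str = "chroma_db") -> None:
    """Embed every chunk and store it (requires chromadb + sentence_transformers)."""
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    client = chromadb.PersistentClient(path=db_dir)
    collection = client.get_or_create_collection("deepstream_docs")
    for path in Path(txt_dir).glob("*.txt"):
        chunks = chunk_text(path.read_text(encoding="utf-8"))
        collection.add(
            ids=[f"{path.stem}-{i}" for i in range(len(chunks))],
            documents=chunks,
            embeddings=model.encode(chunks).tolist(),
        )
```

Overlapping chunks help queries match passages that would otherwise be cut in half at a chunk boundary.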
- Run the MCP server:
uv run python mcp_server.py

To use this project as an MCP server in GitHub Copilot Chat:
- Open Copilot Chat and go to Configure Tools.
- Scroll down and click Add More Tools → Add MCP Server.
- For Command (stdio), enter:
uv run --directory C:/Users/mehmu/OneDrive/Masaüstü/DeepstreamMCP mcp_server.py
⚠️ Note: Adjust the path after --directory to match your own workspace location.
- Set the Server ID to:
deepstream_docs_http
- For Workspace, select Global.
- When prompted, your mcp.json should look like this (with your correct path):
{
  "servers": {
    "deepstream_docs_http": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "C:/Users/mehmu/OneDrive/Masaüstü/DeepstreamMCP",
        "mcp_server.py"
      ],
      "type": "stdio"
    }
    // ...other servers...
  },
  "inputs": []
}
or with the interactive client:
uv run python client.py mcp_server.py

or with the Gemini chatbot:
uv run python gemini_chatbot.py mcp_server.py

Requirements:
- Python 3.12+
- torch, torchvision, torchaudio
- sentence_transformers
- chromadb
- beautifulsoup4, readability-lxml
- requests
- mcp, mcp-cli
- google-generativeai, python-dotenv
See requirements.txt and pyproject.toml for the full list of dependencies.