vshalsh/DocAI-MCP

DocAI-MCP (Ollama Local LLM Edition)

Modular, fully local document analysis and Q&A powered by Model Context Protocol (MCP) and Agent-to-Agent (A2A) orchestration.
All AI runs on your own hardware using Ollama and the Mistral large language model.


🚀 What is DocAI-MCP?

  • Upload a PDF, DOCX, or TXT file – get an instant summary, ask questions, or translate summaries.
  • Private by default: All processing happens locally—no cloud, no external API calls, no data leaves your machine.
  • Composable Agent Architecture (MCP, A2A): Specialized “agents” for summarization, Q&A, and translation, working together with explicit, auditable messaging.
  • No API keys or subscriptions required.
  • Simple, browser-based Gradio UI.

⚡ Quick Start

1. Install Ollama

Ollama lets you run open-source large language models like Mistral on your PC, Mac, Linux server, or WSL.

Linux / WSL

curl -fsSL https://ollama.com/install.sh | sh

macOS

brew install ollama

Windows

Ollama provides a native Windows installer and can also run under WSL2. See Ollama Windows setup for instructions.

For full details or troubleshooting, see the Ollama documentation.


2. Download the Mistral Model

After installing Ollama, run:

ollama pull mistral

This downloads the Mistral model (~4 GB). You can swap in any Ollama-compatible model (see the FAQ below).


3. Start the Ollama Server

ollama serve

By default, Ollama will serve its API at http://localhost:11434.

Tip: You can run Ollama as a background service or leave the terminal open.
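
To confirm the server is reachable before starting the app, you can send a prompt to it directly. The following is a minimal sketch using only the Python standard library and Ollama's `/api/generate` endpoint; the helper names `build_generate_request` and `generate` are illustrative and are not taken from this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address from the step above


def build_generate_request(prompt, model="mistral"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="mistral", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With `ollama serve` running and the model pulled, `print(generate("Reply with the single word: ready"))` should print a short reply. If the server is not running, the call raises `urllib.error.URLError`.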


4. Clone and Run DocAI-MCP

git clone https://github.com/vishalshell/DocAI-MCP.git
cd DocAI-MCP
pip install -r requirements.txt
python app.py

Open your browser to http://localhost:7860.

  • Upload a document, get a summary, ask questions, and translate—all without sending data to the cloud!

🏗️ Folder Structure

DocAI-MCP/
├── app.py
├── agents/
│   ├── summarizer.py
│   ├── qna.py
│   └── translator.py
├── protocols/
│   └── mcp.py
├── utils/
│   ├── loader.py
│   └── config.py
├── requirements.txt
├── LICENSE
├── README.md
└── WIKI.md

🤖 FAQ

How private is this? 100% private: your files and queries never leave your machine.

What is MCP and A2A? See WIKI.md for details. In short: Each function (summary, Q&A, translation) is handled by a dedicated “agent” via formal, traceable messages. This makes the system modular, extensible, and easy to audit or extend.
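
As a rough illustration of the idea, the sketch below routes a message to a named agent and logs both the request and the reply. It is not the code in protocols/mcp.py: the `Message` and `Router` classes, their fields, and the `fake_summarizer` stand-in are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str     # who produced the message, e.g. "ui"
    recipient: str  # which agent should handle it
    task: str       # e.g. "summarize", "answer", "translate"
    payload: str    # the text the agent works on


@dataclass
class Router:
    agents: Dict[str, Callable[[Message], str]] = field(default_factory=dict)
    log: List[Message] = field(default_factory=list)

    def register(self, name, handler):
        self.agents[name] = handler

    def send(self, message):
        self.log.append(message)  # record the request for auditing
        reply_text = self.agents[message.recipient](message)
        reply = Message(message.recipient, message.sender, message.task, reply_text)
        self.log.append(reply)    # record the reply as well
        return reply


def fake_summarizer(message):
    # Stand-in for an LLM call: keep only the first sentence.
    return message.payload.split(".")[0] + "."


router = Router()
router.register("summarizer", fake_summarizer)
reply = router.send(
    Message("ui", "summarizer", "summarize", "Ollama runs locally. Nothing is uploaded.")
)
print(reply.payload)
```

Because every request and reply passes through `send`, the `log` list gives a complete trace of which agent handled what, which is the auditing property described above.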

Can I use a different model? Yes! Replace "mistral" with your preferred model (e.g., "llama2", "mixtral") in agents/summarizer.py, agents/qna.py, and agents/translator.py. Just run ollama pull MODELNAME first.
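
If you would rather not edit three files each time, one option is to read the model name from a single place. This is only a sketch of that design choice: the `DOCAI_MODEL` environment variable and the `resolve_model` helper are assumptions and do not exist in the repository as described here.

```python
import os

DEFAULT_MODEL = "mistral"


def resolve_model(env=None):
    """Return the model name from DOCAI_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get("DOCAI_MODEL", "").strip()
    return name or DEFAULT_MODEL
```

Each agent could then call `resolve_model()` instead of hard-coding "mistral", so switching models becomes `DOCAI_MODEL=llama2 python app.py`.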

Is translation as good as Google Translate? No. Prompt-based LLM translation is often good enough for casual or business use, but it is less accurate than a specialized translation API. For legal or medical translation, use dedicated tools.

How fast is this? Depends on your hardware. On a modern GPU, Mistral can answer in seconds; on CPU, it’s slower.


🧰 Troubleshooting

  • “Connection refused” – Is ollama serve running? Have you downloaded the model (ollama pull mistral)?
  • For very large files, only the first ~3000 characters are used per request (adjustable in the code).
  • Some PDF/DOCX files with heavy formatting, images, or tables may not extract all text. For best results, use clear, text-rich documents.
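
The character limit mentioned above can be pictured as a simple slice. The sketch below shows that truncation, plus a chunking alternative for covering a whole document in several requests. The names `MAX_CHARS`, `truncate`, and `chunk` are illustrative; the actual limit and where it lives in the code may differ.

```python
MAX_CHARS = 3000  # mirrors the ~3000-character limit mentioned above


def truncate(text, limit=MAX_CHARS):
    """Keep only the first `limit` characters of the text."""
    return text[:limit]


def chunk(text, limit=MAX_CHARS):
    """Split text into consecutive pieces of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

Truncation is cheap but ignores everything past the limit. Chunking covers the full document at the cost of one model call per piece, and the per-piece results would still need to be combined.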

🔐 Security and License

MIT License – see LICENSE

MIT License

Copyright (c) 2024 Vishal

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


NO WARRANTY and NO GUARANTEE:
This software is provided for use "AS IS", without any warranty or guarantee, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement.
You use this software entirely at your own risk. The authors and copyright holders are not liable for any claim, damage, or other liability that may arise from its use.



📝 Extending This Project

  • Swap LLMs, add speech-to-text, OCR, or image agents.
  • See WIKI.md for best practices, architectural explanations, and more.

🤝 Credits


📚 More Info

  • Full architecture, agent details, extensibility, and advanced usage in WIKI.md
  • Open an issue for support or feature requests.
