A powerful document chat system that supports multimodal interactions with documents containing both text and images.
-
Document Processing:
- Support for Markdown documents with embedded images
- Automatic text chunking with image preservation
- Vector embeddings using BAAI/bge-large-zh
- FAISS vector store for efficient similarity search
-
Chat Capabilities:
- Context-aware responses using InternLM XComposer
- Support for multimodal interactions (text + images)
- Chat history management
- Streaming responses
- Install dependencies:
pip install -r requirements.txt- Start the backend server:
cd backend
python run.pyThe server will start at http://localhost:8000
POST /api/documents/upload- Upload a markdown documentGET /api/documents/list- List all processed documentsDELETE /api/documents/{document_name}- Delete a document
POST /api/chat/chat- Chat with a documentPOST /api/chat/clear-history- Clear chat history
- Prepare your markdown document with embedded images
- Upload the document using the upload endpoint
- Start chatting with the document using the chat endpoint
- Python 3.8+
- CUDA-capable GPU (recommended)
- 16GB+ RAM
- Text Embeddings: BAAI/bge-large-zh
- Multimodal Chat: internlm/internlm-xcomposer2d5-7b
- 支持pdf、word、txt上传,如果其中也有图片的话,需要类似markdown格式一样处理
- 将VLM和sentence模型换成 api
streamlit run Home.py cd /home/lvshuhang/pdfChat && pkill -f "uvicorn backend.app.main:app" || true && uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
Figure 2 | AIME accuracy of DeepSeek-R1-Zero 中展示了什么内容,请你用中文回答