PDFChat System

A powerful document chat system that supports multimodal interactions with documents containing both text and images.

Features

Document Processing:
- Support for Markdown documents with embedded images
- Automatic text chunking with image preservation
- Vector embeddings using BAAI/bge-large-zh
- FAISS vector store for efficient similarity search

Chat Capabilities:
- Context-aware responses using InternLM XComposer
- Support for multimodal interactions (text + images)
- Chat history management
- Streaming responses
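
To illustrate what "text chunking with image preservation" can mean in practice, here is a minimal sketch in plain Python. It is not the project's actual chunker: the function name `chunk_markdown`, the `max_chars` parameter, and the paragraph-based splitting strategy are all assumptions made for illustration. The idea is to split only at blank lines, so a Markdown image reference is never cut in half and each chunk records the image paths it contains.

```python
import re
from typing import Dict, List

# Matches Markdown image references such as ![caption](images/fig1.png)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def _make_chunk(paragraphs: List[str]) -> Dict[str, object]:
    """Join paragraphs into one chunk and collect the image paths it references."""
    body = "\n\n".join(paragraphs)
    return {"text": body, "images": IMAGE_PATTERN.findall(body)}


def chunk_markdown(text: str, max_chars: int = 500) -> List[Dict[str, object]]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: paragraphs are never split, so an image reference
    always stays whole. A single paragraph longer than max_chars becomes
    its own oversized chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[Dict[str, object]] = []
    current: List[str] = []
    current_len = 0

    for para in paragraphs:
        # Two characters account for the blank line that joins paragraphs.
        sep = 2 if current else 0
        if current and current_len + sep + len(para) > max_chars:
            chunks.append(_make_chunk(current))
            current, current_len, sep = [], 0, 0
        current.append(para)
        current_len += sep + len(para)

    if current:
        chunks.append(_make_chunk(current))
    return chunks


if __name__ == "__main__":
    sample = (
        "Intro paragraph.\n\n"
        "![Figure 1](images/fig1.png)\n\n"
        "Second paragraph about the figure."
    )
    for chunk in chunk_markdown(sample, max_chars=50):
        print(chunk["images"], repr(chunk["text"]))
```

In a full pipeline, each chunk's text would then be embedded and stored in the vector index, with the image list kept as metadata so the images can be passed to the multimodal model at answer time.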

Setup

Install dependencies:

pip install -r requirements.txt

Start the backend server:

cd backend
python run.py

The server will start at http://localhost:8000

API Endpoints

Documents

POST /api/documents/upload - Upload a Markdown document
GET /api/documents/list - List all processed documents
DELETE /api/documents/{document_name} - Delete a document

Chat

POST /api/chat/chat - Chat with a document
POST /api/chat/clear-history - Clear chat history
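
The endpoints above could be called from a small client like the sketch below, which uses only the Python standard library. The paths and base URL come from this README, but the request body is an assumption: the JSON field names `document_name` and `message` are hypothetical, and the real schema should be checked in the backend code. The builder functions only construct requests; `send` performs the network call and needs a running server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # address given in the Setup section


def build_chat_request(document_name: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/chat/chat.

    ASSUMPTION: the JSON keys "document_name" and "message" are
    hypothetical placeholders, not the confirmed backend schema.
    """
    payload = json.dumps(
        {"document_name": document_name, "message": question}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/chat/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_request() -> urllib.request.Request:
    """Build a GET request for /api/documents/list."""
    return urllib.request.Request(url=f"{BASE_URL}/api/documents/list", method="GET")


def build_delete_request(document_name: str) -> urllib.request.Request:
    """Build a DELETE request for /api/documents/{document_name}."""
    # Percent-encode the name so spaces and slashes are safe inside the path.
    quoted = urllib.parse.quote(document_name, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/documents/{quoted}", method="DELETE"
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the backend to be running locally.
    print(send(build_list_request()))
```

Separating request construction from sending keeps the payload easy to inspect and adjust once the actual field names are known.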

Usage Example

1. Prepare your Markdown document with embedded images.
2. Upload the document with POST /api/documents/upload.
3. Chat with the document using POST /api/chat/chat.

Requirements

Python 3.8+
CUDA-capable GPU (recommended)
16GB+ RAM

Models Used

Text Embeddings: BAAI/bge-large-zh
Multimodal Chat: internlm/internlm-xcomposer2d5-7b

TODO:

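- Support uploading PDF, Word, and TXT files; if they contain images, process them the same way as Markdown documents
- Replace the locally hosted VLM and sentence-embedding models with API calls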

Run the Streamlit app:

streamlit run Home.py

Restart the backend with auto-reload:

cd /home/lvshuhang/pdfChat && pkill -f "uvicorn backend.app.main:app" || true && uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

Example query: What does "Figure 2 | AIME accuracy of DeepSeek-R1-Zero" show? Please answer in Chinese.
