DocsRay is a Universal Document Question-Answering System that combines embedding models and multimodal LLMs in a Coarse-to-Fine retrieval-augmented generation (RAG) pipeline. It integrates with Claude Desktop through MCP (Model Context Protocol) and provides directory management, visual content analysis, and a hybrid OCR system.
# 1. Install DocsRay
pip install docsray
# 1-1. Tesseract OCR (optional)
# For faster OCR, install Tesseract with the appropriate language pack.
#pip install pytesseract
#sudo apt-get install tesseract-ocr # Debian/Ubuntu
#sudo apt-get install tesseract-ocr-kor
#brew install tesseract # macOS
#brew install tesseract-lang # macOS language packs (includes Korean)
# 1-2. llama_cpp_python rebuild (recommended for CUDA)
#CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
# 2. Download required models (approximately 8GB)
docsray download-models
# 3. Configure Claude Desktop integration (optional)
docsray configure-claude
# 4. Start using DocsRay
docsray web # Launch Web UI
- Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
- Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
- Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
- Adaptive Performance: Automatically optimizes based on available system resources
- Multi-Model Support: Uses BGE-M3, E5-Large, and Gemma-3-4B models
- MCP Integration: Seamless integration with Claude Desktop
- Multiple Interfaces: Web UI, API server, CLI, and MCP server
- Directory Management: Advanced PDF directory handling and caching
- Multi-Language: Supports multiple languages including Korean and English
- Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
- Universal Document Support: Automatically converts 30+ file formats to PDF for processing
- Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more
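The Coarse-to-Fine search named in the feature list can be illustrated with a minimal sketch. It assumes each section has one representative vector and each chunk has its own vector; the function names, the dictionary layout, and the toy two-dimensional vectors are hypothetical and do not reflect DocsRay's internal API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def coarse_to_fine(query_vec, sections, chunks, top_sections=2, top_chunks=3):
    """Rank sections first, then rank only the chunks inside the best sections.

    sections: list of {"id": ..., "vec": [...]}      (hypothetical layout)
    chunks:   list of {"section_id": ..., "text": ..., "vec": [...]}
    """
    # Coarse stage: keep the sections whose representative vector is closest.
    ranked = sorted(sections, key=lambda s: cosine(query_vec, s["vec"]), reverse=True)
    keep = {s["id"] for s in ranked[:top_sections]}
    # Fine stage: score only the chunks that belong to the surviving sections.
    candidates = [c for c in chunks if c["section_id"] in keep]
    candidates.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return candidates[:top_chunks]

sections = [
    {"id": "intro", "vec": [1.0, 0.0]},
    {"id": "results", "vec": [0.0, 1.0]},
]
chunks = [
    {"section_id": "intro", "text": "background", "vec": [0.9, 0.1]},
    {"section_id": "results", "text": "revenue table", "vec": [0.1, 0.9]},
    {"section_id": "results", "text": "growth chart", "vec": [0.3, 0.7]},
]
hits = coarse_to_fine([0.0, 1.0], sections, chunks, top_sections=1, top_chunks=2)
print([c["text"] for c in hits])
```

The coarse stage limits how many chunks the fine stage has to score, which is where the speed-up over a flat search would come from.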
DocsRay now automatically converts various document formats to PDF for processing:
Office Documents
- Microsoft Word (.docx, .doc*)
- Microsoft Excel (.xlsx, .xls)
- Microsoft PowerPoint (.pptx, .ppt)
*Note on .doc files: Legacy .doc format requires additional dependencies. For best compatibility, please save as .docx format or install optional dependencies with pip install docsray[doc]
Text Formats
- Plain Text (.txt)
Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- WebP (.webp)
Simply load any supported file type, and DocsRay will:
- Automatically detect the file format
- Convert it to PDF in the background
- Process it with all the same features as native PDFs
- Clean up temporary files automatically
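The format-detection step can be sketched as a lookup on the file extension. This is only an illustration: the `CONVERTIBLE` set contains just the extensions listed in this README (the full list is longer), and `plan_for` is a hypothetical helper, not part of the DocsRay package.

```python
from pathlib import Path

# Subset of extensions named in this README; the real list is longer.
CONVERTIBLE = {
    ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".txt",
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif", ".webp",
}

def plan_for(path):
    """Return 'native', 'convert', or 'unsupported' for a file path."""
    ext = Path(path).suffix.lower()  # case-insensitive match on the extension
    if ext == ".pdf":
        return "native"       # PDFs need no conversion
    if ext in CONVERTIBLE:
        return "convert"      # would be converted to PDF before processing
    return "unsupported"

for name in ["report.PDF", "slides.pptx", "archive.zip"]:
    print(name, "->", plan_for(name))
```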
# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png
For Microsoft Word .doc files (legacy format), DocsRay will attempt multiple conversion methods:
- First, it tries to extract content without external dependencies
- If that fails, it will provide clear instructions
Recommended solutions for .doc files:
- Best option: Save the file as .docx format in Microsoft Word
- Alternative: Install optional dependencies:
pip install docsray[doc] # or individually: pip install python-docx docx2txt
- Last resort: Convert to PDF manually and upload the PDF
Note: The newer .docx format is strongly recommended over .doc for better compatibility and features.
DocsRay includes AI-powered OCR based on Gemma-3-4B. To use Tesseract OCR instead, install Tesseract with the language packs you need:
sudo apt-get install tesseract-ocr # Debian/Ubuntu
sudo apt-get install tesseract-ocr-kor
brew install tesseract # macOS
brew install tesseract-lang # macOS language packs (includes Korean)
Automatically detects system resources and optimizes performance:
System Memory | Mode | OCR | Visual Analysis | Max Tokens |
---|---|---|---|---|
CPU | FAST (Q4) | β | β | 8K |
< 16GB | FAST (Q4) | β | β | 8K |
16-24GB | STANDARD (Q8) | β | β | 16K |
> 24GB | FULL_FEATURE (F16) | β | β | 32K |
- Cache Management: clear_all_cache, get_cache_info
- Improved Summarization: Batch processing with section-by-section caching
- Detail Levels: Adjustable summary detail (brief/standard/detailed)
DocsRay/
├── docsray/ # Main package directory
│   ├── __init__.py # Package init with FAST_MODE detection
│   ├── chatbot.py # Core chatbot functionality
│   ├── mcp_server.py # MCP server with directory management
│   ├── app.py # FastAPI server
│   ├── web_demo.py # Gradio web interface
│   ├── download_models.py # Model download utility
│   ├── cli.py # Command-line interface
│   ├── inference/
│   │   ├── embedding_model.py # Embedding model implementations
│   │   ├── gemma3_handler.py # Handler for Gemma3 vision input
│   │   └── llm_model.py # LLM implementations (including multimodal)
│   ├── scripts/
│   │   ├── pdf_extractor.py # Enhanced PDF extraction with visual analysis
│   │   ├── chunker.py # Text chunking logic
│   │   ├── build_index.py # Search index builder
│   │   └── section_rep_builder.py
│   ├── search/
│   │   ├── section_coarse_search.py
│   │   ├── fine_search.py
│   │   └── vector_search.py
│   └── utils/
│       └── text_cleaning.py
├── setup.py # Package configuration
├── pyproject.toml # Modern Python packaging
├── requirements.txt # Dependencies
├── LICENSE
└── README.md
pip install docsray
git clone https://github.com/MIMICLab/DocsRay.git
cd DocsRay
pip install -e .
# Download models (required for first-time setup)
docsray download-models
# Check model status
docsray download-models --check
# Process a PDF with visual analysis
docsray process /path/to/document
# Ask questions about a processed PDF
docsray ask "What is the main topic?" --doc document.pdf
# Start web interface
docsray web
# Start API server
docsray api --doc /path/to/document.pdf --port 8000
# Start MCP server
docsray mcp
docsray web
Access the web interface at http://localhost:44665.
Features:
- Upload and process PDFs with visual content analysis
- Ask questions about document content including images and charts
- Manage multiple PDFs with caching
- Customize system prompts
docsray api --doc /path/to/document
Example API usage:
# Ask a question
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What does the chart on page 5 show?"}'
# Get PDF info
curl http://localhost:8000/info
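The same `/ask` call can be made from Python with the standard library. This sketch assumes the server started by `docsray api` is listening on port 8000 and that it replies with JSON; the response shape is not documented here, so `ask` simply returns the decoded body. `build_ask_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

def build_ask_request(question, base_url="http://localhost:8000"):
    """Build the POST request that mirrors the curl example."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question, base_url="http://localhost:8000", timeout=60):
    """Send the question and decode the reply (assumed to be JSON)."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network; ask() does.
req = build_ask_request("What does the chart on page 5 show?")
print(req.get_method(), req.full_url)
```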
from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder
# Process any document type - auto-conversion handled internally
extracted = pdf_extractor.extract_content(
"report.docx", # Can be DOCX, XLSX, PNG, HTML, etc.
analyze_visuals=True,
visual_analysis_interval=1
)
# Create chunks and build index
chunks = chunker.process_extracted_file(extracted)
chunk_index = build_index.build_chunk_index(chunks)
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)
# Initialize chatbot
chatbot = PDFChatBot(sections, chunk_index)
# Ask questions
answer, references = chatbot.answer("What are the key trends shown in the graphs?")
- Configure Claude Desktop:
docsray configure-claude
- Restart Claude Desktop
- Start using DocsRay in Claude
- "What's my current PDF directory?" - Show current working directory
- "Set my PDF directory to /path/to/documents" - Change working directory
- "Show me information about /path/to/pdfs" - Get directory details
- "Get recommended search paths" - Show common document locations for your OS
- "List all documents in my current directory" - List all supported files (not just PDFs)
- "Load the document named "report.docx"" - Load any supported file type
- "What file types are supported?" - Show list of supported formats
- "Process all documents in current directory" - Batch process with summaries
- "Search for documents about machine learning" - Content-based semantic search
- "Find and load the quarterly report" - Search and auto-load best match
- "Search for PDF files in my home directory" - File system search
- "Find all Excel files modified this month" - Advanced file search with filters
- "What charts or figures are in this document?" - List visual elements
- "Describe the diagram on page 10" - Get specific visual descriptions
- "What data is shown in the graphs?" - Analyze data visualizations
- "Enable/disable visual analysis" - Toggle visual content processing
- "What is the main topic of this document?" - Ask questions about loaded document
- "Summarize this document briefly" - Generate brief summary with embeddings
- "Create a detailed summary" - Comprehensive section-by-section summary
- "Show all document summaries" - View all generated summaries
- "Clear all cache" - Remove all cached files
- "Show cache info" - Display cache statistics and details
- "How much cache space is being used?" - Check cache storage
Process all documents in /path/to/folder with brief summaries
- Processes multiple documents at once
- Generates summaries with embeddings for semantic search
- Supports brief/standard/detailed summary levels
- Caches results for faster access
- File System Search (search_files)
  - Recursively search directories
  - Filter by file type, size, date
  - Exclude system directories
  - Returns file paths and metadata
- Content Search (search_by_content)
  - Semantic search using summary embeddings
  - GPU-accelerated similarity computation
  - Returns relevance scores
  - Works only on processed documents
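A file system search with the filters listed above can be sketched with `os.walk`. This is an illustration rather than the code behind the `search_files` tool: `find_documents`, its parameters, and the `SKIP_DIRS` exclusion list are assumptions made for the example.

```python
import os
from pathlib import Path

# Hypothetical exclusion list; the directories DocsRay actually skips may differ.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".cache"}

def find_documents(root, extensions=None, min_size=0, modified_after=None):
    """Walk root recursively and return metadata for matching files.

    extensions:     iterable such as {".pdf", ".docx"}, or None for any type
    min_size:       minimum size in bytes
    modified_after: Unix timestamp, or None to ignore modification time
    """
    wanted = {e.lower() for e in extensions} if extensions else None
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if wanted and path.suffix.lower() not in wanted:
                continue
            info = path.stat()
            if info.st_size < min_size:
                continue
            if modified_after is not None and info.st_mtime < modified_after:
                continue
            results.append(
                {"path": str(path), "size": info.st_size, "modified": info.st_mtime}
            )
    return results
```

For example, `find_documents(Path.home() / "Documents", extensions={".pdf"}, min_size=10 * 1024 * 1024)` would list PDFs of at least 10 MB.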
Analyze the path /Users/john/Documents for search complexity
- Estimates document count
- Predicts search time
- Provides complexity assessment
- Recommends search strategies
1. "Get recommended search paths"
2. "Search for all PDF files in Documents folder"
3. "Process all documents with brief summaries"
4. "Search by content for budget analysis"
5. "Load the best match"
1. "Set directory to my research papers"
2. "Process all documents"
3. "Search for papers about neural networks"
4. "Generate detailed summary of current document"
5. "What methodology was used in this paper?"
1. "Enable visual analysis"
2. "Load presentation.pptx"
3. "What charts are in this presentation?"
4. "Describe the diagram on slide 5"
Process only PDF and DOCX files
Search documents modified after 2024-01-01
Find files larger than 10MB
Generate standard summaries for all documents
Process documents without visual analysis
Use coarse search for faster results
Limit processing to 50 files
- First Time Setup: Claude will automatically find your Documents folder
- Batch Processing: Process entire directories before starting research
- Smart Search: Use content search for processed docs, file search for discovery
- Cache Management: Clear cache periodically to free space
- Visual Analysis: Disable for faster processing of text-only documents
# Custom data directory (default: ~/.docsray)
export DOCSRAY_HOME=/path/to/custom/directory
# Force specific mode
export DOCSRAY_FAST_MODE=1 # Force FAST_MODE
# Model paths (optional)
export DOCSRAY_MODEL_DIR=/path/to/models
from docsray import FAST_MODE, FULL_FEATURE_MODE, MAX_TOKENS
print(f"Fast Mode: {FAST_MODE}")
print(f"Full Feature Mode: {FULL_FEATURE_MODE}")
print(f"Max Tokens: {MAX_TOKENS}")
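How these flags could follow from the resource table earlier in this README is sketched below. It assumes the thresholds are exactly those in the table (below 16GB, 16 to 24GB, above 24GB) and that a CPU-only machine always gets FAST mode; `select_mode` is a hypothetical function, not the detection code in `docsray/__init__.py`.

```python
def select_mode(memory_gb, has_gpu=True):
    """Map available memory to (mode, quantization, max_tokens) per the table.

    Values are returned as the table prints them, e.g. "8K", to avoid
    guessing the exact token counts.
    """
    if not has_gpu or memory_gb < 16:
        return ("FAST", "Q4", "8K")
    if memory_gb <= 24:
        return ("STANDARD", "Q8", "16K")
    return ("FULL_FEATURE", "F16", "32K")

for gb in (8, 16, 24, 48):
    print(gb, select_mode(gb))
print("cpu-only", select_mode(64, has_gpu=False))
```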
DocsRay stores data in the following locations:
- Models: ~/.docsray/models/
- Cache: ~/.docsray/cache/
- User Data: ~/.docsray/data/
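The interaction between these defaults and the `DOCSRAY_HOME` and `DOCSRAY_MODEL_DIR` variables can be sketched as follows. The precedence shown (environment variable first, then the default under `~/.docsray`) is an assumption, and `resolve_paths` is a hypothetical helper rather than a DocsRay function.

```python
import os
from pathlib import Path

def resolve_paths(env=None):
    """Return the models, cache, and data directories for a given environment."""
    env = os.environ if env is None else env
    # DOCSRAY_HOME replaces the default ~/.docsray base directory.
    home = Path(env.get("DOCSRAY_HOME") or Path.home() / ".docsray")
    # DOCSRAY_MODEL_DIR, when set, overrides only the models location.
    model_dir = env.get("DOCSRAY_MODEL_DIR")
    models = Path(model_dir) if model_dir else home / "models"
    return {"models": models, "cache": home / "cache", "data": home / "data"}

paths = resolve_paths({"DOCSRAY_HOME": "/srv/docsray"})
print(paths["models"], paths["cache"], paths["data"])
```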
DocsRay uses the following models (automatically downloaded):
Model | Size | Purpose |
---|---|---|
bge-m3 | 1.7GB | Multilingual embedding model |
multilingual-e5-large | 1.2GB | Multilingual embedding model |
Gemma-3-4B | 4.1GB | Main answer generation & visual analysis |
Total storage requirement: ~8GB
- Recommended: FULL_FEATURE_MODE (ensure sufficient RAM)
- GPU acceleration essential
- Adjust visual_analysis_interval for batch processing
- Recommended: Standard mode
- Switch to FAST_MODE when needed
- Analyze visuals only on important pages
- Use FAST_MODE
- Process text-based PDFs only
- Leverage caching aggressively
[Figure 1 on page 3]: This is a bar chart showing quarterly revenue growth
from Q1 2023 to Q4 2023. The y-axis represents revenue in millions of dollars
ranging from 0 to 50. Each quarter shows progressive growth with Q1 at $12M,
Q2 at $18M, Q3 at $28M, and Q4 at $42M. The trend indicates strong
growth of approximately 250% from Q1 to Q4.
[Figure 2 on page 5]: A flowchart diagram illustrating the data processing
pipeline. The flow starts with "Data Input" at the top, branches into three
parallel processes: "Validation", "Transformation", and "Enrichment", which
then converge at "Data Integration" before ending at "Output Database".
[Table 1 on page 7]: A comparison table with 4 columns (Product, Q1 Sales,
Q2 Sales, Growth %) and 5 rows of data. Product A shows the highest growth
at 45%, while Product C has the highest absolute sales in Q2 at $2.3M.
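Descriptions in this format can be pulled out of extracted text with a regular expression. The sketch assumes the markers always look like `[Figure N on page M]:` or `[Table N on page M]:`, as in the sample above; `parse_visuals` is a hypothetical helper, and it treats everything up to the next marker as the description.

```python
import re

# Matches markers such as "[Figure 1 on page 3]: " or "[Table 1 on page 7]: ".
MARKER = re.compile(r"\[(Figure|Table) (\d+) on page (\d+)\]:\s*")

def parse_visuals(text):
    """Split text into records keyed by the visual-content markers."""
    matches = list(MARKER.finditer(text))
    records = []
    for i, match in enumerate(matches):
        # A description runs until the next marker or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        description = " ".join(text[match.end():end].split())
        records.append({
            "kind": match.group(1),
            "number": int(match.group(2)),
            "page": int(match.group(3)),
            "description": description,
        })
    return records

sample = (
    "[Figure 1 on page 3]: This is a bar chart showing quarterly revenue growth\n"
    "from Q1 2023 to Q4 2023.\n"
    "[Table 1 on page 7]: A comparison table with 4 columns."
)
for record in parse_visuals(sample):
    print(record["kind"], record["number"], record["page"])
```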
# Check model status
docsray download-models --check
# Manual download (if automatic download fails)
# Download models from HuggingFace and place in ~/.docsray/models/
If you encounter out-of-memory errors:
- Check current mode:
from docsray import FAST_MODE, MAX_TOKENS
print(f"FAST_MODE: {FAST_MODE}")
print(f"MAX_TOKENS: {MAX_TOKENS}")
- Force FAST_MODE:
export DOCSRAY_FAST_MODE=1
- Reduce visual analysis frequency:
extracted = pdf_extractor.extract_pdf_content(
    pdf_path,
    analyze_visuals=True,
    visual_analysis_interval=5  # Analyze every 5th page
)
# Reinstall with GPU support
pip uninstall llama-cpp-python
# For CUDA
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir
# For Metal
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-cache-dir
- Ensure all models are downloaded:
docsray download-models
- Reconfigure Claude Desktop:
docsray configure-claude
- Check MCP server logs:
docsray mcp
sudo apt-get install tesseract-ocr # Debian/Ubuntu
sudo apt-get install tesseract-ocr-kor
brew install tesseract # macOS
brew install tesseract-lang # macOS language packs (includes Korean)
If you see "No suitable converter found":
- Check system dependencies are installed
- Verify Python packages:
pip install docsray[conversion]
- Try alternative converters (LibreOffice > docx2pdf > pandoc)
DocsRay includes an automatic restart feature that helps maintain service stability by automatically recovering from errors, memory issues, or crashes.
The service will automatically restart in the following situations:
- Memory Usage Exceeds 85% - Prevents out-of-memory crashes
- PDF Processing Timeout - Default 5 minutes per document
- Error Threshold Reached - When errors occur within the time window
- Process Crashes - Unexpected termination or unhandled exceptions
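The first three triggers can be expressed as a single health check, sketched below. The 85% memory limit and the 5-minute timeout come from the list above; the error threshold is not stated in this README, so `error_limit=3` is a placeholder, and `restart_reason` is a hypothetical function.

```python
def restart_reason(memory_percent, elapsed_seconds, recent_errors,
                   memory_limit=85.0, timeout_seconds=300, error_limit=3):
    """Return the first trigger that fires, or None if the service is healthy.

    recent_errors is the number of errors seen inside the current time window.
    error_limit is a placeholder; the real threshold is not documented here.
    """
    if memory_percent > memory_limit:
        return "memory"
    if elapsed_seconds > timeout_seconds:
        return "timeout"
    if recent_errors >= error_limit:
        return "errors"
    return None

print(restart_reason(memory_percent=91.0, elapsed_seconds=12, recent_errors=0))
print(restart_reason(memory_percent=40.0, elapsed_seconds=12, recent_errors=0))
```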
# Start web interface with auto-restart
docsray web --auto-restart
# Start MCP server with auto-restart
docsray mcp --auto-restart
# Custom retry settings
docsray web --auto-restart --max-retries 10 --retry-delay 10
# With other options
docsray web --auto-restart --port 8080 --timeout 600 --max-retries 20
Parameter | Default | Description |
---|---|---|
--auto-restart | False | Enable automatic restart on errors |
--max-retries | 5 | Maximum restart attempts for crashes |
--retry-delay | 5 | Seconds to wait between restarts |
- Intentional Restarts (exit code 42)
  - Triggered by memory limits, timeouts, or error thresholds
  - Retry counter resets to 0
  - Can restart indefinitely
- Crashes (other exit codes)
  - Triggered by unexpected errors
  - Retry counter increases
  - Stops after reaching max-retries
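The two restart rules above can be sketched as a small supervisor loop. This is an illustration under stated assumptions, not DocsRay's actual wrapper: `supervise` and `run_once` are hypothetical names, and treating exit code 0 as a clean stop is an assumption, since the README does not say what happens on a normal exit.

```python
import time

RESTART_EXIT_CODE = 42  # intentional restart, per the description above

def supervise(run_once, max_retries=5, retry_delay=5, sleep=time.sleep):
    """Call run_once() repeatedly, applying the two restart rules.

    run_once returns the service's exit code. Returns (outcome, runs).
    """
    retries = 0
    runs = 0
    while True:
        code = run_once()
        runs += 1
        if code == 0:
            return ("exited", runs)  # assumption: a clean exit ends supervision
        if code == RESTART_EXIT_CODE:
            retries = 0              # intentional restart resets the counter
        else:
            retries += 1             # a crash counts against the retry budget
            if retries >= max_retries:
                return ("max_retries_reached", runs)
        sleep(retry_delay)

# Simulated exit codes: crash, intentional restart, two crashes, clean exit.
codes = iter([1, 42, 1, 1, 0])
print(supervise(lambda: next(codes), max_retries=3, sleep=lambda seconds: None))
```

In a real wrapper, `run_once` would launch the service as a child process, for example with `subprocess.call`, and return its exit status.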
Check restart logs:
# View recovery log
cat ~/.docsray/logs/recovery_log.txt
# Monitor service logs
tail -f ~/.docsray/logs/DocsRay_Web_wrapper_*.log
# High reliability settings
docsray web --auto-restart \
--max-retries 100 \
--retry-delay 30 \
--timeout 900
# Quick restart for testing
docsray web --auto-restart \
--max-retries 5 \
--retry-delay 2
For production deployments, consider using systemd:
# /etc/systemd/system/docsray.service
[Unit]
Description=DocsRay Web Service
After=network.target
[Service]
Type=simple
User=your-user
WorkingDirectory=/home/your-user
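# Binding to port 80 as a non-root user requires CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
ExecStart=/usr/bin/python -m docsray web --port 80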
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Then:
sudo systemctl enable docsray
sudo systemctl start docsray
- Service keeps restarting
  - Check memory usage: might need to increase system RAM
  - Reduce visual analysis or page limits
  - Increase timeout values
- Service won't restart
  - Check if max-retries reached
  - Look for "Max retries reached" in logs
  - Restart manually or increase max-retries
from docsray.scripts.pdf_extractor import extract_pdf_content
# Fine-tune visual analysis
extracted = extract_pdf_content(
"technical_report.pdf",
analyze_visuals=True,
visual_analysis_interval=1 # Every page
)
# Access visual descriptions
for i, page_text in enumerate(extracted["pages_text"]):
if "[Figure" in page_text or "[Table" in page_text:
print(f"Visual content found on page {i+1}")
#!/bin/bash
for pdf in *.pdf; do
echo "Processing $pdf with visual analysis..."
docsray process "$pdf" --analyze-visuals
done
from docsray import PDFChatBot
visual_prompt = """
You are a document assistant specialized in analyzing visual content.
When answering questions:
1. Reference specific figures, charts, and tables by their descriptions
2. Integrate visual information with text content
3. Highlight data trends and patterns shown in visualizations
"""
chatbot = PDFChatBot(sections, chunk_index, system_prompt=visual_prompt)
#!/bin/bash
# Process all supported documents in a directory
for file in *.{pdf,docx,xlsx,pptx,txt,md,html,png,jpg}; do
if [[ -f "$file" ]]; then
echo "Processing $file..."
docsray process "$file"
fi
done
from docsray.scripts.file_converter import FileConverter
converter = FileConverter()
# Check if file is supported
if converter.is_supported("presentation.pptx"):
print("File is supported!")
# Get all supported formats
formats = converter.get_supported_formats()
for ext, description in formats.items():
print(f"{ext}: {description}")
# Clone repository
git clone https://github.com/MIMICLab/DocsRay.git
cd DocsRay
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .[dev]
# Run tests
pytest tests/
Contributions are welcome! Areas of interest:
- Additional multimodal model support
- Enhanced table extraction algorithms
- Support for more document formats
- Performance optimizations
- UI/UX improvements
This project is licensed under the MIT License. See LICENSE file for details.
Note: Individual model licenses may have different requirements:
- BAAI/bge-m3: MIT License
- intfloat/multilingual-e5-large: MIT License
- gemma-3-4B-it: Gemma Terms of Use
- Web Demo: https://docsray.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions