Complete OCR service for PDF documents with layout detection, powered by DeepSeek-OCR and vLLM.
- Quick Start
- Installation
- Redis Setup (Optional)
- Service Management
- Authentication
- API Usage
- Configuration
- Troubleshooting
- Advanced Topics
- NVIDIA GPU with CUDA 11.8 support
- Python 3.12 virtual environment at `.venv/`
- At least 8GB GPU memory
- Run the installation script:
```bash
./install/install.sh
```

This will install:
- PyTorch 2.6.0 with CUDA 11.8
- vLLM 0.8.5
- flash-attn 2.7.3
- All required dependencies
```bash
./run.sh
```

The service will be available at http://localhost:8000

Check status:

```bash
./status.sh
```

Stop the service:

```bash
./stop.sh
```

Hardware:
- NVIDIA GPU (tested on A40 with 44GB memory)
- CUDA 11.8 compatible GPU
- Minimum 8GB GPU memory
Software:
- Ubuntu 24.04 (or compatible)
- Python 3.12
- CUDA 11.8
- nvidia-smi
- Create virtual environment (if not exists):
```bash
python3.12 -m venv .venv
```

- Run installation script:

```bash
chmod +x install/install.sh
./install/install.sh
```

The script will:
- Activate virtual environment
- Install PyTorch with CUDA 11.8 support
- Install vLLM wheel
- Install all dependencies
- Install flash-attn
- Display authentication setup instructions
- Verify installation:
```bash
.venv/bin/python -c "import torch, vllm; print(f'PyTorch: {torch.__version__}, vLLM: {vllm.__version__}')"
```

Redis can be used as a message broker to ensure tasks are processed sequentially (one at a time), preventing concurrent GPU access issues. This is optional but recommended for production use.
Option 1: Docker (Recommended)
```bash
chmod +x install/install_redis_docker.sh
./install/install_redis_docker.sh
```

Option 2: Standalone System Installation

```bash
chmod +x install/install_redis_standalone.sh
sudo ./install/install_redis_standalone.sh
```

After installing Redis, update your .env file:
```bash
# Copy example if you haven't already
cp .env.example .env

# Edit .env and uncomment Redis settings:
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
QUEUE_NAME=deepseek_ocr_tasks
MAX_WORKERS=1  # Process one task at a time
```

Install the Python Redis client and task queue:

```bash
.venv/bin/pip install redis rq
```

```bash
# Test connection
redis-cli ping
# Should return: PONG

# Or with Docker
docker exec -it deepseek-redis redis-cli ping
```

For detailed Redis setup instructions, see REDIS_SETUP.md.
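The point of `MAX_WORKERS=1` is that jobs touch the GPU strictly one at a time, in arrival order. The effect can be illustrated without Redis by a single worker thread draining a FIFO queue (a stand-in sketch only; RQ provides the same ordering guarantee across processes):

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()  # FIFO, like the Redis list behind RQ
results = []

def worker():
    # One worker => jobs run one at a time, in submission order
    while True:
        job = tasks.get()
        if job is None:                      # sentinel: shut the worker down
            break
        results.append(f"processed {job}")   # stand-in for the OCR call
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for name in ["a.pdf", "b.pdf", "c.pdf"]:
    tasks.put(name)
tasks.put(None)
t.join()
print(results)  # jobs complete in submission order
```

With more than one worker the jobs would interleave on the GPU, which is exactly what the single-worker setting prevents.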
Three scripts provide complete service lifecycle management:
```bash
./run.sh
```

Features:
- Checks if service is already running (prevents duplicates)
- Validates environment and port availability
- Creates PID file for reliable tracking
- Starts service in background with logging
- Waits for service to be ready (up to 60 seconds)
- Displays service URLs and status
- Checks authentication configuration
Output:
```
=== DeepSeek OCR PDF Service ===
✓ Service started successfully
  PID: 12345
  Port: 8000

Service URLs:
  • Health check: http://localhost:8000/health
  • API docs: http://localhost:8000/docs
  • Base URL: http://localhost:8000/

Useful commands:
  • View logs: tail -f /tmp/deepseek_ocr.log
  • Stop service: ./stop.sh
```
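The readiness wait run.sh performs (up to 60 seconds) can be mirrored from client code. A minimal sketch, where `probe` would typically wrap an HTTP GET against `/health` and return True on a 200 response (names here are illustrative, not the script's actual code):

```python
import time

def wait_until_ready(probe, timeout=60.0, interval=2.0):
    """Call `probe()` until it returns True or `timeout` seconds elapse.

    Returns True once the probe succeeds, False on timeout. `probe`
    might wrap requests.get(f"{API_URL}/health") in a try/except and
    report whether the service answered.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

This avoids racing the model load: the first OCR request is only sent after `/health` answers.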
```bash
./status.sh
```

Displays:
- Process status (PID, uptime, CPU/memory usage)
- GPU metrics (memory, utilization, temperature)
- Authentication status
- Recent log entries (last 5 lines)
- Service URLs (if running)
```bash
./stop.sh
```

Features:
- Graceful shutdown (SIGTERM, waits 10 seconds)
- Force-kill if necessary (SIGKILL)
- Cleans up PID file
- Stops orphaned processes
- Displays GPU memory status
- Shows log file location
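The TERM-then-KILL sequence above can be sketched in a few lines (illustrative, not stop.sh's actual code; it assumes the target is not a child of the caller, since a dead child lingers as a zombie until reaped and would keep the liveness probe returning True):

```python
import os
import signal
import time

def stop_pid(pid: int, grace: float = 10.0) -> None:
    """Send SIGTERM, wait up to `grace` seconds, then SIGKILL.

    Mirrors the graceful-shutdown policy described above.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)           # signal 0: liveness probe only
        except ProcessLookupError:
            return                    # exited after SIGTERM
        time.sleep(0.2)
    os.kill(pid, signal.SIGKILL)      # still up after the grace period
```

The grace period matters here: SIGTERM lets the service release GPU memory cleanly, while SIGKILL is the last resort when it hangs.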
Start and monitor:
```bash
./run.sh
tail -f /tmp/deepseek_ocr.log
```

Restart service:

```bash
./stop.sh && ./run.sh
```

Check if running:

```bash
./status.sh | grep -q "Service is RUNNING" && echo "Running" || echo "Not running"
```

| File | Location | Purpose |
|---|---|---|
| PID file | /tmp/deepseek_ocr.pid | Process ID tracking |
| Log file | /tmp/deepseek_ocr.log | Service logs |
| Config | .env | Authentication token |
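The PID file is only a hint: the process behind it may already be gone. Whether a PID is still alive can be checked with signal 0 (an illustrative helper, not the scripts' actual code):

```python
import os

def pid_is_running(pid: int) -> bool:
    """True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)   # signal 0: existence check, delivers nothing
    except ProcessLookupError:
        return False      # no such process: the PID file is stale
    except PermissionError:
        return True       # exists, but owned by another user
    return True
```

If this returns False for the PID stored in /tmp/deepseek_ocr.pid, the file is stale and can safely be removed.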
The service supports token-based authentication using Bearer tokens.
- Create `.env` file:

```bash
cp .env.example .env
```

- Generate a secure token:

```bash
# Using Python
python -c "import secrets; print(secrets.token_hex(32))"

# Using OpenSSL
openssl rand -hex 32
```

- Edit `.env` and set your token:

```bash
AUTH_TOKEN=your-generated-token-here
```

- Restart service (if running):

```bash
./stop.sh && ./run.sh
```

To disable authentication (development only):

- Remove or comment out `AUTH_TOKEN` in `.env`, or
- Don't create a `.env` file
When AUTH_TOKEN is set, these endpoints require authentication:
- `POST /process_pdf` - Upload and process PDF
- `GET /result/{job_id}/markdown` - Get markdown output
- `GET /result/{job_id}/markdown_det` - Get markdown with detections
- `GET /result/{job_id}/layout_pdf` - Download layout PDF
- `GET /result/{job_id}/images` - List extracted images
- `GET /result/{job_id}/images/{image_name}` - Get specific image
- `DELETE /result/{job_id}` - Delete job files
These endpoints are always accessible without authentication:
- `GET /` - API information
- `GET /health` - Health check
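The access decision behind these rules is conceptually simple. A sketch of the logic (illustrative only, not the service's actual code), using a constant-time comparison:

```python
import secrets

def authorize(auth_header, expected_token):
    """Decide whether a request may proceed.

    `expected_token` is the AUTH_TOKEN value (None when auth is disabled);
    `auth_header` is the raw Authorization header, e.g. "Bearer abc123".
    """
    if not expected_token:
        return True                      # no AUTH_TOKEN configured: open access
    if not auth_header or not auth_header.startswith("Bearer "):
        return False                     # missing or malformed header -> 401
    supplied = auth_header[len("Bearer "):]
    # compare_digest compares in constant time, avoiding timing leaks
    return secrets.compare_digest(supplied, expected_token)
```

Note the "Bearer " prefix check: sending the bare token without the prefix is a common cause of 401 responses.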
Upload and process PDF:
```bash
curl -X POST "http://localhost:8000/process_pdf" \
  -H "Authorization: Bearer your-token-here" \
  -F "file=@document.pdf"
```

Response:

```json
{
  "job_id": "abc123-def456-789...",
  "status": "completed",
  "message": "PDF processed successfully"
}
```

Get markdown result:

```bash
curl -X GET "http://localhost:8000/result/{job_id}/markdown" \
  -H "Authorization: Bearer your-token-here"
```

Download layout PDF:

```bash
curl -X GET "http://localhost:8000/result/{job_id}/layout_pdf" \
  -H "Authorization: Bearer your-token-here" \
  -o layout.pdf
```

Health check (no auth required):
```bash
curl http://localhost:8000/health
```

Python example:

```python
import requests

# Configure
API_URL = "http://localhost:8000"
AUTH_TOKEN = "your-token-here"
headers = {"Authorization": f"Bearer {AUTH_TOKEN}"}

# Upload PDF
with open("document.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(
        f"{API_URL}/process_pdf",
        headers=headers,
        files=files
    )

result = response.json()
job_id = result["job_id"]
print(f"Job ID: {job_id}")

# Get markdown result
response = requests.get(
    f"{API_URL}/result/{job_id}/markdown",
    headers=headers
)
markdown_content = response.json()["content"]
print(markdown_content)

# Download layout PDF
response = requests.get(
    f"{API_URL}/result/{job_id}/layout_pdf",
    headers=headers
)
with open("layout.pdf", "wb") as f:
    f.write(response.content)

# Clean up
requests.delete(f"{API_URL}/result/{job_id}", headers=headers)
```

JavaScript example:

```javascript
const API_URL = "http://localhost:8000";
const AUTH_TOKEN = "your-token-here";

// Upload PDF
const formData = new FormData();
formData.append("file", pdfFile);

const uploadResponse = await fetch(`${API_URL}/process_pdf`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${AUTH_TOKEN}`
  },
  body: formData
});

const { job_id } = await uploadResponse.json();

// Get markdown result
const resultResponse = await fetch(
  `${API_URL}/result/${job_id}/markdown`,
  {
    headers: {
      "Authorization": `Bearer ${AUTH_TOKEN}`
    }
  }
);

const { content } = await resultResponse.json();
console.log(content);
```

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | `/` | No | API information |
| GET | `/health` | No | Health check |
| POST | `/process_pdf` | Yes* | Upload and process PDF |
| GET | `/result/{job_id}/markdown` | Yes* | Get markdown output |
| GET | `/result/{job_id}/markdown_det` | Yes* | Get markdown with detections |
| GET | `/result/{job_id}/layout_pdf` | Yes* | Download layout PDF |
| GET | `/result/{job_id}/images` | Yes* | List extracted images |
| GET | `/result/{job_id}/images/{image_name}` | Yes* | Get specific image |
| DELETE | `/result/{job_id}` | Yes* | Delete job files |
*Auth required only if AUTH_TOKEN is configured in .env
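Because authentication is optional, client code can build its headers conditionally; a small illustrative helper:

```python
def auth_headers(token=None):
    """Request headers for the service; empty when AUTH_TOKEN is unset."""
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Passing the result (e.g. `requests.post(..., headers=auth_headers(AUTH_TOKEN))`) works unchanged in both the authenticated and open configurations.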
Interactive API Documentation:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Create a .env file in the project root:
```bash
# Authentication Token
# Set this to enable token-based authentication
# If not set, API will be accessible without authentication
AUTH_TOKEN=your-secret-token-here
```

Edit config.py to modify:

- `MODEL_PATH` - Model location
- `PROMPT` - OCR prompt template
- `SKIP_REPEAT` - Skip repeated content
- `MAX_CONCURRENCY` - Max concurrent requests
- `NUM_WORKERS` - Number of worker threads
- `CROP_MODE` - Image cropping mode
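For orientation, a hypothetical config.py fragment with values of the expected shape; every value here is illustrative (only MODEL_PATH, MAX_CONCURRENCY, and NUM_WORKERS echo values mentioned elsewhere in this README), so check the real file before relying on any of them:

```python
# Illustrative config.py values; the shipped file may differ.
MODEL_PATH = "deepseek-ai/DeepSeek-OCR"          # HF id or local path
PROMPT = "<image>\nConvert the document to markdown."  # hypothetical template
SKIP_REPEAT = True        # drop repeated content in the decoded output
MAX_CONCURRENCY = 8       # simultaneous requests handed to the engine
NUM_WORKERS = 4           # CPU-side worker threads
CROP_MODE = True          # whether large pages are cropped/tiled before OCR
```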
The service uses the DeepSeek-OCR model with these settings:

- Model: `deepseek-ai/DeepSeek-OCR`
- Max sequence length: 8192 tokens
- GPU memory utilization: 90%
- Tensor parallel size: 1
- Block size: 256
Check if already running:
```bash
./status.sh
```

Check for port conflicts:

```bash
netstat -tuln | grep 8000
lsof -i :8000
```

Check logs:

```bash
tail -100 /tmp/deepseek_ocr.log
```

Check virtual environment:

```bash
ls -la .venv/bin/python
.venv/bin/python --version
```

Force stop:

```bash
./stop.sh

# If that doesn't work:
pkill -9 -f "serve_pdf.py"
rm -f /tmp/deepseek_ocr.pid
```

401 Unauthorized Error:
- Verify token matches `AUTH_TOKEN` in `.env`
- Check "Bearer " prefix in Authorization header
- Ensure `.env` file is loaded (restart service)
Disable authentication:

```bash
# Comment out or remove AUTH_TOKEN from .env
sed -i 's/^AUTH_TOKEN=/#AUTH_TOKEN=/' .env
./stop.sh && ./run.sh
```

Check GPU status:

```bash
nvidia-smi
```

Free GPU memory:

```bash
./stop.sh

# If memory not released:
pkill -9 -f "python.*serve_pdf"
nvidia-smi
```

Reduce memory usage:
Edit serve_pdf.py:
```python
llm = LLM(
    ...
    gpu_memory_utilization=0.7,  # Reduce from 0.9
    max_num_seqs=4,              # Reduce from MAX_CONCURRENCY
)
```

Missing modules:
```bash
# Reinstall dependencies
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install PyMuPDF img2pdf easydict addict
```

Check Python version:

```bash
.venv/bin/python --version  # Should be 3.12.x
```

Log file too large:
```bash
./stop.sh
mv /tmp/deepseek_ocr.log /tmp/deepseek_ocr.log.old
./run.sh
```

Monitor logs:
```bash
# Real-time
tail -f /tmp/deepseek_ocr.log

# Last 50 lines
tail -50 /tmp/deepseek_ocr.log

# Search for errors
grep -i error /tmp/deepseek_ocr.log
```

Stale PID file:

```bash
rm -f /tmp/deepseek_ocr.pid
./status.sh  # Verify clean state
./run.sh     # Start fresh
```

Create /etc/systemd/system/deepseek-ocr.service:
```ini
[Unit]
Description=DeepSeek OCR PDF Service
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/dpsk
ExecStart=/root/dpsk/.venv/bin/python /root/dpsk/serve_pdf.py
ExecStop=/root/dpsk/stop.sh
Restart=on-failure
RestartSec=10s
StandardOutput=append:/tmp/deepseek_ocr.log
StandardError=append:/tmp/deepseek_ocr.log

[Install]
WantedBy=multi-user.target
```

Enable and start:
```bash
systemctl daemon-reload
systemctl enable deepseek-ocr
systemctl start deepseek-ocr
systemctl status deepseek-ocr
```

Cron job (check every 5 minutes):

```bash
# Add to crontab
*/5 * * * * /root/dpsk/status.sh | grep -q "NOT running" && /root/dpsk/run.sh
```

Monitoring script:
```bash
#!/bin/bash
# check_service.sh
if ! /root/dpsk/status.sh | grep -q "Service is RUNNING"; then
    echo "ALERT: Service is DOWN" | mail -s "Service Alert" admin@example.com
    /root/dpsk/run.sh
fi
```

Adjust concurrency:
Edit config.py:
```python
MAX_CONCURRENCY = 8  # Adjust based on GPU memory
NUM_WORKERS = 4      # Adjust based on CPU cores
```

Optimize GPU usage:
Edit serve_pdf.py:
```python
llm = LLM(
    ...
    gpu_memory_utilization=0.85,  # Adjust (0.7-0.95)
    max_num_seqs=MAX_CONCURRENCY,
    tensor_parallel_size=1,       # Increase for multi-GPU
)
```

Enable CUDA graphs for better performance:
```python
llm = LLM(
    ...
    enforce_eager=False,  # Use CUDA graphs
)
```

- Use strong tokens (at least 32 characters)
- Rotate tokens regularly
- Use HTTPS in production (reverse proxy with nginx/caddy)
- Limit token sharing to authorized users only
- Never commit `.env` to version control
- Set up firewall rules to restrict access
- Monitor access logs for suspicious activity
```nginx
server {
    listen 443 ssl http2;
    server_name ocr.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Increase timeout for long processing
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;

        # Increase max body size for large PDFs
        client_max_body_size 100M;
    }
}
```

```
/root/dpsk/
├── serve_pdf.py          # Main service application
├── pdf_utils.py          # PDF conversion utilities
├── processing_utils.py   # Image processing utilities
├── deepseek_ocr.py       # DeepSeek OCR model
├── config.py             # Configuration
├── requirements.txt      # Python dependencies
├── install/              # Installation scripts
│   └── install.sh        # Installation script
├── run.sh                # Start service script
├── stop.sh               # Stop service script
├── status.sh             # Status check script
├── .env.example          # Environment template
├── .env                  # Your configuration (not in git)
├── .venv/                # Virtual environment
├── process/              # Processing modules
│   ├── ngram_norepeat.py
│   └── image_process.py
├── deepencoder/          # Encoder modules
│   ├── clip_sdpa.py
│   └── sam_vary_sdpa.py
└── README.md             # This file
```
Environment:
- Virtual environment: `.venv/`
- Python: 3.12
- PyTorch: 2.6.0 with CUDA 11.8
- vLLM: 0.8.5
- Model: DeepSeek-OCR
GPU Support:
- Tested on NVIDIA A40 (44GB)
- Requires CUDA 11.8
- Uses 90% GPU memory by default
Features:
- PDF to image conversion (high quality, 144 DPI)
- OCR with layout detection
- Bounding box extraction and visualization
- Image region extraction
- Markdown output with/without layout annotations
- Token-based authentication
- RESTful API with OpenAPI docs
- Concurrent request processing
Check Status:
```bash
./status.sh
```

View Logs:

```bash
tail -f /tmp/deepseek_ocr.log
```

Report Issues: Include in your report:
- Service status output
- Last 50 lines of log
- GPU status (`nvidia-smi`)
- Error messages
This service uses DeepSeek-OCR model. Please refer to the model's license for usage terms.
```bash
# Installation
./install/install.sh

# Service Management
./run.sh                              # Start service
./stop.sh                             # Stop service
./status.sh                           # Check status

# Logs
tail -f /tmp/deepseek_ocr.log         # Follow logs
grep error /tmp/deepseek_ocr.log      # Find errors

# API Testing
curl http://localhost:8000/health     # Health check
curl http://localhost:8000/docs       # API documentation

# GPU Monitoring
nvidia-smi                            # Check GPU status
watch -n1 nvidia-smi                  # Monitor GPU continuously
```

Authentication setup:

```bash
# Create .env
cp .env.example .env

# Generate token
python -c "import secrets; print(secrets.token_hex(32))"

# Edit .env
nano .env
```

Emergency commands:

```bash
# Check if running
ps aux | grep serve_pdf.py

# Check port
netstat -tuln | grep 8000
lsof -i :8000

# Force stop all
pkill -9 -f serve_pdf.py
rm -f /tmp/deepseek_ocr.pid

# Clean restart
./stop.sh && rm -f /tmp/deepseek_ocr.log && ./run.sh
```

Version: 1.0.0
Last Updated: 2025-10-31