Production Docker deployment for DeepSeek-OCR-2 with OpenAI-compatible API.
```
.
├── backend              # Backend service
├── frontend             # Frontend page
├── docker-compose.yaml  # Docker Compose configuration
├── Dockerfile           # Production Docker image
├── openai_server.py     # Custom OpenAI-compatible API server
└── README.md            # This file
```
DeepSeek-OCR-2 includes a native vLLM model implementation (`deepseek_ocr2.py`), but it cannot be used directly with vLLM's built-in OpenAI server. The issue is image preprocessing:

- vLLM's OpenAI server passes raw PIL images to the model's processor
- DeepSeek's processor expects images pre-processed by `tokenize_with_images()`, a custom method that handles dynamic resolution cropping, tiling, and feature extraction
When you try to use `vllm serve` directly, you get errors like:

```
TypeError: cannot unpack non-iterable Image object
TypeError: 'Image' object is not subscriptable
```
Our `openai_server.py` solves this by:

- Extracting base64 images from OpenAI-format requests
- Preprocessing them using DeepSeek's `tokenize_with_images()` method (same as their `run_dpsk_ocr2_image.py`)
- Passing the processed features to vLLM's `AsyncLLMEngine`
- Returning OpenAI-compatible responses with streaming support
This approach uses the exact same preprocessing pipeline as DeepSeek's official scripts, ensuring correct results.
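The first of those steps can be sketched in plain Python. This is an illustrative helper, not the actual `openai_server.py` code (the function name and structure are assumptions); it shows how base64 data URLs are pulled out of OpenAI-format messages before being handed to `tokenize_with_images()`:

```python
import base64

def extract_images(messages):
    """Decode base64 data URLs from OpenAI-format chat messages.

    Hypothetical sketch of the server's first step: walk the content
    parts of each message, decode any 'data:...;base64,...' image URLs,
    and return raw image bytes ready for DeepSeek's preprocessing.
    """
    images = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain-text message, no image parts
        for part in content:
            if part.get("type") == "image_url":
                url = part["image_url"]["url"]
                if url.startswith("data:"):
                    # Format: data:image/jpeg;base64,<payload>
                    _, payload = url.split(",", 1)
                    images.append(base64.b64decode(payload))
    return images
```

The real server additionally handles plain `http(s)` image URLs, as shown in the request examples below.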
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with CUDA 11.8+ support
- ~8GB+ VRAM (model uses ~6.3GB)
Build the image:

```bash
docker build -t deepseek-ocr2 .
```

Run the container:

```bash
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  deepseek-ocr2
```

List available models:

```bash
curl http://localhost:8000/v1/models
```

OCR with layout detection (markdown output):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR-2",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'$(base64 -w0 image.jpg)'"}},
        {"type": "text", "text": "<|grounding|>Convert the document to markdown."}
      ]
    }],
    "max_tokens": 8192
  }'
```

Plain text extraction:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR-2",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'$(base64 -w0 image.jpg)'"}},
        {"type": "text", "text": "Free OCR."}
      ]
    }],
    "max_tokens": 8192
  }'
```

Using an image URL instead of base64:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR-2",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "http://127.0.0.1/image_url.png"}},
        {"type": "text", "text": "<|grounding|>Convert the document to markdown."}
      ]
    }],
    "max_tokens": 8192
  }'
```
Add "stream": true to the request body for streaming responses.
| Variable | Default | Description |
|---|---|---|
| `GPU_MEMORY_UTILIZATION` | `0.90` | GPU memory fraction to use |
| `MAX_MODEL_LEN` | `8192` | Maximum sequence length (input + output) |
| `TENSOR_PARALLEL_SIZE` | `1` | Number of GPUs for tensor parallelism |
- Total context window: 8192 tokens (input + output combined)
- Visual tokens: up to 1120 tokens per image ((0-6)×144 + 256)
- Default max_tokens: 8192 (will be limited by remaining context after input)
- Practical output limit: ~7000 tokens for typical images
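The arithmetic behind those numbers, written out (a sketch of the budget only; the per-crop cost of 144 tokens and the 256-token global view are taken from the formula above):

```python
MAX_MODEL_LEN = 8192  # total context window (input + output)

def visual_tokens(num_crops):
    """Visual tokens per image: each of the 0-6 dynamic-resolution
    crops costs 144 tokens, plus a fixed 256-token global view."""
    assert 0 <= num_crops <= 6
    return num_crops * 144 + 256

def max_output_tokens(prompt_text_tokens, num_crops):
    """Output budget left after text and image tokens are spent."""
    return MAX_MODEL_LEN - prompt_text_tokens - visual_tokens(num_crops)
```

With the maximum 6 crops the image costs 6 × 144 + 256 = 1120 tokens, leaving 8192 − 1120 = 7072 tokens before any text prompt, which is where the "~7000 tokens" practical limit comes from.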
| Prompt | Description |
|---|---|
| `<\|grounding\|>Convert the document to markdown.` | OCR with layout detection (bounding boxes) |
| `Free OCR.` | Plain text extraction without layout |
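A hypothetical Python client helper mirroring the curl examples, with the prompt as a parameter (the function name and defaults are illustrative, not part of this repo):

```python
import base64

GROUNDING = "<|grounding|>Convert the document to markdown."
FREE_OCR = "Free OCR."

def build_ocr_request(image_path, prompt=GROUNDING, max_tokens=8192):
    """Build an OpenAI-format chat-completions payload for the server.

    Reads the image, base64-encodes it into a data URL, and pairs it
    with the chosen OCR prompt; POST the result to
    /v1/chat/completions.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "deepseek-ai/DeepSeek-OCR-2",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": max_tokens,
    }
```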
Check server health:

```bash
curl http://localhost:8000/health
```

- First request will be slow as the model loads (~30-40 seconds)
- Model weights are cached in `~/.cache/huggingface`
- Supports base64-encoded JPEG/PNG images
- Returns markdown with HTML tables for tabular content
A pure frontend, browser-only document processing tool that converts scanned images and multi-page PDFs into various editable formats using DeepSeek-OCR.
DeepSeek-OCR2-WebUI is designed to handle document conversion tasks entirely within the browser. By leveraging modern web technologies like Web Workers and IndexedDB, it provides a powerful, privacy-focused alternative to server-side document processing.
- Frontend Only: No backend services required (except for DeepSeek-OCR2 API).
- Privacy First: Documents never leave your browser for processing.
- Large Document Support: Optimized for hundreds of pages using virtual lists and efficient memory management.
- Persistent State: Progress and intermediate results survive page refreshes using IndexedDB.
- Framework: Vue 3 (Composition API)
- Language: TypeScript
- UI Library: Naive UI
- State Management: Pinia
- Database: Dexie.js (IndexedDB)
- PDF Core: `pdfjs-dist` (rendering) & `pdf-lib` (generation)
- Converters: `markdown-it` (Markdown) & `docx` (Word)
- Build Tool: Vite
- Testing: Vitest & Playwright