This project provides a FastAPI-based service for generating multi-modal embeddings (Text, Image, PDF) using ColQwen/ColPali models, suitable for indexing in Qdrant.
- Python 3.12+ (Conda recommended for managing the environment)
- Qdrant running locally (default: localhost:6334)
- GPU recommended for inference (MPS and CPU are also supported)
The service device can be configured via the DEVICE environment variable.
- DEVICE=auto (default: auto-selects CUDA > MPS > CPU)
- DEVICE=mps (Apple Silicon GPU)
- DEVICE=cpu (processor)
- DEVICE=cuda (NVIDIA GPU)
- DEVICE=cuda:0 (specific GPU)
- BATCH_SIZE=1 (number of images to process per batch; reduce it if out-of-memory errors occur; default: 1)
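The precedence described for DEVICE=auto could be sketched roughly as follows. This is an illustration only, not the service's actual code: resolve_device and read_batch_size are hypothetical helper names, and the availability flags are passed in explicitly so the snippet does not depend on any ML framework.

```python
import os


def resolve_device(requested, cuda_available, mps_available):
    """Map a DEVICE setting to a concrete device string.

    Hypothetical helper: mirrors the documented CUDA > MPS > CPU order.
    """
    requested = (requested or "auto").strip().lower()
    if requested != "auto":
        # Explicit values such as "cpu", "mps", "cuda" or "cuda:0" pass through.
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def read_batch_size(env=os.environ):
    """Read BATCH_SIZE, falling back to the documented default of 1."""
    return max(1, int(env.get("BATCH_SIZE", "1")))
```

In a PyTorch-based service the two flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().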
- Create and Activate Environment

      conda create -n qdrant python=3.12
      conda activate qdrant

- Install Dependencies

      pip install -r requirements.txt

  Note: Ensure you have poppler installed for PDF processing (e.g., brew install poppler on macOS).
- Build the Image

      docker build -t embedding-service .

- Run the Container

  Run the container, specifying the device (the Dockerfile default is cpu, but you can override it).

      docker run -p 8025:8025 -e DEVICE=cpu embedding-service

  Note: The service listens on port 8025 inside the container.
Start the FastAPI server using uvicorn:

    uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload

The service will load the ColQwen model (approx. 4B params) on startup. This may take a few moments.
Generate embeddings for a text query.
- URL: /process_query
- Method: POST
- Payload: { "query": "your search query" }
- Response: { "embedding": [0.123, ...] }
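A minimal client sketch for this endpoint, using only the standard library. It assumes the payload is sent as a JSON body and that the server is reachable at a base URL such as http://localhost:8001 (the port from the uvicorn command); build_query_request and process_query are hypothetical helper names.

```python
import json
import urllib.request


def build_query_request(base_url, query):
    """Build the POST request for /process_query without sending it."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/process_query",
        data=payload,
        # Assumption: the endpoint accepts a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_query(base_url, query, timeout=60):
    """Send the query and return the "embedding" field of the response."""
    req = build_query_request(base_url, query)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["embedding"]
```

With the server running, process_query("http://localhost:8001", "your search query") would return the embedding list from the response.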
Generate embeddings for a PDF file, either uploaded with the request or located on the server filesystem.
- URL: /get_pdf_embedding
- Method: POST
- Content-Type: multipart/form-data
- Form Data:
  - file: Uploaded PDF file (optional).
  - pdf_path: Absolute path to the PDF on the server (optional).
  - Note: One of file or pdf_path must be provided.
- Response:

      {
        "page_count": 1,
        "pages": [
          {
            "page_number": 1,
            "size": [595, 842],
            "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
            "embeddings": [[...], ...],
            "pooled_rows": [[...], ...],
            "pooled_cols": [[...], ...]
          }
        ]
      }
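Once the response has been parsed as JSON, each page can be unpacked along these lines. This is a sketch under two assumptions: that size is ordered as [width, height], and that image_base64 holds standard Base64-encoded image bytes. summarize_pdf_response is a hypothetical helper name.

```python
import base64


def summarize_pdf_response(response):
    """Decode each page image and count the vectors returned for it."""
    summaries = []
    for page in response["pages"]:
        # Assumption: "size" is [width, height].
        width, height = page["size"]
        summaries.append({
            "page_number": page["page_number"],
            "width": width,
            "height": height,
            # Raw image bytes, ready to be written to disk or opened by an image library.
            "image_bytes": base64.b64decode(page["image_base64"]),
            "num_vectors": len(page["embeddings"]),
            "num_pooled_rows": len(page["pooled_rows"]),
            "num_pooled_cols": len(page["pooled_cols"]),
        })
    return summaries
```

Keeping the decoded bytes next to the vector counts makes it easy to check that each page's image and embeddings stay paired before indexing.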
Generate embeddings for a batch of uploaded images.
- URL: /encode_image_batch
- Method: POST
- Content-Type: multipart/form-data
- Files: List of image files.
- Response: { "embeddings": [[...], ...], "pooled_rows": [[...], ...], "pooled_cols": [[...], ...] }
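A batch upload can be assembled with the standard library alone, as sketched below. The form field name files is an assumption (check the endpoint signature in the service code), and encode_multipart_files and build_image_batch_request are hypothetical helper names.

```python
import urllib.request
import uuid


def encode_multipart_files(field_name, files):
    """Encode (filename, content_bytes, mime_type) tuples as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content, mime_type in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {mime_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing delimiter marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_image_batch_request(base_url, files, field_name="files"):
    """Build the POST request for /encode_image_batch without sending it."""
    # Assumption: every image is sent under the same form field name.
    body, content_type = encode_multipart_files(field_name, files)
    return urllib.request.Request(
        f"{base_url}/encode_image_batch",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends the batch; the number of files per request is separate from the server-side BATCH_SIZE setting.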
Check if the service is running and the model is loaded.
- URL: /health
- Method: GET
- Response: {"status": "healthy", "model_loaded": true}
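Because the model is loaded on startup, a client may want to poll this endpoint before sending work. The sketch below shows one possible way to do that; fetch_health and wait_until_ready are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch, attempts=30, delay=2.0):
    """Call fetch() until it reports a healthy service with a loaded model.

    Returns True once ready, or False after the given number of attempts.
    """
    for _ in range(attempts):
        try:
            health = fetch()
            if health.get("status") == "healthy" and health.get("model_loaded") is True:
                return True
        except OSError:
            # Connection refused or timed out: the server may still be starting.
            pass
        time.sleep(delay)
    return False
```

For example, wait_until_ready(lambda: fetch_health("http://localhost:8001")) blocks until the service reports that the model is loaded or the attempts run out.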
A test script is provided to verify the API endpoints (mocking the heavy model logic).
python tests/test_api_client.py