feat: add Jina AI embedding provider #245
Conversation
Force-pushed 4139a81 to c7cd135
@MaojiaSheng @ZaynJarvis Could you review this PR when you get a chance? Thanks!
Force-pushed c7cd135 to 658d757
| "model_name": cfg.model, | ||
| "api_key": cfg.api_key, | ||
| "api_base": cfg.api_base, | ||
| "dimension": cfg.dimension, |
It looks like Jina cannot set dimensions other than the configured ones; consider removing this config (including from the docs)? Also, does JinaEmbedder have dimension value validation?
Otherwise LGTM.
v5 supports arbitrary dimensions via MRL: any value (1, 33, 34, 512, etc.) up to the model max works. The max is per-model: 1024 for small, 768 for nano. Validation added.
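Conceptually, MRL truncation to an arbitrary dimension is just "keep the first `dim` components, then L2-renormalize". A client-side sketch in plain numpy (the real service does this server-side; this is only an illustration):

```python
import numpy as np

def truncate_and_renormalize(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dim` components,
    then L2-renormalize so the shortened vector is unit length again."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

# A fake 1024-d unit embedding, standing in for a real model output.
full = np.random.default_rng(0).normal(size=1024)
full /= np.linalg.norm(full)

short = truncate_and_renormalize(full, 512)
print(short.shape, round(float(np.linalg.norm(short)), 6))  # → (512,) 1.0
```

Any target dimension from 1 up to the model max works with the same code, which is why the config value does not need to be one of a few preset sizes.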
hanxiao left a comment
Thanks for the review! Regarding the dimension config:
Jina v5 models support Matryoshka Representation Learning (MRL), so the dimensions parameter can truncate embeddings to any value up to the max dimension (1024 for small, 768 for nano). The API handles truncation + L2 renormalization server-side.
I will add dimension validation in JinaDenseEmbedder.__init__ to ensure the requested dimension does not exceed the model max. Pushing the fix now.
Force-pushed 658d757 to 79957ff
Thanks, will be merged.
Add Jina AI as a supported embedding provider for dense embeddings.
[Figure: MMTEB scores vs. model size. jina-v5-text models (red) outperform models 2-16x their size.]
[Figure: MTEB English v2 scores. v5-text-nano (239M) achieves 71.0, matching models with 2x+ the parameters.]
Both models are open-weight (Apache 2.0) and support Matryoshka dimension reduction, task-specific embeddings, and local deployment via GGUF/MLX.
Paper: arXiv:2602.15547 | Blog | HuggingFace
Features
- OpenAI-compatible API at https://api.jina.ai/v1; set `api_base` to point at your local server
- `task` parameter
- `late_chunking` parameter
- `dimensions` parameter

Changes
- `jina_embedders.py` with `JinaDenseEmbedder`
- `jina` provider in `embedding_config.py`
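A hedged sketch of what a provider config might look like, using the keys visible in the diff (`model_name`, `api_key`, `api_base`, `dimension`); the model name and dimension values here are illustrative assumptions, not documented defaults:

```python
# Illustrative config for the new `jina` provider.
jina_config = {
    "provider": "jina",
    "model_name": "jina-v5-text-small",        # assumed identifier
    "api_key": "<YOUR_JINA_API_KEY>",
    "api_base": "https://api.jina.ai/v1",      # or a local OpenAI-compatible server
    "dimension": 512,                          # MRL truncation target, up to model max
}

print(sorted(jina_config))
```

Because the endpoint is OpenAI-compatible, pointing `api_base` at a self-hosted GGUF/MLX server should require no other changes.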