feat: add Jina AI embedding provider #245

Merged
MaojiaSheng merged 1 commit into volcengine:main from hanxiao:feat/jina-embedding-provider
Feb 22, 2026

hanxiao (Contributor) commented Feb 22, 2026

Add Jina AI as a supported embedding provider for dense embeddings.

[Figure: MMTEB Multilingual Benchmark] MMTEB scores vs. model size. jina-v5-text models (red) outperform models 2-16x their size.

[Figure: MTEB English Benchmark] MTEB English v2 scores. v5-text-nano (239M) achieves 71.0, matching models with 2x+ parameters.

Both models are open-weight (Apache 2.0) and support Matryoshka dimension reduction, task-specific embeddings, and local deployment via GGUF/MLX.

Paper: arXiv:2602.15547 | Blog | HuggingFace

Features

  • API mode: Jina AI hosted API at https://api.jina.ai/v1 (OpenAI-compatible)
  • Local mode: Open-weight models available in GGUF and MLX formats on HuggingFace. Run locally with llama.cpp, MLX, or vLLM and point api_base to your local server.
  • Task-specific embeddings via task parameter
  • Late chunking support via late_chunking parameter
  • Matryoshka dimension reduction via dimensions parameter
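
As a rough illustration of how the three parameters above might travel in a request, here is a minimal sketch that builds (but does not send) a POST to an OpenAI-style embeddings endpoint. The `/embeddings` path, the helper name `build_embedding_request`, the model id placeholder, and the example `task` value are all assumptions made for illustration; they are not taken from this PR's code.

```python
import json
import urllib.request

JINA_API_BASE = "https://api.jina.ai/v1"  # hosted API base from the PR description


def build_embedding_request(texts, api_key, model, api_base=JINA_API_BASE,
                            task=None, late_chunking=None, dimensions=None):
    """Build (without sending) a request for an OpenAI-style embeddings endpoint.

    Hypothetical helper: the "/embeddings" path follows the OpenAI convention
    and is an assumption, not something confirmed by this PR.
    """
    payload = {"model": model, "input": list(texts)}
    # Optional parameters are only included when explicitly set.
    if task is not None:
        payload["task"] = task
    if late_chunking is not None:
        payload["late_chunking"] = late_chunking
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_embedding_request(
    ["hello world"],
    api_key="YOUR_KEY",            # placeholder
    model="your-jina-model-id",    # placeholder, not a real model id
    task="retrieval.query",        # example value, assumed
    dimensions=512,
)
print(req.full_url)
print(json.loads(req.data))
```

For local mode, the same helper could be pointed at a local server by passing a different `api_base` (for example a llama.cpp or vLLM server address), assuming that server exposes the same route.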

Changes

  • New jina_embedders.py with JinaDenseEmbedder
  • Register jina provider in embedding_config.py
  • Update docs (EN/ZH), README, FAQ with Jina provider info
  • Add unit tests with mocked API calls

CLAassistant commented Feb 22, 2026

All committers have signed the CLA.

hanxiao force-pushed the feat/jina-embedding-provider branch 3 times, most recently from 4139a81 to c7cd135, on February 22, 2026 03:25

hanxiao (Contributor, Author) commented Feb 22, 2026

@MaojiaSheng @ZaynJarvis Could you review this PR when you get a chance? Thanks!

hanxiao force-pushed the feat/jina-embedding-provider branch from c7cd135 to 658d757 on February 22, 2026 03:47

"model_name": cfg.model,
"api_key": cfg.api_key,
"api_base": cfg.api_base,
"dimension": cfg.dimension,
Collaborator

It looks like Jina cannot use dimensions other than the preconfigured ones. Consider removing this config (including from the docs)? Also, does JinaEmbedder have dimension value validation?

Otherwise lgtm

Contributor Author

@hanxiao hanxiao Feb 22, 2026

v5 supports arbitrary dimensions via MRL: any value (1, 33, 34, 512, etc.) up to the model max works. The max is per-model: 1024 for small, 768 for nano. Validation added.

Contributor Author

@hanxiao hanxiao left a comment

Thanks for the review! Regarding the dimension config:

Jina v5 models support Matryoshka Representation Learning (MRL), so the dimensions parameter can truncate embeddings to any value up to the max dimension (1024 for small, 768 for nano). The API handles truncation + L2 renormalization server-side.
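
To make "truncation + L2 renormalization" concrete, here is a minimal client-side sketch of the operation. It only illustrates the math; the thread says the API does this server-side, and the function name `truncate_and_renormalize` is hypothetical.

```python
import math


def truncate_and_renormalize(embedding, dimensions):
    """Keep the first `dimensions` components, then rescale to unit L2 norm.

    Illustrative only: not the provider's actual implementation.
    """
    if not 1 <= dimensions <= len(embedding):
        raise ValueError(
            f"dimensions must be between 1 and {len(embedding)}, got {dimensions}"
        )
    head = list(embedding[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


full = [0.5, 0.5, 0.5, 0.5]  # toy unit-length vector
short = truncate_and_renormalize(full, 2)
print(len(short), math.sqrt(sum(x * x for x in short)))
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors, which is usually what downstream vector stores expect.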

I will add dimension validation in JinaDenseEmbedder.__init__ to ensure the requested dimension does not exceed the model max. Pushing the fix now.
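
The validation described above could look roughly like the sketch below. This is not the code that was pushed to the PR: the class name `DimensionCheckedEmbedder`, the lookup keys `"small"` and `"nano"`, and the choice to default to the model max are all assumptions. Only the per-model maxima (1024 and 768) come from this thread.

```python
# Per-model maxima as stated in the review thread; the keys are illustrative
# labels, not real model ids.
MODEL_MAX_DIMENSION = {"small": 1024, "nano": 768}


class DimensionCheckedEmbedder:
    """Hypothetical stand-in showing constructor-time dimension validation."""

    def __init__(self, model_key, dimension=None):
        if model_key not in MODEL_MAX_DIMENSION:
            raise ValueError(f"unknown model: {model_key!r}")
        max_dim = MODEL_MAX_DIMENSION[model_key]
        if dimension is None:
            dimension = max_dim  # assumed default: full-size embeddings
        if isinstance(dimension, bool) or not isinstance(dimension, int):
            raise TypeError("dimension must be an integer")
        if not 1 <= dimension <= max_dim:
            raise ValueError(
                f"dimension {dimension} is out of range for {model_key!r}; "
                f"expected a value between 1 and {max_dim}"
            )
        self.model_key = model_key
        self.dimension = dimension


ok = DimensionCheckedEmbedder("nano", dimension=33)  # any value up to the max
print(ok.dimension)
```

Failing in the constructor surfaces a bad config at startup rather than on the first embedding request.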

hanxiao force-pushed the feat/jina-embedding-provider branch from 658d757 to 79957ff on February 22, 2026 04:47

Collaborator

@ZaynJarvis ZaynJarvis left a comment

Thx, lgtm @MaojiaSheng

@MaojiaSheng
Collaborator

Thanks, will be merged

@MaojiaSheng MaojiaSheng merged commit ea2a508 into volcengine:main Feb 22, 2026
1 check passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 22, 2026
4 participants