
Add text2vec-digitalocean vectorizer module #2041

Merged
dirkkul merged 2 commits into main from text2vec-digitalocean
May 19, 2026
@mpartipilo mpartipilo commented May 18, 2026

Summary

Adds support for the new text2vec-digitalocean vectorizer module exposed by Weaviate server. The shape mirrors text2vec-mistral (model + baseURL + vectorizeClassName), so the existing serialization path and Pydantic URL normalization are reused unchanged.

Following review, only the supported Configure.Vectors API surface is extended (matching the precedent set by text2vec-morph and other recent additions). Configure.Vectorizer and Configure.NamedVectors are not touched — they're the deprecated paths.

API surface added

  • Vectorizers.TEXT2VEC_DIGITALOCEAN enum entry
  • _Text2VecDigitalOceanConfig Pydantic config class
  • Configure.Vectors.text2vec_digitalocean(name=..., model=..., ...) — the only factory entry point

Parameters

| Python (snake_case) | Wire (camelCase) | Required | Default |
| --- | --- | --- | --- |
| model | model | yes (server requires it; client enforces too) | none |
| base_url | baseURL | no | server default https://inference.do-ai.run |
| vectorize_collection_name | vectorizeClassName | no | True |
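
The snake_case to camelCase mapping in the table can be sketched with a plain dataclass. This is a stdlib-only illustration, not the client's Pydantic class: `DigitalOceanVectorizerSketch` and `to_wire` are hypothetical names, and leaving `baseURL` out of the payload when it is unset is an assumption about how the server default comes into play.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DigitalOceanVectorizerSketch:
    """Hypothetical stand-in for the module config; not the client's class."""

    model: str  # required, no default
    base_url: Optional[str] = None  # None means "let the server pick its default"
    vectorize_collection_name: bool = True

    def to_wire(self) -> Dict[str, Any]:
        """Translate snake_case fields to the camelCase wire keys from the table."""
        payload: Dict[str, Any] = {
            "model": self.model,
            "vectorizeClassName": self.vectorize_collection_name,
        }
        # Assumption: an unset base_url is omitted from the payload entirely.
        if self.base_url is not None:
            payload["baseURL"] = self.base_url
        return payload


minimal = DigitalOceanVectorizerSketch(model="qwen3-embedding-0.6b").to_wire()
custom = DigitalOceanVectorizerSketch(
    model="qwen3-embedding-0.6b",
    base_url="https://inference.do-ai.run",
    vectorize_collection_name=False,
).to_wire()
```

Here `minimal` carries only `model` and `vectorizeClassName`, while `custom` also carries `baseURL`.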

Test plan

  • Unit test added to test/collection/test_config.py (in TEST_CONFIG_WITH_VECTORS_PARAMETERS) asserting wire-format serialization, including the trailing-slash URL normalization Pydantic AnyHttpUrl applies
  • pytest test/collection/test_config.py — 185 pass
  • ruff check + ruff format --check clean on all 4 changed files
  • Runtime smoke-test via REPL: factory produces the expected wire payload; calling the factory without model= raises TypeError
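
The trailing-slash normalization mentioned in the first bullet means a host-only URL gains an explicit root path. The snippet below imitates that single behavior with the standard library so the expected string in a wire-format assertion is easier to predict. `normalize_base_url` is a hypothetical helper and does not reproduce Pydantic's full URL validation.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Give a host-only HTTP(S) URL an explicit root path ('https://h' -> 'https://h/')."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    # Only an empty path is rewritten; existing paths are left untouched.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


normalized = normalize_base_url("https://inference.do-ai.run")
with_path = normalize_base_url("https://inference.do-ai.run/v1")
```

A test that passes the URL without a trailing slash would therefore compare against the form with the slash appended.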

Review-feedback changes

Second commit (22aa8fce) addresses @dirkkul's review:

  • Make model required on the Pydantic class and on the factory parameter
  • Remove the Configure.Vectorizer.text2vec_digitalocean factory (deprecated API surface; not extended for new modules)
  • Remove the Configure.NamedVectors.text2vec_digitalocean factory (same reason)
  • Drop the now-orphan _Text2VecDigitalOceanConfig import from config_named_vectors.py
  • Remove the two corresponding test cases; update the Vectors test to pass the required model
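
To illustrate what "required on the factory parameter" means in plain Python: a keyword-only parameter without a default cannot be omitted, and the interpreter raises `TypeError` before the function body runs. The function below is a hypothetical stand-in with the same parameter names as the factory, not the client's implementation.

```python
from typing import Any, Dict, Optional


def text2vec_digitalocean_sketch(
    *,
    name: str,
    model: str,  # required: keyword-only with no default
    base_url: Optional[str] = None,
    vectorize_collection_name: bool = True,
) -> Dict[str, Any]:
    """Hypothetical stand-in for a factory whose `model` argument is mandatory."""
    return {
        "name": name,
        "model": model,
        "base_url": base_url,
        "vectorize_collection_name": vectorize_collection_name,
    }


def call_without_model() -> str:
    """Return the name of the exception raised when `model` is omitted."""
    try:
        text2vec_digitalocean_sketch(name="default")  # type: ignore[call-arg]
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


error_name = call_without_model()
ok = text2vec_digitalocean_sketch(name="default", model="qwen3-embedding-0.6b")
```

Making the parameter required at the signature level surfaces a missing model at call time on the client instead of at collection creation on the server.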

Closes #2038

Sibling PRs (same feature, fanned out across SDKs):

🤖 Generated with Claude Code

Adds `text2vec-digitalocean` to the `Vectorizers` enum and exposes
factory methods on `Configure.Vectorizer`, `Configure.NamedVectors`,
and `Configure.Vectors`. The module accepts an optional `base_url`
(server default `https://inference.do-ai.run`) and a `model`
(required by the server, e.g. `qwen3-embedding-0.6b`).

The shape mirrors `text2vec-mistral` exactly (model + baseURL +
vectorizeClassName), so serialization, URL normalization, and the
existing _to_dict baseURL-stripping path are reused unchanged.

Closes #2038

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@orca-security-eu orca-security-eu Bot left a comment

Orca Security Scan Summary

| Check | Status | High | Medium | Low | Info |
| --- | --- | --- | --- | --- | --- |
| Infrastructure as Code | Passed | 0 | 0 | 0 | 0 |
| SAST | Passed | 0 | 0 | 0 | 0 |
| Secrets | Passed | 0 | 0 | 0 | 0 |
| Vulnerabilities | Passed | 0 | 0 | 0 | 0 |

Comment thread weaviate/collections/classes/config_vectors.py Outdated
Comment thread weaviate/collections/classes/config_vectorizers.py Outdated
Comment thread weaviate/collections/classes/config_named_vectors.py Outdated
@mpartipilo mpartipilo marked this pull request as ready for review May 18, 2026 11:36
@codecov-commenter

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.72%. Comparing base (78fa5f7) to head (22aa8fc).
⚠️ Report is 163 commits behind head on main.

Files with missing lines Patch % Lines
weaviate/collections/classes/config_vectorizers.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2041      +/-   ##
==========================================
- Coverage   87.95%   86.72%   -1.23%     
==========================================
  Files         280      299      +19     
  Lines       21664    22928    +1264     
==========================================
+ Hits        19054    19884     +830     
- Misses       2610     3044     +434     


@dirkkul dirkkul merged commit f26bee0 into main May 19, 2026
125 checks passed
@dirkkul dirkkul deleted the text2vec-digitalocean branch May 19, 2026 03:55
Development

Successfully merging this pull request may close these issues.

Add support for text2vec-digitalocean module

3 participants