Improve vector storage backend persistence #2367

Merged: Mijamind719 merged 1 commit into volcengine:main from vincentsunx:61 on Jun 1, 2026

@vincentsunx vincentsunx commented Jun 1, 2026

Description

Improve the vector storage backend for OpenViking by adding openGauss support, so that OpenViking memories can be stored and retrieved through native openGauss vector tables. This PR includes an openGauss collection adapter, backend configuration support, HNSW vector index creation, metadata persistence, documentation updates, and tests for openGauss-specific behavior.

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Added an openGauss CollectionAdapter implementation using native SQL and the openGauss vector type.
  • Added support for collection lifecycle, dense vector search, scalar filtering, keyword search, fetch, upsert, delete, count, and index-related operations.
  • Added native HNSW physical vector index creation for openGauss.
  • Normalized OpenViking vector index metadata to hnsw so metadata matches the actual physical index created in openGauss.
  • Added support for cosine, l2, and ip distance metrics:
    • cosine uses vector_cosine_ops
    • l2 uses vector_l2_ops
    • ip uses vector_ip_ops
  • Added fail-fast behavior for physical vector index creation failures to avoid saving metadata for indexes that were not actually created.
  • Added openGauss backend configuration support, including host, port, user, password, database, schema, deployment mode, connection timeout, and vector column names.
  • Added optional dependency group for openGauss via openviking[opengauss].
  • Added openGauss adapter registration in the VectorDB adapter factory.
  • Added English and Chinese configuration documentation for the openGauss backend.
  • Added tests covering openGauss config validation, adapter creation, SQL/index generation, distance validation, score conversion, metadata behavior, and integration smoke paths.
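
The metric-to-operator-class mapping and the fail-fast rule listed above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `build_hnsw_index_sql`, `create_index_then_save_meta`, and the `execute` / `save_meta` callables are hypothetical names, and the `CREATE INDEX ... USING hnsw (column opclass)` form is an assumption based on pgvector-style syntax.

```python
from typing import Any, Callable, Dict

# Operator classes named in the PR description, keyed by distance metric.
_OPCLASS_BY_METRIC = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "ip": "vector_ip_ops",
}


def build_hnsw_index_sql(table: str, column: str, metric: str, index_name: str) -> str:
    """Return a CREATE INDEX statement for an HNSW index (assumed syntax)."""
    try:
        opclass = _OPCLASS_BY_METRIC[metric]
    except KeyError:
        raise ValueError(f"unsupported distance metric: {metric!r}") from None
    # Identifiers are assumed to be validated or quoted by the caller.
    return f"CREATE INDEX {index_name} ON {table} USING hnsw ({column} {opclass})"


def create_index_then_save_meta(
    execute: Callable[[str], Any],
    save_meta: Callable[[Dict[str, Any]], Any],
    sql: str,
    meta: Dict[str, Any],
) -> None:
    """Fail fast: metadata is saved only after the physical index exists."""
    execute(sql)  # any exception propagates, so save_meta is never reached
    save_meta(meta)
```

Ordering the two calls this way means a failed `CREATE INDEX` leaves no index metadata behind, which is the behavior the description calls fail-fast.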

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
  • Linux
  • macOS
  • Windows

Tested locally on Windows with:

  • py_compile for modified Python files
  • Local openGauss container
  • OpenViking started from current source
  • OpenClaw + OpenViking + openGauss end-to-end LoCoMo small run
  • Real openGauss smoke test for HNSW index creation

LoCoMo small result with openGauss + l2 + HNSW:

Overall accuracy: 33/35 = 94.29%
OpenViking memory rows: 52
Vector index: HNSW using vector_l2_ops

Token usage:

QA total tokens: 905,836
OpenViking LLM total tokens: 59,541
Embedding tokens: 19,341
Extracted memories: 40

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

openGauss is treated as an optional backend dependency. Users need to install the optional driver when using backend: opengauss:

pip install "openviking[opengauss]"

Example VectorDB config:

{
  "backend": "opengauss",
  "name": "context_volcengine",
  "project": "default",
  "index_name": "default",
  "dimension": 1024,
  "distance_metric": "l2",
  "opengauss": {
    "host": "127.0.0.1",
    "port": 5432,
    "user": "omm",
    "password": "<password>",
    "db_name": "postgres",
    "schema": "public",
    "mode": "standalone",
    "connect_timeout": 10,
    "dense_vector_name": "vector",
    "sparse_vector_name": "sparse_vector"
  }
}
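
As a rough illustration of how such a config might be consumed, the sketch below validates the top-level fields and maps the `opengauss` section to libpq-style connection keywords. The PR does not say which driver the adapter uses, so the keyword names (`dbname`, `connect_timeout`, and so on) assume a psycopg2-compatible `connect()`; `connection_kwargs` is a hypothetical helper, not part of OpenViking.

```python
import json
from typing import Any, Dict

# Metrics the PR description lists as supported.
_SUPPORTED_METRICS = {"cosine", "l2", "ip"}


def connection_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Map the `opengauss` config section to libpq-style keywords (assumed)."""
    if config.get("backend") != "opengauss":
        raise ValueError("backend must be 'opengauss'")
    metric = config.get("distance_metric")
    if metric not in _SUPPORTED_METRICS:
        raise ValueError(f"unsupported distance_metric: {metric!r}")
    og = config["opengauss"]
    return {
        "host": og["host"],
        "port": int(og["port"]),
        "user": og["user"],
        "password": og["password"],
        "dbname": og["db_name"],  # libpq calls the database name `dbname`
        "connect_timeout": int(og["connect_timeout"]),
    }


example = json.loads("""
{
  "backend": "opengauss",
  "distance_metric": "l2",
  "opengauss": {
    "host": "127.0.0.1", "port": 5432, "user": "omm",
    "password": "<password>", "db_name": "postgres",
    "connect_timeout": 10
  }
}
""")
kwargs = connection_kwargs(example)
```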

github-actions Bot commented Jun 1, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 75
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Critical SQL Syntax Bug for Upserting Metadata

The _save_collection_meta and _save_index_meta methods use MySQL-specific ON DUPLICATE KEY UPDATE syntax, which is not compatible with openGauss (PostgreSQL-compatible). This will cause errors when saving collection and index metadata.

def _save_collection_meta(self, meta: Dict[str, Any]) -> None:
    self._execute(
        f"""
        INSERT INTO {self._meta_table_ref(_COLLECTION_META_TABLE)}
            (table_name, logical_collection_name, project_name, meta_json, updated_at)
        VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)
        ON DUPLICATE KEY UPDATE
            logical_collection_name = VALUES(logical_collection_name),
            project_name = VALUES(project_name),
            meta_json = VALUES(meta_json),
            updated_at = CURRENT_TIMESTAMP
        """,
        [
            self.collection_key,
            self._logical_collection_name,
            self._project_name,
            _json_dumps(meta),
        ],
    )

def _save_index_meta(self, index_name: str, meta: Dict[str, Any]) -> None:
    self._execute(
        f"""
        INSERT INTO {self._meta_table_ref(_INDEX_META_TABLE)}
            (table_name, index_name, meta_json, updated_at)
        VALUES (%s, %s, %s, CURRENT_TIMESTAMP)
        ON DUPLICATE KEY UPDATE
            meta_json = VALUES(meta_json),
            updated_at = CURRENT_TIMESTAMP
        """,
        [self.collection_key, index_name, _json_dumps(meta)],
    )

github-actions Bot commented Jun 1, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue

Fix upsert syntax for collection metadata table

Replace MySQL-specific ON DUPLICATE KEY UPDATE with PostgreSQL/openGauss-compatible
ON CONFLICT DO UPDATE syntax, using EXCLUDED to reference inserted values.

openviking/storage/vectordb_adapters/opengauss_adapter.py [393-410]

 self._execute(
     f"""
     INSERT INTO {self._meta_table_ref(_COLLECTION_META_TABLE)}
         (table_name, logical_collection_name, project_name, meta_json, updated_at)
     VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)
-    ON DUPLICATE KEY UPDATE
-        logical_collection_name = VALUES(logical_collection_name),
-        project_name = VALUES(project_name),
-        meta_json = VALUES(meta_json),
+    ON CONFLICT (table_name) DO UPDATE SET
+        logical_collection_name = EXCLUDED.logical_collection_name,
+        project_name = EXCLUDED.project_name,
+        meta_json = EXCLUDED.meta_json,
         updated_at = CURRENT_TIMESTAMP
     """,
     [
         self.collection_key,
         self._logical_collection_name,
         self._project_name,
         _json_dumps(meta),
     ],
 )
Suggestion importance [1-10]: 9

Why: Replaces MySQL-specific ON DUPLICATE KEY UPDATE with PostgreSQL/openGauss-compatible ON CONFLICT DO UPDATE syntax, which is critical for the adapter to work correctly.

Impact: High

Fix upsert syntax for index metadata table

Replace MySQL-specific ON DUPLICATE KEY UPDATE with PostgreSQL/openGauss-compatible
ON CONFLICT DO UPDATE syntax for the index metadata table.

openviking/storage/vectordb_adapters/opengauss_adapter.py [413-423]

 self._execute(
     f"""
     INSERT INTO {self._meta_table_ref(_INDEX_META_TABLE)}
         (table_name, index_name, meta_json, updated_at)
     VALUES (%s, %s, %s, CURRENT_TIMESTAMP)
-    ON DUPLICATE KEY UPDATE
-        meta_json = VALUES(meta_json),
+    ON CONFLICT (table_name, index_name) DO UPDATE SET
+        meta_json = EXCLUDED.meta_json,
         updated_at = CURRENT_TIMESTAMP
     """,
     [self.collection_key, index_name, _json_dumps(meta)],
 )
Suggestion importance [1-10]: 9

Why: Replaces MySQL-specific ON DUPLICATE KEY UPDATE with PostgreSQL/openGauss-compatible ON CONFLICT DO UPDATE syntax for the index metadata table, which is critical for the adapter to work correctly.

Impact: High

@vincentsunx
Contributor Author

The suggested ON CONFLICT ... DO UPDATE fails outright on the current openGauss: syntax error at or near "CONFLICT".
I have switched to MERGE INTO, which openGauss can execute, and applied it to both upserts: collection metadata and index metadata.
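
For readers unfamiliar with the construct, a MERGE INTO upsert of this kind might look roughly like the sketch below. It is not the merged code: `build_merge_upsert` and the table name in the usage line are hypothetical, and the statement shape assumes the usual `MERGE INTO ... USING ... ON (...) WHEN MATCHED ... WHEN NOT MATCHED ...` form. Depending on the driver and server, the placeholders in the USING subquery may need explicit casts.

```python
from typing import Sequence


def build_merge_upsert(
    table: str, key_cols: Sequence[str], value_cols: Sequence[str]
) -> str:
    """Build a MERGE INTO upsert with one %s placeholder per listed column."""
    cols = [*key_cols, *value_cols]
    # The incoming row is exposed as a one-row subquery aliased `s`.
    source = ", ".join(f"%s AS {c}" for c in cols)
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    # Only non-key columns are updated; the columns used in ON stay fixed.
    updates = ", ".join(f"{c} = s.{c}" for c in value_cols)
    updates += ", updated_at = CURRENT_TIMESTAMP"
    insert_cols = ", ".join([*cols, "updated_at"])
    insert_vals = ", ".join([*(f"s.{c}" for c in cols), "CURRENT_TIMESTAMP"])
    return (
        f"MERGE INTO {table} t "
        f"USING (SELECT {source}) s "
        f"ON ({on}) "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical table name; parameters are bound in column order at execute time.
index_meta_sql = build_merge_upsert(
    "public.ov_index_meta", ["table_name", "index_name"], ["meta_json"]
)
```

The same helper would cover the collection metadata case by passing `["table_name"]` as the key and the remaining columns as values.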

@vincentsunx vincentsunx changed the title from "add openGauss vectordb support" to "vector data storage optimization" Jun 1, 2026
@vincentsunx vincentsunx changed the title from "vector data storage optimization" to "Improve vector storage backend persistence" Jun 1, 2026
@Mijamind719 Mijamind719 merged commit c0abcb9 into volcengine:main Jun 1, 2026
11 of 12 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project Jun 1, 2026
r266-tech added a commit to r266-tech/OpenViking that referenced this pull request Jun 1, 2026