Fix: Map "ivf_flat" to "ivfflat" for pgvector index access method#763
Conversation
- IndexType.IVFFlat.value="IVF_FLAT" → .lower()="ivf_flat" caused SQL to fail with "access method 'ivf_flat' does not exist" - pgvector PostgreSQL extension expects "ivfflat" (no underscore), not "ivf_flat" - Added explicit mapping after lowercase normalization: if index_type_lower == "ivf_flat": index_type_lower = "ivfflat"
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: NagarajuReddyBoggala The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/assign @XuanYang-cn |
|
@NagarajuReddyBoggala Please run |
|
@XuanYang-cn Thanks for pointing this out! Note: These linting issues (long comments at lines 338-340, commented code at 377/394) were introduced by [PR #760 ] which got merged earlier. My PR scope: Only added the Should I fix all lint issues caused by PR #760? |
|
@NagarajuReddyBoggala Thanks for the clarification — you're right, those are from #760, not this PR. Since you're already in the file, it'd be great if you could include the cleanup here — but totally your call. If you'd rather keep this PR scoped to the |
|
Done @XuanYang-cn ! Pushed lint cleanup + fixes to the PR Changes in 2nd commit:
Thanks for the quick feedback |
For non-thread-safe DBs (e.g. PgVector), ConcurrentInsertRunner clamps max_workers to 1, so there is always exactly one worker thread. There is no need to deepcopy self.db per thread — the single worker can use self.db directly via the connection already opened by task()'s `with self.db.init():`. The original code called deepcopy(self.db) inside _get_thread_db() after task() had already opened a live psycopg C-extension Connection on self.db. C-extension objects cannot be deep-copied, causing: TypeError: no default __reduce__ due to non-trivial __cinit__ Fix: remove the deepcopy branch entirely. All workers (thread-safe or not) now use self.db directly; thread-safety is guaranteed for non-thread-safe DBs by the max_workers=1 clamp. Also clean up stale comments in pgvector.py left over from zilliztech#760/zilliztech#763. Adds tests/test_pgvector.py with: - unit test that reproduces the bug (fails on original, passes on fix) - e2e regression test via ConcurrentInsertRunner + OpenAI 50K dataset See also: zilliztech#756 Signed-off-by: yangxuan <xuan.yang@zilliz.com>
For non-thread-safe DBs (e.g. PgVector), ConcurrentInsertRunner clamps max_workers to 1, so there is always exactly one worker thread. There is no need to deepcopy self.db per thread — the single worker can use self.db directly via the connection already opened by task()'s `with self.db.init():`. The original code called deepcopy(self.db) inside _get_thread_db() after task() had already opened a live psycopg C-extension Connection on self.db. C-extension objects cannot be deep-copied, causing: TypeError: no default __reduce__ due to non-trivial __cinit__ Fix: remove the deepcopy branch entirely. All workers (thread-safe or not) now use self.db directly; thread-safety is guaranteed for non-thread-safe DBs by the max_workers=1 clamp. Also clean up stale comments in pgvector.py left over from #760/#763. Adds tests/test_pgvector.py with: - unit test that reproduces the bug (fails on original, passes on fix) - e2e regression test via ConcurrentInsertRunner + OpenAI 50K dataset See also: #756 Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Problem
PGVector IVFFlat index creation fails via UI with:
But works fine via CLI.
Root Cause
The frontend config sends the wrong IndexType values:
IndexType.IVFFlat.value= "IVF_FLAT" instead of "ivfflat"pgvector PostgreSQL extension expects "ivfflat" (no underscore), not "ivf_flat"
Solution
Before:
"IVF_FLAT"→"ivf_flat"→ ERRORAfter:
"IVF_FLAT"→"ivf_flat"→"ivfflat"→ ✅CREATE INDEX USING ivfflatEvidence
UI logs (broken):
