Clarification on Multi-Vector Embedding Implementation in Milvus #40424
ranjith502 asked this question in Q&A
Hi Team,
I've implemented multi-vector embedding in Milvus using concepts from the following resources:
🔹 ColPali with Milvus
🔹 FastEmbed ColBERT
The embeddings are generated with the ColBERT model, which produces a multi-vector representation (one vector per token, instead of a single vector per document). I wanted to verify whether my approach is correct and whether there are any optimizations or best practices I should follow.
1️⃣ Dataset & Embeddings:
I have 20 descriptions, and ColBERT encodes each one into 48 token-level embeddings, i.e., a matrix of shape (48, 128); a generation sketch follows this list.
Since Milvus does not allow inserting a (48, 128) matrix as a single field value, I insert 48 separate vectors per description.
Final storage: instead of 20 entities, my collection ends up with 20 × 48 = 960 vectors.
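For reference, the embeddings are produced roughly along the lines of the FastEmbed ColBERT resource above. A minimal sketch, assuming fastembed's LateInteractionTextEmbedding; the sample descriptions are illustrative:

from fastembed import LateInteractionTextEmbedding

# Late-interaction (ColBERT-style) model: one 128-dimensional vector per token
model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0")

descriptions = ["first product description", "second product description"]  # 20 in my dataset

# Each element is a (num_tokens, 128) numpy array, e.g. (48, 128) in this setup
descriptions_embeddings = list(model.embed(descriptions))
print(descriptions_embeddings[0].shape)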
2️⃣ Schema Definition & Indexing
from pymilvus import MilvusClient, DataType

host = "localhost"                  # Milvus host (placeholder)
collection_name = "my_collection"   # collection name (placeholder)

client = MilvusClient(uri=f"http://{host}:19530")

# Define schema
schema1 = MilvusClient.create_schema()
schema1.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema1.add_field(field_name="doc_id", datatype=DataType.INT64)
schema1.add_field(field_name="descriptions_embeddings", datatype=DataType.FLOAT_VECTOR, dim=128)
schema1.add_field(field_name="descriptions", datatype=DataType.VARCHAR, max_length=10000, nullable=True)

# Create collection
client.create_collection(collection_name=collection_name, schema=schema1)

# Insert embeddings (flattening the multi-vector representation)
data_to_insert = [
    {
        "id": i * 1000 + j,                 # Unique ID per vector
        "doc_id": i,                        # Keeps track of the document ID
        "descriptions_embeddings": vector,  # Single 128-dimensional vector
        "descriptions": description,        # Corresponding description
    }
    for i, (embedding_list, description) in enumerate(zip(descriptions_embeddings, descriptions))
    for j, vector in enumerate(embedding_list)
]
client.insert(collection_name=collection_name, data=data_to_insert)
client.flush(collection_name=collection_name)

# Create an index
index_params = client.prepare_index_params()
index_params.add_index(field_name="descriptions_embeddings", index_type="IVF_FLAT", metric_type="L2", params={})
client.create_index(collection_name=collection_name, index_params=index_params)
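(One thing to double-check: with MilvusClient, the collection must be loaded into memory before it can be searched; depending on the client version this may not happen automatically after index creation.)

# Load the collection so the search below can run
client.load_collection(collection_name=collection_name)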
3️⃣ Search & Reranking Pipeline
1️⃣ Query embedding: Using ColBERT, a single query produces 32 embeddings of shape (32, 128).
2️⃣ Search: Each of the 32 query embeddings retrieves its 50 closest matches, yielding at most 32 × 50 = 1600 candidate embeddings.
3️⃣ Filter unique doc_id: After deduplicating document IDs, I fetch all 48 stored embeddings for each candidate document.
4️⃣ Reranking: I apply MaxSim (ColBERT-style scoring) to compute the final relevance scores (see the MaxSim sketch after this list).
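For reference, MaxSim scores a document by taking, for each query token embedding, its best match among the document's token embeddings and summing those maxima: score(Q, D) = Σ_i max_j (q_i · d_j). A toy numpy sketch of the same computation used in the reranker below (shapes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
query_vector = rng.random((32, 128))  # 32 query token embeddings
doc_vecs = rng.random((48, 128))      # 48 document token embeddings

# (32, 48) similarity matrix -> best doc token per query token -> sum over query tokens
sim = query_vector @ doc_vecs.T
maxsim_score = sim.max(axis=1).sum()
print(maxsim_score)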
# Step 1: Perform the vector search (all 32 query embeddings in one batch)
results = client.search(
    collection_name,
    query_vector,
    limit=50,
    output_fields=["doc_id"],  # Retrieve doc_id only
    search_params={"metric_type": "L2", "params": {}},
)
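A note on the query format (an assumption about code not shown here): query_vector is the full (32, 128) ColBERT query matrix passed as a batch of 32 search vectors, so results contains one hit list per query embedding. If it comes out of the model as a numpy array, it may need converting to a list of lists first:

# Hypothetical conversion; query_embedding is a numpy array of shape (32, 128)
query_vector = query_embedding.tolist()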
# Step 2: Extract unique document IDs
doc_ids = set()
for res in results:
    for match in res:
        doc_ids.add(match["entity"]["doc_id"])
# Step 3: Fetch full document embeddings and compute the MaxSim score
import concurrent.futures

import numpy as np

def rerank_single_doc(doc_id, query_vector, client, collection_name):
    # Fetch all stored vectors for this document (48 per document here)
    doc_colbert_vecs = client.query(
        collection_name=collection_name,
        filter=f"doc_id == {doc_id}",
        output_fields=["descriptions_embeddings"],
        limit=5000,
    )
    doc_vecs = np.vstack([doc_colbert_vecs[i]["descriptions_embeddings"] for i in range(len(doc_colbert_vecs))])
    # MaxSim: best document token per query token, summed over query tokens
    score = np.dot(query_vector, doc_vecs.T).max(1).sum()
    return (score, doc_id)

# Rerank using a thread pool
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    futures = {
        executor.submit(rerank_single_doc, doc_id, query_vector, client, collection_name): doc_id
        for doc_id in doc_ids
    }
    scores = [future.result() for future in concurrent.futures.as_completed(futures)]
# Sort and return the top results
scores.sort(key=lambda x: x[0], reverse=True)
topk_results = scores[:5]

# Output
print("✅ Top-K Retrieved Documents:")
for rank, (score, doc_id) in enumerate(topk_results, start=1):
    print(f"🏆 Rank {rank}: Document ID {doc_id} with score {score:.2f}")
Output:
✅ Query embedding shape: (32, 128)
✅ Unique document IDs retrieved: {0, 1, 2, 3, 4, 5, 6, 18, 19}
📌 Document 0 retrieved 48 vectors.
📌 Document 18 retrieved 48 vectors.
📌 Document 3 retrieved 48 vectors.
📌 Document 19 retrieved 48 vectors.
📌 Document 6 retrieved 48 vectors.
✅ Top-K retrieved documents:
🏆 Rank 1: Document ID 4 with score 12.06
🏆 Rank 2: Document ID 19 with score 10.75
🏆 Rank 3: Document ID 6 with score 10.04
🏆 Rank 4: Document ID 1 with score 9.64
🏆 Rank 5: Document ID 18 with score 7.40
🎉 Search & Reranking Completed!
🔹 Questions for the Milvus Team
1️⃣ Is my approach of inserting embeddings correct? Since Milvus does not allow (48,128) insertion directly, I flattened the embeddings by inserting 48 separate vectors per document. Is this the recommended way?
2️⃣ Are there any optimizations to reduce storage while keeping multi-vector retrieval performance efficient? Right now, I am storing 48× more entities than the original document count.
3️⃣ Is there a more efficient way to retrieve all embeddings belonging to a document (doc_id)? Currently, I use filter=f"doc_id == {doc_id}" during reranking, which retrieves all 48 vectors per document with one query per candidate document.
Reply:
embedding list support is on our roadmap and hopefully it can be released in the next 2 months.