# Load Any Vector Store + Get Top K

**Tools:**

1. LangChain: a standardized way to set up, create, and query multiple vector stores
2. Vector Stores supported:
    1. Chroma
3. Embedding Models supported:
    1. HuggingFace

**References:**

1. [LangChain-Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)

In [1]:
import os
import sys
import chromadb

import pandas as pd

from tqdm import tqdm
from uuid import uuid4

from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

from langchain_core.documents import Document

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from vector_stores import ChromaVectorStore, VectorStoreDirector
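The `sys.path` step above depends on the notebook's working directory, and re-running the cell appends the same entry again. A minimal sketch of a slightly more defensive variant follows, assuming the `vector_stores` module lives one level above the notebook; the helper name `add_parent_to_sys_path` is hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def add_parent_to_sys_path(start: Path) -> str:
    """Append the resolved parent of `start` to sys.path once and return it.

    Hypothetical helper: resolving to an absolute path avoids entries such as
    '<cwd>/../', and the membership check keeps re-runs from adding duplicates.
    """
    parent = str(start.resolve().parent)
    if parent not in sys.path:
        sys.path.append(parent)
    return parent


# Assumption: the notebook is started from its own directory.
project_root = add_parent_to_sys_path(Path.cwd())
```

Calling the helper a second time returns the same path and leaves `sys.path` unchanged.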

In [2]:
# Show up to 800 characters per cell so long documents are not truncated
pd.set_option('display.max_colwidth', 800)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

## Load Vector Store

In [3]:
collection_name = "prediction_collection-real_data"
persist_directory = "../data/chroma/chroma_langchain_db"
chroma_loader = ChromaVectorStore(collection_name, persist_directory)
chroma_loader

	Collection Name: prediction_collection-real_data
	Persist Directory: ../data/chroma/chroma_langchain_db
	Vector Store: None
	Docments: []
	UUIDS: None
	Embedding Model: None


<vector_stores.ChromaVectorStore at 0x147c4f74ba10>
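The output above suggests the loader only records the collection name and persist directory at construction time, leaving the vector store and embedding model as `None` until a query is run. The sketch below illustrates that lazy-initialization shape under that assumption. `LoaderSketch` and its methods are hypothetical and are not the real `ChromaVectorStore` class.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class LoaderSketch:
    """Hypothetical stand-in for a vector-store loader with lazy fields."""

    collection_name: str
    persist_directory: str
    vector_store: Optional[Any] = None      # set later, when the store is opened
    documents: List[Any] = field(default_factory=list)
    embedding_model: Optional[Any] = None   # set later, when the model is loaded

    def is_loaded(self) -> bool:
        """True once both the store and the embedding model have been attached."""
        return self.vector_store is not None and self.embedding_model is not None

    def summary(self) -> str:
        """Return a tab-indented, human-readable view of the current state."""
        return "\n".join(
            [
                f"\tCollection Name: {self.collection_name}",
                f"\tPersist Directory: {self.persist_directory}",
                f"\tVector Store: {self.vector_store}",
                f"\tDocuments: {self.documents}",
                f"\tEmbedding Model: {self.embedding_model}",
            ]
        )


loader = LoaderSketch(
    "prediction_collection-real_data",
    "../data/chroma/chroma_langchain_db",
)
print(loader.summary())
```

Keeping the heavy objects out of the constructor means creating a loader is cheap, and the embedding model is only loaded when a query actually needs it.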

In [4]:
chroma_director = VectorStoreDirector(loader=chroma_loader)
embedding_model_name = "Hugging Face"
query_string = "Hey"
k = 3
chroma_director.query(embedding_model_name, query_string, k)

### LOADER ###
	<vector_stores.ChromaVectorStore object at 0x147c4f74ba10>
### INITIALIZE CLIENT VECTOR STORE ###
	Vector Store (Prediction's Wrapper): None
### LOAD EMBEDDING MODEL ###



	Hugging Face
### LOAD VECTOR STORE ###
	Collection Name: prediction_collection-real_data
	Embedding Model: model_name='sentence-transformers/all-mpnet-base-v2' cache_folder=None model_kwargs={} encode_kwargs={} query_encode_kwargs={} multi_process=False show_progress=False
	Persist Directory: ../data/chroma/chroma_langchain_db
	Vector Store (Original): <langchain_chroma.vectorstores.Chroma object at 0x147a7c93e890>
	Vector Store (Prediction's Wrapper): <vector_stores.ChromaVectorStore object at 0x147c4f74ba10>
	Documents (D) 40
### TOP K ###
	1. Similarity
		* A purchase agreement for 7,200 tons of gasoline with delivery at the Hamina terminal , Finland , was signed with Neste Oil OYj at the average Platts index for this September plus eight US dollars per month . [{'sentiment': 'positive', 'domain': 'financial'}]

		* 10 February 2011 - Finnish media company Sanoma Oyj HEL : SAA1V said yesterday its 2010 net profit almost tripled to EUR297 .3 m from EUR107 .1 m for 2009 and announced
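The `### TOP K ###` step above returns the `k` stored documents closest to the query embedding. The sketch below shows that idea in plain Python. It makes several assumptions: `embed` is a toy bag-of-words stand-in for the real `sentence-transformers/all-mpnet-base-v2` model, cosine similarity is used as the ranking measure, and the three documents are made-up examples rather than entries from the collection.

```python
import heapq
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, documents: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k (score, document) pairs with the highest similarity."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Made-up documents for illustration only.
docs = [
    "net profit almost tripled",
    "purchase agreement for gasoline",
    "profit fell sharply",
]
results = top_k("net profit", docs, k=2)
for score, doc in results:
    print(f"[SIM={score:.3f}] {doc}")
```

A real vector store replaces the linear scan with an approximate nearest-neighbor index, but the contract is the same: a query goes in and the `k` best-scoring documents come out.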

In [5]:
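# NOTE: this cell assumes three names that are not defined anywhere above:
#   words                    - an iterable of query strings
#   vector_store_from_client - a langchain_chroma.Chroma instance
#   embeddings               - the HuggingFaceEmbeddings model used to build the store
# Running the cell without them raises the NameError shown below.
for idx, word in enumerate(words):
    print(f"\t\t\t-------{idx}, {word}-------")

    # 1. Plain similarity search: the k closest documents.
    print("1. Similarity")
    results = vector_store_from_client.similarity_search(word, k=3)
    for res in results:
        print(f"\t* {res.page_content} [{res.metadata}]\n")

    # 2. Same search, with the score returned alongside each document.
    print("2. Similarity with score")
    results = vector_store_from_client.similarity_search_with_score(word, k=3)
    for res, score in results:
        print(f"\t* [SIM={score:.3f}] {res.page_content} [{res.metadata}]\n")

    # 3. Search with a precomputed query embedding.
    print("3. Similarity by vector")
    results = vector_store_from_client.similarity_search_by_vector(
        embedding=embeddings.embed_query(word), k=3
    )
    for doc in results:
        print(f"\t* {doc.page_content} [{doc.metadata}]\n")

    # 4. Retriever using maximal marginal relevance (MMR):
    #    fetch 5 candidates, then keep 3 that balance relevance and diversity.
    print("4. Retriever")
    retriever = vector_store_from_client.as_retriever(
        search_type="mmr", search_kwargs={"k": 3, "fetch_k": 5}
    )
    for doc in retriever.invoke(word):
        print(f"\t* {doc.page_content} [{doc.metadata}]\n")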

NameError: name 'words' is not defined
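The retriever in the cell above uses `search_type="mmr"` with `k` and `fetch_k`. The sketch below shows, in plain Python, what maximal marginal relevance does with those two parameters: it first keeps the `fetch_k` candidates most similar to the query, then greedily picks `k` of them while penalizing similarity to documents already chosen. The function `mmr`, the toy 2-D vectors, and the value `lambda_mult=0.5` are assumptions for illustration. This is not LangChain's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int,
    fetch_k: int,
    lambda_mult: float = 0.5,  # assumed trade-off: 1.0 = pure relevance
) -> List[int]:
    """Return the indices of k documents chosen by maximal marginal relevance."""
    # Step 1: shortlist the fetch_k documents most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]

    # Step 2: greedily pick documents that are relevant but not redundant.
    selected: List[int] = []
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: documents 0 and 1 are near-duplicates of each other.
query = [1.0, 1.0]
vectors = [[1.0, 0.9], [1.0, 0.89], [0.8, 1.0], [1.0, 0.0]]

diverse = mmr(query, vectors, k=2, fetch_k=3)
relevance_only = mmr(query, vectors, k=2, fetch_k=3, lambda_mult=1.0)
print(diverse, relevance_only)
```

With these toy vectors, the relevance-only ranking returns both near-duplicates, while MMR swaps the second one for a less similar but still relevant document. A larger `fetch_k` gives the diversity step more candidates to choose from.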