# Build any Vector Store with any Embedding Model

**Tools:**

1. LangChain: a standardized way to set up, create, and query multiple vector stores
2. Vector Stores supported:
    1. Chroma
3. Embedding Models supported:
    1. HuggingFace

**References:**

1. [LangChain-Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)

In [1]:
import os
import sys
import warnings

import pandas as pd

from tqdm import tqdm
from uuid import uuid4

from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

from langchain_core.documents import Document

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

import log_files
from data_processing import DataProcessing
from vector_stores import ChromaVectorStore, VectorStoreDirector

In [2]:
# Show long sentences in full when displaying DataFrames
pd.set_option('display.max_colwidth', 800)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)
# Silence all warnings; Warning is the base class, so this also covers
# UserWarning and DeprecationWarning
warnings.simplefilter(action='ignore', category=Warning)

## Load Data

In [3]:
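# Build the path to the financial phrase bank CSV relative to the notebook
base_path = os.path.join(notebook_dir, '../data/')
financial_full_path = os.path.join(base_path, 'financial_phrase_bank/all_data-adjusted_header.csv')
# Drop bytes that cannot be decoded instead of raising an error
financial_df = pd.read_csv(financial_full_path, encoding_errors='ignore')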

In [4]:
# Tag every row with its domain, then keep only the first 33 rows
financial_df['domain'] = 'financial'
financial_df = financial_df[:33]
financial_df

| Unnamed: 0 | sentiment | sentence | domain |
|---|---|---|---|
| 0 | neutral | According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing . | financial |
| 1 | neutral | Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said . | financial |
| 2 | negative | The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported . | financial |
| 3 | positive | With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability . | financial |
| 4 | positive | According to the company 's updated strategy for the years 2009-2012 , Basware targets a long-term net sales growth in the range of 20 % -40 % with an operating profit margin of 10 % -20 % of net sales . | financial |
| 5 | positive | FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs . | financial |
| 6 | positive | For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m . | financial |
| 7 | positive | In the third quarter of 2010 , net sales increased by 5.2 % to EUR 205.5 mn , and operating profit by 34.9 % to EUR 23.5 mn . | financial |
| 8 | positive | Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in the corresponding period in 2007 representing 7.7 % of net sales . | financial |
| 9 | positive | Operating profit totalled EUR 21.1 mn , up from EUR 18.6 mn in 2007 , representing 9.7 % of net sales . | financial |
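
Each row above pairs a sentence with its `sentiment` and `domain` labels. The sketch below shows one plausible way such rows could be turned into documents that carry the text as content and the remaining columns as metadata, with one UUID per document. It is an illustration only: `SimpleDocument` and `build_documents` are hypothetical names, `SimpleDocument` is a stand-in for `langchain_core.documents.Document`, and dropping the `Unnamed: 0` index column is an assumption, not the actual logic of the `vector_stores` module.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_documents(records, text_column="sentence", drop_columns=("Unnamed: 0",)):
    """Turn row dicts into (documents, uuids); non-text columns become metadata."""
    documents, uuids = [], []
    for row in records:
        # Keep every column except the text itself and the dropped index column
        metadata = {
            key: value
            for key, value in row.items()
            if key != text_column and key not in drop_columns
        }
        documents.append(SimpleDocument(page_content=row[text_column], metadata=metadata))
        uuids.append(str(uuid4()))
    return documents, uuids


# Two rows from the table above, in the shape of financial_df.to_dict("records")
records = [
    {
        "Unnamed: 0": 8,
        "sentiment": "positive",
        "sentence": "Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in the "
                    "corresponding period in 2007 representing 7.7 % of net sales .",
        "domain": "financial",
    },
    {
        "Unnamed: 0": 9,
        "sentiment": "positive",
        "sentence": "Operating profit totalled EUR 21.1 mn , up from EUR 18.6 mn "
                    "in 2007 , representing 9.7 % of net sales .",
        "domain": "financial",
    },
]

documents, uuids = build_documents(records)
print(len(documents), documents[0].metadata)
```

Keeping the labels as metadata rather than folding them into the text means they stay available for filtering or inspection after retrieval, while only the sentence itself is embedded.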


## Vector Store

In [5]:
collection_name = "prediction_collection-real_data"
persist_directory = "../data/chroma/chroma_langchain_db"
chroma_builder = ChromaVectorStore(collection_name, persist_directory)
chroma_builder

	Collection Name: prediction_collection-real_data
	Persist Directory: ../data/chroma/chroma_langchain_db
	Vector Store: None
	Docments: []
	UUIDS: None
	Embedding Model: None


<vector_stores.ChromaVectorStore at 0x147953c87750>

In [6]:
chroma_director = VectorStoreDirector(builder=chroma_builder)
embedding_model_name = "Hugging Face"
chroma_director.construct(embedding_model_name, financial_df)

### BUILDER ###
	<vector_stores.ChromaVectorStore object at 0x147953c87750>
### EMBEDDING MODEL ###



	Hugging Face
### INITIALIZE VECTOR STORE ###
	Collection Name: prediction_collection-real_data
	Embedding Model: model_name='sentence-transformers/all-mpnet-base-v2' cache_folder=None model_kwargs={} encode_kwargs={} query_encode_kwargs={} multi_process=False show_progress=False
	Persist Directory: ../data/chroma/chroma_langchain_db
	Vector Store (Original): <langchain_chroma.vectorstores.Chroma object at 0x147953d3f050>
	Vector Store (Prediction's Wrapper): <vector_stores.ChromaVectorStore object at 0x147953c87750>
### BUILD DOCUMENT ###
	Metadata Columns: ['sentiment', 'domain']


33it [00:00, 35830.19it/s]

	UUIDS (N = D): 33
	Documents (D) 33
### ADD DOCUMENTS TO VECTOR STORE ###





	Documents added: <langchain_chroma.vectorstores.Chroma object at 0x147953d3f050>
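
The log above suggests that `construct` walks the builder through four stages in a fixed order: choose the embedding model, initialize the vector store, build the documents, and add them to the store. The following is a minimal sketch of that builder/director arrangement in plain Python. Every class and method name here is hypothetical, and the dictionary-backed store is a stand-in for Chroma; the real `ChromaVectorStore` and `VectorStoreDirector` may be organized differently.

```python
from abc import ABC, abstractmethod


class VectorStoreBuilderSketch(ABC):
    """Hypothetical builder interface: one method per construction stage."""

    @abstractmethod
    def set_embedding_model(self, name): ...

    @abstractmethod
    def initialize_vector_store(self): ...

    @abstractmethod
    def build_documents(self, rows): ...

    @abstractmethod
    def add_documents(self): ...


class InMemoryBuilderSketch(VectorStoreBuilderSketch):
    """Toy builder that stores texts in a dict instead of a real vector store."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.embedding_model = None
        self.vector_store = None
        self.documents = []

    def set_embedding_model(self, name):
        # A real builder would load an embedding model here; this one keeps the name
        self.embedding_model = name

    def initialize_vector_store(self):
        # Stand-in store: maps a document id to its text
        self.vector_store = {}

    def build_documents(self, rows):
        self.documents = list(rows)

    def add_documents(self):
        for index, text in enumerate(self.documents):
            self.vector_store[f"doc-{index}"] = text


class DirectorSketch:
    """Hypothetical director: owns the order in which the stages run."""

    def __init__(self, builder):
        self.builder = builder

    def construct(self, embedding_model_name, rows):
        self.builder.set_embedding_model(embedding_model_name)
        self.builder.initialize_vector_store()
        self.builder.build_documents(rows)
        self.builder.add_documents()
        return self.builder


builder = InMemoryBuilderSketch("demo_collection")
director = DirectorSketch(builder=builder)
built = director.construct("Hugging Face", ["first sentence", "second sentence"])
print(built.embedding_model, len(built.vector_store))
```

The appeal of this split is that supporting another vector store only requires a new builder class; the director and the calling notebook code stay the same.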
