# Vector Store

This notebook demonstrates how to work with vector stores for semantic search and retrieval.

We'll cover:
- Loading and chunking text documents
- Storing document embeddings in different vector databases
- Querying and retrieving relevant documents
- Using vector stores as configurable LangChain runnables

In [1]:
from dotenv import load_dotenv
from rich import print

load_dotenv(verbose=True)

# %load_ext autoreload
# %autoreload 2

# import sys

# sys.path.append(".")

True

### Split the text into chunks

First we load a text document and split it into smaller chunks for processing:

- Uses LangChain's `TextLoader` to load the file
- Applies `RecursiveCharacterTextSplitter` to break the text into chunks of at most 2000 characters
- Sets `chunk_overlap=0`, so consecutive chunks share no text

In [2]:
from genai_tk.core.embeddings_factory import EmbeddingsFactory
from genai_tk.core.embeddings_store import VECTOR_STORE_ENGINE, EmbeddingsStore
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("use_case_data/other/state_of_the_union.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(texts)
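
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking in plain Python. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries to break on separators such as paragraph and line breaks before falling back to raw characters. The helper name `split_fixed` is hypothetical.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Illustration only: no attempt is made to break on paragraph or
    sentence boundaries, unlike LangChain's recursive splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With zero overlap, a sentence that straddles a chunk boundary is cut in two. A small overlap trades some duplicated text for a better chance that such a sentence appears whole in at least one chunk.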

### Store document embeddings in a vector database

We use our `EmbeddingsStore` to:

1. Select a vector store backend (default is in-memory)
2. Configure the embedding model (default from config)
3. Add our document chunks to the store

Key benefits of the factory pattern:
- Easy switching between vector store implementations
- Consistent interface regardless of backend
- Centralized configuration management

In [3]:
embeddings_store = EmbeddingsStore.create_from_config("in_memory")

print(embeddings_store)

db = embeddings_store.get()
db.add_documents(texts)

2026-02-03 14:26:01.127 | DEBUG | genai_tk.core.embeddings_store:get_vector_store:428 - get vector store  : InMemory/embeddings_qwen3_06b

['7c29b305-1c34-4788-85fc-bd7d8a48c299',
 '729fefeb-6439-4c43-a759-622e8e4e03fd',
 '80613923-4f5e-46c0-aa95-8feed3c9d333',
 'a4ff6945-06b6-46ec-87f0-49f8ff5f2d82',
 '385078f9-fbe7-45c5-a0fb-c0f5ced28712',
 '6036cdb0-63f8-40f0-b1f4-540ff753e307',
 'de26b62a-f1b9-445d-9600-768ffa32dccd',
 'db4042b2-a7cb-4370-a13d-6fb093b67bfe',
 '21daba96-edbc-4f91-b1d2-c1eec738091f',
 '0f3fe55a-8139-4ffe-bc85-7b2342adcb67',
 '1a1d1347-84c4-40ef-b111-6adc179ab949',
 '5651b05e-68db-4a69-a181-c0c56c5230c8',
 'eac6d557-0b86-490c-81c6-2636c5d3acc7',
 '9bbd515b-4846-4de1-a317-29332eaeb66d',
 'bb5278f7-ac03-4871-85ca-6b492fa7811a',
 'd5f9fbfd-bee8-44f5-b320-c3d9bab8d00f',
 '51ff74d7-9363-422c-ba1a-b14b1cae7477',
 'b19f6620-2e8b-4991-a754-98682f74a936',
 '79cf9b58-e291-45c8-a515-d4e1169e4b25',
 'accd506a-b886-4765-aa83-aea1b8f7793b',
 '04cd92ad-b6e8-4500-acdd-92f563a37b2f']

### Test semantic search queries

We'll search for content related to:
1. "What did the president say about Ketanji Brown Jackson" (English)
2. "Qu'as dit le président sur Ketanji Brown Jackson" (French)

This demonstrates cross-lingual retrieval:
- With a multilingual embedding model, the English and French queries map to nearby vectors
- Both queries should therefore retrieve similar chunks, even though the source document is in English

In [4]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=3)
print(docs)

In [5]:
query = "Qu'as dit le président sur Ketanji Brown Jackson"
docs = db.similarity_search(query, k=3)
print(docs)
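
As a rough picture of what a similarity search does, the sketch below ranks a few items by cosine similarity, a scoring function many vector stores use. The three-dimensional vectors are made up by hand for illustration; real embeddings have hundreds of dimensions and come from the embedding model. The names `cosine_similarity`, `top_k`, and the toy index are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made toy vectors standing in for chunk embeddings.
index = [
    ("supreme court nomination", [0.9, 0.1, 0.0]),
    ("infrastructure spending", [0.1, 0.9, 0.1]),
    ("judicial appointments", [0.8, 0.2, 0.1]),
]

# A multilingual model would place an English query and its French
# translation close together; these two vectors imitate that.
english_query_vec = [1.0, 0.0, 0.0]
french_query_vec = [0.95, 0.05, 0.0]

english_hits = [text for _, text in top_k(english_query_vec, index, k=2)]
french_hits = [text for _, text in top_k(french_query_vec, index, k=2)]
print(english_hits)
print(french_hits)
```

Because the two query vectors point in almost the same direction, they rank the toy chunks identically. That is the behavior the English and French queries above rely on.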

### Vector Store as Runnable

LangChain's `as_retriever()` converts the vector store into a runnable component that can:

- Be chained with other LangChain components
- Support streaming and async operations
- Be configured with search parameters

In [6]:
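# Set k on the retriever itself: in some langchain-core versions, extra
# keyword arguments passed to invoke() are ignored.
retriever = db.as_retriever(search_kwargs={"k": 1})

top_docs = retriever.invoke(query)
print(top_docs)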