## Expert Knowledge Worker

### A Question-Answering Agent for Insurellm Employees  

This project aims to develop a highly accurate and cost-effective question-answering agent tailored for employees of **Insurellm**, an **Insurance Tech** company. The agent will serve as an **expert knowledge worker**, ensuring that employees receive precise and reliable responses to their inquiries.

### Approach: Retrieval-Augmented Generation (RAG)  

To achieve high accuracy, the system will leverage **Retrieval-Augmented Generation (RAG)**, which combines information retrieval with generative AI. This approach ensures that answers are grounded in **relevant and verified knowledge sources**, reducing the risk of hallucinations and inaccuracies.

### Implementation Strategy  

For the initial phase, we will implement a **brute-force RAG model**, which will:  
- **Index** internal documentation, policies, and other relevant knowledge bases.  
- **Retrieve** the most relevant context for each query.  
- **Generate** responses using a **pre-trained language model** fine-tuned for insurance-related queries.  

This method prioritizes **cost-efficiency** while maintaining **high accuracy**. As the system evolves, we may explore **optimized retrieval strategies, embedding-based search, and knowledge graph integration** to enhance performance further.
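
As a rough illustration of the index, retrieve and generate stages listed above, the sketch below indexes a few made-up snippets, retrieves by simple word overlap and assembles the prompt that would be sent to a model. The snippets, the helper names (`tokenize`, `retrieve`, `build_prompt`) and the overlap scoring are assumptions for illustration only; the notebook itself uses embeddings and Chroma.

```python
import re

# Hypothetical knowledge-base snippets (not taken from the real documents)
KNOWLEDGE_BASE = [
    "Product Alpha is an auto insurance platform.",
    "Product Beta is a home insurance platform.",
    "The company was founded in 2015.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Index: precompute the word set for every snippet
INDEX = [(doc, tokenize(doc)) for doc in KNOWLEDGE_BASE]

def retrieve(question, k=2):
    """Return the k snippets sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(INDEX, key=lambda item: len(q_words & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, k=2):
    """Assemble the context-plus-question prompt that would go to the LLM."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("When was the company founded?", k=1))
```

The generate stage is deliberately left out: whatever model is used only ever sees the prompt built here, which is why retrieval quality matters so much.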

### Key Objectives  
✔ **Accurate** responses based on company-specific knowledge.  
✔ **Low-cost** implementation with minimal infrastructure overhead.  
✔ **Scalability** for future improvements and AI advancements.  

This first implementation will serve as the foundation for refining and expanding the **Insurellm Expert Knowledge Worker** to meet the company's evolving needs.  


In [1]:
# imports for langchain, plotly and Chroma
import warnings
warnings.simplefilter("ignore")

import os
import glob
import numpy as np
import gradio as gr
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from sklearn.manifold import TSNE
from langchain_chroma import Chroma
from langchain.schema import Document
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from dotenv import load_dotenv

In [3]:
load_dotenv() 
MODEL = "gpt-4o-mini"
db_name = "vector_db"
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [4]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledge base
folders = glob.glob("knowledge-base/*")
def add_metadata(doc, doc_type):
    doc.metadata["doc_type"] = doc_type
    return doc

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}
documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks: {len(chunks)}")
print(f"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}")

Created a chunk of size 1088, which is longer than the specified 1000
Created a chunk of size 1031, which is longer than the specified 1000


Total number of chunks: 129
Document types found: {'products', 'employees', 'company', 'contracts'}
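
The "longer than the specified 1000" warnings above come from how a separator-based splitter works: it cuts only at the separator (a blank line by default) and then merges neighbouring pieces while they fit. A single paragraph that is already larger than `chunk_size` has nowhere to be cut, so it is kept whole. The function below is a simplified sketch of that behaviour, not the library's code; `split_on_separator` is a hypothetical name and chunk overlap is ignored.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedy merge of separator-delimited pieces, ignoring overlap for brevity."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either it still fits, or a single piece is already too long
            # and there is no separator inside it to cut on.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "short paragraph\n\n" + "x" * 30 + "\n\nanother short one"
chunks = split_on_separator(text, chunk_size=20)
print([len(c) for c in chunks])  # the 30-character paragraph exceeds chunk_size=20
```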


## A sidenote on Embeddings, and "Auto-Encoding LLMs"

We will be mapping each chunk of text into a Vector that represents the meaning of the text, known as an embedding.

OpenAI offers a model to do this, which we will use by calling their API with some LangChain code.

This model is an example of an "Auto-Encoding LLM" which generates an output given a complete input.
It's different to all the other LLMs we've discussed today, which are known as "Auto-Regressive LLMs", and generate future tokens based only on past context.

Another example of an Auto-Encoding LLM is BERT from Google. In addition to embedding, Auto-Encoding LLMs are often used for classification.
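
To make "a vector that represents the meaning of the text" more concrete, here is a minimal sketch that compares vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors and their names are invented for illustration; real embeddings come from the model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" (assumed values, not model output)
claim_form = [0.9, 0.1, 0.0]
claim_process = [0.8, 0.2, 0.1]
holiday_policy = [0.0, 0.2, 0.9]

print(cosine_similarity(claim_form, claim_process))   # related texts: close to 1
print(cosine_similarity(claim_form, holiday_policy))  # unrelated texts: close to 0
```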

### Sidenote

In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal.

In [5]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
# Chroma is a popular open source Vector Database based on SQLite

embeddings = OpenAIEmbeddings()

# If you would rather use the free Vector Embeddings from HuggingFace sentence-transformers
# Then replace embeddings = OpenAIEmbeddings()
# with:
# from langchain_huggingface import HuggingFaceEmbeddings
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Delete if already exists
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

# Create vectorstore
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Vectorstore created with 129 documents


In [6]:
# Let's investigate the vectors
collection = vectorstore._collection
count = collection.count()

sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"There are {count:,} vectors with {dimensions:,} dimensions in the vector store")

There are 129 vectors with 1,536 dimensions in the vector store


## Visualizing the Vector Store

Let's take a minute to look at the documents and their embedding vectors to see what's going on.

In [7]:
# Prework (with thanks to Jon R for identifying and fixing a bug in this!)
result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
metadatas = result['metadatas']
doc_types = [metadata['doc_type'] for metadata in metadatas]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [8]:
# Create the 2D scatter plot without t-SNE
fig = go.Figure(data=[go.Scatter(
    x=vectors[:, 0],
    y=vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization (first two raw dimensions)',
    xaxis_title='x',  # `scene` only applies to 3D plots, so set the 2D axis titles directly
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [9]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization (t-SNE)',
    xaxis_title='x',  # `scene` only applies to 3D plots, so set the 2D axis titles directly
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [10]:
# Let's try 3D!
tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)
# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

## Time to use LangChain to bring it all together

In [11]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
# Alternative - if you'd like to use Ollama locally, uncomment this line instead
# llm = ChatOpenAI(temperature=0.7, model_name='llama3.2', base_url='http://localhost:11434/v1', api_key='ollama')

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()

# putting it together: set up the conversation chain with the gpt-4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)


Please see the migration guide at: https://python.langchain.com/docs/versions/migrating_memory/



In [12]:
# Let's try a simple question
query = "Please explain what Insurellm is in a couple of sentences"
result = conversation_chain.invoke({"question": query})
print(result["answer"])

Insurellm is an innovative insurance tech startup founded in 2015, specializing in developing software solutions for the insurance industry. With a range of products including platforms for auto, home, and reinsurance, as well as a marketplace connecting consumers with insurance providers, Insurellm aims to disrupt and enhance the insurance landscape.


In [11]:
# set up a new conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
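
Conceptually, a conversational retrieval chain does three things on every turn: it folds the chat history into the new question, retrieves context for that combined question, and answers from the retrieved context while recording the turn in memory. The sketch below mimics that flow with stand-ins. `fake_llm`, `BufferMemory` and `ask` are hypothetical names, and the real chain asks the LLM to rewrite the follow-up into a standalone question rather than concatenating strings.

```python
def fake_llm(prompt):
    """Stand-in for a chat model: reports the prompt size instead of answering."""
    return f"(answer based on a prompt of {len(prompt)} characters)"

class BufferMemory:
    """Keeps every turn verbatim, in the spirit of a conversation buffer."""
    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def as_text(self):
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

def ask(question, memory, retrieve):
    # 1. Fold the history into the question so retrieval sees the full context
    standalone = f"{memory.as_text()}\nFollow-up: {question}".strip()
    # 2. Retrieve context for the combined question
    context = "\n".join(retrieve(standalone))
    # 3. Answer from the retrieved context, then record the turn
    answer = fake_llm(f"Context:\n{context}\n\nQuestion: {question}")
    memory.turns.append((question, answer))
    return answer

memory = BufferMemory()
ask("What is Insurellm?", memory, retrieve=lambda q: ["Insurellm is an insurance tech company."])
ask("When was it founded?", memory, retrieve=lambda q: ["Founded in 2015."])
print(memory.as_text())
```

Creating a fresh memory object, as the cell above does, simply starts again with an empty list of turns, so earlier questions no longer influence retrieval.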

## Now we will bring this up in Gradio using the Chat interface

A quick and easy way to prototype a chat with an LLM

In [13]:
# Wrapping that in a function
def chat(question, history):
    result = conversation_chain.invoke({"question": question})
    return result["answer"]

In [14]:
# And in Gradio:

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7863

To create a public link, set `share=True` in `launch()`.


In [None]:
# Let's investigate what gets sent behind the scenes

from langchain_core.callbacks import StdOutCallbackHandler # this is the key to seeing what happens behind the scenes 
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
retriever = vectorstore.as_retriever()
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, callbacks=[StdOutCallbackHandler()])

query = "Who received the prestigious IIOTY award in 2023?"
result = conversation_chain.invoke({"question": query})
answer = result["answer"]
print("\nAnswer:", answer)

In [17]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG; k is how many chunks to use
retriever = vectorstore.as_retriever(search_kwargs={"k": 25})

# putting it together: set up the conversation chain with the gpt-4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
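
The `k` setting above controls how many of the best-scoring chunks are passed to the model (to our knowledge LangChain's vector store retrievers use 4 when `k` is not given). A larger `k` makes it more likely that the chunk holding the answer is included, at the cost of a longer and more expensive prompt. A minimal sketch of top-k selection, with invented scores and a hypothetical `top_k` helper:

```python
import heapq

def top_k(scored_chunks, k):
    """Return the k highest-scoring (score, chunk) pairs, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[0])

# Hypothetical similarity scores for five chunks
scored = [
    (0.91, "chunk A"),
    (0.40, "chunk B"),
    (0.85, "chunk C"),
    (0.12, "chunk D"),
    (0.77, "chunk E"),
]

print([chunk for _, chunk in top_k(scored, 2)])
print([chunk for _, chunk in top_k(scored, 4)])
```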

In [18]:
def chat(question, history):
    result = conversation_chain.invoke({"question": question})
    return result["answer"]

In [None]:
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

# Summary
## Chatbot with:
* LangChain + OpenAI
* Chroma vector store
* Gradio chat interface

[LangChain document loaders (including GitHub)](https://python.langchain.com/docs/integrations/document_loaders/)

In [None]:
# imports for langchain, plotly and Chroma
import warnings
warnings.simplefilter("ignore")
import os
import glob
import numpy as np
import gradio as gr
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from sklearn.manifold import TSNE
from langchain_chroma import Chroma
from langchain.schema import Document
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from dotenv import load_dotenv

load_dotenv() 
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

###### Loading the documents
##############################################################################
folders = glob.glob("knowledge-base/*")
def add_metadata(doc, doc_type):
    doc.metadata["doc_type"] = doc_type
    return doc

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs={'encoding': 'utf-8'})
    folder_docs = loader.load()
    documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])

##### Splitting the documents into chunks
########################################################################
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

##### Embedding and Vector Store
######################################################################
embeddings = OpenAIEmbeddings()
db_name = "vector_db"
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)

##### Retrieval
####################################################################
llm = ChatOpenAI(temperature=0.7, model_name="gpt-4o-mini")
memory=ConversationBufferMemory(memory_key='chat_history', return_messages=True)
retriever=vectorstore.as_retriever(search_kwargs={"k": 25})
conversation_chain=ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
query = "you are the special agent you know about the secret mission and answer the question"

def chat(question, history):
    result=conversation_chain.invoke({"question": question})
    return result["answer"]

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)


Created a chunk of size 1088, which is longer than the specified 1000
Created a chunk of size 1031, which is longer than the specified 1000


AttributeError: module 'openai' has no attribute 'OpenAI'

##### RAG without LangChain

RAG (Retrieval-Augmented Generation) can be done without LangChain by directly integrating OpenAI's models, a vector database (like Chroma), and retrieval logic. Below is a rewritten version of the code above that follows the same retrieve-then-answer flow without LangChain (and without conversation memory).

In [None]:
import os
import glob
import warnings
import gradio as gr
from dotenv import load_dotenv
import openai
from chromadb import PersistentClient
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

warnings.simplefilter("ignore")

# Load API key
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Define constants
MODEL = "gpt-4o-mini"
DB_NAME = "vector_db"

# Initialize ChromaDB
client = PersistentClient(path=DB_NAME)
collection_name = "rag_collection"

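# Start from a clean collection. list_collections() returns Collection objects in
# some chromadb releases and plain names in others, so handle both.
existing = [getattr(c, "name", c) for c in client.list_collections()]
if collection_name in existing:
    client.delete_collection(collection_name)

# Pass the key explicitly rather than relying on the embedding function to find it
collection = client.get_or_create_collection(
    collection_name,
    embedding_function=OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY")),
)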

# Load documents
def load_documents():
    folders = glob.glob("knowledge-base/*")
    documents = []
    
    for folder in folders:
        doc_type = os.path.basename(folder)
        for filepath in glob.glob(f"{folder}/**/*.md", recursive=True):
            with open(filepath, "r", encoding="utf-8") as file:
                documents.append({"text": file.read(), "doc_type": doc_type})
    
    return documents

documents = load_documents()

# Split text into chunks
def split_text(text, chunk_size=1000, overlap=200):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i+chunk_size])
    return chunks

# Process and store embeddings
for i, doc in enumerate(documents):
    chunks = split_text(doc["text"])
    for j, chunk in enumerate(chunks):
        chunk_id = f"{i}_{j}"
        collection.add(ids=[chunk_id], documents=[chunk], metadatas=[{"doc_type": doc["doc_type"]}])

# Retrieval function
def retrieve_relevant_chunks(query, k=5):
    results = collection.query(query_texts=[query], n_results=k)
    retrieved_texts = [doc for doc in results["documents"][0]]
    return "\n\n".join(retrieved_texts)

# Chat function
def chat(question, history):
    relevant_text = retrieve_relevant_chunks(question)
    prompt = f"Context:\n{relevant_text}\n\nQuestion: {question}\nAnswer:"
    
    # openai>=1.0 interface; the legacy openai.ChatCompletion.create call was removed
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": "You are an AI assistant with access to a knowledge base."},
                  {"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

# Gradio Chat Interface
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)




###  Understanding `as_retriever()` in LangChain

### **1. What is Embedding?**
Embedding is the process of converting text into **numerical vectors** that represent the meaning of the text in a high-dimensional space.

#### **Example of Embedding:**
Let's say we have three text chunks:
1️⃣ `"LangChain is great for building LLM apps."`
2️⃣ `"Vector databases store embeddings for search."`
3️⃣ `"I love using LangChain and OpenAI."`

When we pass these chunks through an **embedding model**, they get converted into **vectors** (arrays of numbers):

```
"LangChain is great for building LLM apps."     -->  [0.45, 0.23, 0.89, ..., 0.12]
"Vector databases store embeddings for search." --> [0.78, 0.33, 0.65, ..., 0.92]
"I love using LangChain and OpenAI."            --> [0.56, 0.88, 0.33, ..., 0.75]
```

These vectors are stored inside a **VectorStore** like **ChromaDB, FAISS, Pinecone, or Weaviate**.

---

### **2. What Happens When We Call `as_retriever()`?**
`as_retriever()` does **not** convert back to text chunks directly. Instead, it provides an interface to **retrieve the most relevant stored text chunks based on a search query.**

#### **How does it work?**
```python
retriever = vectorstore.as_retriever()
retrieved_docs = retriever.invoke("What is LangChain?")
```
Steps happening in the background:
1️⃣ **Convert the Query to an Embedding:**  
   - `"What is LangChain?"` → `[0.65, 0.12, 0.78, ..., 0.33]`  

2️⃣ **Find Similar Vectors:**  
   - The retriever **searches the vector store** (e.g., ChromaDB) for the most similar **stored** embeddings.

3️⃣ **Return the Original Text Chunks:**  
   - The retriever fetches the **text chunks** (documents) that were originally converted into embeddings.
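
The three background steps above can be sketched with a toy store. Everything here is an assumption made for illustration: `toy_embed` uses plain word counts instead of a real embedding model, and `ToyVectorStore` is a hypothetical class, not a LangChain or Chroma API.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy 'embedding': word counts. A real model returns a dense float vector."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    def __init__(self, texts):
        # Store each original text next to its vector
        self.rows = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=1):
        query_vec = toy_embed(query)  # step 1: embed the query
        # step 2: rank the stored vectors by similarity to the query vector
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
        # step 3: return the original text, not the vectors
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore([
    "LangChain is great for building LLM apps.",
    "Vector databases store embeddings for search.",
    "I love using LangChain and OpenAI.",
])
print(store.similarity_search("What is LangChain?", k=1))
```

Note that the vectors are never turned back into text: the store keeps the original chunk alongside each vector and returns that.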

---

### **3. Why Use `as_retriever()`?**
✅ **Standardized Interface:** Works like any other retriever in LangChain  
✅ **Efficient Similarity Search:** Finds relevant chunks quickly  
✅ **Easy Querying:** No need to manually compute embeddings

---

### **4. Deep-Dive with a Code Example**
#### **Step 1: Store Documents in a Vector Database**
```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

documents = [
    "LangChain is an open-source framework for building LLM-powered applications.",
    "Vector databases store embeddings for efficient search.",
    "Retrievers help fetch relevant documents for queries."
]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)
docs = text_splitter.create_documents(documents)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
```

#### **Step 2: Convert the Vector Store into a Retriever**
```python
retriever = vectorstore.as_retriever()
```

#### **Step 3: Retrieve Relevant Documents**
```python
query = "What is LangChain?"
retrieved_docs = retriever.invoke(query)

for doc in retrieved_docs:
    print(doc.page_content)
```
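✅ **Illustrative Output** (top-ranked chunk only; by default up to `k=4` chunks are returned, and the exact text depends on how the splitter cut the sentences)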
```
LangChain is an open-source framework for building LLM-powered applications.
```

---

### **5. Recap: What `as_retriever()` Does**
✅ Converts a **VectorStore** into a **Retriever**.  
✅ Fetches stored text chunks using **vector similarity search**.  
✅ Works seamlessly with RAG pipelines and LLMs.

---

### **6. Visualizing the Process**
```
Text Chunks  --->  Embeddings  --->  VectorStore (Chroma/FAISS)  
   ^                                      |
   |                                      |
Query --->  Embedding --->  Retriever (Finds Similar Vectors) --->  Returns Original Text Chunks
```

---

### **7. Summary: Does `as_retriever()` Convert Embeddings Back to Text?**
**❌ Does `as_retriever()` convert embeddings back to text chunks?**  
No, it **retrieves** the most relevant text chunks based on similarity search.

**✅ What does `as_retriever()` actually do?**  
It provides a **high-level interface** to **query a vector database** and get **the most relevant original text chunks**.



## LangChain: General Building Blocks (which support the above)

### 1. Data Ingestion 

In [16]:
# import
## Data ingestion
import bs4
from langchain_community.document_loaders import TextLoader, PyPDFLoader, WebBaseLoader, ArxivLoader, WikipediaLoader

In [17]:
docs_pdf = PyPDFLoader('attention.pdf').load()
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                     bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                         class_=("post-title","post-content","post-header")
                     )))
docs_web = loader.load()  # the loader was created but never run
docs_ark = ArxivLoader(query="1706.03762", load_max_docs=2).load()
docs_wk = WikipediaLoader(query="Generative AI", load_max_docs=2).load()

text_documents = TextLoader('speech.txt').load()

In [3]:
# text_documents
# docs_pdf
# len(docs_ark)
# print(docs_wk)

### 2. Data Transformation

**Text Splitting from Documents - RecursiveCharacterTextSplitter**

This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.

- **How the text is split:** by list of characters.
- **How the chunk size is measured:** by number of characters.
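
The "split on each separator in order until the chunks are small enough" idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's implementation: `recursive_split` is a hypothetical function, it ignores chunk overlap, and it drops the separator wherever it cuts.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split on the first separator, recursing with finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # Still too big at this level: retry with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = (
    "First paragraph here.\n\n"
    "Second paragraph is a bit longer than the first one.\n\n"
    "Third."
)
for chunk in recursive_split(text, chunk_size=30):
    print(repr(chunk))
```

Short paragraphs survive intact, and only the long middle paragraph is broken further, at word boundaries. That is the practical difference from a splitter that cuts on a single separator and has to keep oversized paragraphs whole.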

In [18]:
# Import 
## Data transformation/Chunk
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_text_splitters import CharacterTextSplitter
from langchain_text_splitters import HTMLHeaderTextSplitter

In [5]:
# print(final_documents[0])
# print(text[0])

In [19]:

text_splitter = CharacterTextSplitter(separator="\n\n",  chunk_size=100, chunk_overlap=20)
final_documents = text_splitter.split_documents(text_documents)

Created a chunk of size 470, which is longer than the specified 100
Created a chunk of size 347, which is longer than the specified 100
Created a chunk of size 668, which is longer than the specified 100
Created a chunk of size 982, which is longer than the specified 100
Created a chunk of size 789, which is longer than the specified 100


**How to split by HTML header**

**How to split JSON data**

### 3. Embeddings

In [20]:
# Import
import os
from dotenv import load_dotenv

# OpenAI embedding
from langchain_openai import OpenAIEmbeddings

# Ollama embedding
from langchain_community.embeddings import OllamaEmbeddings
# Huggingface embedding is imported in its own cell below

load_dotenv() 
os.environ["OPENAI_API_KEY"]=os.getenv("OPENAI_API_KEY")
os.environ['HF_TOKEN']=os.getenv("HF_TOKEN")

**1. OpenAI Embedding**

[OpenAIEmbedding-Reference](https://platform.openai.com/docs/guides/embeddings)

In [21]:
embeddings_1024=OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)

# Example
text="This is a tutorial on OPENAI embedding"
query_result=embeddings_1024.embed_query(text)
#query_result

**2. Ollama Embedding**

[OllamaEmbedding-Reference](https://api.python.langchain.com/en/latest/ollama/embeddings/langchain_ollama.embeddings.OllamaEmbeddings.html)

In [9]:
#!pip install langchain ollama

In [22]:
from langchain_community.embeddings import OllamaEmbeddings

embeddings_ollama = OllamaEmbeddings(model="llama3:latest")
r1 = [embeddings_ollama.embed_query(text) for text in [
    "Alpha is the first letter of Greek alphabet", 
    "Beta is the second letter of Greek alphabet"
]]
#r1

In [23]:
embeddings_deepseek = OllamaEmbeddings(model="deepseek-r1:latest")
r2=embeddings_deepseek.embed_documents(["Alpha is the first letter of Greek alphabet", "Beta is the second letter of Greek alphabet"])
#r2

**3. Huggingface Embedding**

[huggingFace-Embedding-Reference](https://api.python.langchain.com/en/latest/huggingface/embeddings/langchain_huggingface.embeddings.huggingface.HuggingFaceEmbeddings.html)

In [24]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
text="This is a test document."
query_result=embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text, "This is not a test document."])
#doc_result[0]

### 4. VectorStore

[chroma-reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_chroma.vectorstores.Chroma.html)

[faiss-reference](https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.faiss.FAISS.html)

In [25]:
# Import
from langchain_chroma import Chroma
from langchain_community.vectorstores import FAISS

**1. FAISS**

In [26]:
# Vectorstore
db = FAISS.from_documents(final_documents, embeddings)

### querying 
query = "How does the speaker describe the desired outcome of the war?"
docs = db.similarity_search(query)
docs[0].page_content

'Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.'

**As a Retriever**

We can also convert the vectorstore into a Retriever class. This allows us to easily use it in other LangChain methods, which largely work with retrievers.

In [27]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.'

**Similarity Search with score**

There are some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns not only the documents but also the distance score between the query and each of them. The returned score is an L2 distance, so a lower score is better.
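
To see why a lower score is better, here is a minimal sketch that ranks two made-up vectors by Euclidean (L2) distance from a query vector. The vectors and the `l2_distance` helper are assumptions for illustration and do not call FAISS.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query_vec = [1.0, 0.0]
docs = {"close match": [0.9, 0.1], "far away": [-1.0, 0.5]}

# Sort ascending: the smallest distance is the best match
ranked = sorted(docs.items(), key=lambda item: l2_distance(query_vec, item[1]))
for name, vec in ranked:
    print(name, round(l2_distance(query_vec, vec), 3))
```

This is the opposite convention from cosine similarity, where a higher value means a closer match, so it is worth checking which kind of score a given method returns before thresholding on it.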

In [28]:
docs_and_score = db.similarity_search_with_score(query)

### Saving And Loading
db.save_local("faiss_index")

new_db = FAISS.load_local("faiss_index", embeddings,allow_dangerous_deserialization = True)
docs = new_db.similarity_search(query)

**2. Chroma**

In [29]:
vectordb=Chroma.from_documents(documents=final_documents,embedding=embeddings,persist_directory="./chroma_db")
# load from disk
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
docs=db2.similarity_search(query)
print(docs[0].page_content)

## similarity Search With Score
docs = vectordb.similarity_search_with_score(query)

### Retriever option
retriever=vectordb.as_retriever()
retriever.invoke(query)[0].page_content

Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.


'Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.'