# 🔹 Embedding a Query using LangChain OpenAIEmbeddings

This notebook demonstrates how to use `OpenAIEmbeddings` from `langchain_openai`  
to embed a query into a numerical vector representation.

We will:
1. Set the OpenAI API key.
2. Initialize the embedding model.
3. Generate an embedding for a sample query and check its length.

In [1]:
#!pip install langchain-openai -q


In [2]:
# Step 1: Import required libraries
from langchain_openai import OpenAIEmbeddings

In [3]:
# Step 2: Set the OpenAI API key directly in the notebook (demo only; do not share or commit real keys)
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxx"  # 🔒 Replace with your actual key
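Hardcoding the key works for a quick demo, but it is easy to leak. A minimal sketch of a safer pattern reads `OPENAI_API_KEY` from the environment and, optionally, prompts for it with the standard-library `getpass` so the key is never written into the notebook. `get_openai_api_key` is a hypothetical helper, not part of LangChain; if you keep the key in a `.env` file, a loader such as python-dotenv's `load_dotenv()` can populate `os.environ` before calling it.

```python
import os
from getpass import getpass


def get_openai_api_key(env=None, prompt=False):
    """Return the OpenAI API key from an environment mapping.

    Hypothetical helper (not a LangChain API). Defaults to os.environ.
    If the key is missing and `prompt` is True, ask for it without echoing it.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key and prompt:
        key = getpass("Enter your OpenAI API key: ").strip()
        if key:
            # Store it so OpenAIEmbeddings can pick it up when env is os.environ
            env["OPENAI_API_KEY"] = key
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell or load it from a .env file."
        )
    return key
```

In a notebook, calling `get_openai_api_key(prompt=True)` could replace the hardcoded assignment in the cell above.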


In [13]:
# Step 3: Initialize the embedding model
embedding = OpenAIEmbeddings(model="text-embedding-ada-002")


In [14]:
# Step 4: Generate embeddings for a query
query = "Delhi is the capital of India"
result = embedding.embed_query(query)


In [15]:
# Step 5: Display the result
print("Embedding Vector:\n")
print(result)



Embedding Vector:

[0.007037357892841101, 0.003714336548000574, -0.0333501361310482, -0.005516279023140669, -0.02994190715253353, 0.011329200118780136, -0.033047180622816086, 0.013266840018332005, -0.013557170517742634, -0.0362534373998642, 0.021370846778154373, 0.017293596640229225, -0.007649576757103205, 0.013935862109065056, -0.016864413395524025, -0.0052196369506418705, 0.02323906123638153, -0.0462256595492363, 0.018429672345519066, -0.0067533389665186405, -0.01454176940023899, 0.016094407066702843, 0.007687445729970932, 0.026584172621369362, -0.026154987514019012, -0.0027739182114601135, 0.00038756750291213393, -0.020588217303156853, 0.012099206447601318, 0.008533190935850143, -0.01603129133582115, -0.0014145721215754747, -0.012572571635246277, -0.02566268853843212, 0.0021159411408007145, -0.0007057081675156951, -0.033198658376932144, -0.004661066457629204, 0.01004795916378498, -0.0095177898183465, 0.04122692719101906, -0.004689468070864677, -0.011184034869074821, -0.0178995039314

In [16]:
print(f"\nVector length: {len(result)}")


Vector length: 1536
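Printing all 1,536 raw floats is hard to read. A more compact check, sketched below with NumPy, reports the vector's dimensionality, its L2 norm, and the first few values. OpenAI's embeddings FAQ states that its embeddings are normalized to length 1, so the norm of `result` should come out close to 1.0. `summarize_embedding` is a hypothetical helper, and the test uses a synthetic vector rather than a real API response.

```python
import numpy as np


def summarize_embedding(vector, preview=5):
    """Return (dimensions, L2 norm, first `preview` values) for an embedding."""
    arr = np.asarray(vector, dtype=float)
    return arr.shape[0], float(np.linalg.norm(arr)), arr[:preview].tolist()


# Usage with the `result` list produced by embedding.embed_query(...):
# dims, norm, head = summarize_embedding(result)
# print(f"{dims} dimensions, L2 norm {norm:.4f}, first values: {head}")
```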


# 🔹 Embedding Multiple Documents with LangChain and OpenAI
This notebook shows how to use `langchain_openai.OpenAIEmbeddings` to generate vector embeddings for a list of documents using the `text-embedding-3-large` model.


In [18]:
from langchain_openai import OpenAIEmbeddings


In [19]:
# Use 'text-embedding-3-large' shortened to 32 dimensions (the model's default is 3072) to keep the demo output small
embedding = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=32
)
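The `dimensions` argument asks the API to return a shortened vector instead of the full 3,072-dimensional `text-embedding-3-large` output. OpenAI's embeddings guide describes a comparable manual approach: keep the first N components of a full-length embedding and re-normalize the result to unit length. The sketch below shows that idea with NumPy; `shorten_embedding` is a hypothetical helper, and whether it reproduces the API's shortened output exactly is an assumption worth checking on your own vectors.

```python
import numpy as np


def shorten_embedding(vector, dims):
    """Truncate an embedding to `dims` components and rescale it to unit L2 norm."""
    arr = np.asarray(vector, dtype=float)[:dims]
    norm = np.linalg.norm(arr)
    if norm == 0:
        return arr  # an all-zero prefix cannot be normalized
    return arr / norm


# Example: shorten a random 3072-dimensional vector (a stand-in for a real embedding) to 32 dimensions.
rng = np.random.default_rng(0)
full = rng.normal(size=3072)
short = shorten_embedding(full, 32)
print(short.shape, round(float(np.linalg.norm(short)), 6))
```

Smaller vectors are cheaper to store and compare, at some cost in how much semantic detail they retain; 32 dimensions is far below what you would normally use outside a demo.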


In [20]:
# List of documents to embed
documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]

# Generate embeddings
result = embedding.embed_documents(documents)
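`embed_documents` returns one list of floats per input string, in the same order as `documents`. Stacking those lists into a NumPy matrix makes it easy to compare the documents with one another. The sketch below computes a pairwise cosine-similarity matrix by normalizing each row and taking dot products; `pairwise_cosine` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import numpy as np


def pairwise_cosine(vectors):
    """Return an (n, n) matrix of cosine similarities between the rows of `vectors`."""
    matrix = np.asarray(vectors, dtype=float)              # shape (n_docs, n_dims)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)  # row lengths, shape (n_docs, 1)
    unit = matrix / norms                                  # rescale each row to unit length (assumes no all-zero rows)
    return unit @ unit.T                                   # dot products of unit rows


# Toy vectors in place of the 32-dimensional `result` above.
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = pairwise_cosine(toy)
print(np.round(sims, 3))
```

With the real data, `pairwise_cosine(result)` would give a 3×3 matrix comparing the three capital-city sentences.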


In [33]:
#result

In [22]:
# Show the embeddings
for idx, vector in enumerate(result):
    print(f"\nDocument {idx+1}: \"{documents[idx]}\"")
    print(f"Embedding Vector (length {len(vector)}):\n{vector}")



Document 1: "Delhi is the capital of India"
Embedding Vector (length 32):
[-0.16005347669124603, 0.27303239703178406, -0.00838515441864729, 0.45374637842178345, -0.012062854133546352, 0.11951705813407898, -0.02134700119495392, 0.07139640301465988, -0.08891859650611877, 0.013885358348488808, -0.028833163902163506, 0.14540806412696838, -0.015454510226845741, -0.13560086488723755, -0.18607524037361145, -0.2043820172548294, -0.18359075486660004, 0.11402502655982971, 0.06995801627635956, -0.27198630571365356, 0.027084212750196457, -0.02922545187175274, 0.06884653121232986, 0.007240981794893742, -0.319583922624588, 0.41870200634002686, 0.32690662145614624, -0.050310928374528885, -0.10395630449056625, 0.0054307361133396626, 0.048676393926143646, 0.12958578765392303]

Document 2: "Kolkata is the capital of West Bengal"
Embedding Vector (length 32):
[0.078705795109272, 0.061247631907463074, -0.031199991703033447, 0.3892652690410614, -0.1799977421760559, 0.06090192496776581, -0.0343113504350185

# 🔹 Semantic Search using OpenAI Embeddings (LangChain)
In this notebook, we:
- Use OpenAI's `text-embedding-3-large` to embed cricket-related documents.
- Embed a user query.
- Compare the query with all documents using cosine similarity.
- Retrieve the most semantically similar document.
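For reference, the cosine similarity between a query embedding $\mathbf{q}$ and a document embedding $\mathbf{d}$, each with $n$ components, is their dot product divided by the product of their lengths. It ranges from $-1$ to $1$, and a higher value means the two vectors point in a more similar direction:

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
$$

When both vectors already have unit length, the denominator equals 1 and the score reduces to the plain dot product $\mathbf{q} \cdot \mathbf{d}$.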


In [23]:
from langchain_openai import OpenAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


In [24]:
# Initialize OpenAI embeddings with 300 dimensions
embedding = OpenAIEmbeddings(
    model='text-embedding-3-large',
    dimensions=300
)


In [34]:
documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]


In [41]:
query = 'tell me about Sachin Tendulkar'

In [42]:
# Embed all documents and the query
doc_embeddings = embedding.embed_documents(documents)
query_embedding = embedding.embed_query(query)


In [43]:
# Compute cosine similarity
scores = cosine_similarity([query_embedding], doc_embeddings)[0]
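scikit-learn's `cosine_similarity` expects 2-D inputs, which is why the query embedding is wrapped in a list and `[0]` selects the single row of scores. The same scores can be computed directly with NumPy, as sketched below; `cosine_scores` is a hypothetical helper, and the short vectors are made-up stand-ins for real embeddings.

```python
import numpy as np


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each row of `doc_vecs`."""
    q = np.asarray(query_vec, dtype=float)    # shape (n_dims,)
    docs = np.asarray(doc_vecs, dtype=float)  # shape (n_docs, n_dims)
    dots = docs @ q                           # one dot product per document
    return dots / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))


query_vec = [1.0, 2.0, 0.0]
doc_vecs = [[1.0, 2.0, 0.0], [2.0, -1.0, 0.0], [0.0, 0.0, 5.0]]
print(cosine_scores(query_vec, doc_vecs))
```

Passing `query_embedding` and `doc_embeddings` instead should reproduce the `scores` array up to floating-point rounding.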

In [44]:
scores

array([0.26503051, 0.34946292, 0.57518005, 0.35049235, 0.29104616])

In [45]:
# Get the most similar document (highest cosine similarity)
index = int(np.argmax(scores))
score = scores[index]

In [46]:
# Display the result
print("Query:", query)
print("Best Matching Document:", documents[index])
print("Similarity Score:", score)


Query: tell me about Sachin Tendulkar
Best Matching Document: Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.
Similarity Score: 0.5751800547779635
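Keeping only the single best match discards the runner-up results. A common extension is to return the top *k* documents ranked by score; the sketch below does this with `np.argsort`. `top_k` is a hypothetical helper, and the example copies the `scores` values printed above as literals (with short player names as labels) so it runs without an API key.

```python
import numpy as np


def top_k(scores, documents, k=3):
    """Return up to k (document, score) pairs with the highest scores, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(documents[i], float(scores[i])) for i in order]


example_scores = [0.26503051, 0.34946292, 0.57518005, 0.35049235, 0.29104616]
example_docs = ["Kohli", "Dhoni", "Tendulkar", "Sharma", "Bumrah"]
for doc, score in top_k(example_scores, example_docs, k=2):
    print(f"{score:.4f}  {doc}")
```

With the notebook's own variables, `top_k(scores, documents, k=3)` would return the three best-matching cricket sentences.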
