# 02 Embeddings

In this lab, we'll explore how we can bring our own data into the models used by Azure OpenAI.

We'll start as usual by initiating a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.
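Because `os.getenv` returns `None` when a key is missing, a typo in `.env` often surfaces later as a confusing error, for example when `None` is concatenated into a string. Here is a minimal sketch of a fail-fast check, assuming the variable names used in this lab. `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising an error if any are unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Hypothetical usage, after load_dotenv() has run:
# settings = require_env("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
```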

In [18]:
import os
from langchain_openai import AzureChatOpenAI
from dotenv import load_dotenv

# Load environment variables from the .env file in the repository root
if load_dotenv(dotenv_path="../../../.env"):
    print(f"Found Azure OpenAI Endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
else:
    print("No .env file found")

# Create a chat model client for the completion deployment.
# AzureChatOpenAI reads the endpoint, API key and API version from the
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION variables.
llm = AzureChatOpenAI(
    azure_deployment=os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
)


Found Azure OpenAI Endpoint: https://ai-admin6169ai032713463879.cognitiveservices.azure.com


Let's begin by asking the AI a simple question.

In [68]:
r = llm.invoke("Tell me about the latest Deadpool movie. When was it released? What is it about?")

# Print the response
print(r.content)

As of my knowledge cut-off in October 2023, "Deadpool 3" is the latest installment in the Deadpool film series. The movie is anticipated to be released on May 3, 2024. Details about the plot are not fully disclosed, but it's expected to continue the mix of humor, action, and breaking the fourth wall that the series is known for. Ryan Reynolds reprises his role as the titular anti-hero, Wade Wilson/Deadpool. Excitingly, it's been confirmed that Hugh Jackman will return as Wolverine, which has sparked significant interest and speculation among fans about the storyline.

While specific plot details are under wraps, given the series' previous installments, it's likely to feature Deadpool’s signature blend of comedy and ultra-violence, along with a new twist to integrate Wolverine into the narrative. Additionally, "Deadpool 3" will be the first film to officially bring the character into the Marvel Cinematic Universe (MCU) following Disney's acquisition of 21st Century Fox. Keep an eye on o

What do you notice about the response?

At the time of writing, the latest Deadpool movie is "Deadpool & Wolverine", released in July 2024. Depending on the model and version you are using, it may tell you that one of the previous movies is the latest, or it may know about the new movie but think it hasn't been released yet.

OpenAI models are trained on a large set of data, but that data was collected up to a specific point in time, known as the *knowledge cut-off*, which varies by model. As a result, many models have no information about events from recent months or years.

To help the AI out, we can provide additional information. This is the same process you would follow if you want the AI to work with your own company data. The AI won't know about anything that isn't publicly available, so if you want it to work with private data, you'll need a way to get that data to the model.

The thing is, you can't actually do that directly. The models are pre-trained, so the only way to get more information into the model itself is to retrain or fine-tune it, which is an expensive and time-consuming process.

However, there *are* ways to get the AI models to work with new data. The most popular of these methods is to use *embeddings*, which we'll explore in the next sections.

## Bring Your Own Data

LangChain provides a number of useful tools, including some that simplify working with external documents. Below, we'll use the `DirectoryLoader`, which can read multiple files from a directory, and the `UnstructuredMarkdownLoader`, which can process files in Markdown format. We'll use these to process a set of Markdown files containing details of more recently released movies.

In [75]:
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader

data_dir = "data/movies"

# Load every Markdown file in the directory; UnstructuredMarkdownLoader parses each one
documents = DirectoryLoader(
    path=data_dir,
    glob="*.md",
    show_progress=True,
    loader_cls=UnstructuredMarkdownLoader,
).load()

100%|██████████| 17/17 [00:00<00:00, 128.80it/s]


We now have a `documents` object which contains all of the information from our markdown documents about movies.

We can use LangChain's question-answering chain, created with `load_qa_chain`, to give the AI access to our documents, and then ask the same question about Deadpool movies again.

In [76]:
# Question answering chain
from langchain.chains.question_answering import load_qa_chain

# Prepare the chain and the query
chain = load_qa_chain(llm)
query = "Tell me about the latest Deadpool movie. When was it released? What is it about?"

result = chain.invoke({'input_documents': documents, 'question': query})

print(result["output_text"])

The latest Deadpool movie, titled "Deadpool & Wolverine," was released on July 24, 2024. In this movie, Wade Wilson, also known as Deadpool, finds himself reluctantly returning to his mercenary ways when his homeworld faces an existential threat. This time, he teams up with Wolverine, who is also quite reluctant to join the fray. The movie blends action, comedy, and science fiction, featuring a storyline filled with Deadpool's characteristic humor, cameos, and action sequences. Despite the lackluster villains, the movie is praised for its comedic elements, making it an entertaining watch for fans of the franchise.


Great! The model now knows the correct details for the latest Deadpool movie.

However, there's something lurking! Let's take a look at what happened behind the scenes.

We'll do two things here. First, we'll add the `verbose=True` parameter to the chain so that it prints the prompt it builds. Second, we'll run the chain inside the `get_openai_callback` context manager, which captures the number of tokens consumed.

In [77]:
# Support for callbacks
from langchain_community.callbacks import get_openai_callback

# Prepare the chain and the query
chain = load_qa_chain(llm, verbose=True)
query = "Tell me about the latest Deadpool movie. When was it released? What is it about?"

# Run the chain, using the callback to capture the number of tokens used
with get_openai_callback() as callback:
    chain.invoke({'input_documents': documents, 'question': query})
    total_tokens = callback.total_tokens

print(f"Total tokens used: {total_tokens}")



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
iNumber Number: Jozi Gold

Overview

When an undercover cop is tasked with investigating a historic gold heist in Johannesburg, he’s forced to choose between his conscience and the law.

Details

Release Date: 2023-06-23

Genres: Crime, Action, Thriller

Popularity: 719.085

Vote Average: 6.3

Keywords:

Extraction 2

Overview

Tasked with extracting a family who is at the mercy of a Georgian gangster, Tyler Rake infiltrates one of the world's deadliest prisons in order to save them. But when the extraction gets hot, and the gangster dies in the heat of battle, his equally ruthless brother tracks down Rake and his team to Vienna, in order to get revenge.

Details

Release

In the output from the last code cell, you should see a lot of information. At the end, you should see a count of the number of tokens used. You might be surprised to see that the query uses anywhere from 2,500 to 6,000 tokens, depending on the model used. That's a lot of tokens!

With the verbose option enabled, the rest of the output shows the prompt that was constructed for the query. If you scroll back through the output, you'll see that the prompt included **all** of the information from our documents, so this is why the query used so many tokens.

As we've discussed previously, AI models have a maximum number of tokens they can process in a single request (the context window), and you're charged based on the number of tokens consumed. In this example, the documents are relatively small and there are only 17 of them, but if we wanted to work with more and larger documents, this method would quickly become expensive, and eventually we'd hit the token limit.
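To make the scaling concrete, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of roughly 4 characters per token for English text (exact counts depend on the model's tokenizer, which the `tiktoken` library can compute). The document sizes and the context limit are placeholder values, not measurements from this lab.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token for English text (ceiling division)."""
    return (len(text) + 3) // 4


def stuffed_prompt_tokens(documents: list[str], question: str) -> int:
    """Estimate the size of a prompt that includes every document plus the question."""
    return sum(estimate_tokens(doc) for doc in documents) + estimate_tokens(question)


def fits_in_context(documents: list[str], question: str, context_limit: int) -> bool:
    """True if the stuffed prompt is estimated to fit within the given context limit."""
    return stuffed_prompt_tokens(documents, question) <= context_limit


# 17 placeholder documents of 1,200 characters each come to roughly 17 x 300 = 5,100 tokens;
# ten times as many documents would need roughly ten times as many tokens.
sample_docs = ["x" * 1200] * 17
print(stuffed_prompt_tokens(sample_docs, "What is the latest Deadpool movie?"))
```

The estimate grows linearly with the number and size of documents, which is why sending everything stops being practical as the collection grows.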

## Embeddings

The solution to working with large amounts of external information is to use *embeddings*. OpenAI provides embedding models that convert human-readable text into a numeric representation of its meaning, called a *vector* (a list of floating-point numbers). Text with similar meaning produces vectors that are close together, which lets computers group similar pieces of information. The vectors are kept in a *vector store*. When you ask a question, the same embedding model converts the query text into a vector, and the vector store is searched for the stored vectors closest to it. The documents behind those nearest vectors are likely to be relevant to your query.
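A minimal sketch of the similarity search idea, using cosine similarity and a brute-force scan over a small in-memory dictionary standing in for a vector store. The 3-dimensional vectors are made-up placeholders; real embedding vectors are much longer (`text-embedding-ada-002` produces 1,536 dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 means more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector: list[float], store: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force vector search: score every stored vector and return the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder 3-dimensional vectors standing in for real embeddings
toy_store = {
    "Deadpool & Wolverine": [0.9, 0.1, 0.0],
    "Creed III": [0.1, 0.9, 0.1],
    "The Flash": [0.7, 0.2, 0.1],
}
toy_query = [0.85, 0.15, 0.0]
print(nearest(toy_query, toy_store, k=2))
```

Dedicated vector stores typically use approximate nearest-neighbour indexes rather than scoring every stored vector, but the ranking principle is the same.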

To avoid overloading a prompt with a large number of tokens, instead of sending all of our documents to the AI, we can perform a vector search first to narrow them down to a set of relevant results, and then use that smaller subset of information as part of the prompt.
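A minimal sketch of narrowing the context before prompting: given `(score, text)` pairs from a vector search, keep only the top `k` and place them in the prompt. The scores and snippets are hypothetical placeholders, and the prompt wording loosely follows the system prompt shown in the verbose output above.

```python
def build_prompt(question: str, scored_snippets: list[tuple[float, str]], k: int = 3) -> str:
    """Keep only the k highest-scoring snippets and place them in the prompt as context."""
    top = sorted(scored_snippets, key=lambda pair: pair[0], reverse=True)[:k]
    context = "\n\n".join(text for _, text in top)
    return (
        "Use the following context to answer the user's question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical similarity scores, as a vector search might return them
toy_results = [
    (0.91, "Deadpool & Wolverine: released 2024-07-24."),
    (0.42, "Creed III: Adonis Creed returns to the ring."),
    (0.87, "The Flash: Barry Allen alters the future."),
    (0.15, "Extraction 2: Tyler Rake returns."),
]
print(build_prompt("What is the latest Deadpool movie?", toy_results, k=2))
```

Only two of the four snippets reach the model, so the prompt size depends on `k` rather than on the size of the whole collection.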

Let's walk through the process of using embeddings to give the AI some details about our movies. We'll start by creating an instance of an embedding model. You'll notice this is similar to how we initialised our chat model deployment earlier, but in this case we specify an embedding model deployment. The embedding model typically used has been `text-embedding-ada-002`, but newer alternatives such as `text-embedding-3-small` and `text-embedding-3-large` are now available.

In [52]:
# Create an embedding model client for the embedding deployment
from langchain_openai import AzureOpenAIEmbeddings


AZURE_OPENAI_EMBEDDING_URL = os.getenv("AZURE_OPENAI_EMBEDDING_URL")
AZURE_OPENAI_EMBEDDING_MODEL = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")
AZURE_OPENAI_EMBEDDING_MODEL_VERSION = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL_VERSION")

embeddings_model = AzureOpenAIEmbeddings(
    azure_deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME,
    openai_api_version=AZURE_OPENAI_EMBEDDING_MODEL_VERSION,
    model=AZURE_OPENAI_EMBEDDING_MODEL,
    chunk_size=1000,  # maximum number of texts sent per embedding request (not characters)
)

def get_embeddings(text: str) -> list[float]:
    """
    Return an embedding vector for the given text using the embeddings_model created above.
    """
    return embeddings_model.embed_query(text)



Now that we've initialised a model to create embeddings, let's go ahead and embed some documents.

This time, instead of LangChain's built-in loaders, we'll read the Markdown files from the directory with a small helper function.

A common next step is to use a *splitter*, which breaks larger documents into smaller chunks so that we don't risk hitting the token limit when submitting our data to the embedding model. Our movie files are small, so in this lab we embed each file as a whole.
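A minimal sketch of what a splitter does, assuming simple fixed-size chunks measured in characters with a small overlap, so that text near a boundary appears in both neighbouring chunks. `chunk_text` is a hypothetical helper. LangChain's own splitters, such as `RecursiveCharacterTextSplitter` (which also takes `chunk_size` and `chunk_overlap`), try to break on paragraph and sentence boundaries first, so treat this as an illustration rather than a drop-in replacement.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters, each overlapping the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each chunk's start moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```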

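Next, we'll connect to Azure Cosmos DB, where we'll store each document along with its embedding.

**NOTE**: The code below authenticates with the Cosmos DB account key, so local (key-based) authentication must be enabled on your Cosmos DB account.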

In [None]:
# Connect to Azure Cosmos DB using the account key
from azure.cosmos import CosmosClient, PartitionKey

# Azure Cosmos DB connection details, read from the .env file
COSMOS_DB_URL = os.getenv("COSMOS_DB_URL")
COSMOS_DB_KEY = os.getenv("COSMOS_DB_KEY")
DATABASE_NAME = os.getenv("COSMOS_DB_NAME")
CONTAINER_NAME = os.getenv("COSMOS_DB_CONTAINER_NAME")

# Connect with key-based authentication and create the database if needed
client = CosmosClient(COSMOS_DB_URL, credential=COSMOS_DB_KEY)
database = client.create_database_if_not_exists(id=DATABASE_NAME)

# Create the container if needed, partitioned on the document id
container = database.create_container_if_not_exists(
    id=CONTAINER_NAME,
    partition_key=PartitionKey(path="/id"),
)

print("CosmosDB connection successful and container initialized!")

CosmosDB connection successful and container initialized!


In [7]:


# Path to Markdown files
DATA_DIRECTORY = "./data/movies"  

def read_markdown_files(directory):
    """Reads all markdown files in a directory."""
    markdown_documents = []
    for filename in os.listdir(directory):
        if filename.endswith(".md"):
            filepath = os.path.join(directory, filename)
            with open(filepath, "r", encoding="utf-8") as file:
                content = file.read()
                markdown_documents.append({"filename": filename, "content": content})
    return markdown_documents

markdown_files = read_markdown_files(DATA_DIRECTORY)

In [14]:
import uuid

def batch_store_documents(documents):
    """Embed each markdown document and store it, with its embedding, in Cosmos DB."""
    batch = []
    for doc in documents:
        document_id = str(uuid.uuid4())  # a new random ID on every run
        embedding = embeddings_model.embed_query(doc["content"])  # generate the vector embedding

        document = {
            "id": document_id,
            "content": doc["content"],
            "vector_embedding": embedding,
            "metadata": {"source": "markdown", "filename": doc["filename"]},
        }
        batch.append(document)

    # Upsert the documents into Cosmos DB, one request per item
    for doc in batch:
        container.upsert_item(doc)

    print(f"Stored {len(batch)} markdown files in CosmosDB!")


# Execute batch upload
batch_store_documents(markdown_files)


Stored 17 markdown files in CosmosDB!
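Note that `batch_store_documents` gives every document a new random `uuid.uuid4()` ID on each run, so `upsert_item` never finds an existing item to update: running the cell twice stores each file twice. This may explain why the search index queried later reports 34 documents for our 17 files. Here is a minimal sketch of a deterministic alternative using `uuid.uuid5`; `stable_document_id` is a hypothetical helper, not part of the lab code:

```python
import uuid


def stable_document_id(filename: str) -> str:
    """Derive the same UUID from the same filename every time (name-based UUID version 5)."""
    # Any fixed namespace works; NAMESPACE_URL is used here purely for convenience.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, filename))


first_run = stable_document_id("Creed III.md")
second_run = stable_document_id("Creed III.md")
print(first_run == second_run)  # True: the same file always maps to the same ID
```

To use it, you would replace `str(uuid.uuid4())` in `batch_store_documents` with `stable_document_id(doc["filename"])`. Because the container is partitioned on `/id`, a second run would then upsert over the existing items instead of adding new ones.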


In [15]:
def fetch_documents():
    """Retrieve and print stored documents from CosmosDB."""
    query = "SELECT * FROM c"
    results = list(container.query_items(query=query, enable_cross_partition_query=True))

    for doc in results:
        print(f"ID: {doc['id']}, Filename: {doc['metadata']['filename']}, Content Preview: {doc['content'][:100]}...")

# Fetch and display stored documents
fetch_documents()


ID: 18b0c56f-38b2-4f8f-876f-76576815dc8a, Filename: iNumber Number - Jozi Gold.md, Content Preview: # iNumber Number: Jozi Gold

## Overview

 When an undercover cop is tasked with investigating a his...
ID: 178fea87-bec8-49f9-a5bc-2476fc7f1528, Filename: Extraction 2.md, Content Preview: # Extraction 2

## Overview

 Tasked with extracting a family who is at the mercy of a Georgian gang...
ID: fc2b192d-d14a-4315-a147-a58d3fa90657, Filename: The Flash.md, Content Preview: # The Flash

## Overview

 When his attempt to save his family inadvertently alters the future, Barr...
ID: cd2fcc74-85ee-4817-a5cb-9c6fd776339a, Filename: Guy Ritchie's The Covenant.md, Content Preview: # Guy Ritchie's The Covenant

## Overview

 During the war in Afghanistan, a local interpreter risks...
ID: 95b8ea72-ac76-4549-9db7-5f6ea1c10ad4, Filename: Creed III.md, Content Preview: # Creed III

## Overview

 After dominating the boxing world, Adonis Creed has been thriving in both...
ID: bc17abde-2cd6-4591-8be3-

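Next, attach Azure AI Search to the Cosmos DB container so that the stored documents and their embeddings are indexed. For example, you can create an index with a vector field for `vector_embedding` and an indexer that uses the Cosmos DB container as its data source.

Once the index is populated, we can run searches against it from Python.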

In [88]:
load_dotenv(dotenv_path="../../../.env")

# Connect to the Azure AI Search index
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Search service settings, read from the .env file
SEARCH_ENDPOINT = os.getenv("SEARCH_ENDPOINT")
SEARCH_API_KEY = os.getenv("SEARCH_API_KEY")
INDEX_NAME = os.getenv("SEARCH_INDEX")
print(INDEX_NAME)

if SEARCH_API_KEY is None:
    raise ValueError("SEARCH_API_KEY environment variable is not set")

search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=AzureKeyCredential(SEARCH_API_KEY)
)

# Test the connection by counting the documents in the index
count = search_client.get_document_count()
print("Number of documents in the index:", count)


cosmosdb-movies-index
Number of documents in the index: 34


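Now let's send a vectorized query: we embed the question with the same embedding model and ask Azure AI Search for the stored vectors closest to it.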

In [94]:
from azure.search.documents.models import VectorizedQuery

def single_vector_search(search_client):
    """
    Demonstrates a purely vector-based search:
    We'll search for 'What is the most recent movie about Deadpool?'
    and retrieve the top matches based on vector similarity alone.
    """
    query = "What is the most recent movie about Deadpool?"

    vector_query = VectorizedQuery(
        vector=get_embeddings(query),         # embed the question with the same model as the documents
        k_nearest_neighbors=2,                # number of top documents
        fields="vector_embedding"             # must match your vector field name
    )

    results = search_client.search(
        search_text=None,                     # no keyword query: rank by vector similarity only
        vector_queries=[vector_query],
        select=["id", "content"]
    )

    print("\n--- Single Vector Search Results ---")
    for result in results:
        print(f"ID: {result['id']}\nContent:\n{result['content']}\n")


# Run the search
single_vector_search(search_client)



--- Single Vector Search Results ---
ID: 4794338d-4433-4834-9884-eafafddd7a62
Content:
# Deadpool & Wolverine

## Overview

 A listless Wade Wilson toils away in civilian life with his days as the morally flexible mercenary, Deadpool, behind him. But when his homeworld faces an existential threat, Wade must reluctantly suit-up again with an even more reluctant Wolverine.

## Details

**Release Date:** 2024-07-24

**Genres:** Action, Comedy, Science Fiction

**Popularity:** 2617.082

**Vote Average:** 7.728

**Keywords:** hero, superhero, anti hero, mutant, breaking the fourth wall, mutants, superhero teamup

## Reviews

**Review by** shammahrashad

**Rating:** 6.0

Theres not much of a plot and the villains weren't that great. It was a good laugh though and amazing cameos and fight scenes.

---

**Review by** r96sk

**Rating:** 9.0

Its story may not be the strongest, but the comedy makes <em>'Deadpool & Wolverine'</em> an excellent watch!

There are some top notch gags in there, part

Now, we'll run our query again. However, we'll make one small change.

You may be thinking that it's not surprising that the AI now knows about the latest Deadpool movie, because we told it about the latest Deadpool movie! So, let's try to show that the AI is actually doing some work here; after all, it is a reasoning engine.

In case you're not familiar with these movies, Deadpool originates from Marvel comic books. The films produced by Marvel Studios that share a common storyline are known as the Marvel Cinematic Universe, or MCU, and "Deadpool & Wolverine" is the first Deadpool movie to be part of it. We haven't mentioned Marvel or the MCU in the data we've provided, so if we modify the query slightly and ask the AI about the MCU instead of specifically about Deadpool, it should be able to use reasoning to work out what we mean.

In [None]:
import requests
import json
from azure.search.documents.models import VectorizedQuery

load_dotenv(dotenv_path="../../../.env")

def answer_with_rag(query: str, search_client) -> tuple[str, int]:
    """
    1) Embed the query with the same embedding model used for the documents.
    2) Retrieve the top documents from Azure AI Search via vector search.
    3) Build a prompt that includes the retrieved documents as context.
    4) Send that prompt to the chat model deployment (e.g. GPT-4o).
    5) Return the model's answer and the total number of tokens used.
    """
    # 1) Get the query embedding
    query_embedding = get_embeddings(query)

    # 2) Retrieve the 3 most similar documents from Azure AI Search
    vector_query = VectorizedQuery(
        vector=query_embedding,
        k_nearest_neighbors=3,
        fields="vector_embedding"
    )
    results = search_client.search(
        vector_queries=[vector_query],
        select=["id", "content"]
    )

    # Collect the content field from each search result
    context_snippets = [doc["content"] for doc in results]

    # 3) Combine the snippets into one context string and build the prompt
    combined_context = "\n".join(context_snippets)
    prompt = f"""
            You are a helpful AI movie assistant.
            Use the following context to answer the user's question.

            Context:
            {combined_context}

            Question: {query}

            Answer:
            """

    # 4) Call the Azure OpenAI chat completions REST endpoint directly
    llm_response = requests.post(
        url=os.getenv("AZURE_OPENAI_COMPLETION_URL"),
        headers={"api-key": os.getenv("AZURE_OPENAI_COMPLETION_API_KEY")},
        json={
            "messages": [
                {"role": "user", "content": prompt}
            ]
        }
    )

    # Error handling: check the HTTP status code (e.g. 429 when rate limited)
    if llm_response.status_code != 200:
        return f"Request failed with status code {llm_response.status_code}: {llm_response.text}", 0

    parsed_response = llm_response.json()
    if not parsed_response.get("choices"):
        # The response doesn't have the expected structure
        return f"Unexpected response structure, no 'choices' found: {json.dumps(parsed_response)}", 0

    # 5) Extract the answer from the first choice, plus the token usage
    answer = parsed_response["choices"][0]["message"]["content"]
    tokens = parsed_response["usage"]["total_tokens"]
    return answer, tokens

https://aoai-ibit-hackathon.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview


('The most recent movie about Deadpool is **"Deadpool & Wolverine"**, released on **July 24, 2024**. The film features Wade Wilson (Deadpool) reluctantly returning as a superhero to face an existential threat to his homeworld, teaming up with Wolverine (played by Hugh Jackman). Combining action, comedy, and science fiction, this installment is known for its humor, fourth-wall-breaking moments, and intense action sequences.',
 8042)

In [169]:
def rag_demo(search_client):
    user_query = "What is the most recent MUC movie?"
    answer, token = answer_with_rag(user_query, search_client)
    print("User Query:", user_query)
    print("LLM Answer:", answer)

# Then call this in your main flow:
rag_demo(search_client)

User Query: What is the most recent MUC movie?
LLM Answer: The most recent Marvel Cinematic Universe (MCU) movie is **"Deadpool & Wolverine"**, released on **2024-07-24**. This film integrates Deadpool into the broader MCU universe while maintaining his unique tone and style.


## Next Section

Choose one or more of the following Vector Store and AI Orchestration options:

📣 [Implement Retrieval Augmented Generation with Qdrant as vector store](../03-VectorStore/qdrant.ipynb)

📣 [Implement Retrieval Augmented Generation with Azure CosmosDB as vector store and as semantic cache](../03-VectorStore/mongo.ipynb)

📣 [Implement Retrieval Augmented Generation with Azure AI Search as vector store with semantic ranking](../03-VectorStore/aisearch.ipynb)