
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>



# LAB - Assembling a RAG Application

In this lab, we will assemble a Retrieval-Augmented Generation (RAG) application from the components we created previously. The goal is a pipeline in which a user asks a question, the system retrieves relevant documents from a Vector Search index, and a foundation model uses those documents to generate an informative response.


**Lab Outline:**

In this lab, you will complete the following tasks:

* **Task 1:** Set Up the Retriever Component

* **Task 2:** Set Up the Foundation Model

* **Task 3:** Assemble the Complete RAG Solution

* **Task 4:** Save the Model to Model Registry in Unity Catalog

**📝 Your task:** Complete the **`<FILL_IN>`** sections in the code blocks and follow the other steps as instructed.

## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.
   
   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lab:

* To run this notebook, use the following Databricks runtime: **17.3.x-cpu-ml-scala2.13**

**🚨 Important:** This lab relies on the resources established in the previous one. Please ensure you have completed the prior lab before starting this one.


## Classroom Setup

Before starting the lab, install the required libraries and run the provided classroom setup script, which defines the configuration variables used in the lab. Execute the following cells:

In [0]:
%pip install -U -qqq databricks-sdk databricks-vectorsearch 'mlflow-skinny[databricks]==3.4.0' langchain==0.3.26 databricks-langchain==0.8.0 PyPDF2==3.0.0 flashrank
%restart_python

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
%run ../Includes/Classroom-Setup-04


The examples and models presented in this course are intended solely for demonstration and educational purposes.
Please note that the models and prompt examples may sometimes contain offensive, inaccurate, biased, or harmful content.


**Other Conventions:**

Throughout this lab, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location:  {DA.paths.datasets}")

Username:          labuser12209929_1761968096@vocareum.com
Catalog Name:      dbacademy
Schema Name:       labuser12209929_1761968096
Working Directory: /Volumes/dbacademy/ops/labuser12209929_1761968096@vocareum_com
Dataset Location:  NestedNamespace (arxiv='/Volumes/dbacademy_arxiv/v01', dais='/Volumes/dbacademy_dais/v01', news='/Volumes/dbacademy_news/v01', docs='/Volumes/dbacademy_docs/v01')


## Task 1: Set Up the Retriever Component
**Steps:**
1. Define the embedding model.
1. Get the vector search index that was created in the previous lab.
1. Generate a **retriever** from the vector store. The retriever should return **three results.**
1. Write a test prompt and show the returned search results.
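
The steps above follow a common two-stage pattern: fetch a generous candidate set by vector similarity, then rerank the candidates and keep only the top few. The block below is a minimal, self-contained sketch of that pattern using toy vectors and a toy term-overlap score. The helper names (`cosine`, `retrieve_candidates`, `rerank_top_k`) and the sample corpus are hypothetical and are not part of the Vector Search or flashrank APIs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_candidates(query_vec, corpus, k=10):
    """Stage 1: return the k corpus entries most similar to the query vector."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc["vector"]), reverse=True)
    return ranked[:k]

def rerank_top_k(query_terms, candidates, k=3):
    """Stage 2: reorder candidates with a second score (toy term overlap) and keep k."""
    def overlap(doc):
        return len(query_terms & set(doc["text"].lower().split()))
    return sorted(candidates, key=overlap, reverse=True)[:k]

# Hypothetical toy corpus: each entry has text and a 2-dimensional "embedding".
corpus = [
    {"text": "generative ai and the labor market", "vector": [0.9, 0.1]},
    {"text": "steam engine history", "vector": [0.8, 0.3]},
    {"text": "ai impact on humans", "vector": [0.7, 0.2]},
    {"text": "cooking pasta", "vector": [0.0, 1.0]},
]

candidates = retrieve_candidates([1.0, 0.0], corpus, k=3)
top_docs = rerank_top_k({"ai", "humans", "impact"}, candidates, k=2)
```

In the lab itself, stage 1 is handled by the Vector Search index and stage 2 by a flashrank model; the sketch only illustrates why the first stage asks for more results than the retriever finally returns.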


In [0]:
## Components we created before
vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix+str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")

vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.lab_pdf_text_managed_vs_index"

Assigned Vector Search endpoint name: vs_endpoint_2.


In [0]:
from databricks.vector_search.client import VectorSearchClient
from databricks_langchain import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain.docstore.document import Document
from flashrank import Ranker, RerankRequest

def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):

    def retrieve(query, k: int=10):
        if isinstance(query, dict):
            query = next(iter(query.values()))

        ## get the vector search index
        vsc = VectorSearchClient(disable_notice=True)
        vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
        
        # get the query vector
        embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
        query_vector = embeddings.embed_query(query)

        ## get similar k documents
        return query, vs_index.similarity_search(
            query_vector=query_vector,
            columns=["pdf_name", "content"],
            num_results=k)

    def rerank(query, retrieved, cache_dir, k: int=3):
        ## format result to align with reranker lib format 
        passages = []
        for doc in retrieved.get("result", {}).get("data_array", []):
            new_doc = {"file": doc[0], "text": doc[1]}
            passages.append(new_doc)       
        ## Load the flashrank ranker
        ranker = Ranker(model_name="rank-T5-flan", cache_dir=cache_dir)

        ## rerank the retrieved documents
        rerankrequest = RerankRequest(query=query, passages=passages)
        results = ranker.rerank(rerankrequest)[:k]

        ## format the results of rerank to be ready for prompt
        return [Document(page_content=r.get("text"), metadata={"source": r.get("file")}) for r in results]

    ## the retriever is a runnable sequence of retrieving and reranking.
    return RunnableLambda(retrieve) | RunnableLambda(lambda x: rerank(x[0], x[1], cache_dir))

## test our retriever
question = {"input": "How does Generative AI impact humans?"}
vectorstore = get_retriever(cache_dir = f"{DA.paths.working_dir}/opt")
similar_documents = vectorstore.invoke(question)
print(f"Relevant documents: {similar_documents}")

Relevant documents: [Document(metadata={'source': 'dbfs:/Volumes/dbacademy_arxiv/v01/arxiv-articles/2303.10130.pdf'}, page_content='(Autoretal.,2022a)findsas\nwell that automation and augmentation exposures tend to be positively correlated. There is also a growing set\nof studies examining specific economic impacts and opportunities for LLMs (Bommasani et al., 2021; Felten\net al., 2023; Korinek, 2023; Mollick and Mollick, 2022; Noy and Zhang, 2023; Peng et al., 2023). Alongside\nthis work, our measurements help characterize the broader potential relevance of language models to the\nlabor market.\nGeneral-purpose technologies (e.g. printing, the steam engine) are characterized by widespread prolifera-\ntion, continuous improvement, and the generation of complementary innovations 
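
The retriever in this task accepts either a plain string or a dict such as `{"input": ...}`, and it reads rows from the `result.data_array` field of the search response. The following sketch reproduces just those two shaping steps on a hand-written response so they can be inspected without a Vector Search endpoint. The helper names, the sample rows, and the trailing score column are assumptions made for illustration.

```python
def normalize_query(query):
    """Accept a plain string or a single-key dict such as {"input": "..."}."""
    if isinstance(query, dict):
        return next(iter(query.values()))
    return query

def to_passages(search_response):
    """Turn each row of result.data_array into the dict shape used for reranking."""
    rows = search_response.get("result", {}).get("data_array", [])
    return [{"file": row[0], "text": row[1]} for row in rows]

# Hand-written stand-in for a search response (values are illustrative only).
sample_response = {
    "result": {
        "data_array": [
            ["paper_a.pdf", "LLMs and the labor market.", 0.71],
            ["paper_b.pdf", "General-purpose technologies.", 0.64],
        ]
    }
}

query_text = normalize_query({"input": "How does Generative AI impact humans?"})
passages = to_passages(sample_response)
empty = to_passages({})  # a response without results yields no passages
```

Note that `normalize_query` takes the first value of the dict, so it relies on the input dict carrying the question as its first (or only) entry.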


## Task 2: Set Up the Foundation Model
**Steps:**
1. Define the foundation model for generating responses. Use `llama-3.3` as the foundation model.
2. Test the foundation model to ensure it provides accurate responses.

In [0]:
## import necessary libraries
from databricks_langchain import ChatDatabricks

## define foundation model for generating responses
chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct", max_tokens = 300)

## test foundation model
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")



Test chat model: content='Generative AI refers to a type of artificial intelligence that is capable of generating new, original content, such as images, videos, music, text, or even entire datasets. This is in contrast to traditional AI, which is typically designed to analyze and process existing data.\n\nGenerative AI uses complex algorithms, such as deep learning and neural networks, to learn patterns and structures within existing data. It then uses this knowledge to generate new, synthetic data that is similar in style and structure to the original data. This can be done for a variety of purposes, such as:\n\n1. **Content creation**: Generating new images, videos, music, or text that is similar in style to existing content.\n2. **Data augmentation**: Generating new data to supplement existing datasets, which can be useful for training machine learning models.\n3. **Style transfer**: Transferring the style of one image or video to another, while preserving the original content.\n4. 


## Task 3: Assemble the Complete RAG Solution
**Steps:**
1. Merge the retriever and foundation model into a single LangChain chain.
2. Configure the LangChain chain with proper templates and context for generating responses.
3. Test the complete RAG solution with sample queries.
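
Before wiring the real components together, it can help to see the data flow of the chain in plain Python. The sketch below mirrors the three stages (retrieve, fill the prompt, generate) with stand-in functions. `fake_retriever`, `fake_model`, `PROMPT`, and `run_rag` are hypothetical names used only for illustration, and the documents are plain dicts rather than LangChain `Document` objects.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns documents as plain dicts."""
    return [
        {"page_content": "LLMs may affect many occupations.", "metadata": {"source": "a.pdf"}},
        {"page_content": "Exposure varies by industry.", "metadata": {"source": "b.pdf"}},
    ]

def fake_model(prompt_text):
    """Stand-in for the chat model: reports how many context passages it received."""
    return f"(answer based on {prompt_text.count('[doc]')} context passages)"

PROMPT = "Context:\n{context}\n\nQuestion: {input}\n\nAnswer:"

def run_rag(payload):
    # Step 1: keep the original input and attach the retrieved documents.
    state = {"input": payload["input"], "docs": fake_retriever(payload["input"])}
    # Step 2: format the documents into one context string and fill the prompt.
    context = "\n\n".join("[doc] " + d["page_content"] for d in state["docs"])
    prompt_text = PROMPT.format(context=context, input=state["input"])
    # Step 3: generate the answer and return it together with the documents used.
    return {"answer": fake_model(prompt_text), "context": state["docs"]}

result = run_rag({"input": "What are the economic implications of generative AI?"})
```

Returning the documents alongside the answer is a design choice: it lets callers show sources or evaluate retrieval quality without running the retriever a second time.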

In [0]:
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Prompt template
TEMPLATE = """You are an assistant for a GENAI teaching class. You answer questions related to Generative AI and how it impacts human life. If the question is not related to one of these topics, kindly decline to answer.
Use the following pieces of context to answer the question at the end:

<context>
{context}
</context>

Question: {input}

Answer:
"""

prompt = ChatPromptTemplate.from_template(TEMPLATE)      

# Helper functions
def format_docs(docs):
    # what the model sees in {context}
    return "\n\n".join(d.page_content for d in docs)

def unwrap(payload):
    # return both the answer and the retrieved context, normalized to plain dicts
    docs = payload["docs"]
    return {
        "answer": payload["answer"],
        "context": [{"metadata": getattr(d, "metadata", {}), "page_content": getattr(d, "page_content", "")}
                    for d in docs],
    }

# ---- build the chain ----
# Step 1: retrieve docs
retrieve = RunnableParallel(input=RunnablePassthrough(), docs=get_retriever())

# Step 2: pass formatted context + input to the model
rag = retrieve | {
    "input": itemgetter("input"),
    "context": RunnableLambda(lambda x: format_docs(x["docs"]))
} | prompt | chat_model | StrOutputParser()

# Keep docs for postprocessing
chain = retrieve | {
    "answer": ({"input": itemgetter("input"), "context": RunnableLambda(lambda x: format_docs(x["docs"]))}
               | prompt | chat_model | StrOutputParser()),
    "docs": itemgetter("docs"),
} | RunnableLambda(unwrap)

# Test the complete RAG solution with sample query
question = {"input": "What are the generative AI's economical implications?"}
response = chain.invoke(question)
print(response['answer'])



The economic implications of Generative AI (GENAI) are significant and pervasive. According to our analysis, the impacts of Large Language Models (LLMs) like GPT-4 are likely to be widespread and persistent, even if the development of new capabilities is halted. The economic effect of GENAI is expected to increase over time, with potential productivity gains that may not exacerbate cost disease effects.

Our research indicates that the variance explained by previous technology exposure measurements ranges from 60 to 72%, leaving 28 to 40% of the variation in our AI exposure measure unaccounted for. This suggests that GENAI has unique economic implications that are not fully captured by previous technology exposure measurements.

Furthermore, our analysis by industry reveals that information processing industries exhibit high exposure to GENAI, while manufacturing, agriculture, and mining demonstrate lower exposure. This implies that GENAI is likely to have a significant impact on certa
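
Because the chain returns the retrieved context together with the answer, the sources behind a response can be listed directly. The sketch below assumes a response shaped like the output of `unwrap` (an `answer` string plus a `context` list of dicts with `metadata` and `page_content`). The `unique_sources` helper and the sample values are hypothetical.

```python
def unique_sources(response):
    """Return the distinct `source` values from the response context, in first-seen order."""
    seen = []
    for doc in response.get("context", []):
        source = doc.get("metadata", {}).get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen

# Hand-written stand-in for a chain response (values are illustrative only).
sample = {
    "answer": "...",
    "context": [
        {"metadata": {"source": "a.pdf"}, "page_content": "first chunk"},
        {"metadata": {"source": "a.pdf"}, "page_content": "second chunk"},
        {"metadata": {"source": "b.pdf"}, "page_content": "third chunk"},
    ],
}
sources = unique_sources(sample)
```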


## Task 4: Save the Model to Model Registry in Unity Catalog
**Steps:**
1. Register the assembled RAG model in the Model Registry with Unity Catalog.
2. Ensure that all necessary dependencies and requirements are included.
3. Provide an input example and infer the signature for the model.
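
For step 2, one way to keep the logged requirements aligned with the notebook environment is to pin each package to the version that is currently installed. The sketch below uses only the standard library. `pinned_requirements` is a hypothetical helper, not an MLflow function, and the package names passed to it are illustrative.

```python
from importlib import metadata

def pinned_requirements(packages):
    """Return 'name==version' for installed packages and the bare name otherwise."""
    requirements = []
    for name in packages:
        try:
            requirements.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: leave the requirement unpinned.
            requirements.append(name)
    return requirements

# Illustrative package list; adjust it to the libraries your chain imports.
reqs = pinned_requirements(["langchain", "flashrank", "not-a-real-package-xyz"])
```

The resulting list could be passed as `pip_requirements` when logging the model, under the assumption that the serving environment should match the notebook environment.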

In [0]:
## import necessary libraries
from mlflow.models import infer_signature
import mlflow
import langchain

## set Model Registry URI to Unity Catalog
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_lab_4"

## register the assembled RAG model in Model Registry with Unity Catalog
with mlflow.start_run(run_name="rag_app_lab_4") as run:
    signature = infer_signature(question, response)
    model_info = mlflow.langchain.log_model(
        chain,
        loader_fn=get_retriever,
        name="chain",
        registered_model_name=model_name,
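        ## include every library the chain imports; pinned versions match the %pip install cell above
        pip_requirements=[
            "mlflow==" + mlflow.__version__,
            "langchain==" + langchain.__version__,
            "databricks-langchain==0.8.0",
            "databricks-vectorsearch",
            "flashrank",
        ],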
        input_example=question,
        signature=signature
    )

🔗 View Logged Model at: https://dbc-6a912028-eced.cloud.databricks.com/ml/experiments/3613252000274365/models/m-fbd48e8dba464c9f9a6cce1d2a225354?o=1236668491373261
2025/11/01 04:09:32 INFO mlflow: Attempting to auto-detect Databricks resource dependencies for the current langchain model. Dependency auto-detection is best-effort and may not capture all dependencies of your langchain model, resulting in authorization errors when serving or querying your model. We recommend that you explicitly pass `resources` to mlflow.langchain.log_model() to ensure authorization to dependent resources succeeds when the model is deployed.
2025/11/01 04:09:33 INFO mlflow.tracking.fluent: Active model is set to the logged model with ID: m-fbd48e8dba464c9f9a6cce1d2a225354
2025/11/01 04:09:33 INFO mlflow.tracking.fluent: Use `mlflow.set_active_model` to set the active model to a different one if needed.


Successfully registered model 'dbacademy.labuser12209929_1761968096.rag_app_lab_4'.



🔗 Created version '1' of model 'dbacademy.labuser12209929_1761968096.rag_app_lab_4': https://dbc-6a912028-eced.cloud.databricks.com/explore/data/models/dbacademy/labuser12209929_1761968096/rag_app_lab_4/version/1?o=1236668491373261



## Clean up Resources

This was the final lab. You can delete all resources created in this course.


## Conclusion

In this lab, you learned how to assemble a Retrieval-Augmented Generation (RAG) application using Databricks components. By integrating Vector Search for document retrieval with a foundation model for response generation, you built a chain that answers user queries from retrieved documents and registered it in Unity Catalog. This lab provided hands-on experience in building end-to-end AI applications and demonstrated the capabilities of Databricks for natural language processing tasks.

&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>