![image](https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use AutoAI RAG and Chroma to create a pattern and get information from `ibm-watsonx-ai` SDK documentation

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.


## Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG. The AutoAI RAG experiment conducted in this notebook uses data scraped from the `ibm-watsonx-ai` SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.12.


## Learning goal

The learning goals of this notebook are:

- Create an AutoAI RAG job that will find the best RAG pattern based on provided data


## Table of Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Define the RAG Optimizer](#definition)
- [Run the RAG Experiment](#run)
- [Compare and test RAG Patterns](#comparison)
- [Historical runs](#runs)
- [Clean up](#cleanup)
- [Summary and next steps](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup task:

-  Contact your Cloud Pak for Data administrator and ask them for your account credentials

### Install dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener no referrer">here</a>.

In [1]:
%pip install -U "ibm-watsonx-ai[rag]>=1.2.4" | tail -n 1

[1A[2KSuccessfully installed Pillow-11.2.1 SQLAlchemy-2.0.41 XlsxWriter-3.2.3 aiohappyeyeballs-2.6.1 aiohttp-3.12.4 aiosignal-1.3.2 annotated-types-0.7.0 anyio-4.9.0 asgiref-3.8.1 attrs-25.3.0 backoff-2.2.1 bcrypt-4.3.0 beautifulsoup4-4.12.3 build-1.2.2.post1 cachetools-5.5.2 certifi-2025.4.26 charset-normalizer-3.4.2 chroma-hnswlib-0.7.6 chromadb-0.5.23 click-8.2.1 coloredlogs-15.0.1 dataclasses-json-0.6.7 deprecated-1.2.18 distro-1.9.0 durationpy-0.10 elastic-transport-8.17.1 elasticsearch-8.18.1 fastapi-0.115.12 filelock-3.18.0 flatbuffers-25.2.10 frozenlist-1.6.0 fsspec-2025.5.1 google-auth-2.40.2 googleapis-common-protos-1.70.0 grpcio-1.67.1 h11-0.16.0 hf-xet-1.1.2 httpcore-1.0.9 httptools-0.6.4 httpx-0.28.1 httpx-sse-0.4.0 huggingface-hub-0.32.3 humanfriendly-10.0 ibm-cos-sdk-2.14.1 ibm-cos-sdk-core-2.14.1 ibm-cos-sdk-s3transfer-2.14.1 ibm-watsonx-ai-1.3.23 idna-3.10 importlib-metadata-8.6.1 importlib-resources-6.5.2 jmespath-1.0.1 jsonpatch-1.33 jsonpointer-3.0.0 kubernetes-32

#### Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the **admin's** `username` and the platform `url`.

In [2]:
username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"

Use the **admin's** `api_key` to authenticate watsonx.ai Runtime services:

In [None]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)

Alternatively you can use the **admin's** `password`:

In [3]:
import getpass
from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )

#### Create `APIClient` instance

In [4]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

### Working with projects

First, you need to create a project for your work. If you do not have a project already, create one by following these steps:

- Open IBM Cloud Pak for Data
- From the menu, click **View all projects**
- Create a new project
- Go to the **Manage** tab
- Copy the `project_id`

**Action**: Assign the project ID below

In [5]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

To be able to interact with all resources available in watsonx.ai, you need to set the **project** which you will be using.

In [6]:
client.set.default_project(project_id)

'SUCCESS'

<a id="definition"></a>

## RAG Optimizer definition

### Define a connection to the training data

Upload the training data to the project as a data asset and then define a connection to the file. This example uses the `ModelInference` description from the [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/fm_model_inference.html) documentation.

In [7]:
from langchain_community.document_loaders import WebBaseLoader

url = "https://ibm.github.io/watsonx-ai-python-sdk/fm_model_inference.html"

docs = WebBaseLoader(url).load()
model_inference_content = docs[0].page_content

USER_AGENT environment variable not set, consider setting it to identify your requests.


Upload the training data to the project as a data asset.

In [8]:
import os

document_filename = "ModelInference.txt"

if not os.path.isfile(document_filename):
    with open(document_filename, "w") as file:
        file.write(model_inference_content)

document_asset_details = client.data_assets.create(
    name=document_filename, file_path=document_filename
)

document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id

Creating data asset...
SUCCESS


'f7c0615d-176a-4af8-879b-bdae1f84bfbe'

Define a connection to the training data.

In [9]:
from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]

### Define a connection to the test data

Upload a `json` file that you want to use as a benchmark to the project as a data asset and then define a connection to the file. This example uses content from the [`ibm_watsonx_ai`](https://ibm.github.io/watsonx-ai-python-sdk/index.html) SDK documentation.

In [10]:
benchmarking_data_IBM_page_content = [
    {
        "question": "What is path to ModelInference class?",
        "correct_answer": "ibm_watsonx_ai.foundation_models.inference.ModelInference",
        "correct_answer_document_ids": ["ModelInference.txt"],
    },
    {
        "question": "What is method for get model inference details?",
        "correct_answer": "get_details()",
        "correct_answer_document_ids": ["ModelInference.txt"],
    },
]

Upload the benchmark testing data to the project as a data asset with `json` extension.

In [11]:
import json

test_filename = "benchmarking_data_ModelInference.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(
    name=test_filename, file_path=test_filename
)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id

Creating data asset...
SUCCESS


'fde6e891-8e6e-4f84-8637-0cb6fecd2142'

Define a connection to the benchmark testing data.

In [12]:
test_data_references = [DataConnection(data_asset_id=test_asset_id)]

### Configure the RAG Optimizer

Provide the input information for the AutoAI RAG optimizer:
- `name` - experiment name
- `description` - experiment description
- `max_number_of_rag_patterns` - maximum number of RAG patterns to create
- `optimization_metrics` - target optimization metrics

In [13]:
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG run - ModelInference documentation",
    description="AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation",
    max_number_of_rag_patterns=4,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)

To retrieve the configuration parameters, use `get_params()`.

In [14]:
rag_optimizer.get_params()

{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}

<a id="run"></a>
## Run the RAG Experiment

Call the `run()` method to trigger the AutoAI RAG experiment. Choose one of two modes: 

- To use the **interactive mode** (synchronous job), specify `background_mode=False` 
- To use the **background mode** (asynchronous job), specify `background_mode=True`

In [15]:
run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False,
)



##############################################

Running 'fc391c80-4ceb-4961-b6ed-5063cab855f5'

##############################################


pending..............
running..
completed
Training of 'fc391c80-4ceb-4961-b6ed-5063cab855f5' finished successfully.


To monitor the AutoAI RAG jobs in background mode, use the `get_run_status()` method.

In [16]:
rag_optimizer.get_run_status()

'completed'

<a id="comparison"></a>
## Compare and test RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the `summary()` method. Use the DataFrame to compare all discovered patterns and select the one you want for further testing.

In [17]:
summary = rag_optimizer.summary()
summary

Unnamed: 0_level_0,mean_answer_correctness,mean_faithfulness,mean_context_correctness,chunking.method,chunking.chunk_size,chunking.chunk_overlap,embeddings.model_id,vector_store.distance_metric,retrieval.method,retrieval.number_of_chunks,retrieval.hybrid_ranker,generation.model_id
Pattern_Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Pattern1,0.0,0.5833,1.0,recursive,512,128,ibm/slate-125m-english-rtrvr,cosine,simple,3,,ibm/granite-13b-instruct-v2
Pattern2,0.0,0.8864,1.0,recursive,1024,256,ibm/slate-125m-english-rtrvr,cosine,window,5,,ibm/granite-13b-instruct-v2
Pattern3,0.0,0.7045,1.0,recursive,512,256,ibm/slate-125m-english-rtrvr,cosine,window,3,,ibm/granite-13b-instruct-v2
Pattern4,0.0,0.7237,1.0,recursive,1024,256,ibm/slate-125m-english-rtrvr,cosine,simple,5,,ibm/granite-13b-instruct-v2


Additionally, you can pass the `scoring` parameter to the summary method to filter RAG patterns, starting with the best.

In [18]:
summary = rag_optimizer.summary(scoring="faithfulness")

### Get the selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

In [19]:
best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

best_pattern = rag_optimizer.get_pattern(pattern_name="Pattern1")
best_pattern

Best pattern is: Pattern2


<ibm_watsonx_ai.foundation_models.extensions.rag.pattern.pattern.RAGPattern at 0x144a3f0e0>

To retrieve the pattern details, use the `get_pattern_details` method.

In [20]:
rag_optimizer.get_pattern_details(pattern_name="Pattern2")

{'composition_steps': ['model_selection',
  'chunking',
  'embeddings',
  'retrieval',
  'generation'],
 'duration_seconds': 4,
 'location': {'evaluation_results': '/projects/d940d2db-e37e-4c6a-b646-beb97e76250c/assets/auto_ml/auto_ml.fd93b72d-b43c-4e0a-badc-7ffb466342e2/wml_data/fc391c80-4ceb-4961-b6ed-5063cab855f5/Pattern2/evaluation_results.json',
  'indexing_notebook': '/projects/d940d2db-e37e-4c6a-b646-beb97e76250c/assets/auto_ml/auto_ml.fd93b72d-b43c-4e0a-badc-7ffb466342e2/wml_data/fc391c80-4ceb-4961-b6ed-5063cab855f5/Pattern2/indexing_inference_notebook.ipynb',
  'inference_notebook': '/projects/d940d2db-e37e-4c6a-b646-beb97e76250c/assets/auto_ml/auto_ml.fd93b72d-b43c-4e0a-badc-7ffb466342e2/wml_data/fc391c80-4ceb-4961-b6ed-5063cab855f5/Pattern2/indexing_inference_notebook.ipynb',
  'inference_service_code': '/projects/d940d2db-e37e-4c6a-b646-beb97e76250c/assets/auto_ml/auto_ml.fd93b72d-b43c-4e0a-badc-7ffb466342e2/wml_data/fc391c80-4ceb-4961-b6ed-5063cab855f5/Pattern2/inference_a

### Create the index/collection

Build a solution using the best pattern with additional document indexing.

To check the `index_name` that you are working on, use the `best_pattern` method. 

In [21]:
best_pattern.vector_store._index_name

'autoai_rag_fc391c80_20250530105632'

In [22]:
urls = [
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_text_extraction.html",
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)

In [23]:
best_pattern.indexing_function(doc_splits)

['a82844b130d06b51e44648fdc74143602a552fd34d5419397e9f7225f40f36ce',
 '39e675e63899d66333b5cb49bdae94ff07abc18bc4889aa1ba4fc0e2285d52fa',
 'ceb9173fe07e2aabc1b9ed70b21c619153da6b1f4b374e487cd23e3a27a98dac',
 'fa01210dd08d98b49345f215d03182a0f4297f7d7060efeae4f0b2cbab2ebc90',
 '27256afee0899a720bdfaa8c630958a98a8b79d7ac102627aef353c35a74755b',
 'ec8e2661748745a6bedb0ec0589043332a82a585f4750bd1c77041d2dcf103c8',
 '67afb8c0898213fbe54593ffe8c3a139164f86cf07fa02e023f88f850eac840e',
 'bac740d77e22d5d5dc3538f2a35df16cea82677e1ecb6fd499aa92aa1904865a',
 'e006e431a0c03fddf36f5cd99acdc178dfc2262a85646c226e5b726a7955b3f0',
 'cb1251665b793077c4687e1c852a75fc47b77c448a599ade924217e27427094b',
 'a582fb7649e34bdcc43deaf244ec78e4fa5ca42e91c157aed7217a4f106eb9e1',
 '82082085128a22276f4eafd78305d7e9b340f52d97b8f62e2337daa9bebdd51f',
 '79fbdaf9cafafd53ca1b1ac0bfce108d87c82abf34cd11b10847bb205abecf8f',
 'a9c34ffa073ef2bca97787749af7ae4aeb7ce19859cc244833b77c41f0f38027',
 'b3d4068f131da72cc054a2c1ffb7b389

Query the RAGPattern locally to test it.

In [24]:
questions = ["How to add Task Credentials?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {"values": questions, "access_token": client.token}
    ]
}

best_pattern.inference_function()(payload)

{'predictions': [{'fields': ['answer', 'reference_documents'],
     [{'page_content': 'If the list is empty, you can create new task credentials with the store method:\nclient.task_credentials.store()\n\n\nTo get the status of available task credentials, use the get_details method:\nclient.task_credentials.get_details()',
       'metadata': {'document_id': '338243661372903145',
        'language': 'en',
        'sequence_number': 8,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html',
        'start_index': 0,
        'title': 'Custom models - IBM watsonx.ai'}},
       'metadata': {'document_id': '338243661372903145',
        'language': 'en',
        'sequence_number': 7,
        'source': 'https://ibm.github.io/watsonx-ai-python-sdk/fm_custom_models.html',
        'start_index': 0,
        'title': 'Custom models - IBM watsonx.ai'}},
      {'page_content': 'Note\nWhen the credentials parameter is passed, one of these parameters is required: [project_

<a id="runs"></a>
## Historical runs

In this section, you will learn how to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the `list()` method and provide the `'rag_optimizer'` filter.

In [25]:
experiment.runs(filter="rag_optimizer").list()

Unnamed: 0,timestamp,run_id,state,auto_pipeline_optimizer name
0,2025-05-30T10:56:47.230Z,fc391c80-4ceb-4961-b6ed-5063cab855f5,completed,AutoAI RAG run - ModelInference documentation


In [26]:
run_id = run_details["metadata"]["id"]
run_id

'fc391c80-4ceb-4961-b6ed-5063cab855f5'

### Get the executed optimizer's configuration parameters

In [27]:
experiment.runs.get_rag_params(run_id=run_id)

{'name': 'AutoAI RAG run - ModelInference documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai ModelInference documentation',
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}

### Get the historical `rag_optimizer` instance and training details

In [28]:
historical_opt = experiment.runs.get_rag_optimizer(run_id)

### List trained patterns for the selected optimizer

In [29]:
historical_opt.summary()

Unnamed: 0_level_0,mean_answer_correctness,mean_faithfulness,mean_context_correctness,chunking.method,chunking.chunk_size,chunking.chunk_overlap,embeddings.model_id,vector_store.distance_metric,retrieval.method,retrieval.number_of_chunks,retrieval.hybrid_ranker,generation.model_id
Pattern_Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Pattern1,0.0,0.5833,1.0,recursive,512,128,ibm/slate-125m-english-rtrvr,cosine,simple,3,,ibm/granite-13b-instruct-v2
Pattern2,0.0,0.8864,1.0,recursive,1024,256,ibm/slate-125m-english-rtrvr,cosine,window,5,,ibm/granite-13b-instruct-v2
Pattern3,0.0,0.7045,1.0,recursive,512,256,ibm/slate-125m-english-rtrvr,cosine,window,3,,ibm/granite-13b-instruct-v2
Pattern4,0.0,0.7237,1.0,recursive,1024,256,ibm/slate-125m-english-rtrvr,cosine,simple,5,,ibm/granite-13b-instruct-v2


<a id="cleanup"></a>
## Clean up

To delete the current experiment, use the `cancel_run(hard_delete=True)` method.

**Warning:** Be careful: once you delete an experiment, you will no longer be able to refer to it.

In [30]:
rag_optimizer.cancel_run(hard_delete=True)

'SUCCESS'

To clean up all of the created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

follow the steps in this sample [notebook](https://github.com/IBM/watsonx-ai-samples/blob/master/cpd5.1/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to use `ibm-watsonx-ai` to run AutoAI RAG experiments. 

 Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Mateusz Szewczyk**, Software Engineer at watsonx.ai

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.