![image](https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, Chroma, and LangChain to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content
This notebook contains the steps and code to demonstrate support of Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding the retrieved passage(s) into a large language model (for every user query)
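The three steps can be sketched in plain Python. This is a toy illustration under stated assumptions: `tokenize` (a set of lowercase words) stands in for a real embedding model, Jaccard word overlap stands in for vector similarity, the passages are made up, and `build_prompt` only assembles the augmented prompt instead of calling an LLM.

```python
import re

def tokenize(text: str) -> set[str]:
    # Toy stand-in for an embedding model: the set of lowercase words.
    return set(re.findall(r"[a-z']+", text.lower()))

def score(query_words: set[str], passage_words: set[str]) -> float:
    # Jaccard overlap; a real RAG system compares dense embedding vectors instead.
    union = query_words | passage_words
    return len(query_words & passage_words) / len(union) if union else 0.0

# Step 1 (once): index the knowledge base passages (illustrative data).
passages = [
    "Chroma stores embedding vectors and supports similarity search.",
    "LangChain chains connect retrievers to large language models.",
    "Paris is the capital of France.",
]
index = [(tokenize(p), p) for p in passages]

# Step 2 (every query): retrieve the k highest-scoring passages.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: score(q, item[0]), reverse=True)
    return [passage for _, passage in ranked[:k]]

# Step 3 (every query): build the prompt an LLM would receive.
def build_prompt(query: str, k: int = 1) -> str:
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does Chroma store?"))
```

In the rest of this notebook, Chroma and a watsonx.ai embedding model replace steps 1 and 2, and a Granite model consumes the prompt in step 3.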

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [Generate a retrieval-augmented response to a question](#predict)
- [Summary and next steps](#summary)


<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask them for your account credentials


### Install dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener no referrer">here</a>.

```python
%pip install -U "langchain>=0.3,<0.4" | tail -n 1
%pip install -U "langchain_ibm>=0.3,<0.4" | tail -n 1
%pip install -U "langchain_community>=0.3,<0.4" | tail -n 1
%pip install -U "langchain_chroma>=0.2,<0.3" | tail -n 1
%pip install -U wget | tail -n 1
%pip install -U sentence-transformers | tail -n 1
```

[1A[2KSuccessfully installed PyYAML-6.0.2 SQLAlchemy-2.0.41 annotated-types-0.7.0 anyio-4.9.0 certifi-2025.4.26 charset-normalizer-3.4.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 jsonpatch-1.33 jsonpointer-3.0.0 langchain-0.3.25 langchain-core-0.3.63 langchain-text-splitters-0.3.8 langsmith-0.3.43 orjson-3.10.18 packaging-24.2 pydantic-2.11.5 pydantic-core-2.33.2 requests-2.32.3 requests-toolbelt-1.0.0 sniffio-1.3.1 tenacity-9.1.2 typing-extensions-4.13.2 typing-inspection-0.4.1 urllib3-2.4.0 zstandard-0.23.0
[1A[2KSuccessfully installed ibm-cos-sdk-2.14.1 ibm-cos-sdk-core-2.14.1 ibm-cos-sdk-s3transfer-2.14.1 ibm-watsonx-ai-1.3.23 jmespath-1.0.1 langchain_ibm-0.3.11 lomond-0.3.3 numpy-2.2.6 pandas-2.2.3 pytz-2025.2 requests-2.32.2 tabulate-0.9.0 tzdata-2025.2
[1A[2KSuccessfully installed aiohappyeyeballs-2.6.1 aiohttp-3.12.4 aiosignal-1.3.2 attrs-25.3.0 dataclasses-json-0.6.7 frozenlist-1.6.0 httpx-sse-0.4.0 langchain_community-0.3.24 marshmallow-3.26.1 multidict-6.4.4 myp

#### Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the **admin's** `username` and the platform `url`.

```python
username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"
```

Use the **admin's** `api_key` to authenticate watsonx.ai Runtime services:

```python
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)
```

Alternatively, you can use the **admin's** `password`:

```python
import getpass
from ibm_watsonx_ai import Credentials

# Prompt for a password only if no API key-based credentials were created above.
if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )
```
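The cells above prompt for the secret interactively with `getpass`. If you run the notebook non-interactively, one common pattern (the same one the `PROJECT_ID` cell below uses) is to read the secret from an environment variable first and fall back to a prompt. A minimal sketch; the helper `read_secret` and any environment variable name you pass to it are illustrative choices, not part of the `ibm_watsonx_ai` API.

```python
import getpass
import os

def read_secret(env_var: str, prompt: str) -> str:
    """Return the value of `env_var` if it is set and non-empty, otherwise prompt for it.

    Hypothetical helper: the environment variable name is your choice.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass does not echo the typed secret to the notebook output.
    return getpass.getpass(prompt)
```

You could then pass `api_key=read_secret("WATSONX_APIKEY", "Enter your watsonx.ai API key and hit enter: ")` to `Credentials` instead of calling `getpass.getpass` directly; `WATSONX_APIKEY` here is an assumed name.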

### Working with projects

First of all, you need to create a project that will be used for your work. If you do not have a project created already, follow the steps below:

- Open the IBM Cloud Pak for Data main page
- Click **All projects**
- Create an empty project
- Copy the `project_id` from the URL and paste it below

**Action**: Assign project ID below

```python
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
```

#### Create `APIClient` instance

```python
from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id)
```

<a id="data"></a>
## Document data loading

Download the State of the Union speech as a text file.

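```python
import os
import wget

filename = "state_of_the_union.txt"
# Use a separate name so the platform `url` defined earlier is not overwritten.
data_url = "https://raw.github.com/IBM/watsonx-ai-samples/master/cpd5.2/data/foundation_models/state_of_the_union.txt"

# Download only once; later runs reuse the local copy.
if not os.path.isfile(filename):
    wget.download(data_url, out=filename)
```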
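The `wget` package is only used for this one download. If you prefer the standard library, the same download-once logic can be written with `urllib.request.urlretrieve`; the helper name `download_if_missing` is illustrative.

```python
import os
import urllib.request

def download_if_missing(url: str, filename: str) -> str:
    """Download `url` to `filename` unless the file already exists; return the path."""
    if not os.path.isfile(filename):
        # urlretrieve uses urllib's default opener, which follows HTTP redirects.
        urllib.request.urlretrieve(url, filename)
    return filename
```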

<a id="build_base"></a>
## Build up knowledge base

A common approach in RAG is to create dense vector representations (embeddings) of the knowledge base so that their semantic similarity to a given user query can be calculated.
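To make "semantic similarity" concrete, the sketch below compares two common measures on made-up, unit-length 3-dimensional vectors: cosine similarity and squared Euclidean (L2) distance. Chroma collections use squared L2 distance by default unless configured otherwise. For unit-length vectors `||q - d||^2 = 2 - 2 * cos(q, d)`, so both measures rank passages in the same order. The vectors are illustrative assumptions, not real embeddings.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def squared_l2(u: list[float], v: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))

def normalize(u: list[float]) -> list[float]:
    # Scale to unit length so the cosine/L2 identity holds.
    n = math.hypot(*u)
    return [a / n for a in u]

# Toy vectors; real embedding models output hundreds of dimensions.
query_vec = normalize([0.2, 0.9, 0.1])
passage_vecs = {
    "close": normalize([0.25, 0.8, 0.05]),
    "far": normalize([0.9, 0.1, 0.4]),
}

# Higher cosine = more similar; lower L2 distance = more similar.
by_cosine = sorted(passage_vecs, key=lambda k: cosine_similarity(query_vec, passage_vecs[k]), reverse=True)
by_l2 = sorted(passage_vecs, key=lambda k: squared_l2(query_vec, passage_vecs[k]))
print(by_cosine, by_l2)
```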

In this basic example, we take the State of the Union speech content (`filename`), split it into chunks, embed the chunks using the watsonx.ai embedding service, load them into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma

# Load the speech as a single LangChain Document.
loader = TextLoader(filename)
documents = loader.load()

# Split on blank lines into chunks of up to 1000 characters, with no overlap.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
```
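Conceptually, `CharacterTextSplitter` splits the text on a separator (a blank line, `"\n\n"`, by default) and packs consecutive pieces into chunks of at most `chunk_size` characters. The function below is a simplified sketch of that packing idea (hypothetical name `split_into_chunks`, no overlap handling), not LangChain's actual implementation. As in LangChain, a single piece longer than `chunk_size` is kept whole, so some chunks can exceed the limit.

```python
def split_into_chunks(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in (p.strip() for p in text.split(separator)):
        if not piece:
            continue
        # Length added by this piece, plus a separator if the chunk is non-empty.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # Adding the piece would overflow: close the current chunk first.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks

print(split_into_chunks("aaaa\n\nbbbb\n\ncccc", chunk_size=10))
```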

After splitting, `texts` contains self-contained passages that can be ingested by Chroma.

#### Specify embedding model

This notebook uses the embedding model `ibm/slate-125m-english-rtrvr`, which must be available in your Cloud Pak for Data environment for this notebook to run successfully.  
You can list available embedding models by running the cell below.

**Note**: You can feed a custom embedding function to `chromadb`. Retrieval quality in `chromadb` may differ depending on the embedding model used. The following example uses the watsonx.ai Embeddings service.

```python
if len(client.foundation_models.EmbeddingModels):
    print(*client.foundation_models.EmbeddingModels, sep="\n")
else:
    print(
        "Embedding models are missing in this environment. Install embedding models to proceed."
    )
```

```text
ibm/slate-125m-english-rtrvr
```


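```python
from ibm_watsonx_ai.foundation_models import Embeddings

# watsonx.ai embedding model used to vectorize both the chunks and the queries.
embeddings = Embeddings(
    model_id=client.foundation_models.EmbeddingModels.SLATE_125M_ENGLISH_RTRVR,
    credentials=credentials,
    project_id=project_id,
)
# Embed every chunk and store the vectors in a Chroma collection.
docsearch = Chroma.from_documents(texts, embeddings)
```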

#### Compatibility of watsonx.ai Embeddings with LangChain

LangChain vector stores and retrievers use `embed_documents` and `embed_query` under the hood to generate embedding vectors for the uploaded documents and the user query, respectively. The watsonx.ai Python SDK `Embeddings` class implements both methods.
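The interface is small enough to sketch. The hypothetical `HashingEmbeddings` class below exposes the same two methods with a toy feature-hashing scheme (each word increments one of `dim` buckets, then the vector is scaled to unit length). It carries no semantic meaning and only illustrates the shape of the contract: `embed_documents` takes a list of strings and returns one vector per string, and `embed_query` takes one string and returns one vector. To plug a custom embedder into LangChain vector stores such as `Chroma`, the documented route is to subclass `langchain_core.embeddings.Embeddings`, which declares these two methods.

```python
import math
import zlib

class HashingEmbeddings:
    """Hypothetical toy embedder exposing the two methods LangChain calls."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Deterministic hash of the word selects a bucket.
            vec[zlib.crc32(token.encode()) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Used when chunks are added to the vector store.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Used for each user query at retrieval time.
        return self._embed(text)
```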

```python
help(Embeddings)
```

```text
Help on class Embeddings in module ibm_watsonx_ai.foundation_models.embeddings.embeddings:

class Embeddings(ibm_watsonx_ai.foundation_models.embeddings.base_embeddings.BaseEmbeddings, ibm_watsonx_ai.wml_resource.WMLResource)
 |  Embeddings(*, model_id: 'str', params: 'ParamsType | None' = None, credentials: 'Credentials | dict[str, str] | None' = None, project_id: 'str | None' = None, space_id: 'str | None' = None, api_client: 'APIClient | None' = None, verify: 'bool | str | None' = None, persistent_connection: 'bool' = True, batch_size: 'int' = 1000, concurrency_limit: 'int' = 5, max_retries: 'int | None' = None, delay_time: 'float | None' = None, retry_status_codes: 'list[int] | None' = None) -> 'None'
 |
 |  Instantiate the embeddings service.
 |
 |  :param model_id: the type of model to use
 |  :type model_id: str, optional
 |
 |  :param params: parameters to use during generate requests, use ``ibm_watsonx_ai.metanames.EmbedTextParamsMetaNames().show()`` to view the list of MetaNa
```

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/ibm_watsonx" target="_blank" rel="noopener no referrer">list of LLMs supported by LangChain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

#### Specify text model

This notebook uses the text model `ibm/granite-3-2b-instruct`, which must be available in your Cloud Pak for Data environment for this notebook to run successfully.  
You can list available text models by running the cell below.

```python
if len(client.foundation_models.TextModels):
    print(*client.foundation_models.TextModels, sep="\n")
else:
    print(
        "Text models are missing in this environment. Install text models to proceed."
    )
```

```text
ibm/granite-13b-instruct-v2
ibm/granite-3-2b-instruct
```


```python
model_id = client.foundation_models.TextModels.GRANITE_3_2B_INSTRUCT
```

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

```python
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,  # always pick the most likely next token
    GenParams.MIN_NEW_TOKENS: 1,  # generate at least one token
    GenParams.MAX_NEW_TOKENS: 100,  # cap the answer length at 100 tokens
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"],  # stop early if this string is generated
}
```
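The parameters above can be read as a decoding loop. A toy sketch, assuming a hypothetical `next_token_scores` function in place of the model and treating tokens as strings: greedy decoding appends the highest-scoring token at each step, `max_new_tokens` caps the loop, and a stop sequence ends generation once at least `min_new_tokens` tokens exist. Removing the stop sequence from the returned text is a choice of this sketch, not a statement about how watsonx.ai formats its output.

```python
from collections.abc import Callable

def greedy_decode(
    next_token_scores: Callable[[list[str]], dict[str, float]],
    min_new_tokens: int,
    max_new_tokens: int,
    stop_sequences: list[str],
) -> str:
    """Toy greedy decoder over string tokens."""
    tokens: list[str] = []
    while len(tokens) < max_new_tokens:
        scores = next_token_scores(tokens)
        # Greedy: take the single highest-scoring candidate, no sampling.
        tokens.append(max(scores, key=scores.get))
        text = "".join(tokens)
        # Stop sequences only end generation once min_new_tokens is reached.
        if len(tokens) >= min_new_tokens:
            for stop in stop_sequences:
                if text.endswith(stop):
                    return text[: -len(stop)]
    return "".join(tokens)

# Usage with a scripted fake "model" (illustration only).
script = ["Hello", " world", "<|endoftext|>", " ignored"]

def scripted_model(tokens: list[str]) -> dict[str, float]:
    return {script[len(tokens)]: 1.0, "?": 0.5}

answer = greedy_decode(scripted_model, min_new_tokens=1, max_new_tokens=100, stop_sequences=["<|endoftext|>"])
print(answer)
```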

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from LangChain with the defined parameters and `ibm/granite-3-2b-instruct`.

```python
from langchain_ibm import WatsonxLLM

# Authenticate with the API key if one was provided, otherwise with the password.
if credentials.get("apikey"):
    auth = {"apikey": credentials.get("apikey")}
else:
    auth = {"password": credentials.get("password")}

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    username=credentials.get("username"),
    instance_id=credentials.get("instance_id"),
    project_id=project_id,
    params=parameters,
    **auth,
)
```

<a id="predict"></a>
## Generate a retrieval-augmented response to a question

Build the `RetrievalQA` (question answering chain) to automate the RAG task.

```python
from langchain.chains import RetrievalQA

# chain_type="stuff" places all retrieved chunks into a single prompt for the LLM.
qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever()
)
```
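With `chain_type="stuff"`, the chain "stuffs" all retrieved chunks into one prompt and makes a single LLM call. A minimal sketch of that idea follows; the helper `stuff_prompt` and its wording are hypothetical, not LangChain's built-in template. Because every chunk goes into the same prompt, the combined text of the retrieved chunks has to fit in the model's context window; `docsearch.as_retriever(search_kwargs={"k": 2})` is one way to retrieve fewer chunks.

```python
def stuff_prompt(question: str, documents: list[str], separator: str = "\n\n") -> str:
    """Concatenate every retrieved document into one prompt (illustrative wording)."""
    context = separator.join(documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(stuff_prompt("What is RAG?", ["chunk one", "chunk two"]))
```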

### Select questions

Ask a question about the loaded document and run it through the chain.

```python
query = "What did the president say about Ketanji Brown Jackson"
qa.invoke(query)
```

```text
{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': " The president nominated Circuit Court of Appeals Judge Ketanji Brown Jackson as a replacement for Justice Stephen Breyer on the United States Supreme Court. He described her as one of the nation's top legal minds and expressed confidence that she would continue Justice Breyer's legacy of excellence."}
```

---

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!
 
You learned how to answer questions with RAG using watsonx and LangChain.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.