![image](https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content

This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

#### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
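The three steps above can be sketched in plain Python. This is only a toy illustration, not the implementation used later in this notebook: simple word overlap stands in for embedding similarity, and `fake_llm` is a hypothetical placeholder for a real model call.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Step 1 (once): store a bag-of-words representation per passage."""
    return [(passage, Counter(tokenize(passage))) for passage in passages]


def retrieve(index, query, k=1):
    """Step 2 (per query): return the k passages sharing the most words with the query."""
    query_words = Counter(tokenize(query))
    scored = [
        (sum((words & query_words).values()), passage) for passage, words in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


def answer(index, query, llm):
    """Step 3 (per query): pass the retrieved context and the question to a model."""
    context = "\n".join(retrieve(index, query))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: returns the prompt it was given."""
    return prompt


index = build_index(
    [
        "Bucharest became the capital of Romania in 1862.",
        "The Baltimore Ravens won the Super Bowl played in 2013.",
    ]
)
print(answer(index, "when did bucharest become the capital of romania?", fake_llm))
```

The index is built once, while `retrieve` and `answer` run for every query, which mirrors the "once" and "for every user query" annotations in the list.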

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data (test) loading](#data)
- [Foundation Models on watsonx](#models)
- [Basic information on how to connect to Elasticsearch](#elastic_conn)
- **[Set up ElasticsearchStore (LangChain)](#elasticsearchstore)**
    - [Embed and index documents with Elasticsearch](#elasticsearchstore_index)
    - [Generate a retrieval-augmented response to a question](#predict)



<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask them for your account credentials


### Install dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener no referrer">here</a>.

In [1]:
%pip install -U "langchain>=0.3,<0.4" | tail -n 1
%pip install -U "langchain_ibm>=0.3,<0.4" | tail -n 1
%pip install -U "langchain-community>=0.3,<0.4" | tail -n 1
%pip install -U "langchain_elasticsearch>=0.3,<0.4" | tail -n 1
%pip install -U "langchain-huggingface>=0.2,<0.3" | tail -n 1
%pip install -U humanize | tail -n 1
%pip install -U ipywidgets | tail -n 1
%pip install -U wget | tail -n 1

[1A[2KSuccessfully installed PyYAML-6.0.2 SQLAlchemy-2.0.41 annotated-types-0.7.0 anyio-4.9.0 certifi-2025.4.26 charset-normalizer-3.4.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 jsonpatch-1.33 jsonpointer-3.0.0 langchain-0.3.25 langchain-core-0.3.63 langchain-text-splitters-0.3.8 langsmith-0.3.43 orjson-3.10.18 packaging-24.2 pydantic-2.11.5 pydantic-core-2.33.2 requests-2.32.3 requests-toolbelt-1.0.0 sniffio-1.3.1 tenacity-9.1.2 typing-extensions-4.13.2 typing-inspection-0.4.1 urllib3-2.4.0 zstandard-0.23.0
[1A[2KSuccessfully installed ibm-cos-sdk-2.14.1 ibm-cos-sdk-core-2.14.1 ibm-cos-sdk-s3transfer-2.14.1 ibm-watsonx-ai-1.3.23 jmespath-1.0.1 langchain_ibm-0.3.11 lomond-0.3.3 numpy-2.2.6 pandas-2.2.3 pytz-2025.2 requests-2.32.2 tabulate-0.9.0 tzdata-2025.2
[1A[2KSuccessfully installed aiohappyeyeballs-2.6.1 aiohttp-3.12.6 aiosignal-1.3.2 attrs-25.3.0 dataclasses-json-0.6.7 frozenlist-1.6.0 httpx-sse-0.4.0 langchain-community-0.3.24 marshmallow-3.26.1 multidict-6.4.4 myp

#### Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the **admin's** `username` and the platform `url`.

In [2]:
username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"

Use the **admin's** `api_key` to authenticate watsonx.ai Runtime services:

In [None]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)

Alternatively, you can use the **admin's** `password`:

In [3]:
import getpass
from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )

### Working with projects

First, you need to create a project to work in. If you do not have a project yet, follow the steps below.

- Open the IBM Cloud Pak main page
- Click all projects
- Create an empty project
- Copy the `project_id` from the URL and paste it below

**Action**: Assign project ID below

In [4]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

#### Create `APIClient` instance

In [5]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id)

<a id="data"></a>
## Data (test) loading

Download the test dataset. This dataset is used to calculate metric scores for the selected model, the defined prompts, and the parameters.

In [6]:
import wget

questions_test_filename = "questions_test.csv"
questions_train_filename = "questions_train.csv"
questions_test_url = "https://raw.github.com/IBM/watsonx-ai-samples/master/cpd5.2/data/RAG/questions_test.csv"
questions_train_url = "https://raw.github.com/IBM/watsonx-ai-samples/master/cpd5.2/data/RAG/questions_train.csv"

if not os.path.isfile(questions_test_filename):
    wget.download(questions_test_url, out=questions_test_filename)

if not os.path.isfile(questions_train_filename):
    wget.download(questions_train_url, out=questions_train_filename)

In [7]:
import pandas as pd

filename_test = "./questions_test.csv"
filename_train = "./questions_train.csv"

test_data = pd.read_csv(filename_test)
train_data = pd.read_csv(filename_train)

Inspect data sample

In [8]:
train_data.head()

| Unnamed: 0 | qid | question | answers |
|---|---|---|---|
| 0 | 1961 | where does diffusion occur in the excretory sy... | diffusion |
| 1 | 7528 | when did the us join world war one | April 6 , 1917 |
| 2 | 8685 | who played wilma in the movie the flintstones | Elizabeth Perkins |
| 3 | 6716 | when was the office of the vice president created | 1787 |
| 4 | 2916 | where does carbon fixation occur in c4 plants | in the mesophyll cells |


### Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use <a href="https://www.sbert.net/" target="_blank" rel="noopener no referrer">Sentence Transformers</a> <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank" rel="noopener no referrer">all-MiniLM-L6-v2</a> to embed both the knowledge base passages and user queries. `all-MiniLM-L6-v2` is a performant open-source model that is small enough to run locally.

A vector database is optimized for dense vector indexing and retrieval. This notebook uses <a href="https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example" target="_blank" rel="noopener no referrer">Elasticsearch</a>, a distributed, RESTful search and analytics engine capable of both vector and lexical search. It is built on top of the Apache Lucene library, which offers good speed and performance with the `all-MiniLM-L6-v2` embedding model.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 256 tokens for `all-MiniLM-L6-v2`).
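If you bring your own documents, passages longer than the context window need to be split before embedding. The sketch below is one rough way to do that; it counts whitespace-separated words as a proxy for tokens, which is an assumption of this example (a sub-word tokenizer usually produces more tokens than words), and `split_passage` is a hypothetical helper, not part of any library used here.

```python
MAX_TOKENS = 256  # context window stated above for all-MiniLM-L6-v2


def split_passage(text, max_words=MAX_TOKENS):
    """Split text into consecutive chunks of at most max_words whitespace-separated words.

    Words are only a rough proxy for model tokens (an assumption of this sketch);
    a sub-word tokenizer usually yields more tokens than words.
    """
    words = text.split()
    return [
        " ".join(words[start : start + max_words])
        for start in range(0, len(words), max_words)
    ]


long_passage = "word " * 600
chunks = split_passage(long_passage)
print([len(chunk.split()) for chunk in chunks])
```

For a stricter limit, you could lower `max_words` or count tokens with the embedding model's own tokenizer.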

### Load knowledge base documents

Load the set of documents that is used to build the knowledge base.

In [9]:
knowledge_base_dir = "./knowledge_base"

In [10]:
my_path = f"{os.getcwd()}/knowledge_base"
if not os.path.isdir(my_path):
    os.makedirs(my_path)

In [11]:
documents_filename = "knowledge_base/psgs.tsv"
documents_url = (
    "https://raw.github.com/IBM/watsonx-ai-samples/master/cpd5.2/data/RAG/psgs.tsv"
)

if not os.path.isfile(documents_filename):
    wget.download(documents_url, out=documents_filename)

In [12]:
documents = pd.read_csv(f"{knowledge_base_dir}/psgs.tsv", sep="\t", header=0)
documents["indextext"] = documents["title"].astype(str) + "\n" + documents["text"]
documents = documents[:1000]

### Create an embedding function

Note that you can feed a custom embedding function to be used by Elasticsearch. The performance of Elasticsearch may differ depending on the embedding model used.

In [13]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

emb_func = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

<a id="models"></a>
## Foundation Models on watsonx

#### Specify model

This notebook uses the text model `google/flan-ul2`, which must be available in your Cloud Pak for Data environment for this notebook to run successfully.  
You can list available text models by running the cell below.

In [14]:
if len(client.foundation_models.TextModels):
    print(*client.foundation_models.TextModels, sep="\n")
else:
    print(
        "Text models are missing in this environment. Install text models to proceed."
    )

google/flan-ul2
ibm/granite-guardian-3-2b


In [15]:
model_id = client.foundation_models.TextModels.FLAN_UL2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [16]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 50,
}
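With `DecodingMethods.GREEDY`, the model picks the most probable next token at every step, so the output is deterministic for a given prompt, and `MAX_NEW_TOKENS` caps how many tokens are generated. The toy sketch below illustrates the idea only; `toy_model` is a made-up lookup table standing in for a real model, the `<eos>` stop token is an assumption, and `MIN_NEW_TOKENS` is not modeled.

```python
def greedy_decode(next_token_probs, start, max_new_tokens, stop_token="<eos>"):
    """Repeatedly append the most probable next token until a stop token or the cap."""
    tokens = [start]
    for _ in range(max_new_tokens):
        candidates = next_token_probs.get(tokens[-1], {})
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == stop_token:
            break
        tokens.append(best)
    return tokens[1:]  # only the newly generated tokens


# Hypothetical next-token distribution keyed by the previous token.
toy_model = {
    "capital": {"since": 0.6, "city": 0.4},
    "since": {"1862": 0.7, "1859": 0.3},
    "1862": {"<eos>": 0.9, "and": 0.1},
}

print(greedy_decode(toy_model, "capital", max_new_tokens=50))
print(greedy_decode(toy_model, "capital", max_new_tokens=1))
```

The second call shows the effect of the token cap: generation stops after one token even though the model has not produced a stop token.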

### Initialize the `WatsonxLLM` class.

In [17]:
from langchain_ibm import WatsonxLLM

if credentials.get("apikey"):
    watsonx_granite = WatsonxLLM(
        model_id=model_id.value,
        url=credentials.get("url"),
        username=credentials.get("username"),
        apikey=credentials.get("apikey"),
        instance_id=credentials.get("instance_id"),
        project_id=project_id,
        params=parameters,
    )
else:
    watsonx_granite = WatsonxLLM(
        model_id=model_id.value,
        url=credentials.get("url"),
        username=credentials.get("username"),
        password=credentials.get("password"),
        instance_id=credentials.get("instance_id"),
        project_id=project_id,
        params=parameters,
    )

<a id="elastic_conn"></a>
## Set up connectivity information to Elasticsearch

**This notebook focuses on self-managed cluster using <a href="https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-getting-started" target="_blank" rel="noopener no referrer">IBM Cloud® Databases for Elasticsearch.</a>**

The following cell retrieves the Elasticsearch user name, password, host, and port from the environment if available and prompts you otherwise.

In [18]:
try:
    esuser = os.environ["ESUSER"]
except KeyError:
    esuser = input("Please enter your Elasticsearch user name (hit enter): ")

try:
    espassword = os.environ["ESPASSWORD"]
except KeyError:
    espassword = getpass.getpass(
        "Please enter your Elasticsearch password (hit enter): "
    )

try:
    eshost = os.environ["ESHOST"]
except KeyError:
    eshost = input("Please enter your Elasticsearch hostname (hit enter): ")

try:
    esport = os.environ["ESPORT"]
except KeyError:
    esport = input("Please enter your Elasticsearch port number (hit enter): ")
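The four `try`/`except` blocks above follow the same pattern, so they could be folded into one helper. The sketch below is one possible shape; `env_or_prompt` and the `DEMO_SETTING` variable are hypothetical names introduced for illustration, and the value `9200` is only a placeholder.

```python
import getpass
import os


def env_or_prompt(name, prompt, secret=False):
    """Return the environment variable `name`, or ask the user if it is unset."""
    value = os.environ.get(name)
    if value is not None:
        return value
    ask = getpass.getpass if secret else input
    return ask(prompt)


# Demonstration with a variable that is already set, so no prompt appears.
os.environ.setdefault("DEMO_SETTING", "9200")
print(env_or_prompt("DEMO_SETTING", "Please enter a value (hit enter): "))
```

With such a helper, the password could be read with `env_or_prompt("ESPASSWORD", "...", secret=True)` so that it is not echoed while typing.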

By default, Elasticsearch starts with security features such as authentication and TLS enabled. To connect to the Elasticsearch cluster, you need to configure the Python Elasticsearch client to use HTTPS with the generated CA certificate so that requests succeed. Details can be found <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new" target="_blank" rel="noopener no referrer">here</a>. In this notebook, a certificate fingerprint is used to verify the server's certificate.

**Verifying HTTPS with certificate fingerprints (Python 3.10 or later)** If you don’t have access to the generated CA file from Elasticsearch you can use the following script to output the root CA fingerprint of the Elasticsearch instance with openssl s_client <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later" target="_blank" rel="noopener no referrer"> (docs)</a>:


The following cell retrieves the fingerprint information using a shell command and stores it in the variable `es_ssl_fingerprint`, which is later passed to the client as `ssl_assert_fingerprint`.

In [19]:
es_ssl_fingerprint = !openssl s_client -connect $eshost:$esport  -showcerts </dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin
es_ssl_fingerprint = es_ssl_fingerprint[0].split("=")[1]
es_ssl_fingerprint

'91:A6:EC:18:AC:0C:35:EB:F9:B9:B3:57:F8:9E:2D:4F:EE:3C:A4:F0:73:60:17:75:27:0C:38:94:11:51:91:33'
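If `openssl` is not available in your environment, a similar fingerprint can probably be computed with the Python standard library. The sketch below is an assumption-laden alternative, not the method used by this notebook: `sha256_fingerprint` and `server_fingerprint` are hypothetical helper names, and `server_fingerprint` hashes the certificate that the server presents, which needs network access and is therefore defined but not called here.

```python
import hashlib
import ssl


def sha256_fingerprint(der_bytes):
    """Format the SHA-256 digest of DER-encoded certificate bytes as colon-separated hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i : i + 2] for i in range(0, len(digest), 2))


def server_fingerprint(host, port):
    """Fetch the certificate a server presents and return its SHA-256 fingerprint.

    Needs network access; not called in this sketch.
    """
    pem = ssl.get_server_certificate((host, int(port)))
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


# Offline demonstration on arbitrary bytes (not a real certificate).
print(sha256_fingerprint(b"abc"))
```

Against a live cluster, `server_fingerprint(eshost, esport)` would be the call to try; compare its result with the `openssl` output before relying on it.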

<a id="elasticsearchstore"></a>
## Set up ElasticsearchStore connector from LangChain


We first create a regular Elasticsearch Python client connection. Then we pass it into LangChain's `ElasticsearchStore` wrapper together with the embedding function (`emb_func`) created earlier from the `all-MiniLM-L6-v2` model.

Consult the LangChain documentation for more information about the <a href="https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html" target="_blank" rel="noopener no referrer">ElasticsearchStore</a> connector.

In [20]:
from langchain_elasticsearch import ElasticsearchStore
from langchain_elasticsearch.client import create_elasticsearch_client

es_connection = create_elasticsearch_client(
    f"https://{esuser}:{espassword}@{eshost}:{esport}",
    username=esuser,
    password=espassword,
    params={"request_timeout": None, "ssl_assert_fingerprint": es_ssl_fingerprint},
)

knowledge_base = ElasticsearchStore(
    es_connection=es_connection,
    index_name="test_index",
    embedding=emb_func,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),
    distance_strategy="DOT_PRODUCT",
)

<a id="elasticsearchstore_index"></a>
### Embed and index documents with Elasticsearch

**Note: Could take several minutes if you don't have pre-built indices**

In [21]:
if es_connection.indices.exists(index="test_index"):
    es_connection.indices.delete(index="test_index")

knowledge_base.add_texts(
    texts=documents.indextext.tolist(),
    metadatas=[
        {"title": title, "id": doc_id}
        for (title, doc_id) in zip(documents.title, documents.id)
    ],  # filter on these!
    index_name="test_index",
    ids=[str(i) for i in documents.id],  # unique for each doc
)
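The metadata stored with each passage can later narrow a search to matching documents. The sketch below only builds the filter; it assumes that Elasticsearch's default dynamic mapping gives `metadata.title` a `keyword` sub-field and that `similarity_search` accepts a `filter` list of Elasticsearch query clauses, as shown in the LangChain Elasticsearch integration documentation. `title_filter` is a hypothetical helper name.

```python
def title_filter(title):
    """Build an Elasticsearch term filter on the exact passage title."""
    return [{"term": {"metadata.title.keyword": title}}]


print(title_filter("Founding Fathers of the United States"))

# Possible use with the store defined above (requires a live cluster):
# knowledge_base.similarity_search(
#     "names of founding fathers of the united states?",
#     k=4,
#     filter=title_filter("Founding Fathers of the United States"),
# )
```

A `term` filter on the `keyword` sub-field matches the title exactly, including case, so the value must be identical to the one that was indexed.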

Let's look at what the LangChain wrapper has created in Elasticsearch. First we display the newly created index (an index is the Elasticsearch counterpart of a table). Note the field `vector` of type `dense_vector` with `dot_product` similarity.

In [22]:
dict(es_connection.indices.get(index="test_index"))

{'test_index': {'aliases': {},
  'mappings': {'properties': {'metadata': {'properties': {'id': {'type': 'long'},
      'title': {'type': 'text',
       'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}},
    'text': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'vector': {'type': 'dense_vector',
     'dims': 384,
     'index': True,
     'similarity': 'dot_product',
     'index_options': {'type': 'int8_hnsw',
      'm': 16,
      'ef_construction': 100}}}},
  'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}},
    'allocation': {'max_retries': '15'},
    'number_of_shards': '1',
    'provided_name': 'test_index',
    'creation_date': '1748846908180',
    'unassigned': {'node_left': {'delayed_timeout': '60m'}},
    'number_of_replicas': '1',
    'uuid': 'IsBoPDF0Ry2zoLF8BdPfFA',
    'version': {'created': '8512000'}}}}}
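The mapping above uses `dot_product` similarity. For vectors of unit length, the dot product equals the cosine similarity, which is why this setting is normally paired with normalized embeddings. The small check below illustrates the relationship with made-up two-dimensional vectors; whether a given embedding model emits normalized vectors depends on its configuration, so treat that as something to verify for your model.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(a, b), cosine(a, b))  # the raw dot product depends on vector length
print(dot(normalize(a), normalize(b)))  # matches the cosine after normalization
```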

Verify the number of documents loaded into the Elasticsearch index.

In [23]:
doc_count = es_connection.count(index="test_index")["count"]
doc_count

1000

Let's retrieve a random document as a sample. Note the embedding in the `vector` field, which was generated with the `all-MiniLM-L6-v2` embedding function (`emb_func`).

In [24]:
import random

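# Pick an ID that was actually indexed; row positions need not match document IDs.
sample_id = str(random.choice(documents["id"].tolist()))
dict(es_connection.get(index="test_index", id=sample_id))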

Display the total size and indexing time of the new index in Elasticsearch.

In [25]:
import humanize

index_stats = (
    es_connection.indices.stats(index="test_index").get("_all").get("primaries")
)

print(
    "Index size:   ",
    humanize.naturalsize(index_stats.get("store").get("size_in_bytes")),
)
print(
    "Indexing time:",
    humanize.precisedelta(
        index_stats.get("indexing").get("index_time_in_millis") / 1000,
        minimum_unit="minutes",
    ),
)

Index size:    9.8 MB
Indexing time: 0 minutes


<a id="predict"></a>
## Generate a retrieval-augmented response to a question

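`RetrievalQA` is a LangChain chain for question answering over retrieved documents.

**Hint:** LangChain chains expect a LangChain-compatible LLM. This notebook creates the `WatsonxLLM` wrapper directly; if you start from an `ibm_watsonx_ai` model object instead, call its `to_langchain()` method.

It returns a `WatsonxLLM` wrapper compatible with the LangChain CustomLLM specification.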

### Select questions

These are the prompts we will use to test the RAG flow, together with their expected answers.

In [26]:
questions_and_answers = {
    "names of founding fathers of the united states?": "Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton",
    "who played in the super bowl in 2013?": "Baltimore Ravens::San Francisco 49ers",
    "when did bucharest become the capital of romania?": "1862",
}

### Retrieve relevant context

Build a `RetrievalQA` chain that fetches passages similar to the question and passes them to the model.

In [27]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite,
    chain_type="stuff",
    retriever=knowledge_base.as_retriever(),
    return_source_documents=True,
)
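The `"stuff"` chain type places all retrieved passages into a single prompt. The sketch below shows the general idea; the template wording and the `stuff_prompt` helper are assumptions made for illustration and do not reproduce LangChain's actual prompt.

```python
def stuff_prompt(question, passages, separator="\n\n"):
    """Concatenate every retrieved passage into one context block ahead of the question."""
    context = separator.join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = stuff_prompt(
    "when did bucharest become the capital of romania?",
    ["Passage one about Bucharest.", "Passage two about Romania."],
)
print(prompt)
```

Because every passage ends up in the same prompt, the number and length of retrieved passages are bounded by the model's context window.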

In [28]:
results = [qa.invoke({"query": question}) for question in questions_and_answers.keys()]

Print the answer, the expected answer(s), and the retrieved source passages for each question.

In [29]:
for result in results:
    print("=========")
    print("Question = ", result["query"])
    print("Answer = ", result["result"])
    print(
        "Expected Answer(s) (may not appear with exact wording in the dataset) = ",
        questions_and_answers[result["query"]],
    )
    print("\n")
    print("Source documents:")
    print(*(x.page_content for x in result["source_documents"]), sep="\n")
    print("\n")

Question =  names of founding fathers of the united states?
Answer =  John Adams , Benjamin Franklin , Alexander Hamilton , John Jay , Thomas Jefferson , James Madison , and George Washington
Expected Answer(s) (may not appear with exact wording in the dataset) =  Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton


Source documents:
Founding Fathers of the United States
^ Burstein , Andrew . `` Politics and Personalities : Garry Wills takes a new look at a forgotten founder , slavery and the shaping of America '' , Chicago Tribune ( November 09 , 2003 ) : `` Forgotten founders such as Pickering and Morris made as many waves as those whose faces stare out from our currency . '' ^ Jump up to : Rafael , Ray . The Complete Idiot 's Guide to the Founding Fathers : And the Birth of Our Nation ( Penguin , 2011 ) . Jump up ^ `` Founding Fathers : Virginia '' . FindLaw Constitutional Law Center . 2008 . Retrieved 2008 - 11 - 14 . 
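To compare a generated answer with the expected answers, which are separated by `::` in this dataset, one simple option is to measure how many expected answers appear in the generated text. The sketch below is a hypothetical scoring helper, not the metric used by any watsonx.ai tooling; it relies on substring matching after light normalization, which can miss paraphrases and can over-match short answers.

```python
import re


def normalize_text(text):
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def answer_recall(generated, expected, delimiter="::"):
    """Fraction of expected answers that appear verbatim in the generated answer."""
    generated_norm = normalize_text(generated)
    gold = [normalize_text(item) for item in expected.split(delimiter)]
    hits = sum(1 for item in gold if item in generated_norm)
    return hits / len(gold)


generated = (
    "John Adams , Benjamin Franklin , Alexander Hamilton , John Jay , "
    "Thomas Jefferson , James Madison , and George Washington"
)
expected = (
    "Thomas Jefferson::James Madison::John Jay::George Washington::"
    "John Adams::Benjamin Franklin::Alexander Hamilton"
)
print(answer_recall(generated, expected))
```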

---

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.