![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)

## Notebook content

This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation (RAG) in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

#### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Load and index the knowledge base passages (once); in this notebook, the index lives in Elasticsearch
- Retrieve relevant passage(s) from the knowledge base
- Generate a response by feeding the user prompt, augmented with the retrieved passages, into a large language model

LangChain simplifies these steps by providing convenience wrappers for the first step and for the combination of the second and third steps.
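
To make the three steps concrete before any service is involved, here is a minimal, library-free sketch. It is not how this notebook implements RAG: word overlap stands in for embedding similarity, the passages are made up, and `build_index`, `retrieve`, and `build_prompt` are hypothetical helper names.

```python
from collections import Counter

# Toy knowledge base; the passages are made up for this sketch.
PASSAGES = [
    "Bucharest became the capital of Romania in 1862.",
    "The Super Bowl is the championship game of the NFL.",
    "Elasticsearch is a distributed search engine.",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop surrounding punctuation."""
    return [word.strip(".,?!").lower() for word in text.split()]

def build_index(passages):
    """Step 1: index every passage once (here: a bag-of-words Counter)."""
    return [(passage, Counter(tokenize(passage))) for passage in passages]

def retrieve(index, question, k=1):
    """Step 2: rank passages by word overlap with the question."""
    query = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: sum((item[1] & query).values()), reverse=True)
    return [passage for passage, _ in ranked[:k]]

def build_prompt(question, passages):
    """Step 3 (input side): augment the question with the retrieved passages."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

question = "when did bucharest become the capital of romania?"
index = build_index(PASSAGES)
top_passages = retrieve(index, question)
prompt = build_prompt(question, top_passages)
print(prompt)  # this string would be sent to the large language model
```

In the rest of the notebook, Elasticsearch takes the place of `build_index` and `retrieve`, and a watsonx.ai foundation model consumes the prompt.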

## Contents

This notebook contains the following parts:

- [Set up the environment](#setup)
- [Build up knowledge base](#knowledge_base)
- [Set up Foundation Models on WatsonX](#models)
- [Set up connectivity information to Elasticsearch](#elastic_conn)
- **[Set up ElasticsearchStore connector from Langchain](#elasticsearchstore)**
    - [Embed and index documents with Elasticsearch](#elasticsearchstore_index)
    - [Generate a retrieval-augmented response to a question](#predict)



<a id="setup"></a>
##  Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics" target="_blank" rel="noopener no referrer">here</a>).


### Install and import dependencies

In [1]:
!pip install langchain --upgrade | tail -n 1
!pip install elasticsearch | tail -n 1
!pip install sentence_transformers | tail -n 1
!pip install pandas | tail -n 1
!pip install rouge_score | tail -n 1
!pip install nltk | tail -n 1
!pip install wget | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install "ibm-watson-machine-learning>=1.0.327" | tail -n 1
!pip install humanize | tail -n 1



In [25]:
import os, getpass
import humanize
import random
import pandas as pd
from typing import Optional, Any, Iterable, List

### watsonx API connection
This cell defines the credentials required to work with watsonx API for Foundation
Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [3]:
try:
    apikey = os.environ["IBM_CLOUD_API_KEY"]
except KeyError:
    apikey = getpass.getpass("Please enter your WML api key (hit enter): ")

In [4]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": apikey
}

The API requires a project ID that provides the context for the call. We will obtain the ID from the project in which this notebook runs. Otherwise, please provide the project ID.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [5]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="knowledge_base"></a>
## Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use [SentenceTransformers](https://www.google.com/search?client=safari&rls=en&q=sentencetransformers&ie=UTF-8&oe=UTF-8) [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) to embed both the knowledge base passages and user queries. `all-MiniLM-L6-v2` is a performant open-source model that is small enough to run locally.
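
As an illustration of how semantic similarity is computed on dense vectors, the sketch below ranks two passages against a query by cosine similarity. The 3-dimensional vectors and passage names are made up; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand for this example.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = {
    "passage about capitals": [0.8, 0.2, 0.1],
    "passage about football": [0.1, 0.9, 0.2],
}

# Rank passages from most to least similar to the query.
ranked = sorted(
    passage_vecs,
    key=lambda name: cosine_similarity(query_vec, passage_vecs[name]),
    reverse=True,
)
print(ranked)
```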

A vector database is optimized for dense vector indexing and retrieval. This notebook uses [Elasticsearch](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example), a distributed, RESTful search and analytics engine that can perform both vector and lexical search. It is built on top of the Apache Lucene library, which offers good speed and performance with the `all-MiniLM-L6-v2` embedding model.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 256 tokens for `all-MiniLM-L6-v2`).
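
The dataset used here is already split, but for your own documents you would need a splitting step. The following is a rough sketch only: it counts whitespace-separated words, which merely approximates the tokenizer's token count, and the `max_words=200` budget is an assumed value that leaves some headroom below 256 tokens.

```python
def split_into_passages(text, max_words=200):
    """Split text into consecutive chunks of at most max_words words.

    Word counts only approximate model tokens, so max_words is kept
    below the 256-token window as a safety margin (an assumption).
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]

long_text = " ".join(f"word{i}" for i in range(450))
chunks = split_into_passages(long_text)
print([len(chunk.split()) for chunk in chunks])
```

A production splitter would normally count tokens with the embedding model's own tokenizer and cut at sentence boundaries.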

### Load knowledge base documents

Load the set of documents that is used to build the knowledge base.

In [6]:
knowledge_base_dir = "./knowledge_base"

In [7]:
my_path = f"{os.getcwd()}/knowledge_base"
if not os.path.isdir(my_path):
   os.makedirs(my_path)

In [11]:
import wget

documents_filename = 'knowledge_base/psgs.tsv'
documents_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/psgs.tsv'


if not os.path.isfile(documents_filename): 
    wget.download(documents_url, out=documents_filename)

In [8]:
documents = pd.read_csv(f"{knowledge_base_dir}/psgs.tsv", sep='\t', header=0)
documents['indextext'] = documents['title'].astype(str) + "\n" + documents['text']
#documents = documents[:1000]

<a id="models"></a>
## Set up Foundation Models on WatsonX

### Configure an embedding function with a WatsonX Encoder model (sentence transformer)

This will be used to compute the embeddings for the knowledge base documents stored in Elasticsearch.

Note that Elasticsearch also supports the option to deploy and run an embedding model in the database cluster. This notebook does not make use of this.

In [9]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.embeddings.base import Embeddings

emb_func = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

### Configure a WatsonX Encoder-Decoder model for Question Answer generation
You need to specify the `model_id` that will be used for prompting:

In [10]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.FLAN_UL2

We need to provide a set of model parameters that will influence the result:

In [11]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 50
}
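
To illustrate what these parameters control, here is a toy sketch of greedy decoding: at every step the most probable next token is chosen, generation cannot stop before the minimum number of new tokens, and it is cut off at the maximum. The probability table is invented for the example and has nothing to do with FLAN-UL2's actual vocabulary.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "<s>": {"George": 0.6, "John": 0.4},
    "George": {"Washington": 0.9, "</s>": 0.1},
    "John": {"Adams": 0.7, "Jay": 0.3},
    "Washington": {"</s>": 1.0},
    "Adams": {"</s>": 1.0},
    "Jay": {"</s>": 1.0},
}

def greedy_decode(table, min_new_tokens=1, max_new_tokens=50):
    """Pick the most probable next token until end-of-sequence or the limit."""
    tokens, prev = [], "<s>"
    while len(tokens) < max_new_tokens:
        candidates = dict(table[prev])
        if len(tokens) < min_new_tokens:
            candidates.pop("</s>", None)  # stopping is not allowed yet
        if not candidates:
            break
        prev = max(candidates, key=candidates.get)
        if prev == "</s>":
            break
        tokens.append(prev)
    return tokens

print(greedy_decode(NEXT_TOKEN_PROBS))
```

Because greedy decoding involves no sampling, the same prompt yields the same output on every run, which is convenient for factual question answering.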

Initialize the `Model` class.

In [12]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)

<a id="elastic_conn"></a>
## Set up connectivity information to Elasticsearch

**This notebook focuses on IBM-managed cluster using <a href="https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-getting-started" target="_blank" rel="noopener no referrer">IBM Cloud® Databases for Elasticsearch.</a>**

The following cell retrieves the Elasticsearch user name, password, host, and port from the environment if available and prompts you otherwise.

In [13]:
try:
    esuser = os.environ["ESUSER"]
except KeyError:
    esuser = input("Please enter your Elasticsearch user name (hit enter): ")
try:
    espassword = os.environ["ESPASSWORD"]
except KeyError:
    espassword = getpass.getpass("Please enter your Elasticsearch password (hit enter): ")
try:
    eshost = os.environ["ESHOST"]
except KeyError:
    eshost = input("Please enter your Elasticsearch hostname (hit enter): ")
try:
    esport = os.environ["ESPORT"]
except KeyError:
    esport = input("Please enter your Elasticsearch port number (hit enter): ")
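
The same "environment variable first, prompt as fallback" pattern repeats for every setting, so it could be factored into a small helper. This is only a sketch: `get_setting` and the `DEMO_ESPORT` variable are hypothetical names, and unlike the cell above the helper also treats an empty environment variable as missing.

```python
import getpass
import os

def get_setting(env_name, prompt, secret=False):
    """Return the environment variable if set and non-empty, else ask the user."""
    value = os.environ.get(env_name)
    if value:
        return value
    ask = getpass.getpass if secret else input
    return ask(prompt)

# Demo with a made-up variable so that the example runs without prompting.
os.environ.setdefault("DEMO_ESPORT", "9200")
demo_port = get_setting("DEMO_ESPORT", "Please enter a port number: ")
print(demo_port)
```

Passing `secret=True` routes the prompt through `getpass.getpass`, so a password is not echoed to the notebook output.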

By default Elasticsearch will start with security features like authentication and TLS enabled. To connect to the Elasticsearch cluster you’ll need to configure the Python Elasticsearch client to use HTTPS with the generated CA certificate in order to make requests successfully. Details can be found <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new" target="_blank" rel="noopener no referrer">here</a>. In this notebook certificate fingerprints will be used for authentication. 

**Verifying HTTPS with certificate fingerprints (Python 3.10 or later)** If you don’t have access to the generated CA file from Elasticsearch you can use the following script to output the root CA fingerprint of the Elasticsearch instance with openssl s_client <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later" target="_blank" rel="noopener no referrer"> (docs)</a>:

The following cell retrieves the fingerprint information using a shell command and stores it in the variable `es_ssl_fingerprint`.

In [14]:
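# IPython expands {eshost} and {esport} from the Python variables set in the previous cell
es_ssl_fingerprint = !openssl s_client -connect {eshost}:{esport} -showcerts </dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin
# Keep everything after "=": str.lstrip() strips a character set and could remove leading hex digits
es_ssl_fingerprint = es_ssl_fingerprint[0].split("=", 1)[1]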
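
When parsing the openssl output, keep in mind that `str.lstrip()` takes a set of characters rather than a prefix, so it can also remove leading characters of the fingerprint itself. The sketch below uses a made-up, shortened fingerprint line and a hypothetical `parse_fingerprint` helper to show a prefix-safe alternative.

```python
def parse_fingerprint(line):
    """Return the text after the first '=' in an openssl fingerprint line."""
    _, separator, value = line.partition("=")
    if not separator:
        raise ValueError(f"unexpected openssl output: {line!r}")
    return value.strip()

# Made-up, shortened example line; a real SHA-256 fingerprint has 32 byte pairs.
sample_line = "SHA256 Fingerprint=A5:2D:D9:35"

safe = parse_fingerprint(sample_line)
stripped = sample_line.lstrip("SHA256 Fingerprint=")  # also removes the leading "A5"
print(safe, stripped)
```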

<a id="elasticsearchstore"></a>
## Set up ElasticsearchStore connector from Langchain

We first create a regular Elasticsearch Python client connection. Then we pass it into LangChain's ElasticsearchStore wrapper together with the WatsonX model based embedding function.

Consult the LangChain documentation for more information about the [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) connector.

In [15]:
from langchain.vectorstores.elasticsearch import ElasticsearchStore
from elasticsearch import Elasticsearch

es_connection = Elasticsearch([f"https://{esuser}:{espassword}@{eshost}:{esport}"],
                              basic_auth=(esuser, espassword),
                              request_timeout=None,
                              ssl_assert_fingerprint=es_ssl_fingerprint)

knowledge_base = ElasticsearchStore(es_connection=es_connection,
                                    index_name="test_index",
                                    embedding=emb_func,
                                    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),
                                    distance_strategy="DOT_PRODUCT")

<a id="elasticsearchstore_index"></a>
### Embed documents, and load and index documents in Elasticsearch

The `add_texts()` function of the ElasticsearchStore wrapper in LangChain is a compound function that prepares the document data, computes the embeddings using the provided WatsonX embedding model, and then loads everything into Elasticsearch.

**Note: This could take 10 - 15 minutes**

In [17]:
if es_connection.indices.exists(index="test_index"):
    es_connection.indices.delete(index="test_index")
_ = knowledge_base.add_texts(texts=documents.indextext.tolist(),
                             metadatas=[{'title': title, 'id': doc_id}
                                for (title, doc_id) in
                                zip(documents.title, documents.id)],  # filter on these!
                             index_name="test_index",
                             ids=[str(i) for i in documents.id]  # unique for each doc
                            )
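
The three arguments passed to `add_texts()` are parallel lists: entry *i* of `texts`, `metadatas`, and `ids` all describe the same document. The sketch below rebuilds that shape from two made-up rows without pandas or Elasticsearch, which makes the alignment easy to check before starting a long indexing run.

```python
# Two made-up rows that stand in for the `documents` DataFrame.
rows = [
    {"id": 1, "title": "Title A", "text": "Passage A"},
    {"id": 2, "title": "Title B", "text": "Passage B"},
]

# Text that gets embedded: title and passage joined by a newline.
texts = [f"{row['title']}\n{row['text']}" for row in rows]
# Metadata stored next to each vector; usable as filter fields later.
metadatas = [{"title": row["title"], "id": row["id"]} for row in rows]
# Document IDs as strings; they must be unique per document.
ids = [str(row["id"]) for row in rows]

# Sanity checks worth running before indexing.
aligned = len(texts) == len(metadatas) == len(ids)
unique_ids = len(set(ids)) == len(ids)
print(aligned, unique_ids)
```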

Let's take a look at what the LangChain wrapper has created in Elasticsearch. First we display the newly created index (the Elasticsearch counterpart of a "table" is always called an "index"). Note the field `vector` of type `dense_vector` with `dot_product` similarity.

In [19]:
dict(es_connection.indices.get(index="test_index"))

{'test_index': {'aliases': {},
  'mappings': {'properties': {'metadata': {'properties': {'id': {'type': 'long'},
      'title': {'type': 'text',
       'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}},
    'text': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'vector': {'type': 'dense_vector',
     'dims': 384,
     'index': True,
     'similarity': 'dot_product'}}},
  'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}},
    'allocation': {'max_retries': '15'},
    'number_of_shards': '1',
    'provided_name': 'test_index',
    'creation_date': '1697549983549',
    'unassigned': {'node_left': {'delayed_timeout': '60m'}},
    'number_of_replicas': '1',
    'uuid': 'Suubnrd6QGy6R-MSNbRzwg',
    'version': {'created': '8070099'}}}}}
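
A note on the `dot_product` similarity in the mapping: Elasticsearch expects unit-length vectors for this setting, and for unit-length vectors the dot product equals the cosine similarity. The sketch below checks that identity on two made-up 2-dimensional vectors; it assumes, without verifying it here, that the embedding model returns normalized vectors.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Made-up vectors; real embeddings in this index have 384 dimensions.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

similarity = dot(a, b)  # equals the cosine of the angle between the originals
print(round(similarity, 2))
```

If an embedding model does not normalize its output, the vectors would need to be normalized before indexing, or a cosine-based similarity used instead.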

Verify the number of documents loaded into the Elasticsearch index.

In [22]:
doc_count = es_connection.count(index='test_index')["count"]
doc_count

29042

Let's retrieve a random document as a sample. Note the embedding in the `vector` field, which was generated with the WatsonX embedding model.

In [24]:
# Pick an ID that is known to exist instead of guessing one from the row count
dict(es_connection.get(index="test_index", id=str(random.choice(documents.id.tolist()))))

{'_index': 'test_index',
 '_id': '8293',
 '_version': 1,
 '_seq_no': 8292,
 '_primary_term': 1,
 'found': True,
 '_source': {'text': "Helter Skelter (Manson scenario)\nFamily members who had been released from jail had made their way back to Spahn Ranch . There , on November 25 , 1969 , the LAPD confiscated a door on which someone had written `` Helter Scelter ( sic ) is coming down fast . '' A photograph shows the confiscated door was also inscribed with `` 1 , 2 , 3 , 4 , 5 , 6 , 7 -- ALL GOOD CHILDREN ( Go to Heaven ? ) '' ( sic ) . This children 's rhyme is heard in `` You Never Give Me Your Money '' , a song that appears on Abbey Road . In October 1970 , the prosecution offered testimony about the door during Manson 's trial for the Tate - LaBianca murders ; but only the `` Helter Skelter '' inscription seems to have been noted . In late September or early October 1969 , before the arrests , Tex Watson had left the desert camp and gone on to separate himself from the Family . Late

Display the total size and indexing time of the new index in Elasticsearch.

In [26]:
index_stats = es_connection.indices.stats(index="test_index").get('_all').get('primaries')
print("Index size:    " + humanize.naturalsize(index_stats.get('store').get('size_in_bytes')))
print("Indexing time: " + humanize.precisedelta(index_stats.get('indexing').get('index_time_in_millis')/1000, minimum_unit='minutes'))

Index size:    270.1 MB
Indexing time: 0.35 minutes


<a id="predict"></a>
## Generate a retrieval-augmented response to a question

### Select questions
These are the prompts we will use to test the RAG flow.

In [29]:
questions = [
            'names of founding fathers of the united states?',
            'who played in the super bowl in 2013?', 
            'when did bucharest become the capital of romania?'
            ]

### Set up a RAG chain

Configure a `RetrievalQA` chain (for question answering) that uses the FLAN_UL2 WatsonX foundation model and LangChain's ElasticsearchStore as the document retriever for the knowledge base.

**Hint:** To use the `langchain.chains` interface with watsonx.ai models, you must call the `to_langchain()` method. It returns a `WatsonxLLM` wrapper compatible with the LangChain CustomLLM specification.

In [30]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=model.to_langchain(), chain_type="stuff",
                                 retriever=knowledge_base.as_retriever(), return_source_documents=True)

Answer the test prompts with the RAG chain.

In [31]:
results = []

for question in questions:
    result = qa({"query": question})
    results.append(result)

Print the generated results.

In [32]:
for idx, result in enumerate(results):
    print("=========")
    print("Question = ", result['query'])
    print("Answer = ", result['result'])
    print("\n")
    print("Source documents:")
    print(*(x.page_content for x in result['source_documents']), sep='\n')
    print("\n")
    

Question =  names of founding fathers of the united states?
Answer =  John Adams , Benjamin Franklin , Alexander Hamilton , John Jay , Thomas Jefferson , James Madison , and George Washington


Source documents:
Founding Fathers of the United States
^ Burstein , Andrew . `` Politics and Personalities : Garry Wills takes a new look at a forgotten founder , slavery and the shaping of America '' , Chicago Tribune ( November 09 , 2003 ) : `` Forgotten founders such as Pickering and Morris made as many waves as those whose faces stare out from our currency . '' ^ Jump up to : Rafael , Ray . The Complete Idiot 's Guide to the Founding Fathers : And the Birth of Our Nation ( Penguin , 2011 ) . Jump up ^ `` Founding Fathers : Virginia '' . FindLaw Constitutional Law Center . 2008 . Retrieved 2008 - 11 - 14 . Jump up ^ Schwartz , Laurens R. Jews and the American Revolution : Haym Solomon and Others , Jefferson , North Carolina : McFarland & Co. , 1987 . Jump up ^ Kendall , Joshua . The Forgotte

---

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.