## Retrieval Augmented Generation (RAG) using Amazon Bedrock and FAISS

In this notebook, we demonstrate a RAG solution that uses [FAISS](https://github.com/facebookresearch/faiss) as a vector store and [Amazon Bedrock](https://aws.amazon.com/bedrock/) for generation.

### Prerequisites
Install the required dependencies.

In [None]:
%pip install --quiet faiss-cpu==1.7.4 langchain==0.0.222 PyYAML pypdf

### Imports
Import the required libraries.

In [None]:
import requests
import logging 
import boto3
import yaml
import json
from langchain.embeddings import BedrockEmbeddings
import ipywidgets as ipw
from IPython.display import display, clear_output
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

### Define variables
- Set the Bedrock embedding model
- Set the region

In [None]:
BEDROCK_EMBEDDING_MODEL = "amazon.titan-embed-text-v1" #"amazon.titan-embed-g1-text-02"
REGION_NAME = boto3.session.Session().region_name

### Download Data
Download a dataset that contains FAQs on SageMaker. Each row consists of a question and an answer.

In [None]:
s3_path = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"
!aws s3 cp $s3_path ../data/Amazon_SageMaker_FAQs.csv

### Create a Vector Store using FAISS
This step performs the following actions:
1. Splits the document into chunks
2. Creates a numerical vector representation (embedding) of each chunk using the Amazon Titan Embeddings model on Amazon Bedrock
3. Creates an index using the chunks and the corresponding embeddings

Steps 2 and 3 are handled by the `FAISS` vector store class from LangChain.
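To make step 1 more concrete, here is a minimal sketch of overlapping character chunking. It is a simplified illustration, not LangChain's `CharacterTextSplitter` algorithm (which splits on a separator and then merges pieces). The function name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: consecutive chunks share
    `chunk_overlap` characters so context is not lost at boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample input for illustration
sample_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(sample_chunks)
```

Each chunk would then be embedded and stored in the index alongside its vector.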

In [None]:
boto3_bedrock = boto3.client("bedrock-runtime", region_name=REGION_NAME)

br_embeddings = BedrockEmbeddings(model_id=BEDROCK_EMBEDDING_MODEL, client=boto3_bedrock)

loader = CSVLoader("../data/Amazon_SageMaker_FAQs.csv")  # 219 docs with 400 chars; each row has a question column and an answer column
documents_aws = loader.load()
print(f"Number of documents={len(documents_aws)}")

docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws)

print(f"Number of documents after split and chunking={len(docs)}")

vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding = br_embeddings
)

print(f"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}")

### Search the Vector Store

Given a `query`, this step retrieves relevant documents from the FAISS index. It performs the following steps:

1. The `query` is converted to an embedding using the Bedrock Titan model
2. The query embedding is compared with the embeddings in the index
3. The top `k` documents are retrieved based on how similar their embeddings are to the query embedding

All of the above steps are performed by the LangChain module. The `k` relevant documents are then concatenated to form a `context`.
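The retrieval steps above can be sketched in plain Python. This is a toy illustration that assumes Euclidean (L2) distance and a brute-force scan; the metric and index type actually used depend on how the FAISS index is configured. The names `l2_distance`, `top_k`, `toy_index`, and the three-dimensional vectors are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, index, k):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0])  # smaller distance = more similar
    return scored[:k]


# Hypothetical toy index: (text, embedding) pairs
toy_index = [
    ("doc about bias detection", [0.9, 0.1, 0.0]),
    ("doc about pricing", [0.0, 0.2, 0.9]),
    ("doc about model monitoring", [0.7, 0.3, 0.1]),
]
toy_query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded query

hits = top_k(toy_query_vec, toy_index, k=2)
# Concatenate the retrieved texts to form the context
toy_retrieved = "\n".join(text for _, text in hits)
print(toy_retrieved)
```

A real vector store does the same comparison over embeddings with hundreds or thousands of dimensions, using optimized index structures.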

In [None]:
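query = 'How can I check for imbalances in my model?'
v = br_embeddings.embed_query(query)  # embed the query with the same Titan model
print(v[0:10])  # first 10 dimensions of the query embedding
results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)  # top 4 chunks
# Separate the retrieved chunks with a blank line so they do not run together
context = "\n\n".join(r.page_content for r in results)

print(context)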

Now create a prompt template that supplies the model with the above context. The prompt instructs the model to answer using the context provided and to say so when it does not know the answer.

In [None]:
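# Plain (non-f) string: the {context} and {question} placeholders are filled
# later by template.format(). Claude's text-completion format expects the
# prompt to start with "\n\nHuman:" and end with "\n\nAssistant:".
template = """

H: You are a helpful, polite, fact-based agent.
If you don't know the answer, just say that you don't know.
Please answer the following question using the context provided.

CONTEXT:
{context}
=========
QUESTION: {question}

A:"""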


In [None]:
prompt = template.format(context=context, question=query)
print(prompt)
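To show the substitution step in isolation, here is a small sketch of how `str.format` fills named placeholders. The template and values (`toy_template`, `toy_context`) are hypothetical. The values passed to `format` are inserted verbatim, so braces inside retrieved text are left alone, while a placeholder with no matching keyword raises `KeyError`.

```python
# Hypothetical toy template with two named placeholders
toy_template = "CONTEXT:\n{context}\nQUESTION: {question}"

# Retrieved text may itself contain braces (for example JSON snippets)
toy_context = 'Config example: {"instance": "ml.m5.large"}'

toy_prompt = toy_template.format(
    context=toy_context,
    question="Which instance is used?",
)
print(toy_prompt)

# A placeholder without a matching keyword argument raises KeyError
try:
    toy_template.format(context=toy_context)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]
print(f"Missing placeholder: {missing_key}")
```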

### Invoke the model to generate a response from the LLM

In this final step, the prompt (with the context) is passed to the LLM (Anthropic Claude v2 on Amazon Bedrock).

In [None]:
modelId = 'anthropic.claude-v2'  # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

# Request body in the Claude text-completion format
body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 4096,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 0.5,
    "stop_sequences": ["\n\nHuman:"],
})

response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
# The response body is a stream; read it and parse the JSON payload
response_body = json.loads(response.get('body').read())


In [None]:
print(response_body.get('completion'))
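The request and response handling above could be wrapped in two small helpers, sketched below. The function names `build_claude_body` and `extract_completion` are hypothetical; the field names mirror the ones used in this notebook. The sketch runs offline against a fake payload and makes no call to Bedrock.

```python
import json


def build_claude_body(prompt, max_tokens=4096, temperature=0.5, top_k=250, top_p=0.5):
    """Serialize a request body using the same fields as the cell above."""
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "stop_sequences": ["\n\nHuman:"],
    })


def extract_completion(raw_body):
    """Parse the raw response bytes and return the 'completion' text."""
    payload = json.loads(raw_body)
    return payload.get("completion", "")


# Offline demonstration with a fake response payload (no Bedrock call)
demo_body = build_claude_body("\n\nHuman: Hello\n\nAssistant:")
fake_response = b'{"completion": " Hi there!", "stop_reason": "stop_sequence"}'
demo_completion = extract_completion(fake_response)
print(demo_completion)
```

With a live client, `extract_completion(response.get('body').read())` would take the place of the last two lines of the invocation cell.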