# Conversational Interface - Chatbot with Claude LLM

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

In this notebook, we will build a chatbot using the Foundation Models (FMs) in Amazon Bedrock. For our use-case we use Claude as our FM for building the chatbot.

## Overview

Conversational interfaces such as chatbots and virtual assistants can enhance the user experience for your customers. Chatbots use natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. They can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users, and can be accessed through channels such as websites, social media platforms, and messaging apps.


## Chatbot using Amazon Bedrock

![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)


## Use Cases

1. **Chatbot (Basic)** - Zero-shot chatbot with an FM
2. **Chatbot using prompt template (LangChain)** - Chatbot with some context provided in the prompt template
3. **Chatbot with persona** - Chatbot with defined roles, e.g. a career coach interacting with a human
4. **Context-aware chatbot** - Passing in context from an external file by generating embeddings

## LangChain framework for building a chatbot with Amazon Bedrock
In conversational interfaces such as chatbots, it is important to remember previous interactions, both in the short term and in the long term.

LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains.
LangChain allows us to easily define and interact with different types of abstractions, which makes it easy to build powerful chatbots.
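
To make the idea of conversational memory concrete, here is a minimal pure-Python sketch of a buffer that stores turns and renders them as a history string for the next prompt. The class `SimpleBufferMemory` and its methods are hypothetical names chosen for illustration and are not LangChain's API; LangChain's `ConversationBufferMemory`, used later in this notebook, plays a similar role.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in for a conversation buffer (not LangChain's API)."""

    def __init__(self):
        self.turns = []  # list of (speaker, text) tuples in chronological order

    def save_turn(self, human_text, ai_text):
        # Store one full exchange: the user's message and the model's reply.
        self.turns.append(("Human", human_text))
        self.turns.append(("AI", ai_text))

    def as_history(self):
        # Render all stored turns as one string to place in the next prompt.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


memory = SimpleBufferMemory()
memory.save_turn("What detects bias in SageMaker?", "SageMaker Clarify.")
history = memory.as_history()
print(history)
```

Because the whole history is replayed on every call, a buffer like this grows with the conversation, which is why longer chats may need a summarizing or windowed memory instead.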

## Building Chatbot with Context - Key Elements

The first process in building a context-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process that runs your documents through an embedding model and stores the resulting embeddings in a vector store. In this example we use the Titan Embeddings model for this.

![Embeddings](./images/embeddings_lang.png)

The second process is user request orchestration: handling the interaction, invoking the model, and returning the results.

![Chatbot](./images/chatbot_lang.png)

## Architecture [Context Aware Chatbot]
![4](./images/context-aware-chatbot.png)


## Setup

Before running the rest of this notebook, you'll need to run the cells below to ensure the necessary libraries are installed and to connect to Bedrock.

For more details on how the setup works and ⚠️ **whether you might need to make any changes**, refer to the [Bedrock boto3 setup notebook](../00_Intro/bedrock_boto3_setup.ipynb).

In [None]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

In this notebook, we'll also need some extra dependencies:

- [FAISS](https://github.com/facebookresearch/faiss), to store vector embeddings
- [IPyWidgets](https://ipywidgets.readthedocs.io/en/stable/), for interactive UI widgets in the notebook
- [PyPDF](https://pypi.org/project/pypdf/), for handling PDF files

In [None]:
%pip install --quiet "faiss-cpu>=1.7,<2" "ipywidgets>=7,<8" langchain==0.0.309 "pypdf>=3.8,<4"

In [81]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

#### Use the Claude model to answer the question without any context

Without any context, the model returns a very generic answer.

In [85]:
# If you'd like to try your own prompt, edit this parameter!
prompt_data = """Human: How can I check for imbalances in my model?

Assistant:
"""

body = json.dumps({"prompt": prompt_data, "max_tokens_to_sample": 500})
modelId = "anthropic.claude-instant-v1"  # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completion"))

 Here are a few things you can do to check for imbalances in your model:

- Evaluate model performance on different demographic groups. See if there are significant differences in accuracy, error rates, etc. between groups like gender, age, race, etc. Large gaps could indicate biases.

- Conduct bias audits where you analyze the model's predictions on examples from different groups and see if any patterns emerge that correlate with protected attributes. 

- Look at the training data distribution and ensure all important demographic groups are adequately represented. Imbalances in the data can lead to unequal performance.

- Try debiasing techniques like data augmentation, reweighting, regularization, adversarial training, etc. and see if performance differences reduce across groups. 

- Analyze the model parameters or embeddings to check for stereotypical patterns that correlate with protected attributes. 

- Test the model on examples designed to elicit unfair behaviors, like swapping

### The first step is to choose an embedding model. We will use Titan Embeddings

An embedding is a list of numbers. For this model, the list has 1536 values.

In [86]:
prompt_data = "How can I check for imbalances in my model?"

body = json.dumps({"inputText": prompt_data})
modelId = "amazon.titan-embed-g1-text-02"  # (Change this to try different embedding models)
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3]+['...']+embedding[-3:]}")


The embedding vector has 1536 values
[-0.14746094, 0.77734375, 0.26953125, '...', 0.14355469, -0.09472656, -0.34570312]


## Building the application: plugging conversation history into the data we send to the model

We use [ConversationChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/bedrock.html?highlight=ConversationChain#using-in-a-conversation-chain) from LangChain to start the conversation. We also use [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html) to store the messages. We can also get the history as a list of messages, which is very useful with a chat model.

Chatbots need to remember previous interactions, and conversational memory allows us to do that. There are several ways to implement conversational memory. In the context of LangChain, they are all built on top of the ConversationChain.

**Note:** The model outputs are non-deterministic.

With the default prompt used by the LangChain ConversationChain, saying "Hi there!" can make the model spit out several conversation turns at once, because that prompt is not well designed for Claude. An [effective Claude prompt](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design) should contain `\n\nHuman:` at the beginning and also contain `\n\nAssistant:` in the prompt sometime after the `\n\nHuman:` (optionally followed by other text that you want to [put in Claude's mouth](https://docs.anthropic.com/claude/docs/human-and-assistant-formatting#use-human-and-assistant-to-put-words-in-claudes-mouth)). Let's fix this.

To learn more about how to write prompts for Claude, check [Anthropic documentation](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design).
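
As a small illustration of the turn format described above, the sketch below wraps plain text in the `\n\nHuman:` and `\n\nAssistant:` markers and checks that a prompt follows that shape. The helper names `to_claude_prompt` and `looks_like_claude_prompt` are hypothetical and exist only for this example; they are not part of LangChain or any Anthropic library.

```python
def to_claude_prompt(user_text, prefill=""):
    """Wrap plain text in the Human/Assistant turn markers described above."""
    # The prompt starts with "\n\nHuman:" and ends with "\n\nAssistant:",
    # optionally followed by text that the model should continue from.
    return f"\n\nHuman: {user_text}\n\nAssistant:{prefill}"


def looks_like_claude_prompt(prompt):
    """Check that a Human marker opens the prompt and an Assistant marker follows it."""
    human_at = prompt.find("\n\nHuman:")
    assistant_at = prompt.find("\n\nAssistant:", human_at + 1)
    return human_at == 0 and assistant_at > human_at


prompt = to_claude_prompt("Hi there!")
print(repr(prompt))
print(looks_like_claude_prompt(prompt))
print(looks_like_claude_prompt("Hi there!"))
```

A check like this can be run on a rendered prompt template before it is sent to the model, to catch templates that were written for a different model family.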

In [87]:
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory

cl_llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=boto3_bedrock,
    model_kwargs={"max_tokens_to_sample": 100},
)

# turn verbose to true to see the full logs and documents
conversation= ConversationChain(
    llm=cl_llm, verbose=False, memory=ConversationBufferMemory() #memory_chain
)

# langchain prompts do not always work with all the models. This prompt is tuned for Claude
claude_prompt = PromptTemplate.from_template("""

Human: The following is a friendly conversation between a human and an AI.
The AI is talkative and provides lots of specific details from its context. If the AI does not know
the answer to a question, it truthfully says it does not know.

Current conversation:
<conversation_history>
{history}
</conversation_history>

Here is the human's next reply:
<human_reply>
{input}
</human_reply>

Assistant:
""")

conversation.prompt = claude_prompt

print_ww(conversation.predict(input="Hi there! Will this work for class imbalances"))

 Unfortunately I don't have enough context to know specifically what you're referring to by "this"
or what class imbalances you are concerned about. In general though, certain machine learning
techniques can help address class imbalance issues in datasets. Some options include:

- Over-sampling the minority class or under-sampling the majority class to even out the class
distribution. This involves replicating minority class examples or removing majority class examples.

- Using loss functions that weight the importance of correctly classifying


#### The model did not know what we were referring to. We provide that context as chat history

We ask again.

In [88]:
print_ww(conversation.predict(input="""
    Human: What are the tools in Amazon SageMaker for Bias detection?
    Assistant:
    Amazon SageMaker Clarify helps improve model transparency by detecting statistical bias across the entire ML workflow. 
    SageMaker Clarify checks for Class imbalances during data preparation, after training, and ongoing over time, and also includes tools to help explain ML models and their predictions. 
    Use this information for answering the question.
                              
    Human: Will this work for class imbalances?
"""))

 Here are some of the key tools in Amazon SageMaker for detecting bias and addressing class
imbalances:

- Bias and Explainability Reports - These provide an analysis of your training data and models to
check for potentially biased predictions or uneven class distributions. The reports highlight areas
like skew in labels, missing feature values, and other anomalies.

- Pre-training Bias Detection - Analyzes your training data before model building to detect
imbalances in labels, missing values, etc. This


#### Build on the questions

Let's ask a follow-up question without naming the topic explicitly, to see whether the model can use the previous conversation.

In [89]:
print_ww(conversation.predict(input="Cool. Will that work with Data Preparation as well?"))

 Yes, Amazon SageMaker's bias detection and mitigation tools can be used in conjunction with data
preparation. Some ways it can help with class imbalances during data prep:

- The Bias and Explainability Reports can analyze your training dataset before any data prep or
model building to highlight label skews, missing values, etc. This gives you insights into what
imbalances exist.

- The data prep tools like Data Wrangler allow you to visualize your data, profile it for


#### The model now gives a much more relevant answer. It is still hallucinating, but we now have a way to hold a conversation with history, and we have our embedding model.

In [65]:
print_ww(conversation.predict(input="That's all, thank you!"))

 You're welcome! I'm glad I could provide some useful information about how Amazon SageMaker Clarify
can help detect bias across different stages of the machine learning workflow, including data
preparation. Addressing biases and imbalanced classes through tools like Clarify can lead to more
fair and accurate models. Please let me know if you have any other questions!


## The next step is to automate this and use a vector database

We will plug in an in-memory vector database.

#### Titan embeddings Model

Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.
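
One common way to compare two embeddings is cosine similarity. The sketch below uses made-up three-dimensional vectors to show the idea; the vector values and variable names are invented for illustration, and real Titan embeddings have 1536 values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for embedded phrases.
bias_check = [0.9, 0.1, 0.0]
imbalance_check = [0.8, 0.2, 0.1]
pricing = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(bias_check, imbalance_check)
sim_unrelated = cosine_similarity(bias_check, pricing)
print(sim_related > sim_unrelated)
```

Vectors that point in similar directions score close to 1, which is how semantically related phrases end up near each other in the vector space.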

Embeddings are used, for example, for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).


In [90]:
from langchain.embeddings import BedrockEmbeddings
import ipywidgets as ipw
from IPython.display import display, clear_output

br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02", client=boto3_bedrock)

#### FAISS as VectorStore

To use embeddings for search, we need a store that can efficiently perform vector similarity searches. In this notebook we use FAISS, which is an in-memory store. To store vectors permanently, one can use pgVector, Pinecone, or Chroma.

The LangChain VectorStore APIs are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html)

To know more about the FAISS vector store please refer to this [document](https://arxiv.org/pdf/1702.08734.pdf).

In [91]:
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.vectorstores import FAISS

# s3_path = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"
# !aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv

# each row of the CSV consists of a question column and an answer column
loader = CSVLoader("./rag_data/Amazon_SageMaker_FAQs.csv")
documents_aws = loader.load()
print(f"Number of documents={len(documents_aws)}")

docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws)

print(f"Number of documents after split and chunking={len(docs)}")

vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding = br_embeddings
)

print(f"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::")


Number of documents=153
Number of documents after split and chunking=154
vectorstore_faiss_aws: number of elements in the index=154::
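
The splitter call above takes a `chunk_size` and a `chunk_overlap`. To illustrate what overlap means, here is a simplified sliding-window sketch in plain Python. It is not how `CharacterTextSplitter` works internally (that class splits on a separator and then merges pieces); the function `chunk_text` is a hypothetical helper for this example only.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

Overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.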


#### Semantic search

We can use a wrapper class provided by LangChain to query the vector database store and return the relevant documents. Behind the scenes, this only runs a RetrievalQA chain.

Let's see how the semantic search works:
1. First we calculate the embeddings vector for the query, and
2. then we use this vector to do a similarity search on the store
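
The two steps above can be sketched as a brute-force search over a tiny in-memory store. This is only an illustration: FAISS performs the same kind of nearest-neighbor lookup far more efficiently, and the use of squared Euclidean distance here is an assumption about the store's default metric. The store contents, vectors, and function names are made up for the example.

```python
def l2_distance_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_search(store, query_vector, k):
    """Return the k texts whose vectors are closest to query_vector."""
    ranked = sorted(store, key=lambda item: l2_distance_squared(item[1], query_vector))
    return [text for text, _ in ranked[:k]]


# Toy store of (text, vector) pairs with made-up 2-dimensional vectors.
toy_store = [
    ("Clarify detects bias", [0.9, 0.1]),
    ("Pricing is pay-as-you-go", [0.1, 0.9]),
    ("Clarify checks for imbalances", [0.8, 0.2]),
]

query_vector = [1.0, 0.0]  # pretend this is the embedded user question
top_two = similarity_search(toy_store, query_vector, k=2)
print(top_two)
```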

In [92]:
v = br_embeddings.embed_query("How can I check for imbalances in my model?")
print(v[0:10])
results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)
for r in results:
    print_ww(r.page_content)
    print('----')

[-0.14746094, 0.77734375, 0.26953125, -0.55859375, 0.047851562, -0.43554688, -0.057617188, -0.00030326843, -0.5703125, -0.33789062]
﻿What is Amazon SageMaker?: How can I check for imbalances in my model?
Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine
learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.:
Amazon SageMaker Clarify helps improve model transparency by detecting statistical bias across the
entire ML workflow. SageMaker Clarify checks for imbalances during data preparation, after training,
and ongoing over time, and also includes tools to help explain ML models and their predictions.
Findings can be shared through explainability reports.
----
﻿What is Amazon SageMaker?: What kind of bias does Amazon SageMaker Clarify detect?
Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine
learning (ML) models for any use case with fully managed infrastru

#### Bring everything together. We will use LangChain to orchestrate our application
In any chatbot we need a QA chain with options customized for the use case. A chatbot must also always keep the history of the conversation so that the model can take it into account when answering. In this example we use the [ConversationalRetrievalChain](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) from LangChain, together with a ConversationBufferMemory to keep the history of the conversation.

Source: https://python.langchain.com/docs/modules/chains/popular/chat_vector_db

Set `verbose` to `True` to see everything that is going on behind the scenes.

In [93]:
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

print_ww(CONDENSE_QUESTION_PROMPT.template)
class ChatUX:
    """ A chat UX using IPWidgets
    """
    def __init__(self, qa, retrievalChain = False):
        self.qa = qa
        self.name = None
        self.b=None
        self.retrievalChain = retrievalChain
        self.out = ipw.Output()


    def start_chat(self):
        print("Starting chat bot")
        display(self.out)
        self.chat(None)


    def chat(self, _):
        if self.name is None:
            prompt = ""
        else: 
            prompt = self.name.value
        if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:
            print("Thank you , that was a nice chat!!")
            return
        elif len(prompt) > 0:
            with self.out:
                thinking = ipw.Label(value="Thinking...")
                display(thinking)
                try:
                    if self.retrievalChain:
                        result = self.qa.run({'question': prompt })
                    else:
                        result = self.qa.run({'input': prompt }) #, 'history':chat_history})
                except Exception:  # keep the chat alive on errors, but let KeyboardInterrupt through
                    result = "No answer"
                thinking.value=""
                print_ww(f"AI:{result}")
                self.name.disabled = True
                self.b.disabled = True
                self.name = None

        if self.name is None:
            with self.out:
                self.name = ipw.Text(description="You:", placeholder='q to quit')
                self.b = ipw.Button(description="Send")
                self.b.on_click(self.chat)
                display(ipw.Box(children=(self.name, self.b)))

Given the following conversation and a follow up question, rephrase the follow up question to be a
standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:


Your mileage may vary, but after two or three questions you will start to get some weird answers, in some cases even in other languages.
This happens for the same reason outlined earlier in this notebook: the default LangChain prompts are not optimal for Claude.
In the following cell we set two new prompts: one for rephrasing the question, and one for getting the answer to that rephrased question.
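
To show how the two prompts fit together, the sketch below walks through the flow with stub functions in place of the LLM and the vector store. All names here (`answer_with_retrieval`, `fake_condense`, `fake_retrieve`, `fake_answer`) are hypothetical, and skipping the rephrasing step when the history is empty is an assumption of this sketch rather than a statement about LangChain's internals.

```python
def answer_with_retrieval(question, chat_history, condense, retrieve, answer):
    """Run the two prompt steps: rephrase the question, then answer from context."""
    # Step 1: only follow-up questions need rephrasing; a first question stands alone.
    standalone = condense(chat_history, question) if chat_history else question
    # Step 2: fetch context for the standalone question and answer from it.
    context = "\n".join(retrieve(standalone))
    reply = answer(context, standalone)
    chat_history.append((question, reply))
    return standalone, reply


# Stub callables standing in for the LLM and the vector store (illustration only).
def fake_condense(history, question):
    return f"{question} (about: {history[-1][0]})"


def fake_retrieve(query):
    return [f"doc for '{query}'"]


def fake_answer(context, question):
    return f"answer using {context}"


history = []
first = answer_with_retrieval("What is Clarify?", history, fake_condense, fake_retrieve, fake_answer)
second = answer_with_retrieval("Does it detect bias?", history, fake_condense, fake_retrieve, fake_answer)
print(second[0])
```

The rephrased question, not the raw follow-up, is what gets embedded and searched, so a poorly condensed question leads directly to irrelevant context.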

In [104]:
# turn verbose to true to see the full logs and documents
from langchain.chains import ConversationalRetrievalChain
from langchain.schema import BaseMessage


# We are also providing a different chat history retriever which outputs the history as a Claude chat (ie including the \n\n)
_ROLE_MAP = {"human": "\n\nHuman: ", "ai": "\n\nAssistant: "}
def _get_chat_history(chat_history):
    buffer = ""
    for dialogue_turn in chat_history:
        if isinstance(dialogue_turn, BaseMessage):
            role_prefix = _ROLE_MAP.get(dialogue_turn.type, f"{dialogue_turn.type}: ")
            buffer += f"\n{role_prefix}{dialogue_turn.content}"
        elif isinstance(dialogue_turn, tuple):
            human = "\n\nHuman: " + dialogue_turn[0]
            ai = "\n\nAssistant: " + dialogue_turn[1]
            buffer += "\n" + "\n".join([human, ai])
        else:
            raise ValueError(
                f"Unsupported chat history format: {type(dialogue_turn)}."
                f" Full chat history: {chat_history} "
            )
    return buffer

# the condense prompt for Claude
condense_prompt_claude = PromptTemplate.from_template("""{chat_history}

Answer only with the new question.


Human: How would you ask the question considering the previous conversation: {question}


Assistant: Question:""")

# recreate the Claude LLM with more tokens to sample - this provides longer responses but introduces some latency
cl_llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={"max_tokens_to_sample": 500})
memory_chain = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm=cl_llm, 
    retriever=vectorstore_faiss_aws.as_retriever(), 
    #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 8}),
    memory=memory_chain,
    get_chat_history=_get_chat_history,
    #verbose=True,
    condense_question_prompt=condense_prompt_claude, 
    chain_type='stuff', # 'refine',
    #max_tokens_limit=300
)

# the LLMChain prompt to get the answer. The ConversationalRetrievalChain does not expose this parameter in the constructor
qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template("""
{context}

Human: Use at maximum 3 sentences to answer the question inside the <q></q> XML tags. 

<q>{question}</q>

Do not use any XML tags in the answer. If the answer is not in the context say "Sorry, I don't know as the answer was not found in the context"

Assistant:""")

Let's start our chat. You can run the chat UX, or run the cells with questions below. Feel free to ask the following questions:

1. How can I check for imbalances in my model?
2. What kind of bias gets detected? (or: What kind of bias does this detect?)
3. How does this improve model explainability?

In [105]:
chat = ChatUX(qa, retrievalChain=True)
chat.start_chat()

Starting chat bot


Output()

In [96]:
print_ww(qa.run({'question': 'How can I check for imbalances in my model?' }))

 <q> You can use Amazon SageMaker Clarify to detect statistical bias across the ML workflow by
checking for imbalances during data preparation, after training with feature importance graphs, and
during deployment with bias monitoring.</q>


In [99]:
print_ww(qa.run({'question': 'What kind of bias gets detected ? ' }))

 <q> Amazon SageMaker Clarify can detect different types of bias or imbalance in a dataset or model,
including representation imbalance in the training data and differences in label distribution, error
rates, precision, and recall across groups.</q>


In [101]:
print_ww(qa.run({'question': 'How does this improve model explainability?' }))

 <q> Based on our previous discussion of using Amazon SageMaker Clarify to detect statistical bias
and imbalances, could you explain how SageMaker Clarify helps improve model explainability and
interpretability?</q>

Amazon SageMaker Clarify provides feature importance rankings to explain model predictions, as well
as bias analysis to detect statistical imbalances that could lead to unfair outcomes.
