##### Prerequisites

In [1]:
%pip install faiss-cpu==1.7.4 --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install langchain==0.0.222 --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-experimental 0.0.27 requires langchain>=0.0.308, but you have langchain 0.0.222 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [3]:
%%capture 

!pip install PyYAML
!pip install pypdf

#### Imports

In [4]:
import requests
import logging 
import boto3
import yaml
import json
from langchain.embeddings import BedrockEmbeddings
import ipywidgets as ipw
from IPython.display import display, clear_output
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

##### Setup logging

In [5]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

##### Log versions of dependencies 

In [6]:
logger.info(f'Using requests=={requests.__version__}')
logger.info(f'Using pyyaml=={yaml.__version__}')

Using requests==2.31.0
Using pyyaml==6.0.1
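
The two `logger.info` calls above only cover `requests` and `pyyaml`. Because the pip output earlier reports a conflict involving `langchain`, it may help to log the other installed packages as well. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` and the package list are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the install cells above (plus boto3).
for name in ("faiss-cpu", "langchain", "PyYAML", "pypdf", "boto3"):
    print(f"{name}=={installed_version(name)}")
```

A package that is not installed is reported as `None` instead of raising, so the cell can run before every dependency is in place.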


#### Setup essentials

In [7]:
TEXT_EMBEDDING_MODEL_ENDPOINT_NAME = 'jumpstart-dft-hf-textembedding-gpt-j-6b-fp16'
TEXT_GENERATION_MODEL_ENDPOINT_NAME = 'jumpstart-dft-hf-llm-falcon-7b-instruct-bf16'

BEDROCK_EMBEDDING_MODEL = "amazon.titan-embed-g1-text-02"
BEDROCK_GENERATION_MODEL = ''
REGION_NAME = boto3.session.Session().region_name

#### Data Download

In [9]:
s3_path = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"
!aws s3 cp $s3_path ../data/Amazon_SageMaker_FAQs.csv

download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to ../data/Amazon_SageMaker_FAQs.csv


#### Create Vector Store in FAISS 

In [11]:
boto3_bedrock = boto3.client("bedrock-runtime")

br_embeddings = BedrockEmbeddings(model_id=BEDROCK_EMBEDDING_MODEL, client=boto3_bedrock)

loader = CSVLoader("../data/Amazon_SageMaker_FAQs.csv")  # each CSV row (a question column and an answer column) is loaded as one document
documents_aws = loader.load()
print(f"Number of documents={len(documents_aws)}")

docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws)

print(f"Number of documents after split and chunking={len(docs)}")

vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding = br_embeddings
)

print(f"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::")

Number of documents=153
Number of documents after split and chunking=154
vectorstore_faiss_aws: number of elements in the index=154::
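
The counts above (153 documents in, 154 chunks out) suggest that almost every row already fits within `chunk_size=2000`, so the splitter rarely has to cut anything. To make the roles of `chunk_size`, `chunk_overlap` and `separator` more concrete, here is a simplified pure-Python sketch of split-then-merge chunking. It is an illustration under stated assumptions, not LangChain's actual implementation, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size, chunk_overlap, separator=","):
    """Split on `separator`, then greedily merge pieces into chunks of at most
    `chunk_size` characters, carrying up to `chunk_overlap` characters over."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    def joined_len(parts):
        return len(separator.join(parts))

    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until the carried-over tail fits the overlap
            # budget and still leaves room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_with_overlap("aaaa,bbbb,cccc,dddd", chunk_size=10, chunk_overlap=5))
# → ['aaaa,bbbb', 'bbbb,cccc', 'cccc,dddd']
```

In this sketch a single piece longer than `chunk_size` is emitted unchanged, and text shorter than `chunk_size` comes back as one chunk.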


#### Search Vector Store

In [14]:
context = ''
query = 'How can I check for imbalances in my model?'
v = br_embeddings.embed_query(query)
print(v[0:10])
results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)
for r in results:
    context += r.page_content
    
print(context)

[-0.14746094, 0.77734375, 0.26953125, -0.55859375, 0.047851562, -0.43554688, -0.057617188, -0.00030326843, -0.5703125, -0.33789062]
﻿What is Amazon SageMaker?: How can I check for imbalances in my model?
Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.: Amazon SageMaker Clarify helps improve model transparency by detecting statistical bias across the entire ML workflow. SageMaker Clarify checks for imbalances during data preparation, after training, and ongoing over time, and also includes tools to help explain ML models and their predictions. Findings can be shared through explainability reports.﻿What is Amazon SageMaker?: What kind of bias does Amazon SageMaker Clarify detect?
Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure,
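
Conceptually, `similarity_search_by_vector` returns the `k` stored chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea as a brute-force search in plain Python, assuming squared Euclidean (L2) distance as the metric; that metric is an assumption about the index, not something this notebook states. The function name `brute_force_search` and the toy vectors and texts are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query_vector, index, k=4):
    """Return the texts of the k entries nearest to query_vector.

    `index` is a list of (vector, text) pairs.
    """
    nearest = heapq.nsmallest(
        k, index, key=lambda entry: squared_l2(query_vector, entry[0])
    )
    return [text for _, text in nearest]


# Toy two-dimensional "embeddings" for illustration only.
toy_index = [
    ([1.0, 0.0], "doc about bias detection"),
    ([0.9, 0.1], "doc about Clarify"),
    ([0.0, 1.0], "doc about pricing"),
]
print(brute_force_search([1.0, 0.05], toy_index, k=2))
# → ['doc about bias detection', 'doc about Clarify']
```

A real index does the same comparison over the full embedding dimension, with optimized native code in place of the Python loop.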

Now create a prompt template that passes the context retrieved by the vector search to the model. We specifically instruct the model to answer using only the context provided.

In [15]:
# Plain (non-f) string: the placeholders are filled in by template.format() in the next cell.
template = """Human: 
        You are a helpful, polite, fact-based agent.
        If you don't know the answer, just say that you don't know.
        Please answer the following question using the context provided. 

        CONTEXT: 
        {context}
        =========
        QUESTION: {question} 
        Assistant: """


In [16]:
prompt = template.format(context=context, question=query)
print(prompt)

Human: 
        You are a helpful, polite, fact-based agent.
        If you don't know the answer, just say that you don't know.
        Please answer the following question using the context provided. 

        CONTEXT: 
        ﻿What is Amazon SageMaker?: How can I check for imbalances in my model?
Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.: Amazon SageMaker Clarify helps improve model transparency by detecting statistical bias across the entire ML workflow. SageMaker Clarify checks for imbalances during data preparation, after training, and ongoing over time, and also includes tools to help explain ML models and their predictions. Findings can be shared through explainability reports.﻿What is Amazon SageMaker?: What kind of bias does Amazon SageMaker Clarify detect?
Amazon SageMaker is a fully managed service to prepare data and build, 
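
In the printed prompt, the retrieved passages run together because the search cell concatenates them without a separator. Anthropic's text-completion format for Claude also expects the turns to be marked with `\n\nHuman:` and `\n\nAssistant:`. The sketch below is one possible way to address both points; the helper name `build_prompt` is hypothetical and the wording simply mirrors the template above.

```python
def build_prompt(question, passages):
    """Assemble a Claude-style prompt from a question and retrieved passages."""
    # Separate passages with a blank line so their boundaries stay visible.
    context = "\n\n".join(p.strip() for p in passages)
    return (
        "\n\nHuman: You are a helpful, polite, fact-based agent.\n"
        "If you don't know the answer, just say that you don't know.\n"
        "Please answer the following question using the context provided.\n\n"
        f"CONTEXT:\n{context}\n=========\n"
        f"QUESTION: {question}"
        "\n\nAssistant:"
    )


example = build_prompt(
    "How can I check for imbalances in my model?",
    ["First retrieved passage.", "Second retrieved passage."],
)
print(example)
```

With the search results from the earlier cell, the call would look like `build_prompt(query, [r.page_content for r in results])`.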

Invoke the model through the Bedrock runtime to generate a response from the LLM.

In [17]:
bedrock = boto3.client("bedrock-runtime")

modelId = 'anthropic.claude-v2' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 4096,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 0.5,
    "stop_sequences": ["\n\nHuman:"]
})

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
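
The request and response handling above can be factored into two small helpers, which makes the payload shape easier to test without calling the service. This is a sketch only: `build_claude_body` and `extract_completion` are hypothetical names, the defaults copy the values used in the cell above, and the fake response stands in for the streaming body returned by `invoke_model`.

```python
import io
import json


def build_claude_body(prompt, max_tokens=4096, temperature=0.5, top_k=250, top_p=0.5):
    """Serialize the request payload used in the cell above."""
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "stop_sequences": ["\n\nHuman:"],
    })


def extract_completion(response):
    """Read the response body and return the completion text without padding."""
    payload = json.loads(response["body"].read())
    return payload.get("completion", "").strip()


# Stand-in for a real response: any object with a file-like "body" works here.
fake_response = {"body": io.BytesIO(json.dumps({"completion": " Hello"}).encode("utf-8"))}
print(extract_completion(fake_response))
# → Hello
```

Against the real client, the equivalent call would be `extract_completion(bedrock.invoke_model(body=build_claude_body(prompt), modelId=modelId, accept=accept, contentType=contentType))`.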


In [18]:
print(response_body.get('completion'))

 Here are a few ways to check for imbalances in your machine learning model using Amazon SageMaker Clarify:

- Use bias metrics and explainability reports in SageMaker Clarify to detect statistical bias across the ML workflow - during data preparation, after training, and ongoing monitoring. This can reveal imbalances or differences in model performance between groups.

- Analyze training data with Clarify's data bias metrics to check for imbalances in representation or differences in label distribution across groups before training.

- Evaluate models after training with Clarify's model bias metrics to measure differences in error rates, precision, recall, etc. across groups.

- Monitor models over time during deployment and inference with Clarify's model monitoring features to continuously check for concept drift or changes in bias metrics.

- Generate explainability reports in Clarify to understand feature attributions and how different groups are being treated by the model. Look fo