# Part 1 - Bedrock Configuration

In [36]:
import boto3, json, os
bedrock = boto3.client(service_name='bedrock-runtime')
body = json.dumps({
 "prompt": "\n\nHuman: What is Amazon Bedrock? \n\nAssistant:",
 "max_tokens_to_sample": 300,
 "temperature": 0.1,
 "top_p": 0.9,
})
modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())
# the generated text is returned in the 'completion' field
print(response_body.get('completion'))

 Amazon Bedrock is an internal technology platform developed by Amazon to run and operate many of their services and products. Some key things about Bedrock:

- It provides foundational services like compute, storage, database, networking that power many Amazon products and services. Bedrock acts as a common infrastructure layer.

- It enables Amazon to rapidly launch and scale new products and services by building on top of Bedrock rather than having to develop everything from scratch. 

- It incorporates technologies like containers and microservices to allow for modularity and frequent updates.

- Bedrock is used across many Amazon businesses including Amazon.com, Amazon Web Services, Alexa, Prime Video, etc.

- It aims to increase developer productivity and reduce time-to-market for new products by providing common tools and services out of the box.

- Bedrock is a proprietary technology stack developed internally by Amazon. The details are not public but it is seen as one of Amazo
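The request body for Claude v2 on Bedrock wraps the question in the `\n\nHuman: ... \n\nAssistant:` turn format used in the cell above. Below is a minimal offline sketch of building that body and reading the reply. The helper names `build_claude_v2_body` and `parse_completion` are hypothetical. The `stop_reason` field is an assumption based on the Anthropic-on-Bedrock response format. A `stop_reason` of `"max_tokens"` would explain why the answer above ends mid-word at the 300-token limit.

```python
import json

def build_claude_v2_body(question, max_tokens=300, temperature=0.1, top_p=0.9):
    """Hypothetical helper: build the JSON body passed to invoke_model for anthropic.claude-v2."""
    # Claude v2 text completions expect a Human turn followed by an open
    # Assistant turn that the model completes.
    prompt = f"\n\nHuman: {question}\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    })

def parse_completion(raw_body):
    """Hypothetical helper: decode response body bytes into (completion, stop_reason)."""
    payload = json.loads(raw_body)
    # stop_reason is assumed to be "max_tokens" when output hit the length limit
    return payload.get("completion", ""), payload.get("stop_reason")

# Offline usage with a made-up response body (no AWS call is made here)
example_body = build_claude_v2_body("What is Amazon Bedrock?")
fake_raw = b'{"completion": " Amazon Bedrock is", "stop_reason": "max_tokens"}'
text, reason = parse_completion(fake_raw)
if reason == "max_tokens":
    print("Completion was cut off; consider raising max_tokens_to_sample")
```

With the live client, the same helpers would be used as `parse_completion(bedrock.invoke_model(body=build_claude_v2_body(q), modelId=modelId, accept=accept, contentType=contentType)["body"].read())`.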

In [37]:
bedrock_client = boto3.client(service_name='bedrock-runtime')

# Part 2 - Creating embeddings and storing them in a vector database

In [38]:
# Titan Embeddings turns text into vectors; Claude v2 generates the answers.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Anthropic Claude v2 LLM, capped at 200 output tokens per answer
llm = Bedrock(
    model_id="anthropic.claude-v2", 
    client=bedrock_client, 
    model_kwargs={"max_tokens_to_sample": 200}
)
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_client)
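`BedrockEmbeddings` calls the Titan model through the same `bedrock-runtime` client. The sketch below builds and parses the request and response payloads offline. It assumes that Titan Embeddings G1 - Text accepts a JSON body with an `inputText` field and returns the vector under `embedding`. The helper names are hypothetical.

```python
import json

TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"

def build_titan_embed_body(text):
    """Hypothetical helper: JSON body for a Titan text-embedding request (assumed format)."""
    return json.dumps({"inputText": text})

def parse_titan_embedding(raw_body):
    """Hypothetical helper: pull the embedding vector out of a Titan response body."""
    payload = json.loads(raw_body)
    return payload["embedding"]

# With a live client this would be (not executed here):
# resp = bedrock_client.invoke_model(body=build_titan_embed_body("hello"),
#                                    modelId=TITAN_EMBED_MODEL_ID,
#                                    accept="application/json",
#                                    contentType="application/json")
# vector = parse_titan_embedding(resp["body"].read())

fake_raw = b'{"embedding": [0.1, -0.2, 0.3]}'
vector = parse_titan_embedding(fake_raw)
print(len(vector))  # 3 for this fake body; the real model returns 1536 values
```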

In [39]:
from urllib.request import urlretrieve

os.makedirs("data", exist_ok=True)
files = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-ug.pdf",
    "https://docs.aws.amazon.com/bedrock/latest/APIReference/bedrock-api.pdf",
]
for url in files:
    file_path = os.path.join("data", url.rpartition("/")[2])
    urlretrieve(url, file_path)
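Rerunning the cell above downloads both PDFs again every time. The following is a small sketch that skips files already on disk. It uses the same file-naming rule as the cell above. The `fetch` parameter is a hypothetical hook, added here only so the function can be tested without network access.

```python
import os
from urllib.request import urlretrieve

def download_if_missing(url, dest_dir="data", fetch=urlretrieve):
    """Download url into dest_dir unless a file with the same name already exists.

    `fetch` defaults to urllib's urlretrieve and can be swapped out for testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # same naming rule as the cell above: keep everything after the last "/"
    file_path = os.path.join(dest_dir, url.rpartition("/")[2])
    if not os.path.exists(file_path):
        fetch(url, file_path)
    return file_path
```

In the notebook, the loop body would become `download_if_missing(url)` for each entry in `files`.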

In [40]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("./data/")

documents = loader.load()
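# - in our testing, character-based splitting works better with this PDF data set;
#   RecursiveCharacterTextSplitter tries paragraph, line, then word boundaries in turn
text_splitter = RecursiveCharacterTextSplitter(
    # chunks of up to 1000 characters, with no overlap between consecutive chunks
    chunk_size=1000,
    chunk_overlap=0,
)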
docs = text_splitter.split_documents(documents)
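`chunk_size` sets the maximum length of each chunk. `chunk_overlap` sets how many characters consecutive chunks share. The sketch below is a deliberately simplified fixed-size character splitter. It is not LangChain's recursive algorithm, which first tries to break on paragraph, line, and word boundaries. It only shows how the two parameters interact.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Simplified splitter: fixed-size character windows, each sharing chunk_overlap chars with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
print(split_fixed("abcdefghij", chunk_size=4))                   # ['abcd', 'efgh', 'ij']
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a chunk boundary is split between two chunks. A small overlap is a common way to keep some shared context.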

In [42]:
docs[1]

Document(page_content="Amazon Bedrock API ReferenceAmazon Bedrock: API Reference\nCopyright © 2023 Amazon Web Services, Inc. and/or its aﬃliates. All rights reserved.\nAmazon's trademarks and trade dress may not be used in connection with any product or service that is not \nAmazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or \ndiscredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may \nor may not be aﬃliated with, connected to, or sponsored by Amazon.", metadata={'source': 'data/bedrock-api.pdf', 'page': 1})

In [47]:
import pinecone

from dotenv import load_dotenv
load_dotenv("keys.env")  # load PINECONE_API_KEY and PINECONE_ENVIRONMENT from keys.env

# Pinecone API key from app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY")
# Pinecone environment, shown next to the API key in the console
env = os.environ.get("PINECONE_ENVIRONMENT")

# Initialize Pinecone
pinecone.init(api_key=api_key, environment=env)
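`os.environ.get` silently returns `None` when `keys.env` is missing or a variable name is misspelled. The problem then likely surfaces later as a less obvious connection or authentication error. Below is a minimal fail-fast check. The helper name `require_env` is hypothetical.

```python
import os

def require_env(*names):
    """Return the values of the named environment variables, raising if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]
```

In the cell above, this would replace the two `os.environ.get` calls with `api_key, env = require_env("PINECONE_API_KEY", "PINECONE_ENVIRONMENT")`.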

In [48]:
sample_embedding = np.array(bedrock_embeddings.embed_query(docs[1].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [-6.1392784e-06  2.2558594e-01  7.6953125e-01 ...  2.2363281e-01
 -8.2421875e-01 -1.3671875e-01]
Size of the embedding:  (1536,)
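Each chunk becomes a 1536-dimensional vector. Similar text should produce vectors that point in similar directions. Cosine similarity is a common way to score that. The notebook does not show which metric `demoindex` was created with, so treating it as cosine is an assumption. Below is a plain-Python sketch of the formula.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

With the real model, `cosine_similarity(bedrock_embeddings.embed_query(text_a), bedrock_embeddings.embed_query(text_b))` would compare two pieces of text.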


In [49]:
# used documentation from https://python.langchain.com/docs/integrations/vectorstores/pinecone
index_name = "demoindex"
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}
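The index reports `'dimension': 1536`, which matches the size of the Titan embedding printed earlier. Upserts fail if the two ever differ, for example after switching embedding models. The following sketch is a hypothetical guard. It takes a plain dict shaped like the stats printed above. With the Pinecone client's response object you may need `.to_dict()` or attribute access first; that is an assumption, not verified here.

```python
def check_index_dimension(embedding, index_stats):
    """Raise if an embedding's length does not match the index's "dimension"."""
    expected = index_stats["dimension"]
    if len(embedding) != expected:
        raise ValueError(f"embedding has {len(embedding)} values, index expects {expected}")
    return True

print(check_index_dimension([0.0] * 1536, {"dimension": 1536}))  # True
```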

In [50]:
%%time

from langchain.vectorstores import Pinecone

docsearch = Pinecone.from_documents(docs, bedrock_embeddings, index_name=index_name)

CPU times: user 3.12 s, sys: 171 ms, total: 3.29 s
Wall time: 1min 3s
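`from_documents` embeds every chunk and upserts the resulting vectors into the index. The gap between CPU time (about 3.3 s) and wall time (about 1 minute) suggests most of the time is plausibly spent waiting on embedding calls over the network. If you upsert vectors yourself instead, sending records in fixed-size batches is the usual pattern. The Pinecone v2 client's `index.upsert(vectors=...)` is one way to do that. The generic helper below is a sketch. The batch size of 100 is an arbitrary choice.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical usage with precomputed (id, vector, metadata) records:
# for batch in batched(records, 100):
#     index.upsert(vectors=batch)

print([len(b) for b in batched(range(250), 100)])  # [100, 100, 50]
```

On Python 3.12+, `itertools.batched` provides the same grouping, yielding tuples instead of lists.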


In [51]:
from langchain.vectorstores import Pinecone

text_field = "text"  # metadata key under which from_documents stored each chunk's text

# reconnect to the existing index and wrap it as a LangChain vector store
index = pinecone.Index(index_name)

vectorstore = Pinecone(index, bedrock_embeddings, text_field)
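`text_field` has to match the metadata key under which each chunk's text was stored. LangChain's Pinecone integration uses `"text"` by default, alongside the chunk's own metadata such as `source` and `page`. The following sketch shows that record shape as an `(id, values, metadata)` tuple. The helper name and the use of a random UUID for the id are assumptions for illustration, not a copy of LangChain internals.

```python
import uuid

def to_pinecone_record(embedding, page_content, metadata, text_key="text", record_id=None):
    """Build an (id, values, metadata) tuple with the chunk text stored under text_key."""
    record_metadata = dict(metadata)          # copy so the caller's dict is not modified
    record_metadata[text_key] = page_content  # the field Pinecone(index, embeddings, text_field) reads back
    return (record_id or str(uuid.uuid4()), list(embedding), record_metadata)

rec = to_pinecone_record(
    [0.1, 0.2],
    "Amazon Bedrock API Reference...",
    {"source": "data/bedrock-api.pdf", "page": 1},
)
print(rec[2])
```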

# Part 3 - Comparing the LLM alone with LLM + RAG

Claude 2 vs LangChain + Claude 2 + Vector Database

In [52]:
query = "What is Amazon Bedrock?"

vectorstore.similarity_search(query, k=3)  # return the 3 chunks most similar to the query

[Document(page_content='Amazon Bedrock User GuideTable of Contents\nWhat is Amazon Bedrock?...................................................................................................................1\nAccess the Amazon Bedrock models.............................................................................................1\nFeatures of Amazon Bedrock.......................................................................................................1\nSupported models in Amazon Bedrock.........................................................................................2\nSupported Regions.....................................................................................................................3\nAmazon Bedrock pricing.............................................................................................................3\nSet up ...............................................................................................................................
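Conceptually, `similarity_search(query, k=3)` embeds the query and returns the k stored chunks whose vectors score highest against it. Pinecone does this server-side with an approximate index. The in-memory, exact version below is only a sketch of the idea. It assumes cosine similarity as the score and uses toy two-dimensional vectors.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def top_k(query_vec, corpus, k=3):
    """Return the k (score, text) pairs from corpus [(text, vector), ...] most similar to query_vec."""
    scored = ((cosine(query_vec, vec), text) for text, vec in corpus)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

corpus = [
    ("chunk about pricing", [0.9, 0.1]),
    ("chunk about models", [0.1, 0.9]),
    ("chunk about regions", [0.7, 0.7]),
]
for score, text in top_k([1.0, 0.0], corpus, k=2):
    print(f"{score:.3f}  {text}")
```

An exact scan like this costs one comparison per stored vector. That cost is why vector databases use approximate nearest-neighbour indexes at scale.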

In [53]:
# standard prompt based on https://python.langchain.com/docs/use_cases/question_answering/vector_db_qa

from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever())
qa.run(query)

' Based on the provided context, Amazon Bedrock is described as "a fully managed service that makes base models from Amazon and third-party model providers accessible through an API."'
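With `chain_type="stuff"`, the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and the LLM is called once. Below is a minimal sketch of that assembly. The `"\n\n"` separator is an assumption about LangChain's default. The template is a shortened, hypothetical stand-in rather than LangChain's built-in prompt.

```python
def stuff_prompt(chunks, question, template, separator="\n\n"):
    """Join retrieved chunk texts and fill them into a template with {context} and {question}."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)

# Hypothetical template in Claude v2's Human/Assistant format
TEMPLATE = "\n\nHuman: Answer using only this context:\n{context}\n\nQuestion: {question}\n\nAssistant:"

prompt = stuff_prompt(
    ["Amazon Bedrock is a fully managed service.", "It exposes models through an API."],
    "What is Amazon Bedrock?",
    TEMPLATE,
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the number of chunks times `chunk_size` must stay within the model's context window. With 1000-character chunks, this is comfortably small here.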

In [54]:
# https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/03_QuestionAnswering/02_rag_claude_titan_pinecone.ipynb

# customizable option: supply our own prompt template and also return the source documents

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """

Human: Use the following pieces of context to provide a detailed answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Assistant:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)
result = qa({"query": query})
print(result["result"])


 Based on the Amazon Bedrock User Guide provided, Amazon Bedrock is a fully managed service that makes base models from Amazon and third-party model providers accessible through an API. It allows users to explore capabilities like text playground for hands-on text generation, image playground for hands-on image generation, and provides access to models through an API after requesting access.
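Because `return_source_documents=True`, `result` also carries the retrieved chunks under `result["source_documents"]`. Each chunk keeps the `source` and `page` metadata seen in `docs[1]`. PyPDF page numbers are zero-based. Below is a small sketch for listing where an answer came from. `Doc` is a stand-in for LangChain's `Document`, which exposes the same `page_content` and `metadata` attributes. The helper name `cite_sources` is hypothetical.

```python
from collections import namedtuple

# Stand-in for langchain's Document, used only so this sketch runs on its own
Doc = namedtuple("Doc", ["page_content", "metadata"])

def cite_sources(source_documents):
    """Return unique 'file, page N' labels in retrieval order."""
    seen = []
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page")
        label = f"{source}, page {page}" if page is not None else source
        if label not in seen:
            seen.append(label)
    return seen

docs_found = [
    Doc("...", {"source": "data/bedrock-ug.pdf", "page": 0}),
    Doc("...", {"source": "data/bedrock-ug.pdf", "page": 0}),
    Doc("...", {"source": "data/bedrock-api.pdf", "page": 1}),
]
print(cite_sources(docs_found))
```

In the notebook, the same call would be `print(cite_sources(result["source_documents"]))`.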


In [55]:
# Without RAG: ask Claude 2 directly, with no retrieved context
bedrock = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
 "prompt": "\n\nHuman:What are Amazon Bedrock key features?\n\nAssistant:",
 "max_tokens_to_sample": 300,
 "temperature": 0.1,
 "top_p": 0.9,
})
modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())
# the generated text is returned in the 'completion' field
print(response_body.get('completion'))

 Here are some of the key features of Amazon Bedrock:

- Managed blockchain service - Bedrock is a fully managed blockchain service by AWS that makes it easy to build and scale blockchain networks. Developers don't need to provision hardware or install software.

- Support for Ethereum and Hyperledger Fabric - Bedrock supports two of the most popular blockchain frameworks, Ethereum and Hyperledger Fabric. This allows you to build decentralized applications on top of these networks.

- High performance and scalability - Bedrock is designed for performance and scalability. It utilizes AWS infrastructure to deliver fast transaction speeds and the ability to scale networks as usage grows.

- Secure and compliant - Bedrock provides enterprise-grade security features like encryption, access controls, and integrations with AWS security services. It is compliant with standards like HIPAA, SOC, and PCI. 

- Integrations with AWS services - Bedrock integrates with various AWS services like Amazo

In [60]:
# with RAG

query = "What are Amazon Titan models used for?"
vectorstore.similarity_search(query, k=3)  # optional check of the 3 most relevant chunks (not displayed, since it is not the cell's last line)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)
result = qa({"query": query})
print(result["result"])

 Based on the provided context, the Amazon Titan models in Amazon Bedrock are used for natural language processing tasks like text generation. Specifically, the models mentioned are:

- Titan Text G1 - Express - A text generation model that is in limited preview release.

- Titan Embeddings G1 - Text - A model that generates embeddings from text. Embeddings represent words or sentences as numerical vectors which can be useful for downstream NLP tasks.

So in summary, the Amazon Titan models are used for natural language text generation and creating vector representations of text. The context indicates they support features like controlling response length, top-p sampling, and stop sequences when generating text.
