## Setup for inference

In [1]:
import os

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

llm = ChatOpenAI(api_key=OPENAI_API_KEY)

# Smoke test
llm.invoke("How tall is the eiffel tower?")

AIMessage(content='The Eiffel Tower is 1,063 feet (324 meters) tall, including antennas.', response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 16, 'total_tokens': 36}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-da24337a-7b6f-4f97-8eb8-6dd3dba8f9ec-0')

### Retrieval Augmented Generation (RAG)

In [2]:
import pandas as pd

df = pd.read_csv('../data/reinvent_qa.csv', delimiter=';')

with pd.option_context("display.max_rows", None, "display.max_colwidth", None):
    display(df.head())


Unnamed: 0,Question,Answer
0,What city was AWS re:Invent 2022 held in?,Las Vegas
1,When did AWS re:Invent 2022 take place?,"November 28 to December 2, 2022"
2,How many years has AWS re:Invent been running?,11 years
3,How many people attended re:Invent 2022 in person?,"Over 51,000"
4,How many keynotes were featured at re:Invent 2022?,5 keynotes


#### Prompting without retriever


In [3]:


from langchain.prompts import PromptTemplate

template = """

Human: Answer the question below.
Keep your response as precise as possible and limit it to a few words. 
If you don't know the answer, respond "I don't know".

Here is the question: 
{question}

Assistant:"""


def answer_question_llm(question: str) -> str:
    prompt_message = PromptTemplate.from_template(template).format(
        question=question
    )
    print(prompt_message)
    answer = llm.invoke(prompt_message)
    return answer.content.strip()


# Smoke test
answer_question_llm("What city was AWS re:Invent 2022 held in?")



Human: Answer the question below.
Keep your response as precise as possible and limit it to a few words. 
If you don't know the answer, respond "I don't know".

Here is the question: 
What city was AWS re:Invent 2022 held in?

Assistant:


"I don't know"

In [4]:
from langchain.prompts import PromptTemplate

template = """

Human: Answer the question below.
Keep your response as precise as possible and limit it to a few words. 
If you don't know the answer, respond "I don't know".

Here is the question: 
{question}

Assistant:"""

def ask_llm(row):
    prompt_message = PromptTemplate.from_template(template).format(
        question=row['Question']
    )
    answer = llm.invoke(prompt_message)
    return answer.content.strip()

df["LLM_answer"] = df.apply(ask_llm, axis=1)

with pd.option_context("display.max_rows", None, "display.max_colwidth", None):
    display(df.head())

Unnamed: 0,Question,Answer,LLM_answer
0,What city was AWS re:Invent 2022 held in?,Las Vegas,Las Vegas
1,When did AWS re:Invent 2022 take place?,"November 28 to December 2, 2022",I don't know.
2,How many years has AWS re:Invent been running?,11 years,9 years
3,How many people attended re:Invent 2022 in person?,"Over 51,000",I don't know.
4,How many keynotes were featured at re:Invent 2022?,5 keynotes,I don't know


#### Prompting with retriever

One way to give the model access to current knowledge is to supply it with information from relevant sources. Let's use a LangChain document loader to fetch such a source.

In [5]:
import re
from langchain.document_loaders import UnstructuredURLLoader

# List of URLs for the loader. We will only use one in this example.
urls = [
    "https://aws.amazon.com/blogs/security/three-key-security-themes-from-aws-reinvent-2022/",
]

# Define the URL Loader
loader = UnstructuredURLLoader(urls=urls)

# Load the data
data = loader.load()

# Pre-process the content for prettier display
data[0].page_content = re.sub("\n{3,}", "\n", data[0].page_content)
data[0].page_content = re.sub(" {2,}", " ", data[0].page_content)

print(data[0].page_content[214:1200])
print()


AWS re:Invent returned to Las Vegas, Nevada, November 28 to December 2, 2022. After a virtual event in 2020 and a hybrid 2021 edition, spirits were high as over 51,000 in-person attendees returned to network and learn about the latest AWS innovations.

Now in its 11th year, the conference featured 5 keynotes, 22 leadership sessions, and more than 2,200 breakout sessions and hands-on labs at 6 venues over 5 days.

With well over 100 service and feature announcements—and innumerable best practices shared by AWS executives, customers, and partners—distilling highlights is a challenge. From a security perspective, three key themes emerged.

Turn data into actionable insights

Security teams are always looking for ways to increase visibility into their security posture and uncover patterns to make more informed decisions. However, as AWS Vice President of Data and Machine Learning, Swami Sivasubramanian, pointed out during his keynote, data often exists in silos; it isn’t alw



#### Split documents into chunks


Long documents are a problem for RAG because they can exceed the model's context window. The usual remedy is to split each document into smaller chunks. This lets the retriever return only the most relevant chunks and avoids passing the whole document through the LLM at once. Here we use the [`RecursiveCharacterTextSplitter`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html), a standard [text splitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/#text-splitters) in LangChain. It takes a list of separators, splits the text on the first one, and falls back to the next separator for any chunk that is still too large.
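
To make the recursive idea concrete, here is a minimal pure-Python sketch. It is not LangChain's implementation: the function name `recursive_split` is hypothetical, it uses plain string separators instead of regular expressions, and it does not implement chunk overlap. It only illustrates the "try the first separator, fall back to the next one" behavior described above.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Greedily merge adjacent pieces while the result still fits.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with the next separator.
            chunks.extend(recursive_split(piece, rest, chunk_size))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph is a bit longer.\n\nThird."
for chunk in recursive_split(sample, ["\n\n", "\n", " "], chunk_size=20):
    print(repr(chunk))
```

The first and third paragraphs fit within the limit and are kept whole, while the second one is too long for a paragraph-level split and is broken on spaces instead.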


In [10]:
import random
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
)

# Use the recursive character splitter
recur_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
    separators=[r"\n\n", r"\n", r"(?<=\. )", r" ", r""],
    is_separator_regex=True,
)

# Perform the splits using the splitter
data_splits = recur_splitter.split_documents(data)

# Print a random chunk
print(random.choice(data_splits).page_content)

Threat detection and monitoring

Monitoring for malicious activity and anomalous behavior just got simpler. Amazon GuardDuty RDS Protection expands the threat detection capabilities of GuardDuty by using tailored machine learning (ML) models to detect suspicious logins to Amazon Aurora databases. You can enable the feature with a single click in the GuardDuty console, with no agents to manually deploy, no data sources to enable, and no permissions to configure. When RDS Protection detects a potentially suspicious or anomalous login attempt that indicates a threat to your database instance, GuardDuty generates a new finding with details about the potentially compromised database instance. You can view GuardDuty findings in AWS Security Hub, Amazon Detective (if enabled), and Amazon EventBridge, allowing for integration with existing security event management or workflow systems.

To bolster vulnerability management processes, Amazon Inspector now supports AWS Lambda functions, adding au

#### Embeddings and vector databases

For RAG to be successful, we need a way of doing a semantic search to **retrieve the documents that contain the most relevant information to be used in the answer generation process**. At this stage, the concept of **embedding** comes into play. This is the transformation of the previously extracted and chunked text into a vector in a high-dimensional space that represents the semantic meaning.
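
As an illustration of what "semantic search" means in vector terms, the sketch below ranks two chunks by their similarity to a query vector. Cosine similarity is assumed here as the metric because it is a common choice; the notebook does not state which one it relies on. The three-dimensional vectors are made up for the example, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (invented values, not model output).
toy_embeddings = {
    "re:Invent 2022 took place in Las Vegas.": [0.9, 0.1, 0.0],
    "GuardDuty detects suspicious logins.": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for the embedded question

# Pick the chunk whose vector points in the direction closest to the query.
best_chunk = max(
    toy_embeddings,
    key=lambda chunk: cosine_similarity(query_vector, toy_embeddings[chunk]),
)
print(best_chunk)
```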

In this example we will use OpenAI's `text-embedding-3-large` model to generate the embeddings.

In [13]:
from langchain_openai import OpenAIEmbeddings

llm_embeddings = OpenAIEmbeddings(model="text-embedding-3-large", api_key=OPENAI_API_KEY)

# Smoke test
text = "This is a test document."
query_result = llm_embeddings.embed_query(text)
print(query_result)

[-0.014354088541273189, -0.027212579177899553, -0.020026415862148084, 0.057306920035778704, -0.02226981178779573, 0.021503774745242565, -0.02323647806742465, 0.06405534683427398, -0.016725156686464637, 0.018950315453205115, 0.018457861871291895, 0.024713836950519128, -0.015658175788297837, -0.048479244780316504, -0.007026572342829778, 0.038703144129422905, -0.023309434153634016, -0.0012117531317385084, -0.012949685744388074, -0.023564780455366793, 0.016542768333586397, 0.00449135207234467, -0.0413660375507744, 0.045524530739417896, 0.015858805025373592, 0.016861949348107197, -0.00019535672886997337, 0.008038836642000845, 0.01824811312343997, 0.004092374407209789, 0.016050314751673178, 0.04563396486873195, -0.02347358534760509, -0.02797861622045272, 0.053294339019553945, -0.004247405624743397, 0.04475849183421955, 0.059714467155397424, 0.01559434014418723, -0.016998742009749757, 0.03076918279267061, -0.012083333151974443, -0.022452202003319145, 0.026993710919271458, 0.02606352268274722,

We also need a place to store the documents' vector representations efficiently, so that they can be retrieved quickly. For the sake of this example we will use FAISS (Facebook AI Similarity Search). For a real production system you will need a scalable vector database. See more: https://python.langchain.com/docs/integrations/vectorstores/
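
Conceptually, a vector store keeps `(vector, text)` pairs and returns the texts whose vectors lie closest to a query vector. The sketch below shows that idea with an exact brute-force search over squared Euclidean distance. The `TinyVectorStore` class and its methods are hypothetical and are not the FAISS API; libraries such as FAISS offer the same kind of lookup with optimized and approximate index structures.

```python
import heapq


class TinyVectorStore:
    """Hypothetical in-memory store for illustration only (not the FAISS API)."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query, k=1):
        """Return the texts of the k vectors nearest to query, closest first."""
        def squared_distance(vector):
            return sum((x - y) ** 2 for x, y in zip(vector, query))

        nearest = heapq.nsmallest(
            k, self._items, key=lambda item: squared_distance(item[0])
        )
        return [text for _, text in nearest]


# Made-up two-dimensional vectors standing in for chunk embeddings.
store = TinyVectorStore()
store.add([0.9, 0.1], "Las Vegas chunk")
store.add([0.1, 0.9], "GuardDuty chunk")
store.add([0.5, 0.5], "Keynote chunk")

print(store.search([0.8, 0.2], k=2))
```

A brute-force scan like this is exact but costs time proportional to the number of stored vectors per query, which is one reason production systems rely on dedicated vector databases.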