# S3 Vector Search Notebook

This notebook demonstrates how to interact with **S3 Vectors** by:
1. Connecting to the S3 Vector Index using the `s3vectors` boto3 client.
2. Generating embeddings for a query using the Titan v2 model.
3. Performing a similarity search directly against the S3 Vector Index.
4. Generating an answer with an LLM (Llama 3.1 8B Instruct on Amazon Bedrock) from the retrieved context, and reporting latency for the search and generation steps.

In [85]:
# Import necessary libraries
import boto3
import os
import json
import time  # used for the latency measurements in Steps 2 and 3
import secrets
import string
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)

print("✅ Libraries imported and environment variables loaded.")

✅ Libraries imported and environment variables loaded.


In [86]:
# Configuration
aws_region = os.getenv("AWS_REGION", "us-west-2")
aws_profile = os.getenv("AWS_PROFILE", "default")
bedrock_embedding_model_id = os.getenv("BEDROCK_EMBEDDING_MODEL_ID", "amazon.titan-embed-text-v2:0")
bedrock_model_id = os.getenv("BEDROCK_MODEL_ID", "meta.llama3-1-8b-instruct-v1:0")
s3_vector_bucket_name = os.getenv("S3_VECTOR_BUCKET_NAME")
s3_vector_index_name = os.getenv("S3_VECTOR_INDEX_NAME")

print(f"AWS Region: {aws_region}")
print(f"AWS Profile: {aws_profile}")
print(f"Bedrock Embedding Model: {bedrock_embedding_model_id}")
print(f"Bedrock Generation Model: {bedrock_model_id}")
print(f"S3 Vector Bucket: {s3_vector_bucket_name}")
print(f"S3 Vector Index: {s3_vector_index_name}")

# Setup AWS Session
session = boto3.Session(profile_name=aws_profile, region_name=aws_region)
bedrock_client = session.client("bedrock-runtime")

# Initialize S3 Vectors Client
try:
    s3_vectors_client = session.client("s3vectors")
    print("✅ S3 Vectors client initialized.")
except Exception as e:
    print(f"⚠️ Failed to initialize 's3vectors' client. Ensure your boto3 version supports it. Error: {e}")
    s3_vectors_client = None

AWS Region: us-east-1
AWS Profile: warike-development
Bedrock Embedding Model: amazon.titan-embed-text-v2:0
Bedrock Generation Model: us.meta.llama3-1-8b-instruct-v1:0
S3 Vector Bucket: bucket-vector-s3-lambda
S3 Vector Index: documents
✅ S3 Vectors client initialized.
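
Because `S3_VECTOR_BUCKET_NAME` and `S3_VECTOR_INDEX_NAME` have no defaults, a missing value only surfaces later, when the search is skipped. A minimal sketch of an earlier check follows. It assumes a hypothetical helper, `missing_settings`, which is not part of the notebook and simply reports which required variables are unset:

```python
import os

# Names taken from the configuration cell above; both are read without a default.
REQUIRED_SETTINGS = ("S3_VECTOR_BUCKET_NAME", "S3_VECTOR_INDEX_NAME")

def missing_settings(env=None, required=REQUIRED_SETTINGS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_settings()
if missing:
    print(f"Missing required settings: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.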


## Step 1: Generate Embeddings
Use the Titan v2 model to generate embeddings for a sample query.

In [79]:
def get_embedding(text, model_id=bedrock_embedding_model_id):
    try:
        body = json.dumps({
            "inputText": text,
            # Optional: "dimensions": 1024, "normalize": True
        })
        
        response = bedrock_client.invoke_model(
            body=body,
            modelId=model_id,
            accept="application/json",
            contentType="application/json"
        )
        
        response_body = json.loads(response.get("body").read())
        embedding = response_body.get("embedding")
        return embedding
    except Exception as e:
        print(f"❌ Error generating embedding: {e}")
        return None

# Test Query
query_text = "Explain how EventBridge works in LLM Workflow context"
query_embedding = get_embedding(query_text)

if query_embedding:
    print(f"✅ Generated embedding for query: '{query_text}'")
    print(f"Embedding dimension: {len(query_embedding)}")

✅ Generated embedding for query: 'Explain how EventBridge works in LLM Workflow context'
Embedding dimension: 1024
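
The options that `get_embedding` leaves commented out could be made explicit. The sketch below builds a request body with `dimensions` and `normalize` set. It assumes that the Titan Text Embeddings V2 request schema accepts these two fields alongside `inputText`, and that the chosen dimension matches the one the S3 Vector index was created with. `build_embedding_body` and `SUPPORTED_DIMENSIONS` are hypothetical names, not part of the notebook.

```python
import json

# Output sizes assumed to be supported by Titan Text Embeddings V2;
# verify against the documentation for the model version in use.
SUPPORTED_DIMENSIONS = (256, 512, 1024)

def build_embedding_body(text, dimensions=1024, normalize=True):
    """Serialize an embedding request body (hypothetical helper)."""
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {SUPPORTED_DIMENSIONS}, got {dimensions}")
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize,
    })

body = build_embedding_body("Explain how EventBridge works in LLM Workflow context")
print(body)
```

The resulting string could be passed as `body` to `bedrock_client.invoke_model` in place of the one built inside `get_embedding`.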


## Step 2: Similarity Search with Latency Reporting
Perform a search against the S3 Vector Index and measure latency.

In [80]:
def search_vector_store(query_vector, bucket_name, index_name):
    """
    Performs a similarity search using the s3vectors client and returns results with latency.
    """
    if not s3_vectors_client:
        print("❌ S3 Vectors client is not initialized.")
        return [], 0

    if not bucket_name or not index_name:
        print("⚠️ S3_VECTOR_BUCKET_NAME or S3_VECTOR_INDEX_NAME is not set. Skipping search.")
        return [], 0

    print(f"🔍 Searching in Vector Bucket: {bucket_name}, Index: {index_name}...")
    
    start_time = time.time()
    try:
        response = s3_vectors_client.query_vectors(
            vectorBucketName=bucket_name,
            indexName=index_name,
            queryVector={"float32": query_vector},
            topK=5,
            returnDistance=True,
            returnMetadata=True
        )
        end_time = time.time()
        latency = end_time - start_time
        
        vectors = response.get('vectors', [])
        print(f"✅ Search Results found in {latency:.4f} seconds.")
        results = []
        for result in vectors:
            metadata = result.get('metadata', {})
            chunk_text = metadata.get('chunk', '')
            results.append(chunk_text)
            
        return results, latency
            
    except Exception as e:
        print(f"❌ Error querying S3 Vectors: {e}")
        return [], 0

# Execute Search
retrieved_contexts, search_latency = search_vector_store(query_embedding, s3_vector_bucket_name, s3_vector_index_name)

🔍 Searching in Vector Bucket: bucket-vector-s3-lambda, Index: documents...
✅ Search Results found in 0.8367 seconds.
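
`search_vector_store` keeps only the chunk text, although the request also asks for distances. Below is a minimal sketch of a parser that keeps the key and distance next to each chunk. It assumes each entry in `vectors` carries `key`, `distance`, and `metadata` fields, and that the text lives under the `chunk` metadata key, as in the cell above. `extract_hits` and the sample response are hypothetical and for illustration only.

```python
def extract_hits(response, text_key="chunk"):
    """Flatten a query response into dicts with key, distance, and text (hypothetical helper)."""
    hits = []
    for item in response.get("vectors", []):
        metadata = item.get("metadata") or {}
        hits.append({
            "key": item.get("key"),
            "distance": item.get("distance"),
            "text": metadata.get(text_key, ""),
        })
    return hits

# Made-up response in the assumed shape; this is not real query output.
sample_response = {
    "vectors": [
        {"key": "doc-1#0", "distance": 0.21, "metadata": {"chunk": "first chunk"}},
        {"key": "doc-2#3", "distance": 0.35, "metadata": {}},
    ]
}

for hit in extract_hits(sample_response):
    print(f"{hit['distance']:.2f}  {hit['key']}  {hit['text']!r}")
```

Keeping the distance makes it possible to drop weak matches before they reach the prompt. A sensible threshold depends on the distance metric the index was created with.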


## Step 3: Generate Answer with Latency Reporting
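Use the retrieved context to generate an answer with the Bedrock model. The system prompt instructs the model to answer only from that context, and a low temperature is used, to reduce (not eliminate) hallucinations.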

In [87]:
def generate_answer(query, contexts, model_id=bedrock_model_id):
    if not contexts:
        print("⚠️ No contexts retrieved. Skipping generation.")
        return 0.0  # Return a numeric latency so the performance report can still format it

    # Construct Context String
    context_str = "\n\n".join(contexts)
    
    # Construct Prompt (Llama 3 Instruct Format)
    prompt_template = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant. Answer the user's question using ONLY the context provided below. 
If the answer is not in the context, say "I don't know" or "The provided context does not contain the answer."
Do not hallucinate or use outside knowledge.

Context:
{context_str}
<|eot_id|><|start_header_id|>user<|end_header_id|>
{query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

    body = json.dumps({
        "prompt": prompt_template,
        "temperature": 0.1, # Low temperature to reduce hallucination
        "top_p": 0.9
    })

    print(f"🤖 Generating answer using {model_id}...")
    
    start_time = time.time()
    try:
        response = bedrock_client.invoke_model(
            body=body,
            modelId=model_id,
            accept="application/json",
            contentType="application/json"
        )
        end_time = time.time()
        latency = end_time - start_time
        
        response_body = json.loads(response.get("body").read())
        generation = response_body.get('generation')
        
        print(f"✅ Answer Generated in {latency:.4f} seconds:\n")
        print(generation)
        return latency
        
    except Exception as e:
        print(f"❌ Error invoking Bedrock: {e}")
        return 0

# Execute Generation
generation_latency = generate_answer(query_text, retrieved_contexts)

print("\n--- Performance Report ---")
print(f"S3 Vector Search Latency: {search_latency:.4f}s")
print(f"LLM Generation Latency:   {generation_latency:.4f}s")
print(f"Total Latency:            {search_latency + generation_latency:.4f}s")

🤖 Generating answer using us.meta.llama3-1-8b-instruct-v1:0...
✅ Answer Generated in 1.8723 seconds:



In the context of LLM (Large Language Model) workflow, Amazon EventBridge is not directly involved. Instead, the LLM classifies and interprets the user's intent through natural language, which is a different approach from traditional EventBridge rules-based routing.

However, the provided context does mention that traditional dynamic dispatch uses Amazon EventBridge rules for routing based on structured event attributes. This implies that EventBridge is used in a separate workflow, not in conjunction with LLM-based routing.

--- Performance Report ---
S3 Vector Search Latency: 0.8367s
LLM Generation Latency:   1.8723s
Total Latency:            2.7090s
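
The latency figures above come from `time.time()`, which follows the wall clock and can jump if the system clock is adjusted. The sketch below shows an alternative using `time.perf_counter()`, a monotonic clock intended for measuring intervals. It is wrapped in a hypothetical `timed` context manager that is not part of the notebook, and the `time.sleep` calls are stand-ins for the real search and generation calls.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, report):
    """Store the elapsed seconds of the wrapped block in report[label] (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped block raises.
        report[label] = time.perf_counter() - start

latencies = {}
with timed("search", latencies):
    time.sleep(0.01)  # stand-in for the query_vectors call
with timed("generation", latencies):
    time.sleep(0.02)  # stand-in for the invoke_model call

for label, seconds in latencies.items():
    print(f"{label}: {seconds:.4f}s")
print(f"total: {sum(latencies.values()):.4f}s")
```

Collecting the measurements in one dictionary keeps the report code separate from the functions being timed.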
