# CAIP Week 3 Call 2: AWS Bedrock API Integration and Error Handling

## Overview
This notebook covers hands-on implementation of AWS Bedrock API calls, handling throttling errors, implementing retry logic, and processing multiple requests efficiently. This builds on the conceptual foundation from Week 3 Call 1.

### Key Learning Objectives:
- Understand how to call Bedrock models programmatically
- Learn to handle ThrottlingException errors
- Implement exponential backoff retry logic
- Apply rate limiting between API calls
- Use prompt templates for batch processing
- Parse and extract responses from Bedrock API


## 1. Setting Up the Bedrock Runtime Client

### Key Point: Use the Runtime Client

**Critical distinction**: To invoke models for inference, you must use `boto3.client('bedrock-runtime')`, NOT `boto3.client('bedrock')`.

### Why Two Different Clients?
- **`bedrock` client**: For managing model resources, listing available models, configuring access
- **`bedrock-runtime` client**: For actually invoking models and getting predictions

### Example Setup:
```python
import boto3
import json

# Create the Bedrock runtime client
bedrock = boto3.client('bedrock-runtime')
```

**Note**: The runtime client is what you use to send prompts and receive responses from models.


## 2. Constructing the Request Body

### The `construct_body()` Function

The request body must be properly formatted for Claude models on Bedrock. Here's what it requires:

### Required Fields:
1. **`anthropic_version`**: Must be `"bedrock-2023-05-31"` for Claude models
2. **`max_tokens`**: Maximum number of tokens in the response (the helper below defaults to 2000)
3. **`messages`**: Array containing the user message with role and content

### Example Implementation:
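```python
from typing import Any, Dict


def construct_body(prompt: str, max_tokens: int = 2000) -> Dict[str, Any]:
    """Build a Messages API request body for Claude models on Bedrock."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                # The role field identifies the speaker; no "Human:" prefix is needed
                "role": "user",
                "content": prompt
            }
        ]
    }
    return body
```

### Key Points:
- **anthropic_version**: This is a fixed string required by AWS Bedrock for Anthropic models
- **max_tokens**: Controls response length - remember you pay for every token!
- **messages**: A list of objects with `role` and `content` fields. The `role` field already marks who is speaking, so the `Human:` prefix from the older text-completion prompt format is not needed in `content`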


## 3. Calling Bedrock Models

### The `call_bedrock()` Function

This function handles the complete flow of calling a Bedrock model:

### Process Flow:
1. Construct the request body
2. Invoke the model with `invoke_model()`
3. Parse the JSON response
4. Extract text content from the response

### Example Implementation:
```python
from typing import List  # Needed for the List[str] return annotation

def call_bedrock(
    bedrock_client: boto3.client, 
    prompt: str, 
    max_tokens: int = 2000, 
    modelId: str = 'anthropic.claude-3-sonnet-20240229-v1:0'
) -> List[str]:
    # Create the request body
    body = construct_body(prompt, max_tokens=max_tokens)

    # Invoke the model with the request body
    response = bedrock_client.invoke_model(
        body=json.dumps(body),  # Must be JSON string!
        modelId=modelId,
    )

    # Parse the JSON response body
    result = json.loads(response["body"].read())

    # Extract and return text content
    responses = [content["text"] for content in result["content"]]
    return responses
```

### Critical Details:
- **`body` must be JSON string**: Use `json.dumps(body)`, not a Python dict
- **Response parsing**: `response["body"].read()` returns bytes that must be parsed
- **Content extraction**: `result["content"]` is an array, each item has a `"text"` field
- **Default model**: Claude 3 Sonnet offers a good balance of performance and cost
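
The parsing steps above can be tried without AWS credentials by swapping in a stand-in client. This is only a sketch: `FakeBedrockClient` and `extract_texts` are hypothetical names made up for illustration, not part of boto3, and the fake mimics only the response shape described in this section (a `body` object whose `.read()` returns JSON bytes).

```python
import io
import json


class FakeBedrockClient:
    """Hypothetical stand-in for the bedrock-runtime client (not part of boto3)."""

    def invoke_model(self, body, modelId):
        # Like the real call, insist on a serialized body rather than a dict.
        if not isinstance(body, (str, bytes, bytearray)):
            raise TypeError("body must be a JSON string or bytes, not a dict")
        request = json.loads(body)
        prompt = request["messages"][0]["content"]
        payload = {
            "content": [{"type": "text", "text": f"echo: {prompt}"}],
            "stop_reason": "end_turn",
        }
        # Mimic the streaming body: an object whose .read() returns bytes.
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


def extract_texts(response):
    """Read the body, parse the JSON, and collect the text of each content block."""
    result = json.loads(response["body"].read())
    return [block["text"] for block in result["content"]]


request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello"}],
}
fake_response = FakeBedrockClient().invoke_model(
    body=json.dumps(request_body),
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
)
texts = extract_texts(fake_response)
print(texts)
```

A fake like this is handy for unit-testing the parsing code, since no request leaves your machine and no tokens are billed.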


## 4. Understanding ThrottlingException

### What Happens When You Make Too Many Requests?

**ThrottlingException** occurs when you exceed AWS Bedrock's rate limits by making too many API calls in a short time period.

### Real-World Example from the Session:
- **What happened**: Processing 50 US states sequentially without delays
- **Result**: Successfully processed 47 states, then hit ThrottlingException on West Virginia (state #48)
- **Error message**: `"Too many requests, please wait before trying again"`

### Why Throttling Exists:
- **Protects system resources**: Prevents overload on AWS infrastructure
- **Ensures fair usage**: Prevents one user from monopolizing the service
- **Maintains service quality**: Ensures consistent performance for all users

### Key Insight:
> "Even though each individual request was legitimate, the volume of requests in a short time period exceeded the service's comfort threshold."

### Common Mistakes:
- Making requests in a loop without delays
- Not implementing retry logic
- Ignoring rate limit warnings
- Processing large batches without rate limiting


## 5. Implementing Exponential Backoff

### What is Exponential Backoff?

Exponential backoff is a retry strategy that waits progressively longer between retry attempts. The wait time increases exponentially with each attempt.

### Formula:
```python
wait_time = (2 ** attempt) + random.uniform(0, 1)
```

### Wait Times:
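With `attempt` counted from 0, as `range(max_retries)` does:
- **After the 1st failed attempt** (`attempt = 0`): ~1-2 seconds
- **After the 2nd failed attempt** (`attempt = 1`): ~2-3 seconds
- **After the 3rd failed attempt** (`attempt = 2`): ~4-5 seconds (only reached when `max_retries` is greater than 3)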

### Why Add Random Jitter?

Random jitter prevents the **"thundering herd" problem** where multiple clients retry at exactly the same time, causing another stampede of requests.
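
To make the schedule concrete, here is a small sketch that prints the wait range for each attempt index, with `attempt` counted from 0 as `range(max_retries)` does. The helper names `backoff_bounds` and `backoff_wait` are made up for this illustration; only the formula itself comes from the notes.

```python
import random


def backoff_bounds(attempt, jitter=1.0):
    """Return the (min, max) wait in seconds for a zero-based attempt index."""
    base = 2 ** attempt
    return base, base + jitter


def backoff_wait(attempt, jitter=1.0):
    """Sample one wait time: exponential base plus uniform random jitter."""
    return (2 ** attempt) + random.uniform(0, jitter)


for attempt in range(3):
    low, high = backoff_bounds(attempt)
    sample = backoff_wait(attempt)
    print(f"attempt index {attempt}: {low}-{high:.0f} s (sampled {sample:.2f} s)")
```

Because each client draws its own jitter, two clients throttled at the same moment will usually come back at slightly different times.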

### Example Implementation:
```python
import time
import random
from botocore.exceptions import ClientError

def call_bedrock_with_retry(bedrock_client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return call_bedrock(bedrock_client, prompt)
        except ClientError as e:
            error_code = e.response['Error']['Code']
            if error_code == 'ThrottlingException':
                if attempt < max_retries - 1:  # Don't wait on last attempt
                    wait_time = (2 ** attempt) + random.uniform(0, 1)
                    print(f"Throttled! Waiting {wait_time:.1f} seconds...")
                    time.sleep(wait_time)
                else:
                    print(f"Max retries ({max_retries}) exceeded")
                    raise
            else:
                # Re-raise non-throttling errors immediately
                raise
    return None
```

### Key Best Practices:
1. **Only retry ThrottlingException**: Other errors (ValidationException, AccessDeniedException) won't succeed on retry
2. **Set reasonable max_retries**: 3-5 attempts is usually sufficient
3. **Add jitter**: Prevents synchronized retries from multiple clients
4. **Don't wait on last attempt**: No point waiting if you're about to give up
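
The practices above can be checked without calling AWS by simulating a call that is throttled twice and then succeeds. This is a sketch under stated assumptions: `FakeThrottlingError` is a hypothetical stand-in for a `ClientError` whose code is `ThrottlingException`, and `retry_with_backoff` takes an injectable `sleep` so the example runs instantly.

```python
import random
import time


class FakeThrottlingError(Exception):
    """Hypothetical stand-in for a throttling ClientError."""


def retry_with_backoff(func, max_retries=3, sleep=time.sleep,
                       retryable=(FakeThrottlingError,)):
    """Call func(), retrying only `retryable` errors with backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except retryable:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep((2 ** attempt) + random.uniform(0, 1))


# Simulate a call that is throttled twice and then succeeds.
calls = {"count": 0}
recorded_waits = []


def flaky_call():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise FakeThrottlingError("Too many requests")
    return "ok"


# Record the waits instead of actually sleeping.
result = retry_with_backoff(flaky_call, sleep=recorded_waits.append)
print(result, calls["count"], [round(w, 1) for w in recorded_waits])
```

Errors outside the `retryable` tuple are not caught at all, so they propagate on the first attempt, which matches the "only retry throttling" rule.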


## 6. Rate Limiting Between Requests

### Why Add Delays?

Even with retry logic, it's better to prevent throttling in the first place by adding small delays between API calls.

### Simple Rate Limiting:
```python
import time

for i, state in enumerate(states):
    print(f"Processing {state}... ({i+1}/{len(states)})")
    
    # Call Bedrock
    responses = call_bedrock_with_retry(bedrock, current_prompt)
    
    # Add delay between requests (except after last one)
    if i < len(states) - 1:
        time.sleep(1)  # 1 second delay
```

### Why 1 Second?
- **Simple and effective**: Prevents rapid-fire requests
- **Not too slow**: 50 states × 1 second = ~50 seconds total delay
- **Flexible**: Can adjust based on your rate limits

### Advanced Rate Limiting:

For larger batches, consider longer pauses periodically:
```python
for i, state in enumerate(states):
    # Longer pause every 10 states
    if i > 0 and i % 10 == 0:
        time.sleep(5)  # 5 second pause
    
    responses = call_bedrock_with_retry(bedrock, current_prompt)
    time.sleep(1)  # Regular 1 second delay
```
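
A fixed `time.sleep(1)` adds a full second on top of however long the API call itself took. One possible refinement, sketched here as an assumption rather than something from the session, is to sleep only for the part of the interval that has not already elapsed. `MinIntervalLimiter` is a hypothetical helper; the `clock` and `sleep` parameters exist so it can be tested without real waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep only for the part of the interval that has not yet elapsed."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = MinIntervalLimiter(min_interval=0.05)
start = time.monotonic()
for item in ["Alabama", "Alaska", "Arizona"]:
    limiter.wait()  # No delay before the first item.
    # A real loop would call the API here.
elapsed = time.monotonic() - start
print(f"3 items took {elapsed:.2f} s")
```

With this pattern a slow API call "uses up" some or all of the interval, so the loop never waits longer than it has to.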

### Key Takeaway:
> "Adding a small delay between API calls helps implement rate limiting, preventing too many requests from being sent in a short period."


## 7. Using Prompt Templates

### Why Use Templates?

When processing multiple similar requests, prompt templates allow you to:
- **Reuse the same prompt structure**
- **Substitute variables** for dynamic content
- **Maintain consistency** across all requests
- **Simplify maintenance**: change the template once and every request picks it up

### Example Template:

**prompt_template.txt**:
```
Act as a flower expert. I need you to give me information on the state flower of {state_name}. The response should be in json and in this format:

{
"state": "state_name",
"flower": "flower_name",
"color": "flower_color"
}

Do not give any additional information even if I am wrong.
```

### Why Include Constraints and Format Requirements?

The template includes constraints like:
- **"The response should be in json"** - Specifies output format
- **"in this format"** - Shows exact JSON schema with field names
- **"Do not give any additional information"** - Prevents extra text that would break parsing

**Why this matters:**
- **Structured output**: Makes responses predictable and easy for programs to parse
- **Automated processing**: JSON responses can be directly parsed with `json.loads()`
- **Consistency**: All responses follow the same structure, making batch processing reliable
- **Error prevention**: Clear format requirements reduce parsing errors

Without constraints, the model might add explanatory text, use different field names, or return non-JSON responses, making automated processing difficult or impossible.

### Loading and Using Templates:
```python
# Read template from file
with open('prompt_template.txt', 'r') as file:
    prompt = file.read()

# Replace placeholder with actual value
current_prompt = prompt.replace("{state_name}", state)

# Use in API call
responses = call_bedrock_with_retry(bedrock, current_prompt)
```
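
One detail worth noting: the template contains literal JSON braces, which is why `str.replace()` is a safer fit here than `str.format()`. The sketch below uses a shortened copy of the template (abbreviated for illustration) to show the difference.

```python
# Shortened version of the prompt template, including its literal JSON braces.
template = (
    "Give me the state flower of {state_name} as json in this format:\n"
    "{\n"
    '"state": "state_name",\n'
    '"flower": "flower_name"\n'
    "}\n"
)

# str.replace touches only the exact placeholder text.
filled = template.replace("{state_name}", "Georgia")
print(filled)

# str.format tries to interpret the JSON braces as replacement fields and fails.
try:
    template.format(state_name="Georgia")
    format_worked = True
except (KeyError, ValueError, IndexError):
    format_worked = False
print("str.format worked:", format_worked)
```

If you prefer `str.format()`, the literal braces in the template would have to be doubled (`{{` and `}}`), which makes the template harder for non-developers to edit.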

### Why Store Templates in Separate Files?

Storing templates in `.txt` files (instead of hardcoding in Python) provides:
- **Easier updates**: Non-developers can modify prompts without touching code
- **Cleaner code**: Keeps Python scripts focused on logic, not prompt text
- **Version control**: Track prompt changes separately from code changes
- **Collaboration**: Different team members can work on prompts vs. code

### Benefits:
- **Consistency**: Same prompt structure for all states
- **Maintainability**: Update template once, affects all requests
- **Clarity**: Template file is easier to read and modify than code
- **Reusability**: Same template can be used for different datasets
- **Structured output**: Constraints ensure machine-readable responses


## 8. Parsing Bedrock Responses

### Response Structure

Bedrock API responses follow a specific structure that must be parsed correctly:

### Response Format:
```python
{
    "content": [
        {
            "type": "text",
            "text": "Actual response text here"
        }
    ],
    "stop_reason": "end_turn",
    "usage": {...}
}
```

### Extraction Process:
```python
# Parse the response
result = json.loads(response["body"].read())

# Extract text from each content block
responses = [content["text"] for content in result["content"]]
```

### Key Points:
- **`response["body"]`**: Returns a streaming body that must be read
- **`.read()`**: Reads the bytes from the streaming body; the stream can only be consumed once, so keep the result if you need it again
- **`json.loads()`**: Parses the JSON bytes into a Python dict
- **`result["content"]`**: Array of content blocks (usually one for text responses)
- **`content["text"]`**: The actual text response within each content block

### Handling JSON Responses:

If your prompt requests JSON output, you'll need to parse it:
```python
responses = call_bedrock_with_retry(bedrock, current_prompt)

for response in responses:
    # Parse JSON string into Python dict
    flower_data.append(json.loads(response))
```

### Error Handling:
Always wrap JSON parsing in try-except to handle malformed responses:
```python
for response in responses:
    try:
        flower_data.append(json.loads(response))
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON: {e}")
        continue  # Skip this response and move on to the next one
```
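
Valid JSON is not necessarily the JSON you asked for, so it can help to check the shape as well. The following is a minimal sketch, assuming the three field names from the prompt template; `parse_flower_response` and `REQUIRED_KEYS` are hypothetical names, and the sample strings are made up for illustration.

```python
import json

# Field names requested in the prompt template.
REQUIRED_KEYS = {"state", "flower", "color"}


def parse_flower_response(text):
    """Return the parsed dict, or None if the text is not the expected JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


# Made-up sample responses: one usable, one non-JSON, one missing fields.
samples = [
    '{"state": "Georgia", "flower": "Cherokee Rose", "color": "White"}',
    "Sure! Here is the JSON you asked for: {...}",
    '{"state": "Georgia"}',
]
flower_data = []
for text in samples:
    parsed = parse_flower_response(text)
    if parsed is None:
        print(f"Skipping unusable response: {text[:30]!r}")
        continue
    flower_data.append(parsed)
print(len(flower_data))
```

Returning `None` instead of raising keeps the batch loop simple: one check per response, and bad items are skipped without stopping the run.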


## 9. Complete Workflow Example

### Processing Multiple Items with Error Handling

Here's a complete example combining all the concepts:

```python
from call_bedrock import call_bedrock
import boto3
import json
import time
import random
from botocore.exceptions import ClientError

# Create Bedrock client
bedrock = boto3.client('bedrock-runtime')

# Load prompt template
with open('prompt_template.txt', 'r') as file:
    prompt_template = file.read()

# List of items to process
states = ["Alabama", "Alaska", "Arizona", ...]  # All 50 states

flower_data = []

# Retry function with exponential backoff
def call_bedrock_with_retry(bedrock_client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return call_bedrock(bedrock_client, prompt)
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt < max_retries - 1:
                    wait_time = (2 ** attempt) + random.uniform(0, 1)
                    print(f"Throttled! Waiting {wait_time:.1f} seconds...")
                    time.sleep(wait_time)
                else:
                    raise
            else:
                raise
    return None

# Process each state
for i, state in enumerate(states):
    print(f"Processing {state}... ({i+1}/{len(states)})")
    
    # Replace placeholder in template
    current_prompt = prompt_template.replace("{state_name}", state)
    
    try:
        # Call with retry logic
        responses = call_bedrock_with_retry(bedrock, current_prompt)
        
        # Parse and store responses
        for response in responses:
            flower_data.append(json.loads(response))
        
        print(f"✓ Successfully processed {state}")
        
    except Exception as e:
        # Log the failure and fall through so the delay below still applies
        print(f"✗ Failed to process {state}: {str(e)}")
    
    # Rate limiting: delay between requests
    if i < len(states) - 1:
        time.sleep(1)

# Save results
with open('flower_data.json', 'w') as f:
    json.dump(flower_data, f, indent=4)

print(f"Done! Processed {len(flower_data)} items.")
```

### Key Components:
1. **Client setup**: Create bedrock-runtime client
2. **Template loading**: Read prompt template from file
3. **Retry logic**: Exponential backoff for throttling
4. **Rate limiting**: Delay between requests
5. **Error handling**: Try-except to continue on failures
6. **Response parsing**: Extract and store JSON responses


The same flow, run for a single state:

```python
from call_bedrock import call_bedrock
import boto3
import json
import time
import random
from botocore.exceptions import ClientError

# Create Bedrock client
bedrock = boto3.client('bedrock-runtime')

# Load prompt template
with open('prompt_template.txt', 'r') as file:
    prompt_template = file.read()

# Retry function with exponential backoff
def call_bedrock_with_retry(bedrock_client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return call_bedrock(bedrock_client, prompt)
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt < max_retries - 1:
                    wait_time = (2 ** attempt) + random.uniform(0, 1)
                    print(f"Throttled! Waiting {wait_time:.1f} seconds...")
                    time.sleep(wait_time)
                else:
                    raise
            else:
                raise
    return None

state = "Georgia"

print(f"Processing {state}...")

# Replace placeholder in template
current_prompt = prompt_template.replace("{state_name}", state)

try:
    # Call with retry logic
    responses = call_bedrock_with_retry(bedrock, current_prompt)

    # Parse and store responses
    for response in responses:
        print(json.loads(response))

    print(f"✓ Successfully processed {state}")

except Exception as e:
    print(f"✗ Failed to process {state}: {str(e)}")
```

Output:

```
Processing Georgia...
{'state': 'Georgia', 'flower': 'Cherokee Rose', 'color': 'White, Gold'}
✓ Successfully processed Georgia
```


## 10. Best Practices Summary

### ✅ DO's:

1. **Always use `bedrock-runtime` client** for model invocation
2. **Convert body to JSON string** with `json.dumps()` before calling `invoke_model()`
3. **Implement retry logic** with exponential backoff for ThrottlingException
4. **Add rate limiting** between requests (e.g., `time.sleep(1)`)
5. **Use prompt templates** for batch processing
6. **Parse responses carefully** - remember to read the body and extract from content array
7. **Handle errors gracefully** - continue processing other items if one fails
8. **Add random jitter** to backoff times to prevent thundering herd
9. **Only retry ThrottlingException** - re-raise other errors immediately
10. **Monitor your usage** and adjust rate limiting based on your limits

### ❌ DON'Ts:

1. **Don't use `bedrock` client** for model invocation (use `bedrock-runtime`)
2. **Don't pass Python dict** directly to `invoke_model()` - must be JSON string
3. **Don't make rapid-fire requests** without delays
4. **Don't retry all errors** - only retry ThrottlingException
5. **Don't ignore throttling errors** - implement proper retry logic
6. **Don't forget to read response body** - use `.read()` before parsing
7. **Don't process large batches** without rate limiting
8. **Don't hardcode wait times** without jitter - causes synchronized retries
9. **Don't skip error handling** - one failure shouldn't stop the entire batch
10. **Don't assume responses are always valid JSON** - wrap parsing in try-except


## 11. Common Errors and Solutions

### Error: `ParamValidationError: Invalid type for parameter body`
**Cause**: Passing a Python dict instead of a JSON string; boto3 rejects it during parameter validation, before any request is sent
**Solution**: Use `json.dumps(body)` before calling `invoke_model()`

### Error: `ThrottlingException: Too many requests`
**Cause**: Making too many requests too quickly
**Solution**: 
- Add `time.sleep(1)` between requests
- Implement exponential backoff retry logic
- Reduce batch size or increase delays

### Error: `TypeError: the JSON object must be str, bytes or bytearray, not StreamingBody`
**Cause**: Passing `response["body"]` to `json.loads()` without reading it first
**Solution**: Use `response["body"].read()` before `json.loads()`

### Error: `KeyError: 'content'`
**Cause**: Response structure doesn't match expected format
**Solution**: Check response structure, may need to handle different response types
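
One way to handle this more defensively is sketched below. `extract_text_blocks` is a hypothetical helper, and the "unexpected" payload is made up for illustration; the idea is simply to fail with a message that shows which keys actually came back, and to skip any block that is not a text block.

```python
def extract_text_blocks(result):
    """Collect text from a parsed response, tolerating non-text blocks."""
    blocks = result.get("content")
    if not isinstance(blocks, list):
        # Report what was received so the mismatch is easy to diagnose.
        raise ValueError(f"Unexpected response keys: {sorted(result)}")
    return [
        block["text"]
        for block in blocks
        if isinstance(block, dict) and block.get("type") == "text" and "text" in block
    ]


good = {"content": [{"type": "text", "text": "hi"}], "stop_reason": "end_turn"}
print(extract_text_blocks(good))

# Made-up payload without a "content" key.
unexpected = {"message": "something else came back"}
try:
    extract_text_blocks(unexpected)
except ValueError as err:
    print(err)
```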

### Error: `JSONDecodeError` when parsing response
**Cause**: Model returned non-JSON text despite requesting JSON
**Solution**: 
- Wrap JSON parsing in try-except
- Improve prompt to be more explicit about JSON format
- Validate response before parsing


## 12. Key Takeaways

### Critical Concepts:

1. **Bedrock Runtime Client**: Use `boto3.client('bedrock-runtime')` for model invocation

2. **Request Body Format**: Must include `anthropic_version: "bedrock-2023-05-31"` and properly formatted messages

3. **JSON String Requirement**: Body must be `json.dumps(body)`, not a Python dictionary

4. **ThrottlingException**: Expected when making many requests - implement retry logic, don't panic

5. **Exponential Backoff**: `(2 ** attempt) + random.uniform(0, 1)` provides effective retry strategy

6. **Rate Limiting**: Always add delays between requests in loops (`time.sleep(1)`)

7. **Response Parsing**: `result["content"][0]["text"]` extracts text from Bedrock responses

8. **Error Handling**: Only retry ThrottlingException, re-raise other errors immediately

9. **Prompt Templates**: Use templates with placeholders for batch processing

10. **Production Readiness**: Always implement retry logic and rate limiting in production code

### The Real-World Lesson:

> "The throttling we encountered when processing 50 states is a perfect example of why you need to plan for rate limits and implement proper error handling. The script successfully processed 47 states before hitting the limit, demonstrating that even legitimate, well-intentioned usage can trigger throttling protection."

### Next Steps:
- Practice with your own batch processing tasks
- Experiment with different rate limiting strategies
- Build robust error handling into your API integrations
- Apply these patterns to other AWS services that may throttle
