# Part 4: RAG with ColPali using Inference API (Amazon Bedrock)

**Important Notice**: This notebook shows how to implement Retrieval-Augmented Generation (RAG) with ColPali using Elastic's Inference API integrated with Amazon Bedrock. It creates an Inference Endpoint for the completion task, then runs natural language search and response generation over the RVL-CDIP dataset. The Inference API is called directly from the Python client, so no MCP Server setup is needed.

Follow the steps below to configure the Inference Endpoint with Amazon Bedrock (using Claude 3.5 Sonnet), connect to Elastic Cloud, and run a natural language search demo. The demo follows the RAG pattern: Elasticsearch retrieves relevant documents using KNN+Rescore search (Retrieval), and the LLM in Amazon Bedrock generates a response based on the search results (Generation). The last step also generates a Streamlit app for a visual demo.

**Objective**: Set up an Inference Endpoint with Amazon Bedrock, connect it to Elastic Cloud, and implement a natural language search demo for the RVL-CDIP dataset in the ColPali project.
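
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: `answer_with_rag`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the two stand-in callables take the place of the Elasticsearch search and the Amazon Bedrock completion call so the sketch runs without any external service. The prompt layout mirrors the one used later in this notebook.

```python
from typing import Callable, List


def answer_with_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the query, then ask the LLM to answer from it."""
    docs = retrieve(query)  # Retrieval step (Elasticsearch in this notebook)
    context = "\n".join(docs) if docs else "No relevant documents found."
    prompt = f"User Query: {query}\nContext: {context}\nAnswer based on the context."
    return generate(prompt)  # Generation step (Amazon Bedrock in this notebook)


# Hypothetical stand-ins so the sketch runs offline
def fake_retrieve(query: str) -> List[str]:
    return ["Invoice 001 with a handwritten note."]


def fake_generate(prompt: str) -> str:
    return f"(an LLM would answer here; the prompt had {len(prompt)} characters)"


print(answer_with_rag("Show me invoices", fake_retrieve, fake_generate))
```

Passing the retriever and generator as callables keeps the two RAG stages independent, so either one can be swapped or tested on its own.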

Load environment variables from `elastic.env` and `aws.env` to retrieve connection details for Elastic Cloud and Amazon Bedrock. If `aws.env` does not exist, it is created with placeholder values that you must replace with your actual credentials.

In [None]:
from dotenv import load_dotenv, set_key
import os
from pathlib import Path

# Load environment variables from elastic.env and aws.env
elastic_dotenv_path = 'elastic.env'
aws_dotenv_path = 'aws.env'
load_dotenv(dotenv_path=elastic_dotenv_path)
load_dotenv(dotenv_path=aws_dotenv_path, override=True)

# Retrieve Elastic Cloud connection details
ELASTIC_HOST = os.getenv("ELASTIC_HOST", os.getenv("ES_URL", ""))
ELASTIC_API_KEY = os.getenv("ELASTIC_API_KEY", os.getenv("ES_API_KEY", ""))

if not ELASTIC_HOST or not ELASTIC_API_KEY:
    raise ValueError(f"Please create an '{elastic_dotenv_path}' file and set ELASTIC_HOST and ELASTIC_API_KEY variables.")

# Check if aws.env exists, if not create it with default placeholders
aws_env_file = Path(aws_dotenv_path)
if not aws_env_file.exists():
    aws_env_file.touch(mode=0o600, exist_ok=False)
    set_key(dotenv_path=aws_dotenv_path, key_to_set="AWS_ACCESS_KEY", value_to_set="<your-aws-access-key>")
    set_key(dotenv_path=aws_dotenv_path, key_to_set="AWS_SECRET_KEY", value_to_set="<your-aws-secret-key>")
    set_key(dotenv_path=aws_dotenv_path, key_to_set="AWS_REGION", value_to_set="ap-northeast-2")
    print(f"Created '{aws_dotenv_path}' with placeholder values. Please update it with your actual AWS credentials.")
else:
    print(f"'{aws_dotenv_path}' already exists. Loading values from it.")

# Retrieve Amazon Bedrock credentials and region
AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
AWS_REGION = os.getenv("AWS_REGION", "ap-northeast-2")

if not AWS_ACCESS_KEY or not AWS_SECRET_KEY or AWS_ACCESS_KEY == "<your-aws-access-key>" or AWS_SECRET_KEY == "<your-aws-secret-key>":
    raise ValueError(f"Please update '{aws_dotenv_path}' with valid AWS_ACCESS_KEY and AWS_SECRET_KEY values.")

print(f"Elastic Host loaded: {ELASTIC_HOST[:20]}... (partially hidden for security)")
print(f"Elastic API Key loaded: {ELASTIC_API_KEY[:5]}... (partially hidden for security)")
print(f"AWS Access Key loaded: {AWS_ACCESS_KEY[:5]}... (partially hidden for security)")
print("AWS Secret Key loaded: (hidden)")
print(f"AWS Region loaded: {AWS_REGION}")
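
The cell above hides most of each credential by slicing the string before printing. A small helper can make that masking consistent and avoid revealing short values entirely. This is a sketch only: `mask_secret` and its `visible` parameter are hypothetical names, not part of the notebook or of any library.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Show the first `visible` characters and mask the rest.

    Values that are empty or too short to mask safely are hidden completely.
    """
    if not value:
        return "<not set>"
    if len(value) <= visible * 2:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Example with a made-up key (not a real credential)
print(mask_secret("AKIAEXAMPLEKEY123456"))
```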

Connect to Elastic Cloud using the loaded credentials to interact with Elasticsearch.

In [None]:
from elasticsearch import Elasticsearch

# Connect to Elastic Cloud
if ":" in ELASTIC_HOST and not ELASTIC_HOST.startswith("http"):
    es = Elasticsearch(cloud_id=ELASTIC_HOST, api_key=ELASTIC_API_KEY)
else:
    es = Elasticsearch(hosts=[ELASTIC_HOST], api_key=ELASTIC_API_KEY)

print(f"Connected to Elasticsearch version: {es.info()['version']['number']}")
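
The connection cell decides between a Cloud ID and a URL with a simple heuristic: a value that contains `:` but does not start with `http` is treated as a Cloud ID. The sketch below isolates that decision in a function that only builds keyword arguments, so it can be checked without a cluster. `connection_kwargs` is a hypothetical helper, and rejecting values that are neither a Cloud ID nor a URL with a scheme is an added assumption, not something the notebook does.

```python
def connection_kwargs(host: str, api_key: str) -> dict:
    """Build keyword arguments for Elasticsearch() from a Cloud ID or a URL."""
    host = host.strip()
    if host.startswith(("http://", "https://")):
        return {"hosts": [host], "api_key": api_key}
    if ":" in host:
        # Cloud IDs look like "<deployment-name>:<base64 payload>"
        return {"cloud_id": host, "api_key": api_key}
    raise ValueError("ELASTIC_HOST must be a Cloud ID or start with http:// or https://")


# Made-up values for illustration
print(connection_kwargs("https://example.es.io:443", "dummy-key"))
print(connection_kwargs("my-deployment:ZXhhbXBsZQ==", "dummy-key"))
```

The resulting dictionary could then be passed as `Elasticsearch(**connection_kwargs(ELASTIC_HOST, ELASTIC_API_KEY))`.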

Create an Inference Endpoint for the completion task using Amazon Bedrock with the Claude 3.5 Sonnet model.

In [None]:
# Step 1: Create Inference Endpoint for Completion Task
inference_id = "amazon_bedrock_completion"
task_type = "completion"
service = "amazonbedrock"
provider = "anthropic"
model = "anthropic.claude-3-5-sonnet-20240620-v1:0"

try:
    response = es.inference.put(
        task_type=task_type,
        inference_id=inference_id,
        inference_config={
            "service": service,
            "service_settings": {
                "access_key": AWS_ACCESS_KEY,
                "secret_key": AWS_SECRET_KEY,
                "region": AWS_REGION,
                "provider": provider,
                "model": model
            }
        }
    )
    print(f"Inference Endpoint created: {response}")
except Exception as e:
    print(f"Error creating Inference Endpoint: {e}")
    print("Ensure your Elasticsearch version supports Amazon Bedrock integration (8.12.0 or higher).")
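
The client call above corresponds to a `PUT _inference/completion/amazon_bedrock_completion` request whose body carries the service name and its settings. The sketch below builds that body as a plain dictionary and prints a redacted copy, which can help when debugging a failed request without leaking credentials. `bedrock_completion_config` and `redact` are hypothetical helpers introduced for illustration; the field names are the ones used in the cell above.

```python
import copy
import json


def bedrock_completion_config(access_key: str, secret_key: str, region: str,
                              provider: str, model: str) -> dict:
    """Build the request body for a completion endpoint backed by Amazon Bedrock."""
    return {
        "service": "amazonbedrock",
        "service_settings": {
            "access_key": access_key,
            "secret_key": secret_key,
            "region": region,
            "provider": provider,
            "model": model,
        },
    }


def redact(config: dict) -> dict:
    """Return a copy that is safe to print: credentials replaced by '***'."""
    safe = copy.deepcopy(config)
    for key in ("access_key", "secret_key"):
        if key in safe.get("service_settings", {}):
            safe["service_settings"][key] = "***"
    return safe


config = bedrock_completion_config(
    "<your-aws-access-key>", "<your-aws-secret-key>", "ap-northeast-2",
    "anthropic", "anthropic.claude-3-5-sonnet-20240620-v1:0",
)
print("PUT _inference/completion/amazon_bedrock_completion")
print(json.dumps(redact(config), indent=2))
```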

Run a natural language search on the RVL-CDIP dataset using KNN+Rescore, then generate a response through the Inference API with Amazon Bedrock.

In [None]:
# Step 2: Perform RAG with Natural Language Search (KNN+Rescore) and Response Generation
index_name = "colqwen-rvlcdip-demo-part2-original"
query_text = "Show me invoices with handwritten notes from the RVL-CDIP dataset."

try:
    # Search the RVL-CDIP index: a match query scored by cosine similarity,
    # then the top 10 hits are rescored with a kNN query (KNN+Rescore)
    search_body = {
        "query": {
            "script_score": {
                "query": {
                    "match": {
                        "content": query_text
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {
                        # Placeholder: replace with a real query embedding that has the same
                        # number of dimensions as the 'embedding' field, or the script will fail
                        "query_vector": [0.1, 0.2, 0.3]
                    }
                }
            }
        },
        "rescore": {
            "window_size": 10,
            "query": {
                "rescore_query": {
                    "knn": {
                        "field": "embedding",
                        # Placeholder: use the same query embedding as in the script above
                        "query_vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "num_candidates": 100
                    }
                },
                "query_weight": 0.5,
                "rescore_query_weight": 0.5
            }
        },
        "size": 5
    }
    search_response = es.search(index=index_name, body=search_body)
    retrieved_docs = [hit["_source"].get("content", "") for hit in search_response["hits"]["hits"]]
    context = "\n".join(retrieved_docs) if retrieved_docs else "No relevant documents found."
    
    print(f"Retrieved {len(retrieved_docs)} documents from {index_name} using KNN+Rescore.")
    print(f"Context for LLM: {context[:200]}... (partially shown for brevity)")
    
    # Call Inference API for Completion Task with Amazon Bedrock
    inference_body = {
        "input": f"User Query: {query_text}\nContext: {context}\nAnswer based on the context."
    }
    inference_response = es.inference.completion(inference_id="amazon_bedrock_completion", body=inference_body)
    result = inference_response.get("completion", [{}])[0].get("result", "No response generated.")
    
    print(f"LLM Response from Amazon Bedrock (Claude 3.5 Sonnet):\n{result}")
except Exception as e:
    print(f"Error during search or inference: {e}")
    print("Ensure the Inference Endpoint 'amazon_bedrock_completion' is created and active.")
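
Two details of the cell above are easy to get wrong: the concatenated context can grow large, and the completion response may contain an empty list, in which case indexing `[0]` raises an error. The sketch below shows one way to handle both. `join_within_budget`, `extract_completion_text`, and the `max_chars` default are assumptions made for illustration; the response shape `{"completion": [{"result": ...}]}` is the one the cell above reads.

```python
from typing import List, Mapping


def join_within_budget(docs: List[str], max_chars: int = 4000) -> str:
    """Join documents with newlines, stopping before the character budget is exceeded."""
    parts: List[str] = []
    used = 0
    for doc in docs:
        if not doc:
            continue  # skip hits without a 'content' value
        cost = len(doc) + (1 if parts else 0)  # +1 for the newline separator
        if used + cost > max_chars:
            break
        parts.append(doc)
        used += cost
    return "\n".join(parts)


def extract_completion_text(response: Mapping, default: str = "No response generated.") -> str:
    """Read the first completion result, falling back to a default when none is present."""
    items = response.get("completion") or []
    if not items:
        return default
    return items[0].get("result", default)


# Made-up data for illustration
print(join_within_budget(["first page text", "", "second page text"], max_chars=20))
print(extract_completion_text({"completion": [{"result": "Two invoices match."}]}))
print(extract_completion_text({"completion": []}))
```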

Create a Streamlit app that provides a visual demo of natural language search and response generation using the Inference API.

In [None]:
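# Step 3: Create a Streamlit App for Visual Demo
# Use a raw string so the "\n" escapes inside the app code are written to the file
# literally; a regular string would turn them into line breaks and break the generated script
streamlit_app_code = r'''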
import streamlit as st
from elasticsearch import Elasticsearch
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv('elastic.env')
load_dotenv('aws.env', override=True)

# Retrieve Elastic Cloud connection details
ELASTIC_HOST = os.getenv("ELASTIC_HOST", os.getenv("ES_URL", ""))
ELASTIC_API_KEY = os.getenv("ELASTIC_API_KEY", os.getenv("ES_API_KEY", ""))

if not ELASTIC_HOST or not ELASTIC_API_KEY:
    st.error("Elastic Cloud credentials not found. Please create 'elastic.env' with ELASTIC_HOST and ELASTIC_API_KEY.")
    st.stop()

# Connect to Elastic Cloud
if ":" in ELASTIC_HOST and not ELASTIC_HOST.startswith("http"):
    es = Elasticsearch(cloud_id=ELASTIC_HOST, api_key=ELASTIC_API_KEY)
else:
    es = Elasticsearch(hosts=[ELASTIC_HOST], api_key=ELASTIC_API_KEY)

if not es.ping():
    st.error("Failed to connect to Elastic Cloud. Please check your credentials.")
    st.stop()

st.title("ColPali RAG Demo with Inference API (Amazon Bedrock)")
st.write("Enter a natural language query to search the RVL-CDIP dataset and generate a response using Amazon Bedrock.")

query_text = st.text_input("Enter your search query:", "Show me invoices with handwritten notes from the RVL-CDIP dataset.")
index_name = "colqwen-rvlcdip-demo-part2-original"

if st.button("Search and Generate Response"):
    try:
        # Search the RVL-CDIP index: a match query scored by cosine similarity,
        # then the top 10 hits are rescored with a kNN query (KNN+Rescore)
        search_body = {
            "query": {
                "script_score": {
                    "query": {
                        "match": {
                            "content": query_text
                        }
                    },
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {
                            # Placeholder: replace with a real query embedding that has the same
                            # number of dimensions as the embedding field, or the script will fail
                            "query_vector": [0.1, 0.2, 0.3]
                        }
                    }
                }
            },
            "rescore": {
                "window_size": 10,
                "query": {
                    "rescore_query": {
                        "knn": {
                            "field": "embedding",
                            # Placeholder: use the same query embedding as in the script above
                            "query_vector": [0.1, 0.2, 0.3],
                            "k": 10,
                            "num_candidates": 100
                        }
                    },
                    "query_weight": 0.5,
                    "rescore_query_weight": 0.5
                }
            },
            "size": 5
        }
        search_response = es.search(index=index_name, body=search_body)
        retrieved_docs = [hit["_source"].get("content", "") for hit in search_response["hits"]["hits"]]
        context = "\n".join(retrieved_docs) if retrieved_docs else "No relevant documents found."
        
        st.write(f"Retrieved {len(retrieved_docs)} documents from {index_name} using KNN+Rescore.")
        st.write("Context for LLM (partially shown for brevity):")
        st.text(context[:500] + "..." if len(context) > 500 else context)
        
        # Call Inference API for Completion Task with Amazon Bedrock
        inference_body = {
            "input": f"User Query: {query_text}\nContext: {context}\nAnswer based on the context."
        }
        inference_response = es.inference.completion(inference_id="amazon_bedrock_completion", body=inference_body)
        result = inference_response.get("completion", [{}])[0].get("result", "No response generated.")
        
        st.write("LLM Response from Amazon Bedrock (Claude 3.5 Sonnet):")
        st.text(result)
    except Exception as e:
        st.error(f"Error during search or inference: {e}")
        st.info("Ensure the Inference Endpoint 'amazon_bedrock_completion' is created and active.")
'''

# Write the Streamlit app code to a file
with open("colpali_rag_demo.py", "w") as f:
    f.write(streamlit_app_code)

print("Streamlit app code saved as 'colpali_rag_demo.py'.")
print("To run the app, execute the following command in your terminal:")
print("  streamlit run colpali_rag_demo.py")
print("Ensure you have Streamlit installed. If not, install it with: pip install streamlit")
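
One pitfall when generating a script from a triple-quoted string is that escape sequences such as `\n` are interpreted by the outer string unless it is a raw string, which can split a string literal across lines in the written file. A quick syntax check before launching Streamlit catches this. The sketch below uses `ast.parse` from the standard library; `first_syntax_error` is a hypothetical helper name.

```python
import ast
from typing import Optional


def first_syntax_error(source: str, filename: str = "<generated>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


# A regular string turns "\n" into a real line break, which splits the string
# literal in the generated source; a raw string keeps the two characters as written.
broken = 'sep = "\n".join(["a", "b"])'
intact = r'sep = "\n".join(["a", "b"])'

print("regular string parses:", first_syntax_error(broken) is None)
print("raw string parses:", first_syntax_error(intact) is None)
```

The same function could be applied to the contents of `colpali_rag_demo.py` after it is written, for example with `first_syntax_error(open("colpali_rag_demo.py").read())`.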