# Vector Index and Knowledge Base Setup

This notebook performs the following setup steps:

1. Creates a new vector index in the Amazon OpenSearch Serverless collection
2. Creates a knowledge base in Amazon Bedrock
3. Sets up a data source within the knowledge base
4. Starts an ingestion job that populates the knowledge base with data from the data source

## Purpose

These steps are prerequisites for the `CustomerSupport` notebook. Completing them ensures that the vector index and knowledge base exist and that the troubleshooting runbooks have been ingested, so the `CustomerSupport` notebook has a populated knowledge base to work with.

## Important Note

⚠️ Ensure this notebook is executed and all steps are completed successfully before proceeding to the `CustomerSupport` notebook. Failure to do so may result in errors or unexpected behavior in subsequent operations.

---

For detailed instructions on each step, refer to the code cells below.


In [6]:
import boto3
import os

client = boto3.client('s3')
response = client.list_buckets()
buckets = response['Buckets']
for bucket in buckets:
    if bucket['Name'].startswith('bedrock-workshop-'):
        bucket_name = bucket['Name']
        os.environ['S3_BUCKET'] = bucket_name
        print(bucket_name)

bedrock-workshop-111b4700
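The discovery loop above keeps the last matching bucket and leaves `bucket_name` undefined when nothing matches. A minimal sketch of a stricter lookup follows; `find_workshop_bucket` is a hypothetical helper, and it assumes the workshop creates exactly one bucket whose name starts with `bedrock-workshop-`.

```python
def find_workshop_bucket(bucket_names, prefix="bedrock-workshop-"):
    """Return the single bucket name that starts with `prefix`.

    Raises ValueError when there is no match or more than one match,
    instead of silently picking one.
    """
    matches = [name for name in bucket_names if name.startswith(prefix)]
    if not matches:
        raise ValueError(f"No bucket found with prefix {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"Multiple buckets match {prefix!r}: {matches}")
    return matches[0]


# Usage with the S3 client from the cell above:
# names = [b['Name'] for b in client.list_buckets()['Buckets']]
# os.environ['S3_BUCKET'] = find_workshop_bucket(names)
```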


## Copy the troubleshooting runbooks to the S3 bucket

In [7]:
!echo $S3_BUCKET

bedrock-workshop-111b4700


In [8]:
!aws s3 cp troubleshooting_kb.txt s3://$S3_BUCKET/KB/dataset.txt 

upload: ./troubleshooting_kb.txt to s3://bedrock-workshop-111b4700/KB/dataset.txt
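The same upload can be done from Python with boto3's `upload_file(Filename, Bucket, Key)`. The sketch below is an assumption-labeled alternative to the `aws s3 cp` cell, not part of the workshop; `upload_runbooks` and `s3_uri` are hypothetical helpers. The client is passed in so the function can be exercised without AWS credentials, and the default key `KB/dataset.txt` sits under the `KB` prefix that the data source later includes.

```python
def s3_uri(bucket, key):
    """Build an s3:// URI from a bucket name and object key."""
    return f"s3://{bucket}/{key}"


def upload_runbooks(s3_client, local_path, bucket, key="KB/dataset.txt"):
    """Upload a local file to S3 and return its s3:// URI.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    # upload_file(Filename, Bucket, Key) is boto3's managed-transfer upload.
    s3_client.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


# Usage (requires AWS credentials):
# import boto3, os
# upload_runbooks(boto3.client('s3'), 'troubleshooting_kb.txt', os.environ['S3_BUCKET'])
```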


In [9]:
# Two variables store the AWS Region and the OpenSearch endpoint (host) of the
# OpenSearch Serverless collection. Change `region` to the Region you are using.
# To find the host value: in the Amazon OpenSearch Service console, choose
# Collections in the navigation pane, choose bedrock-workshop-collection (the
# collection for this workshop), and copy its OpenSearch endpoint.

region = 'us-east-1'
host = 'https://v1uhc2qhwwr7ng5pvj2h.us-east-1.aoss.amazonaws.com'
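Instead of copying the endpoint from the console, it can be looked up with the OpenSearch Serverless `batch_get_collection` API, whose `collectionDetails` entries include a `collectionEndpoint`. The Region can also be read back from the endpoint to catch a mismatch with `region`. This is a sketch: both helper names are hypothetical, and the parser assumes endpoints of the form `https://<collection-id>.<region>.aoss.amazonaws.com`, as in the host above.

```python
from urllib.parse import urlparse


def region_from_aoss_host(host):
    """Extract the AWS Region from an OpenSearch Serverless endpoint.

    Assumes the form https://<collection-id>.<region>.aoss.amazonaws.com;
    a bare hostname without the scheme is also accepted.
    """
    hostname = urlparse(host).hostname or host
    parts = hostname.split(".")
    if len(parts) < 4 or parts[-3:] != ["aoss", "amazonaws", "com"]:
        raise ValueError(f"Unexpected OpenSearch Serverless host: {host!r}")
    return parts[-4]


def lookup_collection_endpoint(oss_client, collection_name):
    """Return the collection endpoint via the OpenSearch Serverless API.

    `oss_client` is expected to be boto3.client('opensearchserverless').
    """
    details = oss_client.batch_get_collection(names=[collection_name])["collectionDetails"]
    if not details:
        raise ValueError(f"Collection {collection_name!r} not found")
    return details[0]["collectionEndpoint"]


# Usage (requires AWS credentials):
# import boto3
# host = lookup_collection_endpoint(boto3.client('opensearchserverless'), 'bedrock-workshop-collection')
# assert region_from_aoss_host(host) == region
```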

In [10]:
%pip install --upgrade requests-aws4auth



In [None]:
import boto3
import requests
from requests_aws4auth import AWS4Auth

service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(
    credentials.access_key, 
    credentials.secret_key, 
    region, 
    service, 
    session_token=credentials.token
)

index = 'workshop-kb-index'
url = host + '/' + index

headers = {'Content-Type': 'application/json'}
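# Index settings and mapping. The field names ("vector", "text", "text-metadata")
# must match the fieldMapping passed to create_knowledge_base, and "dimension"
# must match the output size of the embedding model.
document = {
    "settings": {
        "index.knn": "true",
        "number_of_shards": 1,
        "knn.algo_param.ef_search": 512,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2"
                }
            },
            "text": {
                "type": "text"
            },
            "text-metadata": {
                "type": "text"
            }
        }
    }
}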
response = requests.put(url, auth=awsauth, json=document, headers=headers)
response.raise_for_status()
print(response.json())
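Because the index mapping and the knowledge base `fieldMapping` must name the same fields, one option is to derive both from a single set of names. The sketch below reproduces the settings used above; `build_index_and_field_mapping` is a hypothetical helper, and the 1536 default assumes the Titan text embedding model used when the knowledge base is created.

```python
def build_index_and_field_mapping(dimension=1536, vector_field="vector",
                                  text_field="text", metadata_field="text-metadata"):
    """Return (index_body, field_mapping) built from one set of field names.

    index_body is the JSON sent with PUT /<index>; field_mapping is the
    "fieldMapping" dict for the knowledge base storage configuration.
    """
    index_body = {
        "settings": {
            "index.knn": "true",
            "number_of_shards": 1,
            "knn.algo_param.ef_search": 512,
            "number_of_replicas": 0,
        },
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                text_field: {"type": "text"},
                metadata_field: {"type": "text"},
            }
        },
    }
    field_mapping = {
        "vectorField": vector_field,
        "textField": text_field,
        "metadataField": metadata_field,
    }
    return index_body, field_mapping


# Usage:
# document, field_mapping = build_index_and_field_mapping()
# ...later: oss_configuration["fieldMapping"] = field_mapping
```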

## Creating the Knowledge Base

In [9]:
import boto3

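# Embedding model ARN: Amazon Titan Embeddings G1 - Text, which returns
# 1536-dimensional vectors (matching the index "dimension")
region = 'us-east-1'
model_arn = f'arn:aws:bedrock:{region}::foundation-model/amazon.titan-embed-text-v1'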
# IAM role ARN
sts = boto3.client('sts')
account = sts.get_caller_identity().get('Account')
role_arn = f'arn:aws:iam::{account}:role/AgentWorkshopStackKnowledgeBaseRole'
# OpenSearch Serverless Collection
oss = boto3.client('opensearchserverless')
oss_collection_name = 'bedrock-workshop-collection'
oss_index_name = 'workshop-kb-index'
oss_collection_arn = oss.list_collections(collectionFilters={'name': oss_collection_name}).get('collectionSummaries')[0]['arn']
oss_configuration = {
    "collectionArn": oss_collection_arn,
    "vectorIndexName": oss_index_name,
    "fieldMapping": {
        "vectorField": "vector",
        "textField": "text",
        "metadataField": "text-metadata"
    }
}

# Create Knowledge Base
kb_name = 'workshop-kb'
kb_desc = 'Bedrock workshop knowledge base.'
bedrock_agent_client = boto3.client('bedrock-agent')
create_kb_response = bedrock_agent_client.create_knowledge_base(
    name = kb_name,
    description = kb_desc,
    roleArn = role_arn,
    knowledgeBaseConfiguration = {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": model_arn
        }
    },
    storageConfiguration = {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": oss_configuration
    }
)

# Print the Knowledge Base ID
kb_id = create_kb_response["knowledgeBase"]["knowledgeBaseId"]
print('Knowledge Base ID: {0}'.format(kb_id))

Knowledge Base ID: I9YWNTL4EI
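Knowledge base creation is asynchronous: the returned knowledge base carries a `status` that should reach `ACTIVE` before the data source is attached. A minimal polling sketch follows; `wait_for_status` is a hypothetical helper that takes any zero-argument status function, so the same loop works for other Bedrock resources. The interval and timeout values are assumptions.

```python
import time


def wait_for_status(get_status, ready=("ACTIVE",), failed=("FAILED",),
                    interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll get_status() until it returns one of the `ready` statuses.

    Raises RuntimeError on a `failed` status and TimeoutError once
    `timeout` seconds have passed without reaching a ready status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ready:
            return status
        if status in failed:
            raise RuntimeError(f"Resource entered failed status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout} seconds")
        sleep(interval)


# Usage with the client from the cell above:
# wait_for_status(lambda: bedrock_agent_client.get_knowledge_base(
#     knowledgeBaseId=kb_id)["knowledgeBase"]["status"])
```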


In [None]:
def save_kb_id(kb_id, path='kb_id.txt'):
    """Persist the Knowledge Base ID so the CustomerSupport notebook can reuse it."""
    with open(path, 'w') as f:
        f.write(kb_id)
    print(f"Saved Knowledge Base ID to {path}")

In [None]:
# Save Knowledge Base ID
save_kb_id(kb_id)
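On the reading side, a notebook that needs the ID can load it back from `kb_id.txt`. The helper below is a hypothetical counterpart to `save_kb_id`, sketched under the assumption that the file holds only the ID; it strips surrounding whitespace and refuses an empty file.

```python
from pathlib import Path


def load_kb_id(path="kb_id.txt"):
    """Read a Knowledge Base ID written by save_kb_id()."""
    kb_id = Path(path).read_text().strip()
    if not kb_id:
        raise ValueError(f"{path} is empty")
    return kb_id


# Usage in a later notebook:
# kb_id = load_kb_id()
```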

## Creating the Data Source

In [10]:
import os
import boto3

# S3 Configuration
s3_prefix = "KB"
s3_configuration = {
    "bucketArn": f"arn:aws:s3:::{os.environ['S3_BUCKET']}",
    "inclusionPrefixes": [s3_prefix]
}
# Chunking Configuration
chunking_configuration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 64, 
        "overlapPercentage": 20
    }
}

# Create Data Source
data_source_name = "kb-source"
data_source_desc = "knowledge base data source"
bedrock_agent_client = boto3.client('bedrock-agent')
create_ds_response = bedrock_agent_client.create_data_source(
    name = data_source_name,
    description = data_source_desc,
    knowledgeBaseId = kb_id,
    dataSourceConfiguration = {
        "type": "S3",
        "s3Configuration": s3_configuration
    },
    vectorIngestionConfiguration = {
        "chunkingConfiguration": chunking_configuration
    }
)

# Print out the data source ID
ds_id = create_ds_response["dataSource"]["dataSourceId"]
print('Data Source ID: {0}'.format(ds_id))

Data Source ID: TLHOH7CQXB
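With `FIXED_SIZE` chunking, each chunk holds at most `maxTokens` (64) tokens and consecutive chunks share `overlapPercentage` (20%) of that budget, roughly 12 tokens, so the window advances about 52 tokens at a time. The sketch below illustrates that arithmetic only. It is an approximation, not Bedrock's implementation: it splits on whitespace, whereas Bedrock counts tokens with its own tokenizer, and the rounding of the overlap is an assumption.

```python
def fixed_size_chunks(text, max_tokens=64, overlap_percentage=20):
    """Split text into overlapping fixed-size chunks of whitespace tokens.

    Approximates fixed-size chunking with overlap: each chunk has at most
    `max_tokens` tokens, and neighbouring chunks share
    floor(max_tokens * overlap_percentage / 100) tokens.
    """
    tokens = text.split()
    overlap = (max_tokens * overlap_percentage) // 100
    step = max_tokens - overlap
    if step <= 0:
        raise ValueError("overlap_percentage must be below 100")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this chunk already reaches the end of the text
    return chunks
```

For a 100-token document this yields two chunks, tokens 0 to 63 and tokens 52 to 99, overlapping on 12 tokens. Small chunks like these keep each retrieved passage focused but can split a numbered list across chunks.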


## Starting an ingestion job for the troubleshooting runbooks

In [12]:
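import boto3
import time

bedrock_agent_client = boto3.client('bedrock-agent')
ingestion_job_response = bedrock_agent_client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job = ingestion_job_response["ingestionJob"]

# Poll while the job is still running. Checking only `!= 'COMPLETE'` would
# loop forever if the job ends as FAILED or STOPPED.
while job['status'] in ('STARTING', 'IN_PROGRESS', 'STOPPING'):
    time.sleep(10)
    ingestion_job_response = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        ingestionJobId=job["ingestionJobId"])
    job = ingestion_job_response["ingestionJob"]
print(job['status'])

if job['status'] != 'COMPLETE':
    raise RuntimeError(f"Ingestion job ended with status {job['status']}: {job.get('failureReasons')}")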

COMPLETE
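The `retrieve` call in the next cell returns a `retrievalResults` list whose items carry `content.text`, `location.s3Location.uri`, and a relevance `score`, as its printed output shows. Printing the raw dicts is noisy; a hypothetical helper such as `summarize_results` can reduce each hit to a score, source, and snippet. The snippet length is an arbitrary assumption.

```python
def summarize_results(retrieval_results, max_chars=120):
    """Reduce Bedrock retrieve() results to (score, source_uri, snippet) tuples."""
    rows = []
    for item in retrieval_results:
        text = item.get("content", {}).get("text", "")
        source = item.get("location", {}).get("s3Location", {}).get("uri", "")
        rows.append((item.get("score"), source, text[:max_chars]))
    return rows


# Usage after the retrieve() cell:
# for score, source, snippet in summarize_results(documents["retrievalResults"]):
#     print(f"{score:.3f}  {source}\n    {snippet}\n")
```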


In [13]:
import boto3

query = "I'm getting authentication errors with the API. Can you help me troubleshoot?"
client = boto3.client('bedrock-agent-runtime')
documents = client.retrieve(
    retrievalQuery= {
        'text': query
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration= {
        'vectorSearchConfiguration': {
            'numberOfResults': 3
        }
    }
)
for item in documents["retrievalResults"]:
    print(item)
    print('')

{'content': {'text': 'Request a limit increase if needed Best Practice: Monitor your API usage trends and set up alerts before hitting limits.  Issue: API Authentication Errors Solution: Common authentication issues can be resolved by: 1. Verify API keys are valid and not expired 2.', 'type': 'TEXT'}, 'location': {'s3Location': {'uri': 's3://bedrock-workshop-5f387660/KB/dataset.txt'}, 'type': 'S3'}, 'metadata': {'x-amz-bedrock-kb-source-uri': 's3://bedrock-workshop-5f387660/KB/dataset.txt', 'x-amz-bedrock-kb-chunk-id': '1%3A0%3AOMNh0pcBWfWI1mlsDDL2', 'x-amz-bedrock-kb-data-source-id': 'TLHOH7CQXB'}, 'score': 0.5970031}

{'content': {'text': "Check if the API key has proper permissions 3. Ensure you're using HTTPS for all API calls 4. Validate the API endpoint region matches your configuration Best Practice: Rotate API keys regularly and never share them in code repositories.", 'type': 'TEXT'}, 'location': {'s3Location': {'uri': 's3://bedrock-workshop-5f387660/KB/dataset.txt'}, 'type': 