# Bedrock Knowledge Base via Your Own OpenSearch Database

If you have any questions, please feel free to contact Hao Huang (tonyhh@amazon.com, GAIIC).


### Step 1. Prepare the Python environment.

In [1]:
!pip install opensearch-py
!pip install requests-aws4auth

Collecting opensearch-py
  Downloading opensearch_py-2.4.2-py2.py3-none-any.whl.metadata (6.8 kB)
Downloading opensearch_py-2.4.2-py2.py3-none-any.whl (258 kB)
Installing collected packages: opensearch-py
Successfully installed opensearch-py-2.4.2


### Step 2. Authentication.

In [2]:
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3


aos_ssl_client = boto3.client('opensearchserverless', 'us-east-1')
service = 'aoss'
region = 'us-east-1'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)

### Step 3. Build an Amazon OpenSearch Serverless Vector Database

In [None]:
# Create Encryption Policy
response = aos_ssl_client.create_security_policy(
    description='Encryption policy for bedrock-knowledge-* collections',
    name='bedrock-policy-test-v1',
    policy="""
        {
            \"Rules\":[
                {
                    \"ResourceType\":\"collection\",
                    \"Resource\":[
                        \"collection\/bedrock-knowledge-*\"
                    ]
                }
            ],
            \"AWSOwnedKey\":true
        }
        """,
    type='encryption'
)
print('\nEncryption policy created:')
print(response)

In [None]:
# Create Network Policy
response = aos_ssl_client.create_security_policy(
    description='Network policy for bedrock collections',
    name='bedrock-policy-test-v1',
    policy="""
            [{
                \"Description\":\"Public access for bedrock-policy-test collection\",
                \"Rules\":[
                    {
                        \"ResourceType\":\"dashboard\",
                        \"Resource\":[\"collection\/bedrock-knowledge-*\"]
                    },
                    {
                        \"ResourceType\":\"collection\",
                        \"Resource\":[\"collection\/bedrock-knowledge-*\"]
                    }
                ],
                \"AllowFromPublic\":true
            }]
            """,
    type='network'
)
print('\nNetwork policy created:')
print(response)

In [None]:
# Create Data Access Policy
response = aos_ssl_client.create_access_policy(
    description='Data access policy for bedrock-policy-test collections',
    name='bedrock-policy-test-v1',
    policy="""
        [{
            \"Rules\":[
                {
                    \"Resource\":[
                        \"index\/bedrock-knowledge-*\/*\"
                    ],
                    \"Permission\":[
                        \"aoss:CreateIndex\",
                        \"aoss:DeleteIndex\",
                        \"aoss:UpdateIndex\",
                        \"aoss:DescribeIndex\",
                        \"aoss:ReadDocument\",
                        \"aoss:WriteDocument\"
                    ],
                    \"ResourceType\": \"index\"
                },
                {
                    \"Resource\":[
                        \"collection\/bedrock-knowledge-*\"
                    ],
                    \"Permission\":[
                        \"aoss:CreateCollectionItems\",
                        \"aoss:DeleteCollectionItems\",
                        \"aoss:UpdateCollectionItems\",
                        \"aoss:DescribeCollectionItems\"
                    ],
                    \"ResourceType\": \"collection\"
                }
            ],
            \"Principal\":[
                \"arn:aws:iam::{your-account-id}:role\/Admin\",
                \"arn:aws:iam::{your-account-id}:user\/rag-dxq\"
            ]
        }]
        """,
    type='data'
)
print('\nAccess policy created:')
print(response)
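
Hand-escaped JSON strings are easy to get wrong: a single missing comma between two principals is enough for the call to be rejected. As a minimal sketch, the same data access policy could be built as a Python structure and serialized with `json.dumps`. The helper name `build_data_access_policy` and the account ID `111122223333` are placeholders for illustration, not part of the original notebook.

```python
import json


def build_data_access_policy(collection_pattern, principals):
    """Return the data access policy document as a JSON string.

    Hypothetical helper: it mirrors the rules of the cell above but lets
    json.dumps take care of quoting, commas, and escaping.
    """
    policy = [{
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": [f"index/{collection_pattern}/*"],
                "Permission": [
                    "aoss:CreateIndex",
                    "aoss:DeleteIndex",
                    "aoss:UpdateIndex",
                    "aoss:DescribeIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument",
                ],
            },
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_pattern}"],
                "Permission": [
                    "aoss:CreateCollectionItems",
                    "aoss:DeleteCollectionItems",
                    "aoss:UpdateCollectionItems",
                    "aoss:DescribeCollectionItems",
                ],
            },
        ],
        "Principal": list(principals),
    }]
    return json.dumps(policy)


policy_json = build_data_access_policy(
    "bedrock-knowledge-*",
    [
        "arn:aws:iam::111122223333:role/Admin",   # placeholder account ID
        "arn:aws:iam::111122223333:user/rag-dxq",  # placeholder account ID
    ],
)
print(policy_json)
```

The resulting string can be passed as the `policy` argument of `create_access_policy` in place of the hand-written one.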

In [None]:
# Create Collection
response = aos_ssl_client.create_collection(
    name="bedrock-knowledge-base-invoice",
    type='VECTORSEARCH'
)
print(response)
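
Collection creation is asynchronous, so the endpoint is not usable until the collection becomes active. The following is a rough sketch of a polling loop built on `batch_get_collection`. The helper name `wait_for_collection`, the poll interval, and the timeout are assumptions chosen for illustration.

```python
import time


def wait_for_collection(aoss_client, name, poll_seconds=10, timeout_seconds=600):
    """Poll until the named collection is ACTIVE and return its details.

    Hypothetical helper: raises RuntimeError if creation fails and
    TimeoutError if the collection is still not active at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        details = aoss_client.batch_get_collection(names=[name])["collectionDetails"]
        # An empty list is treated as "not visible yet" and polled again.
        status = details[0]["status"] if details else "CREATING"
        if status == "ACTIVE":
            return details[0]
        if status == "FAILED":
            raise RuntimeError(f"Collection {name} failed to create")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Collection {name} is still {status}")
        time.sleep(poll_seconds)


# Usage with the client from Step 2 (commented out because it needs AWS access):
# collection = wait_for_collection(aos_ssl_client, "bedrock-knowledge-base-invoice")
# print(collection["collectionEndpoint"])
```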

### Step 4. Insert Knowledge into AOS

- Create the index in the OpenSearch console
    - Click `Serverless->Collections->bedrock-knowledge-base-test-v1`.
    - Click `Create vector index` at the top of the page.
    - Vector index name -> "bedrock-test-index-search-v1". You can set your own index name here.
    - Create:
        - Vector fields
            - "bedrock-knowledge-base-default-vector": `float` vector field, `1536` dimensions and `Cosine` distance type, for the Titan Embedding model
        - Metadata management
            - "id": string, `Filterable=True`
            - "AMAZON_BEDROCK_METADATA": string, `Filterable=False`
            - "AMAZON_BEDROCK_TEXT_CHUNK": string, `Filterable=True`
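
The console steps above could probably also be expressed as an index definition and created from code. The sketch below only builds the request body. The helper name `build_index_body` is hypothetical, and the `engine` and `space_type` values are assumptions meant to correspond to the console's "Cosine" choice. Please check them against what your collection and Bedrock Knowledge Base actually accept before relying on them.

```python
def build_index_body(vector_field="bedrock-knowledge-base-default-vector",
                     dimension=1536,
                     engine="nmslib",            # assumption, not from the notebook
                     space_type="cosinesimil"):  # assumption: the console's "Cosine"
    """Return a k-NN index definition with the fields listed above."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": engine,
                        "space_type": space_type,
                    },
                },
                # Filterable=True is modelled here as a plain text field,
                # Filterable=False as a text field that is not indexed.
                "id": {"type": "text"},
                "AMAZON_BEDROCK_METADATA": {"type": "text", "index": False},
                "AMAZON_BEDROCK_TEXT_CHUNK": {"type": "text"},
            }
        },
    }


index_body = build_index_body()
print(index_body)

# Once the OpenSearch `client` from the next cell exists, the index could be
# created like this (commented out because it needs a live collection):
# client.indices.create(index="bedrock-test-index-search-v1", body=index_body)
```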

In [None]:
# Get Collection Client
response = aos_ssl_client.batch_get_collection(names=["bedrock-knowledge-base-invoice"])
host = (response['collectionDetails'][0]['collectionEndpoint'])
final_host = host.replace("https://", "")
print(final_host)
client = OpenSearch(
    hosts=[{'host': final_host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

Use the following code to insert knowledge into AOS.

Please note:
 - Each document needs four fields: `vector`, `text`, `id`, and `metadata`.
 - The metadata must follow the format `{"source":"s3_path"}`. The Bedrock Knowledge Base test page uses it to show the reference (a link to the S3 path). If you don't follow this format, the page will raise an error.
 - You can use your own chunking strategy here.
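
As one possible example of a custom chunking strategy, the sketch below splits text into fixed-size character windows with a small overlap. The function name `chunk_text` and the default sizes are illustrative assumptions. Any other strategy (by sentence, by heading, by token count) would fit here just as well.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into character windows of chunk_size with the given overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each returned chunk can then be embedded and inserted as its own document.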

In [None]:
# Insert Knowledge

import json
import boto3
def create_vector_embedding_with_bedrock(text, s3_path, embedding_modelId='amazon.titan-embed-text-v1'):
    brt = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
    body = json.dumps({
        "inputText": text
    })

    accept = 'application/json'
    contentType = 'application/json'
    response = brt.invoke_model(body=body, modelId=embedding_modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    embedding = response_body['embedding']
    info = {
        "AMAZON_BEDROCK_METADATA": '{{"source":"{}"}}'.format(s3_path), 
        "AMAZON_BEDROCK_TEXT_CHUNK": text, 
        "bedrock-knowledge-base-default-vector": embedding, 
        "id": "0"
        }
    return info

insert_body = create_vector_embedding_with_bedrock(
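    text="Must the user organization in the online application match the one in the offline paper application? If I want to add a user organization later, how do I apply to add it?",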
    s3_path="s3://test"
    )

print(insert_body)
# Add a document to the index. The index name must match the vector index created above.
response = client.index(
    index='bedrock-knowledge-base-invoice',
    body=insert_body,
)
print('\nDocument added:')
print(response)
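
The cell above always writes `"id": "0"`, which is fine for a single document. When inserting several chunks, each document probably wants its own identifier. The following sketch builds one document per chunk with a UUID as `id` and serializes the metadata with `json.dumps`. The helper `build_documents` and the stand-in `fake_embed` function are hypothetical. In practice the embedding would come from the Bedrock call shown above.

```python
import json
import uuid


def build_documents(chunks, s3_path, embed_fn):
    """Return one document per chunk, using the field names from this tutorial."""
    metadata = json.dumps({"source": s3_path})
    documents = []
    for chunk in chunks:
        documents.append({
            "AMAZON_BEDROCK_METADATA": metadata,
            "AMAZON_BEDROCK_TEXT_CHUNK": chunk,
            "bedrock-knowledge-base-default-vector": embed_fn(chunk),
            "id": str(uuid.uuid4()),  # assumption: any unique string is acceptable
        })
    return documents


def fake_embed(text):
    """Stand-in embedder so the sketch runs without AWS access."""
    return [float(len(text))] * 4


documents = build_documents(["first chunk", "second chunk"], "s3://test", fake_embed)
print(documents)

# With the real client and embedder, each document could be indexed in turn:
# for doc in documents:
#     client.index(index="bedrock-knowledge-base-invoice", body=doc)
```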

### Step 5. Create a Bedrock Knowledge Base

Now we can go to the Bedrock console to create our Knowledge Base chatbot. Creating a knowledge base takes four steps:
- 1. Provide knowledge base details
    - Provide your own `Knowledge base name` and `Knowledge base description`.
    - Select or create an IAM service role. Make sure the role has AOS access permission. (I granted FullAccess.)
- 2. Set up data source
    - Provide a data source name.
    - Provide an S3 path (where your files are stored).
- 3. Configure vector store
    - Select `Choose a vector store you have created`.
    - Select `Vector engine for Amazon OpenSearch Serverless`.
    - Provide the `Collection ARN`, `Vector index name`, `Vector field`, `Text field`, and `Bedrock-managed metadata field`. If you follow this tutorial, the corresponding values are `Your ARN`, `bedrock-knowledge-base-test-v1`, `bedrock-knowledge-base-default-vector`, `AMAZON_BEDROCK_TEXT_CHUNK`, and `AMAZON_BEDROCK_METADATA`.
- 4. Review and create
    - **Before you create the knowledge base, add the Knowledge Base service role to your AOS data access policy so that the Knowledge Base can access your AOS collection.**
    - Create the knowledge base!
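
Adding the service role to the data access policy can be done in the console, or roughly as sketched below with `get_access_policy` and `update_access_policy`. The helper names and the role ARN in the usage comment are placeholders. The sketch assumes the policy document comes back either as a parsed list or as a JSON string and handles both.

```python
import json


def add_principal(policy_document, principal_arn):
    """Return a copy of the policy with principal_arn in every statement."""
    updated = json.loads(json.dumps(policy_document))  # cheap deep copy
    for statement in updated:
        principals = statement.setdefault("Principal", [])
        if principal_arn not in principals:
            principals.append(principal_arn)
    return updated


def grant_knowledge_base_access(aoss_client, policy_name, role_arn):
    """Add role_arn to the named data access policy and return the new document."""
    detail = aoss_client.get_access_policy(name=policy_name, type="data")["accessPolicyDetail"]
    current = detail["policy"]
    if isinstance(current, str):
        current = json.loads(current)
    updated = add_principal(current, role_arn)
    if updated != current:  # skip the update call when nothing changed
        aoss_client.update_access_policy(
            name=policy_name,
            type="data",
            policy=json.dumps(updated),
            policyVersion=detail["policyVersion"],
        )
    return updated


# Usage with the client from Step 2 (commented out because it needs AWS access;
# the role ARN is a placeholder):
# grant_knowledge_base_access(
#     aos_ssl_client,
#     "bedrock-policy-test-v1",
#     "arn:aws:iam::111122223333:role/YourKnowledgeBaseServiceRole",
# )
```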


### (Optional) Step 6. Test your Knowledge Base

You need to sync the data first, then test your knowledge base. Please note that if your S3 path contains files, they will be automatically split into chunks and inserted into the vector database. So, if you don't want to insert the same knowledge repeatedly (or you want to use your own chunking strategy), please provide an empty S3 folder.
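
Besides the console test page, the knowledge base could also be queried from code. The sketch below wraps the `retrieve` call of the `bedrock-agent-runtime` client and flattens the results. The helper name `retrieve_chunks` and the knowledge base ID in the usage comment are placeholders, and the result handling assumes S3-backed sources as used in this tutorial.

```python
def retrieve_chunks(agent_runtime_client, knowledge_base_id, question, top_k=3):
    """Query the knowledge base and return text, source URI, and score per hit."""
    response = agent_runtime_client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    results = []
    for item in response.get("retrievalResults", []):
        results.append({
            "text": item["content"]["text"],
            # The source comes from the {"source": "s3_path"} metadata.
            "source": item.get("location", {}).get("s3Location", {}).get("uri"),
            "score": item.get("score"),
        })
    return results


# Usage (commented out because it needs AWS access; the ID is a placeholder):
# import boto3
# runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# for hit in retrieve_chunks(runtime, "YOUR_KB_ID", "How do I add a user organization?"):
#     print(hit["score"], hit["source"], hit["text"][:80])
```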
