# Biomarker Database Analyst Agent
In this notebook, we create the biomarker database analyst sub-agent, which turns user questions into SQL queries against a Redshift database.

#### Confirm that the latest version of boto3 is installed

##### If it is not, run setup_environment.ipynb in the 0-Notebook-environment/ folder first

In [None]:
!pip freeze | grep boto3
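
The same check can be made from Python when a shell is not convenient. The sketch below compares the installed boto3 version against a minimum. The threshold `MINIMUM_BOTO3` and the helper names are assumptions for illustration; the notebook does not state a required version.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Turn '1.35.10' into (1, 35, 10); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Hypothetical threshold for illustration only; use the version that
# setup_environment.ipynb actually installs.
MINIMUM_BOTO3 = "1.35.0"

try:
    installed = version("boto3")
    status = "OK" if meets_minimum(installed, MINIMUM_BOTO3) else "too old"
    print(f"boto3 {installed}: {status}")
except PackageNotFoundError:
    print("boto3 is not installed")
```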

#### Load stored environment variables into the notebook

In [None]:
# Retrieve import path
%store -r IMPORTS_PATH

# Retrieve account info
%store -r account_id
%store -r region

# Retrieve the agent foundation model
%store -r agent_foundation_model

#### Run the shared imports file to bring libraries into the notebook

In [None]:
%run $IMPORTS_PATH

# Prerequisites

This notebook assumes that you have deployed the CloudFormation stack located at https://github.com/aws-samples/amazon-bedrock-agents-cancer-biomarker-discovery to your AWS account in workshop mode.

# Agent Creation
In this section, we create the sub-agent.

#### Define agent configuration

In [None]:
agent_name = 'Biomarker-database-analyst'
agent_description = "biomarker query engine with redshift"
agent_instruction = """
You are a medical research assistant AI specialized in generating SQL queries for a 
database containing medical biomarker information. Your primary task is to interpret user queries, 
generate appropriate SQL queries, and provide relevant medical insights based on the data. 
Use only the appropriate tools as required by the specific question. Follow these instructions carefully: 
1. Before generating any SQL query, use the /getschema tool to familiarize yourself with the database structure. 
This will ensure your queries are correctly formatted and target the appropriate columns. 
2. When generating an SQL query: a. Write the query as a single line, removing all newline ("\\n") characters. 
b. Column names should remain consistent, do not modify the column names in the generated SQL query. 
3. Before execution of a step, a. Evaluate the SQL query with the rationale of the specific step by 
using the /refinesql tool. Provide both the SQL query and a brief rationale for the specific step you're taking. 
Do not share the original user question with the tool. b. Only proceed to execute the query using the /queryredshift 
tool after receiving the evaluated and potentially optimized version from the /refinesql tool. 
c. If there is an explicit need for retrieving all the data in S3, avoid optimized query recommendations that 
aggregate the data. 4. When providing your response: a. Start with a brief summary of your understanding of 
the user's query. b. Explain the steps you're taking to address the query. c. Ask for clarifications from the 
user if required."""
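
Step 2a of the instruction asks the agent to write each query on a single line. As a rough illustration of what that normalization means, the sketch below collapses whitespace in a SQL string. The helper `to_single_line` and the table and column names are hypothetical and not part of the notebook or the deployed Lambda function.

```python
def to_single_line(sql: str) -> str:
    """Collapse every run of whitespace, including newlines, to one space."""
    return " ".join(sql.split())


# Hypothetical query; the real table and column names come from /getschema.
multi_line = """
SELECT smoking_status, COUNT(*)
FROM example_table
GROUP BY smoking_status
"""

print(to_single_line(multi_line))
```

This simple approach also collapses whitespace inside quoted string literals, so it is only suitable for queries where that does not matter.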

#### Instantiate agent with the desired configuration

In [None]:
agents = AgentsForAmazonBedrock()

redshift_agent = agents.create_agent(
    agent_name,
    agent_description,
    agent_instruction,
    agent_foundation_model,
    code_interpretation=False,
    verbose=False
)

redshift_agent

#### Extract useful agent information

In [None]:
redshift_agent_id = redshift_agent[0]
redshift_agent_arn = f"arn:aws:bedrock:{region}:{account_id}:agent/{redshift_agent_id}"

redshift_agent_id, redshift_agent_arn
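
The agent ARN above is assembled by hand from the region, account ID, and agent ID. A small sketch of the same construction, with a parser for sanity checks, is shown here. The function names and the example values are placeholders, not values from your account.

```python
def bedrock_agent_arn(region: str, account_id: str, agent_id: str) -> str:
    """Build an agent ARN in the same format the notebook uses."""
    return f"arn:aws:bedrock:{region}:{account_id}:agent/{agent_id}"


def parse_arn(arn: str) -> dict:
    """Split an ARN into its six colon-separated fields."""
    # maxsplit=5 keeps any colons inside the resource field intact.
    fields = arn.split(":", 5)
    if len(fields) != 6 or fields[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, fields))


# Placeholder values for illustration.
example_arn = bedrock_agent_arn("us-west-2", "123456789012", "ABCDEFGHIJ")
print(parse_arn(example_arn))
```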

#### Define the API Schema needed for an ActionGroup

In [None]:
api_schema_string = '''{
  "openapi": "3.0.1",
  "info": {
    "title": "Database schema look up and query APIs",
    "version": "1.0.0",
    "description": "APIs for looking up database table schemas and making queries to database tables."
  },
  "paths": {
    "/getschema": {
      "get": {
        "summary": "Get a list of all columns in the redshift database",
        "description": "Get the list of all columns in the redshift database table. Return all the column information in database table.",
        "operationId": "getschema",
        "responses": {
          "200": {
            "description": "Gets the list of table names and their schemas in the database",
            "content": {
              "application/json": {
                "schema": {
                  "type": "array",
                  "items": {
                    "type": "object",
                    "properties": {
                      "Table": {
                        "type": "string",
                        "description": "The name of the table in the database."
                      },
                      "Schema": {
                        "type": "string",
                        "description": "The schema of the table in the database. Contains all columns needed for making queries."
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "/queryredshift": {
      "get": {
        "summary": "API to send query to the redshift database table",
        "description": "Send a query to the database table to retrieve information pertaining to the users question. The API takes in only one SQL query at a time, sends the SQL statement and returns the query results from the table. This API should be called for each SQL query to a database table.",
        "operationId": "queryredshift",
        "parameters": [
          {
            "name": "query",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "SQL statement to query database table."
          }
        ],
        "responses": {
          "200": {
            "description": "Query sent successfully",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "responseBody": {
                      "type": "string",
                      "description": "The query response from the database."
                    }
                  }
                }
              }
            }
          },
          "400": {
            "description": "Bad request. One or more required fields are missing or invalid."
          }
        }
      }
    },
    "/refinesql": {
      "get": {
        "summary": "Evaluate SQL query efficiency",
        "description": "Evaluate the efficiency of an SQL query based on the provided schema, query, and question.",
        "operationId": "refinesql",
        "parameters": [
          {
            "name": "sql",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "The SQL query to evaluate."
          },
          {
            "name": "question",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "The question related to the rationale of the specific step."
          }
        ],
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "evaluatedQuery": {
                      "type": "string",
                      "description": "The evaluated SQL query, or the original query if it is efficient."
                    }
                  }
                }
              }
            }
          },
          "400": {
            "description": "Bad request. One or more required fields are missing or invalid."
          }
        }
      }
    }
  }
}
'''

In [None]:
api_schema = {"payload": api_schema_string}
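
Because the schema is passed as a raw string, a typo in the JSON only surfaces when the action group is created. One way to catch problems earlier is to parse the string and list each operation with its required parameters. The sketch below does this on a trimmed sample; `summarize_operations` is a hypothetical helper and the sample is a shortened stand-in for the full schema.

```python
import json


def summarize_operations(schema_text: str) -> dict:
    """Map (METHOD, path) to the operationId and required parameter names."""
    spec = json.loads(schema_text)  # raises ValueError on malformed JSON
    summary = {}
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            required = [
                param["name"]
                for param in operation.get("parameters", [])
                if param.get("required")
            ]
            summary[(method.upper(), path)] = {
                "operationId": operation.get("operationId"),
                "required": required,
            }
    return summary


# Shortened sample with two of the three paths.
sample_schema = """{
  "openapi": "3.0.1",
  "info": {"title": "Sample", "version": "1.0.0"},
  "paths": {
    "/getschema": {
      "get": {"operationId": "getschema",
              "responses": {"200": {"description": "ok"}}}
    },
    "/refinesql": {
      "get": {
        "operationId": "refinesql",
        "parameters": [
          {"name": "sql", "in": "query", "required": true,
           "schema": {"type": "string"}},
          {"name": "question", "in": "query", "required": true,
           "schema": {"type": "string"}}
        ],
        "responses": {"200": {"description": "ok"}}
      }
    }
  }
}"""

for (method, path), info in summarize_operations(sample_schema).items():
    print(method, path, info)
```

In the notebook, calling `summarize_operations(api_schema_string)` should list `/getschema`, `/queryredshift`, and `/refinesql`.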

#### Attach Lambda function and create ActionGroup
Note: This uses the default Lambda function name "biomarker-agent-env1". The name may differ in your account, so confirm that the function exists; if it does not, change `redshift_lambda_function_name` in the code below.

In [None]:
redshift_lambda_function_name = "biomarker-agent-env1"  # Change if different in your account
redshift_lambda_function_arn = f"arn:aws:lambda:{region}:{account_id}:function:{redshift_lambda_function_name}"
%store redshift_lambda_function_arn
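
To confirm that the function exists before attaching it, a check along the following lines may help. It is a sketch: `lambda_function_exists` is a hypothetical helper that wraps the boto3 Lambda client's `get_function` call and treats `ResourceNotFoundException` as "missing".

```python
def lambda_function_exists(lambda_client, function_name: str) -> bool:
    """Return True if the Lambda function can be found in this account/region."""
    try:
        lambda_client.get_function(FunctionName=function_name)
        return True
    except lambda_client.exceptions.ResourceNotFoundException:
        return False


# Usage in the notebook (requires AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("lambda", region)
# print(lambda_function_exists(client, redshift_lambda_function_name))
```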

In [None]:
agents.add_action_group_with_lambda(
    agent_name=agent_name,
    lambda_function_name=redshift_lambda_function_name,
    source_code_file=redshift_lambda_function_arn,  # same ARN as built in the previous cell
    agent_action_group_name="sqlActionGroup",
    agent_action_group_description="Action for getting the database schema and querying the database",
    api_schema=api_schema,
    verbose=True
)
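
The Lambda function attached here is deployed by the CloudFormation stack, and its code is not shown in this notebook. To give a sense of how an action group backed by an OpenAPI schema reaches a function, the sketch below routes on `apiPath` and wraps the result in the response envelope that Bedrock agents expect, as far as I understand that contract. The handler name, placeholder return values, and error message are all assumptions; the deployed function will differ.

```python
import json


def get_schema() -> list:
    # Placeholder data; a real function would read table metadata from Redshift.
    return [{"Table": "example_table", "Schema": "example_column varchar"}]


def handle(event: dict) -> dict:
    """Route one action-group invocation to the matching operation."""
    api_path = event.get("apiPath")
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters") or []}

    status = 200
    if api_path == "/getschema":
        body = get_schema()
    elif api_path == "/queryredshift" and "query" in params:
        # Placeholder: echo the statement instead of running it.
        body = {"responseBody": f"would execute: {params['query']}"}
    elif api_path == "/refinesql" and {"sql", "question"}.issubset(params):
        # Placeholder: return the query unchanged, as if already efficient.
        body = {"evaluatedQuery": params["sql"]}
    else:
        status = 400
        body = {"error": "Unknown path or missing required parameter"}

    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


sample_event = {
    "actionGroup": "sqlActionGroup",
    "apiPath": "/refinesql",
    "httpMethod": "GET",
    "parameters": [
        {"name": "sql", "type": "string", "value": "SELECT 1"},
        {"name": "question", "type": "string", "value": "Sanity check"},
    ],
}
print(json.dumps(handle(sample_event), indent=2))
```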

#### Add a resource-based policy to the Lambda function so the agent can invoke it

In [None]:
lambda_client = boto3.client('lambda', region)

# Statement ID for the permission; re-running this cell reuses the same ID
statement_id = "AllowRedshiftAgentAccess"

try:
    # add_permission appends a statement to the function's resource-based
    # policy (creating the policy if none exists), so the policy document
    # does not need to be fetched and edited by hand.
    response = lambda_client.add_permission(
        FunctionName=redshift_lambda_function_arn,
        StatementId=statement_id,
        Action="lambda:InvokeFunction",
        Principal="bedrock.amazonaws.com",
        SourceArn=redshift_agent_arn
    )
    print("Resource policy added successfully.")
    print("Response:", response)
except lambda_client.exceptions.ResourceConflictException:
    # Raised when a statement with this ID already exists, e.g. on a re-run.
    # If the agent was recreated, remove the old statement and run this cell again.
    print(f"Statement '{statement_id}' already exists; no changes made.")
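
After the permission call has run, the policy document returned by `lambda_client.get_policy(...)['Policy']` can be inspected to confirm that the agent is allowed to invoke the function. The sketch below checks a policy JSON string for a matching statement. `grants_invoke_to` is a hypothetical helper, the sample policy uses placeholder ARNs, and the exact condition operator Lambda writes is an assumption, so the match on the condition is deliberately loose.

```python
import json


def grants_invoke_to(policy_text: str, source_arn: str) -> bool:
    """Return True if any Allow statement permits InvokeFunction for source_arn."""
    policy = json.loads(policy_text)
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") != "Allow":
            continue
        if "lambda:InvokeFunction" not in actions:
            continue
        # Accept any condition operator (ArnLike, ArnEquals, ...) and ignore
        # the letter case of the condition key.
        for operands in statement.get("Condition", {}).values():
            for key, value in operands.items():
                if key.lower() == "aws:sourcearn" and value == source_arn:
                    return True
    return False


# Placeholder ARNs for illustration.
agent_arn = "arn:aws:bedrock:us-west-2:123456789012:agent/ABCDEFGHIJ"
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowRedshiftAgentAccess",
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:us-west-2:123456789012:function:biomarker-agent-env1",
        "Condition": {"ArnLike": {"AWS:SourceArn": agent_arn}},
    }],
})

print(grants_invoke_to(sample_policy, agent_arn))
```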

#### Invoke the Redshift agent through its test alias to confirm that it answers questions correctly

In [None]:
%%time

bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region)

session_id: str = str(uuid.uuid1())

query = "How many patients are current smokers?"
response = bedrock_agent_runtime_client.invoke_agent(
      inputText=query,
      agentId=redshift_agent_id,
      agentAliasId="TSTALIASID", 
      sessionId=session_id,
      enableTrace=True, 
      endSession=False,
      sessionState={}
)

print("Raw invoke_agent response:\n{}".format(response))
print("====================")
print("Agent processing query now")
print("====================")

# Initialize an empty string to store the answer
answer = ""

# Iterate through the event stream
for event in response['completion']:
    # Check if the event is a 'chunk' event
    if 'chunk' in event:
        chunk_obj = event['chunk']
        if 'bytes' in chunk_obj:
            # Decode the bytes and append to the answer
            chunk_data = chunk_obj['bytes'].decode('utf-8')
            answer += chunk_data

# Now 'answer' contains the full response from the agent
print("Agent Answer: {}".format(answer))
print("====================")
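
With `enableTrace=True`, the event stream carries trace events alongside the answer chunks, and the loop above discards them. The sketch below separates the two. `collect_completion` is a hypothetical helper, and the sample events are hand-written stand-ins for what `response['completion']` yields. It also joins the raw bytes before decoding, which avoids errors if a multi-byte character were ever split across chunks.

```python
def collect_completion(events):
    """Return (answer_text, trace_events) from an invoke_agent event stream."""
    chunks, traces = [], []
    for event in events:
        if "chunk" in event:
            chunks.append(event["chunk"].get("bytes", b""))
        elif "trace" in event:
            traces.append(event["trace"])
    # Decode once, after joining, so split multi-byte characters survive.
    return b"".join(chunks).decode("utf-8"), traces


# Hand-written sample events for illustration.
sample_events = [
    {"trace": {"trace": {"orchestrationTrace": {}}}},
    {"chunk": {"bytes": b"12 patients "}},
    {"chunk": {"bytes": b"are current smokers."}},
]

answer_text, trace_events = collect_completion(sample_events)
print(answer_text)
print(len(trace_events), "trace event(s)")
```

In the notebook, `collect_completion(response['completion'])` would replace the manual loop; the stream can only be consumed once.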

#### Now that the agent has been tested through direct invocation, create an alias for it

In [None]:
redshift_agent_alias_id, redshift_agent_alias_arn = agents.create_agent_alias(
    redshift_agent[0], 'v1'
)

%store redshift_agent_alias_arn
redshift_agent_alias_id, redshift_agent_alias_arn