<a href="https://colab.research.google.com/github/yongsa-nut/TU_CN409_GenAI_67_2/blob/main/SSAI_S5_LLM_demos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tokenizer Demos

In [None]:
!pip install transformers tiktoken

In [None]:
# Example adapted from Jay Alammar (https://www.youtube.com/watch?v=rT6wVLEDC_w)

from transformers import AutoTokenizer
import tiktoken

def show_tokens(text, model_name, print_token_ids=False):
    # List of OpenAI models
    openai_models = [
        "gpt-4", "gpt-3.5-turbo", "text-davinci-003", "text-davinci-002",
        "code-davinci-002", "code-davinci-001", "davinci", "curie", "babbage", "ada"
    ]

    if model_name.lower() in [model.lower() for model in openai_models]:
        # Use tiktoken for OpenAI models
        tokenizer = tiktoken.encoding_for_model(model_name)
        token_ids = tokenizer.encode(text)
    else:
        # Use a Hugging Face tokenizer for every other model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        token_ids = tokenizer.encode(text)

    # Print all tokens on one line, each highlighted with an ANSI background color
    for t in token_ids:
        if print_token_ids:
            print(t, end=" ")
        print('\x1b[0;30;47m' + tokenizer.decode([t]) + '\x1b[0m', end=' ')

    print('\n\n')

    # Print one token per line together with its token ID
    for t in token_ids:
        print(t, '\x1b[0;30;47m' + tokenizer.decode([t]) + '\x1b[0m')

# Example usage
text = """
English and CAPITALIZATION
iiiiiiiiiiii
ภาษาไทย คำไทย สวัสดี
show_tokens False None elif == >= else:
Two tab:"\t\t" Four space: "    "
12.0*50=600
"""

- You need a Hugging Face (HF) access token: https://huggingface.co/settings/tokens
- If you are using Google Colab, you can store the token as a Colab secret.
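
One way to read the token is to try the Colab secrets first and fall back to an environment variable when running elsewhere. This is a minimal sketch: the secret name `HF_TOKEN` and the helper `get_hf_token` are assumptions for illustration, not part of the original notebook.

```python
import os

def get_hf_token(secret_name="HF_TOKEN"):
    """Return the Hugging Face token, or None if it is not configured."""
    try:
        # This module is only importable inside Google Colab
        from google.colab import userdata
    except ImportError:
        return os.environ.get(secret_name)
    try:
        return userdata.get(secret_name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment
        return os.environ.get(secret_name)

token = get_hf_token()
print("Token found" if token else "No token configured")
```

In recent versions of `transformers`, the token can then be passed as `AutoTokenizer.from_pretrained(model_name, token=token)`.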

In [None]:
show_tokens(text,'bert-base-uncased')

In [None]:
show_tokens(text,'gpt2')

In [None]:
show_tokens(text,'gpt-4')

- Gemma models are gated: log in to Hugging Face and request access on the model page first (https://huggingface.co/google/gemma-2-9b).

- Gemma 2: https://huggingface.co/google/gemma-2-9b
- Gemma 3: https://huggingface.co/google/gemma-3-12b-it

In [None]:
!huggingface-cli login

In [None]:
show_tokens(text,'google/gemma-2-9b')

In [None]:
show_tokens(text,'google/gemma-3-12b-it')

# API Demo

In [1]:
from openai import OpenAI
import json
from google.colab import userdata

llm = OpenAI(
   api_key='YOUR_TYPHOON_API_KEY',   # Insert your API key here (never publish a real key)
   base_url='https://api.opentyphoon.ai/v1' # Use this url to access Typhoon models
)

In [4]:
response = llm.chat.completions.create(
    model="typhoon-v2-8b-instruct",         # Selected Typhoon model
    messages=[
        {
            "role": "user",
            "content": "Hello how are you doing today?"
        }
    ]
)
print(response.choices[0].message.content)

Hello! I'm doing well, thank you for asking. How about you?


In [5]:
client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=userdata.get('openrouter'),
)

Hello! How can I assist you today? 😊


In [None]:
prompt = "Hello!"

completion = client.chat.completions.create(
  model="qwen/qwen3-14b:free",
  messages=[
    {
      "role": "user",
      "content": prompt
    }
  ]
)
print(completion.choices[0].message.content)

In [None]:
# Basic chat loop
msg_history = []
system_prompt = "Be nice"
msg_history.append({'role':'system',
                    'content':system_prompt})
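
The cell above only prepares the message history. A minimal sketch of the loop itself could look like the following. The function `chat_loop`, its `send` callback, and the `quit`/`exit` commands are assumptions for illustration; an echo function stands in for the model so the example runs offline.

```python
def chat_loop(send, get_input=input, system_prompt="Be nice"):
    """Run a chat loop until the user types 'quit' or 'exit'.

    send: callable that takes the message history and returns the reply text.
    """
    history = [{"role": "system", "content": system_prompt}]
    while True:
        user_text = get_input("You: ")
        if user_text.strip().lower() in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user_text})
        reply = send(history)
        # Store the reply so the next turn sees the whole conversation
        history.append({"role": "assistant", "content": reply})
        print(f"Assistant: {reply}")
    return history


# Offline stand-in for the model: echoes the last user message
def echo_model(history):
    return f"You said: {history[-1]['content']}"

scripted_inputs = iter(["Hello!", "quit"])
history = chat_loop(echo_model, get_input=lambda _prompt: next(scripted_inputs))
```

To talk to a real model, `send` would call `client.chat.completions.create(...)` with `messages=history` and return `choices[0].message.content`.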


# Prompt Engineering Demo

In [14]:
def gen_response(user_prompt, system_prompt="", temperature=0):
  completion = client.chat.completions.create(
    model="qwen/qwen3-14b:free",
    temperature=temperature,  # Pass the sampling temperature through to the API
    messages=[ {"role": "system","content": system_prompt},
               {"role": "user","content": user_prompt} ]
  )
  return completion.choices[0].message.content

gen_response("Hello")

'Hello! How can I assist you today? 😊'

**Q1** Get the model to count to three

In [15]:
gen_response("Count from one to three")

'1, 2, 3.'

**Q2** Modify the system prompt to make **the model respond like it's a 3 year old child**.

In [19]:
system_prompt = ""
prompt = "How big is the sky?"
gen_response(prompt, system_prompt)

'The concept of "how big the sky is" depends on what you mean by "sky," as it can refer to different things:\n\n1. **The Atmosphere (Visible Sky):**  \n   - The *sky* you see (the blue dome over your head) is part of Earth\'s **atmosphere**, which extends about **100 km (62 miles)** above the surface.  \n   - This includes layers like the troposphere (weather), stratosphere (ozone layer), and exosphere (where the atmosphere merges into space). However, the atmosphere gradually thins and doesn\'t have a strict boundary—it becomes space at the **Kármán line (100 km)**, used as the starting point for space exploration.\n\n2. **The Celestial Sky (Space and Beyond):**  \n   - If you\'re thinking of the **celestial sky** (stars, planets, galaxies), that’s not limited to Earth’s atmosphere. It encompasses the **entire observable universe**, which is about **93 billion light-years in diameter** (and possibly infinite).  \n   - The "sky" you see at night is a **celestial sphere**—a conceptual m

**Q3** Modify the basketball player prompt so that the model doesn't equivocate at all and responds with ONLY the name of one specific player, with no other words or punctuation.

In [None]:
prompt = "Who is the best basketball player of all time?"
print(gen_response(prompt))

**Q4** Modify the prompt so that the model responds with as long a response as you can muster. The response should be over 800 words.

In [None]:
prompt = "Can you write me a story?"
response = gen_response(prompt)
print(f"Number of words = {len(response.split())}")
print(response)

**Q5**: In this exercise, we'll be instructing the model to sort emails into the following categories:

* (A) Pre-sale question
* (B) Broken or defective item
* (C) Billing question
* (D) Other (please explain)

For the first part of the exercise, change the prompt below to **make the model output the correct classification and ONLY the classification**. Your answer needs to include the letter (A - D) of the correct choice, with the parentheses, as well as the name of the category.

Extra: try giving the model examples.
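
To check whether a response follows the required format (letter in parentheses plus category name, and nothing else), a small checker can help. This is a sketch under assumptions: the helper `parse_classification` and the `CATEGORIES` mapping are hypothetical names, and the exact matching rules are one possible interpretation of the exercise.

```python
import re

CATEGORIES = {
    "A": "Pre-sale question",
    "B": "Broken or defective item",
    "C": "Billing question",
    "D": "Other",
}

def parse_classification(response):
    """Return the category letter if the response is a single valid label, else None."""
    match = re.fullmatch(r"\(([A-D])\)\s+(.+)", response.strip())
    if not match:
        return None
    letter, name = match.groups()
    # Accept the category name, optionally followed by an explanation (category D)
    if name.lower().startswith(CATEGORIES[letter].lower()):
        return letter
    return None

print(parse_classification("(B) Broken or defective item"))
print(parse_classification("I think this is a billing question."))
```

A response that wraps the label in extra sentences, or pairs a letter with the wrong category name, is rejected.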


In [None]:
#Modify this!
prompt = """Please classify this email as either green or blue: {EMAIL}
"""

# Hints:
## 1) How will the model know which categories you want to use?
## 2) Be sure to tell the model to include only the classification
## 3) Consider using prefill to force the model to respond with the classification immediately

email1 = "Hi -- My Mixmaster4000 is producing a strange noise when I operate it. It also smells a bit smoky and plasticky, like burning electronics.  I need a replacement."
email2 = "Can I use my Mixmaster 4000 to mix paint, or is it only meant for mixing food?"
email3 = "I HAVE BEEN WAITING 4 MONTHS FOR MY MONTHLY CHARGES TO END AFTER CANCELLING!!  WTF IS GOING ON???"
email4 = "How did I get here I am not good with computer.  Halp."

print(gen_response(prompt.format(EMAIL=email1))) # The answer is B
print(gen_response(prompt.format(EMAIL=email2))) # The answer is D or A
print(gen_response(prompt.format(EMAIL=email3))) # The answer is C
print(gen_response(prompt.format(EMAIL=email4))) # The answer is D

# Reasoning Demo

In [7]:
prompt = "What is 1+2+3+4?"

completion = client.chat.completions.create(
  model="qwen/qwen3-14b:free",
  messages=[
    {
      "role": "user",
      "content": prompt
    }
  ]
)
print(completion.choices[0].message.content)

The sum of 1 + 2 + 3 + 4 is calculated as follows:

1. **Direct Addition**:
   - Start with $1 + 2 = 3$.
   - Next, add $3 + 3 = 6$.
   - Finally, add $6 + 4 = 10$.

2. **Grouping Strategy**:
   - Pair $1 + 4 = 5$ and $2 + 3 = 5$.
   - Then, $5 + 5 = 10$.

3. **Arithmetic Series Formula**:
   - For consecutive integers from 1 to $n$, the sum is $\frac{n(n + 1)}{2}$.
   - Here, $n = 4$, so $\frac{4 \times 5}{2} = \frac{20}{2} = 10$.

All methods confirm the result. 

**Answer:** $1 + 2 + 3 + 4 = \boxed{10}$


In [8]:
completion

ChatCompletion(id='gen-1747159800-ZcZgMAuuXaP4wQrLT7UP', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The sum of 1 + 2 + 3 + 4 is calculated as follows:\n\n1. **Direct Addition**:\n   - Start with $1 + 2 = 3$.\n   - Next, add $3 + 3 = 6$.\n   - Finally, add $6 + 4 = 10$.\n\n2. **Grouping Strategy**:\n   - Pair $1 + 4 = 5$ and $2 + 3 = 5$.\n   - Then, $5 + 5 = 10$.\n\n3. **Arithmetic Series Formula**:\n   - For consecutive integers from 1 to $n$, the sum is $\\frac{n(n + 1)}{2}$.\n   - Here, $n = 4$, so $\\frac{4 \\times 5}{2} = \\frac{20}{2} = 10$.\n\nAll methods confirm the result. \n\n**Answer:** $1 + 2 + 3 + 4 = \\boxed{10}$', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning="Okay, I need to figure out what 1 + 2 + 3 + 4 is. Let me start by recalling how addition works. Adding numbers together means combining their quantities. So, starting with the first two numbers: 1
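
The printed object shows that the message carries a `reasoning` field next to `content`. That field is provider-specific, so a defensive way to read it is `getattr` with a default. The helper name `split_answer_and_reasoning` is hypothetical, and a `SimpleNamespace` stands in for `completion.choices[0].message` so the sketch runs offline.

```python
from types import SimpleNamespace

def split_answer_and_reasoning(message):
    """Return (answer, reasoning); reasoning is None if the provider omits it."""
    reasoning = getattr(message, "reasoning", None)
    return message.content, reasoning

# Stand-in for completion.choices[0].message
fake_message = SimpleNamespace(content="10", reasoning="1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10")
answer, reasoning = split_answer_and_reasoning(fake_message)
print("Answer:", answer)
print("Reasoning:", reasoning)
```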

# Tool Uses

## Example 1: Using a Calculator Tool

### Setup a simple calculator tool
- We will create a function that evaluates a simple arithmetic expression.
- This function is for demonstration only. Using `eval` on untrusted input is not recommended.
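
As an alternative to `eval`, the expression can be parsed with Python's `ast` module and evaluated by walking only the node types that are explicitly allowed. The following is a sketch, not the notebook's implementation: `safe_calculate` is a hypothetical name, and it supports only `+`, `-`, `*`, `/`, parentheses, and numeric literals.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node):
    """Recursively evaluate an AST node, rejecting anything but basic arithmetic."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("Unsupported expression")

def safe_calculate(expression):
    """Evaluate a basic arithmetic expression and return the result as a string."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_evaluate(tree))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError):
        return "Error: Invalid expression"

print(safe_calculate("1000 * 9343116"))
print(safe_calculate("__import__('os')"))
```

Function calls, names, and attribute access never reach an evaluator, because they are not in the allowed node types.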

In [47]:
import re

def calculate(expression):
    # Remove any non-digit or non-operator characters from the expression
    expression = re.sub(r'[^0-9+\-*/().]', '', expression)

    try:
        # Evaluate the expression using the built-in eval() function
        result = eval(expression)
        return str(result)
    except (SyntaxError, ZeroDivisionError, NameError, TypeError, OverflowError):
        return "Error: Invalid expression"

# Claude version
tools = [
    {
        "name": "calculator",
        "description": "A simple calculator that performs basic arithmetic operations.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The mathematical expression to evaluate (e.g., '2 + 3 * 4')."
                }
            },
            "required": ["expression"]
        }
    }
]

# OpenAI (Typhoon) version
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "A simple calculator that performs basic arithmetic operations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate (e.g., '2 + 3 * 4')."
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

In [48]:
messages = [
    {"role": "user", "content": "What is the result of 1,000 * 9,343,116?"},
    ]

response = llm.chat.completions.create(
    model="typhoon-v2-8b-instruct",     # Selected Typhoon model
    messages=messages,                  # Message history
    tools=tools,                        # Tools
)

llm_response = response.choices[0].message
print(llm_response)

ChatCompletionMessage(content='{\n"tool_calls": [\n    {"name": "calculator", "arguments": {"expression": "1000 * 9343116"}}\n]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='bb173d67-f58c-4b86-9ffb-875cb98d0e80', function=Function(arguments='{"expression": "1000 * 9343116"}', name='calculator'), type='function')])


In [49]:
response

ChatCompletion(id='ntweYC6-57nCBj-93f66574bbcb3231', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='{\n"tool_calls": [\n    {"name": "calculator", "arguments": {"expression": "1000 * 9343116"}}\n]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='bb173d67-f58c-4b86-9ffb-875cb98d0e80', function=Function(arguments='{"expression": "1000 * 9343116"}', name='calculator'), type='function')]))], created=1747183775, model='typhoon-v2-8b-instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=30, prompt_tokens=338, total_tokens=368, completion_tokens_details=None, prompt_tokens_details=None), prompt=[], duration=2.25516)

- Run the tool that the model requested

In [50]:
for tool_call in llm_response.tool_calls:

    # Obtain the name and arguments from tool call message
    name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    expression = arguments['expression']
    tool_result = calculate(expression)

    print(tool_result)

9343116000


- Create a tool result message

In [26]:
tool_result = {
    "role": "tool",
    # Convert to string
    "content": json.dumps({
        "name": name,
        "arguments": arguments,
        "results": tool_result
    })
}

# Add tool call and tool result to messages
messages.append(response.choices[0].message) # LLM response including tool call message
messages.append(tool_result)

In [27]:
messages

[{'role': 'user', 'content': 'What is the result of 1,000 * 9,343,116?'},
 ChatCompletionMessage(content='{\n"tool_calls": [\n    {"name": "calculator", "arguments": {"expression": "1000 * 9343116"}}\n]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='2d9274ae-706e-4122-8b7a-2019daeb4294', function=Function(arguments='{"expression": "1000 * 9343116"}', name='calculator'), type='function')]),
 {'role': 'tool',
  'content': '{"name": "calculator", "arguments": {"expression": "1000 * 9343116"}, "results": "9343116000"}'}]

In [30]:
final_response = llm.chat.completions.create(
    model="typhoon-v2-8b-instruct",     # Selected Typhoon model
    messages=messages,                  # Message history
    tools=tools,                        # Tool
)
print(final_response.choices[0].message.content)

The result of 1,000 * 9,343,116 is 9,343,116,000.


In [31]:
final_response

ChatCompletion(id='ntwUem5-57nCBj-93f6365eb89ca077', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The result of 1,000 * 9,343,116 is 9,343,116,000.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1747181846, model='typhoon-v2-8b-instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=25, prompt_tokens=411, total_tokens=436, completion_tokens_details=None, prompt_tokens_details=None), prompt=[], duration=1.91086)

## Example 2: A Customer Service Agent
- We will create a mock customer service agent that can look up customer information, retrieve order details, and cancel orders on behalf of the customer.

### Step 1: Simulate synthetic tool responses

In [51]:
def get_customer_info(customer_id):
    # Simulated customer data
    customers = {
        "C1": {"name": "John Doe", "email": "john@example.com", "phone": "123-456-7890"},
        "C2": {"name": "Jane Smith", "email": "jane@example.com", "phone": "987-654-3210"}
    }
    return customers.get(customer_id, "Customer not found")

def get_order_details(order_id):
    # Simulated order data
    orders = {
        "O1": {"id": "O1", "product": "Widget A", "quantity": 2, "price": 19.99, "status": "Shipped"},
        "O2": {"id": "O2", "product": "Gadget B", "quantity": 1, "price": 49.99, "status": "Processing"}
    }
    return orders.get(order_id, "Order not found")

def cancel_order(order_id):
    # Simulated order cancellation
    if order_id in ["O1", "O2"]:
        return True
    else:
        return False

### Step 2: Define the client-side tools

In [57]:
tools = [
    {
        "type": "function",
        "function":{
            "name": "get_customer_info",
            "description": "Retrieves customer information based on their customer ID. Returns the customer's name, email, and phone number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_id": {
                        "type": "string",
                        "description": "The unique identifier for the customer."
                    }
                },
                "required": ["customer_id"]
            }
        }
    },
    {
        "type": "function",
        "function":{
            "name": "get_order_details",
            "description": "Retrieves the details of a specific order based on the order ID. Returns the order ID, product name, quantity, price, and order status.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique identifier for the order."
                    }
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function":{
            "name": "cancel_order",
            "description": "Cancels an order based on the provided order ID. Returns a confirmation message if the cancellation is successful.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique identifier for the order to be cancelled."
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]
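
Each definition lists its `required` parameters, so the arguments produced by the model can be checked before a tool is actually called. A minimal sketch follows; the helper `missing_required_arguments` and the one-tool `example_tools` list are assumptions for illustration, written in the same shape as the definitions above.

```python
def missing_required_arguments(tool_definitions, tool_name, arguments):
    """Return the required parameter names that are absent from `arguments`."""
    for tool in tool_definitions:
        function = tool["function"]
        if function["name"] == tool_name:
            required = function["parameters"].get("required", [])
            return [name for name in required if name not in arguments]
    raise KeyError(f"Unknown tool: {tool_name}")

# A one-tool list in the same shape as the definitions above
example_tools = [{
    "type": "function",
    "function": {
        "name": "get_order_details",
        "description": "Retrieves the details of a specific order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

print(missing_required_arguments(example_tools, "get_order_details", {"order_id": "O2"}))
print(missing_required_arguments(example_tools, "get_order_details", {}))
```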

### Step 3: Process tool calls and return results

In [53]:
# Process a tool call made by the model and return the appropriate result
def process_tool_call(tool_name, tool_input):
    if tool_name == "get_customer_info":
        return get_customer_info(tool_input["customer_id"])
    elif tool_name == "get_order_details":
        return get_order_details(tool_input["order_id"])
    elif tool_name == "cancel_order":
        return cancel_order(tool_input["order_id"])

### Step 4: Interact with the chatbot

In [81]:
import json

def chatbot_interaction(user_message):
    print(f"\n{'='*50}\nUser Message: {user_message}\n{'='*50}")

    messages = [
        {"role": "system", "content": "You are a helpful customer support agent. You should use tools to answer questions."},
        {"role": "user", "content": user_message}
    ]
    # Get the initial response
    response = llm.chat.completions.create(
        model="typhoon-v2-8b-instruct",     # Selected Typhoon model
        messages=messages,                  # Message history
        tools=tools,                        # Tools
        temperature=0                       # Temperature 0 makes the answer mostly deterministic
    )
    llm_response = response.choices[0].message
    print("\nInitial Response:")
    print(f"Content: {llm_response}")

    # tool_calls is None when the model answers directly, so fall back to an empty list
    tool_calls = llm_response.tool_calls or []

    if tool_calls:
        # Add the LLM response (including the tool calls) to the history once
        messages.append(llm_response)

    for tool_call in tool_calls:

        # Obtain the name and arguments from the tool call message
        tool_name = tool_call.function.name
        tool_arguments = json.loads(tool_call.function.arguments)
        print(f"\nTool Used: {tool_name}")
        print("Tool arguments:")
        print(tool_arguments)

        tool_result = process_tool_call(tool_name, tool_arguments)
        print("\nTool Result:")
        print(tool_result)

        # Construct the tool result message
        tool_message = {
            "role": "tool",
            # Convert to string
            "content": json.dumps({
                "name": tool_name,
                "arguments": tool_arguments,
                "results": str(tool_result)
            })
        }

        # Add the tool result to the history
        messages.append(tool_message)

    if tool_calls:
        # Ask the model for an answer once all tool results are available
        response = llm.chat.completions.create(
            model="typhoon-v2-8b-instruct",     # Selected Typhoon model
            messages=messages,                  # Message history
            tools=tools,                        # Tools
            temperature=0
        )

        print("\nResponse:")
        print(f"Content: {response.choices[0]}")

    print("\nFinal Response: ")
    print(response.choices[0].message.content)

### Step 5: Test the chatbot

In [None]:
chatbot_interaction("Can you tell me the email address for customer C1?")

In [83]:
chatbot_interaction("What is the status of order O2?")


User Message: What is the status of order O2?

Initial Response:
Content: ChatCompletionMessage(content='{\n"tool_calls": [\n    {"name": "get_order_details", "arguments": {"order_id": "O2"}}\n]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='be34894f-f32d-401d-bf1e-be3856379e10', function=Function(arguments='{"order_id": "O2"}', name='get_order_details'), type='function')])

Tool Used: get_order_details
Tool arguments:
{'order_id': 'O2'}

Tool Result:
{'id': 'O2', 'product': 'Gadget B', 'quantity': 1, 'price': 49.99, 'status': 'Processing'}

Response:
Content: Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The status of order O2 is Processing.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))

Final Response: 
The status of order O2 is Processing.


In [None]:
chatbot_interaction("Please cancel order O1 for me.")


User Message: Please cancel order O1 for me.

Initial Response:
Content: ChatCompletionMessage(content='{\n"tool_calls": [\n    {"name": "cancel_order", "arguments": {"order_id": "O1"}}\n]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='be7be560-3cb3-4e94-9846-26f4d84a28de', function=Function(arguments='{"order_id": "O1"}', name='cancel_order'), type='function')])

Tool Used: cancel_order
Tool arguments:
{'order_id': 'O1'}

Tool Result:
True


# Agentic Demo

Adapted from https://github.com/disler/single-file-agents/blob/main/sfa_duckdb_anthropic_v2.py

In [None]:
!pip install anthropic

In [None]:
import duckdb
import subprocess
import json
from typing import List
from rich import print
from anthropic import Anthropic
from google.colab import userdata

In [None]:
#download the data
!wget https://github.com/disler/single-file-agents/raw/refs/heads/main/data/analytics.db

In [None]:
# Connect to the database
DB_PATH = '/content/analytics.db'
DB_CONN = duckdb.connect(DB_PATH)

## Database Agent

The agent has five tools (functions):
- `list_tables`: Returns a list of the available tables in the database
- `describe_table`: Returns schema info for the specified table
- `sample_table`: Returns sample rows from the specified table; always specify `row_sample_size`
- `run_test_sql_query`: Tests a SQL query and returns the results (only visible to the agent)
- `run_final_sql_query`: Runs the final validated SQL query and shows the results to the user

The agent keeps calling tools and inspecting their results until it produces an accurate query for the user.
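
The control flow described above can be sketched without any API calls. In this sketch the function `run_agent_loop`, the `max_steps` limit, and the scripted stand-ins for the model and the tools are all assumptions for illustration; they are not the notebook's actual agent code.

```python
def run_agent_loop(next_tool_call, execute_tool, max_steps=10):
    """Call tools until the final-query tool is used or max_steps is reached.

    next_tool_call: callable(history) -> (tool_name, tool_args); stands in for the LLM.
    execute_tool: callable(tool_name, tool_args) -> result.
    """
    history = []
    for _ in range(max_steps):
        tool_name, tool_args = next_tool_call(history)
        result = execute_tool(tool_name, tool_args)
        history.append({"tool": tool_name, "args": tool_args, "result": result})
        if tool_name == "run_final_sql_query":
            return result, history
    raise RuntimeError(f"No final query after {max_steps} steps")


# Offline stand-ins: a scripted "model" and fake tool results
script = [
    ("list_tables", {"reasoning": "See which tables exist"}),
    ("describe_table", {"reasoning": "Check the columns", "table_name": "User"}),
    ("run_final_sql_query", {"reasoning": "Answer the question",
                             "sql_query": "SELECT COUNT(*) FROM User"}),
]

def scripted_model(history):
    return script[len(history)]

def fake_execute(tool_name, tool_args):
    return f"result of {tool_name}"

final_result, steps = run_agent_loop(scripted_model, fake_execute)
print(final_result, len(steps))
```

The step limit keeps the loop from running forever if the model never calls the final tool.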

### `list_tables`

Returns a list of tables in the database. The agent uses this to discover available tables and make informed decisions.

In [None]:
def list_tables(reasoning: str) -> List[str]:
    """Returns a list of tables in the database.

    The agent uses this to discover available tables and make informed decisions.

    Args:
        reasoning: Explanation of why we're listing tables relative to user request

    Returns:
        List of table names as strings
    """
    try:
        # Use the global connection
        result = DB_CONN.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()

        # Extract table names from the result
        table_names = [row[0] for row in result]

        print(f"List Tables Tool - Reasoning: {reasoning}")
        return table_names
    except Exception as e:
        print(f"Error listing tables: {str(e)}")
        return []

In [None]:
list_tables('Test')

### `describe_table`

Returns schema information about the specified table.
The agent uses this to understand table structure and available columns.

In [None]:
def describe_table(reasoning: str, table_name: str) -> str:
    """Returns schema information about the specified table.

    The agent uses this to understand table structure and available columns.

    Args:
        reasoning: Explanation of why we're describing this table
        table_name: Name of table to describe

    Returns:
        String containing table schema information
    """
    try:
        # Use the global connection to execute DESCRIBE
        result = DB_CONN.execute(f"DESCRIBE {table_name}").fetchall()

        # Convert result to a string - DuckDB's DESCRIBE already returns
        # nicely formatted column information
        schema_info = "\n".join(str(row) for row in result)

        # Log the operation
        print(f"Describe Table Tool - Table: {table_name} - Reasoning: {reasoning}")
        return schema_info

    except Exception as e:
        print(f"Error describing table: {str(e)}")
        return ""

In [None]:
print(describe_table('Test','User'))

```
('id', 'UUID', 'YES', None, None, None)
 │      │      │      │     │     │
 │      │      │      │     │     └─ 6. Extra information (None here)
 │      │      │      │     └───── 5. Default value (None here)
 │      │      │      └─────────── 4. Key constraint (None here)
 │      │      └──────────────────── 3. Nullable status ('YES' means column can be NULL)
 │      └─────────────────────────── 2. Data type (UUID type)
 └────────────────────────────────── 1. Column name (id)
```
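
To make such a row easier to read in code, each value can be paired with a field name. This sketch assumes the column order of DuckDB's `DESCRIBE` output is `column_name`, `column_type`, `null`, `key`, `default`, `extra`; the helper `describe_row_to_dict` is a hypothetical name.

```python
DESCRIBE_FIELDS = ("column_name", "column_type", "null", "key", "default", "extra")

def describe_row_to_dict(row):
    """Pair each value in a DESCRIBE row with its field name."""
    if len(row) != len(DESCRIBE_FIELDS):
        raise ValueError(f"Expected {len(DESCRIBE_FIELDS)} values, got {len(row)}")
    return dict(zip(DESCRIBE_FIELDS, row))

row = ("id", "UUID", "YES", None, None, None)
print(describe_row_to_dict(row))
```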

### `sample_table`

Returns a sample of rows from the specified table.
The agent uses this to understand actual data content and patterns.

In [None]:
def sample_table(reasoning: str, table_name: str, row_sample_size: int) -> str:
    """Returns a sample of rows from the specified table.

    The agent uses this to understand actual data content and patterns.

    Args:
        reasoning: Explanation of why we're sampling this table
        table_name: Name of table to sample from
        row_sample_size: Number of rows to sample (aim for 3-5 rows)

    Returns:
        String containing sample rows in readable format
    """
    try:
        # Use the global connection to select sample rows
        result = DB_CONN.execute(
            f"SELECT * FROM {table_name} LIMIT {row_sample_size}"
        ).fetchall()

        # Get column names for context
        columns = DB_CONN.execute(
            f"SELECT column_name FROM information_schema.columns WHERE table_name = '{table_name}'"
        ).fetchall()
        column_names = [col[0] for col in columns]

        # Format the output with column names and data
        header = str(column_names)
        rows = [str(row) for row in result]

        sample_data = header + "\n" + "\n".join(rows)

        print(f"Sample Table Tool - Table: {table_name} - Rows: {row_sample_size} - Reasoning: {reasoning}")

        return sample_data

    except Exception as e:
        print(f"Error sampling table: {str(e)}")
        return ""

In [None]:
print(sample_table('Test','User',5))
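
`describe_table` and `sample_table` interpolate `table_name` directly into the SQL string, and that name is chosen by the model. One possible guard is to accept only plain identifiers before building the query. The helper `safe_table_name` and its accepted pattern are assumptions for illustration; table names with spaces or other special characters would be rejected by this sketch.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def safe_table_name(table_name):
    """Return the table name wrapped in double quotes, or raise ValueError."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"Invalid table name: {table_name!r}")
    return f'"{table_name}"'

print(safe_table_name("User"))
```

The returned value could then be used in place of the raw name, for example `f"SELECT * FROM {safe_table_name(table_name)} LIMIT {int(row_sample_size)}"`.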

### `run_test_sql_query`

Executes a test SQL query and returns results.

The agent uses this to validate queries before finalizing them. Results are only shown to the agent, not the user.

In [None]:
def run_test_sql_query(reasoning: str, sql_query: str) -> str:
    """Executes a test SQL query and returns results.

    The agent uses this to validate queries before finalizing them.
    Results are only shown to the agent, not the user.

    Args:
        reasoning: Explanation of why we're running this test query
        sql_query: The SQL query to test

    Returns:
        Query results as a string
    """
    try:
        # Use the global connection to execute the query
        result = DB_CONN.execute(sql_query).fetchall()

        # Convert result to a simple string representation
        # For educational purposes, keeping the output straightforward
        output = "\n".join(str(row) for row in result)

        print(f"Test Query Tool - Reasoning: {reasoning}")
        print(f"Query: {sql_query}")

        return output
    except Exception as e:
        print(f"Error running test query: {str(e)}")
        return str(e)

In [None]:
print(run_test_sql_query('Test','SELECT * FROM User WHERE AGE > 50'))

### `run_final_sql_query`

In [None]:
def run_final_sql_query(reasoning: str, sql_query: str) -> str:
    """Executes the final SQL query and returns results to user.

    This is the last tool call the agent should make after validating the query.

    Args:
        reasoning: Final explanation of how this query satisfies user request
        sql_query: The validated SQL query to run

    Returns:
        Query results as a string
    """
    try:
        # Use the global connection to execute the query
        result = DB_CONN.execute(sql_query).fetchall()

        # Convert result to a string - format is a simple representation of each row
        results_str = "\n".join(str(row) for row in result)

        # Use regular print with simple formatting
        print(f"Final Query Tool\nReasoning: {reasoning}\nQuery: {sql_query}")

        return results_str
    except Exception as e:
        print(f"Error running final query: {str(e)}")
        return str(e)

In [None]:
print(run_final_sql_query('Test','SELECT * FROM User WHERE AGE > 50'))

## Defining Tools

In [None]:
tools=[
        {
            "name": "list_tables",
            "description": "Returns a list of available tables in database",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reasoning": {
                        "type": "string",
                        "description": "Explanation for listing tables",
                    }
                },
                "required": ["reasoning"],
            },
        },
        {
            "name": "describe_table",
            "description": "Returns schema info for a specified table",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reasoning": {
                        "type": "string",
                        "description": "Why we need to describe this table",
                    },
                    "table_name": {
                        "type": "string",
                        "description": "Name of a table to describe",
                    },
                },
                "required": ["reasoning", "table_name"],
            },
        },
        {
            "name": "sample_table",
            "description": "Returns sample rows from specified table",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reasoning": {
                        "type": "string",
                        "description": "Why we need to sample this table",
                    },
                    "table_name": {
                        "type": "string",
                        "description": "Name of table to sample",
                    },
                    "row_sample_size": {
                        "type": "integer",
                        "description": "Number of rows to sample (aim for 3-5 rows)",
                    },
                },
                "required": ["reasoning", "table_name", "row_sample_size"],
            },
        },
        {
            "name": "run_test_sql_query",
            "description": "Tests a SQL query and returns results (only visible to agent)",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reasoning": {
                        "type": "string",
                        "description": "Why we're testing this specific query",
                    },
                    "sql_query": {
                        "type": "string",
                        "description": "The SQL query to test",
                    },
                },
                "required": ["reasoning", "sql_query"],
            },
        },
        {
            "name": "run_final_sql_query",
            "description": "Runs the final validated SQL query and shows results to user",
            "input_schema": {
                "type": "object",
                "properties": {
                    "reasoning": {
                        "type": "string",
                        "description": "Final explanation of how query satisfies user request",
                    },
                    "sql_query": {
                        "type": "string",
                        "description": "The validated SQL query to run",
                    },
                },
                "required": ["reasoning", "sql_query"],
            },
        },
    ]
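
Each tool's `input_schema` lists its `required` arguments, so a tool call can be checked before it is executed. The following is a minimal sketch of such a check, not part of the original notebook: `validate_tool_args`, `JSON_TYPES`, and `example_tools` are hypothetical names, and only the JSON types that appear in the definitions above are handled.

```python
# Hypothetical helper (not part of the notebook): checks a tool call's
# arguments against the "required" list and basic JSON types in input_schema.
JSON_TYPES = {"string": str, "integer": int, "object": dict}


def validate_tool_args(tool_defs, func_name, func_args):
    """Return a list of problems; an empty list means the call looks valid."""
    schema = next(
        (t["input_schema"] for t in tool_defs if t["name"] == func_name), None
    )
    if schema is None:
        return [f"unknown tool: {func_name}"]

    problems = []
    # Every name in "required" must be present in the arguments
    for key in schema.get("required", []):
        if key not in func_args:
            problems.append(f"missing required argument: {key}")
    # Every supplied argument must be declared and have the declared type
    for key, value in func_args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = JSON_TYPES.get(spec["type"])
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{key} should be of type {spec['type']}")
    return problems


# A one-tool list shaped like the entries above, so the sketch runs on its own
example_tools = [
    {
        "name": "sample_table",
        "description": "Returns sample rows from a specified table",
        "input_schema": {
            "type": "object",
            "properties": {
                "reasoning": {"type": "string"},
                "table_name": {"type": "string"},
                "row_sample_size": {"type": "integer"},
            },
            "required": ["reasoning", "table_name", "row_sample_size"],
        },
    }
]

# row_sample_size is missing here, so one problem is reported
print(validate_tool_args(
    example_tools,
    "sample_table",
    {"reasoning": "peek at data", "table_name": "users"},
))
```

Returning the problems as text, rather than raising, would make it easy to send them back to the model as an error `tool_result` so it can retry the call.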

In [None]:
# Helper: dispatch a tool call from the model to the matching Python function
def call_function(func_name, func_args):

    if func_name == "list_tables":
        result = list_tables(reasoning=func_args["reasoning"])
    elif func_name == "describe_table":
        result = describe_table(
            reasoning=func_args["reasoning"],
            table_name=func_args["table_name"],
        )
    elif func_name == "sample_table":
        result = sample_table(
            reasoning=func_args["reasoning"],
            table_name=func_args["table_name"],
            row_sample_size=func_args["row_sample_size"],
        )
    elif func_name == "run_test_sql_query":
        result = run_test_sql_query(
            reasoning=func_args["reasoning"],
            sql_query=func_args["sql_query"],
        )
    elif func_name == "run_final_sql_query":
        result = run_final_sql_query(
            reasoning=func_args["reasoning"],
            sql_query=func_args["sql_query"],
        )
    else:
        raise Exception(f"Unknown tool call: {func_name}")
    return result
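
The `if`/`elif` chain above could also be written as a lookup table. The sketch below is one possible alternative, not the notebook's method: `TOOL_REGISTRY` and `call_function_by_registry` are hypothetical names, and the two tool functions are stand-ins so the example runs on its own.

```python
# Stand-in tool functions so the sketch runs on its own; in the notebook the
# real list_tables / describe_table / ... defined earlier would be used.
def list_tables(reasoning):
    return ["users"]


def describe_table(reasoning, table_name):
    return f"schema of {table_name}"


# Hypothetical registry mapping tool names to the functions that implement them
TOOL_REGISTRY = {
    "list_tables": list_tables,
    "describe_table": describe_table,
}


def call_function_by_registry(func_name, func_args):
    """Look up the tool by name and pass the model's arguments as keywords."""
    if func_name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool call: {func_name}")
    return TOOL_REGISTRY[func_name](**func_args)


print(call_function_by_registry(
    "describe_table", {"reasoning": "check columns", "table_name": "users"}
))
```

One behavioral difference to keep in mind: `**func_args` raises a `TypeError` when the model omits an argument or sends an extra one, whereas the explicit version raises a `KeyError` for a missing argument and ignores extras.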

## Prompt

**Note**: When you call the Anthropic API with the `tools` parameter, the API constructs a special system prompt from the tool definitions, the tool configuration, and any user-specified system prompt. This constructed prompt instructs the model to use the specified tool(s) and provides the context the tools need to operate properly.

In [None]:
AGENT_PROMPT = """You are a world-class expert at crafting precise DuckDB SQL queries.
Your goal is to generate accurate queries that exactly match the user's data needs.

<instructions>
    - Use the provided tools to explore the database and construct the perfect query.
    - Start by listing tables to understand what's available.
    - Describe tables to understand their schema and columns.
    - Sample tables to see actual data patterns.
    - Test queries before finalizing them.
    - Only call run_final_sql_query when you're confident the query is perfect.
    - Be thorough but efficient with tool usage.
    - If you find your run_test_sql_query tool call returns an error or won't satisfy the user request, try to fix the query or try a different query.
    - Think step by step about what information you need.
    - Be sure to specify every parameter for each tool call.
    - Every tool call should have a reasoning parameter which gives you a place to explain why you are calling the tool.
</instructions>

<user-request>
    {{user_request}}
</user-request>
"""
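
The `{{user_request}}` marker in the prompt is a literal placeholder that the agent loop fills in with `str.replace`. The sketch below uses a shortened stand-in for the prompt (`TEMPLATE` is a hypothetical name) to show why `str.replace` is used rather than `str.format`, which treats doubled braces as escapes.

```python
# Shortened stand-in for AGENT_PROMPT, using the same {{user_request}} marker
TEMPLATE = """<user-request>
    {{user_request}}
</user-request>"""

user_query = "Show me all users with score above 80"

# Literal replacement: the marker, braces included, is swapped for the request
filled = TEMPLATE.replace("{{user_request}}", user_query)

# str.format reads "{{" and "}}" as escaped braces, so nothing is substituted
# and the unused keyword argument is silently ignored
not_filled = TEMPLATE.format(user_request=user_query)

print(filled)
print(not_filled)
```

With `str.replace`, braces that appear in the user's request are also left untouched, which would not be the case if the request were passed through a second round of formatting.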

## Main Agent Loop

Example query to test: `Show me all users with score above 80`

In [None]:
user_query = input('Enter query: ')
completed_prompt = AGENT_PROMPT.replace("{{user_request}}", user_query)
messages = [{"role": "user", "content": completed_prompt}]

max_iteration = 15
compute_iterations = 0
final_result = False

# Main agent loop
while not final_result:
    # Stop before starting another iteration once the budget is used up
    if compute_iterations >= max_iteration:
        print("Warning: Reached maximum compute loops without final query")
        break

    compute_iterations += 1
    print(f"\n=== Agent Loop {compute_iterations}/{max_iteration} ===")

    # Generate content with tool support
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=messages,
        tools=tools,
        tool_choice={"type": "any"}  # Always force a tool call
    )

    # Look for tool calls in the response (expecting ToolUseBlock objects)
    tool_calls = []
    for block in response.content:
        if hasattr(block, "type") and block.type == "tool_use":
            tool_calls.append(block)

    # Record the assistant turn once, even if it contains several tool calls
    messages.append({"role": "assistant", "content": response.content})

    for tool_call in tool_calls:
        tool_use_id = tool_call.id
        func_name = tool_call.name
        func_args = tool_call.input

        print(f"Tool Call: {func_name}({json.dumps(func_args)})")

        try:
            result = call_function(func_name, func_args)
            if func_name == "run_final_sql_query":
                print("\n\n***** Final Results *****\n")
                print(result)
                final_result = True
                break
            print(f"Tool Call Result: {func_name}(...) ->\n{result}")

            messages.append(
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": str(result),
                        }
                    ],
                }
            )

        except Exception as e:
            # Report the failure back to the model so it can correct the call
            error_msg = f"Error executing {func_name}: {e}"
            print(error_msg)
            # See info https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#troubleshooting-errors
            messages.append(
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": error_msg,
                            "is_error": True
                        }
                    ]
                }
            )
            continue