# Day 4: Agents and Tools

https://docs.google.com/document/d/1x8w85qr_ai-uodf2Q3kTPTaaehNloPwuKUAhf_Rsyi0/edit?tab=t.0

### 1. Tools and Agents

An agent is an LLM that can not only generate text but also invoke tools. Tools are external functions that the LLM can call to retrieve information, perform calculations, or take actions.

In [18]:
import os
import openai
from groq import Groq

# Alternative: Groq exposes an OpenAI-compatible endpoint, so the OpenAI client works too
# groq_client = openai.OpenAI(
#     api_key=os.environ.get("GROQ_API_KEY"),
#     base_url="https://api.groq.com/openai/v1",
# )

# Use the native Groq client
groq_client = Groq(
    api_key=os.environ.get("GROQ_API_KEY")
)

user_prompt = "I just discovered the course, can I join now?"

chat_messages = [
    {"role": "user", "content": user_prompt}
]

response = groq_client.chat.completions.create(
    messages=chat_messages,
    model="openai/gpt-oss-20b",
)

print(response.choices[0].message.content)

It sounds like you’re interested in joining a course you just found. I can definitely help you figure out the next steps! 

---

### 1️⃣ Which Course Are You Interested In?
- **Course Title / ID** (if you have it handy)
- **Instructor / Institution** (helps us locate the exact class)

### 2️⃣ Enrollment Window
- Most courses have a **fixed enrollment period** (e.g., a few weeks before the start date).
- If you’re within that window, you can usually register immediately.
- If the window has closed, we can:
  - **Add you to a waitlist** (if the platform offers it).
  - Suggest an **alternate session** that starts soon.
  - Check if there’s a **recorded version** you can access later.

### 3️⃣ Prerequisites & Materials
- Some courses require specific knowledge or pre‑work.  
  - If you’re unsure, let me know the course name and I’ll pull up the prerequisites list.
- Also, confirm whether you need to download any software, text, or other resources beforehand.

### 4️⃣ Payment & Confirmatio

With only the LLM, the response is generic and not really useful: the model knows nothing about this particular course.

But if we let it invoke a `search(query)` tool, the agent can give us a more useful answer.

Here's how the conversation would flow with our agent using the `search` tool:
- **User**: "I just discovered the course, can I join now?"
- **Agent** thinking: I can't answer this question, so I need to search for information about course enrollment and timing.
- **Tool call**: `search("course enrollment join registration deadline")`
- **Tool response**: (...search results...)
- **Agent response**: "Yes, you can still join the course even after the start date..."


### 2. Function Calling with OpenAI

In [33]:
from utils.ingest import read_repo_data
from minsearch import Index
from pprint import pprint

In [7]:
# DataTalksClub FAQ
dtc_faq = read_repo_data('DataTalksClub', 'faq')

de_dtc_faq = [d for d in dtc_faq if 'data-engineering' in d['filename']]

faq_index = Index(
    text_fields=["question", "content"],
    keyword_fields=[]
)

faq_index.fit(de_dtc_faq)

<minsearch.minsearch.Index at 0x774be739ec90>

In [8]:
def text_search(query):
    return faq_index.search(query, num_results=5)

In [19]:
text_search_tool = {
    "type": "function",
    "function": {
        "name": "text_search",
        "description": "Search the FAQ database.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query text to look up in the course FAQ."
                }
            },
            "required": ["query"],
            "additionalProperties": False
        }
    }
}
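
The `parameters` block above is a JSON Schema that tells the model which arguments it may send. Models occasionally return arguments that do not match, so it can help to check them before calling the function. Below is a minimal sketch of such a check. `validate_arguments` and `JSON_TYPES` are hypothetical names, not part of the Groq or OpenAI SDKs, and the check only covers top-level keys and a few basic types.

```python
# Hypothetical helper: a small subset of JSON Schema validation for tool arguments.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(arguments: dict, parameters: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = parameters.get("properties", {})

    # Every required argument must be present.
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")

    for name, value in arguments.items():
        if name not in properties:
            # Unknown keys are only a problem when the schema forbids them.
            if parameters.get("additionalProperties") is False:
                problems.append(f"unexpected argument: {name}")
            continue
        expected = JSON_TYPES.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {properties[name]['type']}")

    return problems


# Same shape as the "parameters" section of the tool definition.
parameters = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
    "additionalProperties": False,
}

print(validate_arguments({"query": "join course"}, parameters))
print(validate_arguments({"q": "join course"}, parameters))
```

The first call prints an empty list. The second reports both the missing `query` and the unexpected `q`.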

In [55]:
system_prompt = """
You are a helpful assistant for a course. 
"""

question = "I just discovered the course, can I join now?"

chat_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

response = groq_client.chat.completions.create(
    model='openai/gpt-oss-20b',
    messages=chat_messages,
    tools=[text_search_tool]
)

In [56]:
response_message = response.choices[0].message
tool_calls = response_message.tool_calls
print(tool_calls)

[ChatCompletionMessageToolCall(id='fc_06c903e3-6fb2-4235-8134-169675cda982', function=Function(arguments='{"query":"join course now enrollment deadline"}', name='text_search'), type='function')]


In [57]:
import json

tool_call = tool_calls[0]

arguments = json.loads(tool_call.function.arguments)
pprint(arguments)

{'query': 'join course now enrollment deadline'}


In [58]:
result = text_search(**arguments)

pprint(result)

[{'content': "Yes, even if you don't register, you're still eligible to submit "
             'the homework.\n'
             '\n'
             'Be aware, however, that there will be deadlines for turning in '
             "homeworks and the final projects. So don't leave everything for "
             'the last minute.',
  'filename': 'faq-main/_questions/data-engineering-zoomcamp/general/003_3f1424af17_course-can-i-still-join-the-course-after-the-start.md',
  'id': '3f1424af17',
  'question': 'Course: Can I still join the course after the start date?',
  'sort_order': 3},
 {'content': 'The next cohort starts January 13th, 2025. More info at '
             '[DTC](https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html).\n'
             '\n'
             '- Register before the course starts using this '
             '[link](https://airtable.com/shr6oVXeQvSI5HuWD).\n'
             '- Join the [course Telegram channel with '
             'announcements](https://t.me

In [59]:
tool_call_output = {
    "tool_call_id": tool_call.id,
    "role": "tool",
    "name": tool_call.function.name,
    "content": json.dumps(result), 
}

pprint(tool_call_output)

{'content': '[{"id": "3f1424af17", "question": "Course: Can I still join the '
            'course after the start date?", "sort_order": 3, "content": "Yes, '
            "even if you don't register, you're still eligible to submit the "
            'homework.\\n\\nBe aware, however, that there will be deadlines '
            "for turning in homeworks and the final projects. So don't leave "
            'everything for the last minute.", "filename": '
            '"faq-main/_questions/data-engineering-zoomcamp/general/003_3f1424af17_course-can-i-still-join-the-course-after-the-start.md"}, '
            '{"id": "9e508f2212", "question": "Course: When does the course '
            'start?", "sort_order": 1, "content": "The next cohort starts '
            'January 13th, 2025. More info at '
            '[DTC](https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html).\\n\\n- '
            'Register before the course starts using this '
            '[link](https://ai
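
The manual steps above (parse the arguments, call the function, wrap the result in a `tool` message) can be folded into one helper once there is more than one tool. The sketch below assumes a tool call object shaped like the one printed earlier (`id`, `function.name`, `function.arguments`). `run_tool_call`, `registry`, and `fake_search` are hypothetical names, and `SimpleNamespace` stands in for the SDK object so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def run_tool_call(tool_call, registry):
    """Execute one tool call and wrap the result as a `tool` message."""
    name = tool_call.function.name
    try:
        arguments = json.loads(tool_call.function.arguments)
        if name not in registry:
            raise ValueError(f"unknown tool: {name}")
        content = json.dumps(registry[name](**arguments))
    except Exception as error:
        # Report the failure to the model instead of crashing the conversation.
        content = json.dumps({"error": str(error)})

    return {
        "tool_call_id": tool_call.id,
        "role": "tool",
        "name": name,
        "content": content,
    }


def fake_search(query):
    # Stand-in for the real search function, used only for this illustration.
    return [{"question": "Can I still join?", "content": f"stub result for: {query}"}]


# Stand-in for the tool call object returned by the SDK.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="text_search", arguments='{"query": "join course"}'),
)

message = run_tool_call(fake_call, {"text_search": fake_search})
print(message["content"])
```

Returning the error as the tool output is a design choice: it gives the model a chance to retry with different arguments.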

In [60]:
chat_messages.append(response_message)
chat_messages.append(tool_call_output)

response = groq_client.chat.completions.create(
    model='openai/gpt-oss-20b',
    messages=chat_messages,
    tools=[text_search_tool]
)

print(response.choices[0].message.content)

Absolutely! You can join a cohort even after it has officially started.

### What that means
- **You’ll still get access to all the lectures, materials, and assignments** – the course runs continuously, so you can start learning at any time.
- **Homework and project deadlines stay the same** – you just need to keep up with the schedule. Don’t wait until the last minute; the course timeline is designed to keep everyone on track.
- **Community support is still there** – the Telegram channel, Slack workspace, and community forums stay open to new members, so you’ll never feel left out.

### How to get in
1. **Register** – use the same registration link as any new cohort:  
   [Register here](https://airtable.com/shr6oVXeQvSI5HuWD).  
2. **Join the community** –  
   - Telegram: [dezoomcamp](https://t.me/dezoomcamp)  
   - Slack: add yourself to the DataTalks.Club workspace and join the relevant channel.
3. **Set up your environment** – the FAQ “What can I do before the course starts?” lis
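
The example above makes exactly one tool call and then one final request. An agent usually repeats this until the model replies without asking for a tool. Here is a minimal sketch of that loop. To stay runnable without an API key it uses plain dictionaries and a scripted `fake_llm`; `run_agent`, `call_llm`, and the simplified message shape are assumptions for illustration, not the Groq SDK's types.

```python
import json


def run_agent(call_llm, tools, messages, max_steps=5):
    """Call the model repeatedly until it answers without requesting a tool."""
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append(reply)

        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # final answer

        # Run every requested tool and append its output to the conversation.
        for call in tool_calls:
            arguments = json.loads(call["arguments"])
            result = tools[call["name"]](**arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

    raise RuntimeError("agent did not finish within max_steps")


def fake_llm(messages):
    # Scripted stand-in for the model: request a search first, then answer.
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "Yes, you can still join."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_1", "name": "search", "arguments": '{"query": "join course"}'}
        ],
    }


def fake_search(query):
    return [{"content": "stub result"}]


messages = [{"role": "user", "content": "I just discovered the course, can I join now?"}]
answer = run_agent(fake_llm, {"search": fake_search}, messages)
print(answer)
```

With a real client, `call_llm` would wrap the `chat.completions.create` call, and the tool call fields would be read from `call.function.name` and `call.function.arguments` instead of dictionary keys. The `max_steps` limit guards against a model that keeps requesting tools.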

### 3. System Prompt: Instructions

We have two things here:

- `system_prompt` contains instructions for the LLM
- `question` ("user prompt") is the actual question or task

The system prompt is very important: it influences how the agent behaves. This is how we can control what the agent does and how it responds to user questions.

Usually, the more complete the instructions in the system prompt are, the better the results.


In [63]:
system_prompt = """
You are a helpful assistant for a course. 

Use the search tool to find relevant information from the course materials before answering questions.

If you can find specific information through search, use it to provide accurate answers.
If the search doesn't return relevant results, let the user know and provide general guidance.
"""


In [65]:
# Alternative: a system prompt that encourages multiple search queries
# system_prompt = """
# You are a helpful assistant for a course. 

# Always search for relevant information before answering. 
# If the first search doesn't give you enough information, try different search terms.

# Make multiple searches if needed to provide comprehensive answers.
# """


### 4. Pydantic AI

Working with function calls can get tricky: you have to check which function the model asked for, parse its arguments, send the results back to the LLM, and handle any follow-up calls. It’s easy to slip up in that process.

To simplify this, we’ll use a library designed for agentic workflows. There are several options, such as the OpenAI Agents SDK, LangChain, and Pydantic AI.

For this example, we’ll go with Pydantic AI because its API is straightforward and well-documented.

In [66]:
from typing import List, Any

def text_search(query: str) -> List[Any]:
    """
    Perform a text-based search on the FAQ index.

    Args:
        query (str): The search query string.

    Returns:
        List[Any]: A list of up to 5 search results returned by the FAQ index.
    """
    return faq_index.search(query, num_results=5)
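
The type hints and docstring matter here: agent libraries such as Pydantic AI read the function signature and docstring to build the tool definition, so the JSON schema no longer has to be written by hand. The sketch below illustrates the idea with the standard library only. It is not Pydantic AI's implementation; `build_tool_schema`, `PYTHON_TO_JSON`, and `example_search` are hypothetical names.

```python
import inspect
from typing import Any, List, get_type_hints

# Illustrative mapping from Python types to JSON Schema type names.
PYTHON_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func):
    """Derive a tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties = {}
    required = []

    for name, param in inspect.signature(func).parameters.items():
        # Fall back to "string" for types this sketch does not know.
        properties[name] = {"type": PYTHON_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)

    # Use the first paragraph of the docstring as the tool description.
    doc = inspect.getdoc(func) or ""
    description = doc.split("\n\n")[0].strip()

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
                "additionalProperties": False,
            },
        },
    }


def example_search(query: str) -> List[Any]:
    """
    Perform a text-based search on the FAQ index.

    Args:
        query (str): The search query string.
    """
    return []  # stub body; only the signature and docstring matter here


schema = build_tool_schema(example_search)
print(schema["function"]["description"])
print(schema["function"]["parameters"]["required"])
```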

In [69]:
from pydantic_ai import Agent

agent = Agent(
    name="faq_agent",
    instructions=system_prompt,
    tools=[text_search],
    model='groq:openai/gpt-oss-20b', # use Groq inference
)

In [70]:
question = "I just discovered the course, can I join now?"

result = await agent.run(user_prompt=question)

In [76]:
print(result.output)

Absolutely! You can still sign up—just make sure you register **before** the cohort actually starts.

Here’s what you need to do:

1. **Register**  
   Use the registration form linked in the FAQ:  
   https://airtable.com/shr6oVXeQvSI5HuWD  

2. **Get the course started**  
   * Join the official course Telegram channel for announcements:  
     https://t.me/dezoomcamp  
   * Join DataTalks.Club’s Slack workspace and hop into the appropriate channel (you’ll find the invite link after you register).  

3. **What to expect**  
   * The next cohort begins **January 13 , 2025** (you can check the exact date in the FAQ).  
   * Even if you sign up after the start date, you’re still eligible to submit all homework assignments—just keep an eye on the deadlines so you don’t miss any.  

4. **Before you start**  
   * Install the required tools (Google Cloud account & SDK, Python 3 via Anaconda, Terraform, Git).  
   * Review the prerequisites and syllabus to make sure you’re comfortable with 

In [77]:
result.new_messages()

[ModelRequest(parts=[UserPromptPart(content='I just discovered the course, can I join now?', timestamp=datetime.datetime(2025, 10, 2, 9, 56, 44, 973019, tzinfo=datetime.timezone.utc))], instructions="You are a helpful assistant for a course. \n\nUse the search tool to find relevant information from the course materials before answering questions.\n\nIf you can find specific information through search, use it to provide accurate answers.\nIf the search doesn't return relevant results, let the user know and provide general guidance."),
 ModelResponse(parts=[ThinkingPart(content='User: "I just discovered the course, can I join now?" Likely want to check enrollment period. Need to search.'), ToolCallPart(tool_name='text_search', args='{"query":"enrollment period join now course"}', tool_call_id='fc_fed211f5-9bf0-4b2f-9c4c-aee3d886f61d')], usage=RequestUsage(input_tokens=239, output_tokens=55), model_name='openai/gpt-oss-20b', timestamp=datetime.datetime(2025, 10, 2, 9, 56, 45, tzinfo=TzInf