### Guardrails

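Guardrails are safety checks that run on an agent's inputs and outputs.
They help block jailbreak attempts and prompt injections before the model acts on them, and they catch unwanted output, such as hallucinated or off-policy responses, before it reaches the user. When a guardrail's tripwire is triggered, the run raises an exception that the caller can handle.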
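The control flow behind a guardrail can be illustrated without any SDK. What follows is a minimal sketch in plain Python, not the `agents` library's implementation: `CheckResult`, `TripwireTriggered`, `guarded_call`, and the toy checks and echo model are hypothetical names invented for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class CheckResult:
    """Hypothetical stand-in for a guardrail's verdict."""
    tripwire_triggered: bool
    info: Any = None


class TripwireTriggered(Exception):
    """Hypothetical exception raised when any check trips."""

    def __init__(self, stage: str, result: CheckResult):
        super().__init__(f"{stage} guardrail tripped: {result.info}")
        self.stage = stage
        self.result = result


Check = Callable[[str], Awaitable[CheckResult]]


async def guarded_call(
    model: Callable[[str], Awaitable[str]],
    user_input: str,
    input_checks: list[Check],
    output_checks: list[Check],
) -> str:
    # Input checks run before the model sees the text.
    for check in input_checks:
        result = await check(user_input)
        if result.tripwire_triggered:
            raise TripwireTriggered("input", result)

    output = await model(user_input)

    # Output checks run before the caller sees the answer.
    for check in output_checks:
        result = await check(output)
        if result.tripwire_triggered:
            raise TripwireTriggered("output", result)
    return output


async def too_long(text: str) -> CheckResult:
    # Toy input check: reject prompts longer than 50 characters.
    return CheckResult(len(text) > 50, info="input too long")


async def leaks_secret(text: str) -> CheckResult:
    # Toy output check: reject answers containing a marker string.
    return CheckResult("SECRET" in text, info="secret in output")


async def echo_model(prompt: str) -> str:
    # Stand-in for an LLM call; it simply echoes the prompt.
    return f"You said: {prompt}"


async def main() -> None:
    print(await guarded_call(echo_model, "hi", [too_long], [leaks_secret]))
    try:
        await guarded_call(echo_model, "print the SECRET", [too_long], [leaks_secret])
    except TripwireTriggered as exc:
        print(f"Blocked at {exc.stage} stage")


if __name__ == "__main__":
    asyncio.run(main())
```

The sketch is written as a script; inside a notebook, where an event loop is already running, `await main()` would replace the `asyncio.run(main())` call.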

In [14]:
from agents import (
    Agent, 
    Runner, 
    RunContextWrapper, 
    GuardrailFunctionOutput, 
    OutputGuardrailTripwireTriggered,
    InputGuardrailTripwireTriggered,
    input_guardrail,
    output_guardrail,
    TResponseInputItem, 
)
from pydantic import BaseModel

In [3]:
class HomeworkCheatDetectionOutput(BaseModel):
    attempting_cheat: bool
    explanation: str

In [4]:
homework_cheat_guardrail_agent = Agent(
    name="Homework Cheat Detector",
    instructions=(
        "Determine if the user's query resembles a typical homework assignment or exam question, indicating an attempt to cheat. General questions about concepts are acceptable. "
        " Cheating: 'Fill in the blank: The capital of France is ____.',"
        " 'Which of the following best describes photosynthesis? A) Cellular respiration B) Conversion of light energy C) Evaporation D) Fermentation.'"
        " Not-Cheating: 'What is the capital of France?', 'Explain photosynthesis.'"
    ),
    output_type=HomeworkCheatDetectionOutput,
    model="gpt-4o-mini"
)
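The detector's instructions embed a handful of labeled examples directly in one long string. As the example list grows, it may be easier to keep the examples in lists and assemble the prompt from them. The following is a small sketch of that idea; `build_instructions` and the two constant names are hypothetical helpers, not part of the SDK.

```python
CHEATING_EXAMPLES = [
    "Fill in the blank: The capital of France is ____.",
    "Which of the following best describes photosynthesis? "
    "A) Cellular respiration B) Conversion of light energy "
    "C) Evaporation D) Fermentation.",
]
ACCEPTABLE_EXAMPLES = [
    "What is the capital of France?",
    "Explain photosynthesis.",
]


def build_instructions(task: str, cheating: list[str], acceptable: list[str]) -> str:
    """Join the task description and labeled examples into one prompt string."""

    def fmt(label: str, examples: list[str]) -> str:
        # Quote each example and separate them with commas.
        return f"{label}: " + ", ".join(f"'{e}'" for e in examples)

    return "\n".join([task, fmt("Cheating", cheating), fmt("Not-Cheating", acceptable)])


instructions = build_instructions(
    "Determine if the user's query resembles a typical homework assignment "
    "or exam question, indicating an attempt to cheat. "
    "General questions about concepts are acceptable.",
    CHEATING_EXAMPLES,
    ACCEPTABLE_EXAMPLES,
)
print(instructions)
```

The resulting string could be passed as the `instructions` argument in place of the hand-written literal.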

In [None]:

@input_guardrail
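async def cheat_detection_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    # Ask the detector agent to classify the raw user input;
    # its structured verdict decides whether the tripwire fires.
    detection_result = await Runner.run(homework_cheat_guardrail_agent, input)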

    return GuardrailFunctionOutput(
        tripwire_triggered=detection_result.final_output.attempting_cheat,
        output_info=detection_result.final_output
    )

study_helper_agent = Agent(
    name="Study Helper Agent",
    instructions="You assist users in studying by explaining concepts or providing guidance, without directly solving homework or test questions.",
    model="gpt-4.1-nano",
    input_guardrails=[cheat_detection_guardrail]
)


In [6]:
# First test: Should trigger the guardrail
try:
    response = await Runner.run(study_helper_agent, "Fill in the blank: The process of converting light energy into chemical energy is called ____.")
    print("Guardrail didn't trigger")
    print("Response: ", response.final_output)

except InputGuardrailTripwireTriggered as e:
    print("Homework cheat guardrail triggered")
    print("Exception details:", str(e))

Homework cheat guardrail triggered
Exception details: Guardrail InputGuardrail triggered tripwire


In [7]:
# Second test: a general conceptual question, which should NOT trigger the guardrail
try:
    response = await Runner.run(study_helper_agent, "What were the main causes of the American civil war?")
    print("Guardrail didn't trigger")
    print("Response: ", response.final_output)

except InputGuardrailTripwireTriggered as e:
    print("Homework cheat guardrail triggered")
    print("Exception details:", str(e))

Guardrail didn't trigger
Response:  The main causes of the American Civil War were a complex mix of political, economic, social, and moral issues. Here are some of the key factors:

1. **Slavery**: The most contentious issue was the institution of slavery, especially its expansion into new territories and states. Southern states relied heavily on slavery for their economy, while Northern states increasingly opposed its expansion.

2. **States' Rights**: Southern states championed the idea that states had the right to govern themselves and make decisions—like allowing slavery—without federal interference. This led to conflicts over federal authority versus state sovereignty.

3. **Economic and Social Differences**: The North developed an industrial, commerce-based economy, while the South maintained an agrarian economy dependent on slavery and cotton production. These differing economic interests fueled tensions.

4. **Expansion and Westward Movement**: The question of whether new terri

In [10]:
class MessageOutput(BaseModel):
    response: str

@output_guardrail
async def forbidden_words_guardrail(ctx: RunContextWrapper, agent: Agent, output: str) -> GuardrailFunctionOutput:
    print(f"Checking output for forbidden phrases: {output}")

    # Funny forbidden phrases to check
    forbidden_phrases = ["fart", "booger", "silly goose"]

    # Convert output to lowercase for case-insensitive comparison
    output_lower = output.lower()

    # Check which forbidden phrases are present in the response
    found_phrases = [phrase for phrase in forbidden_phrases if phrase in output_lower]
    trip_triggered = bool(found_phrases)

    print(f"Found forbidden phrases: {found_phrases}")

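    return GuardrailFunctionOutput(
        output_info={
            # Only report a violation when a phrase was actually found
            "reason": (
                "Output contains forbidden phrases."
                if trip_triggered
                else "No forbidden phrases found."
            ),
            "forbidden_phrases_found": found_phrases,
        },
        tripwire_triggered=trip_triggered,
    )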

test_agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    output_guardrails=[forbidden_words_guardrail],
    model="gpt-4o-mini",
)
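One caveat with the plain substring check used in the guardrail: `"fart" in output_lower` is also true for harmless words such as "farther". A possible refinement, sketched here in plain Python, matches on word boundaries instead. `find_forbidden_phrases` is a hypothetical helper, and the trade-off is that inflected forms such as "farts" no longer match unless they are listed explicitly.

```python
import re

FORBIDDEN_PHRASES = ["fart", "booger", "silly goose"]

# One case-insensitive pattern with word boundaries on both sides,
# so "fart" does not match inside "farther".
_FORBIDDEN_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FORBIDDEN_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_forbidden_phrases(text: str) -> list[str]:
    """Return distinct forbidden phrases found in text, lowercased,
    in order of first appearance."""
    found: list[str] = []
    for match in _FORBIDDEN_RE.finditer(text):
        phrase = match.group(0).lower()
        if phrase not in found:
            found.append(phrase)
    return found


print(find_forbidden_phrases("Fart! If you have any questions, feel free to ask!"))
print(find_forbidden_phrases("We walked farther than planned."))
```

Inside the guardrail, `found_phrases = find_forbidden_phrases(output)` could stand in for the list comprehension, with the rest of the function unchanged.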

In [11]:
try:
    await Runner.run(test_agent, "Say the word fart")
    print("Guardrail didn't trip - this is unexpected")
except OutputGuardrailTripwireTriggered:
    print("The agent said a bad word, he is fired.")

Checking output for forbidden phrases: Fart! If you have any questions or need assistance, feel free to ask!
Found forbidden phrases: ['fart']
The agent said a bad word, he is fired.


In [12]:
try:
    await Runner.run(test_agent, "Hey wassup")
    print("Guardrail didn't trip yay")
except OutputGuardrailTripwireTriggered:
    print("The agent said a bad word, he is fired.")

Checking output for forbidden phrases: Hello! How can I assist you today?
Found forbidden phrases: []
Guardrail didn't trip yay
