# Looping Agent Workflow
Author: [Zain Hasan](https://x.com/ZainHasan6)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/Looping_Agent_Workflow.ipynb)

## Introduction

In this notebook we'll build an agent workflow that repeatedly revises its response to a task until that response passes an automated review.

The workflow requires LLMs playing two different roles:

- **Generator LLM**: An LLM that generates possible solutions to a task.
- **Evaluator LLM**: An LLM that evaluates if the proposed solution meets certain criteria.

The Generator starts off the workflow by generating a response. The Evaluator then assesses the solution based on various criteria that we provide to it and either accepts or rejects the solution. If the solution is rejected, the Evaluator provides feedback and guidance on how the problem can be fixed. The Generator then receives this feedback and proposes a new solution. This loop of iterative feedback and improvement continues until the Evaluator approves the solution.
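The control flow described above can be sketched without any API calls. In this minimal sketch, `fake_generate` and `fake_evaluate` are hypothetical stand-ins for the two LLMs: the stub Evaluator rejects any solution that lacks a docstring, and the stub Generator adds one once it has received that feedback.

```python
from typing import Optional, Tuple

def fake_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical Generator stub: returns a better draft once it has feedback."""
    if feedback is None:
        return "def solve(): pass"
    return 'def solve():\n    """Solve the task."""\n    pass'

def fake_evaluate(solution: str) -> Tuple[str, str]:
    """Hypothetical Evaluator stub: passes only solutions that contain a docstring."""
    if '"""' in solution:
        return "PASS", ""
    return "NEEDS_IMPROVEMENT", "Add a docstring explaining what solve() does."

def run_loop(task: str) -> Tuple[str, int]:
    """Generate, evaluate, and regenerate until the Evaluator returns PASS."""
    feedback: Optional[str] = None
    attempts = 0
    while True:
        solution = fake_generate(task, feedback)
        attempts += 1
        status, feedback = fake_evaluate(solution)
        if status == "PASS":
            return solution, attempts

final, attempts = run_loop("Write solve()")
print(attempts)  # 2
```

The first draft is rejected, the feedback is fed back into the Generator, and the second draft passes, which ends the loop.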

<img src="../images/loop.png" width="700">

In this **looping agent workflow**, solutions improve incrementally through cycles of structured evaluation and refinement.

The workflow is built around a straightforward Generator LLM that freely produces solutions to the given task, paired with a constrained Evaluator LLM that assesses each solution against predefined criteria.

The Evaluator LLM outputs structured JSON verdicts containing a status (`PASS`, `NEEDS_IMPROVEMENT`, or `FAIL`) and actionable feedback. That feedback is passed back to the Generator LLM, which uses it to produce an improved solution.
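The notebook enforces this verdict format with a Pydantic model passed to `JSON_llm`. As a dependency-free illustration, the sketch below parses a verdict with the standard `json` module and checks it by hand. The reply string is made up, `parse_verdict` is a hypothetical helper, and the three allowed statuses mirror the `Evaluation` schema defined inside `loop_workflow`.

```python
import json

# Statuses allowed by the Evaluation schema used later in this notebook
ALLOWED_STATUSES = {"PASS", "NEEDS_IMPROVEMENT", "FAIL"}

def parse_verdict(raw: str) -> dict:
    """Parse an Evaluator reply and check it has a valid status and a feedback string."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("Verdict must be a JSON object")
    if verdict.get("evaluation") not in ALLOWED_STATUSES:
        raise ValueError(f"Unexpected evaluation status: {verdict.get('evaluation')!r}")
    if not isinstance(verdict.get("feedback"), str):
        raise ValueError("Missing or non-string 'feedback' field")
    return verdict

# A hypothetical Evaluator reply
reply = '{"evaluation": "NEEDS_IMPROVEMENT", "feedback": "getMin() should raise on an empty stack."}'
verdict = parse_verdict(reply)
print(verdict["evaluation"])  # NEEDS_IMPROVEMENT
```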

Now let's see the implementation of this workflow.

### Setup and Utils

In [None]:
# Install libraries
!pip install -qU pydantic together

In [None]:
# Import libraries
import json
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError
from together import Together

TOGETHER_API_KEY = "-- TOGETHER API KEY --"

client = Together(api_key=TOGETHER_API_KEY)

In [2]:
# Simple LLM call helper function - will be used by the Generator
def run_llm(user_prompt: str, model: str, system_prompt: Optional[str] = None) -> str:
    """Run the language model with the given user prompt and system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    messages.append({"role": "user", "content": user_prompt})
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=4000,        
    )

    return response.choices[0].message.content

# Simple JSON mode LLM call helper function - will be used by the Evaluator
def JSON_llm(user_prompt: str, schema: type[BaseModel], system_prompt: Optional[str] = None) -> dict:
    """Run a language model with the given user prompt and system prompt, and return a JSON object validated against `schema`."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": user_prompt})

    extract = client.chat.completions.create(
        messages=messages,
        model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
        response_format={
            "type": "json_object",
            "schema": schema.model_json_schema(),
        },
    )

    try:
        # Parse the raw JSON and check it against the Pydantic schema;
        # malformed JSON and schema mismatches both raise ValidationError
        parsed = schema.model_validate_json(extract.choices[0].message.content)
    except ValidationError as e:
        raise ValueError(f"Schema validation failed: {str(e)}")

    return parsed.model_dump()
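
Both helpers build the same chat-format message list before calling the chat completions endpoint: an optional system message followed by the user message. A shared function like the hypothetical `build_messages` below could remove that duplication; it is a sketch, not part of the `together` library, and the printed list shows the shape that is sent as `messages`.

```python
from typing import Dict, List, Optional

def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build the chat message list the same way run_llm and JSON_llm do."""
    messages = []
    if system_prompt:
        # The system message, if any, always comes first
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages

print(build_messages("Implement a stack", system_prompt="You are a careful programmer."))
```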

### Agent Workflow

#### Generator
The Generator is prompted to reason step by step (chain of thought) before answering, and is told to take any feedback on its previous attempts into account.
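On every attempt after the first, the Generator's prompt contains all previous responses plus the Evaluator's latest feedback. The hypothetical helper `build_generator_prompt` below reproduces the string assembly done in `generate` and `loop_workflow`, so you can inspect the exact prompt without calling a model; the sample inputs are made up.

```python
from typing import List

def build_generator_prompt(task: str, generator_prompt: str, memory: List[str], feedback: str = "") -> str:
    """Mirror how loop_workflow and generate assemble the Generator's prompt."""
    if not memory:
        # First attempt: instructions followed by the task
        return f"{generator_prompt}\nTask: {task}"
    # Later attempts: every previous response plus the latest feedback
    context = "\n".join([
        "Previous attempts:",
        *[f"- {m}" for m in memory],
        f"\nFeedback: {feedback}",
    ])
    return f"{generator_prompt}\n{context}\nTask: {task}"

prompt = build_generator_prompt(
    task="Implement getMin()",
    generator_prompt="Solve the task.",
    memory=["def getMin(self): return min(self.stack)"],
    feedback="min() is O(n); track the minimum instead.",
)
print(prompt)
```

Because every previous attempt is included, the prompt grows with each iteration, which is worth keeping in mind for long-running loops.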

In [4]:
GENERATOR_PROMPT = """
Your goal is to complete the task provided below. If there is feedback
from your previous generations, you should reflect on it to improve your solution.

Output your answer concisely in the following format: 

Thoughts:
[Your understanding of the task and feedback and how you plan to improve]

Response:
[Your code implementation here]
"""

def generate(task: str, generator_prompt: str, context: str = "") -> str:
    """Generate and improve a solution based on feedback."""
    full_prompt = f"{generator_prompt}\n{context}\nTask: {task}" if context else f"{generator_prompt}\nTask: {task}"

    response = run_llm(full_prompt, model="Qwen/Qwen2.5-Coder-32B-Instruct")
    
    print("\n=== GENERATION START ===")
    print(f"Output:\n{response}\n")
    print("=== GENERATION END ===\n")
    
    return response

#### Evaluator
The Evaluator is given the criteria it must check, along with a `schema` that controls the structure of its JSON output.

In [5]:
EVALUATOR_PROMPT = """
Evaluate the following code implementation for:
1. code correctness
2. time complexity
3. style and best practices

You should be evaluating only and not attempting to solve the task.

Only output "PASS" if all criteria are met and you have no further suggestions for improvements.

Provide detailed feedback if there are areas that need improvement. You should specify what needs improvement and why.

Only output JSON.
"""

def evaluate(task: str, evaluator_prompt: str, generated_content: str, schema: type[BaseModel]) -> tuple[str, str]:
    """Evaluate if a solution meets requirements."""
    full_prompt = f"{evaluator_prompt}\nOriginal task: {task}\nContent to evaluate: {generated_content}"
    
    response = JSON_llm(full_prompt, schema)
    
    evaluation = response["evaluation"]
    feedback = response["feedback"]

    print("=== EVALUATION START ===")
    print(f"Status: {evaluation}")
    print(f"Feedback: {feedback}")
    print("=== EVALUATION END ===\n")

    return evaluation, feedback

#### Looping Workflow
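As written, `loop_workflow` below has no upper bound: if the Evaluator never returns `PASS`, it keeps calling both models indefinitely. One common safeguard is an iteration budget. The sketch below is a simplified variant, not the notebook's implementation: `bounded_loop` and `max_iterations` are hypothetical names, the generate and evaluate steps are passed in as plain callables so the loop can run without an API key, and only the latest feedback (rather than the full history of attempts) is handed back to the generator.

```python
from typing import Callable, List, Optional, Tuple

def bounded_loop(
    generate_fn: Callable[[Optional[str]], str],
    evaluate_fn: Callable[[str], Tuple[str, str]],
    max_iterations: int = 5,
) -> Tuple[str, bool, List[str]]:
    """Run the generate/evaluate loop at most max_iterations times.

    Returns the last response, whether it passed, and all responses produced.
    """
    history: List[str] = []
    feedback: Optional[str] = None
    response = ""
    for _ in range(max_iterations):
        response = generate_fn(feedback)
        history.append(response)
        status, feedback = evaluate_fn(response)
        if status == "PASS":
            return response, True, history
    # Budget exhausted: hand back the last attempt, flagged as not passing
    return response, False, history

# Hypothetical evaluator that never approves, to show the cap taking effect
result, passed, history = bounded_loop(
    generate_fn=lambda fb: f"attempt after feedback: {fb}",
    evaluate_fn=lambda resp: ("FAIL", "still not good enough"),
    max_iterations=3,
)
print(passed, len(history))  # False 3
```

Returning a `passed` flag lets the caller decide whether to use the last attempt anyway or to surface a failure.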

In [None]:
def loop_workflow(task: str, evaluator_prompt: str, generator_prompt: str) -> str:
    """Keep generating and evaluating until the evaluator passes the last generated response."""
    # Store previous responses from generator
    memory = []
    
    # Generate initial response
    response = generate(task, generator_prompt)
    memory.append(response)

    # Build a schema for the evaluation
    class Evaluation(BaseModel):
        evaluation: Literal["PASS", "NEEDS_IMPROVEMENT", "FAIL"]
        feedback: str

    # While the generated response is not passing, keep generating and evaluating
    while True:
        evaluation, feedback = evaluate(task, evaluator_prompt, response, Evaluation)
        # Terminating condition
        if evaluation == "PASS":
            return response
        
        # Add current response and feedback to context and generate a new response
        context = "\n".join([
            "Previous attempts:",
            *[f"- {m}" for m in memory],
            f"\nFeedback: {feedback}"
        ])
        
        response = generate(task, generator_prompt, context)
        memory.append(response)
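
The task in the next cell asks for a stack whose `push`, `pop`, and `getMin` all run in O(1). For reference when reading the Generator's output, here is one standard approach, sketched independently of the model: each element is stored together with the minimum of the stack at the time it was pushed, a variant of the two-stack technique the Generator proposes. `getMin` keeps the camelCase name from the task, and raising `IndexError` on an empty stack is a design choice of this sketch. Appends to a Python list are amortized O(1).

```python
from typing import List, Tuple

class MinStack:
    """Stack that tracks the running minimum alongside each element."""

    def __init__(self) -> None:
        # Each entry is (value, minimum of the stack up to and including this value)
        self._items: List[Tuple[int, int]] = []

    def push(self, x: int) -> None:
        current_min = x if not self._items else min(x, self._items[-1][1])
        self._items.append((x, current_min))

    def pop(self) -> int:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()[0]

    def getMin(self) -> int:
        if not self._items:
            raise IndexError("getMin from empty stack")
        return self._items[-1][1]

s = MinStack()
for value in [5, 3, 7, 3]:
    s.push(value)
print(s.getMin())  # 3
s.pop()            # removes the second 3
print(s.getMin())  # 3
s.pop()            # removes 7
s.pop()            # removes the first 3
print(s.getMin())  # 5
```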

In [None]:
task = """
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
"""

loop_workflow(task, EVALUATOR_PROMPT, GENERATOR_PROMPT)


=== GENERATION START ===
Output:
Thoughts:
To implement a stack with the operations `push(x)`, `pop()`, and `getMin()` all in O(1) time complexity, we can use two stacks. One stack will store the actual stack elements, and the other will store the minimum values. The minimum stack will help us keep track of the minimum element efficiently.

Response:
```python
class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, x: int) -> None:
        self.stack.append(x)
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self) -> None:
        if self.stack:
            x = self.stack.pop()
            if x == self.min_stack[-1]:
                self.min_stack.pop()

    def top(self) -> int:
        if self.stack:
            return self.stack[-1]
        raise IndexError("Stack is empty")

    def getMin(self) -> int:
        if self.min_stack:
            return self.min_stack[-1]


'Thoughts:\nThe current implementation is mostly correct and handles edge cases by returning `None` for empty stack scenarios. However, we can further improve the code by:\n1. Ensuring consistent method naming (e.g., `getMin` to `get_min` for consistency with `get_top_element`).\n2. Adding type hints and docstrings for better readability and maintainability.\n3. Ensuring that the implementation adheres to best practices by handling empty stack scenarios gracefully and providing clear documentation.\n\nResponse:\n```python\nclass MinStack:\n    def __init__(self):\n        """Initialize the stack and the minimum stack."""\n        self.stack = []\n        self.min_stack = []\n\n    def push(self, x: int) -> None:\n        """Push element x onto the stack.\n\n        Args:\n            x (int): The element to be pushed onto the stack.\n        """\n        self.stack.append(x)\n        if not self.min_stack or x <= self.min_stack[-1]:\n            self.min_stack.append(x)\n\n    def pop(