# RLLM SDK Tutorial: From Rollouts to Training

This tutorial demonstrates the complete RLLM SDK workflow:

1. **Setup**: Start the proxy manager and configure OpenAI
2. **Simple Dataset**: Define a tiny math dataset and a reward function
3. **Simple Rollout**: Execute a basic rollout function
4. **Trace Retrieval**: Fetch traces from SQLite storage
5. **Trajectory Visualization**: Print and visualize trajectories
6. **Solver-Judge Pattern**: Advanced workflow with decorators
7. **Training**: Supervised fine-tuning (SFT), reinforcement learning (RL) with PPO, and workflow training
8. **Cleanup**: Shut down the proxy

## Prerequisites

Make sure you have:
- OpenAI API key set: `export OPENAI_API_KEY="sk-..."`
- RLLM installed: `pip install -e .`

## 1. Setup and Imports

In [None]:
import asyncio
import os
import re
from pathlib import Path

import click

from rllm.sdk import (
    get_chat_client_async,
    session,
    trajectory,
    TrajectoryView,
)
from rllm.sdk.proxy.proxy_manager import ProxyManager
from rllm.sdk.store import SqliteTraceStore

# Check for OpenAI API key
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Please set OPENAI_API_KEY environment variable")

print("✓ Imports successful")

### Color Printing Utility

We'll use these two helper functions for colored output:

In [None]:
def colorful_print(string: str, *args, **kwargs) -> None:
    """Print colored text using click."""
    end = kwargs.pop("end", "\n")
    print(click.style(string, *args, **kwargs), end=end, flush=True)


def print_section(title: str):
    """Print a section header."""
    colorful_print("\n" + "=" * 70, fg="cyan", bold=True)
    colorful_print(f" {title}", fg="cyan", bold=True)
    colorful_print("=" * 70, fg="cyan", bold=True)


print_section("Color Printing Utility Loaded")

### Start Proxy Manager

The ProxyManager automatically starts a LiteLLM proxy server that handles:
- OpenAI API calls
- Automatic trace collection
- Persistent storage in SQLite
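
Because the proxy binds to a fixed local port (4000 in this tutorial), a quick pre-flight check can save a confusing startup failure. The snippet below is a minimal sketch using only the standard library; `is_port_free` is a hypothetical helper, not part of the RLLM SDK.

```python
import socket


def is_port_free(host: str, port: int) -> bool:
    """Return True if nothing is currently accepting connections on (host, port).

    Hypothetical helper for illustration; not part of the RLLM SDK.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


print(is_port_free("127.0.0.1", 4000))
```

If this prints `False`, something is already listening on the port, and picking a different `PROXY_PORT` is probably the simplest fix.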

In [None]:
# Configuration
DB_PATH = "/tmp/rllm_tutorial.db"
PROXY_PORT = 4000
MODEL = "gpt-4o-mini"

# Clean up existing database
db_file = Path(DB_PATH)
if db_file.exists():
    db_file.unlink()
    print(f"✓ Removed existing database: {DB_PATH}")

# Create OpenAI configuration
config = {
    "model_list": [
        {
            "model_name": MODEL,
            "litellm_params": {
                "model": MODEL,
                "api_key": os.environ.get("OPENAI_API_KEY"),
            },
        }
    ]
}

# Create and start proxy manager
proxy_manager = ProxyManager(
    proxy_host="127.0.0.1",
    proxy_port=PROXY_PORT,
    admin_token="tutorial-admin-token",
)

# Start proxy subprocess
config_path = proxy_manager.start_proxy_subprocess(
    config=config,
    db_path=DB_PATH,
    project="tutorial-project",
)

proxy_url = proxy_manager.get_proxy_url(include_v1=True)
print_section("Proxy Manager Started")
colorful_print(f"Proxy URL: {proxy_url}", fg="green")
colorful_print(f"Database: {DB_PATH}", fg="green")
colorful_print(f"Model: {MODEL}", fg="green")
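
The `config` dictionary above contains a single `model_list` entry. If you want the proxy to serve several models, the same structure can be generated with a small helper. This is a sketch that only mirrors the dictionary shape used in this tutorial; `build_model_list_config` is a hypothetical name, and the second model name is just an example.

```python
import os


def build_model_list_config(model_names, api_key):
    """Build a config dict with one model_list entry per model name.

    Hypothetical helper; it reproduces the dictionary shape used in this tutorial.
    """
    return {
        "model_list": [
            {
                "model_name": name,
                "litellm_params": {"model": name, "api_key": api_key},
            }
            for name in model_names
        ]
    }


multi_model_config = build_model_list_config(
    ["gpt-4o-mini", "gpt-4o"],
    os.environ.get("OPENAI_API_KEY"),
)
print([entry["model_name"] for entry in multi_model_config["model_list"]])
```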

## 2. Simple Dataset

We'll create a tiny math dataset for this tutorial:

In [None]:
# Simple math problems
MATH_PROBLEMS = [
    {
        "question": "What is 15 + 27?",
        "ground_truth": "42",
    },
    {
        "question": "If a rectangle has length 8 and width 5, what is its area?",
        "ground_truth": "40",
    },
    {
        "question": "What is the square root of 144?",
        "ground_truth": "12",
    },
]

print_section("Dataset Created")
for i, problem in enumerate(MATH_PROBLEMS, 1):
    colorful_print(f"Problem {i}: {problem['question']}", fg="yellow")
    colorful_print(f"  Answer: {problem['ground_truth']}", fg="green")

### Simple Reward Function

In [None]:
def extract_answer(text: str) -> str:
    """Return the last number found in the text, or "" if there is none."""
    # Match integers and decimals such as 42 or 3.14
    numbers = re.findall(r'\b\d+\.?\d*\b', text)
    return numbers[-1] if numbers else ""


def simple_math_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the last number in the response equals ground_truth (string comparison), else 0.0."""
    predicted = extract_answer(response)
    return 1.0 if predicted == ground_truth else 0.0


# Test the reward function
test_response = "The answer is 42"
test_gt = "42"
reward = simple_math_reward(test_response, test_gt)
colorful_print(f"\nTest reward: {reward}", fg="green" if reward == 1.0 else "red")
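
The reward above compares strings, so a response ending in `42.0` or `1,200` would not match a ground truth of `42` or `1200`. If that turns out to be too strict, one possible variant compares the values numerically instead. This is a sketch under that assumption; `extract_last_number` and `numeric_match_reward` are hypothetical names and are not used elsewhere in this tutorial.

```python
import re
from decimal import Decimal, InvalidOperation

# Optional sign, digits with optional thousands separators, optional decimals.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str):
    """Return the last number in the text as a Decimal, or None if there is none."""
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    try:
        return Decimal(matches[-1].replace(",", ""))
    except InvalidOperation:
        return None


def numeric_match_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 when the last number in the response equals the ground truth numerically."""
    predicted = extract_last_number(response)
    target = extract_last_number(ground_truth)
    if predicted is None or target is None:
        return 0.0
    return 1.0 if predicted == target else 0.0


print(numeric_match_reward("The area is 40.0 square units.", "40"))
```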

## 3. Simple Rollout Function

This is the simplest version, similar to `train_hendrycks_math.py`:

In [None]:
async def simple_rollout(question: str, ground_truth: str) -> tuple[str, float, str]:
    """
    Execute a simple rollout.
    
    Args:
        question: The math problem
        ground_truth: The correct answer
    
    Returns:
        (session_uid, reward, response_text)
    """
    client = get_chat_client_async(base_url=proxy_url, api_key="EMPTY")
    
    with session(task="simple_math") as sess:
        session_uid = sess._uid
        
        # Make LLM call
        response = await client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "user", "content": question},
            ],
            max_tokens=100,
        )
        
        response_text = response.choices[0].message.content
        
        # Calculate reward
        reward = simple_math_reward(response_text, ground_truth)
        
        return session_uid, reward, response_text


print_section("Simple Rollout Function Defined")

### Execute Rollouts

In [None]:
print_section("Executing Rollouts")

rollout_results = []

for i, problem in enumerate(MATH_PROBLEMS, 1):
    colorful_print(f"\nRollout {i}/{len(MATH_PROBLEMS)}", fg="cyan", bold=True)
    colorful_print(f"Question: {problem['question']}", fg="yellow")
    
    session_uid, reward, response = await simple_rollout(
        problem["question"],
        problem["ground_truth"]
    )
    
    rollout_results.append({
        "session_uid": session_uid,
        "question": problem["question"],
        "ground_truth": problem["ground_truth"],
        "response": response,
        "reward": reward,
    })
    
    colorful_print(f"Response: {response}", fg="white")
    reward_color = "green" if reward == 1.0 else "red"
    colorful_print(f"Reward: {reward}", fg=reward_color, bold=True)
    colorful_print(f"Session UID: {session_uid}", fg="blue")

avg_reward = sum(r["reward"] for r in rollout_results) / len(rollout_results)
colorful_print(f"\n✓ Average Reward: {avg_reward:.2f}", fg="green", bold=True)
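
The loop above awaits one rollout at a time. With more problems, running rollouts concurrently is usually faster. The sketch below shows one way to do that with `asyncio.gather` and a semaphore that caps the number of in-flight requests. `run_concurrently` and `fake_rollout` are hypothetical names; `fake_rollout` stands in for `simple_rollout` so that the example runs without the proxy.

```python
import asyncio


async def run_concurrently(rollout_fn, problems, max_concurrency: int = 4):
    """Run rollout_fn over all problems, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(problem):
        async with semaphore:
            return await rollout_fn(problem["question"], problem["ground_truth"])

    # gather returns results in the same order as the input problems.
    return await asyncio.gather(*(guarded(p) for p in problems))


async def fake_rollout(question: str, ground_truth: str):
    """Stand-in for simple_rollout: returns (session_uid, reward, response_text)."""
    await asyncio.sleep(0.01)
    return (f"uid-{ground_truth}", 1.0, ground_truth)
```

In a notebook cell you would call it as `results = await run_concurrently(simple_rollout, MATH_PROBLEMS)`; in a plain script, wrap the call in `asyncio.run(...)`.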

## 4. Fetch Traces from SQLite Store

After executing rollouts, we flush the tracer and retrieve traces from SQLite:

In [None]:
print_section("Flushing and Retrieving Traces")

# Flush tracer to ensure all traces are written to database
flush_result = await proxy_manager.flush_tracer(timeout=30.0)
colorful_print(f"✓ Flush result: {flush_result}", fg="green")

# Create store and retrieve traces
store = SqliteTraceStore(db_path=DB_PATH)

all_traces = []
for result in rollout_results:
    session_uid = result["session_uid"]
    traces = await store.get_by_session_uid(session_uid)
    
    colorful_print(f"\nSession {session_uid}:", fg="cyan")
    colorful_print(f"  Found {len(traces)} trace(s)", fg="yellow")
    
    for trace in traces:
        colorful_print(f"    Trace ID: {trace.id}", fg="blue")
        all_traces.append((trace, result))

colorful_print(f"\n✓ Total traces retrieved: {len(all_traces)}", fg="green", bold=True)
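
Since the traces live in an ordinary SQLite file, you can also inspect the database with the standard `sqlite3` module. The store's schema is not documented in this tutorial, so the sketch below only lists table names and makes no assumption about columns. `list_tables` is a hypothetical helper.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite database at db_path."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# An empty in-memory database has no tables.
print(list_tables(":memory:"))  # prints []
```

Calling `list_tables(DB_PATH)` after the flush should show whichever tables the store created.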

## 5. Trajectory Visualization

Let's define a helper that prints each trace with color-coded fields:

In [None]:
def print_trajectory(trace, metadata: dict):
    """
    Print a trajectory with color-coded information.
    
    Args:
        trace: Trace object from SQLite store
        metadata: Additional metadata (question, reward, etc.)
    """
    data = trace.data
    
    colorful_print("\n" + "=" * 70, fg="cyan", bold=True)
    colorful_print(f"TRAJECTORY: {trace.id}", fg="cyan", bold=True)
    colorful_print("=" * 70, fg="cyan", bold=True)
    
    # Session info
    colorful_print(f"Session UID: {metadata['session_uid']}", fg="blue")
    
    # Model info
    colorful_print(f"Model: {data['model']}", fg="white")
    colorful_print(f"Latency: {data['latency_ms']:.2f} ms", fg="white")
    
    # Token usage
    tokens = data["tokens"]
    colorful_print(
        f"Tokens: prompt={tokens['prompt']}, completion={tokens['completion']}, total={tokens['total']}",
        fg="white"
    )
    
    # Input
    colorful_print("\nINPUT:", fg="yellow", bold=True)
    input_messages = data["input"]["messages"]
    for msg in input_messages:
        role_color = "green" if msg["role"] == "user" else "blue"
        colorful_print(f"  [{msg['role']}]: {msg['content']}", fg=role_color)
    
    # Output
    colorful_print("\nOUTPUT:", fg="magenta", bold=True)
    output_choices = data["output"]["choices"]
    output_text = output_choices[0]["message"]["content"]
    colorful_print(f"  {output_text}", fg="white")
    
    # Ground truth and reward
    colorful_print("\nEVALUATION:", fg="cyan", bold=True)
    colorful_print(f"  Ground Truth: {metadata['ground_truth']}", fg="green")
    
    reward = metadata["reward"]
    reward_color = "green" if reward == 1.0 else "red"
    reward_symbol = "✓" if reward == 1.0 else "✗"
    colorful_print(f"  Reward: {reward_symbol} {reward}", fg=reward_color, bold=True)
    
    colorful_print("=" * 70, fg="cyan", bold=True)


print_section("Trajectory Visualization Function Defined")

### Visualize All Trajectories

In [None]:
print_section("Visualizing Trajectories")

for trace, metadata in all_traces:
    print_trajectory(trace, metadata)
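
`print_trajectory` indexes directly into `trace.data`, so a trace with a missing field would raise `KeyError`. If you expect incomplete traces, a defensive summary can be built first. The sketch below assumes the same dictionary layout that `print_trajectory` reads (`model`, `latency_ms`, `tokens`, `output.choices[0].message.content`); `summarize_trace_data` is a hypothetical helper.

```python
def summarize_trace_data(data: dict) -> dict:
    """Extract the fields print_trajectory uses, substituting defaults when missing."""
    tokens = data.get("tokens") or {}
    choices = (data.get("output") or {}).get("choices") or []
    first_message = (choices[0].get("message") or {}) if choices else {}
    return {
        "model": data.get("model", "unknown"),
        "latency_ms": data.get("latency_ms"),
        "total_tokens": tokens.get("total"),
        "output_text": first_message.get("content") or "",
    }


example = {
    "model": "gpt-4o-mini",
    "latency_ms": 812.5,
    "tokens": {"prompt": 12, "completion": 8, "total": 20},
    "output": {"choices": [{"message": {"content": "15 + 27 = 42"}}]},
}
print(summarize_trace_data(example))
```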

## 6. Advanced: Solver-Judge Pattern with Decorators

Now let's demonstrate a more complex workflow using the `@trajectory` decorator.
The decorator wraps each call so that it returns a `TrajectoryView` whose `result` field holds the function's return value:

In [None]:
class MathSolver:
    """Solver that generates multiple solutions."""
    
    def __init__(self, base_url: str, model: str):
        self.client = get_chat_client_async(base_url=base_url, api_key="EMPTY")
        self.model = model
    
    @trajectory(name="solver")
    async def generate_solution(self, problem: str) -> str:
        """
        Generate a solution using @trajectory decorator.
        
        The decorator:
        - Creates a session internally
        - Tracks LLM calls automatically
        - Returns TrajectoryView with result field set to the return value
        """
        messages = [
            {
                "role": "user",
                "content": f"{problem}\n\nPlease solve this problem and put your final answer in <answer>...</answer> tags."
            }
        ]
        
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=200,
        )
        
        response_text = response.choices[0].message.content
        
        # Parse and return the answer
        answer = self._parse_answer(response_text)
        return answer
    
    async def generate_solutions(self, problem: str, n_solutions: int = 2) -> list[TrajectoryView]:
        """Generate multiple solutions in parallel."""
        tasks = [
            asyncio.create_task(self.generate_solution(problem))
            for _ in range(n_solutions)
        ]
        # Returns list of TrajectoryView objects
        return await asyncio.gather(*tasks)
    
    def _parse_answer(self, response: str) -> str:
        """Extract answer from response."""
        answer_match = re.search(r"<answer>(.*?)</answer>", response, re.IGNORECASE | re.DOTALL)
        if answer_match:
            return answer_match.group(1).strip()
        else:
            # Fallback: extract any number
            return extract_answer(response)


class MathJudge:
    """Judge that selects the best solution."""
    
    def __init__(self, base_url: str, model: str):
        self.client = get_chat_client_async(base_url=base_url, api_key="EMPTY")
        self.model = model
    
    @trajectory(name="judge")
    async def judge_solutions(self, problem: str, solutions: list[str]) -> str:
        """
        Judge solutions using @trajectory decorator.
        
        Returns TrajectoryView with the selected solution in the result field.
        """
        prompt = self._create_judge_prompt(problem, solutions)
        
        messages = [{"role": "user", "content": prompt}]
        
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=200,
        )
        
        response_text = response.choices[0].message.content
        
        # Parse and return the selected solution
        return self._parse_judge_response(response_text, solutions)
    
    def _parse_judge_response(self, response: str, solutions: list[str]) -> str:
        """Parse judge response to get selected solution."""
        answer_match = re.search(r"<answer>(.*?)</answer>", response, re.IGNORECASE | re.DOTALL)
        if answer_match:
            answer_text = answer_match.group(1).strip()
            try:
                solution_index = int(answer_text)
                if 1 <= solution_index <= len(solutions):
                    return solutions[solution_index - 1]
            except (ValueError, IndexError):
                pass
        # Fallback: return first solution
        return solutions[0] if solutions else ""
    
    def _create_judge_prompt(self, problem: str, solutions: list[str]) -> str:
        """Create a prompt for the judge to evaluate solutions."""
        prompt = f"""You are an expert math verifier. Given a problem and multiple solution attempts, select the best solution.

Problem:
{problem}

Solutions to evaluate:
"""
        for i, solution in enumerate(solutions, 1):
            prompt += f"\nSolution {i}: {solution}\n"
        
        prompt += """\nSelect the index of the best solution (1, 2, etc.) and output it in <answer>...</answer> tags.
If multiple solutions are correct, output the index of the first correct one."""
        return prompt


print_section("Solver-Judge Classes Defined")
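
To make the decorator's behavior concrete, here is a toy version of the pattern: an async function is wrapped so that callers receive a view object with the function's return value in `result`. This is an illustration only and not the SDK's implementation. `toy_trajectory` and `ToyTrajectoryView` are hypothetical names, and unlike the real decorator this sketch does not create a session or record LLM calls as steps.

```python
import asyncio
import functools
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyTrajectoryView:
    """Hypothetical stand-in exposing only the fields this tutorial reads."""
    name: str
    result: Any = None
    steps: list = field(default_factory=list)
    reward: Optional[float] = None


def toy_trajectory(name: str):
    """Wrap an async function so that it returns a ToyTrajectoryView."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            return ToyTrajectoryView(name=name, result=result)
        return wrapper
    return decorator


@toy_trajectory(name="solver")
async def solve(problem: str) -> str:
    return "42"  # placeholder for a parsed model answer
```

This is why `generate_solution` is annotated as returning `str` while its callers read `traj.result`: the annotation describes the undecorated function, and the decorator changes what the caller gets back.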

### Execute Solver-Judge Workflow

In [None]:
print_section("Executing Solver-Judge Workflow")

# Create solver and judge
solver = MathSolver(base_url=proxy_url, model=MODEL)
judge = MathJudge(base_url=proxy_url, model=MODEL)

# Use first problem
problem = MATH_PROBLEMS[0]
colorful_print(f"\nProblem: {problem['question']}", fg="yellow", bold=True)
colorful_print(f"Ground Truth: {problem['ground_truth']}", fg="green")

# Generate 2 solutions
colorful_print("\nGenerating 2 solutions...", fg="cyan")
solver_trajs = await solver.generate_solutions(problem["question"], n_solutions=2)

solutions = []
for i, traj in enumerate(solver_trajs, 1):
    solution = traj.result
    solutions.append(solution)
    reward = simple_math_reward(solution, problem["ground_truth"])
    traj.steps[0].reward = reward
    
    colorful_print(f"\nSolution {i}:", fg="blue", bold=True)
    colorful_print(f"  Answer: {solution}", fg="white")
    reward_color = "green" if reward == 1.0 else "red"
    colorful_print(f"  Reward: {reward}", fg=reward_color)
    colorful_print(f"  Trajectory Name: {traj.name}", fg="blue")
    colorful_print(f"  Steps: {len(traj.steps)}", fg="blue")

# Judge solutions
colorful_print("\nJudging solutions...", fg="cyan")
judge_traj = await judge.judge_solutions(problem["question"], solutions)

selected_solution = judge_traj.result
judge_reward = simple_math_reward(selected_solution, problem["ground_truth"])
judge_traj.steps[0].reward = judge_reward
judge_traj.reward = judge_reward

colorful_print("\nJudge Decision:", fg="magenta", bold=True)
colorful_print(f"  Selected: {selected_solution}", fg="white")
reward_color = "green" if judge_reward == 1.0 else "red"
colorful_print(f"  Reward: {judge_reward}", fg=reward_color, bold=True)
colorful_print(f"  Trajectory Name: {judge_traj.name}", fg="blue")

colorful_print("\n✓ Solver-Judge workflow completed!", fg="green", bold=True)
colorful_print(f"Total trajectories: {len(solver_trajs) + 1} ({len(solver_trajs)} solver + 1 judge)", fg="cyan")

### Visualize Solver-Judge Trajectories

Let's create a custom visualization for the solver-judge workflow:

In [None]:
def print_trajectory_view(traj: TrajectoryView, problem: dict):
    """
    Print a TrajectoryView (from @trajectory decorator).
    """
    colorful_print("\n" + "=" * 70, fg="cyan", bold=True)
    colorful_print(f"TRAJECTORY: {traj.name}", fg="cyan", bold=True)
    colorful_print("=" * 70, fg="cyan", bold=True)
    
    # Trajectory info
    colorful_print(f"Name: {traj.name}", fg="blue")
    colorful_print(f"Steps: {len(traj.steps)}", fg="blue")
    
    if traj.reward is not None:
        reward_color = "green" if traj.reward == 1.0 else "red"
        colorful_print(f"Trajectory Reward: {traj.reward}", fg=reward_color, bold=True)
    
    # Print each step
    for i, step in enumerate(traj.steps, 1):
        colorful_print(f"\n  Step {i}:", fg="yellow", bold=True)
        colorful_print(f"    Trace ID: {step.id}", fg="blue")
        
        if step.reward is not None:
            reward_color = "green" if step.reward == 1.0 else "red"
            colorful_print(f"    Step Reward: {step.reward}", fg=reward_color)
    
    # Result
    colorful_print(f"\nResult: {traj.result}", fg="white", bold=True)
    colorful_print(f"Ground Truth: {problem['ground_truth']}", fg="green")
    
    colorful_print("=" * 70, fg="cyan", bold=True)


print_section("Visualizing Solver-Judge Trajectories")

# Print solver trajectories
for i, traj in enumerate(solver_trajs, 1):
    colorful_print(f"\n--- Solver Trajectory {i} ---", fg="blue", bold=True)
    print_trajectory_view(traj, problem)

# Print judge trajectory
colorful_print("\n--- Judge Trajectory ---", fg="magenta", bold=True)
print_trajectory_view(judge_traj, problem)

## 7. Training with RLLM

Now that we've collected trajectories, let's demonstrate how to use them for training.

### 7.1 SFT Training (Supervised Fine-Tuning)

For SFT, we use successful trajectories to fine-tune the model:

In [None]:
print_section("SFT Training Preparation")

colorful_print("""
For SFT training, you would typically:
1. Collect many trajectories (100s to 1000s)
2. Filter for successful ones (reward > threshold)
3. Format them as training data
4. Use AgentSFTTrainer to fine-tune

Since we only have a few trajectories, we'll skip actual SFT training here.
For a real example, see: examples/sft/train_math_sft.py

Key steps:
""", fg="yellow")

colorful_print("""
from rllm.trainer.agent_sft_trainer import AgentSFTTrainer

# Process trajectories (filter by reward, format as messages)
sft_data = AgentSFTTrainer.process_trajectories(
    trajectories=collected_trajectories,
    reward_threshold=1.0,
    filter_tool_calls=False,
)

# Save to file
import pandas as pd
pd.DataFrame(sft_data).to_parquet("sft_data.parquet", index=False)

# Create config and train
import hydra

@hydra.main(config_path="pkg://rllm.trainer.config", config_name="sft_trainer")
def main(config):
    trainer = AgentSFTTrainer(config=config)
    trainer.train()
""", fg="green")

colorful_print("\n✓ SFT training overview complete", fg="green", bold=True)
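
Steps 2 and 3 of the overview (filter by reward, format as training data) can be sketched without the trainer. The example below works on plain dictionaries shaped like the entries in `rollout_results`. The chat-style `messages` output is an assumption for illustration and may differ from what `AgentSFTTrainer.process_trajectories` produces. `to_sft_examples` is a hypothetical helper, and it keeps records whose reward is greater than or equal to the threshold.

```python
def to_sft_examples(records, reward_threshold: float = 1.0):
    """Keep records with reward >= reward_threshold and format them as chat messages."""
    examples = []
    for record in records:
        if record["reward"] < reward_threshold:
            continue  # drop unsuccessful rollouts
        examples.append(
            {
                "messages": [
                    {"role": "user", "content": record["question"]},
                    {"role": "assistant", "content": record["response"]},
                ]
            }
        )
    return examples


records = [
    {"question": "What is 15 + 27?", "response": "15 + 27 = 42", "reward": 1.0},
    {"question": "What is the square root of 144?", "response": "It is 14", "reward": 0.0},
]
examples = to_sft_examples(records)
print(len(examples))  # prints 1
```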

### 7.2 RL Training (Reinforcement Learning with PPO)

For RL training, we use the rollout function directly:

In [None]:
print_section("RL Training with AgentTrainer")

colorful_print("""
Now let's set up a minimal RL training demonstration (the final .train() call is skipped).

We'll use:
- Our simple math dataset
- A rollout function that uses the proxy
- Minimal training steps for demo purposes
""", fg="yellow")

# Create datasets from our math problems
from rllm.data.dataset import Dataset

train_data = MATH_PROBLEMS * 2  # Repeat for more samples
train_dataset = Dataset(data=train_data, name="tutorial_math", split="train")

colorful_print(f"\n✓ Created training dataset with {len(train_dataset)} samples", fg="green")

# Define rollout function for training
def rollout_for_training(**kwargs):
    """
    Rollout function for RL training.
    
    This function is called by the trainer for each sample.
    It must recreate the client inside to avoid serialization issues with Ray.
    """
    import asyncio
    from rllm.sdk import get_chat_client_async, session
    
    question = kwargs["question"]
    ground_truth = kwargs["ground_truth"]
    
    # Create client inside function (required for Ray serialization)
    client = get_chat_client_async(base_url=proxy_url, api_key="EMPTY")
    
    async def run():
        with session(task="training"):
            response = await client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": question}],
                max_tokens=100,
            )
            response_text = response.choices[0].message.content
            reward = simple_math_reward(response_text, ground_truth)
            return reward
    
    # Run async function
    return asyncio.run(run())

colorful_print("✓ Rollout function defined", fg="green")

# Create minimal config for demonstration
from omegaconf import OmegaConf

minimal_config = OmegaConf.create({
    "trainer": {
        "total_epochs": 1,  # Just 1 epoch for demo
        "test_freq": 1,
    },
    "data": {
        "train_batch_size": 2,
        "val_batch_size": 2,
        "max_prompt_length": 512,
        "max_response_length": 512,
    },
    "actor_rollout_ref": {
        "rollout_batch_size": 2,
        "n_samples_per_prompt": 1,
    },
    "algorithm": {
        "gamma": 0.99,
    },
})

colorful_print("✓ Training config created", fg="green")

colorful_print("""
\nNOTE: For this tutorial, we're using a minimal config with:
- 1 epoch
- Small batch sizes
- Few samples

In production, you would use larger values and proper hyperparameters.
""", fg="yellow")

colorful_print("""
To run actual training, you would do:

from rllm.trainer.agent_trainer import AgentTrainer

trainer = AgentTrainer(
    config=minimal_config,
    train_dataset=train_dataset,
    val_dataset=train_dataset,  # Using train as val for demo
    agent_run_func=rollout_for_training,
    backend="verl",  # or "tinker"
)

# This would start the actual training
# trainer.train()

Note: Actual training requires proper GPU setup and can take time.
For this tutorial, we've defined everything but skipped the .train() call.
""", fg="green")

colorful_print("\n✓ RL training setup complete", fg="green", bold=True)
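
The config above sets `gamma` to 0.99. In the standard RL definition, `gamma` discounts future rewards: the return at step `t` is `G_t = r_t + gamma * G_{t+1}`. How the training backend applies it internally is not shown in this tutorial, so the snippet below is just a worked sketch of that textbook formula; `discounted_returns` is a hypothetical helper.

```python
def discounted_returns(rewards, gamma: float):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A three-step episode where only the final step is rewarded.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99))
```

For the single-step rollouts in this tutorial each episode has one reward, so the return equals that reward regardless of `gamma`. The discount only matters for multi-step trajectories.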

### 7.3 Training with Solver-Judge Workflow

You can also train with more complex workflows:

In [None]:
print_section("Workflow Training with Solver-Judge")

colorful_print("""
You can also train with more complex workflows like Solver-Judge.

Let's set up a workflow-based training:
""", fg="yellow")

# Define async workflow runner
async def run_solver_judge_workflow(**kwargs) -> float:
    """
    Async workflow runner for Solver-Judge training.
    
    This combines multiple LLM calls into a single workflow.
    """
    from rllm.sdk import get_chat_client_async
    
    question = kwargs["question"]
    ground_truth = kwargs["ground_truth"]
    
    # Create solver and judge (must be inside function for Ray)
    solver = MathSolver(base_url=proxy_url, model=MODEL)
    judge = MathJudge(base_url=proxy_url, model=MODEL)
    
    # Generate solutions
    solver_trajs = await solver.generate_solutions(question, n_solutions=2)
    solutions = [traj.result for traj in solver_trajs]
    
    # Judge solutions
    judge_traj = await judge.judge_solutions(question, solutions)
    selected = judge_traj.result
    
    # Calculate reward
    reward = simple_math_reward(selected, ground_truth)
    
    return reward

colorful_print("✓ Solver-Judge workflow function defined", fg="green")

colorful_print("""
To run workflow training:

from rllm.trainer.agent_trainer import AgentTrainer

trainer = AgentTrainer(
    config=minimal_config,
    train_dataset=train_dataset,
    val_dataset=train_dataset,
    agent_run_func=run_solver_judge_workflow,  # Use workflow function
    backend="verl",
)

# This would start training with the Solver-Judge workflow
# trainer.train()

Benefits of workflow training:
- Multiple LLM calls per sample (solver + judge)
- Automatic trace collection for all steps
- Richer learning signal from multi-step reasoning
- Can optimize entire workflow end-to-end
""", fg="green")

colorful_print("\n✓ Workflow training setup complete", fg="green", bold=True)

colorful_print("""
\n================================================
TRAINING SUMMARY
================================================

We've demonstrated 3 training approaches:

1. SFT Training (Supervised Fine-Tuning)
   - Use successful trajectories to fine-tune
   - Good for bootstrapping from demonstrations
   - See: examples/sft/train_math_sft.py

2. RL Training (Reinforcement Learning)
   - Use AgentTrainer with simple rollout function
   - Learns from reward feedback
   - Optimizes policy with PPO

3. Workflow Training
   - Multi-step LLM workflows (e.g., Solver-Judge)
   - More complex reasoning patterns
   - End-to-end optimization

All approaches integrate seamlessly with:
- ProxyManager for inference
- SQLite trace storage
- Automatic session tracking
- Reward-based learning

""", fg="cyan", bold=True)

## 8. Cleanup
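
If an earlier cell raises, the proxy subprocess may keep running. One way to guard against that in a script is a context manager that always calls `shutdown_proxy()` on exit. The sketch below uses a stub object so that it runs on its own; `proxy_session` and `StubProxyManager` are hypothetical names, and only the `shutdown_proxy()` method comes from this tutorial.

```python
from contextlib import contextmanager


@contextmanager
def proxy_session(manager):
    """Yield the manager and always shut the proxy down afterwards."""
    try:
        yield manager
    finally:
        manager.shutdown_proxy()


class StubProxyManager:
    """Stand-in for ProxyManager that only records whether shutdown was called."""

    def __init__(self):
        self.stopped = False

    def shutdown_proxy(self):
        self.stopped = True


stub = StubProxyManager()
with proxy_session(stub):
    pass  # rollouts and trace retrieval would go here
print(stub.stopped)  # prints True
```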

In [None]:
print_section("Cleanup")

# Shutdown proxy
proxy_manager.shutdown_proxy()
colorful_print("✓ Proxy shutdown complete", fg="green")

colorful_print("\n" + "=" * 70, fg="cyan", bold=True)
colorful_print("TUTORIAL COMPLETE!", fg="green", bold=True)
colorful_print("=" * 70, fg="cyan", bold=True)

colorful_print("""
Summary:
✓ Started LiteLLM proxy with OpenAI
✓ Executed simple rollouts
✓ Retrieved traces from SQLite
✓ Visualized trajectories with color
✓ Demonstrated Solver-Judge pattern with @trajectory decorator
✓ Showed how to prepare for SFT training
✓ Showed how to prepare for RL training
✓ Showed how to set up workflow training with Solver-Judge

Next Steps:
- Collect more trajectories (100s to 1000s)
- Set up training environment with GPUs
- Configure Hydra configs for your use case
- Run actual training with AgentSFTTrainer or AgentTrainer
- Monitor training with wandb/tensorboard
""", fg="yellow")