# Ablation Studies

This notebook details the design and execution of ablation studies performed on a multi-agent system designed to solve Advent of Code puzzles. The system comprises several specialized AI agents, each responsible for a different stage of the problem-solving process: preprocessing, retrieval, planning, coding, and debugging.

Ablation studies are a technique used to understand the contribution of individual components to the overall performance of a system. In this context, we will systematically disable specific agents within the system to observe how their absence impacts the system's ability to solve Advent of Code puzzles.

The primary goal of these studies is to gain insight into the importance and effectiveness of each agent in the pipeline. By keeping the coding agent enabled and disabling each of the other agents in turn, we aim to identify which agents are most critical for successful problem-solving and how the absence of one agent affects the rest of the pipeline.

Through these experiments, we hope to better understand the system's architecture, identify potential areas for improvement, and inform future development efforts.

**Agents in the System:**

*   **Preprocessing Agent:** Handles initial processing and understanding of the puzzle description.
*   **Retrieval Agent:** Responsible for retrieving relevant information, potentially from a knowledge base or past solutions.
*   **Planning Agent:** Develops strategies and plans for solving the puzzle.
*   **Coding Agent:** Generates the actual code solution based on the plan.
*   **Debugging Agent:** Identifies and fixes errors in the generated code.

In these ablation studies, the **Coding Agent will always remain enabled**, as it is essential for producing a code solution. The other agents will be selectively disabled to assess their impact.
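As a rough illustration of this leave-one-out design, the sketch below builds a baseline configuration plus one configuration per disabled agent. The helper name `make_leave_one_out_configs` and the `NAME_OVERRIDES` mapping are hypothetical and only serve this example; the dictionary keys mirror the ones used in the ablation configs later in this notebook.

```python
# All agents in the pipeline, using the same keys as the ablation configs.
ALL_AGENTS = ["preprocess", "retrieval", "planning", "coding", "debugging"]

# Agents that may be switched off; "coding" is excluded because it always stays enabled.
ABLATABLE_AGENTS = ["preprocess", "retrieval", "planning", "debugging"]

# Hypothetical mapping for config names that do not follow the "disable_<key>" pattern.
NAME_OVERRIDES = {"preprocess": "disable_preprocessing"}


def make_leave_one_out_configs():
    """Return a baseline config plus one config per disabled agent."""
    baseline = {"name": "baseline", **{agent: True for agent in ALL_AGENTS}}
    configs = [baseline]
    for agent in ABLATABLE_AGENTS:
        config = dict(baseline)  # copy so the baseline stays untouched
        config["name"] = NAME_OVERRIDES.get(agent, f"disable_{agent}")
        config[agent] = False
        configs.append(config)
    return configs


configs = make_leave_one_out_configs()
for config in configs:
    disabled = [agent for agent in ALL_AGENTS if not config[agent]]
    print(config["name"], disabled)
```

Generating the configurations this way keeps every run identical except for the single disabled agent, which is what makes the comparison against the baseline meaningful.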


## Setup

- Set up the model to use
- Load all the puzzles

In [5]:
import os
import sys
import dotenv
import json
from loguru import logger

# Append the models path in order to import the models
PROJECT_ROOT = os.path.join(os.getcwd(), 'src/')
print(PROJECT_ROOT)
sys.path.append(PROJECT_ROOT)

# Load env variables
dotenv.load_dotenv()

# Set log level INFO
logger.remove()
logger.add(sys.stderr, level="INFO")

/home/twanh/workspace/thesis/thesis-advent-of-agents/src/


1

### Load puzzles and input/outputs

In [3]:
# Get the correct paths
test_data_folder = os.path.join(PROJECT_ROOT, '..', 'experiments', 'test_data')
puzzles_folder = os.path.join(test_data_folder, 'puzzles/')
input_output_file = os.path.join(test_data_folder, 'answers2024.json')
puzzle_files = [os.path.join(puzzles_folder, f) for f in os.listdir(puzzles_folder) if os.path.isfile(os.path.join(puzzles_folder, f))]

In [6]:
# Create a data structure where we can look up answers by day
json_data = {}
with open(input_output_file, 'r') as f:
    json_data = {item['day']: item for item in json.load(f)}

puzzle_data = []
for file_path in puzzle_files:
    # Get the day of the puzzle file
    file_name = os.path.basename(file_path)
    day_str = file_name.split('_')[-1].split('.')[0]
    day = int(day_str)

    if day in json_data:
        with open(file_path, 'r') as f:
            puzzle_description = f.read()

        puzzle_info = {
            "year": json_data[day]['year'],
            "day": day,
            "description": puzzle_description,
            "input": json_data[day]['input'],
            "expected_output": json_data[day]['part1']
        }

        puzzle_data.append(puzzle_info)

# Sort by day
puzzle_data.sort(key=lambda x: x['day'])
print(len(puzzle_data)) # should be 25

25
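The loading cell derives the day from the last underscore-separated part of each file name. A minimal sketch of that parsing step, together with a check for missing days, is shown below. The file names (`puzzle_01.txt` and so on) and the helper names `day_from_filename` and `missing_days` are assumptions made for this example, since the actual naming scheme of the puzzle files is not shown in this notebook.

```python
import os


def day_from_filename(file_path: str) -> int:
    """Extract the day from a name such as 'puzzle_07.txt' (hypothetical pattern)."""
    file_name = os.path.basename(file_path)
    # Same parsing as the loading cell: last '_' part, without the extension.
    return int(file_name.split('_')[-1].split('.')[0])


def missing_days(days, first=1, last=25):
    """Return the days in [first, last] that are absent from `days`."""
    return sorted(set(range(first, last + 1)) - set(days))


# Hypothetical paths, used only to demonstrate the parsing.
example_paths = [
    "puzzles/puzzle_01.txt",
    "puzzles/puzzle_02.txt",
    "puzzles/puzzle_25.txt",
]
parsed = [day_from_filename(p) for p in example_paths]
print(parsed)
print(missing_days(parsed, last=3))
```

Applied to the real data, a call like `missing_days([p['day'] for p in puzzle_data])` would be expected to return an empty list when all 25 puzzles were loaded.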


### Setup Ablation Configs

In [8]:
ablation_configs = [
    {
        "name": "baseline",
        "preprocess": True,
        "retrieval": True,
        "planning": True,
        "coding": True,
        "debugging": True
    },
    {
        "name": "disable_preprocessing",
        "preprocess": False,
        "retrieval": True,
        "planning": True,
        "coding": True,
        "debugging": True
    },
    {
        "name": "disable_retrieval",
        "preprocess": True,
        "retrieval": False,
        "planning": True,
        "coding": True,
        "debugging": True
    },
    {
        "name": "disable_planning",
        "preprocess": True,
        "retrieval": True,
        "planning": False,
        "coding": True,
        "debugging": True
    },
    {
        "name": "disable_debugging",
        "preprocess": True,
        "retrieval": True,
        "planning": True,
        "coding": True,
        "debugging": False
    }
]

## System

In [10]:
from agents.base_agent import BaseAgent
from agents.coding_agent import CodingAgent
from agents.debugging_agent import DebuggingAgent
from agents.planning_agent import PlanningAgent
from agents.pre_processing_agent import PreProcessingAgent
from agents.retreival_agent import RetrievalAgent
from core.orchestrator import Orchestrator
from utils.util_types import AgentSettings
from core.state import MainState
from utils.util_types import Puzzle
from models.base_model import BaseLanguageModel

In [13]:

def setup_system(config: dict[str, bool], model: BaseLanguageModel, expected_output: str, puzzle_input: str):

    agents = (
        (
            PreProcessingAgent(
                'preprocess', model=model,
            ),
            AgentSettings(enabled=config['preprocess'], can_debug=False),
        ),
        (
            RetrievalAgent(
                'retreival',
                model=model,
                connection_string=os.getenv('DB_CONNECTION_STRING') or '',
                openai_key=os.getenv('OPENAI_API_KEY') or '',
                weights=None, # Use default weights
            ),
            AgentSettings(enabled=config['retrieval'], can_debug=False),
        ),
        (
            PlanningAgent(
                'planning',
                model=model,
                n_plans=3, # Keep n_plans fixed for consistent comparison
            ),
            AgentSettings(enabled=config['planning'], can_debug=False),
        ),
        (
            CodingAgent('coding', model=model),
            AgentSettings(enabled=config['coding'], can_debug=False),
        ),
        (
            DebuggingAgent(
                'debugging',
                model=model,
                expected_output=expected_output,
                puzzle_input=puzzle_input,
            ),
            AgentSettings(enabled=config['debugging'], can_debug=True),
        ),
    )

    orchestrator = Orchestrator(agents, {})
    return orchestrator
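The `Orchestrator` implementation is not shown in this notebook. As an assumption about how the `enabled` flag could take effect, the sketch below runs a list of stages in order and skips the disabled ones. `ToySettings` and `run_pipeline` are hypothetical stand-ins and not the project's `AgentSettings` or `Orchestrator`.

```python
from dataclasses import dataclass


@dataclass
class ToySettings:
    """Hypothetical stand-in for AgentSettings; only the enabled flag is modeled."""
    enabled: bool


def run_pipeline(stages, state):
    """Run each enabled stage in order; disabled stages leave the state untouched."""
    for name, step, settings in stages:
        if not settings.enabled:
            continue  # ablated agent: skip it entirely
        state = step(state)
        state["ran"].append(name)
    return state


# Toy stages that each add one field to a dict-based state.
stages = [
    ("preprocess", lambda s: {**s, "keywords": ["grid"]}, ToySettings(enabled=True)),
    ("planning", lambda s: {**s, "plan": "bfs"}, ToySettings(enabled=False)),
    ("coding", lambda s: {**s, "code": "print(42)"}, ToySettings(enabled=True)),
]

final_state = run_pipeline(stages, {"ran": []})
print(final_state["ran"])
```

In this toy version a skipped stage simply contributes nothing to the state, so downstream stages have to cope with missing fields such as the plan. Whether the real orchestrator behaves the same way is not confirmed by this notebook.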


In [None]:
import time
from utils.util_types import TestCase

def run_and_test_system(
    day: int,
    puzzle_desc: str,
    puzzle_input: str,
    expected_output: str,
    config: dict[str, bool],
    model: BaseLanguageModel
) -> dict[str, object]:

    orchestrator = setup_system(
        config,
        model,
        expected_output=expected_output,
        puzzle_input=puzzle_input
    )


    puzzle = Puzzle(
        description=puzzle_desc,
        solution=None,
        year=2024,
        day=day,
    )

    state = MainState(puzzle=puzzle)

    try:
        start_time = time.time()
        ret_state = orchestrator.solve_puzzle(state)
        end_time = time.time()

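        # With the debugging agent disabled, nothing has executed the generated
        # code yet, so run the test case manually to decide whether it is solved.
        if not config['debugging']: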
            dba = DebuggingAgent(
                'debugging',
                model=model,
                expected_output=expected_output,
                puzzle_input=puzzle_input,
            )
            run_result = dba._run_test(
                ret_state.generated_code or '',
                TestCase(
                    input_=puzzle_input,
                    expected_output=expected_output,
                ),
            )
            if run_result.success:
                ret_state.is_solved = True
                ret_state.final_code = ret_state.generated_code

        return {
            'success': ret_state.is_solved,
            'day': day,
            'name': config['name'],
            'code': ret_state.final_code,
            'debug_attempts': ret_state.debug_attempts,
            'debug_suggestions': ret_state.debug_suggestions,
            'n_retreived_puzzles': len(ret_state.retreived_puzzles),
            'keywords': ','.join(ret_state.keywords),
            'concepts': ','.join(ret_state.underlying_concepts),
            'time': end_time - start_time
        }

    except Exception as e:
        print(f"Runtime error during puzzle solving for Day {day}: {e}")
        return {
            'success': False,
            'day': day,
            'name': config['name'],
            'code': None,
            'debug_attempts': None,
            'debug_suggestions': None,
            'n_retreived_puzzles': None,
            'keywords': None,
            'concepts': None,
            'time': None,
        }
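When debugging is disabled, the function above relies on the private `DebuggingAgent._run_test` method, whose implementation is not shown here. One way such a check could work is sketched below, under the assumption that the generated code reads the puzzle input from standard input and prints the answer. The helper name `passes_test` is hypothetical and this is not the project's implementation.

```python
import subprocess
import sys


def passes_test(code: str, puzzle_input: str, expected_output: str,
                timeout: float = 10.0) -> bool:
    """Run `code` in a fresh interpreter and compare its stdout to the expected answer."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            input=puzzle_input,       # assumed: the solution reads from stdin
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hanging solution as a failure
    return (
        completed.returncode == 0
        and completed.stdout.strip() == expected_output.strip()
    )


# Toy "generated" solution: sum the integers given on stdin.
candidate = "import sys\nprint(sum(int(line) for line in sys.stdin))"
print(passes_test(candidate, "1\n2\n3\n", "6"))
print(passes_test(candidate, "1\n2\n3\n", "7"))
```

Running untrusted generated code in a separate process with a timeout limits the damage a faulty solution can do to the notebook kernel, although it is not a sandbox.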

In [9]:
import datetime
import pandas as pd

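SAVE_DIR = os.path.join(PROJECT_ROOT, '../', 'experiments/', 'results/', 'ablation_studies/')
# Make sure the output directory exists so saving the CSV does not fail after a long run
os.makedirs(SAVE_DIR, exist_ok=True)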

def run_ablation(config, model, puzzle_data):

    all_results = []
    for puzzle in puzzle_data:
        
        puzzle_day = puzzle['day']
        puzzle_description = puzzle['description']
        input_ = puzzle['input']
        expected_output = puzzle['expected_output']
        print(f"Running day {puzzle_day}")

        results = run_and_test_system(
            puzzle_day,
            puzzle_description,
            input_,
            expected_output,
            config,
            model
        )

        all_results.append(results)

    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f'abl-{config["name"]}-{timestamp}.csv'
    filepath = os.path.join(SAVE_DIR, filename)

    results_df = pd.DataFrame(all_results)
    print(f"Saving results for {config['name']} to {filepath}")
    results_df.to_csv(filepath, index=False)
    
    return results_df

In [11]:
from models.gemini_model import GeminiLanguageModel
# Create language model to use
model = GeminiLanguageModel(
    api_key=os.getenv("GEMINI_API_KEY"),
    model_name='gemini-2.0-flash'
)

In [None]:
results_list = []

for config in ablation_configs:

    print(f'----- RUNNING CONFIG {config["name"]} -----')
    print(config)
    result = run_ablation(config, model, puzzle_data)
    results_list.append(result)

timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f'abl-full-{timestamp}.csv'
filepath = os.path.join(SAVE_DIR, filename)
results = pd.concat(results_list)
results.to_csv(filepath, index=False)

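Once the combined results are available, the contribution of each agent can be estimated by comparing the solve rate of every configuration with the baseline. The sketch below shows one possible summary. The function name `summarize_ablation` is hypothetical, and the `toy` DataFrame contains invented values used only to demonstrate the computation; it does not represent results of these experiments.

```python
import pandas as pd


def summarize_ablation(results: pd.DataFrame, baseline_name: str = "baseline") -> pd.DataFrame:
    """Compute the solve rate per config and its difference from the baseline."""
    solve_rate = results.groupby("name")["success"].mean()
    summary = solve_rate.to_frame("solve_rate")
    summary["delta_vs_baseline"] = summary["solve_rate"] - solve_rate[baseline_name]
    # Most harmful ablation (largest drop) first.
    return summary.sort_values("delta_vs_baseline")


# Invented toy data with the same 'name', 'day', and 'success' columns as the results.
toy = pd.DataFrame({
    "name": ["baseline"] * 4 + ["disable_planning"] * 4,
    "day": [1, 2, 3, 4] * 2,
    "success": [True, True, True, False, True, False, False, False],
})
summary = summarize_ablation(toy)
print(summary)
```

A strongly negative `delta_vs_baseline` suggests that the disabled agent matters for solving puzzles, while a value near zero suggests that the remaining agents compensate for its absence. With only 25 puzzles per configuration and a non-deterministic model, small differences should be interpreted with caution.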