# RL - NEAT Implementation

This notebook implements a NeuroEvolution of Augmenting Topologies (NEAT) approach for a multi-agent reinforcement learning task with scouts and guards in a gridworld environment.

> [!NOTE]
> Select the second `env` kernel, which runs Python 3.12.

In [None]:
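# Install required libraries (pickle is in the standard library, so the pickle5 backport is not needed on Python 3.12)
# The graphviz Python package also needs the Graphviz system binaries to render network diagrams
# !pip install numpy matplotlib neat-python graphviz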

In [1]:
# Import libraries
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import pickle
import neat
import copy
from til_environment import gridworld # from TIL

# Import our modules
from feature_extraction import process_observation, get_action_from_network
from neat_config import get_config, create_config
from neat_network import NeatNetwork, create_network_from_genome, load_genome
from neat_training import NeatTrainer
from neat_visualization import plot_fitness_history, plot_species, plot_network, visualize_agent_behavior
from neat_utils import save_genome, evaluate_network, setup_directories

## 1. Setup and Configuration

First, let's set up our directory structure and create a NEAT configuration file.

In [2]:
# Create directories
setup_directories()

# Create NEAT configuration
config_path = create_config("neat_config.txt")
config = get_config(config_path)

print(f"Configuration file created at: {config_path}")

Configuration file created at: /home/jupyter/tiza-til-ai-2025/rl/zane/Neat/neat_config.txt
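`create_config` writes `neat_config.txt`, but its contents are not shown here. neat-python reads an INI-style file with sections such as `[NEAT]` and `[DefaultGenome]`. The excerpt below is only a sketch of that format: `num_inputs = 324` and `pop_size = 300` are taken from the feature length and population size printed elsewhere in this notebook, while `num_outputs` and the remaining values are placeholder assumptions. A complete neat-python config also needs mutation, speciation, stagnation and reproduction settings, so this excerpt alone will not load with `neat.Config`.

```python
import configparser

# Hypothetical excerpt of a neat-python config. Only num_inputs (324 features)
# and pop_size (300) come from this notebook; the other values are placeholders.
CONFIG_EXCERPT = """
[NEAT]
fitness_criterion     = max
fitness_threshold     = 1000.0
pop_size              = 300
reset_on_extinction   = False

[DefaultGenome]
num_inputs            = 324
num_outputs           = 5
num_hidden            = 0
feed_forward          = True
initial_connection    = partial_direct 0.5
activation_default    = tanh
"""


def read_network_shape(text):
    """Parse the INI text and return (num_inputs, num_outputs, pop_size)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    genome = parser["DefaultGenome"]
    return (
        genome.getint("num_inputs"),
        genome.getint("num_outputs"),
        parser["NEAT"].getint("pop_size"),
    )


print(read_network_shape(CONFIG_EXCERPT))  # (324, 5, 300)
```

Checking `num_inputs` against the length of the extracted feature vector is a cheap way to catch a mismatch between the config and `process_observation` before a long training run.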


## 2. Feature Extraction Demonstration

Let's demonstrate how our feature extraction works with a sample observation.

In [3]:
# Create a sample environment
env = gridworld.env(
    env_wrappers=[],
    render_mode=None,
    debug=True,
    novice=False
)

# Reset the environment
env.reset(seed=42)

# Get a sample observation
observation, _, _, _, _ = env.last()

# Process the observation
features = process_observation(observation)

print(f"Observation shape: {observation['viewcone'].shape}")
print(f"Features length: {len(features)}")
print(f"First 10 features: {features[:10]}")

Observation shape: (7, 5)
Features length: 324
First 10 features: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
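The printed shapes show that each viewcone tile contributes several features: a 7 × 5 grid has 35 tiles, but the feature vector has 324 entries. `process_observation` itself is not shown, so the encoding below is an assumption: it unpacks each tile's 8-bit value into individual bits, which would account for 7 × 5 × 8 = 280 features, with the remaining entries presumably coming from other observation fields. `tile_to_bits` and `flatten_viewcone` are hypothetical helpers.

```python
def tile_to_bits(value):
    """Unpack one 8-bit tile value into 8 binary features, least-significant bit first."""
    return [(value >> bit) & 1 for bit in range(8)]


def flatten_viewcone(viewcone):
    """Hypothetical helper: turn a 2-D grid of 8-bit tiles into a flat list of bits."""
    features = []
    for row in viewcone:
        for value in row:
            # int() also accepts NumPy integer types such as np.uint8
            features.extend(tile_to_bits(int(value)))
    return features


# A 7x5 viewcone of zero-valued tiles yields 7 * 5 * 8 = 280 zero features.
empty_view = [[0] * 5 for _ in range(7)]
print(len(flatten_viewcone(empty_view)))  # 280
print(tile_to_bits(0b00010101))           # [1, 0, 1, 0, 1, 0, 0, 0]
```

Bit unpacking keeps every flag in a tile as its own input, so the network does not have to decode packed integers itself.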


## 3. Train Scout Agent

Now let's train our scout agent using NEAT.
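The training cell below passes `num_trials=5` and `max_steps=200` to `NeatTrainer`, whose internals are not shown. In neat-python, evolution is driven by a fitness function that receives a list of `(genome_id, genome)` pairs and the config, and sets `genome.fitness` on each genome. A minimal sketch of such a function, assuming fitness is the mean episode reward over several step-capped trials, could look like this; `StubGenome` and `run_episode` are hypothetical stand-ins for a real genome and an environment rollout.

```python
import random
from statistics import mean


class StubGenome:
    """Hypothetical stand-in for a NEAT genome; only `fitness` is required by neat-python."""

    def __init__(self, skill):
        self.skill = skill
        self.fitness = None


def run_episode(genome, rng, max_steps):
    """Hypothetical rollout: earn `skill` per step until a random end step or max_steps."""
    steps = min(rng.randint(1, 2 * max_steps), max_steps)
    return genome.skill * steps


def make_eval_genomes(num_trials=5, max_steps=200, seed=0):
    """Build a neat-style fitness function: mean reward over num_trials capped episodes."""

    def eval_genomes(genomes, config):
        for _genome_id, genome in genomes:
            # Reuse the same seeds for every genome so all face identical episodes.
            rewards = [
                run_episode(genome, random.Random(seed + trial), max_steps)
                for trial in range(num_trials)
            ]
            genome.fitness = mean(rewards)

    return eval_genomes


population = [(i, StubGenome(skill)) for i, skill in enumerate([0, 1, 2])]
make_eval_genomes()(population, config=None)
print([g.fitness for _, g in population])
```

Reusing the same seeds for every genome means fitness differences within a generation come from the genomes rather than from luck in the episodes they happened to draw.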

In [4]:
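# Set to True to retrain the scout; the log below was produced by an earlier run
TRAIN_SCOUT = False
if TRAIN_SCOUT: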
    # Create trainer
    trainer = NeatTrainer(config, num_generations=50, num_trials=5, max_steps=200)

    # Train scout
    print("Training scout agent...")
    best_scout = trainer.train_scout(checkpoint_prefix="checkpoints/scout_checkpoint")

    # Save best scout genome
    save_genome(best_scout, "models/best_scout_genome.pkl")
    print(f"Scout training completed. Best fitness: {trainer.best_scout_fitness}")

Training scout agent...

 ****** Running generation 0 ****** 

Population's average fitness: 1.58933 stdev: 1.52312
Best fitness: 6.80000 - size: (15, 2455) - species 69 - id 69
Average adjusted fitness: 0.489
Mean genetic distance 3.262, standard deviation 0.253
Population of 300 members in 150 species:
   ID   age  size  fitness  adj fit  stag
     1    0     2      1.0    0.431     0
     2    0     2      0.4    0.373     0
     3    0     2      3.4    0.667     0
     4    0     2      4.6    0.784     0
     5    0     2      3.2    0.647     0
     6    0     2      3.4    0.667     0
     7    0     2      2.4    0.569     0
     8    0     2      0.4    0.373     0
     9    0     2      3.8    0.706     0
    10    0     2      1.2    0.451     0
    11    0     2      5.4    0.863     0
    12    0     2      3.4    0.667     0
    13    0     2      0.2    0.353     0
    14    0     2      1.8    0.510     0
    15    0     2      2.0    0.529     0
    16    0     2     

## 4. Train Guard Agent

Next, let's train our guard agent using NEAT.

In [4]:
# Create trainer
trainer = NeatTrainer(config, num_generations=50, num_trials=5, max_steps=200)

# Train guard
print("Training guard agent...")
best_guard = trainer.train_guard(checkpoint_prefix="checkpoints/guard_checkpoint")

# Save best guard genome
save_genome(best_guard, "models/best_guard_genome.pkl")
print(f"Guard training completed. Best fitness: {trainer.best_guard_fitness}")

Training guard agent...

 ****** Running generation 0 ****** 

Population's average fitness: 0.49787 stdev: 0.15084
Best fitness: 1.04000 - size: (15, 2455) - species 21 - id 21
Average adjusted fitness: 0.198
Mean genetic distance 3.263, standard deviation 0.254
Population of 300 members in 150 species:
   ID   age  size  fitness  adj fit  stag
     1    0     2      0.3    0.040     0
     2    0     2      0.3    0.040     0
     3    0     2      0.5    0.180     0
     4    0     2      0.6    0.340     0
     5    0     2      0.6    0.320     0
     6    0     2      0.5    0.220     0
     7    0     2      0.5    0.240     0
     8    0     2      0.3    0.020     0
     9    0     2      0.7    0.400     0
    10    0     2      0.4    0.080     0
    11    0     2      0.6    0.280     0
    12    0     2      0.5    0.220     0
    13    0     2      0.5    0.220     0
    14    0     2      0.4    0.060     0
    15    0     2      0.4    0.120     0
    16    0     2     

## 5. Competitive Training

Now let's train the scout and guard agents competitively against each other.
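`train_competitive_ultimate` is project code and its scoring rule is not shown. A common pattern for competitive co-evolution is to score each agent by its average result against a sample of opponents from the other population. The sketch below assumes that pattern; `play_match` is a hypothetical stand-in that turns two skill numbers into a `(scout_reward, guard_reward)` pair, where the real trainer would run an episode in the gridworld.

```python
import random
from statistics import mean


def play_match(scout_skill, guard_skill):
    """Hypothetical match: the stronger side earns the skill gap as reward."""
    scout_reward = max(0.0, scout_skill - guard_skill)
    guard_reward = max(0.0, guard_skill - scout_skill)
    return scout_reward, guard_reward


def coevolve_fitness(scouts, guards, opponents_per_agent=3, seed=0):
    """Score each scout and guard by mean reward against randomly sampled opponents."""
    rng = random.Random(seed)
    scout_scores = {name: [] for name in scouts}
    guard_scores = {name: [] for name in guards}
    k = min(opponents_per_agent, len(guards))
    for s_name, s_skill in scouts.items():
        for g_name in rng.sample(sorted(guards), k=k):
            s_reward, g_reward = play_match(s_skill, guards[g_name])
            scout_scores[s_name].append(s_reward)
            guard_scores[g_name].append(g_reward)
    scout_fit = {name: mean(r) for name, r in scout_scores.items()}
    # Guards that were never sampled get no score rather than a misleading zero.
    guard_fit = {name: mean(r) for name, r in guard_scores.items() if r}
    return scout_fit, guard_fit


scout_fit, guard_fit = coevolve_fitness(
    {"s1": 5.0, "s2": 1.0}, {"g1": 2.0, "g2": 4.0, "g3": 0.0}
)
print(scout_fit, guard_fit)
```

Sampling a handful of opponents keeps the cost per generation at `len(scouts) * opponents_per_agent` matches instead of every scout playing every guard.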

In [None]:
# Load checkpoints from previous training
scout_checkpoint = "scout_checkpoints/comp_checkpoint67"
guard_checkpoint = "guard_checkpoints/comp_checkpoint66"

# Create new trainer for competitive training
comp_trainer = NeatTrainer(config, num_generations=50, num_trials=15, max_steps=100)

# Train competitively
print("Training scout and guard agents competitively...")
best_scout, best_guard = comp_trainer.train_competitive_ultimate(
    scout_checkpoint=scout_checkpoint,
    guard_checkpoint=guard_checkpoint,
    checkpoint_prefix="checkpoints/comp_checkpoint"
)

# Save best genomes
save_genome(best_scout, "models/best_competitive_scout_genome.pkl")
save_genome(best_guard, "models/best_competitive_guard_genome.pkl")

print("Competitive training completed.")
print(f"Best scout fitness: {comp_trainer.best_scout_fitness}")
print(f"Best guard fitness: {comp_trainer.best_guard_fitness}")

Training scout and guard agents competitively...
Loading scout checkpoint from scout_checkpoints/comp_checkpoint67
✅ Loading best guard genome from file
✅ Loaded best guard genome with fitness -10.666666666666666
🔄 Creating fresh guard population from scratch
✅ Injecting best guard genome into new population

Evaluating 112 scout genomes...
📈 New best scout: 62.13 (previous: -inf)
Evaluating 150 guard genomes...
📊 Fitness improvement: -inf -> 51.47
Evolving scout population...

 ****** Running generation 67 ****** 

Population's average fitness: 54.65238 stdev: 2.77498
Best fitness: 62.13333 - size: (17, 1142) - species 53 - id 64
Average adjusted fitness: 0.477
Mean genetic distance 3.202, standard deviation 0.520
Population of 105 members in 29 species:
   ID   age  size  fitness  adj fit  stag
     2   67     2     56.3    0.427     3
    13   67     1     54.5    0.440     9
    19   67     6     58.5    0.523     7
    23   67     2     56.5    0.471     5
    24   67     8     60

## 6. Visualization and Analysis

Let's visualize our training progress and agent behaviors.

In [None]:
# Plot fitness history
plot_fitness_history(comp_trainer.fitness_history, filename="visualizations/fitness_history.png")

# Display the fitness plot
plt.figure(figsize=(10, 6))
plt.plot(comp_trainer.fitness_history['scout'], label='Scout')
plt.plot(comp_trainer.fitness_history['guard'], label='Guard')
plt.title("Fitness History")
plt.xlabel("Generation")
plt.ylabel("Fitness")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Visualize networks
print("Visualizing scout network...")
plot_network(best_scout, config, filename="visualizations/scout_network")

print("Visualizing guard network...")
plot_network(best_guard, config, filename="visualizations/guard_network")

# Display scout network
from IPython.display import Image
Image(filename="visualizations/scout_network.png")

In [None]:
# Display guard network
Image(filename="visualizations/guard_network.png")

In [None]:
# Create environment for visualization
vis_env = gridworld.env(
    env_wrappers=[],
    render_mode="rgb_array",
    debug=True,
    novice=False
)

# Visualize scout behavior
print("Visualizing scout behavior...")
visualize_agent_behavior(
    best_scout, 
    config, 
    vis_env, 
    num_episodes=3, 
    is_scout=True,
    filename_prefix="visualizations/scout_behavior"
)

# Display scout behavior
Image(filename="visualizations/scout_behavior_episode_1.png")

In [None]:
# Visualize guard behavior
print("Visualizing guard behavior...")
visualize_agent_behavior(
    best_guard, 
    config, 
    vis_env, 
    num_episodes=3, 
    is_scout=False,
    filename_prefix="visualizations/guard_behavior"
)

# Display guard behavior
Image(filename="visualizations/guard_behavior_episode_1.png")

## 7. Evaluation

Let's evaluate our trained agents.
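The cell below averages the scout and guard rewards and divides the result by 100. The notebook does not say why 100 is used, so the sketch below simply exposes it as a `scale` parameter; `normalized_score` is a hypothetical helper. For example, rewards of 80 and 40 give (80 + 40) / 2 / 100 = 0.6.

```python
def normalized_score(scout_reward, guard_reward, scale=100.0):
    """Average the two role rewards, then divide by `scale` (100 in this notebook)."""
    return (scout_reward + guard_reward) / 2 / scale


print(normalized_score(80, 40))  # 0.6
```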

In [None]:
# Create environment for evaluation
eval_env = gridworld.env(
    env_wrappers=[],
    render_mode=None,
    debug=True,
    novice=False
)

# Create networks from best genomes
scout_net = create_network_from_genome(best_scout, config)
guard_net = create_network_from_genome(best_guard, config)

# Evaluate scout
scout_reward = evaluate_network(scout_net, eval_env, num_episodes=10, is_scout=True)
print(f"Scout evaluation reward: {scout_reward}")

# Evaluate guard
guard_reward = evaluate_network(guard_net, eval_env, num_episodes=10, is_scout=False)
print(f"Guard evaluation reward: {guard_reward}")

# Average the two rewards and divide by 100 to get a normalized score
score = (scout_reward + guard_reward) / 2 / 100
print(f"Normalized score: {score}")

## 8. Prepare Competition Submission

Finally, let's prepare the files for competition submission.
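The cell below copies three files with `shutil.copyfile`, which raises `FileNotFoundError` on the first missing source, possibly after earlier files were already copied. A small sketch that checks every source before copying anything is shown here; `prepare_submission` and `SUBMISSION_FILES` are hypothetical names, and the paths are the ones used in the cell below.

```python
import shutil
from pathlib import Path

# Source path -> file name inside the submission directory (paths from the cell below).
SUBMISSION_FILES = {
    "models/best_competitive_scout_genome.pkl": "best_scout_genome.pkl",
    "models/best_competitive_guard_genome.pkl": "best_guard_genome.pkl",
    "rl_utils_Z.py": "rl_utils_Z.py",
}


def prepare_submission(file_map, dest_dir):
    """Copy {source: target_name} into dest_dir, failing before any copy if a source is missing."""
    missing = [src for src in file_map if not Path(src).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing submission files: {missing}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [shutil.copyfile(src, dest / name) for src, name in file_map.items()]


# prepare_submission(SUBMISSION_FILES, "zane")
```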

In [None]:
# Create submission directory
os.makedirs('zane', exist_ok=True)

# Copy best genomes to submission directory
import shutil
shutil.copyfile('models/best_competitive_scout_genome.pkl', 'zane/best_scout_genome.pkl')
shutil.copyfile('models/best_competitive_guard_genome.pkl', 'zane/best_guard_genome.pkl')

# Copy rl_utils_Z.py to submission directory
shutil.copyfile('rl_utils_Z.py', 'zane/rl_utils_Z.py')

print("Competition files prepared in 'zane' directory.")

In [None]:
# Test the RL Manager
from rl_manager import RLManager

# Create test environment
test_env = gridworld.env(
    env_wrappers=[],
    render_mode=None,
    debug=True,
    novice=False
)
test_env.reset(seed=42)

# Get sample observation
observation, _, _, _, _ = test_env.last()

# Create RL manager
manager = RLManager()

# Test action selection
action = manager.rl(observation)
print(f"Selected action: {action}")

## 9. Conclusion

We've successfully implemented a NEAT approach for training scout and guard agents in a gridworld environment. Our approach includes:

1. **Feature extraction** from the complex observation space
2. **NEAT configuration** optimized for this specific problem
3. **Training methods** for both single-agent and competitive scenarios
4. **Visualization tools** to understand agent behaviors
5. **Evaluation metrics** to assess performance
6. **Competition submission files** ready for deployment

The NEAT approach evolves network topologies alongside weights, so network structure can adapt to the dynamics of the scout-versus-guards scenario. Competitive training pits each role against an evolving opponent, which is intended to produce strategies that hold up against more than one fixed behavior.

Key advantages of our implementation:
- **Evolving topology**: Networks can grow in complexity as needed
- **Competitive training**: Agents improve by competing against each other
- **Comprehensive visualization**: Tools to understand agent behaviors
- **Modular design**: Easy to modify and extend
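The evolving-topology point refers to NEAT's structural mutations. In the add-node mutation from the original NEAT algorithm, an enabled connection A→B with weight w is disabled and replaced by A→C with weight 1.0 and C→B with weight w, so the new node C initially perturbs the network's behavior only slightly. The sketch below applies that mutation to a plain dictionary of connection genes; the data layout and `add_node` are illustrative and do not mirror neat-python's internal classes.

```python
import random


def add_node(connections, next_node_id, rng):
    """Split a random enabled connection (a, b) into a -> new -> b, NEAT add-node style.

    connections maps (src, dst) -> {"weight": float, "enabled": bool}.
    Returns the id of the inserted node, or None if no connection is enabled.
    """
    enabled = [key for key, gene in connections.items() if gene["enabled"]]
    if not enabled:
        return None
    src, dst = rng.choice(enabled)
    old = connections[(src, dst)]
    old["enabled"] = False  # keep the gene for history, but switch it off
    new = next_node_id
    connections[(src, new)] = {"weight": 1.0, "enabled": True}
    connections[(new, dst)] = {"weight": old["weight"], "enabled": True}
    return new


conns = {(-1, 0): {"weight": 0.5, "enabled": True}}
hidden = add_node(conns, next_node_id=1, rng=random.Random(0))
print(hidden, sorted(conns))  # 1 [(-1, 0), (-1, 1), (1, 0)]
```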

For further improvements, we could:
1. Implement more sophisticated feature extraction
2. Use multi-objective optimization for more balanced behaviors
3. Incorporate memory (recurrent connections) for better temporal reasoning
4. Add curriculum learning for more efficient training
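For item 3, neat-python supports recurrent genomes through `feed_forward = False` in the `[DefaultGenome]` config section, with networks built by `neat.nn.RecurrentNetwork.create(genome, config)`. The sketch below, which does not use neat-python, shows why recurrence gives memory: a single self-connected tanh node keeps a decaying trace of an input pulse after the input has returned to zero, whereas a feed-forward tanh node with no bias would output 0 at those steps. `run_recurrent_unit` and its weights are illustrative.

```python
import math


def run_recurrent_unit(inputs, w_in=1.0, w_rec=0.9):
    """One self-connected tanh node: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    trace = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        trace.append(h)
    return trace


# A single pulse at t=0 is still visible at later steps, even though those inputs are 0.
trace = run_recurrent_unit([1.0, 0.0, 0.0, 0.0])
print([round(v, 3) for v in trace])
```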