# Reinforcement Learning: Bipedal Walker Training

## Project 2 - RL Agent Training with Stable Baselines3

This notebook implements reinforcement learning algorithms (**SAC, DDPG, PPO, TD3**) for:
- **BipedalWalker-v3** - A bipedal walking robot in OpenAI Gym

**Requirements fulfilled:**
- ‚úÖ Unfolded algorithm implementations with pseudocode
- ‚úÖ Training visualizations (learning curves, comparisons)
- ‚úÖ Graphical results (bar charts, agent behavior montage)
- ‚úÖ Multiple RL algorithms comparison

**Setup:** Make sure you're using the `rl_env` conda environment:
```bash
conda activate rl_env
```


## 1. Installation


In [3]:
# Install required packages (run this cell once)
%pip install gymnasium[box2d] stable-baselines3[extra] tensorboard matplotlib seaborn pandas numpy swig --quiet
%pip install torch --extra-index-url https://download.pytorch.org/whl/cu124 --quiet
print("‚úÖ All packages installed!")


  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m√ó[0m [32mBuilding wheel for box2d-py [0m[1;32m([0m[32mpyproject.toml[0m[1;32m)[0m did not run successfully.
  [31m‚îÇ[0m exit code: [1;36m1[0m
  [31m‚ï∞‚îÄ>[0m [31m[29 lines of output][0m
  [31m   [0m Using setuptools (version 80.9.0).
  [31m   [0m !!
  [31m   [0m 
  [31m   [0m         ********************************************************************************
  [31m   [0m         Please consider removing the following classifiers in favor of a SPDX license expression:
  [31m   [0m 
  [31m   [0m         License :: OSI Approved :: zlib/libpng License
  [31m   [0m 
  [31m   [0m         See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
  [31m   [0m         ********************************************************************************
  [31m   [0m 
  [31m   [0m !!
  [31m   [0m   self._finalize_license_expression()
  [31m   [0m 

## 2. Import Libraries


In [4]:
# Import libraries
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from datetime import datetime
import os
import torch
import warnings
warnings.filterwarnings('ignore')

# Stable Baselines3
from stable_baselines3 import SAC, PPO, TD3, DDPG
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Set style for plots
plt.style.use('dark_background')
sns.set_palette("husl")

# Check GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"‚úì Libraries imported successfully!")
print(f"üñ•Ô∏è  Using device: {device}")
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"   GPU {i}: {torch.cuda.get_device_name(i)}")


ModuleNotFoundError: No module named 'seaborn'

## 3. Environment Setup & Visualization


In [None]:
# Create BipedalWalker environment
env_id = "BipedalWalker-v3"
env = gym.make(env_id)

print(f"üéÆ Environment: {env_id}")
print(f"\nüìä Observation Space:")
print(f"   Shape: {env.observation_space.shape}")
print(f"   Low: {env.observation_space.low[:5]}...")
print(f"   High: {env.observation_space.high[:5]}...")

print(f"\nüïπÔ∏è  Action Space:")
print(f"   Shape: {env.action_space.shape}")
print(f"   Low: {env.action_space.low}")
print(f"   High: {env.action_space.high}")

env.close()


In [None]:
# Visualize the environment
env = gym.make(env_id, render_mode="rgb_array")
obs, info = env.reset()

fig, ax = plt.subplots(figsize=(12, 6))
ax.imshow(env.render())
ax.set_title("BipedalWalker-v3 Environment", fontsize=16, fontweight='bold', color='#00ffaa')
ax.axis('off')
plt.tight_layout()
plt.savefig('environment_preview.png', dpi=150, bbox_inches='tight', facecolor='#1a1a2e')
plt.show()

env.close()
print("‚úì Environment preview saved!")
