![MuJoCo banner](https://raw.githubusercontent.com/google-deepmind/mujoco/main/banner.png)

# <h1><center>PPO Training for Piper Robot <a href="https://colab.research.google.com/github/wzzzzq/MuJoCo_Visual_PPO/blob/main/ppo_training_colab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" width="140" align="center"/></a></center></h1>

This notebook provides a complete training pipeline for PPO (Proximal Policy Optimization) on a Piper robot arm grasping task using MuJoCo physics simulation. The training uses visual observations (RGB cameras) combined with state information.

## ⚙️ Requirements

**Important:** Make sure you're using a **GPU runtime** in Google Colab:
1. Go to `Runtime` → `Change runtime type`
2. Select `GPU` as the hardware accelerator
3. Choose `T4`, `V100`, or `A100` if available
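
To confirm that the runtime change took effect, you can look for the `nvidia-smi` tool before running the setup cell. The sketch below is illustrative only: `gpu_runtime_available` is a hypothetical helper name, and it checks nothing beyond whether the tool exists and exits cleanly.

```python
import shutil
import subprocess


def gpu_runtime_available(tool="nvidia-smi"):
    """Return True if the NVIDIA driver tool exists and exits with status 0."""
    path = shutil.which(tool)
    if path is None:
        # Tool not on PATH: most likely a CPU-only runtime.
        return False
    result = subprocess.run([path], capture_output=True)
    return result.returncode == 0


print("GPU runtime detected:", gpu_runtime_available())
```

Looking the tool up with `shutil.which` first means a missing binary gives `False` instead of a `FileNotFoundError`.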

## 🎯 What This Notebook Does

1. **Environment Setup**: Installs MuJoCo with proper GPU rendering support
2. **Repository Setup**: Clones the training code and verifies all assets
3. **Training**: Runs PPO training with visual observations
4. **Evaluation**: Tests the trained policy and visualizes results
5. **Model Management**: Saves trained models to Google Drive

## 📊 Expected Results

- Training time: ~30-60 minutes (depending on GPU)
- The robot learns to grasp and manipulate objects using camera inputs
- The final cell copies the trained models to your Google Drive

Let's get started! 🚀


In [None]:
# Set up GPU rendering (following official MuJoCo tutorial pattern)
import os
import subprocess

# Check GPU availability
if subprocess.run('nvidia-smi').returncode:
  raise RuntimeError(
      'Cannot communicate with GPU. '
      'Make sure you are using a GPU Colab runtime. '
      'Go to the Runtime menu and select "Change runtime type".')

# Add an ICD config so that glvnd can pick up the Nvidia EGL driver.
# This is usually installed as part of an Nvidia driver package, but the Colab
# kernel doesn't install its driver via APT, and as a result the ICD is missing.
# (https://github.com/NVIDIA/libglvnd/blob/master/src/EGL/icd_enumeration.md)
NVIDIA_ICD_CONFIG_PATH = '/usr/share/glvnd/egl_vendor.d/10_nvidia.json'
if not os.path.exists(NVIDIA_ICD_CONFIG_PATH):
  with open(NVIDIA_ICD_CONFIG_PATH, 'w') as f:
    f.write("""{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
""")

# Configure MuJoCo to use the EGL rendering backend (requires GPU)
print('Setting environment variable to use GPU rendering:')
%env MUJOCO_GL=egl

# Install system dependencies
!apt-get update
!apt-get install -y \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libosmesa6-dev \
    libegl1-mesa-dev \
    software-properties-common \
    patchelf \
    libglfw3-dev

# Install MuJoCo
!pip install mujoco

# Check if installation was successful
try:
  print('Checking that the installation succeeded:')
  import mujoco
  mujoco.MjModel.from_xml_string('<mujoco/>')
except Exception as e:
  raise e from RuntimeError(
      'Something went wrong during installation. Check the shell output above '
      'for more information.\n'
      'If using a hosted Colab runtime, make sure you enable GPU acceleration '
      'by going to the Runtime menu and selecting "Change runtime type".')

print('MuJoCo installation successful.')

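# Install other Python dependencies
# Quote version specifiers and extras: unquoted, the shell treats '>' as an
# output redirect and '[...]' as a glob pattern.
!pip install "torch>=2.0.0"
!pip install "numpy==1.26.4"
!pip install "scipy>=1.7.0"
!pip install "gymnasium==0.28.1"
!pip install imageio
!pip install "imageio[ffmpeg]"
!pip install "imageio[pyav]"
!pip install "tyro>=0.5.0"
!pip install "tqdm>=4.60.0"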

print('All dependencies installed successfully!')
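
Requirement strings containing `>` or `[` are special to the shell; an unquoted `>` is an output redirection. One way to avoid the shell altogether is to build the pip command as an argument list. This is a minimal sketch: `pip_install_command` is a hypothetical helper, and the requirement strings are a subset of those used in the install cell.

```python
import sys


def pip_install_command(requirements):
    """Build an argument list for `python -m pip install ...`.

    Passing a list to subprocess.run bypasses the shell, so characters
    such as '>' and '[' in requirement strings are not interpreted.
    """
    return [sys.executable, "-m", "pip", "install", *requirements]


# Requirement strings taken from the install cell.
requirements = ["torch>=2.0.0", "scipy>=1.7.0", "imageio[ffmpeg]", "tyro>=0.5.0"]
command = pip_install_command(requirements)
print(command[1:])
# To actually install: subprocess.run(command, check=True)
```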

In [None]:
# Clone the repository
# Replace with your actual GitHub repository URL
repo_url = "https://github.com/wzzzzq/MuJoCo_Visual_PPO.git"
repo_name = "MuJoCo_Visual_PPO"

# Check if repository already exists
if os.path.exists(f"/content/{repo_name}"):
    print(f"Repository {repo_name} already exists. Pulling latest changes...")
    %cd /content/{repo_name}
    !git pull
else:
    print(f"Cloning repository from {repo_url}...")
    %cd /content
    !git clone {repo_url}
    %cd {repo_name}

# Verify the clone was successful
print("Repository contents:")
!ls -la

# Verify key files exist
required_files = ['single_piper_on_desk_env.py', 'ppo_rgb.py', 'model_assets/piper_on_desk/scene.xml']
missing_files = []
for file in required_files:
    if not os.path.exists(file):
        missing_files.append(file)

if missing_files:
    print(f"Warning: Missing required files: {missing_files}")
    print("Please check that the repository contains all necessary files.")
else:
    print("All required files found successfully!")
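
The file check can also be written as a small reusable function. In this sketch `find_missing` is a hypothetical name, and a temporary directory stands in for the cloned repository.

```python
import tempfile
from pathlib import Path


def find_missing(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


# Demonstration on a temporary directory standing in for the cloned repo.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ppo_rgb.py").write_text("# placeholder\n")
    missing = find_missing(tmp, ["ppo_rgb.py", "single_piper_on_desk_env.py"])
    print(missing)  # ['single_piper_on_desk_env.py']
```

Passing the repository root explicitly keeps the check independent of the current working directory, which `%cd` changes during the notebook.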

In [None]:
# Mount Google Drive to save models and logs
from google.colab import drive
drive.mount('/content/drive')

# Create directories for saving models
!mkdir -p /content/drive/MyDrive/ppo_training_runs

In [None]:
# Common imports and helper functions for the notebook
import time
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output, display

# Set up matplotlib for better plots in Colab
plt.style.use('default')
np.set_printoptions(precision=3, suppress=True, linewidth=100)

def print_section(title, char="=", width=50):
    """Print a formatted section header"""
    print(char * width)
    print(f" {title} ".center(width))
    print(char * width)

def print_success(message):
    """Print a success message with emoji"""
    print(f"✓ {message}")

def print_error(message):
    """Print an error message with emoji"""
    print(f"❌ {message}")

def print_warning(message):
    """Print a warning message with emoji"""
    print(f"⚠️  {message}")

def format_time(seconds):
    """Format seconds into a readable time string"""
    if seconds < 60:
        return f"{seconds:.1f}s"
    elif seconds < 3600:
        return f"{seconds/60:.1f}m"
    else:
        return f"{seconds/3600:.1f}h"

print_success("Helper functions loaded successfully!")
print("Ready to proceed with MuJoCo setup...")

In [None]:
# Basic rendering test - following MuJoCo tutorial simple pattern
print("🖼️  Basic MuJoCo Rendering Test")

try:
    # Load the Piper robot model directly (similar to tutorial approach)
    import mujoco
    import os
    
    # Find the model file
    model_path = None
    possible_paths = [
        'model_assets/piper_on_desk/scene.xml',
        '/content/MuJoCo_Visual_PPO/model_assets/piper_on_desk/scene.xml'
    ]
    
    for path in possible_paths:
        if os.path.exists(path):
            model_path = path
            break
    
    if model_path:
        print(f"Loading model from: {model_path}")
        
        # Load model and create data (tutorial pattern)
        model = mujoco.MjModel.from_xml_path(model_path)
        data = mujoco.MjData(model)
        
        # Basic rendering following exact tutorial pattern
        with mujoco.Renderer(model, height=480, width=640) as renderer:
            # Forward simulation to update positions (tutorial step)
            mujoco.mj_forward(model, data)
            renderer.update_scene(data)
            
            # Render and show the image (tutorial style)
            pixels = renderer.render()
            
            plt.figure(figsize=(10, 6))
            plt.imshow(pixels)
            plt.title('Piper Robot Environment - Basic MuJoCo Rendering\n(Following Official Tutorial Pattern)', 
                     fontsize=14, fontweight='bold', pad=20)
            plt.axis('off')
            
            # Add technical info as text overlay (tutorial style)
            plt.text(0.02, 0.98, f'Resolution: {pixels.shape[1]}×{pixels.shape[0]}\nBackend: {os.environ.get("MUJOCO_GL", "default")}', 
                    transform=plt.gca().transAxes, fontsize=10, verticalalignment='top',
                    bbox=dict(boxstyle="round,pad=0.3", facecolor="white", alpha=0.8))
            
            plt.tight_layout()
            plt.show()
            
        print_success("Basic MuJoCo rendering test passed!")
        print(f"✓ Rendered {pixels.shape} image using tutorial pattern")
        print("✓ Model loaded and rendered successfully")
        
    else:
        print_warning("Model file not found. This test requires the repository to be cloned first.")
        print("Please run the repository cloning cell first.")
        
except Exception as e:
    print_error(f"Basic rendering test failed: {e}")
    print("This is expected if MuJoCo or the model files are not yet available.")
    print("Please run the installation and repository setup cells first.")

In [None]:
# Import required libraries
import os
import sys
import torch
import numpy as np
import gymnasium as gym

# Ensure MuJoCo is configured for EGL rendering (headless GPU rendering for Colab)
os.environ['MUJOCO_GL'] = 'egl'

# Add the project directory to Python path
project_path = f'/content/{repo_name}'  # Use the repo name from previous cell
if os.path.exists(project_path):
    sys.path.insert(0, project_path)  # Insert at beginning to ensure our modules are found first
    os.chdir(project_path)
    print(f"Working directory set to: {os.getcwd()}")
else:
    print(f"Project path {project_path} not found. Please check the repository cloning step.")
    
# Verify MuJoCo setup
print("Verifying MuJoCo configuration...")
import mujoco
print(f"MuJoCo version: {mujoco.__version__}")
print(f"MuJoCo GL backend: {os.environ.get('MUJOCO_GL', 'Not set')}")

# Test basic MuJoCo functionality
try:
    test_model = mujoco.MjModel.from_xml_string('<mujoco><worldbody><geom size="1"/></worldbody></mujoco>')
    test_data = mujoco.MjData(test_model)
    mujoco.mj_step(test_model, test_data)
    print("✓ MuJoCo basic functionality test passed")
except Exception as e:
    print(f"✗ MuJoCo basic functionality test failed: {e}")

# Test MuJoCo rendering
try:
    with mujoco.Renderer(test_model, height=64, width=64) as renderer:
        mujoco.mj_forward(test_model, test_data)
        renderer.update_scene(test_data)
        pixels = renderer.render()
        print(f"✓ MuJoCo rendering test passed - rendered {pixels.shape} image")
except Exception as e:
    print(f"✗ MuJoCo rendering test failed: {e}")

# Import project modules
try:
    from single_piper_on_desk_env import PiperEnv
    from ppo_rgb import PPOArgs, train
    import tyro
    print("✓ Project modules imported successfully")
except Exception as e:
    print(f"✗ Failed to import project modules: {e}")
    print("Please check that all required files are in the repository.")

# Check PyTorch and CUDA setup
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA device count: {torch.cuda.device_count()}")

# Test the environment
print("\nTesting PiperEnv...")
try:
    env = PiperEnv(render_mode=None)  # No GUI rendering in headless Colab
    obs, info = env.reset()
    print(f"✓ Environment created successfully")
    print(f"  Observation space: {env.observation_space}")
    print(f"  Action space: {env.action_space}")
    print(f"  Observation keys: {list(obs.keys())}")
    print(f"  RGB image shape: {obs['rgb'].shape}")
    print(f"  Wrist camera shape: {obs['wrist_cam'].shape}")
    print(f"  State shape: {obs['state'].shape}")
    
    # Test a few steps
    for i in range(3):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        print(f"  Step {i+1}: reward={reward:.3f}")
    
    env.close()
    print("✓ Environment test completed successfully!")
    
except Exception as e:
    print(f"✗ Environment test failed: {e}")
    import traceback
    traceback.print_exc()
    print("Please check that all model assets are properly included in the repository.")
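
The step loop in this cell follows the Gymnasium convention of `reset()` returning `(obs, info)` and `step()` returning a five-element tuple. The sketch below wraps that loop in a function. `StubEnv` is a hypothetical stand-in used only so the example runs without MuJoCo; it is not `PiperEnv`, and its `is_success` key mirrors the one the evaluation cell reads.

```python
import random


class StubEnv:
    """Hypothetical stand-in with a Gymnasium-style reset/step interface."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return {"state": [0.0]}, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.episode_length
        info = {"is_success": terminated}
        return {"state": [float(self.t)]}, 1.0, terminated, False, info


def run_episode(env, policy, max_steps=200):
    """Run one episode and return (total_reward, steps, success)."""
    obs, info = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        done = terminated or truncated
    # Success is read once, from the final step's info dict.
    return total_reward, steps, bool(info.get("is_success", False))


result = run_episode(StubEnv(), policy=lambda obs: random.random())
print(result)  # (5.0, 5, True)
```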

In [None]:
# Demonstrate camera rendering - following MuJoCo tutorial pattern
print_section("Camera Rendering Demonstration")

try:
    # Create environment for rendering demo
    env = PiperEnv(render_mode=None)
    obs, info = env.reset()
    
    # Take a few random steps to create an interesting scene
    for i in range(10):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    
    print("📸 Capturing camera views using MuJoCo tutorial rendering pattern...")
    
    # Read both camera views from the latest observation dictionary
    rgb_view = obs['rgb']
    wrist_view = obs['wrist_cam']
    
    # Display both cameras side by side (following tutorial style)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    # Third-person camera view
    ax1.imshow(rgb_view)
    ax1.set_title('Third-Person Camera View\n(External perspective)', fontsize=12, fontweight='bold')
    ax1.axis('off')
    ax1.text(0.5, -0.05, f'Resolution: {rgb_view.shape[1]}×{rgb_view.shape[0]}', 
             ha='center', va='top', transform=ax1.transAxes, fontsize=10)
    
    # Wrist camera view  
    ax2.imshow(wrist_view)
    ax2.set_title('Wrist Camera View\n(Robot end-effector perspective)', fontsize=12, fontweight='bold')
    ax2.axis('off')
    ax2.text(0.5, -0.05, f'Resolution: {wrist_view.shape[1]}×{wrist_view.shape[0]}', 
             ha='center', va='top', transform=ax2.transAxes, fontsize=10)
    
    plt.tight_layout()
    plt.suptitle('MuJoCo Piper Robot Camera Views', fontsize=14, fontweight='bold', y=1.02)
    plt.show()
    
    # Show some technical details about the rendering
    print(f"\n🔍 Technical Details:")
    print(f"  Third-person camera shape: {rgb_view.shape}")
    print(f"  Wrist camera shape: {wrist_view.shape}")
    print(f"  Data type: {rgb_view.dtype}")
    print(f"  Value range: [{rgb_view.min()}, {rgb_view.max()}]")
    print(f"  Rendering backend: {os.environ.get('MUJOCO_GL', 'default')}")
    
    # Demonstrate the rendering pipeline (following tutorial approach)
    print(f"\n⚙️  Rendering Pipeline (following MuJoCo tutorial):")
    print(f"  1. Environment step → data.qpos, data.qvel updated")
    print(f"  2. mj_forward() → compute derived quantities")  
    print(f"  3. Renderer context → with mujoco.Renderer() as renderer:")
    print(f"  4. renderer.update_scene() → update visual scene")
    print(f"  5. renderer.render() → generate RGB pixels")
    
    # Show a sequence of frames to demonstrate motion
    print(f"\n🎬 Demonstrating motion sequence...")
    
    # Reset environment and take controlled steps
    obs, info = env.reset()
    frames_3rd = []
    frames_wrist = []
    
    # Collect frames from a short sequence
    for step in range(6):
        # Use a more controlled action for better visualization
        action = np.array([0.1, -0.1, 0.05, -0.05, 0.02, 0.1, 0.5])  # Fixed action (assumes a 7-dimensional action space)
        obs, reward, terminated, truncated, info = env.step(action)
        
        if step % 2 == 0:  # Collect every other frame
            frames_3rd.append(obs['rgb'])
            frames_wrist.append(obs['wrist_cam'])
    
    # Display motion sequence
    num_frames = len(frames_3rd)
    fig, axes = plt.subplots(2, num_frames, figsize=(3*num_frames, 6))
    
    if num_frames == 1:
        axes = axes.reshape(2, 1)
    
    for i in range(num_frames):
        # Third-person sequence
        axes[0, i].imshow(frames_3rd[i])
        axes[0, i].set_title(f'3rd Person\nStep {i*2}', fontsize=10)
        axes[0, i].axis('off')
        
        # Wrist sequence  
        axes[1, i].imshow(frames_wrist[i])
        axes[1, i].set_title(f'Wrist Camera\nStep {i*2}', fontsize=10)
        axes[1, i].axis('off')
    
    plt.tight_layout()
    plt.suptitle('Motion Sequence - Both Camera Views', fontsize=14, fontweight='bold', y=1.02)
    plt.show()
    
    env.close()
    print_success("Camera rendering demonstration completed!")
    print("✓ Both camera views rendered successfully using MuJoCo tutorial pattern")
    print("✓ Motion sequence captured and displayed")
    
except Exception as e:
    print_error(f"Camera rendering demonstration failed: {e}")
    import traceback
    traceback.print_exc()

In [None]:
# Configure PPO training arguments
ppo_args = PPOArgs()

# Colab-optimized settings
# Note: Adjust these based on your Colab runtime (T4, V100, etc.)
ppo_args.total_timesteps = 500000   # Reduced for faster demo (increase to 1M+ for full training)
ppo_args.num_envs = 3               # Fewer environments to reduce memory usage
ppo_args.num_eval_envs = 1          # Single eval environment
ppo_args.num_steps = 80             # Steps per environment per update
ppo_args.num_minibatches = 4        # Number of minibatches for gradient updates
ppo_args.learning_rate = 3e-4       # Standard learning rate
ppo_args.track = False              # Disable wandb tracking for simplicity
ppo_args.save_model = True          # Save model checkpoints
ppo_args.cuda = torch.cuda.is_available()  # Use GPU if available

# Reproducibility settings
ppo_args.seed = 42
ppo_args.torch_deterministic = True

# Memory optimization for Colab
ppo_args.batch_size = ppo_args.num_envs * ppo_args.num_steps  # Calculate batch size
print(f"Calculated batch size: {ppo_args.batch_size}")

print("PPO Configuration for Google Colab:")
print(f"  Total timesteps: {ppo_args.total_timesteps:,}")
print(f"  Number of environments: {ppo_args.num_envs}")
print(f"  Steps per environment: {ppo_args.num_steps}")
print(f"  Batch size: {ppo_args.batch_size}")
print(f"  Learning rate: {ppo_args.learning_rate}")
print(f"  Using CUDA: {ppo_args.cuda}")
# Rough estimate only: assumes about one rollout (num_envs * num_steps steps) per second
print(f"  Expected training time: ~{ppo_args.total_timesteps / (ppo_args.num_envs * ppo_args.num_steps * 60):.1f} minutes")

# Estimate memory usage
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # Clear GPU cache
    print(f"\nGPU Memory before training:")
    print(f"  Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"  Cached: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")

# Start training with proper error handling
print("\n" + "="*50)
print("Starting PPO training...")
print("="*50)

try:
    # Create a simple environment test before training
    test_env = PiperEnv(render_mode=None)
    test_obs, _ = test_env.reset()
    test_env.close()
    print("✓ Environment pre-check passed")
    
    # Start actual training
    train(args=ppo_args)
    print("\n" + "="*50)
    print("🎉 Training completed successfully!")
    print("="*50)
    
except KeyboardInterrupt:
    print("\n" + "="*50)
    print("⚠️  Training interrupted by user")
    print("="*50)
    
except Exception as e:
    print("\n" + "="*50)
    print(f"❌ Training failed with error: {e}")
    print("="*50)
    import traceback
    traceback.print_exc()
    
    # Provide helpful debugging information
    print("\nDebugging information:")
    print(f"  Working directory: {os.getcwd()}")
    print(f"  Python path: {sys.path[:3]}...")  # Show first 3 entries
    print(f"  CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"  GPU memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated")
        
finally:
    # Cleanup
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        print(f"\nGPU Memory after training:")
        print(f"  Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
        print(f"  Cached: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
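
The configuration values determine how much data each update sees. The sketch below works through that arithmetic with the settings from this cell, assuming the convention common to CleanRL-style PPO scripts. Whether `ppo_rgb.py` derives these quantities in exactly this way is an assumption, and `rollout_sizes` is a hypothetical helper.

```python
def rollout_sizes(total_timesteps, num_envs, num_steps, num_minibatches):
    """Derive batch quantities the way CleanRL-style PPO scripts usually do."""
    batch_size = num_envs * num_steps               # transitions per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    num_iterations = total_timesteps // batch_size  # rollout/update cycles
    return batch_size, minibatch_size, num_iterations


# Values from the configuration cell.
sizes = rollout_sizes(total_timesteps=500_000, num_envs=3, num_steps=80, num_minibatches=4)
print(sizes)  # (240, 60, 2083)
```

With only 240 transitions per rollout, each minibatch holds 60 samples. Raising `num_envs` increases both figures but also increases rendering and memory cost.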

In [None]:
# Test the trained policy
print("="*50)
print("Testing the trained policy...")
print("="*50)

# Find the latest checkpoint
import glob
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Look for checkpoints in the runs directory
checkpoint_patterns = ["runs/*/final_ckpt.pt", "runs/*/best_ckpt.pt", "runs/*/*.pt"]
checkpoint_files = set()  # the patterns overlap, so a set avoids duplicate entries
for pattern in checkpoint_patterns:
    checkpoint_files.update(glob.glob(pattern))
checkpoint_files = sorted(checkpoint_files)

if checkpoint_files:
    # Pick the most recently modified checkpoint
    latest_checkpoint = max(checkpoint_files, key=os.path.getmtime)
    print(f"✓ Found checkpoint: {latest_checkpoint}")
    
    # Show available checkpoints (newest first)
    print("\nAvailable checkpoints:")
    for i, ckpt in enumerate(sorted(checkpoint_files, key=os.path.getmtime, reverse=True)[:5]):
        size_mb = os.path.getsize(ckpt) / (1024 * 1024)
        mtime = os.path.getmtime(ckpt)
        print(f"  {i+1}. {ckpt} ({size_mb:.1f} MB, {time.ctime(mtime)})")

    # Manual testing with visualization
    print(f"\n📊 Manual testing with {latest_checkpoint}...")
    
    try:
        from ppo_rgb import Agent
        import torch
        import time

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(f"Using device: {device}")

        # Create environment (no GUI rendering in Colab)
        env = PiperEnv(render_mode=None)
        sample_obs, _ = env.reset()

        # Load the trained agent
        agent = Agent(env, sample_obs)
        agent.load_state_dict(torch.load(latest_checkpoint, map_location=device))
        agent.eval()
        print("✓ Agent loaded successfully")

        # Test for multiple episodes with statistics
        num_test_episodes = 5
        episode_rewards = []
        episode_lengths = []
        success_count = 0
        
        print(f"\nRunning {num_test_episodes} test episodes...")
        
        for episode in range(num_test_episodes):
            obs, _ = env.reset()
            done = False
            episode_reward = 0
            step_count = 0
            
            # Store observations for visualization
            rgb_frames = []
            wrist_frames = []
            
            while not done and step_count < 200:  # Limit steps per episode
                # Store frames (every 10 steps to save memory)
                if step_count % 10 == 0:
                    rgb_frames.append(obs['rgb'])
                    wrist_frames.append(obs['wrist_cam'])
                
                # Get action from trained agent
                with torch.no_grad():
                    action = agent.get_action(obs, deterministic=True)
                    action = action.cpu().numpy()

                obs, reward, terminated, truncated, info = env.step(action)
                episode_reward += reward
                step_count += 1
                done = terminated or truncated
                
            # Count each episode as a success at most once, using the final step's info
            if info.get('is_success', False):
                success_count += 1

            episode_rewards.append(episode_reward)
            episode_lengths.append(step_count)
            
            print(f"  Episode {episode + 1}: Reward = {episode_reward:.3f}, Steps = {step_count}, Success = {info.get('is_success', False)}")
            
            # Show final frames from last episode
            if episode == num_test_episodes - 1 and rgb_frames:
                fig, axes = plt.subplots(2, min(4, len(rgb_frames)), figsize=(12, 6))
                if len(rgb_frames) == 1:
                    axes = axes.reshape(2, 1)
                
                for i, idx in enumerate(np.linspace(0, len(rgb_frames)-1, min(4, len(rgb_frames)), dtype=int)):
                    axes[0, i].imshow(rgb_frames[idx])
                    axes[0, i].set_title(f'3rd Person (Step {idx*10})')
                    axes[0, i].axis('off')
                    
                    axes[1, i].imshow(wrist_frames[idx])
                    axes[1, i].set_title(f'Wrist Camera (Step {idx*10})')
                    axes[1, i].axis('off')
                
                plt.tight_layout()
                plt.show()

        env.close()
        
        # Display statistics
        print(f"\n📈 Test Results Summary:")
        print(f"  Average Reward: {np.mean(episode_rewards):.3f} ± {np.std(episode_rewards):.3f}")
        print(f"  Average Episode Length: {np.mean(episode_lengths):.1f} ± {np.std(episode_lengths):.1f}")
        print(f"  Success Rate: {success_count}/{num_test_episodes} ({100*success_count/num_test_episodes:.1f}%)")
        print(f"  Min/Max Reward: {np.min(episode_rewards):.3f} / {np.max(episode_rewards):.3f}")
        
        # Plot results
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        
        ax1.plot(episode_rewards, 'o-')
        ax1.set_title('Episode Rewards')
        ax1.set_xlabel('Episode')
        ax1.set_ylabel('Reward')
        ax1.grid(True)
        
        ax2.plot(episode_lengths, 'o-', color='orange')
        ax2.set_title('Episode Lengths')
        ax2.set_xlabel('Episode')
        ax2.set_ylabel('Steps')
        ax2.grid(True)
        
        plt.tight_layout()
        plt.show()
        
        print("✓ Manual testing completed successfully!")

    except Exception as e:
        print(f"❌ Manual testing failed: {e}")
        import traceback
        traceback.print_exc()

else:
    print("❌ No checkpoint files found.")
    print("Please check that training completed successfully and checkpoint files exist in the runs/ directory.")
    print("\nLooking for files in runs/ directory:")
    if os.path.exists("runs"):
        for root, dirs, files in os.walk("runs"):
            for file in files:
                print(f"  {os.path.join(root, file)}")
    else:
        print("  runs/ directory does not exist")
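
Checkpoint selection can be isolated into a function that removes duplicate matches and compares modification times. This is a sketch under stated assumptions: `latest_checkpoint` is a hypothetical helper, and the file names are placeholders written to a temporary directory, not real training output.

```python
import glob
import os
import tempfile
import time


def latest_checkpoint(run_root, patterns=("*/final_ckpt.pt", "*/*.pt")):
    """Return the most recently modified checkpoint under run_root, or None."""
    matches = set()  # a set removes files matched by more than one pattern
    for pattern in patterns:
        matches.update(glob.glob(os.path.join(run_root, pattern)))
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, "demo_run")
    os.makedirs(run_dir)
    now = time.time()
    for name, age in [("ckpt_1.pt", 100), ("final_ckpt.pt", 10)]:
        path = os.path.join(run_dir, name)
        with open(path, "wb") as f:
            f.write(b"placeholder")
        os.utime(path, (now - age, now - age))  # set explicit modification times
    newest = os.path.basename(latest_checkpoint(tmp))
    print(newest)  # final_ckpt.pt
```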

In [None]:
# Final cleanup and model saving
print("="*50)
print("Training Session Cleanup and Model Management")
print("="*50)

import shutil
from datetime import datetime

# 1. Check the saved models and create a timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
print(f"Session timestamp: {timestamp}")

if os.path.exists("runs/"):
    print("\n📁 Checking saved models in runs/ directory:")
    
    total_size = 0
    checkpoint_count = 0
    
    for root, dirs, files in os.walk("runs/"):
        for file in files:
            filepath = os.path.join(root, file)
            size = os.path.getsize(filepath)
            total_size += size
            
            if file.endswith('.pt'):
                checkpoint_count += 1
                print(f"  ✓ {filepath} ({size / (1024*1024):.1f} MB)")
            elif file.endswith('.txt') or file.endswith('.log'):
                print(f"  📄 {filepath} ({size / 1024:.1f} KB)")
    
    print(f"\nSummary: {checkpoint_count} checkpoints, total size: {total_size / (1024*1024):.1f} MB")
    
    # 2. Copy models to Google Drive with timestamp
    drive_path = f"/content/drive/MyDrive/ppo_training_runs/session_{timestamp}"
    
    try:
        os.makedirs("/content/drive/MyDrive/ppo_training_runs/", exist_ok=True)
        
        print(f"\n💾 Copying models to Google Drive: {drive_path}")
        shutil.copytree("runs/", drive_path)
        
        # Create a summary file
        summary_path = f"{drive_path}/session_summary.txt"
        with open(summary_path, 'w') as f:
            f.write(f"PPO Training Session Summary\n")
            f.write(f"Timestamp: {timestamp}\n")
            f.write(f"Total timesteps: {ppo_args.total_timesteps:,}\n")
            f.write(f"Number of environments: {ppo_args.num_envs}\n")
            f.write(f"Learning rate: {ppo_args.learning_rate}\n")
            f.write(f"Checkpoints: {checkpoint_count}\n")
            f.write(f"Total size: {total_size / (1024*1024):.1f} MB\n")
            f.write(f"CUDA used: {ppo_args.cuda}\n")
        
        print(f"✓ Models successfully copied to Google Drive")
        print(f"✓ Session summary saved to: {summary_path}")
        
    except Exception as e:
        print(f"❌ Failed to copy to Google Drive: {e}")
        print("You may need to mount Google Drive first or check permissions.")

else:
    print("❌ No runs/ directory found. Training may not have completed successfully.")

# 3. Show training artifacts
print(f"\n📊 Training Artifacts Generated:")
artifacts = []
if os.path.exists("runs/"):
    for root, dirs, files in os.walk("runs/"):
        for file in files:
            artifacts.append(os.path.join(root, file))

for artifact in sorted(artifacts)[:10]:  # Show first 10
    print(f"  📄 {artifact}")
if len(artifacts) > 10:
    print(f"  ... and {len(artifacts) - 10} more files")

# 4. Memory cleanup
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print(f"\n🧹 GPU memory cleared")
    print(f"  Current GPU memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")

# 5. Next steps and recommendations
print(f"\n🎯 Next Steps and Recommendations:")
print(f"")
print(f"1. 📥 Download your models:")
print(f"   - Check Google Drive: /MyDrive/ppo_training_runs/session_{timestamp}/")
print(f"   - Download the entire folder for local testing")
print(f"")
print(f"2. 🔧 Improve training (if needed):")
print(f"   - Increase total_timesteps to 1M+ for better performance")
print(f"   - Adjust learning_rate (try 1e-4 or 5e-4)")
print(f"   - Increase num_envs if you have more GPU memory")
print(f"")
print(f"3. 📊 Monitor training:")
print(f"   - Enable wandb tracking for better visualization")
print(f"   - Add tensorboard logging")
print(f"   - Save training curves")
print(f"")
print(f"4. 🚀 Deploy your model:")
print(f"   - Test on real robot hardware")
print(f"   - Fine-tune with real-world data")
print(f"   - Implement safety mechanisms")
print(f"")
print(f"5. 🔄 Colab session management:")
print(f"   - This session may time out after about 12 hours")
print(f"   - Consider using Colab Pro for longer sessions")
print(f"   - Save intermediate checkpoints more frequently")

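# Only report success if the Drive copy for this session actually exists
if os.path.isdir(f"/content/drive/MyDrive/ppo_training_runs/session_{timestamp}"):
    print(f"\n🎉 Training session completed successfully!")
    print(f"All models and logs have been saved to Google Drive.")
else:
    print(f"\n⚠️  No Google Drive copy was found for this session; check the messages above.")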
print("="*50)
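
The backup step can be reduced to one function that reports whether anything was copied. In this minimal sketch `backup_runs` is a hypothetical helper, and temporary directories stand in for `runs/` and the mounted Drive folder.

```python
import os
import shutil
import tempfile
from datetime import datetime


def backup_runs(src_dir, dest_root):
    """Copy src_dir into a new timestamped folder under dest_root.

    Returns the destination path, or None if src_dir does not exist.
    """
    if not os.path.isdir(src_dir):
        return None
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = os.path.join(dest_root, f"session_{stamp}")
    os.makedirs(dest_root, exist_ok=True)
    shutil.copytree(src_dir, dest)
    return dest


# Demonstration with temporary directories instead of runs/ and Google Drive.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "runs")
    os.makedirs(os.path.join(src, "demo_run"))
    with open(os.path.join(src, "demo_run", "final_ckpt.pt"), "wb") as f:
        f.write(b"placeholder")
    dest = backup_runs(src, os.path.join(tmp, "drive"))
    copied = os.path.exists(os.path.join(dest, "demo_run", "final_ckpt.pt"))
    missing_result = backup_runs(os.path.join(tmp, "nope"), tmp)
    print(copied, missing_result)  # True None
```

Returning the destination path (or `None`) lets the caller decide which completion message to print.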