# Multi-Objective Neural Architecture Search for Edge AI
## Focused Learning Notebook 1/4

**Paper Source**: Optimizing Edge AI: A Comprehensive Survey (2501.03265v1)  
**Paper Sections**: Pages 11-13 (Neural Architecture Search)  
**Focus Concept**: Hardware-Aware Multi-Objective NAS with Pareto Optimization

---

## 🎯 Learning Objectives

By completing this notebook, you will understand:

1. **Multi-objective optimization** in neural architecture search
2. **Hardware-aware constraints** and their impact on architecture design
3. **Pareto-optimal solutions** for accuracy-efficiency trade-offs
4. **Differentiable NAS (DNAS)** algorithms and implementation
5. **Performance prediction models** for edge hardware

---

## 📚 Theoretical Foundation

### Multi-Objective Neural Architecture Search Problem

**Paper Quote** (Section Neural Architecture Search):
> *"Neural Architecture Search methods for edge deployment must balance multiple conflicting objectives: model accuracy, inference latency, memory footprint, and energy consumption, while considering specific hardware constraints."*

### Mathematical Formulation

The multi-objective NAS problem can be formulated as:

$$\min_{\alpha \in \mathcal{A}} \mathbf{f}(\alpha) = [f_1(\alpha), f_2(\alpha), ..., f_k(\alpha)]^T$$

Where:
- $\alpha$ represents the architecture parameters
- $\mathcal{A}$ is the architecture search space
- $\mathbf{f}(\alpha)$ is a vector of $k$ objective functions

**Typical objectives for Edge AI:**
- $f_1(\alpha)$: **Validation Error** (minimize)
- $f_2(\alpha)$: **Inference Latency** (minimize)
- $f_3(\alpha)$: **Memory Usage** (minimize)
- $f_4(\alpha)$: **Energy Consumption** (minimize)

**Hardware Constraints:**
$$\text{subject to: } g_i(\alpha) \leq c_i, \quad i = 1, 2, ..., m$$

Where $g_i(\alpha)$ is a hardware-specific constraint function (e.g., peak memory or compute cost) and $c_i$ is its budget on the target device (e.g., the available memory).
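
As a concrete illustration of the formulation, the sketch below builds one candidate's objective vector $\mathbf{f}(\alpha)$ and checks the constraints $g_i(\alpha) \leq c_i$. The helper names, metric keys and numbers are hypothetical placeholders, not values from the paper.

```python
from typing import Dict, Tuple

def objective_vector(metrics: Dict[str, float]) -> Tuple[float, float, float, float]:
    """f(alpha) = [error, latency, memory, energy], all to be minimized."""
    return (1.0 - metrics["accuracy"], metrics["latency_ms"],
            metrics["memory_mb"], metrics["energy_mj"])

def is_feasible(constraints: Dict[str, float], budgets: Dict[str, float]) -> bool:
    """True if every constrained quantity g_i(alpha) is within its budget c_i."""
    return all(constraints[name] <= budgets[name] for name in budgets)

# Hypothetical candidate architecture (illustrative numbers only)
candidate = {"accuracy": 0.91, "latency_ms": 42.0, "memory_mb": 180.0, "energy_mj": 35.0}
budgets = {"latency_ms": 100.0, "memory_mb": 512.0}

f = objective_vector(candidate)
feasible = is_feasible(candidate, budgets)
print(f, feasible)
```

Keeping the constraints separate from the objectives mirrors the formulation: constraints only filter candidates, while the objectives are traded off against each other.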

## 🛠️ Environment Setup

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple, Optional, Any
from dataclasses import dataclass
import time
import random
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Multi-objective optimization
from scipy.optimize import minimize
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

print("✅ Environment setup complete for Multi-Objective NAS")
print(f"PyTorch version: {torch.__version__}")
print(f"Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

## 🏗️ Architecture Search Space Definition

Define a flexible search space for edge-optimized neural architectures.

In [None]:
@dataclass
class EdgeHardwareSpec:
    """Edge device hardware specifications"""
    cpu_cores: int = 4
    memory_mb: int = 512
    max_latency_ms: float = 100.0
    power_budget_mw: float = 1000.0
    has_npu: bool = False  # Neural Processing Unit
    
class ArchitectureSearchSpace:
    """Defines the search space for neural architectures"""
    
    def __init__(self):
        # Layer types available for edge deployment
        self.layer_types = {
            'conv3x3': {'params_factor': 1.0, 'flops_factor': 1.0, 'latency_factor': 1.0},
            'conv1x1': {'params_factor': 0.11, 'flops_factor': 0.11, 'latency_factor': 0.5},
            'depthwise_conv': {'params_factor': 0.11, 'flops_factor': 0.11, 'latency_factor': 0.3},
            'separable_conv': {'params_factor': 0.12, 'flops_factor': 0.12, 'latency_factor': 0.4},
            'mobilenet_block': {'params_factor': 0.15, 'flops_factor': 0.15, 'latency_factor': 0.6},
            'squeeze_excite': {'params_factor': 0.05, 'flops_factor': 0.02, 'latency_factor': 0.8},
        }
        
        # Channel width options
        self.channel_widths = [16, 32, 48, 64, 96, 128, 160, 192, 224, 256]
        
        # Depth options (number of layers)
        self.depth_options = [4, 6, 8, 10, 12, 14, 16, 18, 20]
        
        # Activation functions
        self.activations = ['relu', 'relu6', 'swish', 'hardswish', 'gelu']
        
    def sample_architecture(self) -> Dict[str, Any]:
        """Sample a random architecture from the search space"""
        depth = random.choice(self.depth_options)
        
        layers = []
        for i in range(depth):
            layer = {
                'type': random.choice(list(self.layer_types.keys())),
                'channels': random.choice(self.channel_widths),
                'activation': random.choice(self.activations)
            }
            layers.append(layer)
        
        architecture = {
            'layers': layers,
            'depth': depth,
            'global_pool': random.choice(['avg', 'max', 'adaptive']),
            'dropout_rate': random.uniform(0.0, 0.5)
        }
        
        return architecture
    
    def encode_architecture(self, arch: Dict[str, Any]) -> np.ndarray:
        """Encode architecture as a vector for optimization"""
        # Fixed-length encoding: depth, then per layer (one-hot type, width, activation), then pooling and dropout
        encoding = []
        
        # Architecture depth
        encoding.append(arch['depth'] / 20.0)  # Normalize
        
        # Layer information (pad/truncate to fixed length)
        max_layers = 20
        layer_encoding = []
        
        for i in range(max_layers):
            if i < len(arch['layers']):
                layer = arch['layers'][i]
                # Layer type (one-hot)
                layer_type_idx = list(self.layer_types.keys()).index(layer['type'])
                layer_one_hot = [0] * len(self.layer_types)
                layer_one_hot[layer_type_idx] = 1
                layer_encoding.extend(layer_one_hot)
                
                # Channel width (normalized)
                layer_encoding.append(layer['channels'] / 256.0)
                
                # Activation (simplified)
                activation_idx = self.activations.index(layer['activation'])
                layer_encoding.append(activation_idx / len(self.activations))
            else:
                # Padding for shorter architectures
                layer_encoding.extend([0] * (len(self.layer_types) + 2))
        
        encoding.extend(layer_encoding)
        
        # Global settings
        pool_types = ['avg', 'max', 'adaptive']
        pool_idx = pool_types.index(arch['global_pool'])
        encoding.append(pool_idx / len(pool_types))
        encoding.append(arch['dropout_rate'])
        
        return np.array(encoding)

# Initialize search space
search_space = ArchitectureSearchSpace()
edge_hardware = EdgeHardwareSpec()

print("✅ Architecture search space defined")
print(f"   Layer types: {len(search_space.layer_types)}")
print(f"   Channel options: {len(search_space.channel_widths)}")
print(f"   Depth options: {len(search_space.depth_options)}")

# Sample architecture example
sample_arch = search_space.sample_architecture()
print(f"\n📋 Sample Architecture:")
print(f"   Depth: {sample_arch['depth']} layers")
print(f"   First 3 layers: {sample_arch['layers'][:3]}")
print(f"   Global pool: {sample_arch['global_pool']}")
print(f"   Dropout: {sample_arch['dropout_rate']:.3f}")
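
The discrete part of this search space is already very large. Under the definitions above, each layer has 6 × 10 × 5 = 300 (type, width, activation) choices, so a depth-$d$ network has $300^d$ layer configurations, times 3 pooling options (dropout is continuous and excluded here). Summed over the nine depth options this is about $10^{50}$ architectures. The sketch below repeats that arithmetic and also checks the length of the vector produced by `encode_architecture`: 1 depth entry + 20 layers × (6 one-hot + width + activation) + 2 global entries.

```python
# Arithmetic on the search space defined above (values copied from ArchitectureSearchSpace)
n_layer_types = 6
n_widths = 10
n_activations = 5
depth_options = [4, 6, 8, 10, 12, 14, 16, 18, 20]
n_pool_options = 3

per_layer_choices = n_layer_types * n_widths * n_activations
n_discrete_archs = n_pool_options * sum(per_layer_choices ** d for d in depth_options)
print(f"Per-layer choices: {per_layer_choices}")
print(f"Discrete architectures: ~10^{len(str(n_discrete_archs)) - 1}")

# Length of the vector produced by encode_architecture (max_layers = 20)
max_layers = 20
encoding_length = 1 + max_layers * (n_layer_types + 2) + 2
print(f"Encoding length: {encoding_length}")
```

In the notebook itself, `len(search_space.encode_architecture(sample_arch))` should return the same 163, since shorter architectures are zero-padded to 20 layers.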

## ⚡ Performance Prediction Models

**Paper Reference**: *"Hardware-aware optimization requires accurate performance prediction models that estimate latency, memory usage, and energy consumption without actual deployment."*

### Hardware Performance Modeling
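
`HardwarePerformanceModel` below derives latency from a FLOP proxy and per-operation coefficients. One widely used alternative in hardware-aware NAS is a lookup table of per-layer latencies profiled once on the target device and summed at search time. The sketch below shows the idea; `LATENCY_LUT` and `lut_latency` are hypothetical names, the table entries are placeholders rather than measurements, and a real table would also key on input resolution and stride.

```python
from typing import Dict, List, Tuple

# Hypothetical profiled latencies in ms, keyed by (operation, channel width).
# Real entries would come from benchmarking each layer on the target device.
LATENCY_LUT: Dict[Tuple[str, int], float] = {
    ("conv3x3", 32): 1.8, ("conv3x3", 64): 4.1,
    ("depthwise_conv", 32): 0.6, ("depthwise_conv", 64): 1.0,
    ("conv1x1", 32): 0.7, ("conv1x1", 64): 1.5,
}

def lut_latency(layers: List[Dict], overhead_ms: float = 5.0) -> float:
    """Sum profiled per-layer latencies; raises KeyError for unprofiled layers."""
    return overhead_ms + sum(LATENCY_LUT[(l["type"], l["channels"])] for l in layers)

example_layers = [
    {"type": "conv3x3", "channels": 32},
    {"type": "depthwise_conv", "channels": 64},
    {"type": "conv1x1", "channels": 64},
]
print(f"Predicted latency: {lut_latency(example_layers):.1f} ms")
```

Summing per-layer entries assumes layers run one after another and ignores effects such as operator fusion, so such a table is worth checking against end-to-end measurements.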

In [None]:
class HardwarePerformanceModel:
    """Predicts hardware performance metrics for neural architectures"""
    
    def __init__(self, hardware_spec: EdgeHardwareSpec):
        self.hardware_spec = hardware_spec
        
        # Per-operation cost coefficients (simplified, illustrative values)
        self.operation_latency = {
            'conv3x3': 0.5,  # ms per GFLOP
            'conv1x1': 0.2,
            'depthwise_conv': 0.15,
            'separable_conv': 0.25,
            'mobilenet_block': 0.3,
            'squeeze_excite': 0.8,
        }
        
        self.operation_memory = {
            'conv3x3': 2.0,  # MB per layer
            'conv1x1': 0.5,
            'depthwise_conv': 0.3,
            'separable_conv': 0.4,
            'mobilenet_block': 0.6,
            'squeeze_excite': 0.1,
        }
        
        self.operation_energy = {
            'conv3x3': 100,  # mJ per GFLOP
            'conv1x1': 40,
            'depthwise_conv': 30,
            'separable_conv': 50,
            'mobilenet_block': 60,
            'squeeze_excite': 20,
        }
        
    def predict_latency(self, architecture: Dict[str, Any]) -> float:
        """Predict inference latency in milliseconds"""
        total_latency = 0.0
        
        for layer in architecture['layers']:
            layer_type = layer['type']
            channels = layer['channels']
            
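            # Simplified FLOP proxy: C*C*H*W, the multiply-accumulates of a C->C 1x1 conv
            # on an assumed 32x32 input; kernel size enters only via the per-op coefficient
            flops_giga = (channels * channels * 32 * 32) / 1e9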
            layer_latency = flops_giga * self.operation_latency[layer_type]
            
            # Extra heuristic width penalty on top of the quadratic FLOP proxy (reference: 64 channels)
            channel_factor = np.sqrt(channels / 64.0)
            layer_latency *= channel_factor
            
            total_latency += layer_latency
        
        # Hardware-specific scaling
        cpu_scaling = 4.0 / self.hardware_spec.cpu_cores
        npu_scaling = 0.3 if self.hardware_spec.has_npu else 1.0
        
        total_latency *= cpu_scaling * npu_scaling
        
        # Add overhead
        overhead = 5.0 + 0.5 * architecture['depth']  # ms
        
        return total_latency + overhead
    
    def predict_memory(self, architecture: Dict[str, Any]) -> float:
        """Predict memory usage in MB"""
        total_memory = 0.0
        
        for layer in architecture['layers']:
            layer_type = layer['type']
            channels = layer['channels']
            
            # Base memory usage
            layer_memory = self.operation_memory[layer_type]
            
            # Channel scaling: super-linear (exponent 1.5) for weights, linear for activations
            weight_memory = layer_memory * (channels / 64.0) ** 1.5
            activation_memory = 4 * channels * 32 * 32 / (1024 * 1024)  # 4 bytes per float
            
            total_memory += weight_memory + activation_memory
        
        # Global pooling and classifier
        final_channels = architecture['layers'][-1]['channels'] if architecture['layers'] else 64
        classifier_memory = final_channels * 1000 * 4 / (1024 * 1024)  # 1000 classes
        
        return total_memory + classifier_memory + 10  # 10MB base overhead
    
    def predict_energy(self, architecture: Dict[str, Any]) -> float:
        """Predict energy consumption in millijoules"""
        total_energy = 0.0
        
        for layer in architecture['layers']:
            layer_type = layer['type']
            channels = layer['channels']
            
            # Energy from the same C*C*H*W FLOP proxy used in predict_latency
            flops_giga = (channels * channels * 32 * 32) / 1e9
            layer_energy = flops_giga * self.operation_energy[layer_type]
            
            # Channel scaling
            channel_factor = (channels / 64.0) ** 1.2
            layer_energy *= channel_factor
            
            total_energy += layer_energy
        
        # Memory access energy
        memory_mb = self.predict_memory(architecture)
        memory_energy = memory_mb * 0.5  # 0.5 mJ per MB access
        
        return total_energy + memory_energy
    
    def predict_accuracy(self, architecture: Dict[str, Any]) -> float:
        """Simplified accuracy prediction (in practice, use surrogate models)"""
        # Simplified heuristic: deeper networks with more channels are generally more accurate
        depth_score = min(architecture['depth'] / 20.0, 1.0)
        
        avg_channels = np.mean([layer['channels'] for layer in architecture['layers']])
        channel_score = min(avg_channels / 128.0, 1.0)
        
        # Layer type diversity bonus
        layer_types = set(layer['type'] for layer in architecture['layers'])
        # Normalize by the number of known operation types (avoids relying on the global search_space)
        diversity_score = len(layer_types) / len(self.operation_latency)
        
        # Combined score (scaled to typical accuracy range)
        base_accuracy = 0.6 + 0.35 * (0.4 * depth_score + 0.4 * channel_score + 0.2 * diversity_score)
        
        # Add noise to mimic evaluation variance (repeated calls on the same architecture differ)
        noise = np.random.normal(0, 0.02)
        
        return min(max(base_accuracy + noise, 0.5), 0.98)

# Initialize performance model
perf_model = HardwarePerformanceModel(edge_hardware)

print("✅ Hardware performance model initialized")

# Test performance prediction
sample_arch = search_space.sample_architecture()
latency = perf_model.predict_latency(sample_arch)
memory = perf_model.predict_memory(sample_arch)
energy = perf_model.predict_energy(sample_arch)
accuracy = perf_model.predict_accuracy(sample_arch)

print(f"\n📊 Sample Architecture Performance:")
print(f"   Accuracy: {accuracy:.3f}")
print(f"   Latency: {latency:.1f} ms")
print(f"   Memory: {memory:.1f} MB")
print(f"   Energy: {energy:.1f} mJ")
print(f"   Meets latency constraint: {'✅' if latency <= edge_hardware.max_latency_ms else '❌'}")
print(f"   Meets memory constraint: {'✅' if memory <= edge_hardware.memory_mb else '❌'}")
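
Working through `predict_latency` by hand shows how the toy coefficients behave. Take the most expensive case in this search space: 20 layers of 256-channel `squeeze_excite`, whose 0.8 ms/GFLOP coefficient is the largest in `operation_latency`. The FLOP proxy is 256 × 256 × 32 × 32 / 10⁹ ≈ 0.067, times 0.8 and the width factor √(256/64) = 2, or about 0.107 ms per layer. Twenty layers add only about 2.1 ms, while the fixed overhead is 5 + 0.5 × 20 = 15 ms. With the default `EdgeHardwareSpec` (4 cores, no NPU, so both scaling factors equal 1), predicted latency therefore stays near 17 ms at most, and the 100 ms constraint never binds. The snippet repeats the arithmetic with the same constants.

```python
import math

# Constants copied from HardwarePerformanceModel.predict_latency
H = W = 32                     # assumed feature-map size
MS_PER_GFLOP_MAX = 0.8         # largest coefficient in operation_latency ('squeeze_excite')
channels, depth = 256, 20      # widest and deepest options in the search space

flops_giga = channels * channels * H * W / 1e9
layer_ms = flops_giga * MS_PER_GFLOP_MAX * math.sqrt(channels / 64.0)
compute_ms = depth * layer_ms
overhead_ms = 5.0 + 0.5 * depth
worst_case_ms = compute_ms + overhead_ms   # default spec: cpu_scaling = npu_scaling = 1.0

print(f"Per-layer compute: {layer_ms:.3f} ms")
print(f"Worst-case latency: {worst_case_ms:.2f} ms (limit: 100 ms)")
```

In other words, the latency objective in this toy model is driven mostly by depth through the overhead term; realistic coefficients would be needed before latency meaningfully shapes the search.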

## 🎯 Multi-Objective Optimization Algorithm

**Paper Reference**: *"Multi-objective NAS methods like NSGA-II and Pareto-efficient solutions enable finding optimal trade-offs between accuracy and deployment constraints."*

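### Simplified NSGA-II for Architecture Search

The search below follows NSGA-II's ranking scheme (non-dominated sorting into fronts, crowding distance within a front, and binary tournament selection on rank and distance) with two simplifications: offspring are produced by mutation only, without crossover, and each new generation replaces the old one rather than being merged with it for elitist survivor selection.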
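
For reference, the two quantities the next cell computes can be written in the notation of the formulation above. Architecture $\alpha_a$ Pareto-dominates $\alpha_b$ when it is no worse in every objective and strictly better in at least one:

$$\alpha_a \prec \alpha_b \iff f_j(\alpha_a) \leq f_j(\alpha_b) \ \ \forall j \in \{1, \ldots, k\} \ \text{ and } \ \exists\, j : f_j(\alpha_a) < f_j(\alpha_b)$$

Within one front, let $\alpha^{(j)}_r$ be the member at position $r$ when the front is sorted by $f_j$, and let $r_j$ be the position of $\alpha$ in that order. The crowding distance sums the normalized gaps between each member's neighbours, and the first and last member of every sorted order receive $d = \infty$:

$$d(\alpha) = \sum_{j=1}^{k} \frac{f_j\big(\alpha^{(j)}_{r_j+1}\big) - f_j\big(\alpha^{(j)}_{r_j-1}\big)}{f_j^{\max} - f_j^{\min}}$$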

In [None]:
class MultiObjectiveNAS:
    """Multi-objective Neural Architecture Search using evolutionary algorithms"""
    
    def __init__(self, search_space: ArchitectureSearchSpace, 
                 perf_model: HardwarePerformanceModel,
                 population_size: int = 50,
                 generations: int = 20):
        self.search_space = search_space
        self.perf_model = perf_model
        self.population_size = population_size
        self.generations = generations
        
        # Evolution history
        self.history = {
            'generations': [],
            'pareto_fronts': [],
            'best_solutions': []
        }
    
    def evaluate_architecture(self, architecture: Dict[str, Any]) -> Tuple[float, float, float, float]:
        """Evaluate architecture on all objectives"""
        accuracy = self.perf_model.predict_accuracy(architecture)
        latency = self.perf_model.predict_latency(architecture)
        memory = self.perf_model.predict_memory(architecture)
        energy = self.perf_model.predict_energy(architecture)
        
        # Convert to minimization problem (for accuracy: minimize error)
        error = 1.0 - accuracy
        
        return error, latency, memory, energy
    
    def dominates(self, obj1: Tuple[float, ...], obj2: Tuple[float, ...]) -> bool:
        """Check if obj1 Pareto-dominates obj2 (all objectives minimized)"""
        no_worse_in_all = all(o1 <= o2 for o1, o2 in zip(obj1, obj2))
        strictly_better_in_one = any(o1 < o2 for o1, o2 in zip(obj1, obj2))
        return no_worse_in_all and strictly_better_in_one
    
    def find_pareto_front(self, population: List[Dict], objectives: List[Tuple]) -> List[int]:
        """Find Pareto-optimal solutions"""
        pareto_front = []
        
        for i, obj_i in enumerate(objectives):
            is_dominated = False
            for j, obj_j in enumerate(objectives):
                if i != j and self.dominates(obj_j, obj_i):
                    is_dominated = True
                    break
            
            if not is_dominated:
                pareto_front.append(i)
        
        return pareto_front
    
    def crowding_distance(self, objectives: List[Tuple], front_indices: List[int]) -> List[float]:
        """Calculate crowding distance for diversity preservation"""
        if len(front_indices) <= 2:
            return [float('inf')] * len(front_indices)
        
        distances = [0.0] * len(front_indices)
        # Map population index -> position in front_indices (avoids repeated list.index calls)
        position = {idx: pos for pos, idx in enumerate(front_indices)}
        n_objectives = len(objectives[0])
        
        for obj_idx in range(n_objectives):
            # Sort front members by this objective
            sorted_indices = sorted(front_indices, key=lambda i: objectives[i][obj_idx])
            obj_values = [objectives[i][obj_idx] for i in sorted_indices]
            
            # Boundary points of each objective's ordering get infinite distance
            distances[position[sorted_indices[0]]] = float('inf')
            distances[position[sorted_indices[-1]]] = float('inf')
            
            # Interior points accumulate the normalized gap between their neighbours
            obj_range = obj_values[-1] - obj_values[0]
            if obj_range > 0:
                for k in range(1, len(sorted_indices) - 1):
                    gap = (obj_values[k + 1] - obj_values[k - 1]) / obj_range
                    distances[position[sorted_indices[k]]] += gap
        
        return distances
    
    def tournament_selection(self, population: List[Dict], objectives: List[Tuple], 
                           fronts: List[List[int]], distances: List[float]) -> Dict:
        """Tournament selection based on Pareto rank and crowding distance"""
        # Select two random individuals
        idx1, idx2 = random.sample(range(len(population)), 2)
        
        # Find their fronts
        front1 = next(i for i, front in enumerate(fronts) if idx1 in front)
        front2 = next(i for i, front in enumerate(fronts) if idx2 in front)
        
        # Select based on Pareto rank (lower is better)
        if front1 < front2:
            return population[idx1]
        elif front2 < front1:
            return population[idx2]
        else:
            # Same front, select based on crowding distance (higher is better)
            if distances[idx1] > distances[idx2]:
                return population[idx1]
            else:
                return population[idx2]
    
    def mutate_architecture(self, architecture: Dict[str, Any]) -> Dict[str, Any]:
        """Mutate architecture with small random changes"""
        mutated = {
            'layers': [layer.copy() for layer in architecture['layers']],
            'depth': architecture['depth'],
            'global_pool': architecture['global_pool'],
            'dropout_rate': architecture['dropout_rate']
        }
        
        # Random mutations
        if random.random() < 0.3:  # Change layer type
            if mutated['layers']:
                layer_idx = random.randint(0, len(mutated['layers']) - 1)
                mutated['layers'][layer_idx]['type'] = random.choice(list(self.search_space.layer_types.keys()))
        
        if random.random() < 0.3:  # Change channel width
            if mutated['layers']:
                layer_idx = random.randint(0, len(mutated['layers']) - 1)
                mutated['layers'][layer_idx]['channels'] = random.choice(self.search_space.channel_widths)
        
        if random.random() < 0.2:  # Change activation
            if mutated['layers']:
                layer_idx = random.randint(0, len(mutated['layers']) - 1)
                mutated['layers'][layer_idx]['activation'] = random.choice(self.search_space.activations)
        
        if random.random() < 0.1:  # Change global pooling
            mutated['global_pool'] = random.choice(['avg', 'max', 'adaptive'])
        
        if random.random() < 0.1:  # Change dropout
            mutated['dropout_rate'] = max(0.0, min(0.5, mutated['dropout_rate'] + random.gauss(0, 0.1)))
        
        return mutated
    
    def search(self) -> Tuple[List[Dict], List[Tuple]]:
        """Run multi-objective neural architecture search"""
        print(f"🔍 Starting Multi-Objective NAS...")
        print(f"   Population size: {self.population_size}")
        print(f"   Generations: {self.generations}")
        
        # Initialize population
        population = [self.search_space.sample_architecture() for _ in range(self.population_size)]
        
        best_pareto_front = []
        best_objectives = []
        
        for generation in range(self.generations):
            print(f"\n📊 Generation {generation + 1}/{self.generations}")
            
            # Evaluate all architectures
            objectives = [self.evaluate_architecture(arch) for arch in population]
            
            # Find Pareto fronts
            all_fronts = []
            remaining_indices = list(range(len(population)))
            
            while remaining_indices:
                remaining_objectives = [objectives[i] for i in remaining_indices]
                front_in_remaining = self.find_pareto_front(population, remaining_objectives)
                front_indices = [remaining_indices[i] for i in front_in_remaining]
                all_fronts.append(front_indices)
                remaining_indices = [i for i in remaining_indices if i not in front_indices]
            
            # Calculate crowding distances
            all_distances = [0.0] * len(population)
            for front in all_fronts:
                distances = self.crowding_distance(objectives, front)
                for i, idx in enumerate(front):
                    all_distances[idx] = distances[i]
            
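            # Keep the largest first front seen so far. Offspring fully replace the population
            # each generation, so earlier Pareto solutions are not otherwise preserved.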
            if all_fronts:
                current_pareto_front = [population[i] for i in all_fronts[0]]
                current_pareto_objectives = [objectives[i] for i in all_fronts[0]]
                
                if not best_pareto_front or len(current_pareto_front) > len(best_pareto_front):
                    best_pareto_front = current_pareto_front
                    best_objectives = current_pareto_objectives
                
                # Print progress
                best_accuracy = 1 - min(obj[0] for obj in current_pareto_objectives)
                best_latency = min(obj[1] for obj in current_pareto_objectives)
                print(f"   Pareto front size: {len(current_pareto_front)}")
                print(f"   Best accuracy: {best_accuracy:.3f}")
                print(f"   Best latency: {best_latency:.1f} ms")
            
            # Generate next generation
            if generation < self.generations - 1:
                new_population = []
                
                while len(new_population) < self.population_size:
                    # Selection
                    parent = self.tournament_selection(population, objectives, all_fronts, all_distances)
                    
                    # Mutation
                    child = self.mutate_architecture(parent)
                    new_population.append(child)
                
                population = new_population
        
        print(f"\n✅ Multi-Objective NAS completed!")
        print(f"   Final Pareto front size: {len(best_pareto_front)}")
        
        return best_pareto_front, best_objectives

# Initialize and run NAS
nas_optimizer = MultiObjectiveNAS(
    search_space=search_space,
    perf_model=perf_model,
    population_size=30,  # Smaller for demonstration
    generations=10
)

print("✅ Multi-Objective NAS optimizer initialized")
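
`search` keeps whichever first front was largest, and because each new generation consists only of mutated offspring, good architectures can disappear between generations. One common remedy in evolutionary multi-objective optimization is an external archive that accumulates every non-dominated solution seen so far. A minimal sketch, assuming objective tuples to be minimized as returned by `evaluate_architecture`; `update_archive` is a hypothetical helper, not part of the notebook:

```python
from typing import Any, List, Tuple

Objectives = Tuple[float, ...]

def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive: List[Tuple[Any, Objectives]],
                   candidates: List[Tuple[Any, Objectives]]) -> List[Tuple[Any, Objectives]]:
    """Merge candidates into the archive and keep only non-dominated entries."""
    merged = archive + candidates
    return [
        (sol, obj) for i, (sol, obj) in enumerate(merged)
        if not any(dominates(other, obj) for j, (_, other) in enumerate(merged) if j != i)
    ]

archive: List[Tuple[Any, Objectives]] = []
archive = update_archive(archive, [("A", (0.10, 20.0)), ("B", (0.20, 10.0))])
archive = update_archive(archive, [("C", (0.05, 15.0)), ("D", (0.30, 30.0))])
print([name for name, _ in archive])
```

Inside `search`, such a helper would be called once per generation with `list(zip(population, objectives))`. Because `predict_accuracy` adds random noise, each archived accuracy is a single noisy draw, so a lucky evaluation can keep an architecture in the archive.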

## 🚀 Run Neural Architecture Search

In [None]:
# Run the search process
print("🎯 Starting Neural Architecture Search for Edge AI...")
print("=" * 60)

pareto_architectures, pareto_objectives = nas_optimizer.search()

print("\n" + "=" * 60)
print("🏆 PARETO-OPTIMAL ARCHITECTURES FOUND")
print("=" * 60)

# Analyze results
for i, (arch, obj) in enumerate(zip(pareto_architectures, pareto_objectives)):
    error, latency, memory, energy = obj
    accuracy = 1 - error
    
    print(f"\n🏗️ Architecture {i+1}:")
    print(f"   Accuracy: {accuracy:.3f}")
    print(f"   Latency: {latency:.1f} ms")
    print(f"   Memory: {memory:.1f} MB")
    print(f"   Energy: {energy:.1f} mJ")
    print(f"   Depth: {arch['depth']} layers")
    print(f"   Avg channels: {np.mean([l['channels'] for l in arch['layers']]):.0f}")
    
    # Check constraints
    meets_latency = latency <= edge_hardware.max_latency_ms
    meets_memory = memory <= edge_hardware.memory_mb
    # Simplification: per-inference energy (mJ) is compared directly with the power budget (mW)
    meets_energy = energy <= edge_hardware.power_budget_mw
    
    constraints_met = sum([meets_latency, meets_memory, meets_energy])
    print(f"   Constraints met: {constraints_met}/3 {'✅' if constraints_met == 3 else '⚠️'}")

print(f"\n📊 Search completed with {len(pareto_architectures)} Pareto-optimal solutions")
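
The constraint check above compares per-inference energy (mJ) directly with `power_budget_mw` (mW), which effectively treats the power budget as an energy budget per inference. If the budget is instead meant as a power limit, energy and latency can be converted: since 1 mJ/ms = 1 W, average power while one inference runs is $P = 1000 \cdot E / t$ mW for $E$ in mJ and $t$ in ms, and running $r$ inferences per second draws on average $E \cdot r$ mW. A sketch of both conversions with hypothetical input values (the function names are illustrative):

```python
def avg_power_during_inference_mw(energy_mj: float, latency_ms: float) -> float:
    """Average power while one inference runs: 1 mJ/ms = 1 W = 1000 mW."""
    return 1000.0 * energy_mj / latency_ms

def sustained_power_mw(energy_mj: float, inferences_per_s: float) -> float:
    """Average power at a given inference rate: mJ * (1/s) = mW."""
    return energy_mj * inferences_per_s

def max_rate_within_budget(energy_mj: float, budget_mw: float) -> float:
    """Highest inference rate (per second) that keeps average power within budget."""
    return budget_mw / energy_mj

# Hypothetical architecture: 50 mJ per inference, 12.5 ms latency, 1000 mW budget
print(avg_power_during_inference_mw(50.0, 12.5))
print(sustained_power_mw(50.0, 10.0))
print(max_rate_within_budget(50.0, 1000.0))
```

With these numbers the architecture would draw well above 1000 mW while an inference is running, yet stays within that budget on average at up to 20 inferences per second; which reading applies depends on how the deployment specifies its budget.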

## 📈 Pareto Front Visualization & Analysis

In [None]:
# Extract metrics for visualization
accuracies = [1 - obj[0] for obj in pareto_objectives]
latencies = [obj[1] for obj in pareto_objectives]
memories = [obj[2] for obj in pareto_objectives]
energies = [obj[3] for obj in pareto_objectives]

# Create comprehensive visualization
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Multi-Objective NAS: Pareto-Optimal Architecture Analysis', fontsize=16, fontweight='bold')

# 1. Accuracy vs Latency Trade-off
scatter1 = ax1.scatter(latencies, accuracies, c=memories, cmap='viridis', s=100, alpha=0.7)
ax1.set_xlabel('Inference Latency (ms)')
ax1.set_ylabel('Accuracy')
ax1.set_title('Accuracy vs Latency Trade-off\n(Color = Memory Usage)')
ax1.axvline(x=edge_hardware.max_latency_ms, color='red', linestyle='--', alpha=0.7, label='Latency Limit')
ax1.grid(True, alpha=0.3)
ax1.legend()
plt.colorbar(scatter1, ax=ax1, label='Memory (MB)')

# 2. Memory vs Energy Trade-off
scatter2 = ax2.scatter(memories, energies, c=accuracies, cmap='plasma', s=100, alpha=0.7)
ax2.set_xlabel('Memory Usage (MB)')
ax2.set_ylabel('Energy Consumption (mJ)')
ax2.set_title('Memory vs Energy Trade-off\n(Color = Accuracy)')
ax2.axvline(x=edge_hardware.memory_mb, color='red', linestyle='--', alpha=0.7, label='Memory Limit')
ax2.axhline(y=edge_hardware.power_budget_mw, color='red', linestyle='--', alpha=0.7, label='Energy Limit')
ax2.grid(True, alpha=0.3)
ax2.legend()
plt.colorbar(scatter2, ax=ax2, label='Accuracy')

# 3. Architecture Complexity Analysis
depths = [arch['depth'] for arch in pareto_architectures]
avg_channels = [np.mean([l['channels'] for l in arch['layers']]) for arch in pareto_architectures]

scatter3 = ax3.scatter(depths, avg_channels, c=accuracies, cmap='coolwarm', s=100, alpha=0.7)
ax3.set_xlabel('Network Depth (layers)')
ax3.set_ylabel('Average Channel Width')
ax3.set_title('Architecture Complexity vs Performance\n(Color = Accuracy)')
ax3.grid(True, alpha=0.3)
plt.colorbar(scatter3, ax=ax3, label='Accuracy')

# 4. Pareto Front 3D Projection
from mpl_toolkits.mplot3d import Axes3D
ax4.remove()
ax4 = fig.add_subplot(224, projection='3d')

scatter4 = ax4.scatter(latencies, memories, accuracies, c=energies, cmap='hot', s=60, alpha=0.8)
ax4.set_xlabel('Latency (ms)')
ax4.set_ylabel('Memory (MB)')
ax4.set_zlabel('Accuracy')
ax4.set_title('3D Pareto Front\n(Color = Energy)')
plt.colorbar(scatter4, ax=ax4, label='Energy (mJ)', shrink=0.8)

plt.tight_layout()
plt.show()

print("✅ Pareto front visualization complete")
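
The plots show the front qualitatively. A common scalar summary is the hypervolume indicator: the area (or volume) of objective space that the front dominates, bounded by a reference point, where larger is better. For two minimized objectives it reduces to a sum of rectangles, as in this sketch. The reference point is a user choice, and `hypervolume_2d` is an illustrative helper rather than a library function.

```python
from typing import List, Tuple

def hypervolume_2d(points: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Ignore points that do not improve on the reference point in both objectives
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:                 # ascending in the first objective
        if y < best_y:               # skip points dominated by an earlier one
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]  # last point is dominated
print(hypervolume_2d(front, ref=(4.0, 4.0)))
```

Applied to this notebook, `hypervolume_2d([(o[0], o[1]) for o in pareto_objectives], (1.0, edge_hardware.max_latency_ms))` would score the error-latency projection, assuming error 1.0 and the latency limit as the reference point. Values are only comparable across runs that use the same reference point.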

## 🎯 Hardware-Specific Architecture Recommendations

In [None]:
def analyze_pareto_solutions(architectures: List[Dict], objectives: List[Tuple], 
                           hardware_spec: EdgeHardwareSpec) -> Dict[str, Any]:
    """Analyze Pareto solutions and provide hardware-specific recommendations"""
    
    analysis = {
        'feasible_solutions': [],
        'best_accuracy': None,
        'best_efficiency': None,
        'best_balanced': None,
        'recommendations': []
    }
    
    # Find feasible solutions (meet all constraints)
    for i, (arch, obj) in enumerate(zip(architectures, objectives)):
        error, latency, memory, energy = obj
        accuracy = 1 - error
        
        meets_latency = latency <= hardware_spec.max_latency_ms
        meets_memory = memory <= hardware_spec.memory_mb
        meets_energy = energy <= hardware_spec.power_budget_mw
        
        if meets_latency and meets_memory and meets_energy:
            analysis['feasible_solutions'].append({
                'index': i,
                'architecture': arch,
                'metrics': {
                    'accuracy': accuracy,
                    'latency': latency,
                    'memory': memory,
                    'energy': energy
                }
            })
    
    if not analysis['feasible_solutions']:
        analysis['recommendations'].append("⚠️ No architectures found that meet all hardware constraints")
        analysis['recommendations'].append("💡 Consider relaxing constraints or using more aggressive optimization")
        return analysis
    
    # Find best solutions by different criteria
    feasible = analysis['feasible_solutions']
    
    # Best accuracy
    best_acc_idx = max(range(len(feasible)), key=lambda i: feasible[i]['metrics']['accuracy'])
    analysis['best_accuracy'] = feasible[best_acc_idx]
    
    # Best efficiency (lowest latency + memory + energy, normalized)
    def efficiency_score(sol):
        m = sol['metrics']
        norm_latency = m['latency'] / hardware_spec.max_latency_ms
        norm_memory = m['memory'] / hardware_spec.memory_mb
        norm_energy = m['energy'] / hardware_spec.power_budget_mw
        return norm_latency + norm_memory + norm_energy
    
    best_eff_idx = min(range(len(feasible)), key=lambda i: efficiency_score(feasible[i]))
    analysis['best_efficiency'] = feasible[best_eff_idx]
    
    # Best balanced (accuracy discounted by the total normalized resource cost)
    def balanced_score(sol):
        return sol['metrics']['accuracy'] / (1 + efficiency_score(sol))
    
    best_bal_idx = max(range(len(feasible)), key=lambda i: balanced_score(feasible[i]))
    analysis['best_balanced'] = feasible[best_bal_idx]
    
    # Generate recommendations
    analysis['recommendations'].append(f"✅ Found {len(feasible)} feasible architectures")
    
    if len(feasible) >= 3:
        analysis['recommendations'].append("🎯 Multiple viable options available - consider use case priorities")
    
    # Hardware-specific recommendations
    if hardware_spec.has_npu:
        analysis['recommendations'].append("⚡ Hardware has NPU - prioritize depthwise and separable convolutions")
    
    if hardware_spec.memory_mb < 1024:
        analysis['recommendations'].append("💾 Limited memory - consider more aggressive channel pruning")
    
    if hardware_spec.max_latency_ms < 50:
        analysis['recommendations'].append("⏱️ Strict latency requirements - focus on efficient operations")
    
    return analysis

# Analyze the Pareto solutions
analysis = analyze_pareto_solutions(pareto_architectures, pareto_objectives, edge_hardware)

print("🎯 HARDWARE-SPECIFIC ARCHITECTURE ANALYSIS")
print("=" * 60)

# Print recommendations
print("\n📋 RECOMMENDATIONS:")
for rec in analysis['recommendations']:
    print(f"   {rec}")

# Print best solutions
if analysis['feasible_solutions']:
    print(f"\n🏆 TOP ARCHITECTURE RECOMMENDATIONS:")
    
    categories = [
        ('Best Accuracy', analysis['best_accuracy']),
        ('Best Efficiency', analysis['best_efficiency']),
        ('Best Balanced', analysis['best_balanced'])
    ]
    
    for category, solution in categories:
        if solution:
            arch = solution['architecture']
            metrics = solution['metrics']
            
            print(f"\n🔹 {category}:")
            print(f"   Accuracy: {metrics['accuracy']:.3f}")
            print(f"   Latency: {metrics['latency']:.1f} ms")
            print(f"   Memory: {metrics['memory']:.1f} MB")
            print(f"   Energy: {metrics['energy']:.1f} mJ")
            print(f"   Architecture: {arch['depth']} layers, avg {np.mean([l['channels'] for l in arch['layers']]):.0f} channels")
            
            # Layer type distribution
            layer_types = [l['type'] for l in arch['layers']]
            type_counts = {t: layer_types.count(t) for t in set(layer_types)}
            print(f"   Layer distribution: {type_counts}")

print(f"\n📊 Analysis complete - {len(analysis['feasible_solutions'])} feasible solutions identified")
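How `best_balanced` is chosen depends on the rule inside `analyze_pareto_solutions`. One common rule, sketched here as an assumption and not necessarily the one used above, keeps only solutions that satisfy every constraint $g_i(\alpha) \leq c_i$, min-max normalizes each objective over the feasible set, and returns the solution closest to the ideal point (all normalized objectives equal to zero). The metric keys mirror the `metrics` dicts printed above. The helper names, limits, and candidate numbers are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Objectives to minimize; accuracy is converted to error so all four point the same way.
OBJECTIVES = ("error", "latency", "memory", "energy")

def to_objectives(metrics: Dict[str, float]) -> Dict[str, float]:
    """Map a metrics dict (accuracy, latency, memory, energy) to minimization objectives."""
    return {
        "error": 1.0 - metrics["accuracy"],
        "latency": metrics["latency"],
        "memory": metrics["memory"],
        "energy": metrics["energy"],
    }

def is_feasible(metrics: Dict[str, float], limits: Dict[str, float]) -> bool:
    """Check every hardware constraint g_i(alpha) <= c_i (hypothetical limit keys)."""
    return all(metrics[key] <= cap for key, cap in limits.items())

def pick_balanced(candidates: List[Dict[str, float]],
                  limits: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Return the feasible candidate closest (Euclidean) to the normalized ideal point."""
    feasible = [m for m in candidates if is_feasible(m, limits)]
    if not feasible:
        return None
    objs = [to_objectives(m) for m in feasible]
    lo = {k: min(o[k] for o in objs) for k in OBJECTIVES}
    hi = {k: max(o[k] for o in objs) for k in OBJECTIVES}

    def distance(o: Dict[str, float]) -> float:
        # Min-max normalize each objective to [0, 1]; a constant objective contributes 0.
        total = 0.0
        for k in OBJECTIVES:
            span = hi[k] - lo[k]
            norm = (o[k] - lo[k]) / span if span > 0 else 0.0
            total += norm ** 2
        return math.sqrt(total)

    best_idx = min(range(len(feasible)), key=lambda i: distance(objs[i]))
    return feasible[best_idx]

# Hypothetical candidates and device budget.
candidates = [
    {"accuracy": 0.92, "latency": 80.0, "memory": 40.0, "energy": 30.0},  # violates latency
    {"accuracy": 0.88, "latency": 35.0, "memory": 20.0, "energy": 15.0},
    {"accuracy": 0.80, "latency": 12.0, "memory": 8.0, "energy": 5.0},
    {"accuracy": 0.86, "latency": 25.0, "memory": 14.0, "energy": 10.0},
]
limits = {"latency": 50.0, "memory": 64.0}
print(pick_balanced(candidates, limits))  # the 0.86-accuracy, 25 ms candidate
```

Normalizing first matters: without it, latency in milliseconds would swamp error, which lives in [0, 1].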

## 🔬 Advanced Research Extensions

### Research Questions for Further Investigation

In [None]:
from typing import Dict, List

class AdvancedNASResearch:
    """Advanced research extensions for Multi-Objective NAS"""
    
    def __init__(self):
        self.research_questions = [
            {
                'title': 'Dynamic Architecture Adaptation',
                'description': 'Architectures that adapt their complexity based on input difficulty or available resources',
                'implementation_hint': 'Use early exit strategies or dynamic channel selection',
                'paper_reference': 'Dynamic pruning techniques (O3BNN-R, FuPruner)'
            },
            {
                'title': 'Hardware-Software Co-Design',
                'description': 'Jointly optimize neural architecture and hardware configuration',
                'implementation_hint': 'Include hardware parameters (clock speed, memory bandwidth) in search space',
                'paper_reference': 'Hardware-software co-design approaches (Sun et al.)'
            },
            {
                'title': 'Multi-Task Architecture Search',
                'description': 'Find architectures that excel at multiple related tasks simultaneously',
                'implementation_hint': 'Use shared backbone with task-specific heads',
                'paper_reference': 'Multi-objective NAS for multiple tasks'
            },
            {
                'title': 'Uncertainty-Aware NAS',
                'description': 'Account for prediction uncertainty in performance models',
                'implementation_hint': 'Use Bayesian optimization or ensemble methods',
                'paper_reference': 'Robust optimization under uncertainty'
            }
        ]
    
    def propose_experiment(self, research_idx: int) -> Dict[str, str]:
        """Propose specific experiment for research question"""
        if not 0 <= research_idx < len(self.research_questions):
            raise ValueError(f"Invalid research index {research_idx}; expected 0-{len(self.research_questions) - 1}")
        
        question = self.research_questions[research_idx]
        
        experiment = {
            'title': question['title'],
            'objective': question['description'],
            'methodology': self._generate_methodology(question),
            'expected_outcome': self._generate_expected_outcome(question),
            'evaluation_metrics': self._generate_evaluation_metrics(question)
        }
        
        return experiment
    
    def _generate_methodology(self, question: Dict) -> str:
        """Generate methodology based on research question"""
        methodologies = {
            'Dynamic Architecture Adaptation': '''
            1. Implement early exit mechanisms in architecture search space
            2. Add adaptive channel selection based on input complexity
            3. Train policies to decide when to use simplified vs full architecture
            4. Evaluate on datasets with varying complexity levels
            ''',
            'Hardware-Software Co-Design': '''
            1. Extend search space to include hardware parameters
            2. Model hardware performance with configurable parameters
            3. Use multi-level optimization (architecture + hardware)
            4. Validate on actual reconfigurable hardware platforms
            ''',
            'Multi-Task Architecture Search': '''
            1. Define multiple related tasks (e.g., classification + detection)
            2. Design shared backbone with task-specific heads
            3. Use multi-task loss functions in architecture evaluation
            4. Compare with single-task specialized architectures
            ''',
            'Uncertainty-Aware NAS': '''
            1. Model uncertainty in performance predictions
            2. Use Bayesian optimization for architecture search
            3. Include confidence intervals in Pareto optimization
            4. Validate robustness across different deployment scenarios
            '''
        }
        
        return methodologies.get(question['title'], 'Methodology to be defined')
    
    def _generate_expected_outcome(self, question: Dict) -> str:
        """Generate expected outcomes"""
        outcomes = {
            'Dynamic Architecture Adaptation': 'Architectures that maintain accuracy while reducing average computational cost by 30-50%',
            'Hardware-Software Co-Design': 'Joint optimization yielding 2-3x better efficiency than architecture-only optimization',
            'Multi-Task Architecture Search': 'Single architecture achieving 95%+ performance of specialized models on each task',
            'Uncertainty-Aware NAS': 'More robust architectures with better worst-case performance guarantees'
        }
        
        return outcomes.get(question['title'], 'Outcomes to be defined')
    
    def _generate_evaluation_metrics(self, question: Dict) -> List[str]:
        """Generate evaluation metrics"""
        metrics = {
            'Dynamic Architecture Adaptation': [
                'Average computational cost reduction',
                'Accuracy retention across complexity levels',
                'Adaptation policy effectiveness'
            ],
            'Hardware-Software Co-Design': [
                'Joint optimization improvement over sequential',
                'Hardware utilization efficiency',
                'Cost-performance trade-offs'
            ],
            'Multi-Task Architecture Search': [
                'Per-task performance vs specialized models',
                'Parameter sharing efficiency',
                'Task interference analysis'
            ],
            'Uncertainty-Aware NAS': [
                'Prediction confidence calibration',
                'Worst-case performance bounds',
                'Robustness across deployment scenarios'
            ]
        }
        
        return metrics.get(question['title'], ['Metrics to be defined'])

# Initialize research framework
research = AdvancedNASResearch()

print("🔬 ADVANCED RESEARCH EXTENSIONS")
print("=" * 60)

for i, question in enumerate(research.research_questions):
    print(f"\n{i+1}. {question['title']}")
    print(f"   📝 {question['description']}")
    print(f"   💡 Implementation: {question['implementation_hint']}")
    print(f"   📚 Paper ref: {question['paper_reference']}")

# Generate example experiment proposal
experiment = research.propose_experiment(0)  # Dynamic Architecture Adaptation

print(f"\n\n🧪 EXAMPLE EXPERIMENT PROPOSAL")
print("=" * 60)
print(f"Title: {experiment['title']}")
print(f"\nObjective: {experiment['objective']}")
print(f"\nMethodology: {experiment['methodology']}")
print(f"Expected Outcome: {experiment['expected_outcome']}")
print(f"\nEvaluation Metrics:")
for metric in experiment['evaluation_metrics']:
    print(f"   - {metric}")

print("\n✅ Research extensions defined - ready for advanced implementation")
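The first proposal's methodology (early exits, input-dependent compute) and its "average computational cost reduction" metric can be made concrete with a toy simulation. This is a minimal sketch, not the paper's method: each hypothetical exit stage has an incremental cost and, for each input, a confidence score, and inference stops at the first stage whose confidence clears a threshold. The function names, stage costs, and confidences are invented for illustration.

```python
from typing import List, Sequence, Tuple

def early_exit_cost(stage_costs: Sequence[float],
                    confidences: Sequence[float],
                    threshold: float) -> Tuple[int, float]:
    """Return (exit stage index, cumulative cost) for one input.

    stage_costs[i] is the extra cost of running stage i (hypothetical units, e.g. MFLOPs);
    confidences[i] is the exit classifier's confidence at stage i for this input.
    """
    if not stage_costs or len(stage_costs) != len(confidences):
        raise ValueError("need one confidence per stage")
    spent = 0.0
    for i, (cost, conf) in enumerate(zip(stage_costs, confidences)):
        spent += cost
        if conf >= threshold:
            return i, spent
    # No intermediate exit was confident enough: the full network ran.
    return len(stage_costs) - 1, spent

def average_cost_reduction(stage_costs: Sequence[float],
                           batch_confidences: List[Sequence[float]],
                           threshold: float) -> float:
    """Fractional saving in average cost versus always running every stage."""
    full_cost = sum(stage_costs)
    avg = sum(early_exit_cost(stage_costs, c, threshold)[1]
              for c in batch_confidences) / len(batch_confidences)
    return 1.0 - avg / full_cost

stage_costs = [10.0, 20.0, 30.0]
batch = [
    [0.95, 0.97, 0.99],  # easy input: exits at stage 0
    [0.60, 0.92, 0.98],  # medium input: exits at stage 1
    [0.40, 0.70, 0.85],  # hard input: runs all stages
]
print(f"{average_cost_reduction(stage_costs, batch, threshold=0.9):.1%}")  # 44.4%
```

The saving alone says nothing about accuracy; a real experiment would also record the accuracy of each exit to measure "accuracy retention across complexity levels".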

## 📚 Key Takeaways & Summary

### 🎯 Concepts Mastered:

1. **Multi-Objective Optimization**: Implemented the NSGA-II algorithm to balance accuracy, latency, memory, and energy

2. **Hardware-Aware Search**: Developed performance prediction models that account for edge device constraints

3. **Pareto Optimization**: Found multiple optimal trade-off solutions rather than single "best" architecture

4. **Architecture Encoding**: Created flexible search space representation for neural architectures

5. **Evolutionary Algorithms**: Applied tournament selection, mutation, and crowding distance to preserve diversity along the Pareto front
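Items 3 and 5 rest on two NSGA-II primitives: Pareto dominance, which defines the non-dominated front, and crowding distance, which breaks ties within a front in favor of solutions in sparse regions. The sketch below restates the standard definitions in plain Python; it is not the notebook's own implementation. All objectives are assumed to be minimized (accuracy entered as error), and the candidate points are hypothetical.

```python
from typing import List, Sequence

Objectives = Sequence[float]  # one value per objective, all minimized

def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points: List[Objectives]) -> List[int]:
    """Indices of points not dominated by any other point (the first Pareto front)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

def crowding_distance(front: List[Objectives]) -> List[float]:
    """NSGA-II crowding distance for the points of a single front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary points get infinite distance so they are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, n - 1):
            # Normalized gap between each interior point's two neighbors on objective m.
            dist[order[k]] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / (hi - lo)
    return dist

# Hypothetical (error, latency_ms) pairs for five candidate architectures.
points = [(0.08, 90.0), (0.10, 40.0), (0.15, 20.0), (0.12, 45.0), (0.20, 10.0)]
front_idx = non_dominated(points)  # (0.12, 45.0) is dominated by (0.10, 40.0)
front = [points[i] for i in front_idx]
print(front_idx, crowding_distance(front))
```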

### 📊 Paper Implementation Results:

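- **Search Space**: 6 layer types and 10 channel widths chosen per layer, with 9 depth options. The product 6 × 10 × 9 = 540 counts only architectures whose layers all share one type and width; with per-layer choices the space contains $\sum_{d} 60^{d}$ architectures over the allowed depths $d$, far too many to enumerate
- **Constraints**: Modeled edge device limits (latency, memory, energy) as feasibility constraints on the predicted performance of each architecture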
- **Optimization**: Found Pareto-optimal solutions balancing 4 competing objectives
- **Hardware Integration**: Performance models account for CPU cores, NPU availability, and memory limits
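Because layer type and width are chosen per layer (each entry of `arch['layers']` carries its own `type` and `channels`), the number of distinct architectures grows exponentially with depth. A quick count, assuming hypothetically that the 9 depth options are 4 to 12 layers; the actual depth values are set in the search-space cell and may differ:

```python
# Hypothetical search-space parameters; adjust to the values used in the search-space cell.
NUM_LAYER_TYPES = 6
NUM_CHANNEL_WIDTHS = 10
DEPTH_OPTIONS = range(4, 13)  # 9 depths: 4, 5, ..., 12 (assumption)

def count_architectures(types: int, widths: int, depths) -> int:
    """Count architectures where every layer independently picks a type and a width."""
    per_layer = types * widths
    return sum(per_layer ** d for d in depths)

uniform = NUM_LAYER_TYPES * NUM_CHANNEL_WIDTHS * len(DEPTH_OPTIONS)
total = count_architectures(NUM_LAYER_TYPES, NUM_CHANNEL_WIDTHS, DEPTH_OPTIONS)
print(uniform)         # 540: only architectures whose layers all share one type and width
print(f"{total:.2e}")  # per-layer choices: roughly 2 x 10^21 under these assumptions
```

A space this size is why the notebook relies on evolutionary search and performance predictors instead of exhaustive evaluation.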

### 🔬 Research Extensions:

1. **Dynamic Adaptation**: Runtime architecture complexity adjustment
2. **Co-Design**: Joint hardware-software optimization
3. **Multi-Task**: Single architecture for multiple related tasks
4. **Uncertainty**: Robust optimization under prediction uncertainty
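For the uncertainty extension, one simple option (an assumption here, not the paper's method) is to replace a single latency prediction with an ensemble of predictors and check the hardware constraint against a pessimistic estimate, the ensemble mean plus $k$ standard deviations. The helper names and predictor outputs below are hypothetical.

```python
import statistics
from typing import Sequence

def pessimistic_estimate(predictions: Sequence[float], k: float = 2.0) -> float:
    """Mean + k * sample standard deviation of an ensemble's predictions."""
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    return statistics.mean(predictions) + k * statistics.stdev(predictions)

def robustly_feasible(predictions: Sequence[float], limit: float, k: float = 2.0) -> bool:
    """Accept an architecture only if the pessimistic estimate respects the limit."""
    return pessimistic_estimate(predictions, k) <= limit

# Hypothetical latency predictions (ms) from five predictors for two architectures.
stable = [42.0, 43.0, 41.0, 42.0, 42.0]      # mean 42 ms, low disagreement
uncertain = [30.0, 45.0, 38.0, 52.0, 35.0]   # mean 40 ms, high disagreement
LATENCY_LIMIT_MS = 50.0
print(robustly_feasible(stable, LATENCY_LIMIT_MS))     # True
print(robustly_feasible(uncertain, LATENCY_LIMIT_MS))  # False
```

Both architectures pass on mean latency alone; only the pessimistic check rejects the one whose predictors disagree, which is the kind of worst-case behavior the uncertainty extension targets.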

### 🎓 Learning Outcomes:

This notebook combined four areas:
- **Evolutionary Computation** (NSGA-II, Pareto optimization)
- **Hardware Modeling** (performance prediction, constraint satisfaction)
- **Neural Architecture Design** (flexible search spaces, encoding schemes)
- **Multi-Objective Decision Making** (trade-off analysis, solution selection)

---

**📄 Paper Citation**: Wang, X., & Jia, W. (2025). *Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies*. arXiv:2501.03265v1. **Pages 11-13**: Neural Architecture Search for Edge Deployment.

**🔗 Next**: Continue with **Focused Learning Notebook 2: Knowledge Distillation** to explore teacher-student model compression techniques.