# Chapter 6: Neural Architecture Search (NAS) - Code Examples

This companion notebook contains all the code examples from Chapter 6, organized for easy experimentation and learning. It covers the fundamental concepts of NAS, from designing search spaces to implementing efficient search strategies and using practical tools.

### Contents:
1. Environment Setup and Dependencies
2. Search Space Design
3. Search Strategies
4. Performance Estimation Strategies
5. Efficient NAS Techniques
6. Practical Tools and Applications
7. Summary and Best Practices
8. Final Notes and Additional Resources

## 1. Environment Setup and Dependencies

Before running this notebook, ensure you have a compatible environment. The NAS
frameworks in this chapter require specific version combinations to work correctly.

### Known Compatibility Issue

TensorFlow and protobuf must use compatible versions. If you encounter:
```
AttributeError: 'google._upb._message.FieldDescriptor' object has no attribute 'is_repeated'
```

This means protobuf 5.x is installed, which is incompatible with TensorFlow 2.15-2.16.
The setup cell below detects this and prints the command that fixes it; it does not change your environment.

### Recommended Setup

**Option A: Conda (recommended)**
```bash
conda env create -f chapter6_environment.yml
conda activate automl-ch6
```

**Option B: pip**
```bash
python -m venv automl-ch6
source automl-ch6/bin/activate
pip install -r requirements.txt
```

In [None]:
import subprocess
import sys

def check_and_fix_environment():
    """Check for known compatibility issues and report them."""
    
    print("Checking environment compatibility...\n")
    
    # Check protobuf version
    try:
        import google.protobuf
        pb_version = google.protobuf.__version__
        print(f"  protobuf: {pb_version}")
        
        # Protobuf 5.x and later fall outside the range recommended for TensorFlow 2.15-2.16
        if int(pb_version.split(".")[0]) >= 5:
            print(f"\nWARNING: protobuf {pb_version} detected. For best compatibility, run:")
            print("  pip install 'protobuf>=4.23,<5.0'")
            print("Then restart the kernel.\n")
    except ImportError:
        print("  protobuf: not installed")
    
    # Check TensorFlow
    try:
        import tensorflow as tf
        print(f"  tensorflow: {tf.__version__}")
    except ImportError:
        print("  tensorflow: not installed")
    except Exception as e:
        print(f"  tensorflow: error importing - {e}")
        print("\nWARNING: TensorFlow import failed. This may be a protobuf issue.")
        print("  Try: pip install 'protobuf>=4.23,<5.0'")
        print("  Then restart the kernel.\n")
    
    # Check PyTorch
    try:
        import torch
        print(f"  pytorch: {torch.__version__}")
        print(f"  CUDA available: {torch.cuda.is_available()}")
    except ImportError:
        print("  pytorch: not installed")
    
    print("\nEnvironment check complete!")

# Run the check
check_and_fix_environment()

In [None]:
# Install missing packages (if needed)

# Uncomment and run if packages are missing
# Note: In Google Colab, use these install commands

# !pip install torch torchvision -q
# !pip install "tensorflow>=2.15,<2.17" -q
# !pip install "protobuf>=4.23,<5.0" -q  # Critical: must be <5.0
# !pip install autokeras -q
# !pip install keras-tuner -q
# !pip install optuna -q
# !pip install "ray[tune]" -q

In [None]:
# Core dependencies
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import random
import time
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# TensorFlow and Keras imports (used in specific sections)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import autokeras as ak
import keras_tuner as kt

# Ray and Optuna imports (used in specific sections)
import optuna
from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler

# Reproducibility (seed every framework used in this notebook)
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
tf.random.set_seed(42)

print("All dependencies imported successfully!")

## 2. Search Space Design

### 2.1 Cell-Based Search Space with AutoKeras

In [None]:
def create_nas_model_autokeras():
    """
    Define a search space for image classification using the high-level AutoKeras API.
    AutoKeras handles the search space internally.
    """
    model = ak.ImageClassifier(
        num_classes=10,  # Adjust based on your dataset
        multi_label=False,
        max_trials=10, # Number of different architectures to try
        directory='autokeras_nas_results',
        project_name='image_classification_nas'
    )
    return model

# Example usage:
autokeras_model = create_nas_model_autokeras()
print("AutoKeras model with a predefined search space created successfully!")
print(autokeras_model)

### 2.2 Custom Search Space with KerasTuner

In [None]:
class NASSearchSpace(kt.HyperModel):
    """Custom search space for Neural Architecture Search using KerasTuner."""

    def build(self, hp):
        model = keras.Sequential()
        model.add(keras.layers.Input(shape=(32, 32, 3)))

        # Search over the number of convolutional blocks
        num_blocks = hp.Int('num_blocks', min_value=2, max_value=5, step=1)

        for i in range(num_blocks):
            filters = hp.Choice(f'filters_{i}', values=[32, 64, 128])
            kernel_size = hp.Choice(f'kernel_size_{i}', values=[3, 5])
            activation = hp.Choice(f'activation_{i}', values=['relu', 'swish'])

            model.add(keras.layers.Conv2D(filters, kernel_size, activation=activation, padding='same'))

            if hp.Boolean(f'batch_norm_{i}'):
                model.add(keras.layers.BatchNormalization())

            if i % 2 == 1:
                model.add(keras.layers.MaxPooling2D(2))

        model.add(keras.layers.GlobalAveragePooling2D())
        model.add(keras.layers.Dense(10, activation='softmax')) # Assuming 10 classes

        learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
        optimizer = keras.optimizers.Adam(learning_rate=learning_rate)

        model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        return model

def run_keras_tuner_search(x_train, y_train, x_val, y_val):
    """Run Neural Architecture Search with Keras Tuner."""
    tuner = kt.RandomSearch(
        NASSearchSpace(),
        objective='val_accuracy',
        max_trials=15,
        directory='kerastuner_nas_results',
        project_name='custom_nas_search'
    )

    tuner.search(
        x_train, y_train,
        epochs=10,
        validation_data=(x_val, y_val),
        callbacks=[keras.callbacks.EarlyStopping(patience=3)]
    )

    best_model = tuner.get_best_models(num_models=1)[0]
    best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]

    return best_model, best_hyperparameters

print("KerasTuner search space defined successfully!")

### 2.3 Programmatic Search Space Definition

In [None]:
class SearchSpaceConfig:
    """Configuration class for defining custom search spaces programmatically."""
    def __init__(self):
        self.layer_types = ['conv', 'depthwise_conv', 'dilated_conv', 'skip_connect']
        self.filter_sizes = [16, 32, 64, 128, 256]
        self.kernel_sizes = [3, 5, 7]
        self.activation_functions = ['relu', 'swish', 'gelu']
        self.depth_range = (5, 20)

    def sample_architecture(self, depth=None):
        """Sample a random architecture from the search space."""
        if depth is None:
            depth = random.randint(*self.depth_range)

        architecture = []
        for i in range(depth):
            layer_config = {
                'type': random.choice(self.layer_types),
                'filters': random.choice(self.filter_sizes),
                'kernel_size': random.choice(self.kernel_sizes),
                'activation': random.choice(self.activation_functions)
            }
            architecture.append(layer_config)
        return architecture

    def get_search_space_size(self):
        """Calculate the theoretical size of the search space."""
        choices_per_layer = (len(self.layer_types) * len(self.filter_sizes) * \
                           len(self.kernel_sizes) * len(self.activation_functions))

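        # depth_range is inclusive on both ends, matching random.randint in sample_architecture
        min_depth, max_depth = self.depth_range
        total_size = sum(choices_per_layer ** d for d in range(min_depth, max_depth + 1))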
        return total_size

# Example usage:
search_config = SearchSpaceConfig()
sample_arch = search_config.sample_architecture(depth=10)
print(f"Sampled Architecture (first 3 layers): {sample_arch[:3]}...")
print(f"Theoretical Search Space Size: {search_config.get_search_space_size():.2e}")
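
The size estimate above can be checked by hand. The sketch below is a standalone illustration (the helper name `search_space_size` is hypothetical and not part of the chapter's code). It assumes the default configuration: 4 layer types, 5 filter sizes, 3 kernel sizes and 3 activations give 180 choices per layer, summed over depths 5 through 20 inclusive.

```python
import math

def search_space_size(choices_per_layer, min_depth, max_depth):
    """Count architectures when every layer is chosen independently.

    Depths are inclusive on both ends, matching random.randint.
    """
    return sum(choices_per_layer ** d for d in range(min_depth, max_depth + 1))

choices = 4 * 5 * 3 * 3  # layer types x filter sizes x kernel sizes x activations
size = search_space_size(choices, 5, 20)

print(f"Choices per layer: {choices}")
print(f"Search space size: about 10^{math.log10(size):.1f}")
```

The deepest networks dominate the sum, which is why exhaustive evaluation is out of reach even for this small configuration.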

## 3. Search Strategies

### 3.1 Evolutionary Neural Architecture Search

In [None]:
def evaluate_architecture_placeholder(architecture):
    """Placeholder evaluation function. In practice, this would train the model."""
    score = 0.5  # Base score
    # Reward skip connections and modern activations
    score += sum(1 for l in architecture if l['type'] == 'skip_connect') * 0.02
    score += sum(1 for l in architecture if l['activation'] in ['swish', 'gelu']) * 0.01
    # Reward moderate depth (8 to 15 layers); deeper or shallower networks get no bonus
    if 8 <= len(architecture) <= 15:
        score += 0.1
    score += random.gauss(0, 0.05) # Simulate training variance
    return max(0, min(1, score))

def mutate_architecture(architecture, search_config, mutation_rate=0.3):
    """Mutate an architecture by randomly changing some layers."""
    mutated = []
    for layer in architecture:
        if random.random() < mutation_rate:
            mutated.append(search_config.sample_architecture(depth=1)[0])
        else:
            mutated.append(layer.copy())
    return mutated

def evolutionary_nas(population_size=50, generations=20, search_config=None):
    """Simplified evolutionary NAS implementation."""
    search_config = search_config or SearchSpaceConfig()
    print("Starting Evolutionary NAS...")

    # Initialize population
    population = [search_config.sample_architecture() for _ in range(population_size)]
    fitness_history = []

    for gen in range(generations):
        fitness_scores = [evaluate_architecture_placeholder(arch) for arch in population]
        fitness_history.append(max(fitness_scores))

        # Select top 50% as survivors
        combined = sorted(zip(population, fitness_scores), key=lambda x: x[1], reverse=True)
        survivors = [arch for arch, _ in combined[:population_size // 2]]

        # Create next generation via mutation
        new_population = survivors + [mutate_architecture(p, search_config) for p in survivors]
        population = new_population

        if gen % 5 == 0:
            print(f"  Generation {gen}: Best fitness = {max(fitness_scores):.4f}")

    final_scores = [evaluate_architecture_placeholder(arch) for arch in population]
    best_arch = population[np.argmax(final_scores)]
    print(f"Finished! Best architecture has {len(best_arch)} layers.")
    return best_arch, fitness_history

# Example usage:
best_arch_evo, history_evo = evolutionary_nas(population_size=20, generations=50)


### 3.2 Differentiable Architecture Search (DARTS)

In [None]:
class MixedOperation(nn.Module):
    """A mixed operation that combines multiple candidate operations via learnable weights."""
    def __init__(self, channels, operations=('conv3', 'conv5', 'maxpool', 'skip')):
        super().__init__()
        self.ops = nn.ModuleList()
        for op_name in operations:
            if op_name == 'conv3': op = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            elif op_name == 'conv5': op = nn.Conv2d(channels, channels, 5, padding=2, bias=False)
            elif op_name == 'maxpool': op = nn.MaxPool2d(3, stride=1, padding=1)
            elif op_name == 'skip': op = nn.Identity()
            else: raise ValueError(f"Unknown op: {op_name}")
            self.ops.append(op)
        self.weights = nn.Parameter(torch.randn(len(operations)))

    def forward(self, x):
        weights_sm = F.softmax(self.weights, dim=0)
        return sum(w * op(x) for w, op in zip(weights_sm, self.ops))

class DARTSCell(nn.Module):
    """A DARTS cell that searches over a DAG of mixed operations."""
    def __init__(self, channels=16, num_nodes=4):
        super().__init__()
        self.num_nodes = num_nodes
        self.operations = ['conv3', 'conv5', 'maxpool', 'skip']
        self.mixed_ops = nn.ModuleList()
        for i in range(num_nodes):
            for j in range(i + 2):
                self.mixed_ops.append(MixedOperation(channels, self.operations))

    def forward(self, s0, s1):
        states = [s0, s1]
        op_idx = 0
        for i in range(self.num_nodes):
            node_out = sum(self.mixed_ops[op_idx+j](s) for j, s in enumerate(states))
            op_idx += len(states)
            states.append(node_out)
        return torch.cat(states[-self.num_nodes:], dim=1)

# Simplified DARTS network for demonstration
class DARTSNetwork(nn.Module):
    def __init__(self, num_classes=10, channels=16, num_cells=3, num_nodes=4):
        super().__init__()
        self.stem0 = nn.Conv2d(3, channels, 3, padding=1)
        self.stem1 = nn.Conv2d(3, channels, 3, padding=1)
        self.cells = nn.ModuleList([DARTSCell(channels, num_nodes) for _ in range(num_cells)])
        # Project channels*num_nodes back to channels between cells
        self.channel_projections = nn.ModuleList([
            nn.Conv2d(channels * num_nodes, channels, 1) for _ in range(num_cells - 1)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Final cell output has channels*num_nodes channels
        self.classifier = nn.Linear(channels * num_nodes, num_classes)

    def forward(self, x):
        s0 = self.stem0(x)
        s1 = self.stem1(x)
        for i, cell in enumerate(self.cells):
            s0, s1 = s1, cell(s0, s1)
            if i < len(self.channel_projections):
                s1 = self.channel_projections[i](s1)
        out = self.pool(s1).view(s1.size(0), -1)
        return self.classifier(out)

# Example usage:
darts_model = DARTSNetwork()
print(f"DARTS model created with {sum(p.numel() for p in darts_model.parameters()):,} parameters")
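
Once the architecture weights have been trained, DARTS-style methods typically derive a discrete architecture by keeping, on each edge, the candidate operation with the largest softmax weight. The following is a framework-free sketch of that step. The helper `discretize_edge` and the weight values are made up for illustration and do not come from a trained model.

```python
import math

OPERATIONS = ['conv3', 'conv5', 'maxpool', 'skip']

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discretize_edge(alphas, operations=OPERATIONS):
    """Return the operation with the highest softmax weight, plus that weight."""
    probs = softmax(alphas)
    best = max(range(len(probs)), key=probs.__getitem__)
    return operations[best], probs[best]

# Illustrative architecture weights for three edges (not trained values)
example_alphas = [
    [0.2, 1.5, -0.3, 0.1],
    [0.9, 0.1, 0.0, 1.1],
    [-1.0, -0.5, 2.0, 0.3],
]
for edge, alphas in enumerate(example_alphas):
    op, weight = discretize_edge(alphas)
    print(f"edge {edge}: {op} (weight {weight:.2f})")
```

With the PyTorch classes above, the same choice could be read from each `MixedOperation` by taking the argmax of its `weights` parameter, since softmax preserves ordering.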

## 4. Performance Estimation Strategies

### 4.1 Successive Halving for Multi-fidelity Evaluation

In [None]:
def train_and_evaluate_placeholder(architecture, epochs):
    """Placeholder for training. Performance improves with more epochs."""
    time.sleep(0.005 * epochs)  # Simulate training time
    base_score = evaluate_architecture_placeholder(architecture)
    epoch_bonus = 0.15 * (1 - np.exp(-epochs / 40.0))
    final_score = min(1.0, base_score + epoch_bonus)
    return max(0, final_score + random.gauss(0, 0.02))

def successive_halving_nas(architectures, max_epochs=81, reduction_factor=3):
    """Implement Successive Halving for efficient architecture evaluation."""
    print(f"Starting Successive Halving with {len(architectures)} candidates...")
    candidates = architectures.copy()
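    # Number of elimination rounds is floor(log_eta(n)); the small offset absorbs float error
    num_rounds = int(np.floor(np.log(len(architectures)) / np.log(reduction_factor) + 1e-9))
    # Start with at least one epoch so the first round always trains something
    epochs = max(1, max_epochs // (reduction_factor ** num_rounds))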

    round_num = 1
    while len(candidates) > 1 and epochs <= max_epochs:
        print(f"\n  Round {round_num}: Evaluating {len(candidates)} candidates for {epochs} epochs")
        results = [(arch, train_and_evaluate_placeholder(arch, epochs)) for arch in candidates]

        results.sort(key=lambda x: x[1], reverse=True)
        keep_count = max(1, len(results) // reduction_factor)
        candidates = [arch for arch, _ in results[:keep_count]]

        print(f"    Best score: {results[0][1]:.4f}. Keeping top {keep_count} candidates.")
        epochs *= reduction_factor
        round_num += 1

    print(f"\nSuccessive Halving finished. Best architecture found:")
    return candidates[0] if candidates else None

# Example usage:
search_config_sh = SearchSpaceConfig()
sample_architectures_sh = [search_config_sh.sample_architecture() for _ in range(27)]
best_architecture_sh = successive_halving_nas(sample_architectures_sh, max_epochs=81, reduction_factor=3)
print(f"  - Depth: {len(best_architecture_sh)}")
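
To see where the savings come from, it helps to tabulate the schedule. The sketch below is a standalone illustration that mirrors the loop above without training anything; `halving_schedule` is a hypothetical helper, not part of the chapter's code. It lists how many candidates are trained for how many epochs in each round and compares the total budget with training every candidate for `max_epochs`.

```python
def halving_schedule(num_candidates, max_epochs, reduction_factor):
    """List (candidates, epochs) per round, mirroring successive_halving_nas."""
    # Count how many times the pool can shrink by the reduction factor
    num_rounds = 0
    n = num_candidates
    while n >= reduction_factor:
        n //= reduction_factor
        num_rounds += 1
    epochs = max(1, max_epochs // reduction_factor ** num_rounds)

    schedule = []
    candidates = num_candidates
    while candidates > 1 and epochs <= max_epochs:
        schedule.append((candidates, epochs))
        candidates = max(1, candidates // reduction_factor)
        epochs *= reduction_factor
    return schedule

schedule = halving_schedule(27, 81, 3)
total_budget = sum(c * e for c, e in schedule)
full_budget = 27 * 81

for round_num, (c, e) in enumerate(schedule, start=1):
    print(f"Round {round_num}: {c} candidates x {e} epochs")
print(f"Successive halving: {total_budget} epochs, full training: {full_budget} epochs")
```

Each round costs the same number of epochs in this configuration, because the pool shrinks by the same factor by which the per-candidate budget grows.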


### 4.2 One-Shot Architecture Search (Weight Sharing)

In [None]:
class Supernet(nn.Module):
    """Supernet for one-shot NAS, containing all possible operations and sharing weights."""
    def __init__(self, search_config, num_classes=10, max_depth=10, channels=32):
        super().__init__()
        self.max_depth = max_depth
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList()
        for _ in range(max_depth):
            layer_ops = nn.ModuleDict()
            for op_type in search_config.layer_types:
                if op_type == 'skip_connect':
                    layer_ops[op_type] = nn.Identity()
                else:
                    # Simplified: all convs have same kernel size here
                    layer_ops[op_type] = nn.Conv2d(channels, channels, 3, padding=1)
            self.layers.append(layer_ops)
        self.head = nn.Linear(channels, num_classes)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x, architecture):
        """Forward pass using a specific architecture path."""
        x = self.stem(x)
        for i, layer_config in enumerate(architecture):
            if i < self.max_depth:
                op_type = layer_config['type']
                x = self.layers[i][op_type](x)
        x = self.pool(x).view(x.size(0), -1)
        return self.head(x)

def evaluate_architecture_fast(supernet, architecture, test_loader):
    """Evaluate an architecture quickly using the pre-trained supernet weights."""
    supernet.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, targets in test_loader:
            outputs = supernet(data, architecture)
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    return correct / total

# Example usage:
search_config_os = SearchSpaceConfig()
supernet_os = Supernet(search_config_os)
sample_arch_os = search_config_os.sample_architecture(depth=5)
print(f"Supernet created with {sum(p.numel() for p in supernet_os.parameters()):,} parameters.")
print("A single training run on the Supernet enables fast evaluation of many sub-architectures.")


### 4.3 Learning Curve Extrapolation

In [None]:
def exponential_curve(x, a, b, c):
    """Exponential saturation curve for fitting learning curves."""
    return a * (1 - np.exp(-b * x)) + c

def predict_final_accuracy(early_accuracies, target_epochs, plot=False):
    """Predict final accuracy from early training epochs."""
    if len(early_accuracies) < 3: return early_accuracies[-1]
    epochs = np.arange(1, len(early_accuracies) + 1)
    try:
        params, _ = curve_fit(exponential_curve, epochs, early_accuracies,
                                p0=[max(early_accuracies), 0.1, min(early_accuracies)], maxfev=1000)
        predicted = min(1.0, exponential_curve(target_epochs, *params))
        if plot:
            plt.figure(figsize=(8, 5))
            plt.plot(epochs, early_accuracies, 'bo', label='Observed')
            extended_epochs = np.linspace(1, target_epochs, 100)
            plt.plot(extended_epochs, exponential_curve(extended_epochs, *params), 'r-', label='Fitted Curve')
            plt.axhline(y=predicted, color='r', linestyle='--', label=f'Predicted Acc: {predicted:.3f}')
            plt.title('Learning Curve Extrapolation')
            plt.xlabel('Epochs'); plt.ylabel('Accuracy'); plt.legend(); plt.grid(True); plt.show()
        return predicted
    except Exception:
        return early_accuracies[-1]

# Simulate a learning curve
true_final_acc = 0.85
simulated_curve = true_final_acc * (1 - np.exp(-np.arange(1, 21) / 8)) + np.random.normal(0, 0.02, 20)

# Predict from first 5 epochs
predicted_acc = predict_final_accuracy(simulated_curve[:5], target_epochs=20, plot=True)
print(f"Predicted Final Accuracy: {predicted_acc:.3f}")
print(f"Observed Accuracy at Epoch 20: {simulated_curve[-1]:.3f} (noise-free asymptote: {true_final_acc:.2f})")
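
The fitted curve `a * (1 - exp(-b * x)) + c` has a simple interpretation: it starts at `c`, saturates at `a + c`, and `b` sets how quickly it gets there. The standalone sketch below works through that arithmetic for the noise-free version of the simulated curve above (a = 0.85, b = 1/8, c = 0). The helper names are hypothetical.

```python
import math

def saturation_value(a, c):
    """Asymptote of a * (1 - exp(-b * x)) + c as x grows."""
    return a + c

def epochs_to_fraction(b, fraction=0.95):
    """Epochs needed for the curve to cover `fraction` of its total gain a."""
    return -math.log(1.0 - fraction) / b

a, b, c = 0.85, 1 / 8, 0.0  # parameters of the noise-free simulated curve
value_at_20 = a * (1 - math.exp(-b * 20)) + c

print(f"Asymptote: {saturation_value(a, c):.2f}")
print(f"Epochs to reach 95% of the gain: {epochs_to_fraction(b):.1f}")
print(f"Noise-free value at epoch 20: {value_at_20:.3f}")
```

Because the curve has not fully saturated by epoch 20, a prediction at `target_epochs=20` should be compared with the value at that epoch rather than with the asymptote.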

### 4.4 Zero-Cost Proxies

In [None]:
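def snip_score(model, data_sample):
    """Compute a SNIP-style saliency score, the sum of |gradient * weight|, for architecture ranking."""
    model.train()
    data, targets = data_sample
    outputs = model(data)
    loss = F.cross_entropy(outputs, targets)
    params = [p for p in model.parameters() if p.requires_grad]
    # allow_unused=True returns None for parameters that did not contribute to the loss
    grads = torch.autograd.grad(loss, params, create_graph=False, allow_unused=True)
    return sum(torch.sum(torch.abs(g * p)).item() for g, p in zip(grads, params) if g is not None)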

def connectivity_proxy(architecture):
    """Measure architecture connectivity as a simple proxy for performance."""
    skips = sum(1 for layer in architecture if layer['type'] == 'skip_connect')
    return skips / max(1, len(architecture))

class ZeroCostFilter:
    """Proxy-based filter for rapid architecture screening (uses only the connectivity proxy here)."""
    def filter_architectures(self, architectures, keep_fraction=0.1):
        scores = [connectivity_proxy(arch) for arch in architectures]
        sorted_indices = np.argsort(scores)[::-1]
        # Keep at least one architecture, even for small pools
        keep_count = max(1, int(len(architectures) * keep_fraction))
        return [architectures[i] for i in sorted_indices[:keep_count]]

# Example usage:
zc_filter = ZeroCostFilter()
search_config_zc = SearchSpaceConfig()
sample_architectures_zc = [search_config_zc.sample_architecture() for _ in range(1000)]
filtered_archs_zc = zc_filter.filter_architectures(sample_architectures_zc, keep_fraction=0.1)
print(f"Zero-cost proxies filtered {len(sample_architectures_zc)} archs down to {len(filtered_archs_zc)}.")
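
The filter above ranks by a single proxy. One possible way to combine several proxies, sketched below as an assumption rather than as the chapter's method, is to rank the architectures under each proxy separately and average the ranks, which avoids having to put differently scaled scores on a common scale. The helper names and all score values are made up for illustration.

```python
def rank_positions(scores):
    """Map each index to its rank (0 = best, i.e. highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def aggregate_by_rank(proxy_scores):
    """Average the per-proxy ranks; a lower mean rank is better."""
    per_proxy_ranks = [rank_positions(s) for s in proxy_scores]
    n = len(proxy_scores[0])
    return [sum(r[i] for r in per_proxy_ranks) / len(per_proxy_ranks) for i in range(n)]

# Illustrative scores for four architectures under two hypothetical proxies
connectivity = [0.10, 0.30, 0.20, 0.05]
gradient_signal = [5.0, 2.0, 9.0, 1.0]

mean_ranks = aggregate_by_rank([connectivity, gradient_signal])
best = min(range(len(mean_ranks)), key=mean_ranks.__getitem__)
print(f"Mean ranks: {mean_ranks}, best index: {best}")
```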


## 5. Efficient NAS Techniques

### 5.1 Supernet Training with Balanced Sampling

In [None]:
class CorrectSupernet(nn.Module):
    """A Supernet containing multiple choices for each layer, with corrected channel sizes."""
    def __init__(self, num_classes=10):
        super(CorrectSupernet, self).__init__()
        # nn.Conv2d takes `padding`, not `p`; the padding keeps the spatial size unchanged
        self.conv1_choices = nn.ModuleDict({'conv3x3': nn.Conv2d(3, 16, 3, padding=1), 'conv5x5': nn.Conv2d(3, 16, 5, padding=2)})
        self.conv2_choices = nn.ModuleDict({'conv3x3': nn.Conv2d(16, 32, 3, padding=1), 'skip_connect': nn.Conv2d(16, 32, 1)})
        self.conv3_choices = nn.ModuleDict({'conv3x3': nn.Conv2d(32, 64, 3, padding=1), 'conv1x1': nn.Conv2d(32, 64, 1)})
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x, architecture):
        op1 = self.conv1_choices[architecture[0]]
        x = self.pool(F.relu(op1(x)))
        op2 = self.conv2_choices[architecture[1]]
        x = self.pool(F.relu(op2(x)))
        op3 = self.conv3_choices[architecture[2]]
        x = self.pool(F.relu(op3(x)))
        return self.fc(x.view(x.size(0), -1))

def train_supernet_balanced(supernet, data_loader, epochs=5):
    """Trains the supernet by uniformly sampling architectures."""
    optimizer = torch.optim.Adam(supernet.parameters(), lr=1e-3)
    choices = [['conv3x3', 'conv5x5'], ['conv3x3', 'skip_connect'], ['conv3x3', 'conv1x1']]
    supernet.train()
    print("Training Supernet with balanced sampling...")
    for epoch in range(epochs):
        for i, (data, target) in enumerate(data_loader):
            optimizer.zero_grad()
            # Uniformly sample a random architecture for this batch
            arch = [random.choice(c) for c in choices]
            output = supernet(data, arch)
            loss = F.cross_entropy(output, target)
            loss.backward()
            optimizer.step()
            if i % 100 == 0: print(f"  Epoch {epoch}, Batch {i}, Loss: {loss.item():.4f}")
    print("Supernet training complete.")

### 5.2 Once-For-All (OFA) Networks

In [None]:
class OFABlock(nn.Module):
    """An OFA block with elastic kernel size."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Store conv layers for each kernel size
        self.conv_k3 = nn.Conv2d(in_channels, out_channels, 3, stride, padding=1, bias=False)
        self.conv_k5 = nn.Conv2d(in_channels, out_channels, 5, stride, padding=2, bias=False)
        self.conv_k7 = nn.Conv2d(in_channels, out_channels, 7, stride, padding=3, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x, kernel_size=7):
        if kernel_size == 7:
            out = self.conv_k7(x)
        elif kernel_size == 5:
            out = self.conv_k5(x)
        elif kernel_size == 3:
            out = self.conv_k3(x)
        else:
            raise ValueError(f"Unsupported kernel size: {kernel_size}")
        return F.relu(self.bn(out))

class OFANetwork(nn.Module):
    """A simplified OFA network with elastic kernel sizes."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = OFABlock(3, 64, stride=2)
        self.block2 = OFABlock(64, 128, stride=2)
        self.block3 = OFABlock(128, 256, stride=2)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x, config):
        x = self.block1(x, config['k1'])
        x = self.block2(x, config['k2'])
        x = self.block3(x, config['k3'])
        x = self.pool(x).view(x.size(0), -1)
        return self.classifier(x)

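    def extract_subnet(self, efficiency_level='medium'):
        """Return the kernel-size config for a named efficiency level."""
        if efficiency_level == 'mobile': return {'k1': 3, 'k2': 3, 'k3': 3}
        if efficiency_level == 'medium': return {'k1': 5, 'k2': 3, 'k3': 5}
        if efficiency_level == 'server': return {'k1': 7, 'k2': 7, 'k3': 5}
        raise ValueError(f"Unknown efficiency level: {efficiency_level}")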

# Example usage
ofa_net = OFANetwork()
print("OFA Network created. Can now be trained once.")

mobile_config = ofa_net.extract_subnet('mobile')
server_config = ofa_net.extract_subnet('server')

print(f"Mobile Subnet Config: {mobile_config}")
print(f"Server Subnet Config: {server_config}")

# After one-time training, different subnets can be deployed without retraining
dummy_input = torch.randn(1, 3, 32, 32)
mobile_output = ofa_net(dummy_input, mobile_config)
server_output = ofa_net(dummy_input, server_config)
print("Successfully extracted and used two different subnets from the same OFA network.")
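
The cost difference between the two subnets can be estimated from the kernel sizes alone: a bias-free convolution with a k x k kernel holds k * k * in_channels * out_channels weights. The standalone sketch below applies that formula to the three blocks of the simplified network above. `conv_weight_count` is a hypothetical helper, and BatchNorm and classifier parameters are deliberately left out.

```python
# (in_channels, out_channels) for the three blocks of the simplified OFA network
BLOCK_CHANNELS = [(3, 64), (64, 128), (128, 256)]

def conv_weight_count(config, block_channels=BLOCK_CHANNELS):
    """Number of bias-free conv weights active in a subnet with the given kernel sizes."""
    kernels = [config['k1'], config['k2'], config['k3']]
    return sum(k * k * c_in * c_out for k, (c_in, c_out) in zip(kernels, block_channels))

mobile = conv_weight_count({'k1': 3, 'k2': 3, 'k3': 3})
server = conv_weight_count({'k1': 7, 'k2': 7, 'k3': 5})

print(f"Mobile conv weights: {mobile:,}")
print(f"Server conv weights: {server:,}")
print(f"Ratio: {server / mobile:.1f}x")
```

This counts only the weights active in a given subnet. The simplified `OFABlock` above stores a separate convolution per kernel size, so the full network holds all three sets.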

### 5.3 Progressive Search

In [None]:
class ProgressiveNAS:
    """Progressive NAS that starts simple and gradually increases complexity."""
    def __init__(self):
        self.stages = [
            {'name': 'Basic', 'depth': 5, 'ops': ['conv', 'skip_connect'], 'filters': 64, 'trials': 10},
            {'name': 'Intermediate', 'depth': 10, 'ops': ['conv', 'depthwise_conv', 'skip_connect'], 'filters': 128, 'trials': 15},
            {'name': 'Advanced', 'depth': 15, 'ops': ['conv', 'depthwise_conv', 'dilated_conv', 'skip_connect'], 'filters': 256, 'trials': 20}
        ]

    def run_search(self):
        print("Starting Progressive NAS...")
        best_arch = None
        best_score = 0

        for stage in self.stages:
            print(f"\n  === Stage: {stage['name']} ===")
            stage_best_score = 0
            # Create a search space config for the current stage
            stage_config = SearchSpaceConfig()
            stage_config.depth_range = (3, stage['depth'])
            stage_config.layer_types = stage['ops']
            stage_config.filter_sizes = [f for f in stage_config.filter_sizes if f <= stage['filters']]

            for i in range(stage['trials']):
                arch = stage_config.sample_architecture()
                score = evaluate_architecture_placeholder(arch)
                if score > stage_best_score:
                    stage_best_score = score
                    if score > best_score:
                        best_score = score
                        best_arch = arch
            print(f"    Best score in stage: {stage_best_score:.4f}")

        print(f"\nProgressive search finished. Final best score: {best_score:.4f}")
        return best_arch, best_score

# Example usage
progressive_searcher = ProgressiveNAS()
best_arch_prog, best_score_prog = progressive_searcher.run_search()

## 6. Practical Tools and Applications

### 6.1 AutoKeras Integration

In [None]:
def autokeras_image_classification_example():
    """Example using AutoKeras for a complete NAS pipeline on CIFAR-10."""
    try:
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
        print(f"CIFAR-10 data loaded. Train shape: {x_train.shape}")

        # Initialize the AutoKeras Image Classifier
        clf = ak.ImageClassifier(
            max_trials=10,  # Number of models to test
            project_name='autokeras_cifar10_demo',
            overwrite=True
        )

        print("\nStarting AutoKeras architecture search...")
        clf.fit(x_train, y_train, epochs=5, validation_split=0.15)

        # Evaluate the best model
        loss, acc = clf.evaluate(x_test, y_test)
        print(f"\nAutoKeras Test Accuracy: {acc:.4f}")

        # Export the best Keras model
        print("Exporting the best model...")
        exported_model = clf.export_model()
        exported_model.summary()
        return clf, exported_model

    except Exception as e:
        print(f"An error occurred during AutoKeras example: {e}")
        print("Please ensure 'autokeras' and 'tensorflow' are installed.")
        return None, None

if __name__ == '__main__':
    # This example is computationally intensive and is best run in a dedicated environment.
    print("Skipping AutoKeras example in this run. Uncomment to execute.")
    # autokeras_clf, autokeras_best_model = autokeras_image_classification_example()

### 6.2 Optuna Integration

In [None]:
def build_model_optuna(trial):
    """Builds a Keras model based on Optuna trial suggestions."""
    model = keras.Sequential()
    model.add(layers.Input(shape=(28, 28, 1))) # Fashion-MNIST
    num_layers = trial.suggest_int("num_layers", 1, 3)
    for i in range(num_layers):
        model.add(layers.Conv2D(
            filters=trial.suggest_categorical(f"filters_l{i}", [16, 32, 64]),
            kernel_size=trial.suggest_categorical(f"kernel_l{i}", [3, 5]),
            activation='relu', padding='same'))
        model.add(layers.MaxPooling2D(pool_size=2))
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation="softmax"))
    return model

def objective_optuna(trial):
    """Objective function for Optuna to optimize."""
    keras.backend.clear_session()
    model = build_model_optuna(trial)
    (x_train, y_train), (x_val, y_val) = keras.datasets.fashion_mnist.load_data()
    x_train = (x_train.astype("float32") / 255.0)[:10000, ..., np.newaxis]
    y_train = y_train[:10000]
    x_val = (x_val.astype("float32") / 255.0)[..., np.newaxis]

    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=4, batch_size=128, verbose=0)
    return history.history["val_accuracy"][-1]

def optuna_nas_example():
    print("\nStarting Optuna NAS example...")
    study = optuna.create_study(direction="maximize", study_name="keras_nas_optuna")
    study.optimize(objective_optuna, n_trials=15, timeout=600)
    print(f"\nOptuna search complete. Best validation accuracy: {study.best_value:.4f}")
    print(f"  Best architecture params: {study.best_params}")
    return study

if __name__ == '__main__':
    # Uncomment to run (takes 10+ minutes):
    # optuna_study = optuna_nas_example()
    pass

### 6.3 Ray Tune Integration

In [None]:
def ray_tune_trainable(config):
    """Training function compatible with Ray Tune."""
    # Build model from config
    model = keras.Sequential()
    model.add(layers.Input(shape=(28, 28, 1)))
    for _ in range(config["num_layers"]):
        model.add(layers.Conv2D(config["filters"], 3, activation='relu', padding='same'))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='softmax'))

    (x_train, y_train), (x_val, y_val) = keras.datasets.fashion_mnist.load_data()
    x_train = (x_train.astype("float32") / 255.0)[:10000, ..., np.newaxis]
    y_train = y_train[:10000]
    x_val = (x_val.astype("float32") / 255.0)[..., np.newaxis]

    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

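    # Report val_accuracy back to Ray Tune after every epoch. The submodule is imported
    # explicitly because `from ray import tune` may not load it. Newer Ray releases
    # deprecate this callback; check the Ray docs for the replacement in your version.
    from ray.tune.integration.keras import TuneReportCallback
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=128, verbose=0,
              callbacks=[TuneReportCallback({"val_accuracy": "val_accuracy"})])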

def ray_tune_nas_example():
    """Example of NAS using Ray Tune."""
    print("\nStarting Ray Tune NAS example...")
    search_space = {
        "num_layers": tune.grid_search([1, 2, 3]),
        "filters": tune.choice([16, 32, 64])
    }
    tuner = tune.Tuner(
        ray_tune_trainable,
        param_space=search_space,
        tune_config=tune.TuneConfig(
            metric="val_accuracy",
            mode="max",
            num_samples=1  # Run the grid once: 3 trials (one per num_layers value), each with a randomly sampled "filters"
        )
    )
    results = tuner.fit()
    best_result = results.get_best_result(metric="val_accuracy", mode="max")
    print(f"\nRay Tune search complete. Best validation accuracy: {best_result.metrics['val_accuracy']:.4f}")
    print(f"  Best config: {best_result.config}")
    return results

if __name__ == "__main__":
    # This example requires Ray to be initialized.
    # It's recommended to run this in a script rather than a notebook for stability.
    print("Skipping Ray Tune example. It's best run in a separate Python script.")
    # ray_results = ray_tune_nas_example()

### 6.4 Complete NAS Pipeline

In [None]:
class CompletePipeLineNAS:
    """Complete NAS pipeline combining zero-cost proxies and multi-fidelity evaluation."""
    def __init__(self, search_config=None):
        self.search_config = search_config or SearchSpaceConfig()
        self.zero_cost_filter = ZeroCostFilter()

    def run_complete_search(self, initial_candidates=500, filter_keep_frac=0.1, max_epochs=27):
        print(f"Starting complete NAS pipeline...")

        # Stage 1: Generate and filter candidates
        print(f"\n--- Stage 1: Candidate Generation & Filtering ---")
        candidates = [self.search_config.sample_architecture() for _ in range(initial_candidates)]
        filtered_candidates = self.zero_cost_filter.filter_architectures(candidates, keep_fraction=filter_keep_frac)
        print(f"  Generated {initial_candidates} candidates, filtered to {len(filtered_candidates)} with zero-cost proxies.")

        # Stage 2: Multi-fidelity evaluation
        print(f"\n--- Stage 2: Multi-Fidelity Evaluation (Successive Halving) ---")
        best_arch = successive_halving_nas(filtered_candidates, max_epochs=max_epochs, reduction_factor=3)

        # Stage 3: Final validation (placeholder)
        print(f"\n--- Stage 3: Final Validation ---")
        final_score = train_and_evaluate_placeholder(best_arch, epochs=max_epochs*2)
        print(f"  Re-trained best architecture. Final estimated score: {final_score:.4f}")

        return best_arch, final_score

# Example usage
if __name__ == '__main__':
    # Uncomment to run (takes 10+ minutes):
    # pipeline = CompletePipeLineNAS()
    # final_arch, final_score = pipeline.run_complete_search()
    pass


## 7. Summary and Best Practices

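This notebook walked through the main Neural Architecture Search (NAS) concepts and their implementations: search space design, search strategies, performance estimation, efficient NAS techniques, and practical tools.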

**Key Best Practices:**

* **Start with High-Level Tools**: For most applications, tools like `AutoKeras` provide a powerful and easy-to-use entry point to NAS without needing to manage the search process manually.

* **Define a Good Search Space**: The quality of a NAS result depends heavily on the search space. Include a diverse yet reasonable set of operations, and keep the space small enough that your search budget can explore a meaningful part of it.

* **Use Efficient Performance Estimation**: Fully training every candidate is usually infeasible. Leverage techniques like:
    * **Successive Halving / HyperBand**: To quickly discard poorly performing architectures.
    * **Weight Sharing (One-Shot/DARTS)**: To amortize the cost of training over many architectures.
    * **Zero-Cost Proxies**: To perform a rapid initial filtering of a large number of candidates before any training.

* **Progressive Search**: Don't try to find the perfect, 50-layer network from scratch. Start the search with simpler, shallower architectures and progressively increase the complexity. This makes the search more tractable.

* **Leverage Practical Frameworks**: Instead of implementing complex search algorithms from scratch, use robust libraries like `Optuna` or `Ray Tune` to handle the hyperparameter optimization and trial scheduling aspects of NAS.
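
To make the savings from Successive Halving concrete, the sketch below computes the training budget for a simple schedule. It is a minimal illustration, not the notebook's `successive_halving_nas` implementation: the helper name `successive_halving_schedule` is hypothetical, and it assumes the budget starts at 1 epoch and is multiplied by the reduction factor each round while the candidate pool shrinks by the same factor. The inputs (50 candidates, 27 epochs, factor 3) mirror the defaults of the complete pipeline in Section 6.4 after zero-cost filtering.

```python
def successive_halving_schedule(n_candidates, max_epochs, reduction_factor=3):
    """Return (round, candidates, epochs_each) tuples for a simple schedule.

    Assumption: the budget starts at 1 epoch and is multiplied by
    `reduction_factor` each round (capped at `max_epochs`), while the
    candidate pool is divided by the same factor (never below 1).
    """
    schedule = []
    n, epochs, round_idx = n_candidates, 1, 0
    while True:
        schedule.append((round_idx, n, epochs))
        if n == 1 or epochs >= max_epochs:
            break
        n = max(1, n // reduction_factor)
        epochs = min(max_epochs, epochs * reduction_factor)
        round_idx += 1
    return schedule


schedule = successive_halving_schedule(n_candidates=50, max_epochs=27)
total_epochs = sum(n * epochs for _, n, epochs in schedule)
full_training_epochs = 50 * 27  # every candidate trained for max_epochs

for round_idx, n, epochs in schedule:
    print(f"round {round_idx}: {n:>2} candidates x {epochs:>2} epochs")
print(f"total: {total_epochs} epochs vs {full_training_epochs} for full training")
```

Under these assumptions the schedule trains 50, 16, 5, and 1 candidates for 1, 3, 9, and 27 epochs respectively, for 170 epochs in total instead of 1,350. A real schedule may allocate budget differently, so treat the numbers as an order-of-magnitude guide.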

## 8. Final Notes and Additional Resources

### What You've Accomplished

This notebook has provided implementations for key NAS concepts:

- **Search Space Design** (Programmatic, KerasTuner)
- **Search Strategies** (Evolutionary, Differentiable/DARTS)
- **Performance Estimation** (Successive Halving, One-Shot, Proxies)
- **Efficient NAS Techniques** (OFA, Progressive Search)
- **Integration with Practical Tools** (AutoKeras, Optuna, Ray Tune)

### Next Steps

1. **Apply to Your Data**: Adapt the `AutoKeras` or `KerasTuner` examples to your own datasets.
2. **Implement a Real Evaluation Function**: Replace the `evaluate_architecture_placeholder` and `train_and_evaluate_placeholder` functions with actual model training on your data to get meaningful results.
3. **Experiment with Search Strategies**: Compare the results of a random search, evolutionary search, and a more guided search using Optuna on the same problem.
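
As a starting point for step 3, the sketch below compares random search with a small steady-state evolutionary search (tournament selection, drop the worst) under the same evaluation budget. Everything in it is an illustrative assumption: `SEARCH_SPACE`, `toy_score`, and the helper functions are hypothetical names, and `toy_score` is a synthetic stand-in rather than a real training run. Swap it for your own evaluation function to get meaningful results.

```python
import random

# Hypothetical toy search space: names and values are illustrative only.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}

def toy_score(arch):
    """Synthetic stand-in for validation accuracy (NOT a real evaluation).

    Peaks at num_layers=3, filters=64, kernel_size=3.
    """
    return (1.0
            - 0.05 * abs(arch["num_layers"] - 3)
            - 0.001 * abs(arch["filters"] - 64)
            - 0.02 * abs(arch["kernel_size"] - 3))

def sample(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def mutate(arch, rng):
    """Copy an architecture and resample one randomly chosen dimension."""
    child = dict(arch)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child

def random_search(budget, rng):
    """Sample `budget` architectures and return the best one."""
    return max((sample(rng) for _ in range(budget)), key=toy_score)

def evolutionary_search(budget, rng, population_size=8):
    """Steady-state evolution: `budget` architectures are generated in total."""
    population = [sample(rng) for _ in range(population_size)]
    for _ in range(budget - population_size):
        parent = max(rng.sample(population, 3), key=toy_score)  # tournament of 3
        population.append(mutate(parent, rng))
        population.remove(min(population, key=toy_score))  # drop the worst
    return max(population, key=toy_score)

budget = 40
for name, search in [("random", random_search), ("evolutionary", evolutionary_search)]:
    best = search(budget, random.Random(0))
    print(f"{name:>12}: score={toy_score(best):.3f} arch={best}")
```

Because both strategies are stochastic, a single run says little. Repeating the comparison over several seeds and reporting the mean and spread of the best score gives a fairer picture.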

### Additional Resources

- **AutoML Book (Chapter 6)**: For detailed explanations of the concepts covered here.
- **DARTS Paper**: [Differentiable Architecture Search](https://arxiv.org/abs/1806.09055)
- **OFA Paper**: [Once-for-All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791)
- **Optuna Documentation**: https://optuna.readthedocs.io/
- **Ray Tune Documentation**: https://docs.ray.io/en/latest/tune/index.html

**Happy architecture searching!** AutoML continues to make designing state-of-the-art neural networks more accessible.