# Advanced Optimization Techniques with SciRS2-Optim

This tutorial covers advanced optimization techniques including second-order methods, meta-learning, neural architecture search, and specialized domain optimizations.

## Table of Contents
1. [Second-Order Optimization Methods](#second-order)
2. [Meta-Learning and Learned Optimizers](#meta-learning)
3. [Neural Architecture Search Integration](#nas)
4. [Domain-Specific Optimizations](#domain-specific)
5. [Distributed and Federated Learning](#distributed)
6. [Privacy-Preserving Optimization](#privacy)

## Prerequisites
- Completion of the Getting Started tutorial
- Understanding of optimization theory
- Familiarity with machine learning concepts

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from scipy import optimize
from sklearn.metrics import accuracy_score, precision_recall_curve
import warnings
warnings.filterwarnings('ignore')

# Set up visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("deep")
np.random.seed(42)

print("🚀 Advanced Optimization Tutorial - Environment Ready!")

## Second-Order Optimization Methods {#second-order}

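Second-order methods use curvature information (the Hessian or an approximation of it) to rescale each step, which typically means fewer iterations at a higher cost per iteration. The cell in this section compares five update rules on the 2-D Rosenbrock function.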
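
As a minimal sketch of why curvature helps, the snippet here compares one gradient step with one Newton step on a small ill-conditioned quadratic. The matrix `A`, the vector `b`, and the step size are illustrative assumptions and are not part of SciRS2-Optim.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x - b^T x (A and b are assumed values)
A = np.array([[100.0, 0.0], [0.0, 1.0]])  # condition number 100
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)            # exact minimizer


def grad(x):
    """Gradient of the quadratic."""
    return A @ x - b


x0 = np.zeros(2)

# First-order step: the step size must stay below 2 / lambda_max = 0.02 for stability
x_gd = x0 - 0.01 * grad(x0)

# Newton step: rescale the gradient by the inverse Hessian (here the Hessian is A)
x_newton = x0 - np.linalg.solve(A, grad(x0))

gd_error = np.linalg.norm(x_gd - x_star)
newton_error = np.linalg.norm(x_newton - x_star)
print(f"error after one gradient step: {gd_error:.4f}")
print(f"error after one Newton step:   {newton_error:.2e}")
```

On a quadratic the Newton step lands on the minimizer directly, while the gradient step barely moves along the flat direction. On non-quadratic objectives such as Rosenbrock the advantage is smaller and a line search or damping is usually needed.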

In [None]:
def simulate_second_order_methods():
    """Simulate comparison of first-order vs second-order optimization methods."""
    
    # Define a challenging optimization landscape (Rosenbrock function)
    def rosenbrock(x, y, a=1, b=100):
        return (a - x)**2 + b * (y - x**2)**2
    
    def rosenbrock_gradient(x, y, a=1, b=100):
        dx = -2 * (a - x) - 4 * b * x * (y - x**2)
        dy = 2 * b * (y - x**2)
        return np.array([dx, dy])
    
    def rosenbrock_hessian(x, y, a=1, b=100):
        dxx = 2 + 12 * b * x**2 - 4 * b * y
        dxy = -4 * b * x
        dyy = 2 * b
        return np.array([[dxx, dxy], [dxy, dyy]])
    
    # Simulate optimization paths
    methods = {
        'SGD': {'lr': 0.001, 'path': []},
        'Adam': {'lr': 0.01, 'path': []},
        'LBFGS': {'path': []},
        'Newton': {'lr': 1.0, 'path': []},
        'K-FAC': {'lr': 0.1, 'path': []}
    }
    
    # Starting point
    start_point = np.array([-1.5, 2.0])
    target = np.array([1.0, 1.0])  # Global minimum
    
    # Simulate optimization paths
    for method_name, method_data in methods.items():
        current_pos = start_point.copy()
        path = [current_pos.copy()]
        
        for step in range(50):
            grad = rosenbrock_gradient(current_pos[0], current_pos[1])
            
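            if method_name == 'SGD':
                # Plain gradient descent (the toy objective has no gradient noise)
                current_pos -= method_data['lr'] * grad
            elif method_name == 'Adam':
                # Adam: bias-corrected first and second moment estimates
                m = 0.9 * method_data.get('m', np.zeros(2)) + 0.1 * grad
                v = 0.999 * method_data.get('v', np.zeros(2)) + 0.001 * grad**2
                method_data['m'], method_data['v'] = m, v
                m_hat = m / (1 - 0.9 ** (step + 1))
                v_hat = v / (1 - 0.999 ** (step + 1))
                current_pos -= method_data['lr'] * m_hat / (np.sqrt(v_hat) + 1e-8)
            elif method_name == 'LBFGS':
                # BFGS stand-in for L-BFGS (in 2-D the full inverse Hessian is tiny)
                h_inv = method_data.setdefault('h_inv', np.eye(2))
                direction = -h_inv @ grad
                f_old, slope, t = rosenbrock(*current_pos), grad @ direction, 1.0
                # Backtracking (Armijo) line search; unit steps overshoot badly here
                while t > 1e-10 and rosenbrock(*(current_pos + t * direction)) > f_old + 1e-4 * t * slope:
                    t *= 0.5
                s = t * direction
                y = rosenbrock_gradient(*(current_pos + s)) - grad
                current_pos += s
                if s @ y > 1e-12:  # curvature condition keeps h_inv positive definite
                    rho = 1.0 / (s @ y)
                    V = np.eye(2) - rho * np.outer(s, y)
                    method_data['h_inv'] = V @ h_inv @ V.T + rho * np.outer(s, s)
            elif method_name == 'Newton':
                # Newton's method with the exact Hessian
                hess = rosenbrock_hessian(current_pos[0], current_pos[1])
                try:
                    current_pos -= method_data['lr'] * np.linalg.solve(hess, grad)
                except np.linalg.LinAlgError:
                    current_pos -= 0.01 * grad  # Fallback to gradient descent
            elif method_name == 'K-FAC':
                # K-FAC approximates the Fisher matrix with Kronecker factors. A 2-D
                # problem has no layer structure, so the damped Gauss-Newton matrix of
                # the least-squares form of Rosenbrock (a=1, b=100) is used as a stand-in.
                x_cur = current_pos[0]
                ggn = 2 * np.array([[1 + 400 * x_cur**2, -200 * x_cur],
                                    [-200 * x_cur, 100.0]])
                current_pos -= method_data['lr'] * np.linalg.solve(ggn + 1e-3 * np.eye(2), grad)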
            
            path.append(current_pos.copy())
            
            # Early stopping if close to target
            if np.linalg.norm(current_pos - target) < 0.1:
                break
        
        method_data['path'] = np.array(path)
        method_data['final_error'] = np.linalg.norm(current_pos - target)
        method_data['steps_to_convergence'] = len(path) - 1  # path includes the starting point
    
    return methods, rosenbrock

# Run simulation
optimization_methods, objective_func = simulate_second_order_methods()

# Visualize optimization landscapes and paths
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Second-Order Optimization Methods Comparison', fontsize=16, fontweight='bold')

# Create contour plot
x_range = np.linspace(-2, 2, 100)
y_range = np.linspace(-1, 3, 100)
X, Y = np.meshgrid(x_range, y_range)
Z = objective_func(X, Y)

# Plot 1: Optimization paths on contour
axes[0, 0].contour(X, Y, Z, levels=20, alpha=0.6, cmap='viridis')
colors = ['red', 'blue', 'green', 'purple', 'orange']
for i, (method_name, method_data) in enumerate(optimization_methods.items()):
    path = method_data['path']
    axes[0, 0].plot(path[:, 0], path[:, 1], 'o-', color=colors[i], 
                   label=method_name, linewidth=2, markersize=4)

axes[0, 0].plot(1, 1, 'r*', markersize=15, label='Global Minimum')
axes[0, 0].set_title('Optimization Paths on Rosenbrock Function')
axes[0, 0].set_xlabel('x')
axes[0, 0].set_ylabel('y')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Convergence comparison
method_names = list(optimization_methods.keys())
final_errors = [optimization_methods[m]['final_error'] for m in method_names]
convergence_steps = [optimization_methods[m]['steps_to_convergence'] for m in method_names]

bars = axes[0, 1].bar(method_names, final_errors, color=colors, alpha=0.7)
axes[0, 1].set_title('Final Optimization Error')
axes[0, 1].set_ylabel('Distance to Global Minimum')
axes[0, 1].set_yscale('log')
axes[0, 1].grid(True, alpha=0.3)
for bar, error in zip(bars, final_errors):
    axes[0, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height()*1.1, 
                   f'{error:.3f}', ha='center', va='bottom', fontsize=8)

# Plot 3: Convergence speed
bars = axes[0, 2].bar(method_names, convergence_steps, color=colors, alpha=0.7)
axes[0, 2].set_title('Steps to Convergence')
axes[0, 2].set_ylabel('Number of Optimization Steps')
axes[0, 2].grid(True, alpha=0.3)
for bar, steps in zip(bars, convergence_steps):
    axes[0, 2].text(bar.get_x() + bar.get_width()/2, bar.get_height()+1, 
                   f'{steps}', ha='center', va='bottom', fontsize=8)

# Plot 4: Method characteristics
characteristics = {
    'Memory Usage': [1, 2, 4, 8, 6],  # Relative memory (illustrative); Newton stores a dense Hessian
    'Computation Cost': [1, 2, 5, 6, 4],  # Relative computational cost
    'Convergence Rate': [2, 6, 8, 9, 7],  # Convergence quality (higher is better)
    'Robustness': [8, 7, 6, 4, 6]  # Robustness to hyperparameters
}

x = np.arange(len(method_names))
width = 0.2
metrics = list(characteristics.keys())

for i, metric in enumerate(metrics):
    axes[1, 0].bar(x + i * width, characteristics[metric], width, 
                  label=metric, alpha=0.8)

axes[1, 0].set_xlabel('Optimization Method')
axes[1, 0].set_ylabel('Score (1-10)')
axes[1, 0].set_title('Method Characteristics Comparison')
axes[1, 0].set_xticks(x + width * 1.5)
axes[1, 0].set_xticklabels(method_names, rotation=45)
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 5: Computational complexity
problem_sizes = [100, 500, 1000, 5000, 10000]
complexity_data = {
    'First-Order (Adam)': [n for n in problem_sizes],
    'Quasi-Newton (LBFGS)': [10 * n for n in problem_sizes],  # O(m * n) with history size m = 10
    'Newton': [n**3 for n in problem_sizes],  # dense Hessian solve is O(n^3)
    'K-FAC': [n**1.5 for n in problem_sizes]
}

for method, complexity in complexity_data.items():
    axes[1, 1].loglog(problem_sizes, complexity, 'o-', linewidth=2, 
                     markersize=6, label=method)

axes[1, 1].set_xlabel('Problem Size (Parameters)')
axes[1, 1].set_ylabel('Computational Cost (Relative)')
axes[1, 1].set_title('Computational Complexity Scaling')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

# Plot 6: Use case recommendations
use_cases = ['Small\nModels', 'Medium\nModels', 'Large\nModels', 'Convex\nProblems', 
             'Non-Convex\nProblems', 'Noisy\nGradients']
recommendations = {
    'SGD': [3, 4, 5, 6, 7, 8],
    'Adam': [7, 8, 7, 5, 8, 6],
    'LBFGS': [9, 8, 4, 9, 6, 3],
    'Newton': [8, 6, 2, 9, 4, 2],
    'K-FAC': [6, 7, 8, 7, 7, 5]
}

# Create heatmap
heatmap_data = np.array([recommendations[method] for method in method_names])
im = axes[1, 2].imshow(heatmap_data, cmap='RdYlGn', aspect='auto', vmin=1, vmax=9)

axes[1, 2].set_xticks(range(len(use_cases)))
axes[1, 2].set_xticklabels(use_cases, rotation=45, ha='right')
axes[1, 2].set_yticks(range(len(method_names)))
axes[1, 2].set_yticklabels(method_names)
axes[1, 2].set_title('Method Recommendations by Use Case')

# Add text annotations
for i in range(len(method_names)):
    for j in range(len(use_cases)):
        text = axes[1, 2].text(j, i, heatmap_data[i, j], ha="center", va="center", 
                              color="white" if heatmap_data[i, j] < 5 else "black", fontweight='bold')

plt.colorbar(im, ax=axes[1, 2], label='Recommendation Score (1-9)')

plt.tight_layout()
plt.show()

print("🔬 Second-Order Methods Analysis:")
print(f"   Best convergence: {min(method_names, key=lambda m: optimization_methods[m]['final_error'])}")
print(f"   Fastest method: {min(method_names, key=lambda m: optimization_methods[m]['steps_to_convergence'])}")
print("   Most robust (hand-assigned score): SGD (handles noise well)")
print("   Best for large-scale (hand-assigned score): K-FAC (scalable second-order)")
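
The cell above hand-codes each update rule for illustration. For comparison, here is a hedged sketch that runs SciPy's own L-BFGS-B and Newton-CG implementations on the same Rosenbrock function from the same starting point. It relies only on `scipy.optimize.minimize`, `rosen`, `rosen_der`, and `rosen_hess`; the `results` dictionary layout is an assumption made for this example, and nothing here calls SciRS2-Optim.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.5, 2.0])  # same starting point as the simulation

# Each method receives the analytic derivatives it can use
method_kwargs = {
    "L-BFGS-B": {"jac": rosen_der},
    "Newton-CG": {"jac": rosen_der, "hess": rosen_hess},
}

results = {}
for method, kwargs in method_kwargs.items():
    path = [x0.copy()]
    res = minimize(
        rosen, x0, method=method,
        callback=lambda xk: path.append(np.copy(xk)),  # record every iterate
        **kwargs,
    )
    results[method] = {"x": res.x, "nit": res.nit, "path": np.array(path)}
    print(f"{method:10s} iterations={res.nit:3d} x={res.x}")
```

The recorded `path` arrays have the same `(steps, 2)` shape as the `path` entries in the simulation, so they could be drawn on the same contour plot.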

## Meta-Learning and Learned Optimizers {#meta-learning}

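Meta-learning approaches learn to optimize: the optimizer itself is trained, typically across a distribution of tasks, so that it adapts quickly to new ones. The simulation in this section uses hard-coded, illustrative numbers rather than measurements from trained optimizers.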
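
To make the idea concrete, here is a minimal sketch that meta-learns a single step size over a family of 1-D quadratic tasks. The task family, the candidate grid, and the helper names (`sample_task`, `loss_after_adaptation`) are all assumptions made for this example; a real learned optimizer would replace the grid search with gradient-based meta-training.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_task():
    """Hypothetical task family: f(w) = 0.5 * c * (w - t)**2."""
    return rng.uniform(0.5, 1.5), rng.normal()


def loss_after_adaptation(lr, task, inner_steps=5):
    """Run a few gradient steps from w = 0 and return the final task loss."""
    c, t = task
    w = 0.0
    for _ in range(inner_steps):
        w -= lr * c * (w - t)
    return 0.5 * c * (w - t) ** 2


train_tasks = [sample_task() for _ in range(200)]
candidate_lrs = np.linspace(0.05, 1.5, 30)

# Outer (meta) loop: keep the step size with the lowest average post-adaptation loss
meta_losses = [
    np.mean([loss_after_adaptation(lr, task) for task in train_tasks])
    for lr in candidate_lrs
]
learned_lr = candidate_lrs[int(np.argmin(meta_losses))]

# Evaluate on unseen tasks against a conservative hand-picked default of 0.1
test_tasks = [sample_task() for _ in range(200)]
default_loss = np.mean([loss_after_adaptation(0.1, task) for task in test_tasks])
learned_loss = np.mean([loss_after_adaptation(learned_lr, task) for task in test_tasks])
print(f"learned lr={learned_lr:.2f}, default loss={default_loss:.4f}, "
      f"learned loss={learned_loss:.4f}")
```

The outer loop scores a hyperparameter by how well the inner loop performs after a fixed adaptation budget, which is the same structure that Meta-SGD and learned optimizers use at larger scale.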

In [None]:
def simulate_meta_learning_optimizers():
    """Simulate meta-learning and learned optimizer performance."""
    
    # Simulate training progress for different optimizer types
    iterations = 200
    tasks = ['Image Classification', 'NLP', 'Reinforcement Learning', 'Scientific Computing']
    
    optimizers = {
        'Hand-tuned Adam': {
            'description': 'Manually tuned Adam with best hyperparameters',
            'adaptation_time': 50,  # iterations to find good hyperparams
            'final_performance': 0.85
        },
        'AutoML Optimizer': {
            'description': 'Automatically tuned using Bayesian optimization',
            'adaptation_time': 30,
            'final_performance': 0.88
        },
        'LSTM Optimizer': {
            'description': 'Learned optimizer using LSTM to predict updates',
            'adaptation_time': 10,
            'final_performance': 0.91
        },
        'Transformer Optimizer': {
            'description': 'Attention-based learned optimizer',
            'adaptation_time': 5,
            'final_performance': 0.93
        },
        'Meta-SGD': {
            'description': 'Meta-learned initialization and learning rates',
            'adaptation_time': 8,
            'final_performance': 0.89
        }
    }
    
    # Generate learning curves
    for opt_name, opt_data in optimizers.items():
        adapt_time = opt_data['adaptation_time']
        final_perf = opt_data['final_performance']
        
        # Slow learning during adaptation phase
        adaptation_curve = np.linspace(0.3, 0.6, adapt_time)
        
        # Fast improvement after adaptation
        improvement_curve = np.linspace(0.6, final_perf, iterations - adapt_time)
        
        # Add some noise
        full_curve = np.concatenate([adaptation_curve, improvement_curve])
        noise = 0.02 * np.random.normal(0, 1, len(full_curve))
        full_curve = np.clip(full_curve + noise, 0, 1)
        
        opt_data['learning_curve'] = full_curve
        opt_data['cross_task_performance'] = {
            task: final_perf + np.random.normal(0, 0.05) for task in tasks
        }
    
    return optimizers, tasks

meta_optimizers, task_domains = simulate_meta_learning_optimizers()

# Visualize meta-learning results
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Meta-Learning and Learned Optimizers', fontsize=16, fontweight='bold')

# Plot 1: Learning curves comparison
colors = plt.cm.Set3(np.linspace(0, 1, len(meta_optimizers)))
for i, (opt_name, opt_data) in enumerate(meta_optimizers.items()):
    curve = opt_data['learning_curve']
    axes[0, 0].plot(range(len(curve)), curve, color=colors[i], 
                   linewidth=2, label=opt_name)
    
    # Mark adaptation phase
    adapt_time = opt_data['adaptation_time']
    axes[0, 0].axvline(x=adapt_time, color=colors[i], linestyle='--', alpha=0.5)

axes[0, 0].set_xlabel('Training Iterations')
axes[0, 0].set_ylabel('Performance')
axes[0, 0].set_title('Learning Curves: Adaptation Speed')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Adaptation time vs final performance
adapt_times = [opt_data['adaptation_time'] for opt_data in meta_optimizers.values()]
final_perfs = [opt_data['final_performance'] for opt_data in meta_optimizers.values()]
opt_names = list(meta_optimizers.keys())

scatter = axes[0, 1].scatter(adapt_times, final_perfs, s=200, c=range(len(opt_names)), 
                           cmap='viridis', alpha=0.7, edgecolors='black')

for i, name in enumerate(opt_names):
    axes[0, 1].annotate(name.split()[0], (adapt_times[i], final_perfs[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=9)

axes[0, 1].set_xlabel('Adaptation Time (iterations)')
axes[0, 1].set_ylabel('Final Performance')
axes[0, 1].set_title('Adaptation Speed vs Performance Trade-off')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Cross-task generalization
cross_task_data = np.array([[opt_data['cross_task_performance'][task] 
                            for task in task_domains] 
                           for opt_data in meta_optimizers.values()])

im = axes[0, 2].imshow(cross_task_data, cmap='RdYlGn', aspect='auto', vmin=0.7, vmax=0.95)
axes[0, 2].set_xticks(range(len(task_domains)))
axes[0, 2].set_xticklabels([task.replace(' ', '\n') for task in task_domains], rotation=0)
axes[0, 2].set_yticks(range(len(opt_names)))
axes[0, 2].set_yticklabels([name.replace(' ', '\n') for name in opt_names])
axes[0, 2].set_title('Cross-Task Performance Generalization')

# Add text annotations
for i in range(len(opt_names)):
    for j in range(len(task_domains)):
        text = axes[0, 2].text(j, i, f'{cross_task_data[i, j]:.2f}', 
                              ha="center", va="center", color="white", fontweight='bold')

plt.colorbar(im, ax=axes[0, 2], label='Performance Score')

# Plot 4: Meta-learning architecture comparison
architectures = {
    'MAML': {'few_shot_perf': 0.78, 'adaptation_steps': 5, 'memory_req': 3},
    'Reptile': {'few_shot_perf': 0.75, 'adaptation_steps': 3, 'memory_req': 2},
    'LSTM Meta-Learner': {'few_shot_perf': 0.82, 'adaptation_steps': 2, 'memory_req': 4},
    'Gradient-Based': {'few_shot_perf': 0.85, 'adaptation_steps': 1, 'memory_req': 5},
    'Hypernetwork': {'few_shot_perf': 0.80, 'adaptation_steps': 4, 'memory_req': 6}
}

arch_names = list(architectures.keys())
few_shot_perfs = [arch['few_shot_perf'] for arch in architectures.values()]
adaptation_steps = [arch['adaptation_steps'] for arch in architectures.values()]

bubble_sizes = [arch['memory_req'] * 50 for arch in architectures.values()]
scatter = axes[1, 0].scatter(adaptation_steps, few_shot_perfs, s=bubble_sizes, 
                           c=range(len(arch_names)), cmap='plasma', alpha=0.6, edgecolors='black')

for i, name in enumerate(arch_names):
    axes[1, 0].annotate(name, (adaptation_steps[i], few_shot_perfs[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=8)

axes[1, 0].set_xlabel('Adaptation Steps Required')
axes[1, 0].set_ylabel('Few-Shot Performance')
axes[1, 0].set_title('Meta-Learning Architecture Comparison\n(Bubble size = Memory Requirements)')
axes[1, 0].grid(True, alpha=0.3)

# Plot 5: Learning to learn progress
meta_training_iterations = np.arange(0, 1000, 50)
meta_performance = 0.3 + 0.6 * (1 - np.exp(-meta_training_iterations / 300))
meta_performance += 0.02 * np.random.normal(0, 1, len(meta_training_iterations))

traditional_performance = np.full(len(meta_training_iterations), 0.75)  # float array; full_like would inherit the int dtype
traditional_performance += 0.01 * np.random.normal(0, 1, len(meta_training_iterations))

axes[1, 1].plot(meta_training_iterations, meta_performance, 'o-', 
               linewidth=3, markersize=6, label='Meta-Learned Optimizer', color='blue')
axes[1, 1].plot(meta_training_iterations, traditional_performance, 's-', 
               linewidth=3, markersize=6, label='Traditional Optimizer', color='red')

axes[1, 1].fill_between(meta_training_iterations, meta_performance, alpha=0.3, color='blue')
axes[1, 1].fill_between(meta_training_iterations, traditional_performance, alpha=0.3, color='red')

axes[1, 1].set_xlabel('Meta-Training Iterations')
axes[1, 1].set_ylabel('Average Task Performance')
axes[1, 1].set_title('Learning to Learn: Meta-Training Progress')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

# Plot 6: Computational overhead analysis
optimizer_types = ['Standard\nAdam', 'AutoML\nTuned', 'LSTM\nLearned', 'Transformer\nLearned']
training_overhead = [1.0, 1.2, 2.5, 3.8]  # Relative computational cost
inference_overhead = [1.0, 1.0, 1.8, 2.2]  # Cost during actual optimization
performance_gain = [1.0, 1.15, 1.35, 1.45]  # Performance improvement

x = np.arange(len(optimizer_types))
width = 0.25

bars1 = axes[1, 2].bar(x - width, training_overhead, width, label='Training Overhead', alpha=0.8)
bars2 = axes[1, 2].bar(x, inference_overhead, width, label='Inference Overhead', alpha=0.8)
bars3 = axes[1, 2].bar(x + width, performance_gain, width, label='Performance Gain', alpha=0.8)

axes[1, 2].set_xlabel('Optimizer Type')
axes[1, 2].set_ylabel('Relative Cost/Gain')
axes[1, 2].set_title('Computational Overhead vs Performance Gain')
axes[1, 2].set_xticks(x)
axes[1, 2].set_xticklabels(optimizer_types)
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("🧠 Meta-Learning Insights:")
print("   ✅ In this simulation, learned optimizers adapt 5-10x faster than hand-tuned Adam (5-10 vs. 50 iterations)")
print("   ✅ Better cross-task generalization")
print("   ✅ Automatic hyperparameter adaptation")
print("   ⚠️  Higher computational overhead during meta-training")
print("   ⚠️  Requires diverse training tasks for good generalization")
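
Reptile appears in the architecture comparison above. As a hedged sketch of its update rule, the snippet here meta-learns an initialization for a scalar parameter. The task distribution (targets near 3.0) and all step sizes are assumptions chosen to keep the example tiny.

```python
import numpy as np

rng = np.random.default_rng(1)


def sample_target():
    """Hypothetical task distribution: each task wants w close to a target near 3.0."""
    return rng.normal(loc=3.0, scale=0.2)


def adapt(w, target, lr=0.1, steps=5):
    """Inner loop: plain gradient descent on the task loss 0.5 * (w - target)**2."""
    for _ in range(steps):
        w = w - lr * (w - target)
    return w


theta = 0.0    # meta-learned initialization
meta_lr = 0.1  # outer step size (epsilon in Reptile)
for _ in range(300):
    adapted = adapt(theta, sample_target())
    theta += meta_lr * (adapted - theta)  # Reptile update: move toward the adapted weights

# On a new task, the meta-learned start should need less adaptation than w = 0
new_target = sample_target()
loss_from_scratch = 0.5 * (adapt(0.0, new_target) - new_target) ** 2
loss_from_meta = 0.5 * (adapt(theta, new_target) - new_target) ** 2
print(f"theta={theta:.3f}, scratch loss={loss_from_scratch:.4f}, "
      f"meta loss={loss_from_meta:.4f}")
```

Reptile needs no second derivatives, which is why it is cheaper in memory than MAML; the price is that it only learns an initialization, not the update rule itself.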

## Neural Architecture Search Integration {#nas}

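This section integrates optimization with neural architecture search (NAS) so that the architecture and the optimizer settings are chosen together rather than one after the other. As in the previous section, the numbers in the simulation are hard-coded for illustration.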
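
A toy version of joint architecture-optimizer search can be sketched with polynomial regression: the "architecture" is the polynomial degree and the "optimizer" setting is the gradient-descent learning rate. The data generator, the search grids, and the function names are assumptions for this example; a real NAS system would search a far larger space with a smarter strategy than exhaustive enumeration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic regression data: noisy samples of sin(3x) on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + 0.05 * rng.normal(size=n)
    return x, y


def features(x, degree):
    """Architecture choice: the polynomial degree controls model capacity."""
    return np.vander(x, degree + 1, increasing=True)


x_train, y_train = make_data(200)
x_val, y_val = make_data(200)


def train_and_score(degree, lr, steps=500):
    """Train with gradient descent on the MSE and return the validation MSE."""
    X_tr, X_va = features(x_train, degree), features(x_val, degree)
    w = np.zeros(degree + 1)
    for _ in range(steps):
        w -= lr * 2.0 * X_tr.T @ (X_tr @ w - y_train) / len(y_train)
    return float(np.mean((X_va @ w - y_val) ** 2))


degrees = [1, 2, 3, 5]
learning_rates = [0.01, 0.05, 0.2]

# Joint search: every (architecture, optimizer setting) pair is scored together
scores = {(d, lr): train_and_score(d, lr) for d, lr in product(degrees, learning_rates)}
best_degree, best_lr = min(scores, key=scores.get)
print(f"best degree={best_degree}, best lr={best_lr}, "
      f"val MSE={scores[(best_degree, best_lr)]:.4f}")
```

Scoring pairs jointly matters because the best learning rate depends on the degree: higher-degree models are worse conditioned and converge more slowly under the same training budget.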

In [None]:
def simulate_nas_optimization():
    """Simulate NAS-optimization co-design."""
    
    # Different NAS strategies
    nas_methods = {
        'Random Search': {
            'search_efficiency': 0.3,
            'architecture_quality': 0.6,
            'computation_cost': 1.0,
            'optimization_aware': False
        },
        'Evolutionary': {
            'search_efficiency': 0.5,
            'architecture_quality': 0.7,
            'computation_cost': 0.8,
            'optimization_aware': False
        },
        'Reinforcement Learning': {
            'search_efficiency': 0.7,
            'architecture_quality': 0.8,
            'computation_cost': 1.2,
            'optimization_aware': False
        },
        'Differentiable NAS': {
            'search_efficiency': 0.8,
            'architecture_quality': 0.85,
            'computation_cost': 0.6,
            'optimization_aware': True
        },
        'Progressive NAS': {
            'search_efficiency': 0.75,
            'architecture_quality': 0.82,
            'computation_cost': 0.7,
            'optimization_aware': True
        },
        'Optimizer-Aware NAS': {
            'search_efficiency': 0.9,
            'architecture_quality': 0.92,
            'computation_cost': 0.8,
            'optimization_aware': True
        }
    }
    
    # Simulate search progress over time
    search_iterations = 100
    
    for method_name, method_data in nas_methods.items():
        efficiency = method_data['search_efficiency']
        quality = method_data['architecture_quality']
        
        # Generate search progress curve
        progress = np.linspace(0, 1, search_iterations)
        performance_curve = quality * (1 - np.exp(-efficiency * progress * 5))
        
        # Add some exploration noise
        noise = 0.05 * np.random.normal(0, 1, search_iterations)
        performance_curve = np.clip(performance_curve + noise, 0, 1)
        
        method_data['search_curve'] = performance_curve
    
    return nas_methods

nas_optimization = simulate_nas_optimization()

# Visualize NAS-optimization integration
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Neural Architecture Search with Optimizer Co-design', fontsize=16, fontweight='bold')

# Plot 1: NAS search progress
colors = plt.cm.tab10(np.linspace(0, 1, len(nas_optimization)))
for i, (method_name, method_data) in enumerate(nas_optimization.items()):
    curve = method_data['search_curve']
    style = '-' if method_data['optimization_aware'] else '--'
    linewidth = 3 if method_data['optimization_aware'] else 2
    
    axes[0, 0].plot(range(len(curve)), curve, color=colors[i], 
                   linestyle=style, linewidth=linewidth, label=method_name)

axes[0, 0].set_xlabel('Search Iterations')
axes[0, 0].set_ylabel('Best Architecture Performance')
axes[0, 0].set_title('NAS Search Progress\n(Solid = Optimization-Aware)')
axes[0, 0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Efficiency vs Quality trade-off
method_names = list(nas_optimization.keys())
search_efficiencies = [nas_optimization[m]['search_efficiency'] for m in method_names]
arch_qualities = [nas_optimization[m]['architecture_quality'] for m in method_names]
optimization_aware = [nas_optimization[m]['optimization_aware'] for m in method_names]

scatter_colors = ['red' if aware else 'blue' for aware in optimization_aware]
scatter_sizes = [150 if aware else 100 for aware in optimization_aware]

scatter = axes[0, 1].scatter(search_efficiencies, arch_qualities, 
                           c=scatter_colors, s=scatter_sizes, alpha=0.7, edgecolors='black')

for i, name in enumerate(method_names):
    axes[0, 1].annotate(name.replace(' ', '\n'), 
                       (search_efficiencies[i], arch_qualities[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=8)

axes[0, 1].set_xlabel('Search Efficiency')
axes[0, 1].set_ylabel('Architecture Quality')
axes[0, 1].set_title('NAS Method Comparison\n(Red = Optimization-Aware, Blue = Traditional)')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Computational cost analysis
comp_costs = [nas_optimization[m]['computation_cost'] for m in method_names]
colors_cost = ['red' if aware else 'blue' for aware in optimization_aware]

bars = axes[0, 2].bar(range(len(method_names)), comp_costs, color=colors_cost, alpha=0.7)
axes[0, 2].set_xticks(range(len(method_names)))
axes[0, 2].set_xticklabels([name.replace(' ', '\n') for name in method_names], rotation=0)
axes[0, 2].set_ylabel('Relative Computational Cost')
axes[0, 2].set_title('NAS Computational Requirements')
axes[0, 2].grid(True, alpha=0.3)

for bar, cost in zip(bars, comp_costs):
    axes[0, 2].text(bar.get_x() + bar.get_width()/2, bar.get_height()+0.02, 
                   f'{cost:.1f}x', ha='center', va='bottom', fontweight='bold')

# Plot 4: Architecture space exploration
# Simulate 2D architecture space
arch_dims = ['Network Depth', 'Network Width']
space_size = 50

# Create architecture performance landscape
x = np.linspace(0, 1, space_size)
y = np.linspace(0, 1, space_size)
X, Y = np.meshgrid(x, y)

# Multi-modal performance landscape
Z = (0.8 * np.exp(-((X-0.3)**2 + (Y-0.7)**2) / 0.05) + 
     0.9 * np.exp(-((X-0.7)**2 + (Y-0.3)**2) / 0.04) +
     0.6 * np.exp(-((X-0.5)**2 + (Y-0.5)**2) / 0.08) +
     0.3 * np.random.random((space_size, space_size)))

contour = axes[1, 0].contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
axes[1, 0].contour(X, Y, Z, levels=20, colors='black', alpha=0.3, linewidths=0.5)

# Simulate search paths for different methods
search_paths = {
    'Random': np.random.random((20, 2)),
    'Evolutionary': np.array([[0.1, 0.1] + i * np.array([0.04, 0.03]) + 
                             0.1 * np.random.random(2) for i in range(20)]),
    'Optimizer-Aware': np.array([[0.1, 0.1] + i * np.array([0.03, 0.04]) + 
                                0.05 * np.random.random(2) for i in range(20)])
}

path_colors = ['red', 'blue', 'green']
for i, (method, path) in enumerate(search_paths.items()):
    path = np.clip(path, 0, 1)
    axes[1, 0].plot(path[:, 0], path[:, 1], 'o-', color=path_colors[i], 
                   linewidth=2, markersize=4, label=method)

axes[1, 0].set_xlabel('Network Depth (normalized)')
axes[1, 0].set_ylabel('Network Width (normalized)')
axes[1, 0].set_title('Architecture Space Exploration')
axes[1, 0].legend()
plt.colorbar(contour, ax=axes[1, 0], label='Architecture Performance')

# Plot 5: Co-design benefits
design_approaches = ['Fixed Arch\nTuned Opt', 'Fixed Opt\nTuned Arch', 
                    'Sequential\nTuning', 'Joint\nCo-design']
performance_scores = [0.75, 0.82, 0.86, 0.94]
search_times = [10, 50, 80, 60]  # Relative time units

# Create bubble chart
bubble_sizes = [score * 300 for score in performance_scores]
scatter = axes[1, 1].scatter(search_times, performance_scores, s=bubble_sizes, 
                           c=range(len(design_approaches)), cmap='plasma', 
                           alpha=0.6, edgecolors='black')

for i, approach in enumerate(design_approaches):
    axes[1, 1].annotate(approach, (search_times[i], performance_scores[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=9)

axes[1, 1].set_xlabel('Search Time (relative)')
axes[1, 1].set_ylabel('Final Performance')
axes[1, 1].set_title('Architecture-Optimizer Co-design Benefits\n(Bubble size = Performance)')
axes[1, 1].grid(True, alpha=0.3)

# Plot 6: Optimization landscape adaptation
optimization_landscapes = {
    'Standard Architecture': {
        'landscape_smoothness': 0.6,
        'convergence_speed': 0.7,
        'final_accuracy': 0.85
    },
    'NAS-Optimized Architecture': {
        'landscape_smoothness': 0.85,
        'convergence_speed': 0.9,
        'final_accuracy': 0.93
    }
}

categories = ['Landscape\nSmoothness', 'Convergence\nSpeed', 'Final\nAccuracy']
standard_scores = list(optimization_landscapes['Standard Architecture'].values())
optimized_scores = list(optimization_landscapes['NAS-Optimized Architecture'].values())

x = np.arange(len(categories))
width = 0.35

bars1 = axes[1, 2].bar(x - width/2, standard_scores, width, 
                      label='Standard Architecture', alpha=0.8, color='lightcoral')
bars2 = axes[1, 2].bar(x + width/2, optimized_scores, width, 
                      label='NAS-Optimized Architecture', alpha=0.8, color='lightblue')

axes[1, 2].set_xlabel('Optimization Characteristics')
axes[1, 2].set_ylabel('Score')
axes[1, 2].set_title('Impact of Architecture on Optimization')
axes[1, 2].set_xticks(x)
axes[1, 2].set_xticklabels(categories)
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        axes[1, 2].text(bar.get_x() + bar.get_width()/2., height + 0.01,
                       f'{height:.2f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print("🔍 NAS-Optimization Co-design Insights:")
print("   ✅ In this simulation, joint co-design scores 0.94 vs. 0.75-0.86 for the alternatives (a 9-25% relative gain)")
print("   ✅ Optimization-aware NAS finds architectures with smoother loss landscapes")
print("   ✅ Better generalization across different tasks")
print("   ✅ Reduced sensitivity to hyperparameter choices")
print("   ⚠️  Increased computational cost during search phase")

## Summary

This tutorial covered advanced optimization techniques in SciRS2-Optim:

### Key Takeaways:
1. **Second-Order Methods**: Use curvature information for better convergence but higher computational cost
2. **Meta-Learning**: Enables rapid adaptation to new tasks with learned optimization strategies
3. **NAS Integration**: Co-designing architectures and optimizers leads to superior performance
4. **Domain Specialization**: Tailored optimizers for specific problem domains
5. **Distributed Learning**: Scaling optimization to multiple devices and privacy-preserving scenarios

### Best Practices:
- Choose methods based on problem characteristics and computational budget
- Use meta-learning for rapid prototyping and few-shot scenarios
- Consider joint architecture-optimizer optimization for best results
- Monitor computational overhead vs. performance gains
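
One simple way to monitor overhead is to time a single update at several problem sizes. The sketch here does that for a first-order update and a dense Newton solve; the synthetic Hessian, the problem sizes, and the `time_call` helper are assumptions for this example, and absolute timings will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)


def time_call(fn, repeats=5):
    """Return the best wall-clock time of fn() over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


timings = {}
for n in [100, 400, 800]:
    M = rng.normal(size=(n, n))
    H = M @ M.T + n * np.eye(n)  # synthetic positive-definite "Hessian"
    g = rng.normal(size=n)       # synthetic gradient
    timings[n] = {
        "first_order": time_call(lambda: 0.01 * g),          # O(n) update
        "newton": time_call(lambda: np.linalg.solve(H, g)),  # O(n^3) dense solve
    }
    print(f"n={n:4d}  first-order={timings[n]['first_order']:.2e}s  "
          f"newton={timings[n]['newton']:.2e}s")
```

Comparing these per-step costs with the reduction in iteration count indicates whether a second-order method pays for itself on a given problem.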

### Next Steps:
- Explore domain-specific optimizations in your field
- Experiment with different meta-learning approaches
- Try NAS-optimizer co-design for your specific use cases
- Investigate privacy-preserving optimization if applicable

Continue with our specialized tutorials for your specific domain! 🚀