<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [5]</a>'.</span>

# SciTeX Decorators Module - Comprehensive Tutorial

This notebook demonstrates the complete functionality of the `scitex.decorators` module for type conversion, batch processing, caching, and advanced function enhancement.

## Features Covered
* Type conversion decorators (numpy, torch, pandas)
* Batch processing with automatic vectorization
* Memory and disk caching for performance
* Function utilities (timeout, deprecation)
* Decorator composition and auto-ordering
* Real-world application patterns
* Custom decorator creation

## Table of Contents
1. [Setup and Auto-Ordering](#1-setup-and-auto-ordering)
2. [Type Conversion Decorators](#2-type-conversion-decorators)
3. [Batch Processing Decorators](#3-batch-processing-decorators)
4. [Caching Decorators](#4-caching-decorators)
5. [Utility Decorators](#5-utility-decorators)
6. [Decorator Composition](#6-decorator-composition)
7. [Real-World Applications](#7-real-world-applications)
8. [Advanced Patterns](#8-advanced-patterns)
9. [Complete Processing Pipeline](#9-complete-processing-pipeline)

In [1]:
# Detect notebook name for output directory
import os
from pathlib import Path

# Get notebook name (for papermill compatibility)
notebook_name = "21_scitex_decorators"
if 'PAPERMILL_NOTEBOOK_NAME' in os.environ:
    notebook_name = Path(os.environ['PAPERMILL_NOTEBOOK_NAME']).stem


## 1. Setup and Auto-Ordering

The decorators module provides automatic ordering to ensure decorators are applied in the correct sequence.
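
To make the idea of automatic ordering concrete, here is a minimal standard-library sketch. It is not how scitex implements `enable_auto_order()`; the names `make_decorator`, `apply_in_order`, and the `PRIORITY` table are hypothetical and exist only for illustration. The sketch sorts decorators by an assumed priority before applying them, so the result does not depend on the order in which they were written.

```python
import functools

calls = []  # records which wrapper ran, in call order


def make_decorator(kind):
    """Build a tracing decorator labeled with a hypothetical `kind`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(kind)
            return func(*args, **kwargs)
        return wrapper
    decorator.kind = kind
    return decorator


convert = make_decorator("convert")  # stands in for a type-conversion decorator
batch = make_decorator("batch")      # stands in for a batch decorator

# Assumed priority table: lower values wrap the function first (innermost).
PRIORITY = {"convert": 0, "batch": 1}


def apply_in_order(*decorators):
    """Apply decorators sorted by PRIORITY, whatever order they were passed in."""
    ordered = sorted(decorators, key=lambda d: PRIORITY[d.kind])

    def combined(func):
        for dec in ordered:
            func = dec(func)
        return func
    return combined


@apply_in_order(batch, convert)  # deliberately passed in the "wrong" order
def double(x):
    return 2 * x


result = double(21)
print(result, calls)
```

Because `convert` has the lowest priority value, it ends up innermost; at call time the outer `batch` wrapper runs first and then hands over to `convert`.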

In [2]:
# Import required libraries
import sys
sys.path.insert(0, '../src')
import scitex as stx
import numpy as np
import pandas as pd
import time
import os
from pathlib import Path

# Check for optional dependencies
try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False

try:
    import xarray as xr
    XARRAY_AVAILABLE = True
except ImportError:
    XARRAY_AVAILABLE = False

# Enable auto-ordering for decorators (IMPORTANT!)
stx.decorators.enable_auto_order()

Auto-ordering enabled for scitex decorators!
Decorators will now apply in predefined order:
  1. Type conversion (torch_fn, numpy_fn, pandas_fn)
  2. Batch processing (batch_fn)


## 2. Type Conversion Decorators

These decorators automatically convert function inputs to the appropriate data types.

### 2.1 NumPy Function Decorator

The `@numpy_fn` decorator converts inputs to NumPy arrays automatically.
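
The general pattern behind an input-converting decorator can be sketched with the standard library alone. The decorator `to_array_fn` below is a hypothetical stand-in, not the scitex implementation, and it converts to `array.array` rather than to a NumPy array so that the example has no third-party dependencies.

```python
import array
import functools


def to_array_fn(func):
    """Convert list/tuple/range positional arguments to array.array('d')."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        converted = [
            array.array("d", a) if isinstance(a, (list, tuple, range)) else a
            for a in args
        ]
        return func(*converted, **kwargs)
    return wrapper


@to_array_fn
def mean_and_type(x):
    # By the time the body runs, x is an array.array of doubles.
    return sum(x) / len(x), type(x).__name__


from_list = mean_and_type([1, 2, 3, 4])
from_range = mean_and_type(range(5))
print(from_list, from_range)
```

The function body only ever sees one input type, which is the property the cell below relies on when it passes a list, a Series, and an array to the same function.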

In [3]:
# Basic numpy_fn usage
@stx.decorators.numpy_fn
def compute_statistics(x):
    """Compute comprehensive statistics of data."""
    return {
        'mean': x.mean(),
        'std': x.std(),
        'min': x.min(),
        'max': x.max(),
        'median': np.median(x),
        'shape': x.shape,
        'dtype': str(x.dtype)
    }


# Test with different input types
# 1. Python list
list_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = compute_statistics(list_data)

# 2. Pandas Series
series_data = pd.Series([10, 20, 30, 40, 50], name='values')
result = compute_statistics(series_data)

# 3. Already NumPy array
array_data = np.random.randn(3, 4)
result = compute_statistics(array_data)

### 2.2 PyTorch Function Decorator

The `@torch_fn` decorator converts inputs to PyTorch tensors.

In [4]:
if TORCH_AVAILABLE:
    
    @stx.decorators.torch_fn
    def neural_operations(x, temperature=1.0):
        """Perform neural network-style operations."""
        # Softmax with temperature
        softmax = torch.softmax(x / temperature, dim=-1)
        
        # L2 normalization
        l2_norm = torch.nn.functional.normalize(x, p=2, dim=-1)
        
        # Attention weights (simplified)
        attention = torch.softmax(torch.matmul(x, x.transpose(-1, -2)), dim=-1)
        
        return {
            'softmax': softmax,
            'l2_normalized': l2_norm,
            'attention_shape': attention.shape,
            'input_device': x.device,
            'input_dtype': x.dtype
        }
    
    # Test with NumPy array
    np_data = np.random.randn(3, 4)
    result = neural_operations(np_data, temperature=0.5)
    
    # Test with different temperature values
    temperatures = [0.1, 1.0, 10.0]
    data = np.array([[1, 2, 3], [4, 5, 6]])
    
    for temp in temperatures:
        result = neural_operations(data, temperature=temp)
        entropy = -(result['softmax'] * torch.log(result['softmax'] + 1e-8)).sum(dim=-1).mean()
        max_prob = result['softmax'].max(dim=-1)[0].mean()
        print(f"T={temp}: entropy={float(entropy):.3f}, max_prob={float(max_prob):.3f}")

else:
    print("PyTorch is not installed; skipping the torch_fn examples.")
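
The temperature sweep in the cell above can be reproduced without PyTorch. The following is a plain-Python sketch (the helper names `softmax` and `entropy` are illustrative, not part of scitex) showing that a higher temperature flattens the softmax distribution and therefore raises its entropy.

```python
import math


def softmax(values, temperature=1.0):
    """Softmax of values / temperature, shifted by the max for stability."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [1.0, 2.0, 3.0]
entropies = {t: entropy(softmax(logits, t)) for t in (0.1, 1.0, 10.0)}
print(entropies)
```

The entropy is bounded above by `math.log(3)`, the value for a uniform distribution over three classes, and the sweep approaches that bound as the temperature grows.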


### 2.3 Pandas Function Decorator

The `@pandas_fn` decorator converts inputs to pandas DataFrames.

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [5]:

@stx.decorators.pandas_fn
def comprehensive_dataframe_analysis(df):
    """Perform comprehensive DataFrame analysis."""
    analysis = {
        'basic_info': {
            'shape': df.shape,
            'columns': df.columns.tolist(),
            'dtypes': df.dtypes.to_dict(),
            'memory_usage': df.memory_usage(deep=True).sum()
        },
        'missing_data': {
            'missing_counts': df.isnull().sum().to_dict(),
            'missing_percentage': (df.isnull().sum() / len(df) * 100).to_dict(),
            'complete_rows': len(df.dropna())
        },
        'summary_stats': df.describe().to_dict() if len(df.select_dtypes(include=[np.number]).columns) > 0 else {},
        'categorical_info': {}
    }
    
    # Analyze categorical columns
    categorical_cols = df.select_dtypes(include=['object', 'category']).columns
    for col in categorical_cols:
        analysis['categorical_info'][col] = {
            'unique_count': df[col].nunique(),
            'top_values': df[col].value_counts().head(3).to_dict()
        }
    
    return analysis

# Test with dictionary input
dict_data = {
    'age': [25, 30, 35, 40, 45, None, 50],
    'salary': [50000, 60000, 70000, 80000, 90000, 95000, 100000],
    'department': ['IT', 'Finance', 'IT', 'HR', 'Finance', 'IT', 'HR'],
    'experience': [2, 5, 8, 12, 15, 18, 20]
}

result = comprehensive_dataframe_analysis(dict_data)

# Test with NumPy array input (creates a DataFrame with default column names)
array_data = np.random.randn(100, 4)
result = comprehensive_dataframe_analysis(array_data)

# Show summary statistics for numeric columns
if result['summary_stats']:
    mean_values = [f"{k}: {v['mean']:.2f}" for k, v in result['summary_stats'].items()]

TypeError: unhashable type: 'list'

## 3. Batch Processing Decorators

Batch processing decorators allow functions written for single samples to automatically work with batches.
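
A minimal sketch of the idea, using only the standard library: the hypothetical `batch_sketch` decorator below maps a single-sample function over a list or tuple and otherwise calls it once. It is an illustration of the pattern, not the scitex `batch_fn` implementation, which works on arrays and has to decide which axis is the batch axis.

```python
import functools


def batch_sketch(func):
    """Apply func to each item when given a list or tuple, else call it once."""
    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        if isinstance(x, (list, tuple)):
            return [func(item, *args, **kwargs) for item in x]
        return func(x, *args, **kwargs)
    return wrapper


@batch_sketch
def cubic(x):
    return x**3 - 2 * x**2 + 3 * x - 1


single = cubic(2.5)
batch = cubic([1, 2, 3])
print(single, batch)
```

The function body is written for one value, yet the same name accepts either a scalar or a batch.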

### 3.1 Basic Batch Processing

In [None]:

# Simple scalar function
@stx.decorators.batch_fn
def complex_transform(x):
    """Apply complex mathematical transformation to single value."""
    return x**3 - 2*x**2 + 3*x - 1

# Test with single value
single_result = complex_transform(2.5)

# Test with array of values
batch_values = np.array([1, 2, 3, 4, 5])
batch_results = complex_transform(batch_values)

# Vector function
@stx.decorators.batch_fn
def normalize_and_analyze(vector):
    """Normalize vector and return analysis."""
    norm = np.linalg.norm(vector)
    if norm > 0:
        normalized = vector / norm
    else:
        normalized = vector
    
    return {
        'original_norm': norm,
        'normalized': normalized,
        'mean': normalized.mean(),
        'std': normalized.std()
    }

# Test with batch of vectors
vectors = np.random.randn(5, 3)  # 5 vectors of dimension 3
results = normalize_and_analyze(vectors)

for i in range(len(results)):
    print(f"Vector {i}: original norm = {results[i]['original_norm']:.3f}")

# Verify normalization
final_norms = [np.linalg.norm(results[i]['normalized']) for i in range(len(results))]

### 3.2 Multi-dimensional Batch Processing

In [None]:
# Matrix operations with batch processing
@stx.decorators.batch_fn(n_batch_dims=2)  # Process 2D matrices
def matrix_properties(matrix):
    """Compute properties of a 2D matrix."""
    return {
        'determinant': np.linalg.det(matrix),
        'trace': np.trace(matrix),
        'frobenius_norm': np.linalg.norm(matrix, 'fro'),
        'condition_number': np.linalg.cond(matrix),
        'rank': np.linalg.matrix_rank(matrix)
    }

# Create batch of 3x3 matrices
batch_size = 4
matrices = np.random.randn(batch_size, 3, 3)
properties = matrix_properties(matrices)

for i in range(batch_size):
    props = properties[i]
    print(f"Matrix {i}: det = {props['determinant']:.3f}, rank = {props['rank']}")

# Complex classification example
@stx.decorators.batch_fn
def classify_data_point(point):
    """Classify a multi-dimensional point."""
    if len(point) < 2:
        return 'invalid'
    
    x, y = point[0], point[1]
    
    # Distance from origin
    distance = np.sqrt(x**2 + y**2)
    
    # Classify based on multiple criteria
    if distance < 1:
        category = 'center'
    elif distance < 2:
        category = 'middle'
    else:
        category = 'outer'
    
    # Add quadrant information
    if x >= 0 and y >= 0:
        quadrant = 'Q1'
    elif x < 0 and y >= 0:
        quadrant = 'Q2'
    elif x < 0 and y < 0:
        quadrant = 'Q3'
    else:
        quadrant = 'Q4'
    
    return {
        'category': category,
        'quadrant': quadrant,
        'distance': distance,
        'angle': np.arctan2(y, x)
    }

# Generate random 2D points
points = np.random.randn(8, 2) * 2  # 8 points in 2D
classifications = classify_data_point(points)

for i, (point, cls) in enumerate(zip(points, classifications)):
    print(f"Point {i} {point}: {cls['category']} ({cls['quadrant']}), distance = {cls['distance']:.2f}")

## 4. Caching Decorators

Caching decorators improve performance by storing function results.
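
The effect of in-memory caching can be seen with the standard library's `functools.lru_cache`, used here as a stand-in for illustration (whether `cache_mem` is built on it is not something this notebook establishes). The counter shows how many times the function body actually runs.

```python
import functools

call_count = 0  # counts real executions of the function body


@functools.lru_cache(maxsize=None)
def fib(n):
    global call_count
    call_count += 1
    return n if n <= 1 else fib(n - 1) + fib(n - 2)


value = fib(30)
calls_after_first = call_count
fib(30)  # answered from the cache; the body does not run again
calls_after_second = call_count
print(value, calls_after_first, calls_after_second)
```

With the cache in place each value of `n` from 0 to 30 is computed exactly once, instead of the exponential number of calls the naive recursion would make.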

### 4.1 Memory Caching

In [None]:

# Expensive computation with memory caching
@stx.decorators.cache_mem
def fibonacci_recursive(n):
    """Compute Fibonacci number recursively (expensive without caching)."""
    if n <= 1:
        return n
    return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)

@stx.decorators.cache_mem
def expensive_matrix_operation(size, seed=42):
    """Simulate expensive matrix computation."""
    np.random.seed(seed)
    
    # Simulate expensive computation
    matrix = np.random.randn(size, size)
    time.sleep(0.5)  # Simulate computation time
    
    # Perform expensive operations
    eigenvals = np.linalg.eigvals(matrix)
    svd = np.linalg.svd(matrix)
    
    return {
        'matrix': matrix,
        'eigenvalues': eigenvals,
        'singular_values': svd[1],
        'condition_number': np.linalg.cond(matrix)
    }

# Test Fibonacci (dramatic speedup with caching)
start = time.time()
fib_30 = fibonacci_recursive(30)
time_30 = time.time() - start

start = time.time()
fib_35 = fibonacci_recursive(35)  # Uses cached results from smaller values
time_35 = time.time() - start

# Test matrix operation caching

# First call (slow - computes)
start = time.time()
result1 = expensive_matrix_operation(50)
time1 = time.time() - start

# Second call with same parameters (fast - cached)
start = time.time()
result2 = expensive_matrix_operation(50)
time2 = time.time() - start

# Different parameters (computes again)
start = time.time()
result3 = expensive_matrix_operation(50, seed=123)  # Different seed
time3 = time.time() - start

### 4.2 Disk Caching

In [None]:

# Large-scale computation with disk caching
@stx.decorators.cache_disk
def generate_synthetic_dataset(n_samples, n_features, noise_level=0.1, random_state=42):
    """Generate large synthetic dataset for machine learning."""
    
    np.random.seed(random_state)
    
    # Generate feature matrix
    X = np.random.randn(n_samples, n_features)
    
    # Generate synthetic target with some structure
    true_weights = np.random.randn(n_features)
    y = X @ true_weights + noise_level * np.random.randn(n_samples)
    
    # Add some categorical features
    categories = np.random.choice(['A', 'B', 'C'], size=n_samples)
    
    # Simulate expensive preprocessing
    time.sleep(1)  # Simulate computation time
    
    return {
        'X': X,
        'y': y,
        'categories': categories,
        'true_weights': true_weights,
        'feature_stats': {
            'mean': X.mean(axis=0),
            'std': X.std(axis=0)
        },
        'target_stats': {
            'mean': y.mean(),
            'std': y.std()
        }
    }

# Test disk caching

# First call (generates and saves to disk)
start = time.time()
dataset1 = generate_synthetic_dataset(10000, 100)
time1 = time.time() - start

# Second call (loads from disk)
start = time.time()
dataset2 = generate_synthetic_dataset(10000, 100)
time2 = time.time() - start

# Different parameters (new computation)
start = time.time()
dataset3 = generate_synthetic_dataset(5000, 50)  # Different size
time3 = time.time() - start

# Show cache information
cache_dir = Path.home() / ".cache" / "scitex" / "cache"
if cache_dir.exists():
    cache_files = list(cache_dir.rglob("*"))
    cache_size = sum(f.stat().st_size for f in cache_files if f.is_file())
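
For readers curious how a disk cache of this kind can work, here is a small standard-library sketch. The decorator `disk_cache_sketch` is hypothetical and is not the scitex implementation: it assumes results are picklable and that the arguments have stable `repr` strings, and it stores one pickle file per distinct call in a directory supplied by the caller.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache_sketch(cache_dir):
    """Cache results as pickle files named by a hash of the call arguments."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function name plus the repr of its arguments.
            raw = repr((func.__name__, args, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(raw.encode()).hexdigest() + ".pkl")
            if path.exists():
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


executions = []  # records each time the function body really runs

with tempfile.TemporaryDirectory() as tmp:
    @disk_cache_sketch(tmp)
    def squares(n, offset=0):
        executions.append(n)
        return [i * i + offset for i in range(n)]

    first = squares(4)
    second = squares(4)            # read back from disk
    third = squares(4, offset=1)   # different arguments, new cache entry
    n_files = len(list(Path(tmp).glob("*.pkl")))
```

Unlike an in-memory cache, the files would survive a kernel restart if they were written to a persistent directory instead of a temporary one.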

## 5. Utility Decorators

Utility decorators provide additional function enhancements.

### 5.1 Timeout Decorator

In [None]:

# Function with timeout protection
@stx.decorators.timeout(seconds=2, error_message="Operation timed out after 2 seconds!")
def potentially_slow_operation(duration, computation_type="simple"):
    """Simulate operation that might take too long."""
    
    if computation_type == "matrix":
        # Simulate matrix computation
        for i in range(int(duration * 10)):
            _ = np.random.randn(100, 100) @ np.random.randn(100, 100)
            time.sleep(0.1)
    else:
        # Simple sleep
        time.sleep(duration)
    
    return f"Completed {computation_type} computation in {duration}s"

# Test cases
test_cases = [
    (1, "simple"),    # Should succeed
    (1.5, "matrix"),  # Should succeed
    (3, "simple"),    # Should timeout
    (2.5, "matrix")   # Should timeout
]

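for duration, comp_type in test_cases:
    start = time.time()
    try:
        result = potentially_slow_operation(duration, comp_type)
        elapsed = time.time() - start
        print(f"{comp_type} ({duration}s): {result} [{elapsed:.2f}s]")
    except Exception as e:
        elapsed = time.time() - start
        print(f"{comp_type} ({duration}s): failed after {elapsed:.2f}s ({e})")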

# Advanced timeout with custom handling
@stx.decorators.timeout(seconds=1)
def iterative_computation(n_iterations):
    """Computation that can be interrupted gracefully."""
    results = []
    for i in range(n_iterations):
        # Simulate work
        result = sum(j**2 for j in range(100))
        results.append(result)
        time.sleep(0.1)
    return results

try:
    results = iterative_computation(20)  # Would take ~2s, but timeout is 1s
except Exception as e:
    print(f"Computation interrupted: {e}")
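
One portable way to build a timeout decorator is to run the call in a worker thread and wait on its result. The sketch below uses `concurrent.futures` from the standard library; `timeout_sketch` is a hypothetical name, and this is an assumption about how such a decorator could be written, not a description of `stx.decorators.timeout`.

```python
import concurrent.futures
import functools
import time


def timeout_sketch(seconds, error_message="Operation timed out"):
    """Raise TimeoutError if the call does not finish within `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                raise TimeoutError(error_message) from None
            finally:
                # Do not block on the worker; it keeps running in the background.
                executor.shutdown(wait=False)
        return wrapper
    return decorator


@timeout_sketch(seconds=0.2, error_message="too slow")
def nap(duration):
    time.sleep(duration)
    return "done"


fast = nap(0.01)
try:
    nap(0.6)
    slow = "finished"
except TimeoutError as exc:
    slow = str(exc)
print(fast, slow)
```

A limitation of this thread-based approach is that the timed-out work is abandoned rather than stopped: the caller gets control back, but the worker thread runs to completion on its own.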


### 5.2 Deprecation Decorator

In [None]:

# Mark functions as deprecated
@stx.decorators.deprecated("Use calculate_advanced_stats() instead. This function will be removed in v2.0")
def calculate_basic_stats(data):
    """Old function for basic statistics (deprecated)."""
    return {
        'mean': np.mean(data),
        'std': np.std(data)
    }

def calculate_advanced_stats(data):
    """New improved function for statistics."""
    return {
        'mean': np.mean(data),
        'std': np.std(data),
        'median': np.median(data),
        'q25': np.percentile(data, 25),
        'q75': np.percentile(data, 75),
        'skewness': float(pd.Series(data).skew()),
        'kurtosis': float(pd.Series(data).kurtosis())
    }

# Test data
test_data = np.random.randn(1000)

# Capture deprecation warnings
import warnings
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    
    # Using deprecated function
    old_result = calculate_basic_stats(test_data)
    
    if w:
        print(f"Warning captured: {w[0].message}")

# Using new function (no warning)
new_result = calculate_advanced_stats(test_data)
for key, value in new_result.items():
    print(f"{key}: {value:.4f}")

# Multiple levels of deprecation
@stx.decorators.deprecated("This is seriously outdated! Use modern_function() instead.", 
    category=FutureWarning)
def very_old_function():
    """Really old function that should definitely not be used."""
    return "This is very old!"

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    result = very_old_function()
    if w:
        print(f"{w[0].category.__name__}: {w[0].message}")
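
A deprecation decorator of this kind is usually a thin wrapper around `warnings.warn`. The sketch below shows one way to write it with the standard library; `deprecated_sketch` is a hypothetical name and the message format is an assumption, not the output of `stx.decorators.deprecated`.

```python
import functools
import warnings


def deprecated_sketch(reason, category=DeprecationWarning):
    """Emit a warning with `reason` every time the wrapped function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=category,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_sketch("use new_mean() instead")
def old_mean(values):
    return sum(values) / len(values)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_mean([1, 2, 3])

messages = [str(w.message) for w in caught]
categories = [w.category for w in caught]
print(value, messages)
```

Passing `stacklevel=2` makes the warning point at the line that called the deprecated function, which is usually the line the user needs to change.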

## 6. Decorator Composition

Decorators can be stacked so that a single function gains type conversion, batch processing, and caching at once.
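
When decorators are stacked by hand, Python applies them bottom-up: the decorator closest to the `def` wraps the function first and ends up innermost. The small sketch below uses a hypothetical `bracket` decorator to make that order visible.

```python
import functools


def bracket(left, right):
    """Decorator factory that surrounds the return value with two strings."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return f"{left}{func(*args, **kwargs)}{right}"
        return wrapper
    return decorator


@bracket("[", "]")   # applied second (outermost)
@bracket("(", ")")   # applied first (innermost)
def name():
    return "x"


stacked = name()

# The stacked form is equivalent to nesting the calls explicitly.
manual = bracket("[", "]")(bracket("(", ")")(lambda: "x"))()
print(stacked, manual)
```

Swapping the two decorator lines would swap the brackets, which is why the order of type conversion, batching, and caching matters in the cell below.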

In [None]:

# Auto-ordering ensures decorators are applied in optimal order
@stx.decorators.cache_mem      # Applied third (outermost)
@stx.decorators.batch_fn       # Applied second
@stx.decorators.numpy_fn       # Applied first (innermost)
def advanced_feature_extraction(signal):
    """Extract advanced features from signal with full decorator stack."""
    # Time domain features
    time_features = np.array([
        signal.mean(),
        signal.std(),
        np.sqrt(np.mean(signal**2)),  # RMS
        np.mean(np.abs(signal)),      # Mean absolute value
        signal.max() - signal.min(),  # Range
    ])
    
    # Frequency domain features
    fft = np.fft.fft(signal)
    power_spectrum = np.abs(fft)**2
    freqs = np.fft.fftfreq(len(signal))
    
    # Find dominant frequency
    dominant_freq_idx = np.argmax(power_spectrum[1:len(signal)//2]) + 1
    dominant_freq = abs(freqs[dominant_freq_idx])
    
    freq_features = np.array([
        dominant_freq,
        power_spectrum.sum(),
        np.mean(power_spectrum),
        np.std(power_spectrum)
    ])
    
    # Statistical features
    stat_features = np.array([
        float(pd.Series(signal).skew()),
        float(pd.Series(signal).kurtosis()),
        np.percentile(signal, 95) - np.percentile(signal, 5)  # 90% range
    ])
    
    return np.concatenate([time_features, freq_features, stat_features])

# Generate test signals
t = np.linspace(0, 1, 1000)
signals = [
    np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(1000),   # 5 Hz sine + noise
    np.sin(2 * np.pi * 10 * t) + 0.2 * np.random.randn(1000),  # 10 Hz sine + noise
    np.sin(2 * np.pi * 15 * t) + np.sin(2 * np.pi * 25 * t),   # Mixed frequencies
    np.random.randn(1000),                                       # Pure noise
    np.exp(-t * 5) * np.sin(2 * np.pi * 20 * t)                # Damped oscillation
]

# Extract features (first call - computed and cached)
start = time.time()
features = advanced_feature_extraction(signals)
time1 = time.time() - start

# Second call (cached)
start = time.time()
features_cached = advanced_feature_extraction(signals)
time2 = time.time() - start

# Analyze features
feature_names = [
    'mean', 'std', 'rms', 'mav', 'range',           # Time domain
    'dom_freq', 'total_power', 'mean_power', 'std_power',  # Frequency domain
    'skewness', 'kurtosis', 'percentile_range'     # Statistical
]

signal_types = ['5Hz sine', '10Hz sine', 'Mixed freq', 'Noise', 'Damped osc']
for i, (signal_type, feature_vec) in enumerate(zip(signal_types, features)):
    print(f"Signal {i} ({signal_type}):")
    for name, value in zip(feature_names[:5], feature_vec[:5]):  # Show first 5 features
        print(f"  {name}: {value:.4f}")

### 6.1 PyTorch Integration with Decorators

In [None]:
if TORCH_AVAILABLE:
    
    # Pre-combined decorator for PyTorch batch processing
    @stx.decorators.batch_torch_fn
    def neural_network_simulation(x, hidden_dim=64):
        """Simulate simple neural network forward pass."""
        input_dim = x.shape[-1]
        
        # Create random weights (in practice, these would be learned)
        W1 = torch.randn(input_dim, hidden_dim) * 0.1
        b1 = torch.zeros(hidden_dim)
        W2 = torch.randn(hidden_dim, 1) * 0.1
        b2 = torch.zeros(1)
        
        # Forward pass
        h = torch.relu(x @ W1 + b1)  # Hidden layer
        output = h @ W2 + b2         # Output layer
        
        return {
            'output': output.squeeze(),
            'hidden_activation': h,
            'input_norm': torch.norm(x),
            'hidden_sparsity': (h == 0).float().mean()
        }
    
    # Test with batch of data
    batch_data = np.random.randn(10, 20)  # 10 samples, 20 features
    
    results = neural_network_simulation(batch_data, hidden_dim=32)
    
    
    # Attention mechanism simulation
    @stx.decorators.torch_fn
    @stx.decorators.cache_mem
    def compute_attention_weights(query, key, value, temperature=1.0):
        """Compute attention weights and apply to values."""
        # Compute attention scores
        scores = torch.matmul(query, key.transpose(-1, -2)) / temperature
        
        # Apply softmax
        attention_weights = torch.softmax(scores, dim=-1)
        
        # Apply attention to values
        attended_output = torch.matmul(attention_weights, value)
        
        return {
            'attention_weights': attention_weights,
            'attended_output': attended_output,
            'attention_entropy': -(attention_weights * torch.log(attention_weights + 1e-8)).sum(dim=-1).mean()
        }
    
    # Test attention mechanism
    seq_len, d_model = 8, 16
    query = np.random.randn(seq_len, d_model)
    key = np.random.randn(seq_len, d_model)
    value = np.random.randn(seq_len, d_model)
    
    attention_result = compute_attention_weights(query, key, value, temperature=0.1)
    
    
else:
    print("PyTorch is not installed; skipping the PyTorch integration examples.")


## 7. Real-World Applications

This section applies the decorators to a complete data-processing workflow of the kind found in data science projects.

In [None]:

class AdvancedDataProcessor:
    """Complete data processing pipeline using decorators."""
    
    def __init__(self):
        self.processing_steps = []
    
    @stx.decorators.cache_disk
    @stx.decorators.pandas_fn
    def load_and_validate_data(self, data, validation_rules=None):
        """Load data and apply validation rules."""
        self.processing_steps.append("Data loading and validation")
        
        # Simulate data loading delay
        time.sleep(0.2)
        
        validated_data = data.copy()
        validation_report = {
            'total_rows': len(validated_data),
            'total_columns': len(validated_data.columns),
            'missing_values': validated_data.isnull().sum().sum(),
            'duplicate_rows': validated_data.duplicated().sum(),
            'memory_usage_mb': validated_data.memory_usage(deep=True).sum() / 1024**2
        }
        
        # Apply validation rules if provided
        if validation_rules:
            for rule in validation_rules:
                # Simple validation rule application
                if rule['type'] == 'range':
                    col = rule['column']
                    if col in validated_data.columns:
                        mask = (validated_data[col] >= rule['min']) & (validated_data[col] <= rule['max'])
                        validation_report[f'{col}_out_of_range'] = (~mask).sum()
        
        return validated_data, validation_report
    
    @stx.decorators.batch_fn
    @stx.decorators.numpy_fn
    def preprocess_features(self, feature_vector, method='standardize'):
        """Preprocess individual feature vectors."""
        if method == 'standardize':
            # Z-score normalization
            mean = feature_vector.mean()
            std = feature_vector.std()
            if std > 0:
                processed = (feature_vector - mean) / std
            else:
                processed = feature_vector - mean
        elif method == 'minmax':
            # Min-max scaling
            min_val = feature_vector.min()
            max_val = feature_vector.max()
            if max_val > min_val:
                processed = (feature_vector - min_val) / (max_val - min_val)
            else:
                processed = feature_vector - min_val
        else:
            processed = feature_vector
        
        return {
            'processed': processed,
            'original_mean': feature_vector.mean(),
            'original_std': feature_vector.std(),
            'processed_mean': processed.mean(),
            'processed_std': processed.std()
        }
    
    @stx.decorators.timeout(seconds=10)
    @stx.decorators.cache_mem
    def extract_engineered_features(self, data):
        """Extract engineered features from processed data."""
        self.processing_steps.append("Feature engineering")
        
        numeric_cols = data.select_dtypes(include=[np.number]).columns
        
        engineered_features = pd.DataFrame(index=data.index)
        
        # Statistical features
        for col in numeric_cols:
            values = data[col].dropna()
            if len(values) > 0:
                engineered_features[f'{col}_mean'] = values.mean()
                engineered_features[f'{col}_std'] = values.std()
                engineered_features[f'{col}_skew'] = values.skew()
                engineered_features[f'{col}_kurt'] = values.kurtosis()
        
        # Interaction features (sample)
        if len(numeric_cols) >= 2:
            col1, col2 = numeric_cols[0], numeric_cols[1]
            engineered_features[f'{col1}_x_{col2}'] = data[col1] * data[col2]
            engineered_features[f'{col1}_div_{col2}'] = data[col1] / (data[col2] + 1e-8)
        
        return engineered_features
    
    def full_pipeline(self, raw_data, preprocessing_method='standardize'):
        """Execute complete data processing pipeline."""
        
        # Step 1: Load and validate
        validation_rules = [
            {'type': 'range', 'column': 'feature1', 'min': -10, 'max': 10}
        ]
        
        validated_data, validation_report = self.load_and_validate_data(
            raw_data, validation_rules
        )
        
        
        # Step 2: Preprocess features
        numeric_data = validated_data.select_dtypes(include=[np.number])
        if len(numeric_data.columns) > 0:
            preprocessing_results = self.preprocess_features(
                numeric_data.values, method=preprocessing_method
            )
            
        
        # Step 3: Feature engineering
        engineered_features = self.extract_engineered_features(validated_data)
        
        
        return {
            'validated_data': validated_data,
            'validation_report': validation_report,
            'preprocessing_results': preprocessing_results if len(numeric_data.columns) > 0 else None,
            'engineered_features': engineered_features,
            'processing_steps': self.processing_steps.copy()
        }

# Test the complete pipeline
processor = AdvancedDataProcessor()

# Generate sample dataset
np.random.seed(42)
sample_data = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000) * 2 + 1,
    'feature3': np.random.exponential(2, 1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000),
    'target': np.random.randint(0, 2, 1000)
})

# Add some missing values
sample_data.loc[sample_data.sample(50).index, 'feature1'] = np.nan

# Run pipeline
start_time = time.time()
results = processor.full_pipeline(sample_data, preprocessing_method='standardize')
pipeline_time = time.time() - start_time


# Run again to test caching
start_time = time.time()
results_cached = processor.full_pipeline(sample_data, preprocessing_method='standardize')
cached_time = time.time() - start_time


## 8. Advanced Patterns

This section covers advanced decorator patterns, including how to build custom decorators on top of the existing ones.
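
Two building blocks recur in custom decorators: a factory that takes options and returns the real decorator, and `functools.wraps`, which preserves the wrapped function's metadata and records the original under `__wrapped__`. The sketch below is standard-library only; `safe` and `wrapper_depth` are hypothetical helpers written for this illustration.

```python
import functools


def safe(default=None):
    """Decorator factory: return `default` instead of raising."""
    def decorator(func):
        @functools.wraps(func)  # copies metadata and sets wrapper.__wrapped__
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                return default
        return wrapper
    return decorator


def wrapper_depth(func):
    """Count how many functools.wraps layers sit on top of the original."""
    depth = 0
    while hasattr(func, "__wrapped__"):
        func = func.__wrapped__
        depth += 1
    return depth


@safe(default=-1)
@safe(default=0)
def reciprocal(x):
    """Return 1 / x."""
    return 1 / x


ok = reciprocal(4)
fallback = reciprocal(0)   # the inner layer catches the error first
depth = wrapper_depth(reciprocal)
print(ok, fallback, depth, reciprocal.__name__)
```

Walking `__wrapped__` only finds layers that used `functools.wraps`; a decorator that returns a bare closure is invisible to this kind of introspection.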

In [None]:

# Custom decorator factory
def robust_processor(timeout_seconds=30, use_cache=True, handle_errors=True):
    """Create a robust processing decorator with multiple features."""
    def decorator(func):
        # Build decorator chain
        decorated_func = func
        
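        # Add error handling
        if handle_errors:
            # Bind the current callable now; decorated_func is reassigned below,
            # and calling it from inside the handler would recurse forever.
            inner_func = decorated_func

            def error_handler(*args, **kwargs):
                try:
                    return inner_func(*args, **kwargs)
                except Exception as e:
                    print(f"Error in {func.__name__}: {e}")
                    return None
            decorated_func = error_handler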
        
        # Add caching
        if use_cache:
            decorated_func = stx.decorators.cache_mem(decorated_func)
        
        # Add timeout
        decorated_func = stx.decorators.timeout(seconds=timeout_seconds)(decorated_func)
        
        # Add type conversion
        decorated_func = stx.decorators.numpy_fn(decorated_func)
        
        return decorated_func
    
    return decorator

# Use custom decorator
@robust_processor(timeout_seconds=5, use_cache=True, handle_errors=True)
def complex_signal_analysis(signal):
    """Perform complex signal analysis with full robustness."""
    # Simulate complex computation
    time.sleep(0.1)
    
    # Multiple analysis steps
    fft = np.fft.fft(signal)
    power_spectrum = np.abs(fft)**2
    
    # Statistical analysis
    stats = {
        'mean': signal.mean(),
        'std': signal.std(),
        'energy': np.sum(signal**2),
        'peak_frequency': np.argmax(power_spectrum[:len(signal)//2]),
        'spectral_centroid': np.sum(np.arange(len(power_spectrum)) * power_spectrum) / np.sum(power_spectrum)
    }
    
    return stats

# Test robust processor
test_signal = np.sin(2 * np.pi * 10 * np.linspace(0, 1, 1000)) + 0.1 * np.random.randn(1000)

result = complex_signal_analysis(test_signal)
if result:
    for key, value in result.items():
        print(f"{key}: {value}")

# Test with problematic input (should be handled gracefully)
problematic_input = "not a valid signal"  # This will cause an error
result = complex_signal_analysis(problematic_input)

# Decorator introspection
def analyze_function_decorators(func):
    """Analyze the decorator chain of a function."""
    
    # Check for wrapped functions (decorator chain)
    current = func
    depth = 0
    decorators_found = []
    
    while hasattr(current, '__wrapped__'):
        decorator_name = type(current).__name__
        decorators_found.append(f"Level {depth}: {decorator_name}")
        current = current.__wrapped__
        depth += 1
        if depth > 10:  # Safety limit
            break
    
    if decorators_found:
        for decorator in decorators_found:
            print(decorator)
    else:
        print("No wrapped functions found")
    
    return depth

# Analyze our decorated function
decorator_depth = analyze_function_decorators(complex_signal_analysis)

# Performance comparison

# Undecorated version
def simple_analysis(signal):
    time.sleep(0.1)
    return {'mean': np.mean(signal), 'std': np.std(signal)}

# Test performance
test_signals = [np.random.randn(1000) for _ in range(3)]

# Time decorated version (first call)
start = time.time()
for signal in test_signals:
    _ = complex_signal_analysis(signal)
decorated_time = time.time() - start

# Time decorated version (second call - cached)
start = time.time()
for signal in test_signals:
    _ = complex_signal_analysis(signal)
cached_time = time.time() - start


## 9. Complete Processing Pipeline

A comprehensive example showing all decorators working together in a machine learning pipeline.

In [None]:

class MLPipelineWithDecorators:
    """Complete machine learning pipeline using all decorator features."""
    
    def __init__(self):
        self.pipeline_steps = []
        self.model_cache = {}
    
    @stx.decorators.cache_disk
    @stx.decorators.timeout(seconds=30)
    @stx.decorators.pandas_fn
    def load_and_clean_data(self, data, cleaning_strategy='default'):
        """Load and clean raw data with caching."""
        self.pipeline_steps.append(f"Data loading ({cleaning_strategy})")
        
        # Simulate data loading time
        time.sleep(0.2)
        
        cleaned_data = data.copy()
        
        if cleaning_strategy == 'default':
            # Remove duplicates
            cleaned_data = cleaned_data.drop_duplicates()
            
            # Handle missing values
            numeric_cols = cleaned_data.select_dtypes(include=[np.number]).columns
            for col in numeric_cols:
                cleaned_data[col] = cleaned_data[col].fillna(cleaned_data[col].median())
            
            categorical_cols = cleaned_data.select_dtypes(include=['object']).columns
            for col in categorical_cols:
                cleaned_data[col] = cleaned_data[col].fillna(cleaned_data[col].mode()[0] if not cleaned_data[col].mode().empty else 'Unknown')
        
        return cleaned_data
    
    @stx.decorators.batch_fn
    @stx.decorators.numpy_fn
    @stx.decorators.cache_mem
    def extract_ml_features(self, sample):
        """Extract machine learning features from individual samples."""
        # Statistical features
        basic_stats = np.array([
            sample.mean(),
            sample.std(),
            np.median(sample),
            np.percentile(sample, 25),
            np.percentile(sample, 75)
        ])
        
        # Distribution features
        dist_features = np.array([
            float(pd.Series(sample).skew()),
            float(pd.Series(sample).kurtosis()),
            sample.max() - sample.min(),
            np.sum(np.abs(sample))
        ])
        
        # Relative features
        if len(sample) > 1:
            diff_features = np.array([
                np.mean(np.diff(sample)),
                np.std(np.diff(sample)),
                np.sum(np.abs(np.diff(sample)))
            ])
        else:
            diff_features = np.zeros(3)
        
        return np.concatenate([basic_stats, dist_features, diff_features])
    
    @stx.decorators.timeout(seconds=60)
    def train_model(self, X, y, model_type='random_forest'):
        """Train machine learning model with timeout protection."""
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        
        self.pipeline_steps.append(f"Model training ({model_type})")
        
        # Select model
        if model_type == 'random_forest':
            model = RandomForestClassifier(n_estimators=100, random_state=42)
        elif model_type == 'logistic':
            model = LogisticRegression(random_state=42, max_iter=1000)
        else:
            raise ValueError(f"Unknown model type: {model_type}")
        
        # Train model
        model.fit(X, y)
        
        # Cross-validation
        cv_scores = cross_val_score(model, X, y, cv=5)
        
        # Cache model
        model_key = f"{model_type}_{hash(str(X.shape) + str(y.shape))}"
        self.model_cache[model_key] = model
        
        return {
            'model': model,
            'cv_scores': cv_scores,
            'cv_mean': cv_scores.mean(),
            'cv_std': cv_scores.std(),
            'feature_importance': getattr(model, 'feature_importances_', None),
            'model_key': model_key
        }
    
    if TORCH_AVAILABLE:
        @stx.decorators.batch_torch_fn
        @stx.decorators.cache_mem
        def neural_feature_extraction(self, x):
            """Extract features using simple neural network."""
            input_dim = x.shape[-1]
            hidden_dim = min(64, input_dim * 2)
            
            # Simple autoencoder-style feature extraction
            W_encode = torch.randn(input_dim, hidden_dim) * 0.1
            W_decode = torch.randn(hidden_dim, input_dim) * 0.1
            
            # Encode
            encoded = torch.relu(x @ W_encode)
            
            # Decode (for reconstruction error)
            decoded = encoded @ W_decode
            reconstruction_error = torch.mean((x - decoded)**2)
            
            return {
                'encoded_features': encoded,
                'reconstruction_error': reconstruction_error,
                'sparsity': (encoded == 0).float().mean()
            }
    
    def complete_pipeline(self, raw_data, target_column, model_types=('random_forest',)):
        """Execute complete ML pipeline."""
        
        # Step 1: Data cleaning
        cleaned_data = self.load_and_clean_data(raw_data)
        
        # Step 2: Feature extraction
        feature_columns = [col for col in cleaned_data.columns if col != target_column]
        numeric_features = cleaned_data[feature_columns].select_dtypes(include=[np.number])
        
        if len(numeric_features.columns) > 0:
            extracted_features = self.extract_ml_features(numeric_features.values)
            
            # Add neural features if PyTorch is available
            if TORCH_AVAILABLE and len(numeric_features.columns) >= 2:
                neural_results = self.neural_feature_extraction(numeric_features.values)
                neural_features = neural_results[0]['encoded_features'].numpy()
                
                # Combine features
                combined_features = np.hstack([extracted_features, neural_features])
                X = combined_features
            else:
                X = extracted_features
        else:
            return None
        
        # Step 3: Model training
        y = cleaned_data[target_column].values
        
        models = {}
        for model_type in model_types:
            try:
                model_result = self.train_model(X, y, model_type)
                models[model_type] = model_result
            except Exception as e:
                # Report the failure instead of silently skipping the model
                print(f"Training failed for {model_type}: {e}")
        
        return {
            'cleaned_data': cleaned_data,
            'features': X,
            'target': y,
            'models': models,
            'pipeline_steps': self.pipeline_steps.copy(),
            'feature_shape': X.shape
        }

# Test complete pipeline
pipeline = MLPipelineWithDecorators()

# Generate comprehensive test dataset
np.random.seed(42)
n_samples = 1000

# Create correlated features
base_features = np.random.randn(n_samples, 3)
feature_matrix = np.column_stack([
    base_features,
    base_features[:, 0] * 2 + np.random.randn(n_samples) * 0.1,  # Correlated
    base_features[:, 1] ** 2,  # Non-linear
    np.random.exponential(1, n_samples),  # Different distribution
])

# Create target with some structure
target = (feature_matrix[:, 0] + 0.5 * feature_matrix[:, 2] - 0.3 * feature_matrix[:, 4] > 0).astype(int)

# Add some noise and missing values
feature_matrix[np.random.choice(n_samples, 50), 0] = np.nan

# Create DataFrame
ml_data = pd.DataFrame(
    feature_matrix,
    columns=[f'feature_{i}' for i in range(feature_matrix.shape[1])]
)
ml_data['target'] = target
ml_data['category'] = np.random.choice(['A', 'B', 'C'], n_samples)

# Run complete pipeline
start_time = time.time()
results = pipeline.complete_pipeline(
    ml_data, 
    target_column='target',
    model_types=['random_forest', 'logistic']
)
total_time = time.time() - start_time

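if results:
    print(f"Pipeline finished in {total_time:.2f} s; feature matrix shape: {results['feature_shape']}")

    # Show model comparison
    for model_name, model_result in results['models'].items():
        print(f"{model_name}: CV accuracy {model_result['cv_mean']:.3f} ± {model_result['cv_std']:.3f}")

    # List the pipeline steps in execution order
    for i, step in enumerate(results['pipeline_steps'], 1):
        print(f"{i}. {step}")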

## Summary

This tutorial has demonstrated the main features of the SciTeX decorators module:

### Key Features Covered:
1. **Type Conversion** - Automatic conversion to NumPy, PyTorch, and Pandas formats
2. **Batch Processing** - Seamlessly process individual samples or batches
3. **Caching** - Memory and disk caching for performance optimization
4. **Utilities** - Timeout protection, deprecation warnings, and error handling
5. **Auto-Ordering** - Automatic optimal ordering of decorator chains
6. **Real-World Integration** - Complete ML and data processing pipelines
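
To make the batch-processing idea in the list above concrete, here is a minimal pure-Python sketch of a decorator that lets a per-sample function accept either one sample or a list of samples. The name `batch_sketch` and its dispatch rule are assumptions made for illustration; this is not how `stx.decorators.batch_fn` is implemented.

```python
import functools
import statistics


def batch_sketch(func):
    """Hypothetical stand-in for a batch decorator (not SciTeX's code)."""
    @functools.wraps(func)
    def wrapper(data):
        # A non-empty sequence whose first item is itself a sequence is
        # treated as a batch; anything else is passed through as one sample.
        if data and isinstance(data[0], (list, tuple)):
            return [func(sample) for sample in data]
        return func(data)
    return wrapper


@batch_sketch
def summarize(sample):
    """Per-sample statistics; written without any batch handling."""
    return {
        "mean": statistics.fmean(sample),
        "range": max(sample) - min(sample),
    }


single = summarize([1.0, 2.0, 3.0])                  # one sample -> one dict
batch = summarize([[1.0, 2.0, 3.0], [10.0, 20.0]])   # two samples -> list of dicts
print(single)
print(batch)
```

The decorated function body never mentions batches, which is the property the tutorial relies on when it stacks a batch decorator on top of a per-sample feature extractor.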

### Best Practices:
1. **Always enable auto-ordering** with `stx.decorators.enable_auto_order()`
2. **Use appropriate caching** - `@cache_mem` for small/frequent, `@cache_disk` for large/persistent
3. **Combine decorators wisely** - type conversion → batch processing → caching → utilities
4. **Add timeout protection** for long-running operations
5. **Handle errors gracefully** with `try`/`except` blocks or custom error handlers
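
Practices 4 and 5 can be sketched together using only the standard library. The decorator below, `timeout_sketch`, and the helper `run_safely` are hypothetical names introduced for this example; the thread-based approach is an assumption and may differ from how `stx.decorators.timeout` works internally.

```python
import concurrent.futures
import functools
import time


def timeout_sketch(seconds):
    """Hypothetical stand-in for a timeout decorator (not SciTeX's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                raise TimeoutError(
                    f"{func.__name__} exceeded {seconds} s"
                ) from None
            finally:
                # Do not block on the worker; a timed-out call keeps running
                # in its thread until it finishes on its own.
                executor.shutdown(wait=False)
        return wrapper
    return decorator


@timeout_sketch(seconds=0.05)
def slow_step():
    time.sleep(0.3)
    return "done"


@timeout_sketch(seconds=1.0)
def fast_step(x):
    return x * 2


def run_safely(func, *args):
    """Return (result, error message) instead of letting the timeout propagate."""
    try:
        return func(*args), None
    except TimeoutError as exc:
        return None, str(exc)


fast_result, fast_error = run_safely(fast_step, 21)
slow_result, slow_error = run_safely(slow_step)
print(fast_result, fast_error)
print(slow_result, slow_error)
```

A thread-based timeout only stops waiting for the result; it cannot interrupt the running function, so this pattern suits calls whose side effects are harmless if they complete late.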

### Performance Benefits:
- **Caching**: Large speedups (often 10-100x) for repeated computations, depending on how expensive the original call is
- **Batch processing**: Automatic vectorization without code changes
- **Type safety**: Seamless integration between NumPy, PyTorch, and Pandas
- **Robustness**: Timeout protection and error handling
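
The caching benefit is easy to measure directly. The sketch below uses the standard library's `functools.lru_cache` as a stand-in for an in-memory cache; it is an assumption that `@cache_mem` behaves similarly for hashable arguments, and `slow_square` and `timed` are hypothetical helpers.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def slow_square(x):
    """Simulate an expensive computation with a fixed delay."""
    time.sleep(0.05)
    return x * x


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


first_result, first_time = timed(slow_square, 12)    # computed, pays the delay
second_result, second_time = timed(slow_square, 12)  # served from the cache

# Guard against a zero denominator on very coarse timers
speedup = first_time / max(second_time, 1e-9)
print(f"first: {first_time:.4f} s, cached: {second_time:.6f} s, speedup: {speedup:.0f}x")
```

The measured ratio depends almost entirely on how slow the uncached call is, so any quoted speedup figure should be read as workload-specific.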

The decorators module transforms simple functions into robust, high-performance, and type-safe components for scientific computing and machine learning workflows.