# SciTeX Gen (General Utilities) Tutorial

This notebook demonstrates the `scitex.gen` module, which collects general utilities for experiment lifecycle management, reproducibility, data processing, text handling, and system interaction.

## Key Features Covered:
- Experiment lifecycle management (start/close)
- Reproducibility and configuration management
- Data normalization and transformation utilities
- Text and data processing tools
- System and environment utilities
- Caching and performance optimization

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys
import os
from pathlib import Path
from datetime import datetime

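# Add the repository's src/ directory to the import path
# (assumes the notebook runs from a directory one level below the repository root;
#  note that Path('.').parent is just Path('.'), so it would not go up a level)
sys.path.insert(0, str(Path.cwd().parent / "src"))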

import scitex
import scitex.gen as gen

print("SciTeX Gen Tutorial - General Utilities & Project Management")
print("=" * 60)

## 1. Experiment Lifecycle Management

The gen module provides `start()` and `close()` functions for managing experimental workflows with reproducibility, logging, and configuration management.
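In a script, the lifecycle is `CONFIG, sys.stdout, sys.stderr, plt, CC = gen.start(sys, plt)` at the top and a matching `close()` call at the end; the cell below only simulates the setup. To make the pattern concrete, here is a hypothetical `experiment()` context manager built from the standard library and NumPy. It is not the scitex implementation: it only illustrates the typical steps (create a run ID, fix seeds, make an output directory on entry, record the run on exit), and the config keys `ID`, `SEED`, and `SDIR` are assumptions chosen for illustration.

```python
import contextlib
import random
import secrets
import string
import tempfile
import time
from pathlib import Path

import numpy as np


@contextlib.contextmanager
def experiment(root, seed=42, id_len=4):
    """Hypothetical stand-in for a start()/close() pair (not the scitex API).

    On entry: create a random run ID, fix the Python and NumPy seeds,
    and create an output directory. On exit: record the elapsed time.
    """
    alphabet = string.ascii_letters + string.digits
    # secrets is unaffected by random/np seeds, so IDs stay unique across runs
    run_id = "".join(secrets.choice(alphabet) for _ in range(id_len))
    random.seed(seed)
    np.random.seed(seed)
    out_dir = Path(root) / run_id
    out_dir.mkdir(parents=True, exist_ok=False)
    config = {"ID": run_id, "SEED": seed, "SDIR": out_dir}
    t0 = time.perf_counter()
    try:
        yield config
    finally:
        # Runs even if the experiment body raises, so the run is always closed out
        elapsed = time.perf_counter() - t0
        (out_dir / "duration.txt").write_text(f"{elapsed:.3f}\n")


with tempfile.TemporaryDirectory() as tmp:
    with experiment(tmp) as cfg:
        sample = np.random.rand(3)
    print(cfg["ID"], (cfg["SDIR"] / "duration.txt").exists())
```

Using a context manager ties setup and teardown together, so the teardown step cannot be forgotten when the body fails.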

In [None]:
# Demonstrate basic start functionality (simplified for notebook)
print("=== Experiment Setup Demo ===")

# In a real script, you would use:
# CONFIG, sys.stdout, sys.stderr, plt, CC = gen.start(sys, plt)

# For notebook demo, we'll simulate the setup
try:
    # Generate a unique experiment ID
    experiment_id = gen.gen_ID(N=4)
except AttributeError:
    # Fall back to the repro submodule if gen does not re-export gen_ID
    from scitex.repro._gen_ID import gen_ID
    experiment_id = gen_ID(N=4)
print(f"✅ Generated experiment ID: {experiment_id}")

# Demonstrate configuration loading
try:
    from scitex.io._load_configs import load_configs
    configs = load_configs(verbose=False)
    print(f"✅ Loaded configurations: {type(configs)}")
    print(f"   Available config keys: {list(configs.keys())[:5]}...")
except Exception as e:
    print(f"⚠️ Config loading: {e}")
    configs = {'demo': True, 'seed': 42}

# Demonstrate seed fixing for reproducibility
try:
    from scitex.repro._fix_seeds import fix_seeds
    fix_seeds(seed=42, verbose=True)
    print("✅ Random seeds fixed for reproducibility")
except Exception as e:
    print(f"⚠️ Seed fixing: {e}")
    np.random.seed(42)
    print("✅ NumPy seed fixed as fallback")

print(f"\nExperiment started at: {datetime.now().strftime('%H:%M:%S')}")

## 2. Data Normalization and Transformation

The gen module provides various normalization and transformation functions for data preprocessing.
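The cell below calls these functions through `gen` without showing what they compute. As a reference point, here are plain NumPy versions of the textbook definitions the names suggest. The `_ref` functions are hypothetical, and whether scitex uses the same conventions (population vs. sample standard deviation, default percentiles, axis handling) is an assumption to check against its docstrings.

```python
import numpy as np


def to_01_ref(x):
    """Min-max scaling to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def to_z_ref(x):
    """Z-score: subtract the mean, divide by the population std (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def unbias_ref(x):
    """Center on zero by removing the mean; the spread is unchanged."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()


def clip_perc_ref(x, lower=2.5, upper=97.5):
    """Clip values outside the [lower, upper] percentile range (assumed defaults)."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(to_01_ref(x))
print(to_z_ref(x).mean(), to_z_ref(x).std())
print(clip_perc_ref(x, 5, 95))
```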

In [None]:
# Generate sample data for normalization demos
np.random.seed(42)
data = np.random.randn(100) * 10 + 5  # Mean=5, std=10
data_with_outliers = np.concatenate([data, [50, -30, 45]])  # Add outliers

print("=== Data Normalization Utilities ===")
print(f"Original data: mean={data.mean():.2f}, std={data.std():.2f}")
print(f"Data range: [{data.min():.2f}, {data.max():.2f}]")

# Test normalization functions
norm_functions = [
    ('to_01', 'Normalize to [0, 1] range'),
    ('to_z', 'Z-score normalization'),
    ('unbias', 'Remove bias (center around 0)'),
    ('clip_perc', 'Clip outliers by percentile')
]

normalized_data = {}

for func_name, description in norm_functions:
    try:
        func = getattr(gen, func_name)
        
        if func_name == 'clip_perc':
            # clip_perc takes lower and upper percentile bounds; apply it to the data with outliers
            result = func(data_with_outliers, 5, 95)  # Clip values below the 5th and above the 95th percentile
        else:
            result = func(data)
            
        normalized_data[func_name] = result
        print(f"✅ {func_name}: {description}")
        print(f"   Result: mean={result.mean():.3f}, std={result.std():.3f}, range=[{result.min():.3f}, {result.max():.3f}]")
        
    except Exception as e:
        print(f"⚠️ {func_name} failed: {e}")

In [None]:
# Visualize normalization results
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

# Original data
axes[0].hist(data, bins=20, alpha=0.7, color='blue')
axes[0].set_title('Original Data')
axes[0].set_ylabel('Frequency')

# Normalized data
colors = ['red', 'green', 'orange', 'purple']
for i, (name, norm_data) in enumerate(normalized_data.items(), 1):
    if i < len(axes):
        axes[i].hist(norm_data, bins=20, alpha=0.7, color=colors[i-1])
        axes[i].set_title(f'{name.replace("_", " ").title()}')
        axes[i].set_ylabel('Frequency')

# Hide every subplot left unused (one with four results, more if a normalization failed)
for ax in axes[len(normalized_data) + 1:]:
    ax.set_visible(False)

plt.tight_layout()
plt.suptitle('Data Normalization Comparison', y=1.02, fontsize=14)
plt.show()

## 3. Array and Data Utilities

Various utilities for array manipulation and data processing.
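The exact rounding and ranking rules of the numeric helpers are not stated here. The sketch below assumes `to_even`/`to_odd` round down to the nearest even/odd integer and that `to_rank` returns 0-based ordinal ranks; the `_ref` names are hypothetical, and scitex's versions may differ (for example, in how ties are ranked).

```python
import math

import numpy as np


def to_even_ref(n):
    """Largest even integer <= n (assumed rounding rule)."""
    m = math.floor(n)
    return m - (m % 2)  # Python's % is non-negative for a positive divisor


def to_odd_ref(n):
    """Largest odd integer <= n (assumed rounding rule)."""
    m = math.floor(n)
    return m if m % 2 == 1 else m - 1


def to_rank_ref(x):
    """0-based ordinal ranks: the smallest value gets rank 0 (ties not averaged)."""
    x = np.asarray(x)
    ranks = np.empty(len(x), dtype=int)
    ranks[np.argsort(x, kind="stable")] = np.arange(len(x))
    return ranks


print([to_even_ref(v) for v in [1, 3, 5]])
print([to_odd_ref(v) for v in [2, 4, 6]])
print(to_rank_ref([10, 5, 8, 2, 15]))
```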

In [None]:
print("=== Array and Data Utilities ===")

# Test array manipulation functions
test_array = np.array([[1, 2, 3], [4, 5, 6]])
test_numbers = [1, 3, 5, 7, 9, 11]

array_functions = [
    ('transpose', 'Matrix transpose'),
    ('to_even', 'Convert numbers to even'),
    ('to_odd', 'Convert numbers to odd'),
    ('to_rank', 'Convert to ranks')
]

for func_name, description in array_functions:
    try:
        func = getattr(gen, func_name)
        
        if func_name == 'transpose':
            result = func(test_array)
            print(f"✅ {func_name}: {description}")
            print(f"   Original shape: {test_array.shape} -> Result shape: {result.shape}")
            
        elif func_name in ['to_even', 'to_odd']:
            result = [func(x) for x in test_numbers[:3]]  # Test first 3 numbers
            print(f"✅ {func_name}: {description}")
            print(f"   {test_numbers[:3]} -> {result}")
            
        elif func_name == 'to_rank':
            result = func(np.array([10, 5, 8, 2, 15]))
            print(f"✅ {func_name}: {description}")
            print(f"   [10, 5, 8, 2, 15] -> {result}")
            
    except Exception as e:
        print(f"⚠️ {func_name} failed: {e}")

# Demonstrate variable information utility
try:
    var_info = gen.var_info(test_array)
    print(f"\n✅ var_info: Variable analysis")
    print(f"   Array info: {var_info}")
except Exception as e:
    print(f"⚠️ var_info failed: {e}")

## 4. Text and String Processing

Utilities for text processing and string manipulation.
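The same three operations can be approximated with the standard library, which is handy for checking what the `gen` versions return. This is a hypothetical sketch: the minor-word list in `title_case_ref` and the underscore convention in `title2path_ref` are assumptions, and `textwrap.fill` is a stand-in for `gen.wrap`, not its implementation.

```python
import re
import textwrap

# Assumed list of words kept lowercase unless they start the title
MINOR_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "or"}


def title_case_ref(text):
    """Capitalize each word except minor words that are not first."""
    words = text.split()
    return " ".join(
        w.lower() if (i > 0 and w.lower() in MINOR_WORDS) else w.capitalize()
        for i, w in enumerate(words)
    )


def title2path_ref(title):
    """Lowercase, replace runs of non-alphanumerics with '_', trim the ends."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


long_text = "This is a very long text that needs to be wrapped."
print(title_case_ref("analysis of neural network data"))
print(title2path_ref("Neural Network: Analysis (v2)"))
print(textwrap.fill(long_text, width=20))
```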

In [None]:
print("=== Text and String Processing ===")

# Test text processing functions
test_texts = [
    "hello world example",
    "machine learning experiment",
    "neural network analysis"
]

text_functions = [
    ('title_case', 'Convert to title case'),
    ('title2path', 'Convert title to file path'),
    ('wrap', 'Text wrapping')
]

for func_name, description in text_functions:
    try:
        func = getattr(gen, func_name)
        
        if func_name == 'title_case':
            results = [func(text) for text in test_texts]
            print(f"✅ {func_name}: {description}")
            for orig, result in zip(test_texts, results):
                print(f"   '{orig}' -> '{result}'")
                
        elif func_name == 'title2path':
            results = [func(text) for text in test_texts]
            print(f"✅ {func_name}: {description}")
            for orig, result in zip(test_texts, results):
                print(f"   '{orig}' -> '{result}'")
                
        elif func_name == 'wrap':
            long_text = "This is a very long text that needs to be wrapped to demonstrate the text wrapping functionality of the scitex gen module."
            result = func(long_text, width=30)
            print(f"✅ {func_name}: {description}")
            print(f"   Wrapped text (width=30):\n{result}")
            
    except Exception as e:
        print(f"⚠️ {func_name} failed: {e}")

## 5. System and Environment Utilities

Functions for system interaction, environment detection, and host management.
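Each of these checks has a standard-library counterpart. The sketch below uses `IPython.get_ipython()` for notebook detection, `socket.gethostname()` for the host, `importlib.metadata.distributions()` for installed packages, and `subprocess.run` for a command. It is an illustration, not how `gen` implements them: `gen.run_shellcommand` is called with a shell string below, whereas this sketch passes an argument list to avoid shell quoting.

```python
import socket
import subprocess
import sys
from importlib.metadata import distributions


def in_ipython():
    """True only when an IPython shell or kernel is actually running."""
    ipy = sys.modules.get("IPython")
    return ipy is not None and ipy.get_ipython() is not None


hostname = socket.gethostname()
n_packages = sum(1 for _ in distributions())

# Run a child Python process instead of a shell command, for portability
out = subprocess.run(
    [sys.executable, "-c", "print('Hello from subprocess')"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print(in_ipython(), hostname, n_packages, out)
```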

In [None]:
print("=== System and Environment Utilities ===")

# Test environment detection
env_functions = [
    ('is_ipython', 'Check if running in IPython/Jupyter'),
    ('is_script', 'Check if running as script'),
    ('check_host', 'Check current host information')
]

for func_name, description in env_functions:
    try:
        func = getattr(gen, func_name)
        
        result = func()
        print(f"✅ {func_name}: {description} -> {result}")
            
    except Exception as e:
        print(f"⚠️ {func_name} failed: {e}")

# Test package listing
try:
    packages = gen.list_packages()
    print(f"\n✅ list_packages: Found {len(packages)} installed packages")
    print(f"   Sample packages: {list(packages.keys())[:5]}")
except Exception as e:
    print(f"⚠️ list_packages failed: {e}")

# Test shell command execution
try:
    result = gen.run_shellcommand("echo 'Hello from shell'")
    print(f"\n✅ run_shellcommand: Shell execution successful")
    print(f"   Output: {result.strip()}")
except Exception as e:
    print(f"⚠️ run_shellcommand failed: {e}")

## 6. Caching and Performance Optimization

Caching utilities for improving performance of repeated computations.
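The cell below calls `.cache_info()` on the decorated function, which is the statistics method that `functools.lru_cache` wrappers expose. Assuming `gen.cache` behaves like `lru_cache` (not confirmed here), the following standard-library version shows the same hit/miss behavior.

```python
import functools
import time


@functools.lru_cache(maxsize=128)
def slow_square_sum(n):
    """Sum of squares below n, with an artificial delay."""
    time.sleep(0.05)
    return sum(i * i for i in range(n))


t0 = time.perf_counter()
first = slow_square_sum(1000)   # miss: runs the body
t1 = time.perf_counter()
second = slow_square_sum(1000)  # hit: returns the stored result
t2 = time.perf_counter()
print(first == second, t1 - t0, t2 - t1, slow_square_sum.cache_info())
```

Arguments must be hashable, and cached results stay in memory until `cache_clear()` or eviction, so a bounded `maxsize` is safer when return values are large.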

In [None]:
print("=== Caching and Performance Optimization ===")

import time

# Demonstrate caching with a slow function
@gen.cache
def slow_computation(n):
    """Simulate a slow computation."""
    time.sleep(0.1)  # Simulate work
    return sum(i**2 for i in range(n))

print("Testing caching performance:")

# First call (uncached); perf_counter is the recommended clock for timing
start_time = time.perf_counter()
result1 = slow_computation(1000)
first_call_time = time.perf_counter() - start_time

# Second call (cached)
start_time = time.perf_counter()
result2 = slow_computation(1000)
second_call_time = time.perf_counter() - start_time

print(f"✅ First call (uncached): {first_call_time:.3f}s -> result: {result1}")
print(f"✅ Second call (cached): {second_call_time:.6f}s -> result: {result2}")
# Guard against a zero duration on coarse clocks
print(f"✅ Speedup: {first_call_time / max(second_call_time, 1e-9):.1f}x faster")
print(f"✅ Results match: {result1 == result2}")

# Demonstrate cache info
try:
    cache_info = slow_computation.cache_info()
    print(f"\n✅ Cache statistics: {cache_info}")
except AttributeError:
    print("\n⚠️ Cache info not available")

## 7. Data Format Conversion

Utilities for converting between different data formats.
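The cell below passes an XML string to `gen.xml2dict`; its accepted input (string or file path) and output layout are not documented here. For comparison, a minimal recursive converter built on the standard `xml.etree.ElementTree` module looks like this. It is a sketch that assumes unique child tags and leaves leaf values as strings, so numeric fields such as `learning_rate` still need an explicit `float()`.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Leaf elements map to their stripped text; others to nested dicts."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    # Assumes child tags are unique; a repeated tag would overwrite earlier ones
    return {child.tag: element_to_dict(child) for child in children}


def xml_string_to_dict(xml_text):
    """Parse an XML string and wrap the result under the root tag."""
    root = ET.fromstring(xml_text.strip())
    return {root.tag: element_to_dict(root)}


doc = """
<experiment>
    <name>Neural Network Test</name>
    <parameters><learning_rate>0.001</learning_rate><epochs>100</epochs></parameters>
</experiment>
"""
d = xml_string_to_dict(doc)
lr = float(d["experiment"]["parameters"]["learning_rate"])
print(d["experiment"]["name"], lr)
```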

In [None]:
print("=== Data Format Conversion ===")

# Test XML to dictionary conversion
sample_xml = '''
<experiment>
    <name>Neural Network Test</name>
    <parameters>
        <learning_rate>0.001</learning_rate>
        <epochs>100</epochs>
        <batch_size>32</batch_size>
    </parameters>
    <results>
        <accuracy>0.95</accuracy>
        <loss>0.05</loss>
    </results>
</experiment>
'''

try:
    # Convert XML to dictionary
    xml_dict = gen.xml2dict(sample_xml)
    print("✅ XML to Dictionary conversion:")
    print(f"   Experiment name: {xml_dict['experiment']['name']}")
    print(f"   Learning rate: {xml_dict['experiment']['parameters']['learning_rate']}")
    print(f"   Accuracy: {xml_dict['experiment']['results']['accuracy']}")
    
except Exception as e:
    print(f"⚠️ XML conversion failed: {e}")

# Test alternate keyword argument utility
try:
    # This utility helps with parameter compatibility
    def example_function(**kwargs):
        # Use alternate_kwarg to handle different parameter names
        lr = gen.alternate_kwarg(kwargs, ['learning_rate', 'lr', 'alpha'], default=0.01)
        return f"Learning rate: {lr}"
    
    # Test with different parameter names
    result1 = example_function(learning_rate=0.001)
    result2 = example_function(lr=0.01)
    result3 = example_function(alpha=0.1)
    result4 = example_function()  # Use default
    
    print("\n✅ Alternate keyword argument handling:")
    print(f"   {result1}")
    print(f"   {result2}")
    print(f"   {result3}")
    print(f"   {result4}")
    
except Exception as e:
    print(f"⚠️ Alternate kwarg failed: {e}")

## 8. Advanced Utilities

Dimension handling, time stamping, and other advanced utilities.
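The cell below only constructs a `DimHandler`. A common technique behind such helpers is to move the axes you want to keep to the end, flatten the remaining axes into one batch axis, apply a function, and then undo the reshape. Here is a hypothetical NumPy sketch of that fold/unfold idea; `fold_except` and `unfold` are illustrative names, not scitex functions, and `DimHandler` may work differently.

```python
import numpy as np


def fold_except(x, keep_axes):
    """Move keep_axes to the end and flatten all other axes into axis 0.

    Returns the folded array plus the permutation and batch shape needed to undo it.
    """
    keep_axes = [a % x.ndim for a in keep_axes]
    other_axes = [a for a in range(x.ndim) if a not in keep_axes]
    perm = other_axes + keep_axes
    moved = np.transpose(x, perm)
    batch_shape = moved.shape[: len(other_axes)]
    folded = moved.reshape((-1,) + moved.shape[len(other_axes):])
    return folded, perm, batch_shape


def unfold(y, perm, batch_shape):
    """Restore the batch axes, then invert the axis permutation."""
    moved = y.reshape(tuple(batch_shape) + y.shape[1:])
    return np.transpose(moved, np.argsort(perm))


data_3d = np.random.default_rng(0).random((10, 5, 3))
folded, perm, batch_shape = fold_except(data_3d, keep_axes=[1])  # shape (30, 5)
centered = folded - folded.mean(axis=1, keepdims=True)           # center along kept axis
restored = unfold(centered, perm, batch_shape)                   # back to (10, 5, 3)
print(folded.shape, restored.shape)
```

Because `unfold` only reshapes and transposes, the round trip is exact as long as the applied function preserves the shape of the kept axes.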

In [None]:
print("=== Advanced Utilities ===")

# Test DimHandler for dimension management
try:
    # Create multi-dimensional data
    data_3d = np.random.rand(10, 5, 3)
    
    dim_handler = gen.DimHandler()
    print(f"✅ DimHandler created for dimension management")
    print(f"   Original data shape: {data_3d.shape}")
    
    # Only construction is shown; run help(gen.DimHandler) to see the methods it provides
    
except Exception as e:
    print(f"⚠️ DimHandler failed: {e}")

# Test TimeStamper for time management
try:
    timestamper = gen.TimeStamper()
    print("\n✅ TimeStamper created for time tracking")
    # The timestamp below comes from datetime.now(), not from the TimeStamper instance
    print(f"   Current timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    
except Exception as e:
    print(f"⚠️ TimeStamper failed: {e}")

# Test source code inspection (uses the standard-library inspect module)
try:
    import inspect
    
    def sample_function(x, y=10):
        """A sample function for inspection."""
        return x + y
    
    source = inspect.getsource(sample_function)
    print("\n✅ Source code inspection:")
    print(f"   Function signature: {inspect.signature(sample_function)}")
    print(f"   Docstring: {sample_function.__doc__}")
    print(f"   Source:\n{source}")
    
except Exception as e:
    print(f"⚠️ Source inspection failed: {e}")

# Test symlog utility (if available)
try:
    # Symlog is useful for data with wide dynamic range
    data_wide_range = np.array([-1000, -10, -1, 0, 1, 10, 1000])
    symlog_result = gen.symlog(data_wide_range)
    print(f"\n✅ Symlog transformation:")
    print(f"   Original: {data_wide_range}")
    print(f"   Symlog: {symlog_result}")
    
except Exception as e:
    print(f"⚠️ Symlog failed: {e}")
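The cell above applies `gen.symlog` without stating its formula. One common symmetric-log definition is `sign(x) * log10(1 + |x| / C)`, which is odd, monotonic, and roughly linear for `|x|` much smaller than `C`. The sketch below implements that variant with `C` as a `linthresh` parameter; whether scitex uses this exact form (or matplotlib's piecewise symlog scale) is an assumption.

```python
import numpy as np


def symlog_ref(x, linthresh=1.0):
    """sign(x) * log10(1 + |x| / linthresh): odd, monotonic, near-linear around 0."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log10(1.0 + np.abs(x) / linthresh)


vals = np.array([-1000, -10, -1, 0, 1, 10, 1000])
print(np.round(symlog_ref(vals), 3))
```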

## 9. Integration Example: Complete Workflow

Demonstrate how gen utilities work together in a typical workflow.

In [None]:
print("=== Complete Workflow Integration ===")

# Simulate a complete research workflow
class ExperimentWorkflow:
    def __init__(self, name):
        self.name = gen.title_case(name) if hasattr(gen, 'title_case') else name.title()
        self.start_time = datetime.now()
        print(f"🧪 Starting experiment: {self.name}")
        
    @gen.cache
    def load_data(self, size=1000):
        """Load and preprocess data (cached for performance)."""
        # Simulate data loading
        data = np.random.randn(size) * 10 + 5
        return data
    
    def preprocess_data(self, data):
        """Preprocess data using gen utilities."""
        # Normalize data
        if hasattr(gen, 'to_z'):
            normalized = gen.to_z(data)
        else:
            normalized = (data - data.mean()) / data.std()
            
        # Clip outliers
        if hasattr(gen, 'clip_perc'):
            clipped = gen.clip_perc(normalized, 5, 95)
        else:
            p5, p95 = np.percentile(normalized, [5, 95])
            clipped = np.clip(normalized, p5, p95)
            
        return clipped
    
    def analyze_data(self, data):
        """Analyze processed data."""
        stats = {
            'mean': data.mean(),
            'std': data.std(),
            'min': data.min(),
            'max': data.max(),
            'samples': len(data)
        }
        return stats
    
    def generate_report(self, stats):
        """Generate a formatted report."""
        duration = datetime.now() - self.start_time
        
        report = f"""
=== Experiment Report: {self.name} ===
Duration: {duration.total_seconds():.2f} seconds
Data Statistics:
  - Samples: {stats['samples']}
  - Mean: {stats['mean']:.3f}
  - Std: {stats['std']:.3f}
  - Range: [{stats['min']:.3f}, {stats['max']:.3f}]
Generated at: {datetime.now().strftime('%H:%M:%S')}
        """
        return report.strip()

# Run the complete workflow
workflow = ExperimentWorkflow("neural signal analysis")

# Step 1: Load data (cached)
raw_data = workflow.load_data(1000)
print(f"📊 Loaded {len(raw_data)} samples")

# Step 2: Preprocess data
processed_data = workflow.preprocess_data(raw_data)
print(f"🔧 Preprocessed data: {processed_data.shape}")

# Step 3: Analyze data
statistics = workflow.analyze_data(processed_data)
print(f"📈 Analysis complete")

# Step 4: Generate report
final_report = workflow.generate_report(statistics)
print(f"\n{final_report}")

# Test caching benefit
print("\n🚀 Testing cache performance:")
start = datetime.now()
cached_data = workflow.load_data(1000)  # Should be cached
cache_time = (datetime.now() - start).total_seconds()
print(f"   Cached data retrieval: {cache_time:.6f} seconds")
print(f"   Data integrity: {np.array_equal(raw_data, cached_data)}")

## Summary

This tutorial walked through the main utilities in the SciTeX Gen module:

### ✅ **Experiment Lifecycle Management**
- Project setup with `start()` and `close()` (described for scripts; the notebook simulates the setup step by step)
- Unique experiment IDs for tracking runs, plus seed fixing for reproducibility
- Configuration loading and logging

### ✅ **Data Processing Utilities**
- Normalization functions (`to_01`, `to_z`, `unbias`, `clip_perc`)
- Array manipulation (`transpose`, `to_even`, `to_odd`, `to_rank`)
- Statistical transformations and outlier handling

### ✅ **Text and String Processing**
- Text formatting (`title_case`, `title2path`)
- String manipulation and path generation
- Text wrapping and formatting utilities

### ✅ **System and Environment**
- Environment detection (`is_ipython`, `is_script`)
- Host and system information (`check_host`)
- Package management (`list_packages`)
- Shell command execution

### ✅ **Performance Optimization**
- Function caching with the `@cache` decorator
- LRU-style cache statistics via `cache_info()`
- Timing comparison of cached and uncached calls

### ✅ **Data Format Conversion**
- XML to dictionary conversion
- Parameter handling and compatibility
- Format transformation utilities

### ✅ **Advanced Features**
- Dimension handling (`DimHandler`)
- Time management (`TimeStamper`)
- Source code inspection (via the standard-library `inspect` module)
- Specialized transformations (`symlog`)

### Key Applications:
- **Research Workflows**: Complete experiment lifecycle management
- **Data Science**: Preprocessing and normalization pipelines
- **Development Tools**: Caching, debugging, and optimization
- **System Integration**: Environment detection and configuration
- **Reproducibility**: Seed fixing and experiment tracking

The SciTeX Gen module bundles these helpers so that experiment setup, preprocessing, and reporting can share one toolkit, as the workflow example in Section 9 shows.