# SciTeX General Utilities (gen) Module

This notebook demonstrates the general-purpose utilities in the SciTeX `gen` module. They cover four areas of a scientific computing workflow:

- **Environment Setup**: Reproducible experiment initialization
- **Data Processing**: Normalization, transformation, and type handling
- **Utility Functions**: Caching, debugging, and system interaction
- **Workflow Management**: Project organization and metadata handling

Many SciTeX workflows build on `gen`; the sections below cover each area and end with a workflow that combines them.

## Imports and Setup

In [None]:
import scitex as stx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import time

# Configure SciTeX for this notebook
stx.repro.fix_seeds(42)
print("SciTeX Gen Module Demonstration")
print(f"SciTeX version: {stx.__version__}")

## 1. Environment Setup and Experiment Initialization

`stx.gen.start` initializes a reproducible experiment: it sets up logging, fixes random seeds, and assigns a unique experiment ID.

In [None]:
# Initialize a reproducible experiment environment
# This sets up logging, random seeds, and unique IDs
experiment_config = stx.gen.start(
    description="Demo experiment with gen utilities",
    config_path="./config/demo.yaml",  # Optional config file
    verbose=True
)

print(f"Experiment ID: {experiment_config.ID}")
print(f"Started at: {experiment_config.timestamp}")
print(f"Working directory: {experiment_config.spath}")
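
To make "reproducible initialization" concrete, here is a minimal standard-library sketch of what the cell above attributes to `stx.gen.start`: a fixed random seed, a timestamp, and a unique ID. The name `start_sketch` and the returned fields are hypothetical; this is not SciTeX's implementation.

```python
import random
import uuid
from datetime import datetime, timezone


def start_sketch(seed=42):
    """Hypothetical stand-in: seed the RNG and return run metadata."""
    random.seed(seed)  # same seed -> same pseudo-random sequence
    return {
        "ID": uuid.uuid4().hex[:8],  # short unique run identifier
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "seed": seed,
    }


config_a = start_sketch(seed=42)
draws_a = [random.random() for _ in range(3)]

config_b = start_sketch(seed=42)
draws_b = [random.random() for _ in range(3)]

print(draws_a == draws_b)  # identical draws because the seed was reset
print(config_a["ID"], config_b["ID"])  # IDs differ between runs
```

The ID comes from `uuid4`, which does not use the seeded generator, so two runs with the same seed still get distinct identifiers.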

## 2. Time Tracking and Timestamps

Track execution time and create detailed timestamps for your experiments.

In [None]:
# Create a TimeStamper for tracking experiment progress
ts = stx.gen.TimeStamper(is_simple=False)

# Record timestamps with comments
ts.stamp("Experiment started")
time.sleep(0.1)  # Simulate some work

ts.stamp("Data loading phase")
time.sleep(0.2)  # Simulate data loading

ts.stamp("Processing phase")
time.sleep(0.15)  # Simulate processing

ts.stamp("Analysis complete")

# Display timestamp history
print("Timestamp History:")
print(ts.get_df())

# Calculate elapsed time between specific timestamps
elapsed = ts.get_elapsed_time(0, 3)  # From first to last timestamp
print(f"\nTotal elapsed time: {elapsed:.3f} seconds")
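
The idea behind labeled timestamps can be sketched without SciTeX. The class below is a hypothetical stand-in (`MiniStamper` is not part of any library) that stores a label and a clock reading per stamp and computes elapsed time between two stamps by index, mirroring the `get_elapsed_time(0, 3)` call above.

```python
import time


class MiniStamper:
    """Hypothetical stand-in that records (label, perf_counter) pairs."""

    def __init__(self):
        self.records = []

    def stamp(self, comment=""):
        # perf_counter is monotonic, so differences are never negative
        self.records.append((comment, time.perf_counter()))

    def elapsed(self, start_index, end_index):
        """Seconds between two recorded stamps, addressed by index."""
        return self.records[end_index][1] - self.records[start_index][1]


ms = MiniStamper()
ms.stamp("start")
time.sleep(0.05)
ms.stamp("end")
print(f"{ms.elapsed(0, 1):.3f} s between 'start' and 'end'")
```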

## 3. Data Normalization and Transformation

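This section applies four preprocessing helpers: `to_z` (z-score normalization), `to_01` (scaling to [0, 1]), `clip_perc` (percentile-based clipping), and `unbias` (mean removal).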

In [None]:
# Generate sample data for demonstration
np.random.seed(42)
raw_data = np.random.randn(1000) * 5 + 10  # Mean=10, std=5
outlier_data = np.concatenate([raw_data, [50, -20, 100]])  # Add outliers

print(f"Original data range: [{raw_data.min():.2f}, {raw_data.max():.2f}]")
print(f"With outliers range: [{outlier_data.min():.2f}, {outlier_data.max():.2f}]")

# Various normalization techniques
z_normalized = stx.gen.to_z(raw_data)  # Z-score normalization
unit_normalized = stx.gen.to_01(raw_data)  # Scale to [0, 1]
clipped_data = stx.gen.clip_perc(outlier_data, percentile=95)  # Clip outliers by percentile
unbiased = stx.gen.unbias(raw_data)  # Remove mean bias

print(f"\nZ-normalized: mean={z_normalized.mean():.3f}, std={z_normalized.std():.3f}")
print(f"[0,1] normalized: min={unit_normalized.min():.3f}, max={unit_normalized.max():.3f}")
print(f"Clipped data range: [{clipped_data.min():.2f}, {clipped_data.max():.2f}]")
print(f"Unbiased data: mean={unbiased.mean():.6f}")
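
For reference, the arithmetic behind these normalizations can be written in plain Python. This is a sketch of the standard definitions, not SciTeX's code. The function names are hypothetical, and the two-sided `lower_q`/`upper_q` signature of `clip_percentile` is an assumption that may differ from `stx.gen.clip_perc`.

```python
import math
import statistics


def z_score(values):
    """Subtract the mean, divide by the population std (NumPy's default ddof=0)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0-100) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def clip_percentile(values, lower_q, upper_q):
    """Cap values at the given percentiles; the sample size is unchanged."""
    ordered = sorted(values)
    lo, hi = percentile(ordered, lower_q), percentile(ordered, upper_q)
    return [min(max(v, lo), hi) for v in values]


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(z_score(sample))
print(min_max(sample))
print(clip_percentile(sample, 0, 75))  # 100.0 is capped at the 75th percentile
```

Note that clipping caps extreme values at the bound rather than dropping them, so the output has the same length as the input.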

In [None]:
# Visualize the normalization effects
fig, axes = stx.plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].hist(raw_data, bins=50, alpha=0.7, color='blue')
axes[0, 0].set_title('Original Data')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

axes[0, 1].hist(z_normalized, bins=50, alpha=0.7, color='green')
axes[0, 1].set_title('Z-Normalized Data')
axes[0, 1].set_xlabel('Z-Score')
axes[0, 1].set_ylabel('Frequency')

axes[1, 0].hist(unit_normalized, bins=50, alpha=0.7, color='red')
axes[1, 0].set_title('[0,1] Normalized Data')
axes[1, 0].set_xlabel('Normalized Value')
axes[1, 0].set_ylabel('Frequency')

axes[1, 1].hist(clipped_data, bins=50, alpha=0.7, color='purple')
axes[1, 1].set_title('Outlier-Clipped Data')
axes[1, 1].set_xlabel('Value')
axes[1, 1].set_ylabel('Frequency')

plt.tight_layout()
stx.io.save(fig, './figures/normalization_comparison.png', symlink_from_cwd=True)
plt.show()

## 4. Caching and Performance Optimization

The `stx.gen.cache` decorator stores a function's results so that repeated calls with the same arguments skip the computation.

In [None]:
# Demonstrate caching with expensive computation
@stx.gen.cache
def expensive_computation(n):
    """Simulate an expensive computation."""
    time.sleep(0.1)  # Simulate computational delay
    return sum(i**2 for i in range(n))

# Time the first call (not cached); perf_counter is a high-resolution timer
start_time = time.perf_counter()
result1 = expensive_computation(1000)
first_call_time = time.perf_counter() - start_time

# Time the second call (cached)
start_time = time.perf_counter()
result2 = expensive_computation(1000)
second_call_time = time.perf_counter() - start_time

print(f"First call result: {result1}")
print(f"Second call result: {result2}")
print(f"First call time: {first_call_time:.4f} seconds")
print(f"Second call time: {second_call_time:.6f} seconds")
# Guard against a zero reading from a very fast cached call
if second_call_time > 0:
    print(f"Speedup: {first_call_time/second_call_time:.0f}x faster")
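
The same memoization pattern is available in the standard library. The sketch below assumes `stx.gen.cache` behaves like `functools.lru_cache`, which the notebook does not confirm. It counts real executions to show that the second call never runs the function body.

```python
import time
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def slow_sum_of_squares(n):
    """Sum of squares below n, with an artificial delay."""
    global call_count
    call_count += 1  # counts real executions, not cache hits
    time.sleep(0.05)
    return sum(i**2 for i in range(n))


first = slow_sum_of_squares(1000)   # computed
second = slow_sum_of_squares(1000)  # served from the cache

print(first == second)
print(call_count)                        # the body ran only once
print(slow_sum_of_squares.cache_info())  # hits=1, misses=1
```

A cache of this kind keys on the arguments, so it suits deterministic functions; a function that draws random numbers would return the stored draw on every repeated call.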

## 5. Array and Data Type Utilities

Inspect arrays of different dimensionality and apply simple transformations: transpose, even/odd length, and ranks.

In [None]:
# Generate sample arrays with different dimensions
array_1d = np.random.randn(100)
array_2d = np.random.randn(10, 20)
array_3d = np.random.randn(5, 10, 8)

# DimHandler is instantiated here but not exercised further in this demo
dim_handler = stx.gen.DimHandler()

arrays = [array_1d, array_2d, array_3d]
for i, arr in enumerate(arrays, 1):
    print(f"Array {i}D:")
    print(f"  Shape: {arr.shape}")
    print(f"  Dimensions: {arr.ndim}")
    print(f"  Total elements: {arr.size}")
    
    # Get variable information
    var_info = stx.gen.var_info(arr)
    print(f"  Variable info: {var_info}")
    print()

In [None]:
# Demonstrate array transformations
test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(f"Original array: {test_array}")
print(f"Transposed: {stx.gen.transpose(test_array)}")
print(f"To even length: {stx.gen.to_even(test_array)}")
print(f"To odd length: {stx.gen.to_odd(test_array)}")
print(f"To ranks: {stx.gen.to_rank(test_array)}")

# Test with different array shapes
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(f"\nOriginal matrix:\n{matrix}")
print(f"Transposed matrix:\n{stx.gen.transpose(matrix)}")
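
Converting values to ranks is the least obvious of these transformations, so here is a plain-Python sketch of one common definition: 1-based ranks with tied values sharing their average position. The notebook does not state how `stx.gen.to_rank` handles ties, so the tie rule and the name `rank_with_ties` are assumptions.

```python
def rank_with_ties(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the group while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


print(rank_with_ties([10, 30, 20, 30]))  # [1.0, 3.5, 2.0, 3.5]
```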

## 6. File and Path Utilities

Convert a title to a path-safe file name, save data under that name, and point a symlink at the latest file.

In [None]:
# Create sample data for file operations
sample_data = {
    'experiment_id': 'EXP_001',
    'parameters': {'learning_rate': 0.01, 'batch_size': 32},
    'results': [0.85, 0.92, 0.89, 0.91]
}

# Create a temporary directory structure
temp_dir = Path('./temp_gen_demo')
temp_dir.mkdir(exist_ok=True)

# Convert title to path format
title = "My Experiment Results - Analysis 2024"
path_name = stx.gen.title2path(title)
print(f"Title: {title}")
print(f"Path-safe name: {path_name}")

# Save the sample data under the path-safe name
data_file = temp_dir / f"{path_name}.json"
stx.io.save(sample_data, data_file, symlink_from_cwd=True)

# Create symbolic link
link_path = temp_dir / "latest_experiment.json"
stx.gen.symlink(data_file, link_path)

print(f"\nCreated data file: {data_file}")
print(f"Created symlink: {link_path}")
# is_symlink() checks the link itself; exists() would follow it to the target
print(f"Symlink exists: {link_path.is_symlink()}")
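
Both steps can be approximated with the standard library. The sketch below is an assumption about what "path-safe" means (lowercase, non-alphanumeric runs collapsed to underscores) and may not match the output of `stx.gen.title2path`; `slugify_title` and `link_latest` are hypothetical names.

```python
import re
import tempfile
from pathlib import Path


def slugify_title(title):
    """Lowercase the title and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def link_latest(directory, file_name, link_name="latest.json"):
    """Create `link_name` in `directory` pointing at `file_name` next to it."""
    link = Path(directory) / link_name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier run
    # A relative target is resolved from the link's own directory,
    # so the bare file name is enough when both share a folder.
    link.symlink_to(file_name)
    return link


name = slugify_title("My Experiment Results - Analysis 2024")
print(name)  # my_experiment_results_analysis_2024

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / f"{name}.json"
    data_path.write_text("{}")
    link = link_latest(tmp, data_path.name)
    link_is_symlink = link.is_symlink()   # True even if the target is missing
    link_resolves = link.exists()         # True only if the target exists
    print(link_is_symlink, link_resolves)
```

Creating symlinks may require extra privileges on Windows, so this sketch is most reliable on Linux and macOS.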

## 7. System and Environment Utilities

Detect the execution environment (IPython/Jupyter or script), query host information, and list installed packages.

In [None]:
# Check execution environment
print(f"Running in IPython/Jupyter: {stx.gen.is_ipython()}")
print(f"Running as script: {stx.gen.is_script()}")

# Check host information
host_info = stx.gen.check_host()
print(f"\nHost information:")
for key, value in host_info.items():
    print(f"  {key}: {value}")

# List installed packages (first few)
packages = stx.gen.list_packages()
print(f"\nFirst 10 installed packages:")
for i, pkg in enumerate(packages[:10]):
    print(f"  {i+1}. {pkg}")
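
For comparison, similar information is available from the standard library. This sketch does not reproduce the return formats of the SciTeX helpers above, which the notebook does not specify; the function names and dictionary keys are hypothetical.

```python
import platform
from importlib import metadata


def running_in_ipython():
    """True when the IPython-injected `get_ipython` name is defined."""
    try:
        get_ipython  # noqa: F821 - only defined inside IPython/Jupyter
    except NameError:
        return False
    return True


def host_summary():
    """Basic host facts from the standard library."""
    return {
        "hostname": platform.node(),
        "system": platform.system(),
        "python": platform.python_version(),
    }


def installed_package_names():
    """Sorted names of distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing metadata
            names.add(name)
    return sorted(names)


print(running_in_ipython())
print(host_summary())
print(installed_package_names()[:10])
```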

## 8. Advanced Utilities and Debugging

Inspect a module's contents, convert text to title case, and accept alternative keyword-argument names.

In [None]:
# Demonstrate module inspection (NumPy was already imported as np in the setup cell)

# Inspect module capabilities
module_info = stx.gen.inspect_module(np)
print("NumPy module inspection:")
print(f"  Functions: {len(module_info['functions'])}")
print(f"  Classes: {len(module_info['classes'])}")
print(f"  Constants: {len(module_info['constants'])}")

# Show first few functions
print("\nFirst 10 functions:")
for func in module_info['functions'][:10]:
    print(f"  - {func}")
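
A module can be summarized this way with the standard `inspect` module. The sketch below groups public names into the same three buckets used above. The grouping rules are an assumption for illustration and may differ from what `stx.gen.inspect_module` does; `summarize_module` is a hypothetical name.

```python
import inspect
import string


def summarize_module(module):
    """Group a module's public names into functions, classes, and constants."""
    summary = {"functions": [], "classes": [], "constants": []}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_") or inspect.ismodule(obj):
            continue  # skip private names and imported submodules
        if inspect.isclass(obj):
            summary["classes"].append(name)
        elif callable(obj):
            summary["functions"].append(name)
        else:
            summary["constants"].append(name)
    return summary


info = summarize_module(string)
print(info["functions"])
print(info["classes"])
print(info["constants"][:3])
```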

In [None]:
# Demonstrate text processing utilities
sample_text = "This is a Sample Text for Processing"

print(f"Original: {sample_text}")
print(f"Title case: {stx.gen.title_case(sample_text)}")

# Alternative keyword arguments (useful for function flexibility)
def flexible_function(main_param, **kwargs):
    # Use alternate_kwarg to handle different parameter names
    value = stx.gen.alternate_kwarg(kwargs, ['param1', 'parameter', 'p'], default=42)
    return f"Main: {main_param}, Flexible: {value}"

# Test with different parameter names
result1 = flexible_function("test", param1=100)
result2 = flexible_function("test", parameter=200)
result3 = flexible_function("test", p=300)
result4 = flexible_function("test")  # Uses default

print(f"\nFlexible function results:")
print(f"  With param1: {result1}")
print(f"  With parameter: {result2}")
print(f"  With p: {result3}")
print(f"  With default: {result4}")
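
The lookup behind this pattern is short enough to sketch directly. `first_present` below is a hypothetical helper, not the SciTeX function. It returns the value of the first alternative name found in `kwargs` and uses a sentinel so that legitimate values such as `0` or `None` are not mistaken for "missing".

```python
_MISSING = object()  # sentinel so that None and 0 count as real values


def first_present(kwargs, names, default=None):
    """Return the value of the first name in `names` found in `kwargs`."""
    for name in names:
        value = kwargs.get(name, _MISSING)
        if value is not _MISSING:
            return value
    return default


def describe(main_param, **kwargs):
    value = first_present(kwargs, ["param1", "parameter", "p"], default=42)
    return f"Main: {main_param}, Flexible: {value}"


print(describe("test", parameter=200))  # Main: test, Flexible: 200
print(describe("test", p=0))            # Main: test, Flexible: 0
print(describe("test"))                 # Main: test, Flexible: 42
```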

## 9. Integration Example: Complete Workflow

Demonstrate how gen utilities work together in a complete scientific workflow.

In [None]:
# Complete workflow example using multiple gen utilities
class ExperimentWorkflow:
    def __init__(self, experiment_name):
        self.name = experiment_name
        self.ts = stx.gen.TimeStamper()
        self.results = {}
        
    def setup(self):
        """Initialize experiment environment."""
        self.ts.stamp("Setup started")
        
        # Initialize reproducible environment
        self.config = stx.gen.start(
            description=f"Workflow: {self.name}",
            verbose=False
        )
        
        self.ts.stamp("Setup completed")
        print(f"Experiment '{self.name}' initialized with ID: {self.config.ID}")
        
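    # Note: if the cache keys on arguments (as functools.lru_cache does), a
    # repeated call on the same instance returns the stored array and skips
    # the timestamps below.
    @stx.gen.cache
    def generate_data(self, n_samples=1000):
        """Generate synthetic data (cached for efficiency)."""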
        self.ts.stamp("Data generation started")
        
        # Simulate expensive data generation
        time.sleep(0.1)
        data = np.random.randn(n_samples, 3)
        
        self.ts.stamp("Data generation completed")
        return data
        
    def process_data(self, data):
        """Process and normalize data."""
        self.ts.stamp("Data processing started")
        
        # Apply various normalizations
        processed = {
            'raw': data,
            'z_normalized': stx.gen.to_z(data),
            'unit_normalized': stx.gen.to_01(data),
            'clipped': stx.gen.clip_perc(data, percentile=95)
        }
        
        self.ts.stamp("Data processing completed")
        return processed
        
    def analyze_and_save(self, processed_data):
        """Analyze data and save results."""
        self.ts.stamp("Analysis started")
        
        # Calculate statistics
        stats = {}
        for name, data in processed_data.items():
            stats[name] = {
                'mean': data.mean(axis=0).tolist(),  # plain lists are JSON-serializable
                'std': data.std(axis=0).tolist(),
                'shape': list(data.shape)
            }
        
        # Save results with proper naming
        path_name = stx.gen.title2path(self.name)
        results_file = f"./results/{path_name}_results.json"
        
        # Ensure results directory exists
        Path("./results").mkdir(exist_ok=True)
        
        self.results = {
            'experiment_id': self.config.ID,
            'timestamp': self.config.timestamp,
            'statistics': stats,
            'timing': self.ts.get_df().to_dict()
        }
        
        stx.io.save(self.results, results_file, symlink_from_cwd=True)
        
        self.ts.stamp("Analysis completed")
        print(f"Results saved to: {results_file}")
        
    def run(self):
        """Execute complete workflow."""
        self.setup()
        data = self.generate_data()
        processed = self.process_data(data)
        self.analyze_and_save(processed)
        
        # Print timing summary
        timing_df = self.ts.get_df()
        total_time = self.ts.get_elapsed_time(0, len(timing_df)-1)
        
        print(f"\nWorkflow completed in {total_time:.3f} seconds")
        print("\nTiming breakdown:")
        print(timing_df)
        
        return self.results

# Run the complete workflow
workflow = ExperimentWorkflow("Gen Module Demonstration")
results = workflow.run()
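
Saving a results dictionary as JSON only works if every value is a type the encoder understands. The sketch below shows one standard-library way to handle array-like values and timestamps through the `default` hook of `json.dumps`. It uses `array.array` as a stand-in for a NumPy array (both expose `.tolist()`), and `to_jsonable` is a hypothetical helper; whether `stx.io.save` performs such a conversion itself is not stated in this notebook.

```python
import json
from array import array
from datetime import datetime


def to_jsonable(obj):
    """Fallback for json.dumps: convert types the encoder rejects."""
    if hasattr(obj, "tolist"):      # NumPy arrays and array.array
        return obj.tolist()
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


results_sketch = {
    "statistics": {"mean": array("d", [0.5, 1.5]), "shape": (1000, 3)},
    "timestamp": datetime(2024, 1, 1, 12, 0, 0),
}

encoded = json.dumps(results_sketch, default=to_jsonable)
print(encoded)
```

Tuples such as `shape` are written as JSON arrays by the encoder itself, so only the array and the timestamp go through the fallback.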

## 10. Summary and Best Practices

The SciTeX gen module provides essential utilities for scientific computing workflows:

In [None]:
# Summary of key gen utilities demonstrated
summary = {
    'Environment Setup': [
        'stx.gen.start() - Initialize reproducible experiments',
        'stx.gen.TimeStamper - Track execution timing',
        'stx.repro.fix_seeds() - Ensure reproducibility'
    ],
    'Data Processing': [
        'stx.gen.to_z() - Z-score normalization',
        'stx.gen.to_01() - Unit normalization',
        'stx.gen.clip_perc() - Percentile-based outlier clipping',
        'stx.gen.unbias() - Remove mean bias'
    ],
    'Performance': [
        'stx.gen.cache - Function caching decorator'
    ],
    'Utilities': [
        'stx.gen.var_info() - Variable inspection',
        'stx.gen.inspect_module() - Module analysis',
        'stx.gen.title2path() - Safe path naming',
        'stx.gen.symlink() - File organization'
    ]
}

print("SciTeX Gen Module - Key Utilities Summary")
print("=" * 50)

for category, utilities in summary.items():
    print(f"\n{category}:")
    for utility in utilities:
        print(f"  • {utility}")

print(f"\n{'='*50}")
print("Best Practices:")
print("  • Always use stx.gen.start() at the beginning of experiments")
print("  • Apply caching to expensive computations")
print("  • Use TimeStamper for performance analysis")
print("  • Normalize data appropriately for your analysis")
print("  • Organize files with proper naming conventions")
print("  • Leverage gen utilities for reproducible workflows")

print(f"\nDemo completed successfully! 🎉")

In [None]:
# Clean up temporary files
import shutil

# Remove temporary directories
for temp_path in ['./temp_gen_demo', './results']:
    if Path(temp_path).exists():
        shutil.rmtree(temp_path)
        print(f"Cleaned up: {temp_path}")

print("\nNotebook cleanup completed.")