# SciTeX Gen Module - Core Generation Utilities

This notebook demonstrates the SciTeX `gen` module: its core generation utilities and helper functions.

## Features Covered

### Core Utilities
* Data normalization and transformation
* Array dimension handling
* Type checking and validation
* Shell command execution

### Development Tools
* Configuration printing
* Module inspection
* Environment checking
* Caching mechanisms

### File Operations
* Symlink management
* Text processing
* XML/JSON conversion
* Path utilities

In [1]:
# Detect notebook name for output directory
import os
from pathlib import Path

# Get notebook name (for papermill compatibility)
notebook_name = "02_scitex_gen"
if 'PAPERMILL_NOTEBOOK_NAME' in os.environ:
    notebook_name = Path(os.environ['PAPERMILL_NOTEBOOK_NAME']).stem


In [2]:
# Memory management for automated execution
import gc
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend
import matplotlib.pyplot as plt
plt.ioff()  # Turn off interactive mode

# Function to clean up matplotlib
def cleanup_plt():
    plt.close('all')
    gc.collect()


In [3]:
import sys
sys.path.insert(0, '../src')
import scitex
import numpy as np
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
import tempfile
import os

# Set up example data directory
data_dir = Path("./gen_examples")
data_dir.mkdir(exist_ok=True)


In [4]:
# Path compatibility helper
import os
from pathlib import Path

def ensure_output_dir(subdir: str, notebook_name: str = "02_scitex_gen"):
    """Ensure output directory exists with backward compatibility."""
    expected_dir = Path(subdir)
    actual_dir = Path(f"{notebook_name}_out") / subdir
    
    if not expected_dir.exists() and actual_dir.exists():
        # Create symlink for backward compatibility
        try:
            os.symlink(str(actual_dir.resolve()), str(expected_dir))
        except (OSError, FileExistsError):
            pass
    
    return expected_dir


## Part 1: Data Normalization and Transformation

### 1.1 Normalization Functions

In [5]:
# Create sample data for normalization
sample_data = np.random.randn(1000) * 10 + 50  # Mean=50, std=10
sample_2d = np.random.randn(100, 20) * 5 + 25   # 2D array


# Normalize to 0-1 range
normalized_01 = scitex.gen.to_01(sample_data)

# Z-score normalization
z_normalized = scitex.gen.to_z(sample_data)

# Remove bias (center at zero)
unbiased = scitex.gen.unbias(sample_data)
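
For readers without SciTeX installed, the three transformations above can be approximated in plain NumPy. This is a minimal sketch assuming `to_01` is min-max scaling, `to_z` is z-scoring with the population standard deviation, and `unbias` is mean subtraction; the `*_sketch` names are hypothetical and the actual SciTeX implementations may differ (for example in axis handling).

```python
import numpy as np


def to_01_sketch(x):
    """Min-max scale to [0, 1]. Assumes x is not constant (max > min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def to_z_sketch(x):
    """Z-score using the population standard deviation (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def unbias_sketch(x):
    """Center the data at zero by subtracting the mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()


rng = np.random.default_rng(0)
demo = rng.normal(loc=50, scale=10, size=1000)

scaled = to_01_sketch(demo)
z_scored = to_z_sketch(demo)
centered = unbias_sketch(demo)
```

All three are affine maps of the input, so they change location and scale but preserve the shape of the distribution.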

In [6]:
# Percentile-based clipping
outlier_data = np.concatenate([sample_data, [200, -50, 150, -30]])  # Add outliers

# Clip to 5th and 95th percentiles
clipped = scitex.gen.clip_perc(outlier_data, low=5, high=95)

# Visualize transformations
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Data Transformation Examples')

axes[0, 0].hist(sample_data, bins=50, alpha=0.7, color='blue')
axes[0, 0].set_title('Original Data')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')

axes[0, 1].hist(normalized_01, bins=50, alpha=0.7, color='green')
axes[0, 1].set_title('Normalized to [0,1]')
axes[0, 1].set_xlabel('Value')
axes[0, 1].set_ylabel('Frequency')

axes[1, 0].hist(z_normalized, bins=50, alpha=0.7, color='red')
axes[1, 0].set_title('Z-score Normalized')
axes[1, 0].set_xlabel('Value')
axes[1, 0].set_ylabel('Frequency')

axes[1, 1].hist(clipped, bins=50, alpha=0.7, color='orange')
axes[1, 1].set_title('Percentile Clipped')
axes[1, 1].set_xlabel('Value')
axes[1, 1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()
cleanup_plt()  # Free memory
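
Percentile clipping can be sketched with `np.percentile` and `np.clip`. The helper name `clip_perc_sketch` is hypothetical, and the `low`/`high` parameters simply mirror the call above; this is an assumption about the behaviour, not the SciTeX source.

```python
import numpy as np


def clip_perc_sketch(x, low=5, high=95):
    """Clip values to the [low, high] percentile range of the data itself."""
    x = np.asarray(x, dtype=float)
    lower, upper = np.percentile(x, [low, high])
    return np.clip(x, lower, upper)


# Well-behaved values plus two obvious outliers
demo = np.concatenate([np.linspace(40, 60, 100), [200.0, -50.0]])
clipped_demo = clip_perc_sketch(demo)
```

Unlike dropping outliers, clipping keeps the array length unchanged, which is convenient when the values must stay aligned with other columns.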


### 1.2 Ranking and Ordering Functions

In [7]:
# Create test data for ranking
test_values = np.array([85, 92, 78, 95, 88, 91, 73, 96, 82, 89])

# Convert to ranks
ranks = scitex.gen.to_rank(test_values)

# Show correspondence
ranked_data = pd.DataFrame({
    'Value': test_values,
    'Rank': ranks
})
ranked_data = ranked_data.sort_values('Rank')

# Even/odd utilities - demonstrate with individual numbers
test_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for num in test_numbers:
    even = scitex.gen.to_even(num)
    odd = scitex.gen.to_odd(num)

# If you need to apply to arrays, use list comprehension or numpy.vectorize
numbers = np.arange(1, 21)
even_numbers = np.array([scitex.gen.to_even(n) for n in numbers])
odd_numbers = np.array([scitex.gen.to_odd(n) for n in numbers])


# Visualize ranking
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Original vs ranked
axes[0].bar(range(len(test_values)), test_values, alpha=0.7, color='blue')
axes[0].set_title('Original Values')
axes[0].set_xlabel('Index')
axes[0].set_ylabel('Value')

axes[1].bar(range(len(ranks)), ranks, alpha=0.7, color='red')
axes[1].set_title('Ranks')
axes[1].set_xlabel('Index')
axes[1].set_ylabel('Rank')

plt.tight_layout()
plt.show()
cleanup_plt()  # Free memory
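
One common way to turn values into ranks is to invert the sort order. The sketch below assumes ascending, 1-based ordinal ranks with ties broken by position; `to_rank_sketch` is a hypothetical name, and `scitex.gen.to_rank` may use a different convention (for example averaged ties).

```python
import numpy as np


def to_rank_sketch(values):
    """Ordinal ranks: the smallest value gets rank 1, ties broken by position."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")  # indices that would sort the data
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)  # scatter ranks back to input order
    return ranks


scores = np.array([85, 92, 78, 95, 88, 91, 73, 96, 82, 89])
score_ranks = to_rank_sketch(scores)
# score_ranks -> [4, 8, 2, 9, 5, 7, 1, 10, 3, 6]
```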


## Part 2: Array Dimension Handling

### 2.1 DimHandler Class

In [8]:
# Create test arrays with different dimensions
array_1d = np.random.randn(100)
array_2d = np.random.randn(50, 20)
array_3d = np.random.randn(10, 8, 5)
array_4d = np.random.randn(5, 4, 3, 2)

arrays = {
    '1D': array_1d,
    '2D': array_2d,
    '3D': array_3d,
    '4D': array_4d
}

# Print array information
for name, arr in arrays.items():
    print(f"{name} array shape: {arr.shape}, size: {arr.size}")

# Use DimHandler for dimension management
dim_handler = scitex.gen.DimHandler()

# Analyze each array
for name, arr in arrays.items():
    print(f"\nAnalyzing {name} array:")
    print(f"  Shape: {arr.shape}")
    print(f"  Dimensions: {arr.ndim}")
    print(f"  Total elements: {arr.size}")

In [9]:
# Transpose operations
matrix = np.random.randn(5, 3)

# Use numpy transpose (scitex.gen.transpose is for dimension reordering with named dims)
transposed = matrix.T  # or np.transpose(matrix)

# Verify transpose property
double_transposed = transposed.T

# Example of scitex.gen.transpose with named dimensions
# This function is useful when you have meaningful dimension names
# Create a 3D tensor with dimensions: batch, time, features
tensor_3d = np.random.randn(2, 10, 5)  # 2 batches, 10 time steps, 5 features
src_dims = np.array(['batch', 'time', 'features'])
tgt_dims = np.array(['time', 'batch', 'features'])  # Swap batch and time

transposed_3d = scitex.gen.transpose(tensor_3d, src_dims, tgt_dims)

# Visualize transpose operation
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

im1 = axes[0].imshow(matrix, cmap='viridis', aspect='auto')
axes[0].set_title(f'Original Matrix {matrix.shape}')
axes[0].set_xlabel('Columns')
axes[0].set_ylabel('Rows')
plt.colorbar(im1, ax=axes[0])

im2 = axes[1].imshow(transposed, cmap='viridis', aspect='auto')
axes[1].set_title(f'Transposed Matrix {transposed.shape}')
axes[1].set_xlabel('Columns')
axes[1].set_ylabel('Rows')
plt.colorbar(im2, ax=axes[1])

plt.tight_layout()
plt.show()
cleanup_plt()  # Free memory
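
Reordering axes by name reduces to looking up each target name's position in the source order and passing that permutation to `np.transpose`. The sketch below illustrates the idea with a hypothetical `transpose_named_sketch` helper; it is not the SciTeX implementation.

```python
import numpy as np


def transpose_named_sketch(arr, src_dims, tgt_dims):
    """Reorder the axes of arr from src_dims order to tgt_dims order."""
    src = list(src_dims)
    tgt = list(tgt_dims)
    if sorted(src) != sorted(tgt):
        raise ValueError("src_dims and tgt_dims must contain the same names")
    # For each target axis, find where it currently lives
    axes = [src.index(name) for name in tgt]
    return np.transpose(arr, axes)


tensor = np.zeros((2, 10, 5))  # batch, time, features
swapped = transpose_named_sketch(
    tensor,
    ["batch", "time", "features"],
    ["time", "batch", "features"],
)
# swapped.shape -> (10, 2, 5)
```

Naming the axes makes the intent explicit at the call site, which is harder to get wrong than a bare tuple such as `(1, 0, 2)`.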


## Part 3: Type Checking and Variable Information

### 3.1 Variable Information System

In [10]:
# Create various data types for testing
test_variables = {
    'integer': 42,
    'float': 3.14159,
    'string': "Hello, SciTeX!",
    'list': [1, 2, 3, 4, 5],
    'dict': {'a': 1, 'b': 2, 'c': 3},
    'numpy_array': np.array([1, 2, 3, 4, 5]),
    'pandas_series': pd.Series([1, 2, 3, 4, 5]),
    'pandas_dataframe': pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}),
    'complex': 3 + 4j,
    'boolean': True,
    'none_type': None
}


for name, var in test_variables.items():
    try:
        var_details = scitex.gen.var_info(var)
        print(f"{name}: {var_details}")
        if hasattr(var, 'shape'):
            print(f"  shape: {var.shape}")
        if hasattr(var, '__len__') and not isinstance(var, str):
            print(f"  length: {len(var)}")
        if hasattr(var, 'dtype'):
            print(f"  dtype: {var.dtype}")
        if hasattr(var, 'nbytes'):
            print(f"  nbytes: {var.nbytes}")
    except Exception as e:
        print(f"{name}: var_info failed ({e})")
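
The attribute checks in this cell can be collected into a small helper that returns a dictionary. This is a sketch of the idea only: `var_info_sketch` and its keys are hypothetical, and the structure returned by `scitex.gen.var_info` may look different.

```python
import numpy as np


def var_info_sketch(var):
    """Collect basic, duck-typed facts about a variable."""
    info = {"type": type(var).__name__}
    if hasattr(var, "shape"):
        info["shape"] = tuple(var.shape)
    if hasattr(var, "__len__") and not isinstance(var, str):
        info["length"] = len(var)
    if hasattr(var, "dtype"):
        info["dtype"] = str(var.dtype)
    return info


int_info = var_info_sketch(42)
list_info = var_info_sketch([1, 2, 3])
array_info = var_info_sketch(np.zeros((2, 3)))
# array_info -> {'type': 'ndarray', 'shape': (2, 3), 'length': 2, 'dtype': 'float64'}
```

Using `hasattr` instead of `isinstance` lets the same helper handle NumPy arrays, pandas objects, and plain containers without importing every library it might encounter.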

### 3.2 ArrayLike Type Checking

In [None]:
# Test ArrayLike type checking
array_like_candidates = [
    np.array([1, 2, 3]),
    [1, 2, 3],
    (1, 2, 3),
    pd.Series([1, 2, 3]),
    pd.DataFrame({'x': [1, 2, 3]}),
    "not array-like",
    42,
    {'a': 1, 'b': 2}
]


for i, candidate in enumerate(array_like_candidates):
    try:
        # Check if it's array-like
        is_array_like = isinstance(candidate, (np.ndarray, list, tuple, pd.Series, pd.DataFrame))
        print(f"{i}: {type(candidate).__name__} -> array-like: {is_array_like}")

        if is_array_like:
            if hasattr(candidate, 'shape'):
                print(f"   shape: {candidate.shape}")
            elif hasattr(candidate, '__len__'):
                print(f"   length: {len(candidate)}")
    except Exception as e:
        print(f"{i}: check failed ({e})")


## Part 4: Environment and Configuration

### 4.1 Environment Detection

In [None]:
# Check environment

# Check if running in IPython/Jupyter
is_ipython = scitex.gen.is_ipython()
is_script = scitex.gen.is_script()


# Host checking
try:
    hostname = scitex.gen.check_host()
    print(f"Host: {hostname}")
except Exception as e:
    print(f"check_host failed: {e}")

# List installed packages
try:
    packages = scitex.gen.list_packages()
    if packages:
        for pkg in packages[:10]:
            print(pkg)
        if len(packages) > 10:
            print(f"... and {len(packages) - 10} more")
except Exception as e:
    print(f"list_packages failed: {e}")
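
The same kind of environment information is available from the standard library. The sketch below uses hypothetical `*_sketch` helpers and makes no claim about how SciTeX implements `is_ipython()`, `check_host()`, or `list_packages()`.

```python
import socket
import sys
from importlib import metadata


def is_ipython_sketch():
    """True when an IPython/Jupyter shell is active in this process."""
    ipython = sys.modules.get("IPython")  # only set if IPython was already imported
    return ipython is not None and ipython.get_ipython() is not None


def installed_packages_sketch(limit=10):
    """Return up to `limit` installed distribution names, sorted alphabetically."""
    names = {dist.metadata.get("Name") for dist in metadata.distributions()}
    return sorted(name for name in names if name)[:limit]


running_in_ipython = is_ipython_sketch()
hostname = socket.gethostname()
first_packages = installed_packages_sketch(limit=10)
```

Checking `sys.modules` rather than importing IPython keeps the helper cheap in plain scripts, where IPython is usually not loaded at all.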


### 4.2 Configuration Management

In [None]:
# Print configuration

try:
    scitex.gen.print_config()
except Exception as e:
    print(f"print_config failed: {e}")

# Module inspection

try:
    gen_info = scitex.gen.inspect_module(scitex.gen)
except Exception as e:
    print(f"inspect_module failed: {e}")


## Part 5: File Operations and Utilities

### 5.1 Symlink Management

In [None]:
# Create test files for symlink operations
test_file = data_dir / "test_original.txt"
test_content = "This is a test file for symlink operations.\nLine 2\nLine 3"

# Write test file
with open(test_file, 'w') as f:
    f.write(test_content)


# Create symlink
symlink_path = data_dir / "test_symlink.txt"
try:
    scitex.gen.symlink(test_file, symlink_path)

    # Read through symlink
    with open(symlink_path, 'r') as f:
        symlink_content = f.read()
    print(f"Content read through symlink:\n{symlink_content}")
except Exception as e:
    print(f"Symlink demonstration failed: {e}")
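
For comparison, the same operation can be done with `pathlib` alone. In this sketch, `symlink_sketch` and its `overwrite` flag are hypothetical; it runs in a temporary directory so nothing is left behind. Creating symlinks on Windows may require extra privileges.

```python
import tempfile
from pathlib import Path


def symlink_sketch(src, dst, overwrite=True):
    """Create dst as a symlink to src, replacing an existing symlink if asked."""
    src, dst = Path(src), Path(dst)
    if overwrite and dst.is_symlink():
        dst.unlink()
    dst.symlink_to(src.resolve())  # absolute target stays valid from any cwd
    return dst


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.txt"
    original.write_text("Line 1\nLine 2\n")

    link = symlink_sketch(original, Path(tmp) / "link.txt")
    link_is_symlink = link.is_symlink()
    content_via_link = link.read_text()
```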


### 5.2 Text Processing Utilities

In [None]:
# Title case conversion
test_titles = [
    "hello world",
    "THE QUICK BROWN FOX",
    "machine learning algorithms",
    "data science and AI",
    "python programming"
]

for title in test_titles:
    try:
        title_cased = scitex.gen.title_case(title)
        print(f"{title!r} -> {title_cased!r}")
    except Exception as e:
        print(f"title_case failed for {title!r}: {e}")

# Title to path conversion
for title in test_titles:
    try:
        path_name = scitex.gen.title2path(title)
        print(f"{title!r} -> {path_name!r}")
    except Exception as e:
        print(f"title2path failed for {title!r}: {e}")

### 5.3 Caching Mechanisms

In [None]:
# Demonstrate caching with simple computation
import time

def simple_computation(n):
    """Simulate a computation that takes some time."""
    time.sleep(0.1)  # Simulate computation time (reduced from 0.5)
    result = sum(i**2 for i in range(n))
    return result

# Use caching
cached_computation = scitex.gen.cache(simple_computation)

# First call - will compute
start_time = time.time()
result1 = cached_computation(1000)
first_time = time.time() - start_time

# Second call - should be cached
start_time = time.time()
result2 = cached_computation(1000)
second_time = time.time() - start_time

print(f"First call result: {result1} ({first_time:.4f} s)")
print(f"Second call result: {result2} ({second_time:.4f} s)")

if second_time > 0:
    print(f"Speedup: {first_time / second_time:.1f}x")
else:
    print("Second call returned too quickly to measure")
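
The effect of memoization can also be verified without timing, by counting how often the wrapped function actually runs. The sketch below uses the standard library's `functools.lru_cache` as a stand-in; whether `scitex.gen.cache` is built on it is an assumption that this notebook does not confirm.

```python
import functools

calls = {"count": 0}  # tracks how many times the function body executes


@functools.lru_cache(maxsize=None)
def sum_of_squares(n):
    """Sum of i**2 for i in range(n); the body runs only on a cache miss."""
    calls["count"] += 1
    return sum(i**2 for i in range(n))


first = sum_of_squares(1000)   # cache miss: computes the result
second = sum_of_squares(1000)  # cache hit: returns the stored result
cache_stats = sum_of_squares.cache_info()
```

Counting calls is more reliable than comparing wall-clock times, which can be noisy for short computations.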

## Part 6: Advanced Features

### 6.1 Shell Command Execution

In [None]:
# Execute shell commands safely
# Simple commands
commands = [
    "echo 'Hello from shell'",
    "date",
    "pwd",
    "ls -la | head -5"
]

for cmd in commands:
    try:
        result = scitex.gen.run_shellcommand(cmd)
        print(f"$ {cmd}\n{result}")
    except Exception as e:
        print(f"Command failed: {cmd} ({e})")
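
A comparable helper can be written directly on top of `subprocess.run`. This sketch is not the SciTeX implementation: `run_shell_sketch`, its return keys, and the default timeout are all assumptions. Because it uses `shell=True` (needed for pipes such as `ls -la | head -5`), it should only ever be given trusted command strings.

```python
import subprocess


def run_shell_sketch(command, timeout=10):
    """Run a shell command and return its output streams and exit code."""
    completed = subprocess.run(
        command,
        shell=True,           # lets the shell interpret pipes and quotes
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # avoid hanging the notebook on a stuck command
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "returncode": completed.returncode,
    }


result = run_shell_sketch("echo 'Hello from shell'")
```

Returning the exit code alongside the output lets the caller distinguish a command that printed nothing from one that failed.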

### 6.2 XML and Data Conversion

In [None]:
# XML to dictionary conversion - simplified example

# Use a minimal XML example
sample_xml = '''<data>
    <value>42</value>
    <name>test</name>
</data>'''

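import xml.etree.ElementTree as ET

try:
    # Try to convert XML to dictionary
    if hasattr(scitex.gen, 'xml2dict'):
        xml_dict = scitex.gen.xml2dict(sample_xml)
    else:
        # Manual simple parsing for demonstration (flat, one level deep)
        root = ET.fromstring(sample_xml)
        xml_dict = {root.tag: {child.tag: child.text for child in root}}
    print(xml_dict)
except Exception as e:
    print(f"XML conversion failed: {e}")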

### 6.3 Time Stamping and Logging

In [None]:
# TimeStamper for tracking operations
import time

try:
    # Create timestamp handler
    timestamper = scitex.gen.TimeStamper()

    # Perform some operations with timestamps
    operations = [
        "Data loading",
        "Preprocessing",
        "Model training",
        "Evaluation",
        "Results saving"
    ]

    for i, operation in enumerate(operations):
        time.sleep(0.01)  # Simulate operation time

        # Add timestamp (if method exists)
        if hasattr(timestamper, 'add_timestamp'):
            timestamper.add_timestamp(operation)
        else:
            # Manual timestamp
            current_time = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"[{current_time}] {operation}")

except Exception as e:
    print(f"TimeStamper demonstration failed: {e}")

## Part 7: Output Redirection and Logging

### 7.1 Tee Functionality

In [None]:
# Tee functionality - output to multiple destinations

log_file = data_dir / "output.log"

# Save stdout before the try block so it can always be restored
original_stdout = sys.stdout

try:
    # Create Tee object
    tee = scitex.gen.Tee(str(log_file))

    # Redirect output
    sys.stdout = tee

    # Print some messages
    print("This message goes through Tee")
    print("So does this one")

    # Restore original stdout
    sys.stdout = original_stdout
    tee.close()

    # Read back the log file
    if log_file.exists():
        with open(log_file, 'r') as f:
            log_content = f.read()
        print(f"Log file contents:\n{log_content}")

except Exception as e:
    # Ensure stdout is restored
    sys.stdout = original_stdout
    print(f"Tee demonstration failed: {e}")
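
The tee pattern itself is small: a file-like object whose `write` forwards to two destinations. The sketch below is a hypothetical `TeeSketch` class, not the SciTeX `Tee`; it uses an in-memory `io.StringIO` in place of the real console so the result is easy to inspect, and `contextlib.redirect_stdout` so that `sys.stdout` is restored even if an exception occurs.

```python
import contextlib
import io
import tempfile
from pathlib import Path


class TeeSketch:
    """File-like object that duplicates everything written to a stream and a file."""

    def __init__(self, stream, path):
        self._stream = stream
        self._file = open(path, "w")

    def write(self, text):
        self._stream.write(text)
        self._file.write(text)
        return len(text)

    def flush(self):
        self._stream.flush()
        self._file.flush()

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "output.log"
    console = io.StringIO()  # stands in for the real terminal
    tee = TeeSketch(console, log_path)
    try:
        with contextlib.redirect_stdout(tee):
            print("first message")
            print("second message")
    finally:
        tee.close()

    logged = log_path.read_text()
    shown = console.getvalue()
```

To tee to the real terminal instead, pass `sys.stdout` as the `stream` argument.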

## Summary and Best Practices

This tutorial demonstrated the main capabilities of the SciTeX `gen` module:

### Key Features Covered:
1. **Data Normalization**: `to_01()`, `to_z()`, `unbias()`, `clip_perc()`
2. **Array Operations**: `DimHandler`, `transpose()`, dimension management
3. **Type Checking**: `var_info()`, `ArrayLike` validation
4. **Environment Detection**: `is_ipython()`, `is_script()`, `check_host()`
5. **File Operations**: `symlink()`, path utilities
6. **Text Processing**: `title_case()`, `title2path()`
7. **Caching**: `cache()` decorator for expensive operations
8. **System Integration**: Shell commands, configuration management
9. **Data Conversion**: `xml2dict()` for structured data
10. **Output Management**: `Tee` for logging and redirection

### Best Practices:
- Use **normalization functions** for consistent data preprocessing
- Apply **caching** for expensive computations
- Use **environment detection** for conditional execution
- Implement **proper error handling** for robust applications
- Use **symlinks** for efficient file management
- Apply **type checking** for data validation
- Use **Tee** for comprehensive logging

In [None]:
# Cleanup
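import shutil

try:
    cleanup = input("Clean up example files? (y/n): ").lower().startswith('y')
except Exception:
    # No interactive stdin (e.g. when executed by papermill): keep the files
    cleanup = False

if cleanup:
    shutil.rmtree(data_dir, ignore_errors=True)
else:
    print(f"Example files kept in {data_dir}")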
