# Getting Started with SciTeX

SciTeX is a Python package designed to standardize scientific computing workflows, with a focus on reproducibility, consistency, and ease of use.

This notebook introduces the core concepts and modules of SciTeX.

## Installation

First, ensure SciTeX is installed:

```bash
pip install scitex
```

Or install from a local clone of the source in editable mode (adjust the path to your own checkout):

```bash
cd ~/proj/SciTeX-Code
pip install -e .
```

## 1. Basic Import and Setup

In [None]:
import scitex as stx
import numpy as np
import pandas as pd

# Check SciTeX version
print(f"SciTeX version: {stx.__version__}")

## 2. IO Module - Unified File Operations

The `scitex.io` module provides a unified interface for loading and saving various file formats.

In [None]:
# Create sample data
data = pd.DataFrame({
    'x': np.linspace(0, 10, 100),
    'y': np.sin(np.linspace(0, 10, 100)),
    'z': np.cos(np.linspace(0, 10, 100))
})

# Save data - automatically detects format from extension
stx.io.save(data, './data_example.csv')
stx.io.save(data, './data_example.pkl')
stx.io.save(data.to_numpy(), './data_example.npy')

print("Files saved successfully!")

In [None]:
# Load data - format detected automatically
df_csv = stx.io.load('./data_example.csv')
df_pkl = stx.io.load('./data_example.pkl')
arr_npy = stx.io.load('./data_example.npy')

print(f"CSV shape: {df_csv.shape}")
print(f"PKL shape: {df_pkl.shape}")
print(f"NPY shape: {arr_npy.shape}")
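
One way to picture extension-based format detection is a lookup table from file suffix to a reader/writer pair. The sketch below is not SciTeX's implementation: `save_any`, `load_any`, and `_HANDLERS` are hypothetical names, and it covers only JSON and pickle using the standard library.

```python
import json
import pickle
from pathlib import Path


def _save_json(obj, fh):
    """Write obj as indented JSON to an open text file handle."""
    json.dump(obj, fh, indent=2)


# Hypothetical table: suffix -> (writer, reader, needs_binary_mode)
_HANDLERS = {
    ".json": (_save_json, json.load, False),
    ".pkl": (pickle.dump, pickle.load, True),
}


def _lookup(path):
    """Return the handler tuple for the file suffix, or raise ValueError."""
    ext = Path(path).suffix.lower()
    if ext not in _HANDLERS:
        raise ValueError(f"Unsupported extension: {ext!r}")
    return _HANDLERS[ext]


def save_any(obj, path):
    """Save obj in the format implied by the suffix, creating parent dirs."""
    writer, _, binary = _lookup(path)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb" if binary else "w") as fh:
        writer(obj, fh)


def load_any(path):
    """Load a file using the reader implied by its suffix."""
    _, reader, binary = _lookup(path)
    with open(path, "rb" if binary else "r") as fh:
        return reader(fh)
```

Supporting another format in this sketch means adding one entry to the table, which is why calling code can stay format-agnostic.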

### Advanced IO Features

In [None]:
# Save with compression
large_data = np.random.randn(1000, 1000)
stx.io.save(large_data, './large_data.npy.gz')

# Create symlink from current working directory
stx.io.save(data, './outputs/results.csv', symlink_from_cwd=True)

# Save dictionary data
results = {
    'accuracy': 0.95,
    'loss': 0.05,
    'params': {'learning_rate': 0.001, 'epochs': 100}
}
stx.io.save(results, './model_results.json')

## 3. Plotting Module - Enhanced Matplotlib

The `scitex.plt` module extends matplotlib with convenient functions and better defaults.

In [None]:
# Basic plotting
fig, ax = stx.plt.subplots(figsize=(8, 6))

# Plot data
ax.plot(data['x'], data['y'], label='sin(x)')
ax.plot(data['x'], data['z'], label='cos(x)')

# Use the convenient set_xyt function
ax.set_xyt('X values', 'Y values', 'Trigonometric Functions')
ax.legend()

# Save figure with automatic directory creation
stx.io.save(fig, './figures/trig_functions.png', dpi=150)
stx.plt.show()

In [None]:
# Multiple subplots with shared styling
fig, axes = stx.plt.subplots(2, 2, figsize=(10, 8))

# Generate different datasets
x = np.linspace(0, 10, 100)
datasets = [
    ('Linear', x, x),
    ('Quadratic', x, x**2),
    ('Exponential', x, np.exp(x/5)),
    ('Logarithmic', x[1:], np.log(x[1:]))
]

for ax, (title, x_data, y_data) in zip(axes.flat, datasets):
    ax.plot(x_data, y_data)
    ax.set_xyt('x', 'y', title)
    ax.grid(True, alpha=0.3)

fig.tight_layout()
stx.io.save(fig, './figures/multiple_functions.png')
stx.plt.show()

## 4. Statistics Module - Scientific Analysis

In [None]:
# Generate sample data
np.random.seed(42)
group1 = np.random.normal(100, 15, 100)
group2 = np.random.normal(110, 15, 100)

# Perform an independent two-sample t-test (SciPy is used directly here)
from scipy import stats
t_stat, p_value = stats.ttest_ind(group1, group2)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Effect size (Cohen's d) using sample variances (ddof=1); groups are equal in size
pooled_sd = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
cohen_d = (np.mean(group2) - np.mean(group1)) / pooled_sd
print(f"Cohen's d: {cohen_d:.4f}")
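
The effect size in the cell above averages the two group variances, which suits groups of the same size. When sizes differ, the usual pooled form weights each sample variance by its degrees of freedom. The helper below is a minimal standard-library sketch of that general case; the name `cohens_d` is hypothetical and not part of SciTeX.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d (mean of b minus mean of a) with the pooled sample SD."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)
```

The sign convention matches the cell above: a positive value means the second group has the larger mean.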

In [None]:
# Visualize distributions
fig, ax = stx.plt.subplots(figsize=(8, 6))

# Plot histograms
ax.hist(group1, bins=20, alpha=0.6, label='Group 1', density=True)
ax.hist(group2, bins=20, alpha=0.6, label='Group 2', density=True)

# Add normal curves
x_range = np.linspace(50, 150, 100)
ax.plot(x_range, stats.norm.pdf(x_range, np.mean(group1), np.std(group1)),
        'b-', linewidth=2, label='Group 1 fit')
ax.plot(x_range, stats.norm.pdf(x_range, np.mean(group2), np.std(group2)),
        'r-', linewidth=2, label='Group 2 fit')

ax.set_xyt('Value', 'Density', f'Distribution Comparison (p={p_value:.3f})')
ax.legend()

stx.io.save(fig, './figures/distribution_comparison.png')
stx.plt.show()

## 5. Configuration Management

SciTeX encourages separating configuration from code for better reproducibility.

In [None]:
# Create a configuration dictionary
config = {
    'experiment': {
        'name': 'demo_analysis',
        'seed': 42,
        'n_samples': 1000
    },
    'model': {
        'type': 'random_forest',
        'n_estimators': 100,
        'max_depth': 10
    },
    'paths': {
        'data': './data/',
        'results': './results/',
        'figures': './figures/'
    }
}

# Save configuration
stx.io.save(config, './config/experiment.yaml')

# Load configuration
loaded_config = stx.io.load('./config/experiment.yaml')
print("Configuration loaded:")
print(loaded_config)
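
When a nested configuration is recorded next to results, it can help to flatten it into dotted keys so every parameter appears on one line. The helper below is a small illustrative sketch; `flatten_config` is a hypothetical name, not a SciTeX function, and the example dictionary reuses a few values from the configuration above.

```python
def flatten_config(config, prefix=""):
    """Flatten a nested dict into {'section.key': value} pairs."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the dotted prefix along
            flat.update(flatten_config(value, name))
        else:
            flat[name] = value
    return flat


example = {
    "experiment": {"name": "demo_analysis", "seed": 42},
    "model": {"max_depth": 10},
}
for name, value in flatten_config(example).items():
    print(f"{name} = {value}")
```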

## 6. Complete Workflow Example

Let's combine everything in a typical scientific analysis workflow:

In [None]:
# 1. Setup and configuration (the seed is read from the config, not hard-coded)
config = stx.io.load('./config/experiment.yaml')
np.random.seed(config['experiment']['seed'])

# 2. Generate synthetic data
n_samples = config['experiment']['n_samples']
X = np.random.randn(n_samples, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Add some noise
noise_idx = np.random.choice(n_samples, size=int(0.1 * n_samples), replace=False)
y[noise_idx] = 1 - y[noise_idx]

# 3. Save raw data
data = pd.DataFrame(X, columns=['feature1', 'feature2'])
data['target'] = y
stx.io.save(data, './data/synthetic_dataset.csv')

print(f"Generated dataset with {n_samples} samples")
print(f"Class distribution: {np.bincount(y)}")

In [None]:
# 4. Visualize the data
fig, ax = stx.plt.subplots(figsize=(8, 6))

# Plot different classes
colors = ['blue', 'red']
for class_val in [0, 1]:
    mask = y == class_val
    ax.scatter(X[mask, 0], X[mask, 1],
               c=colors[class_val],
               label=f'Class {class_val}',
               alpha=0.6)

# Add the noise-free decision boundary: feature1 + feature2 = 0
x_line = np.linspace(-3, 3, 100)
y_line = -x_line
ax.plot(x_line, y_line, 'k--', label='True boundary')

ax.set_xyt('Feature 1', 'Feature 2', 'Synthetic Classification Dataset')
ax.legend()
ax.grid(True, alpha=0.3)

stx.io.save(fig, './figures/synthetic_data_visualization.png')
stx.plt.show()

In [None]:
# 5. Train a simple model
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=config['experiment']['seed']
)

# Train model
model = RandomForestClassifier(
    n_estimators=config['model']['n_estimators'],
    max_depth=config['model']['max_depth'],
    random_state=config['experiment']['seed']
)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
accuracy = (y_pred == y_test).mean()

print(f"Test Accuracy: {accuracy:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

In [None]:
# 6. Visualize results
fig, (ax1, ax2) = stx.plt.subplots(1, 2, figsize=(12, 5))

# Confusion matrix (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
ax1.imshow(cm, cmap='Blues')
ax1.set_xyt('Predicted', 'Actual', 'Confusion Matrix')

# Annotate each cell with its count
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax1.text(j, i, str(cm[i, j]), ha='center', va='center')

# Feature importance
importance = model.feature_importances_
ax2.bar(['Feature 1', 'Feature 2'], importance)
ax2.set_xyt('Features', 'Importance', 'Feature Importance')

fig.tight_layout()
stx.io.save(fig, './figures/model_evaluation.png')
stx.plt.show()
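
The accuracy, and the class-1 precision and recall in the classification report, can all be read off a binary confusion matrix. The sketch below assumes scikit-learn's layout for labels 0 and 1, `[[TN, FP], [FN, TP]]`; `binary_metrics` is a hypothetical helper, and the counts in `example_cm` are made up for illustration, not results from this notebook.

```python
def binary_metrics(cm):
    """Accuracy, precision, recall for a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Guard against empty denominators when a class is never predicted or present
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


example_cm = [[90, 10], [5, 95]]  # made-up counts, for illustration only
metrics = binary_metrics(example_cm)
```

With the notebook's own matrix, `binary_metrics(cm.tolist())` should give an accuracy that agrees with the value computed from `y_pred == y_test`.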

In [None]:
# 7. Save results
results = {
    'accuracy': float(accuracy),
    'confusion_matrix': cm.tolist(),
    'feature_importance': importance.tolist(),
    'config': config,
    'n_train': len(X_train),
    'n_test': len(X_test)
}

stx.io.save(results, './results/experiment_results.json')
stx.io.save(model, './models/trained_model.pkl')

print("\nWorkflow complete! Results saved to:")
print("- ./results/experiment_results.json")
print("- ./models/trained_model.pkl")
print("- ./figures/")

## 7. Best Practices with SciTeX

1. **Always use relative paths** starting with `./`
2. **Organize outputs** by type (figures/, data/, results/)
3. **Use configuration files** for parameters
4. **Leverage unified IO** for format-agnostic code
5. **Create symlinks** for important outputs
6. **Document your workflow** in notebooks like this
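
For code that writes files without going through `stx.io.save`, the directory layout from points 1 and 2 can be prepared up front. This is a minimal standard-library sketch; `make_output_dirs` and `OUTPUT_DIRS` are hypothetical names, and the directory names are simply those used in this notebook.

```python
from pathlib import Path

# Directory names taken from this notebook's examples
OUTPUT_DIRS = ("data", "results", "figures", "models", "config")


def make_output_dirs(base="."):
    """Create the output directories under base and return them by name."""
    base = Path(base)
    paths = {name: base / name for name in OUTPUT_DIRS}
    for path in paths.values():
        # exist_ok=True makes repeated calls safe
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `make_output_dirs()` from the project root returns a dictionary of `Path` objects, so later code can write to `paths["figures"] / "plot.png"` rather than repeating string paths.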

## Next Steps

Explore more specialized notebooks:
- `02_scitex_io_advanced.ipynb` - Advanced IO operations
- `03_scitex_plotting.ipynb` - Publication-ready plots
- `04_scitex_statistics.ipynb` - Statistical analysis
- `05_scitex_dsp.ipynb` - Digital signal processing

## Cleanup

The cell below lists the files and directories created during this demo. The deletion loop is commented out; uncomment it to actually remove them:

In [None]:
# Optional: Clean up demo files
import os
import shutil

# List files to clean
cleanup_files = [
    './data_example.csv',
    './data_example.pkl',
    './data_example.npy',
    './large_data.npy.gz',
    './model_results.json'
]

cleanup_dirs = [
    './outputs',
    './figures',
    './data',
    './results',
    './models',
    './config'
]

# Uncomment to actually clean up:
# for f in cleanup_files:
#     if os.path.exists(f):
#         os.remove(f)
# for d in cleanup_dirs:
#     if os.path.exists(d):
#         shutil.rmtree(d)

print("Demo complete!")