# Kaggle and Google Colab Integration

This notebook demonstrates how to use Promethium in cloud notebook environments.

**Prerequisites:**
- Kaggle account or Google Colab access
- No local installation required

**Estimated Runtime:** 5-10 minutes

**Topics Covered:**
- Installation in cloud environments
- GPU detection and usage
- Data path handling for Kaggle and Colab
- Memory management tips

## 1. Installation

Install Promethium directly from PyPI. This works in both Kaggle and Colab. On Kaggle, the notebook's Internet setting must be enabled so that `pip` can reach PyPI.

In [None]:
# Install Promethium from PyPI
!pip install promethium-seismic==1.0.1 -q
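
Re-running the install cell after every session restart is harmless but slow. The following is a minimal sketch of a guard that checks whether the package can already be imported. It assumes the `promethium-seismic` distribution provides the `promethium` module (the import name used in the next cell); `is_installed` is a hypothetical helper, not part of Promethium.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the 'promethium-seismic' distribution installs the 'promethium' module.
if is_installed("promethium"):
    print("Promethium is already installed; the pip cell can be skipped.")
else:
    print("Promethium not found; run the install cell above.")
```

The check only confirms that the module is importable; it does not verify that the installed version is 1.0.1.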

In [None]:
# Verify installation
import promethium
print(f"Promethium version: {promethium.__version__}")

# Import all needed functions
from promethium import (
    load_segy,
    SeismicRecoveryPipeline,
    evaluate_reconstruction,
    generate_synthetic_traces,
    add_noise,
    set_seed,
    get_device,
    run_recovery,
)

import numpy as np
import matplotlib.pyplot as plt

## 2. Environment Detection

Detect whether we are running in Kaggle, Colab, or a local environment.

In [None]:
import os
import sys

def detect_environment():
    """Detect the current execution environment."""
    if 'KAGGLE_KERNEL_RUN_TYPE' in os.environ:
        return 'kaggle'
    elif 'COLAB_GPU' in os.environ or 'google.colab' in sys.modules:
        return 'colab'
    else:
        return 'local'

environment = detect_environment()
print(f"Environment: {environment}")

# Device detection
device = get_device()
print(f"Compute device: {device}")

# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("GPU: Not available (using CPU)")

## 3. Data Paths

Configure data paths based on the environment.

In [None]:
def get_data_paths(environment):
    """Get input and output paths based on environment."""
    if environment == 'kaggle':
        input_dir = '/kaggle/input'
        output_dir = '/kaggle/working'
    elif environment == 'colab':
        input_dir = '/content'
        output_dir = '/content'
    else:
        input_dir = './data'
        output_dir = './output'
    
    return input_dir, output_dir

input_dir, output_dir = get_data_paths(environment)
print(f"Input directory: {input_dir}")
print(f"Output directory: {output_dir}")

# Create output directory if it does not exist
os.makedirs(output_dir, exist_ok=True)

## 4. Loading Data

### Option A: From Kaggle Dataset
```python
# Example for Kaggle
data = load_segy('/kaggle/input/seismic-dataset/survey.sgy')
```
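
Kaggle mounts each attached dataset in its own subdirectory of `/kaggle/input`, so the exact file path depends on the dataset's name. As a sketch, the files can be discovered rather than hard-coded. `find_segy_files` is a hypothetical helper, and the `.sgy`/`.segy` extensions are an assumption about how the files are named.

```python
from pathlib import Path


def find_segy_files(root, extensions=(".sgy", ".segy")):
    """Recursively collect files under `root` with a matching suffix, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


# Prints nothing when the directory is missing or holds no SEG-Y files.
for path in find_segy_files("/kaggle/input"):
    print(path)
```

Any of the returned paths can then be passed to `load_segy` as in the example above.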

### Option B: Upload to Colab
```python
# Upload file in Colab
from google.colab import files
uploaded = files.upload()
data = load_segy(list(uploaded.keys())[0])
```

### Option C: Generate Synthetic Data (Demo)

In [None]:
# For demonstration, generate synthetic data
set_seed(42)

clean_data, metadata = generate_synthetic_traces(
    n_traces=100,
    n_samples=500,
    sample_rate=250.0,
    seed=42
)

# Add noise
noisy_data = add_noise(clean_data, noise_level=0.25, seed=42)

print(f"Data shape: {noisy_data.shape}")
print(f"Memory usage: {noisy_data.nbytes / 1024:.2f} KB")

## 5. Run Recovery Pipeline

In [None]:
# Use the convenience function for quick recovery
try:
    reconstructed = run_recovery(noisy_data, preset='unet_denoise_v1')
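except Exception as e:
    # Fall back to a simple smoothing filter so the rest of the notebook still runs
    print(f"run_recovery failed ({type(e).__name__}: {e})")
    print("Note: Full model requires trained weights.")
    print("Using simple Gaussian filter for demonstration.")
    from scipy.ndimage import gaussian_filter1d
    # Smooth along the last axis, i.e. each trace independently
    reconstructed = gaussian_filter1d(noisy_data, sigma=2, axis=-1)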

print(f"Reconstructed shape: {reconstructed.shape}")

## 6. Evaluate and Visualize

In [None]:
# Compute metrics
metrics = evaluate_reconstruction(clean_data, reconstructed)

print("Reconstruction Metrics:")
for name, value in metrics.items():
    print(f"  {name}: {value:.4f}")

In [None]:
# Visualize results
trace_idx = 25
t = np.arange(metadata['n_samples']) / metadata['sample_rate']

fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

axes[0].plot(t, clean_data[trace_idx], 'b-', linewidth=0.8)
axes[0].set_ylabel('Amplitude')
axes[0].set_title('Original Clean')
axes[0].grid(True, alpha=0.3)

axes[1].plot(t, noisy_data[trace_idx], 'r-', linewidth=0.8)
axes[1].set_ylabel('Amplitude')
axes[1].set_title('Noisy Input')
axes[1].grid(True, alpha=0.3)

axes[2].plot(t, reconstructed[trace_idx], 'g-', linewidth=0.8)
axes[2].set_xlabel('Time (s)')
axes[2].set_ylabel('Amplitude')
axes[2].set_title('Reconstructed')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
figure_path = os.path.join(output_dir, 'reconstruction_result.png')
plt.savefig(figure_path, dpi=150)
plt.show()

print(f"Figure saved to: {figure_path}")

## 7. Memory Management Tips

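Kaggle and Colab sessions run with a fixed amount of RAM and GPU memory, and a session that exceeds it is typically restarted. The cell below reports current RAM usage and lists ways to reduce it: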

In [None]:
# Check available memory
import psutil

memory = psutil.virtual_memory()
print(f"Total RAM: {memory.total / 1e9:.1f} GB")
print(f"Available RAM: {memory.available / 1e9:.1f} GB")
print(f"Used RAM: {memory.percent:.1f}%")

# Tips for memory management
print("\nMemory Management Tips:")
print("1. Process data in chunks for large datasets")
print("2. Delete intermediate variables with 'del variable'")
print("3. Use float32 instead of float64 when possible")
print("4. Clear GPU memory with torch.cuda.empty_cache()")
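
Tips 1 and 3 come down to simple arithmetic: a dense array of `n_traces` by `n_samples` occupies `n_traces * n_samples * bytes_per_sample` bytes, so float32 (4 bytes) takes half the space of float64 (8 bytes). The sketch below uses that to pick a chunk size for a given memory budget. All three helper names are hypothetical, and the 64 KB budget is an arbitrary illustrative value.

```python
def array_bytes(n_traces, n_samples, bytes_per_sample):
    """Size in bytes of a dense (n_traces, n_samples) array."""
    return n_traces * n_samples * bytes_per_sample


def traces_per_chunk(n_samples, bytes_per_sample, budget_bytes):
    """Largest number of whole traces that fits in the budget (at least 1)."""
    return max(1, budget_bytes // (n_samples * bytes_per_sample))


def chunk_bounds(n_traces, chunk_size):
    """Yield (start, stop) index pairs that together cover range(n_traces)."""
    for start in range(0, n_traces, chunk_size):
        yield start, min(start + chunk_size, n_traces)


# The demo array in this notebook has 100 traces of 500 samples each.
print(array_bytes(100, 500, 8))  # as float64: 400000 bytes
print(array_bytes(100, 500, 4))  # as float32: 200000 bytes

# Illustrative budget of 64 KB per chunk with float32 samples.
size = traces_per_chunk(n_samples=500, bytes_per_sample=4, budget_bytes=64 * 1024)
for start, stop in chunk_bounds(100, size):
    print(f"process traces {start}:{stop}")
```

Each `(start, stop)` pair can be used to slice the data, for example `noisy_data[start:stop]`, and `noisy_data.astype(np.float32)` performs the float32 conversion on a NumPy array.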

## 8. Saving Results

In [None]:
# Save reconstructed data as numpy file
output_path = os.path.join(output_dir, 'reconstructed_data.npy')
np.save(output_path, reconstructed)
print(f"Data saved to: {output_path}")

# Save metrics as JSON
import json
metrics_path = os.path.join(output_dir, 'metrics.json')
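with open(metrics_path, 'w') as f:
    # Cast to built-in float: NumPy scalars such as float32 are not JSON serializable
    json.dump({name: float(value) for name, value in metrics.items()}, f, indent=2)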
print(f"Metrics saved to: {metrics_path}")

## 9. Summary

This notebook demonstrated:

1. **Installation**: `pip install promethium-seismic==1.0.1`
2. **Environment Detection**: Kaggle, Colab, or local
3. **GPU Usage**: Automatic device detection
4. **Data Paths**: Environment-specific path handling
5. **Memory Management**: Tips for cloud environments
6. **Result Saving**: Export to numpy and JSON formats

### Kaggle Competition Template

```python
!pip install promethium-seismic==1.0.1 -q

import numpy as np
from promethium import load_segy, run_recovery

# Load competition data
data = load_segy('/kaggle/input/competition-name/test.sgy')

# Run recovery
result = run_recovery(data, preset='unet_denoise_v1')

# Save submission
np.save('/kaggle/working/submission.npy', result)
```