# NDWI Calculator

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unbihexium-oss/unbihexium/blob/main/examples/notebooks/070_ndwi_calculator.ipynb)
[![GitHub](https://img.shields.io/badge/GitHub-View_Source-181717?logo=github)](https://github.com/unbihexium-oss/unbihexium)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

---

**Author**: Unbihexium OSS Foundation  
**Version**: 1.0.0  
**Last Updated**: 2025-12-21  
**Task Type**: `regression`

---

## Table of Contents

1. [Introduction](#1-introduction)
2. [Model Overview](#2-model-overview)
3. [Environment Setup](#3-environment-setup)
4. [Model Loading](#4-model-loading)
5. [Variant Comparison](#5-variant-comparison)
6. [Inference Pipeline](#6-inference-pipeline)
7. [Performance Benchmarks](#7-performance-benchmarks)
8. [Integration Examples](#8-integration-examples)
9. [Best Practices](#9-best-practices)
10. [References](#10-references)


## 1. Introduction

The Normalized Difference Water Index (NDWI) is a spectral index commonly used to delineate open water in multispectral satellite imagery.

This notebook provides a comprehensive guide to using the Unbihexium library's `ndwi_calculator` model family. The model is available in four variants (tiny, base, large, mega) to accommodate different computational constraints and accuracy requirements.
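
For orientation, NDWI is commonly computed per pixel as (Green - NIR) / (Green + NIR), which is the McFeeters formulation. The sketch below assumes that definition and two reflectance arrays named `green` and `nir` (illustrative names). It is not taken from the `ndwi_calculator` model, whose internals this notebook does not describe.

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Per-pixel NDWI = (green - nir) / (green + nir), assuming the McFeeters definition."""
    green = green.astype(np.float32)
    nir = nir.astype(np.float32)
    denom = green + nir
    # Where both bands are (near) zero the ratio is undefined; return 0 for those pixels
    undefined = np.abs(denom) < eps
    safe_denom = np.where(undefined, 1.0, denom)
    return np.where(undefined, 0.0, (green - nir) / safe_denom)

# Hypothetical reflectance values: first pixel water-like, second vegetation-like
green = np.array([[0.30, 0.10]])
nir = np.array([[0.10, 0.40]])
print(ndwi(green, nir))
```

Positive values indicate that green reflectance exceeds near-infrared reflectance, which is typical of open water under this definition.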

### Key Features

- **Task**: Regression
- **Variants**: 4 (tiny, base, large, mega)
- **Formats**: ONNX and PyTorch (.pt)
- **Framework**: Unbihexium Model Zoo


## 2. Model Overview

### Use Cases

The `ndwi_calculator` model is designed for:

- Primary application: Normalized Difference Water Index calculation
- Integration with geospatial workflows
- Batch processing of satellite imagery
- Real-time inference for operational systems

### Technical Specifications

| Variant | Resolution | Channels | Parameters | Use Case |
|---------|------------|----------|------------|----------|
| tiny | 32x32 | 16 | ~17K | Ultra-lightweight for edge devices and rapid prototyping |
| base | 64x64 | 64 | ~268K | Balanced performance for production deployments |
| large | 128x128 | 128 | ~1M | High accuracy for demanding applications |
| mega | 256x256 | 256 | ~4M | Maximum precision for research and critical tasks |
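
As a rough cross-check of the table, weight storage can be estimated from the parameter counts. The sketch below assumes 32-bit floats (4 bytes per parameter) and uses the approximate counts listed above. Actual file sizes will differ because of serialization overhead.

```python
# Approximate parameter counts copied from the table above
PARAM_COUNTS = {"tiny": 17_000, "base": 268_000, "large": 1_000_000, "mega": 4_000_000}

def weights_mib(n_params: int, bytes_per_param: int = 4) -> float:
    """Estimate weight storage in MiB, assuming float32 parameters by default."""
    return n_params * bytes_per_param / (1024 * 1024)

for variant, n_params in PARAM_COUNTS.items():
    print(f"{variant:5} ~{weights_mib(n_params):.2f} MiB")
```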


## 3. Environment Setup

### Prerequisites

- Python 3.10 or higher
- 8 GB RAM minimum (16 GB recommended for large/mega variants)
- CUDA-compatible GPU (optional, for accelerated inference)


In [None]:
# !git lfs install
# !git lfs pull

# Install required packages
# Uncomment the following lines if running in a fresh environment

# !pip install unbihexium
# !pip install onnxruntime  # or onnxruntime-gpu for GPU acceleration
# !pip install torch torchvision


In [None]:
# Verify installation
import sys
print(f"Python version: {sys.version}")

try:
    import unbihexium
    print(f"Unbihexium version: {unbihexium.__version__}")
except ImportError:
    print("Unbihexium not installed. Run: pip install unbihexium")

try:
    import onnxruntime as ort
    print(f"ONNX Runtime version: {ort.__version__}")
    print(f"Available providers: {ort.get_available_providers()}")
except ImportError:
    print("ONNX Runtime not installed. Run: pip install onnxruntime")

try:
    import torch
    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed. Run: pip install torch")


## 4. Model Loading

### 4.1 Using Unbihexium Model Zoo

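The recommended approach is to use the Unbihexium Model Zoo for model management. The cell below locates the local `model_zoo/assets` directory by searching the current working directory and its parents, then reads the `config.json` of each variant.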


In [None]:
from pathlib import Path
import json

# Define model paths for all variants
MODEL_ID = "ndwi_calculator"

BASE_DIR = Path().resolve()
MODEL_ZOO = None
for parent in [BASE_DIR] + list(BASE_DIR.parents):
    candidate = parent / "model_zoo" / "assets"
    if candidate.exists():
        MODEL_ZOO = candidate
        break

if MODEL_ZOO is None:
    raise FileNotFoundError("Could not find 'model_zoo/assets' in parent directories.")

VARIANTS = ["tiny", "base", "large", "mega"]

# Load configuration for each variant
configs = {}

for variant in VARIANTS:
    variant_folder = MODEL_ZOO / variant
    if not variant_folder.exists():
        print(f"{variant.upper()}: Variant folder not found: {variant_folder}")
        continue

    # Use the same layout as the ONNX and PyTorch loaders in this notebook:
    # <variant>/<MODEL_ID>_<variant>/. Picking the first subfolder would be
    # unreliable because iterdir() order is arbitrary.
    model_folder = variant_folder / f"{MODEL_ID}_{variant}"
    if not model_folder.exists():
        print(f"{variant.upper()}: Model folder not found: {model_folder}")
        continue

    config_file = model_folder / "config.json"
    if config_file.exists():
        with open(config_file) as f:
            configs[variant] = json.load(f)
            print(f"{variant.upper()}: {configs[variant]}")
    else:
        print(f"{variant.upper()}: config.json not found in {model_folder}")


### 4.2 Direct ONNX Loading

For production deployments, ONNX format provides cross-platform compatibility.


In [None]:
import onnxruntime as ort
import numpy as np

# Load ONNX models for all variants
onnx_sessions = {}

for variant in VARIANTS:
    model_path = MODEL_ZOO / variant / f"{MODEL_ID}_{variant}" / "model.onnx"
    if model_path.exists():
        try:
            session = ort.InferenceSession(
                str(model_path),
                providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
            )
            onnx_sessions[variant] = session
            
            # Print model info
            input_info = session.get_inputs()[0]
            output_info = session.get_outputs()[0]
            print(f"{variant.upper()} ONNX loaded:")
            print(f"  Input: {input_info.name} {input_info.shape}")
            print(f"  Output: {output_info.name} {output_info.shape}")
        except Exception as e:
            print(f"{variant.upper()}: Failed to load - {e}")
    else:
        print(f"{variant.upper()}: ONNX model not found")
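
The loader above always requests `CUDAExecutionProvider` first. One way to make the request explicit is to keep only the providers that the installed ONNX Runtime build reports. The helper name `pick_providers` is hypothetical, and the `ImportError` branch exists only so the sketch runs without ONNX Runtime installed.

```python
def pick_providers(preferred, available):
    """Return the preferred execution providers that are available, in preference order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider if none of the preferred providers is available
    return chosen or ["CPUExecutionProvider"]

preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]

try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # Assumption for illustration only: behave as if a CPU-only build were installed
    available = ["CPUExecutionProvider"]

providers = pick_providers(preferred, available)
print(providers)
# The result can be passed as ort.InferenceSession(str(model_path), providers=providers)
```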


### 4.3 PyTorch Loading

For research and fine-tuning, PyTorch format provides full model access.


In [None]:
import torch

# Load PyTorch models
pt_models = {}

for variant in VARIANTS:
    model_path = MODEL_ZOO / variant / f"{MODEL_ID}_{variant}" / "model.pt"
    if model_path.exists():
        try:
            model = torch.jit.load(str(model_path), map_location='cpu')
            model.eval()
            pt_models[variant] = model
            
            # Count parameters
            param_count = sum(p.numel() for p in model.parameters())
            print(f"{variant.upper()} PyTorch loaded: {param_count:,} parameters")
        except Exception as e:
            print(f"{variant.upper()}: Failed to load - {e}")
    else:
        print(f"{variant.upper()}: PyTorch model not found")


## 5. Variant Comparison

Compare the four model variants across key metrics.


In [None]:
import os

# Compare model sizes and configurations
comparison_data = []

for variant in VARIANTS:
    onnx_path = MODEL_ZOO / variant / f"{MODEL_ID}_{variant}" / "model.onnx"
    pt_path = MODEL_ZOO / variant / f"{MODEL_ID}_{variant}" / "model.pt"
    
    row = {"Variant": variant.upper()}
    
    if onnx_path.exists():
        row["ONNX Size (MB)"] = round(os.path.getsize(onnx_path) / (1024 * 1024), 2)
    if pt_path.exists():
        row["PT Size (MB)"] = round(os.path.getsize(pt_path) / (1024 * 1024), 2)
    if variant in configs:
        row["Parameters"] = configs[variant].get("params", "N/A")
        row["Resolution"] = configs[variant].get("resolution", "N/A")
    
    comparison_data.append(row)

# Display comparison table
try:
    import pandas as pd
    df = pd.DataFrame(comparison_data)
    print(df.to_string(index=False))
except ImportError:
    for row in comparison_data:
        print(row)


## 6. Inference Pipeline

### 6.1 Input Preparation

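The cell below builds synthetic `float32` inputs of shape `(batch_size, channels, resolution, resolution)`, one per variant, at the resolutions listed in the Technical Specifications table. Replace them with real imagery in production.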


In [None]:
def prepare_input(resolution: int, channels: int = 3, batch_size: int = 1):
    """
    Prepare synthetic input tensor for inference.
    
    Parameters
    ----------
    resolution : int
        Spatial resolution (width and height)
    channels : int
        Number of input channels (default: 3). Check the input shape printed
        in Section 4.2 for the band count the model actually expects.
    batch_size : int
        Batch size for inference
    
    Returns
    -------
    np.ndarray
        Input tensor of shape (batch_size, channels, resolution, resolution)
    """
    # Generate synthetic input (replace with actual data in production)
    return np.random.rand(batch_size, channels, resolution, resolution).astype(np.float32)

# Resolution mapping for each variant
RESOLUTIONS = {
    "tiny": 32,
    "base": 64,
    "large": 128,
    "mega": 256
}

# Prepare inputs for all variants
inputs = {}
for variant, res in RESOLUTIONS.items():
    inputs[variant] = prepare_input(res)
    print(f"{variant.upper()} input shape: {inputs[variant].shape}")
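
Real scenes are usually larger than a variant's input resolution. A minimal sketch of one way to cut a `(C, H, W)` array into model-sized patches follows. The function name `tile_image` and the zero-padding strategy are assumptions for illustration, not part of the Unbihexium API.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int) -> np.ndarray:
    """Split a (C, H, W) array into non-overlapping (N, C, tile, tile) patches.

    The image is zero-padded on the bottom and right so that H and W
    become multiples of `tile`.
    """
    c, h, w = image.shape
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, 0), (0, pad_h), (0, pad_w)), mode="constant")
    rows, cols = padded.shape[1] // tile, padded.shape[2] // tile
    # Reshape to (C, rows, tile, cols, tile), then move the tile grid to the front
    patches = padded.reshape(c, rows, tile, cols, tile).transpose(1, 3, 0, 2, 4)
    return patches.reshape(rows * cols, c, tile, tile).astype(np.float32)

# Hypothetical 3-band scene of 100x150 pixels, tiled for the base variant (64x64)
scene = np.random.rand(3, 100, 150)
tiles = tile_image(scene, 64)
print(tiles.shape)  # (6, 3, 64, 64)
```

The resulting array can be fed to a session in slices of the desired batch size. Patches are ordered row by row, which makes it straightforward to reassemble the outputs.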


### 6.2 ONNX Inference

Run inference using ONNX Runtime.


In [None]:
# Run ONNX inference for all variants
onnx_outputs = {}

for variant, session in onnx_sessions.items():
    input_name = session.get_inputs()[0].name
    input_data = inputs[variant]
    
    # Run inference
    output = session.run(None, {input_name: input_data})
    onnx_outputs[variant] = output[0]
    
    print(f"{variant.upper()} ONNX output shape: {output[0].shape}")
    print(f"  Min: {output[0].min():.4f}, Max: {output[0].max():.4f}, Mean: {output[0].mean():.4f}")


### 6.3 PyTorch Inference

Run inference using PyTorch.


In [None]:
# Run PyTorch inference for all variants
pt_outputs = {}

with torch.no_grad():
    for variant, model in pt_models.items():
        input_tensor = torch.from_numpy(inputs[variant])
        
        # Run inference
        output = model(input_tensor)
        pt_outputs[variant] = output.numpy()
        
        print(f"{variant.upper()} PyTorch output shape: {output.shape}")
        print(f"  Min: {output.min():.4f}, Max: {output.max():.4f}, Mean: {output.mean():.4f}")
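
Because both runtimes receive the same inputs, their outputs can be compared to confirm that the ONNX export matches the PyTorch model. The sketch below is one way to do that. The helper name `compare_outputs` and the tolerances are assumptions, and the stand-in arrays should be replaced with `onnx_outputs[variant]` and `pt_outputs[variant]`.

```python
import numpy as np

def compare_outputs(a: np.ndarray, b: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-5) -> dict:
    """Summarize agreement between two output arrays of the same shape."""
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
    }

# Stand-in arrays that differ by a small constant offset
reference = np.linspace(-1.0, 1.0, 16, dtype=np.float32).reshape(1, 1, 4, 4)
candidate = reference + 1e-6
print(compare_outputs(reference, candidate))
```

Small differences are expected between runtimes, so a tolerance-based check is more informative than exact equality.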


## 7. Performance Benchmarks

Measure inference time for each variant and framework.


In [None]:
import time

def benchmark(func, n_runs: int = 10, warmup: int = 3):
    """
    Benchmark a function.
    
    Parameters
    ----------
    func : callable
        Function to benchmark
    n_runs : int
        Number of timed runs
    warmup : int
        Number of warmup runs
    
    Returns
    -------
    tuple
        (mean_time_ms, std_time_ms)
    """
    # Warmup
    for _ in range(warmup):
        func()
    
    # Timed runs
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        end = time.perf_counter()
        times.append((end - start) * 1000)  # Convert to ms
    
    return np.mean(times), np.std(times)

# Benchmark ONNX inference
print("ONNX Runtime Performance:")
print("-" * 50)
onnx_benchmark = {}
for variant, session in onnx_sessions.items():
    input_name = session.get_inputs()[0].name
    input_data = inputs[variant]
    
    mean_ms, std_ms = benchmark(lambda: session.run(None, {input_name: input_data}))
    onnx_benchmark[variant] = mean_ms
    print(f"{variant.upper():6} | {mean_ms:8.2f} ms +/- {std_ms:.2f} ms")


In [None]:
# Benchmark PyTorch inference
print("PyTorch Performance:")
print("-" * 50)
pt_benchmark = {}

with torch.no_grad():
    for variant, model in pt_models.items():
        input_tensor = torch.from_numpy(inputs[variant])
        
        mean_ms, std_ms = benchmark(lambda: model(input_tensor))
        pt_benchmark[variant] = mean_ms
        print(f"{variant.upper():6} | {mean_ms:8.2f} ms +/- {std_ms:.2f} ms")
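
Mean and standard deviation can hide occasional slow runs. A possible complement, sketched below under the same warmup-then-measure scheme, reports the median and 95th percentile instead. `latency_summary` is a hypothetical helper, and the matrix product is only a stand-in workload.

```python
import time
import numpy as np

def latency_summary(func, n_runs: int = 20, warmup: int = 3) -> dict:
    """Time repeated calls to `func` and report latency statistics in milliseconds."""
    for _ in range(warmup):
        func()
    times_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        times_ms.append((time.perf_counter() - start) * 1000)
    return {
        "median_ms": float(np.median(times_ms)),
        "p95_ms": float(np.percentile(times_ms, 95)),
        "max_ms": float(np.max(times_ms)),
    }

# Stand-in workload; in the notebook, wrap session.run(...) or model(...) instead
stats = latency_summary(lambda: np.dot(np.ones((64, 64)), np.ones((64, 64))))
print(stats)
```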


## 8. Integration Examples

### 8.1 Batch Processing

Process multiple images in a single batch for improved throughput.


In [None]:
def batch_inference(session, images: list, batch_size: int = 4):
    """
    Process images in batches.
    
    Parameters
    ----------
    session : ort.InferenceSession
        ONNX inference session
    images : list
        List of input images (numpy arrays)
    batch_size : int
        Batch size for processing
    
    Returns
    -------
    list
        List of outputs
    """
    results = []
    input_name = session.get_inputs()[0].name
    
    for i in range(0, len(images), batch_size):
        batch = np.stack(images[i:i+batch_size])
        output = session.run(None, {input_name: batch})
        results.extend(output[0])
    
    return results

# Example: Process 8 images with the base variant
if "base" in onnx_sessions:
    sample_images = [prepare_input(64)[0] for _ in range(8)]
    batch_results = batch_inference(onnx_sessions["base"], sample_images, batch_size=4)
    print(f"Processed {len(sample_images)} images, got {len(batch_results)} outputs")


### 8.2 Model Selection Strategy

Choose the appropriate variant based on requirements.


In [None]:
def select_variant(
    max_latency_ms: float | None = None,
    max_memory_mb: float | None = None,
    min_resolution: int | None = None,
    prefer_accuracy: bool = True
) -> str:
    """
    Select the best model variant based on constraints.
    
    Parameters
    ----------
    max_latency_ms : float, optional
        Maximum acceptable latency in milliseconds
    max_memory_mb : float, optional
        Maximum ONNX model file size in megabytes
    min_resolution : int, optional
        Minimum required input resolution
    prefer_accuracy : bool
        If True, prefer larger models when constraints allow
    
    Returns
    -------
    str
        Recommended variant name
    """
    # Priority order based on preference
    variants = ["mega", "large", "base", "tiny"] if prefer_accuracy else ["tiny", "base", "large", "mega"]
    
    for variant in variants:
        # Check latency constraint (skipped for variants that were not benchmarked)
        if max_latency_ms is not None and variant in onnx_benchmark:
            if onnx_benchmark[variant] > max_latency_ms:
                continue
        
        # Check size constraint against the ONNX file on disk, if it exists
        if max_memory_mb is not None:
            model_path = MODEL_ZOO / variant / f"{MODEL_ID}_{variant}" / "model.onnx"
            if model_path.exists() and model_path.stat().st_size / (1024 * 1024) > max_memory_mb:
                continue
        
        # Check resolution constraint
        if min_resolution is not None and RESOLUTIONS[variant] < min_resolution:
            continue
        
        return variant
    
    return "base"  # Default fallback when no variant satisfies every constraint

# Example usage
recommended = select_variant(max_latency_ms=50, prefer_accuracy=True)
print(f"Recommended variant for <50ms latency: {recommended.upper()}")


## 9. Best Practices

### Variant Selection Guidelines

| Scenario | Recommended Variant | Rationale |
|----------|--------------------|-----------|
| Edge deployment (IoT, drones) | tiny | Minimal memory and compute |
| Production API service | base | Balanced performance |
| High-accuracy batch processing | large | Better accuracy, acceptable latency |
| Research and validation | mega | Maximum precision |

### Performance Optimization

1. **Use ONNX for Production**: ONNX Runtime provides optimized execution across platforms.
2. **Enable GPU Acceleration**: Use `CUDAExecutionProvider` for NVIDIA GPUs.
3. **Batch Processing**: Increase batch size to improve GPU utilization.
4. **Model Caching**: Cache loaded models to avoid repeated loading overhead.
5. **Input Preprocessing**: Use vectorized operations for input preparation.
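
The model caching recommendation above can be sketched with `functools.lru_cache`. Here `load_model_from_disk` is a hypothetical stand-in for an expensive call such as creating an `ort.InferenceSession`, and `get_model` is an illustrative wrapper rather than an Unbihexium function.

```python
from functools import lru_cache

load_calls = []  # Records each real load so the caching effect is visible

def load_model_from_disk(variant: str) -> dict:
    """Hypothetical stand-in for an expensive load such as ort.InferenceSession(...)."""
    load_calls.append(variant)
    return {"variant": variant}

@lru_cache(maxsize=4)  # One slot per variant: tiny, base, large, mega
def get_model(variant: str) -> dict:
    """Return a cached model handle, loading it only on the first request."""
    return load_model_from_disk(variant)

first = get_model("base")
second = get_model("base")  # Served from the cache; no second load
print(first is second, load_calls)  # True ['base']
```

Caching by variant name keeps at most one loaded instance per variant in memory, which suits long-running services that switch between variants.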

### Memory Management

```python
# Release unreferenced Python objects first, then return cached GPU memory
import gc
import torch

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```


## 10. References

### Documentation

- [Unbihexium Documentation](https://unbihexium-oss.github.io/unbihexium/)
- [Model Zoo Reference](https://github.com/unbihexium-oss/unbihexium/tree/main/model_zoo)
- [ONNX Runtime Documentation](https://onnxruntime.ai/docs/)
- [PyTorch Documentation](https://pytorch.org/docs/)

### Related Notebooks

- See the [notebooks index](./README.md) for all available tutorials.

---

**License**: Apache-2.0  
**Copyright**: 2025 Unbihexium OSS Foundation
