# Distributed Computing with Dask

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ucid-foundation/ucid/blob/main/notebooks/12_distributed_dask_grid.ipynb)

---

## Overview

Process large-scale UCID grids with Dask for parallel and distributed computing:

1. Dask setup with a local cluster
2. Partitioning point data for parallel UCID generation
3. Scaling guidelines

---

```python
%pip install -q ucid "dask[complete]"
```

```python
import numpy as np
import pandas as pd

import ucid

print(f"UCID version: {ucid.__version__}")
```

---

## 1. Dask Setup

```python
try:
    import dask.dataframe as dd
    from dask.distributed import Client, LocalCluster

    # 2 worker processes with 2 threads each (4 threads in total)
    cluster = LocalCluster(n_workers=2, threads_per_worker=2)
    client = Client(cluster)
    print(f"Dask dashboard: {client.dashboard_link}")
except ImportError:
    print("Dask not installed")
```
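The cell above leaves the cluster running until the kernel stops. A minimal sketch of an alternative, assuming `dask[complete]` (which includes `dask.distributed`) is installed: open the cluster and client as context managers so both shut down when the block ends. The `processes=False` setting and the `dask.array` sum are illustrative choices, not part of the original notebook.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

# processes=False runs the workers as threads inside this Python process,
# which starts faster for small experiments; drop it to get separate worker processes.
cluster_kwargs = dict(n_workers=2, threads_per_worker=2, processes=False)

with LocalCluster(**cluster_kwargs) as local_cluster, Client(local_cluster):
    # While the Client is open it is the default scheduler for .compute() calls
    total = da.ones(1_000_000, chunks=250_000).sum().compute()

# Leaving the `with` block closes the client and shuts down the cluster
print(total)
```

Using a `with` block keeps notebook experiments from accumulating idle clusters when a cell is re-run.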

---

## 2. Parallel UCID Generation

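```python
# Create a synthetic test dataset: random points in a bounding box
# roughly covering the Istanbul area
n_points = 10_000
rng = np.random.default_rng(42)  # seeded so results are reproducible
df = pd.DataFrame(
    {
        "lat": rng.uniform(40.8, 41.2, n_points),
        "lon": rng.uniform(28.6, 29.4, n_points),
    }
)

print(f"Created {len(df)} points")
```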

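```python
try:
    # Split the pandas DataFrame into 4 partitions that Dask can process in parallel
    ddf = dd.from_pandas(df, npartitions=4)
    print(f"Partitions: {ddf.npartitions}")
except NameError:
    # `dd` is undefined if the Dask import in the setup cell failed
    ddf = df
    print("Using pandas fallback")
```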

---

## 3. Scaling Guidelines

```python
# Scaling recommendations
scaling = {
    "< 10K points": "Single machine, pandas",
    "10K - 1M points": "Dask LocalCluster",
    "1M - 100M points": "Dask distributed cluster",
    "> 100M points": "Spark or Ray cluster",
}

print("Scaling Recommendations:")
for scale, rec in scaling.items():
    print(f"  {scale}: {rec}")
```
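The thresholds above are counted in points. Within a Dask setup, the partition count is usually tuned by memory instead. A sketch using pandas only, assuming a target of about 100 MB per partition (a rule of thumb from the Dask DataFrame best-practices guide). `suggest_npartitions` is a hypothetical helper, not part of `ucid` or Dask.

```python
import math

import numpy as np
import pandas as pd


def suggest_npartitions(df: pd.DataFrame, target_bytes: int = 100 * 2**20) -> int:
    """Estimate a partition count that keeps each partition near `target_bytes`."""
    # deep=True also counts the Python objects behind object/string columns
    total_bytes = int(df.memory_usage(deep=True).sum())
    return max(1, math.ceil(total_bytes / target_bytes))


rng = np.random.default_rng(0)
small = pd.DataFrame(
    {
        "lat": rng.uniform(40.8, 41.2, 10_000),
        "lon": rng.uniform(28.6, 29.4, 10_000),
    }
)
print(suggest_npartitions(small))                       # ~160 KB of float64 data -> 1
print(suggest_npartitions(small, target_bytes=10_000))  # many small partitions
```

For data that is already a Dask DataFrame, `ddf.repartition(partition_size="100MB")` applies the same idea without computing the size by hand.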

---

## Summary

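Key concepts:
- Dask DataFrame splits a pandas DataFrame into partitions and runs pandas operations on them in parallel
- Partition data (`npartitions`) so work can be spread across workers
- Monitor running computations with the Dask dashboard (`client.dashboard_link`)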

---

*Copyright 2026 UCID Foundation. Licensed under EUPL-1.2.*