# Basin Similarity and Drought Prediction

This notebook demonstrates:
1. **Basin similarity mapping** using UMAP/PCA
2. **Drought prediction** using Random Forest
3. **Streamflow prediction** for ungauged basins

## Data Overview
- **Attributes**: Basin characteristics (elevation, climate, etc.)
- **Timeseries**: Streamflow, precipitation, temperature
- **Target**: Drought events, defined as Standardized Runoff Index (SRI) < -1.0
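
A minimal sketch of how the drought target could be derived, assuming the timeseries already carries a precomputed SRI column. The column name `sri` and the sample values are hypothetical; only the -1.0 threshold comes from the overview above.

```python
import pandas as pd

# Hypothetical SRI values for illustration; the real column name may differ
demo = pd.DataFrame({"sri": [0.4, -0.7, -1.0, -1.3, -2.1]})

DROUGHT_THRESHOLD = -1.0  # threshold stated in the data overview

# Strict inequality: an SRI of exactly -1.0 is not labeled as drought
demo["drought"] = (demo["sri"] < DROUGHT_THRESHOLD).astype(int)

print(demo)
```

Rows with a missing SRI compare as False and would be labeled 0, so they may need to be dropped before labeling.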


In [None]:
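# Standard library
import warnings

# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling and evaluation
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import classification_report, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# UMAP is provided by the umap-learn package (pip install umap-learn)
import umap

# Silence all warnings to keep notebook output compact; this also hides
# deprecation notices, so comment it out when debugging
warnings.filterwarnings('ignore')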

# Set plot style ('seaborn-v0_8' requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")


## Load Data


In [None]:
# Load attributes and timeseries
attrs = pd.read_csv("data/attributes.csv")
ts = pd.read_csv("data/timeseries.csv")

print(f"Attributes shape: {attrs.shape}")
print(f"Timeseries shape: {ts.shape}")
print(f"\nAttributes columns: {list(attrs.columns)}")
print(f"Timeseries columns: {list(ts.columns)}")

# Display first few rows
print("\nAttributes:")
print(attrs.head())
print("\nTimeseries:")
print(ts.head())
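
Before modeling, it can help to confirm that the two tables line up. The sketch below uses small in-memory stand-ins rather than the CSV files. It assumes both tables share a `basin_id` key and that the timeseries has `date` and `streamflow` columns. These names are hypothetical and should be replaced with the columns printed above.

```python
import pandas as pd

# Stand-in frames; basin_id, date, elevation and streamflow are assumed names
attrs_demo = pd.DataFrame({"basin_id": [1, 2, 3], "elevation": [350.0, 1200.0, 80.0]})
ts_demo = pd.DataFrame({
    "basin_id": [1, 1, 2, 4],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-01"],
    "streamflow": [3.2, None, 1.1, 0.5],
})

# Parse dates so later resampling or lag features work on real timestamps
ts_demo["date"] = pd.to_datetime(ts_demo["date"])

# Count missing values per column
missing = ts_demo.isna().sum()

# Basins that have timeseries rows but no attribute row
orphans = sorted(set(ts_demo["basin_id"]) - set(attrs_demo["basin_id"]))

print(missing)
print(f"Basins without attributes: {orphans}")
```

Any basin reported as lacking attributes would be silently dropped by an inner join, so it is worth deciding how to handle such cases before merging the two tables.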
