# 00_env_check — MAAP Environment Sanity Checks

This notebook verifies:
1) Python geospatial stack is importable (rasterio/pyproj/xarray, etc.)
2) S3 access works via `s3fs`
3) MAAP STAC can be queried via `pystac-client`
4) `rasterio.warp.reproject` runs on a small synthetic raster (a 10 m to 200 m resampling check)

**Tip:** If you have `config/project.yaml`, this notebook will try to read ROI/CRS from it; otherwise it uses defaults.
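Because later cells depend on the ROI bounds, it can help to validate them right after loading. The sketch below is a hypothetical helper (`validate_roi` is not part of this notebook) that assumes the ROI is a dict with `west`, `east`, `south`, `north` in degrees and does not cross the antimeridian.

```python
def validate_roi(roi):
    """Check a lon/lat ROI dict and return it with float values.

    Assumption: keys are west/east/south/north in degrees, and the ROI
    does not cross the antimeridian (so west < east).
    """
    required = {"west", "east", "south", "north"}
    missing = required - set(roi)
    if missing:
        raise ValueError(f"ROI is missing keys: {sorted(missing)}")

    west, east = float(roi["west"]), float(roi["east"])
    south, north = float(roi["south"]), float(roi["north"])

    if not (-180.0 <= west < east <= 180.0):
        raise ValueError(f"Expected -180 <= west < east <= 180, got {west}, {east}")
    if not (-90.0 <= south < north <= 90.0):
        raise ValueError(f"Expected -90 <= south < north <= 90, got {south}, {north}")

    return {"west": west, "east": east, "south": south, "north": north}


# Usage with the notebook's default ROI
roi = validate_roi({"west": -121.9, "east": -121.7, "south": 37.25, "north": 37.35})
print(roi)
```

A swapped pair such as `west=-121.7, east=-121.9` raises `ValueError` instead of silently producing an empty search area.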


In [1]:
import sys, os, json, math, warnings
import numpy as np
import pandas as pd
import xarray as xr
from pathlib import Path
from pprint import pprint
import yaml

import rasterio
from rasterio.io import MemoryFile
from rasterio.transform import from_origin
from rasterio.warp import reproject, Resampling
from affine import Affine
import pyproj
from shapely.geometry import box, mapping

import fsspec
import s3fs

from pystac_client import Client

print('✅ Imports ok')
print('Python:', sys.version)
print('rasterio:', rasterio.__version__)
print('pyproj:', pyproj.__version__)
print('xarray:', xr.__version__)
print('pystac-client:', Client.__module__)
print('s3fs:', s3fs.__version__)
print('fsspec:', fsspec.__version__)

✅ Imports ok
Python: 3.10.19 | packaged by conda-forge | (main, Oct 13 2025, 14:08:27) [GCC 14.3.0]
rasterio: 1.3.11
pyproj: 3.6.1
xarray: 2025.6.1
pystac-client: pystac_client.client
s3fs: 2025.9.0
fsspec: 2025.9.0


In [2]:
# Try to read ROI/CRS from config/project.yaml if present
cfg_path = Path('../config/project.yaml')
cfg = {}
if cfg_path.exists():
    with open(cfg_path, 'r') as f:
        cfg = yaml.safe_load(f) or {}  # an empty YAML file loads as None
        print('Loaded config from', cfg_path)
else:
    print('No config/project.yaml found — using defaults.')

roi = cfg.get('roi', {
    'west': -121.9,
    'east': -121.7,
    'south': 37.25,
    'north': 37.35,
})
target_grid = cfg.get('target_grid', {
    'crs': 'EPSG:6933',   # EASE-Grid 2.0 as default
    'resolution': 200,
    'nodata': -9999.0,
    'dtype': 'float32'
})
print('ROI:', roi)
print('Target grid spec:', target_grid)

Loaded config from ../config/project.yaml
ROI: {'west': -118.48, 'east': -117.47, 'south': 34.235, 'north': 35.365}
Target grid spec: {'crs': 'EPSG:6933', 'resolution': 200, 'nodata': -9999.0, 'dtype': 'float32'}


## 1) S3 access test
Set `S3_PREFIX` to an S3 prefix (folder) that you know is listable from MAAP. This example uses the Global Land Cover and Land Use Change, 2000-2020 (GLCLUC2020) dataset. For details, see
https://glad.umd.edu/dataset/GLCLUC2020
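The prefix is expected to end with `/`. As a minimal sketch, the helper below normalizes an `s3://` URL into a bucket and key prefix before listing. `normalize_s3_prefix` and `list_prefix` are hypothetical names, and the `anon` switch is an assumption about how you want to authenticate (`anon=True` only works for public buckets).

```python
from urllib.parse import urlparse


def normalize_s3_prefix(url):
    """Split an s3:// URL into (bucket, key_prefix), forcing a trailing slash."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {url!r}")
    key = parsed.path.lstrip("/")
    if key and not key.endswith("/"):
        key += "/"
    return parsed.netloc, key


def list_prefix(url, anon=False, limit=10):
    """List up to `limit` entries under an S3 prefix (needs s3fs and network access)."""
    import s3fs  # imported lazily so the helper above works without s3fs

    bucket, key = normalize_s3_prefix(url)
    fs = s3fs.S3FileSystem(anon=anon)
    return fs.ls(f"{bucket}/{key}")[:limit]


print(normalize_s3_prefix(
    "s3://nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020"
))
```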


In [3]:
# --- EDIT THIS: put your bucket/prefix here (must end with '/') ---
S3_PREFIX = "s3://nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/"

print('Attempting to list:', S3_PREFIX)
try:
    fs = s3fs.S3FileSystem()  # default uses ambient credentials/roles; pass anon=True for public buckets
    items = fs.ls(S3_PREFIX)
    print(f'✅ S3 list ok — got {len(items)} items (showing up to 10):')
    for p in items[:10]:
        print('  ', p)
except Exception as e:
    print('⚠️ S3 listing failed — check prefix/permissions. Error:')
    print(e)

Attempting to list: s3://nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/
✅ S3 list ok — got 280 items (showing up to 10):
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_000E.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_010E.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_020E.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_030E.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_040E.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_040W.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_050W.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_060W.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_070W.tif
   nasa-maap-data-store/file-staging/nasa-map/glad-glclu2020/v2/2020/00N_080W.tif


## 2) MAAP STAC search test
We query the MAAP STAC endpoint for the first 10 items in a collection. The search in this cell is not filtered by the ROI. Update `COLLECTION` and `ASSET_KEY` as needed.
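To restrict a search to the ROI, the bounds can be passed to `Client.search` as a `bbox` in `[west, south, east, north]` order. The sketch below assumes the ROI dict layout used in this notebook; `roi_to_bbox` and `search_roi` are hypothetical helpers, and whether a given collection has items inside your ROI is not something this sketch can guarantee.

```python
def roi_to_bbox(roi):
    """Convert the notebook's ROI dict to a STAC bbox: [west, south, east, north]."""
    return [roi["west"], roi["south"], roi["east"], roi["north"]]


def search_roi(stac_url, collection, roi, max_items=10):
    """Return up to `max_items` STAC items intersecting the ROI (needs network access)."""
    from pystac_client import Client  # imported lazily; only needed for the query

    client = Client.open(stac_url)
    search = client.search(
        collections=[collection],
        bbox=roi_to_bbox(roi),
        max_items=max_items,
    )
    return list(search.items())


roi = {"west": -118.48, "east": -117.47, "south": 34.235, "north": 35.365}
print(roi_to_bbox(roi))
# Example call (commented out because it requires network access):
# items = search_roi("https://stac.maap-project.org", "nisar-sim", roi)
```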


In [4]:
STAC_URL = 'https://stac.maap-project.org'
# Example collections 
COLLECTION = cfg.get('collection', 'nisar-sim')
ASSET_KEY = cfg.get('asset_key', None)

print('STAC URL:', STAC_URL)
print('Collection:', COLLECTION)
try:
    client = Client.open(STAC_URL)
    # Fetch up to 10 items from the collection (no ROI filter is applied)
    search = client.search(collections=[COLLECTION], max_items=10)
    items = list(search.items())
    print(f'Found {len(items)} STAC item(s).')
    if items:
        it0 = items[0]
        print('Example item id:', it0.id)
        if ASSET_KEY and ASSET_KEY in it0.assets:
            print('Asset href:', it0.assets[ASSET_KEY].href)
        else:
            print('Available asset keys:', list(it0.assets.keys())[:10])

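    # Reuse the items already fetched instead of re-running the search
    for item in items:
        if asset := item.assets.get("GCOV"):
            # The S3 location is stored in the asset's "alternate" field
            s3_link = asset.extra_fields["alternate"]["href"]
            print('Example item id:', item.id)
            print('s3 link:', s3_link)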

except Exception as e:
    print('⚠️ STAC query failed — adjust collection/ROI or check endpoint. Error:')
    print(e)

STAC URL: https://stac.maap-project.org
Collection: nisar-sim
Found 10 STAC item(s).
Example item id: NISAR_L3_PR_SME2_001_008_D_070_4000_QPNA_A_20190829T180759_20190829T180809_P01101_M_P_J_001
Available asset keys: ['SME2']
Example item id: NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001
s3 link: s3://sds-n-cumulus-prod-nisar-sample-data/GCOV/ALOS1_Rosamond_20081012/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5


## 3) Tiny resampling test with `rasterio.warp.reproject`
We create a synthetic raster with 10 m pixels in EPSG:32611 (UTM zone 11N). The extent and projection come from the NISAR GCOV item in the previous example (NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001). The cell then resamples the raster to 200 m resolution. Note that the code keeps the destination CRS equal to the source CRS, so it does not reproject to the target grid CRS (`EPSG:6933`, EASE-Grid 2.0).
This verifies that `rasterio.warp.reproject` and PROJ data are working.
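The cell below keeps the destination CRS equal to the source CRS. For a cross-CRS check against the target grid, a minimal sketch could use `rasterio.warp.calculate_default_transform` to derive the destination grid. The function name `reproject_to_target`, the 2 km tile size, and the nodata handling are assumptions made for illustration, not part of the notebook.

```python
def reproject_to_target(dst_crs="EPSG:6933", dst_res_m=200.0, nodata=-9999.0):
    """Reproject a small synthetic UTM 11N tile to `dst_crs` at `dst_res_m`."""
    import numpy as np
    from affine import Affine
    from rasterio.warp import Resampling, calculate_default_transform, reproject

    # Source: 200 x 200 pixels at 10 m, anchored at the GCOV upper-left corner
    src_crs = "EPSG:32611"
    left, top, res, n = 365480.0, 3913620.0, 10.0, 200
    right, bottom = left + n * res, top - n * res
    src_transform = Affine(res, 0.0, left, 0.0, -res, top)
    src = np.linspace(0, 1, n * n, dtype="float32").reshape(n, n)

    # Let rasterio work out the destination transform and shape
    dst_transform, width, height = calculate_default_transform(
        src_crs, dst_crs, n, n,
        left=left, bottom=bottom, right=right, top=top,
        resolution=dst_res_m,
    )

    # Pixels with no source coverage keep the nodata value
    dst = np.full((height, width), nodata, dtype="float32")
    reproject(
        source=src,
        destination=dst,
        src_transform=src_transform,
        src_crs=src_crs,
        dst_transform=dst_transform,
        dst_crs=dst_crs,
        dst_nodata=nodata,
        resampling=Resampling.bilinear,
    )
    return dst, dst_transform


if __name__ == "__main__":
    out, transform = reproject_to_target()
    print("Target shape:", out.shape)
    print("Target pixel size (m):", transform.a, -transform.e)
```

Using a small tile keeps memory low; the full GCOV extent at 10 m is about 113 million pixels.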


In [5]:
from math import ceil

# --- 1. Define Source Parameters ---

# Source CRS (UTM Zone 11 North)
src_crs = 'EPSG:32611'
dst_crs = src_crs # <<< NO CHANGE IN PROJECTION

# Your input BoundingBox coordinates
left_val = 365480.0
bottom_val = 3789220.0
right_val = 456380.0
top_val = 3913620.0

# Source Resolution (10m)
src_res_m = 10.0
dx = src_res_m
dy = src_res_m

# Calculate source dimensions
nx = int(round((right_val - left_val) / dx))
ny = int(round((top_val - bottom_val) / dy))

# Define the source Affine Transform
src_transform = Affine(dx, 0.0, left_val, 0.0, -dy, top_val)

print(f"Source Grid: Width={nx}, Height={ny}, Transform: {src_transform}, CRS: {src_crs}")

# synthetic gradient data
src_data = np.linspace(0, 1, ny * nx, dtype='float32').reshape((ny, nx))


# --- 2. Define Target Grid Parameters ---

# Target Resolution (200m)
dst_res_m = 200.0

# Since the CRS is the same, the extent (min/max X and Y) remains the same!
xmin = left_val
ymax = top_val
xmax = right_val
ymin = bottom_val


# --- 3. Calculate Destination Grid Dimensions and Transform ---

# Calculate the dimensions of the destination array using the new resolution
# The extent (Xmax-Xmin, Ymax-Ymin) is unchanged, only the resolution is different.
dst_width = int(ceil((xmax - xmin) / dst_res_m))
dst_height = int(ceil((ymax - ymin) / dst_res_m))

# Define the new destination geotransform using the source's origin and the new resolution
# Origin (xmin, ymax) is preserved from the source extent.
dst_transform = from_origin(xmin, ymax, dst_res_m, dst_res_m)

# Create the destination array
dst_data = np.zeros((dst_height, dst_width), dtype='float32')

print(f"\nDestination Grid: Width={dst_width}, Height={dst_height}, CRS: {dst_crs} (Resolution: {dst_res_m}m)")


# --- 4. Perform the Reprojection (Resampling) ---

reproject(
    source=src_data,
    destination=dst_data,
    src_transform=src_transform,
    src_crs=src_crs,
    dst_transform=dst_transform,
    dst_crs=dst_crs, # <<< Same CRS as source
    resampling=Resampling.bilinear, 
)

print('\n✅ Resampling complete:')
print(f'  Source shape: {src_data.shape}')
print(f'  Target shape: {dst_data.shape}')
print('  Target min/max:', float(np.nanmin(dst_data)), float(np.nanmax(dst_data)))

Source Grid: Width=9090, Height=12440, Transform: | 10.00, 0.00, 365480.00|
| 0.00,-10.00, 3913620.00|
| 0.00, 0.00, 1.00|, CRS: EPSG:32611

Destination Grid: Width=455, Height=622, CRS: EPSG:32611 (Resolution: 200.0m)

✅ Resampling complete:
  Source shape: (12440, 9090)
  Target shape: (622, 455)
  Target min/max: 0.0 0.9990828633308411
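The grid sizes reported above follow directly from the bounding box and the pixel size. A short sketch of that arithmetic, using only the values from this cell (`grid_shape` is a hypothetical helper name):

```python
from math import ceil


def grid_shape(left, bottom, right, top, res):
    """Return (height, width) of a north-up grid covering the bounds at `res`."""
    width = int(ceil((right - left) / res))
    height = int(ceil((top - bottom) / res))
    return height, width


bounds = (365480.0, 3789220.0, 456380.0, 3913620.0)  # left, bottom, right, top
print(grid_shape(*bounds, 10.0))   # (12440, 9090)
print(grid_shape(*bounds, 200.0))  # (622, 455)
```

The extent is 90,900 m wide, which is 454.5 pixels at 200 m, so `ceil` rounds up to 455 columns and the last column extends 100 m past the source's east edge.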


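## 4) (Optional) Save the destination raster as a tiled GeoTIFF
To confirm that raster writing works, run the cell below. It writes a small tiled, deflate-compressed GeoTIFF under `../data/outputs/`. The file is not yet a COG; overviews and COG conversion can be added afterwards with rio-cogeo.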


In [6]:
# %% Optional tiled GeoTIFF write (not yet a COG)
out_dir = Path('../data/outputs')
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / 'env_check_reprojected_200m.tif'

profile = {
    'driver': 'GTiff',
    'height': dst_data.shape[0],
    'width': dst_data.shape[1],
    'count': 1,
    'dtype': 'float32',
    'crs': dst_crs,
    'transform': dst_transform,
    'tiled': True,
    'compress': 'deflate'
}

with rasterio.open(out_path, 'w', **profile) as dst:
    dst.write(dst_data, 1)

print('✅ Wrote', out_path)
print('Tip: You can add overviews / convert to COG with rio-cogeo if desired.')

✅ Wrote ../data/outputs/env_check_reprojected_200m.tif
Tip: You can add overviews / convert to COG with rio-cogeo if desired.


---
**Next steps**
- Edit the `S3_PREFIX` and STAC `COLLECTION`/`ASSET_KEY` to match your data.
- Add your project `config/project.yaml` so ROI/CRS are shared across notebooks.
- Proceed to `01_discover_links.ipynb` to catalog S3/STAC hrefs for Land cover, NDVI, SMAP, NISAR and more.
