# Maxar Image Availability Analysis

The Maxar image availability workflow takes as input a list of TerraFund project ids and returns as output a csv listing every project and how much of that project’s area has Maxar imagery coverage.

#### Workflow:
1. Pull info on project characteristics for the entire portfolio using the TerraMatch API
    - Repo/notebook: terrafund-portfolio-analysis/tm-api.ipynb
    - Input: list of TerraFund project IDs
    - Output: csv of all project features
2. Using the TM API csv, pull Maxar metadata
    - Repo/notebook: maxar-tools/decision-tree-metadata.ipynb and maxar-tools/src/decision_tree.py (? may need to change b/c of my additions to the acquire_metadata function)
    - Input: csv of project features
    - Output: csv of maxar metadata
3. Create imagery features (??)
    - Repo/notebook: terrafund-portfolio-analysis/maxar-img-avail.py
    - Input: csv of maxar metadata and csv of TM project features
    - Output: csv of project features and percent imagery coverage
4. Identify projects with 100% imagery coverage

In [2]:
import pandas as pd # used
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import shape
from shapely.geometry import Polygon, Point
from shapely import union_all
import ast
from datetime import datetime, timedelta
import re
import os
import math
import requests
import yaml
import json
import pyproj
import sys
sys.path.append('../src/')
import image_availability as img
import process_api_results as clean
import decision_trees as tree
import tm_api_utils as api_request

%load_ext autoreload
%autoreload 2

### Parameters

In [3]:
# File paths
tm_auth_path = '../secrets.yaml'
tm_staging_url = "https://api-staging.terramatch.org/research/v3/sitePolygons?"                 # use for testing queries
tm_prod_url = "https://api.terramatch.org/research/v3/sitePolygons?"                            # Use to pull data for analysis'
approved_projects = '../terrafund-portfolio-analyses/projects_all_approved_202501091214.csv'    # List of projects with approved polygons
feats = '../data/tm_api_TEST.csv'                                                               # Polygon metadata & geometries from TM API
maxar_feats = '/home/darby/github_repos/maxar-tools/data/tm_api_TEST.csv'                       # Polygon metadata & geometries from TM API saved to maxar-tools repo
maxar_md = '../data/imagery_availability/comb_img_availability_2025-02-26.csv'                  # Metadata for Maxar images corresponding to polygons

# Define thesholds
cloud_thresh = 50             # Threshold for removing cloudy imagery
off_nadir_thresh = 30         # Threshold for removing imagery too far off nadir
sun_elev_thresh = 30          # Threshold for removing imagery with too steep of a sun angle
img_count = 1                 # Threshold for identifying image availability
baseline_range = (-366, 0)    # Baseline window (1 year before plantstart date)
ev_range = (730, 1095)        # Early verification window (2-3 years after plant start date)

## Code Workflow Outline

In [None]:
# Step 1: LOAD AND PREPROCESS DATA
# 1.1: Load polygon dataset
poly_csv = gpd.GeoDataFrame(polygon geometries & metadata)

# 1.2 Load image dataset
img_csv = gpd.GeoDataFrame(maxar image geometries & metadata)

# 1.3 Preprocess the data
poly_gdf = preprocess_polygons(poly_csv) # Clean data, convert geometries, enforce CRS
img_gdf = preprocess_images(img_csv) # Clean data, convert geometries, enforce CRS


# Step 2: MERGE POLYGON DATA WITH IMAGE DATA
merged_gdf = img_gdf.merge(poly_gdf, on=['project_id', 'poly_id'], how='left')

# Step 3: PRE-FILTER IMAGES
filtered_images = merged_gdf where:
    (date is within allowed date range) &
    (cloud cover < cloud_thresh) &
    (off-nadir angle < off_nadir_thresh) &
    (sun elevation < sun_elev_thresh)

# Step 4: ITERATE THROUGH PROJECTS AND POLYGONS TO CALCULATE IMAGERY COVERAGE
# 4.1 Create a dictionary for project-polygon mapping
project_polygons = {project_id: list of poly_ids associated with that project} # Create a dictionary

# 4.2 Initialize list to store low coverage cases
low_img_coverage_log = []

# 4.3 Iterate through each project
for each project_id in project_polygons:

    # 4.4 Get all polygons for this project
    project_polygons_list = list of poly_ids for this project_id

    # 4.5 Iterate through each polygon in the project
    for each poly_id in project_polygons_list:
    
        # 4.6 Get all images associated with this polygon
        poly_images = filtered_images[filtered_images['poly_id'] == poly_id]

        # Count the number of available images
        num_images = len(poly_images)

        # If no valid image exists, record 0% coverage
        if poly_images is empty:
            store result: (poly_id, project_id, None, num_images, 0, 0) # No images available
            continue

        # 4.7 Select the best image (lowest cloud cover)
        best_image = select_best_image(poly_images)

        # 4.8 Get polygon and image geometries
        poly_geom = poly_gdf[poly_gdf['poly_id'] == poly_id].geometry.iloc[0]
        best_img_geom = best_image['img_geom']

        # 4.9 Compute UTM Zone and reproject geometries
        poly_centroid = compute centroid of poly_geom
        utm_crs = get UTM CRS from centroid
        poly_geom_reprojected = reproject poly_geom to utm_crs
        best_img_geom_reprojected = reproject best_img_geom to utm_crs

        # 4.10 Calculate the polygon area dynamically (in hectares)
        poly_area_ha = poly_geom_reprojected.area / 10000

        # 4.11 Calculate area of overlap
        overlap_area = poly_geom_reprojected union best_img_geom_reprojected
        overlap_area_ha = overlap_area / 10000

        # 4.12 Compute percent of polygon area covered
        percent_img_cover = (overlap_area / poly_area_ha) * 100

        # 4.13 Log cases where imagery coverage is unexpectedly low
        if percent_img_cover < 50:
            log_entry = {
                'poly_id': poly_id,
                'project_id': project_id,
                'best_image': best_image['title'],
                'num_images': num_images,
                'poly_area_ha': poly_area_ha,
                'overlap_area_ha': overlap_area_ha,
                'percent_img_cover': percent_img_cover
            }
            low_img_coverage_log.append(log_entry)

        # 4.14 Store results
        store result: (poly_id, project_id, best_image['title'], num_images, poly_area_ha, overlap_area_ha, percent_img_cover)

# STEP 5: EXPORT LOW COVERAGE LOG IF NEEDED
if low_img_coverage_log is not empty:
    export_to_csv(low_img_coverage_log, "low_coverage_polygons.csv")

# Function Implementation

### STEP 1: LOAD & PREPROCESS DATA
Goal: ensure input data is clean & structured

In [None]:
## 1.1 LOAD IN POLYGON AND IMAGE CSVS
poly_df = pd.read_csv(feats)
img_df = pd.read_csv(maxar_md)

In [25]:
## 1.2 PREPROCESS POLYGON DATA
def preprocess_polygons(poly_df, debug=False):
    """
    Cleans up a dataframe of polygon metadata & geometries from the TerraMatch API and 
    converts it into a GeoDataframe

    Args:
        poly_df (DataFrame): Raw polygon dataset.

    Returns:
        GeoDataFrame: Processed polygon dataset with a geometry column as a shapely object.
    """
    # Enforce lowercase column names
    poly_df.columns = poly_df.columns.str.lower()

    # Rename 'name' and 'geometry' columns
    poly_df = poly_df.rename(columns={'name': 'poly_name', 'geometry': 'poly_geom'})

    # Convert 'plantstart' column to a datetime
    poly_df['plantstart'] = pd.to_datetime(poly_df['plantstart'], errors='coerce')

    # Convert stringified 'poly_geom' dictionaries into real dictionaries
    poly_df['poly_geom'] = poly_df['poly_geom'].apply(lambda x: shape(ast.literal_eval(x)) if isinstance(x, str) else shape(x))

    # Convert 'poly_geom' dictionaries from WKT to Shapely objects
    poly_df['poly_geom'] = poly_df['poly_geom'].apply(shape)

    # Convert to GeoDataFrame
    poly_gdf = gpd.GeoDataFrame(poly_df, geometry='poly_geom', crs="EPSG:4326")

    # Add a field for the polygon centroid
    poly_gdf['poly_centroid'] = poly_gdf['poly_geom'].iloc[0].centroid

    if debug:
        print(f"There are {len(poly_gdf.poly_id.unique())} unique polygons for {len(poly_gdf.project_id.unique())} projects in this dataset.")

    return poly_gdf

In [None]:
## 1.3 PREPROCESS MAXAR IMAGERY DATA
def preprocess_images(img_df, debug=True):
    """
    Cleans up a dataframe of maxar image metadata & geometries from the Maxar Discovery API and 
    converts it into a GeoDataframe

    Args:
        img_df (DataFrame): Raw image metadata dataset.
    
    Returns: 
        GeoDataFrame: Processed image dataset with a geometry column as a shapely object.
    """
    # Select relevent columns
    img_df = img_df[['title', 'project_id', 'poly_id', 'datetime', 'area:cloud_cover_percentage', 'eo:cloud_cover', 'area:avg_off_nadir_angle', 'view:sun_elevation', 'img_geom']]

    # Convert 'datetime' column to a datetime and rename
    img_df.loc[:, 'datetime'] = pd.to_datetime(img_df['datetime'], format='%Y-%m-%dT%H:%M:%S.%fZ', errors='coerce') # Convert to datetime type
    img_df.loc[:, 'datetime'] = img_df['datetime'].apply(lambda x: x.replace(tzinfo=None) if pd.notna(x) else x)    # Remove time zone info
    img_df = img_df.rename(columns={'datetime': 'img_date'}) # Rename the column img_date

    # Convert stringified 'poly_geom' dictionaries into real dictionaries
    img_df['img_geom'] = img_df['img_geom'].apply(lambda x: shape(ast.literal_eval(x)) if isinstance(x, str) else shape(x))

    # Convert 'img_geom' (image footprint geometries) from WKT to Shapely objects
    img_df['img_geom'] = img_df['img_geom'].apply(shape)

    # Convert DataFrame to GeoDataFrame
    img_gdf = gpd.GeoDataFrame(img_df, geometry='img_geom', crs="EPSG:4326")

    # Add a field for the image centroid
    img_gdf['img_centroid'] = img_gdf['img_geom'].iloc[0].centroid

    if debug:
        print(f"There are {len(img_gdf)} images for {len(img_gdf.poly_id.unique())} polygons in {len(img_gdf.project_id.unique())} projects in this dataset.")

    return img_gdf

### STEP 2: MERGE & FILTER DATA
Goal: link images to polygons and apply filters

In [None]:
## 2.1 MERGE THE POLYGON ATTRIBUTES TO THE IMAGES GEODATAFRAME
def merge_polygons_images(img_gdf, poly_gdf, debug=True):
    """ 
    Merges the polygon metadata into the Maxar image GeoDataFrame. All rows of the img_gdf are preserved.
    Also records polygons that are dropped because they don't have any associated images.

    Args:
        img_gdf (GeoDataFrame): Image metadata dataset (each row represents a Maxar image)
        poly_gdf (GeoDataFrame): Polygon dataset (each row represents a polygon from the TM API)
    
    Returns:
        tuple: (GeoDataFrame of merged dataset, list of missing polygons (poly_id, project_id))
    """
    # Merge the image data with the polygon data (preserving image data rows and adding associated polygon attributes)
    merged_gdf = img_gdf.merge(poly_gdf, on=['project_id', 'poly_id'], how='left')

    # Identify polygons without any corresponding Maxar images
    missing_polygons_df = poly_gdf[~poly_gdf['poly_id'].isin(merged_gdf['poly_id'])]

    # Save poly_id and project_id of missing polygons as a list of tuples
    missing_polygons_list = list(missing_polygons_df[['poly_id', 'project_id']].itertuples(index=False, name=None))

    if debug:
        print(f"Total images in img_gdf: {len(img_gdf)}")
        print(f"Total polygons in poly_gdf: {len(poly_gdf)}")
        print(f"Total rows in merged dataset: {len(merged_gdf)}")
        print(f"Unique polygons in merged dataset: {len(merged_gdf['poly_id'].unique())}")
    
        # Count polygons dropped due to no matching images
        missing_polygons = len(poly_gdf[~poly_gdf['poly_id'].isin(merged_gdf['poly_id'])])
        print(f"There {missing_polygons} polygons without images in the merged dataset")
        print(f"Polygons without images (dropped at this stage): {missing_polygons_list}")

    return merged_gdf, missing_polygons_list


In [56]:
merged_gdf, missing_polygons = merge_polygons_images(img_gdf, poly_gdf, debug=True)
merged_gdf.head()
#missing_polygons

Total images in img_gdf: 229
Total polygons in poly_gdf: 16
Total rows in merged dataset: 229
Unique polygons in merged dataset: 16
There 0 polygons without images in the merged dataset
Polygons without images (dropped at this stage): []


Unnamed: 0,title,project_id,poly_id,img_date,area:cloud_cover_percentage,eo:cloud_cover,area:avg_off_nadir_angle,view:sun_elevation,img_geom,img_centroid,...,plantend,practice,targetsys,distr,numtrees,calcarea,indicators,establishmenttreespecies,reportingperiods,poly_centroid
0,Maxar WV03 Image 10400100988B2F00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2024-07-27 07:41:56.710736,0.0,61.366533,18.931963,54.273473,"POLYGON ((39.12775 -3.95945, 38.99426 -3.96954...",POINT (39.06041989897945 -4.7406910399983815),...,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",POINT (35.587706972193 0.3331192744796512)
1,Maxar WV02 Image 10300100F27EE700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:39:49.467465,0.0,38.634718,20.688286,61.224778,"POLYGON ((38.96875 -3.9694, 38.96875 -3.97614,...",POINT (39.06041989897945 -4.7406910399983815),...,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",POINT (35.587706972193 0.3331192744796512)
2,Maxar WV02 Image 10300100F00AC700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:38:48.766730,0.0,39.623082,26.39833,61.016359,"POLYGON ((38.96218 -3.98494, 38.96219 -4.03341...",POINT (39.06041989897945 -4.7406910399983815),...,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",POINT (35.587706972193 0.3331192744796512)
3,Maxar WV02 Image 10300100ED672B00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:48:45.755427,1.587057,13.354023,20.696114,65.952015,"POLYGON ((39.11501 -5.5161, 39.27359 -5.51484,...",POINT (39.06041989897945 -4.7406910399983815),...,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",POINT (35.587706972193 0.3331192744796512)
4,Maxar WV02 Image 10300100EDCB5700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:47:37.555494,0.0,13.228893,12.65585,65.691295,"POLYGON ((39.11472 -4.54988, 39.11469 -4.57994...",POINT (39.06041989897945 -4.7406910399983815),...,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",POINT (35.587706972193 0.3331192744796512)


### STEP 3: PROCESS EACH POLYGON
Goal: Prepare polygons & select best image

### STEP 4: COMPUTE COVERAGE
Goal: calculate imagery coverage per polygon

### STEP 5: EXPORT RESULTS
Goal: save results for review