# Maxar Image Availability Analysis

The Maxar image availability workflow takes as input a list of TerraFund project ids and returns as output a csv listing every project and how much of that project’s area has Maxar imagery coverage.

#### Workflow:
1. Pull info on project characteristics for the entire portfolio using the TerraMatch API
    - Repo/notebook: terrafund-portfolio-analysis/tm-api.ipynb
    - Input: list of TerraFund project IDs
    - Output: csv of all project features
2. Using the TM API csv, pull Maxar metadata
    - Repo/notebook: maxar-tools/decision-tree-metadata.ipynb and maxar-tools/src/decision_tree.py (? may need to change b/c of my additions to the acquire_metadata function)
    - Input: csv of project features
    - Output: csv of maxar metadata
3. Create imagery features (??)
    - Repo/notebook: terrafund-portfolio-analysis/maxar-img-avail.py
    - Input: csv of maxar metadata and csv of TM project features
    - Output: csv of project features and percent imagery coverage
4. Identify projects with 100% imagery coverage


In [65]:
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import shape
from shapely.geometry import Polygon, Point
from shapely import union_all
import ast
from datetime import datetime, timedelta
import re
import os
import math
import requests
import yaml
import json
import pyproj
import sys
sys.path.append('../src/')
import image_availability as img
import process_api_results as clean
import decision_trees as tree
import tm_api_utils as api_request

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Parameters

In [19]:
# File paths
tm_auth_path = '../secrets.yaml'
tm_staging_url = "https://api-staging.terramatch.org/research/v3/sitePolygons?"                 # use for testing queries
tm_prod_url = "https://api.terramatch.org/research/v3/sitePolygons?"                            # Use to pull data for analysis'
approved_projects = '../terrafund-portfolio-analyses/projects_all_approved_202501091214.csv'    # List of projects with approved polygons
feats = '../data/tm_api_TEST.csv'                                                               # Polygon metadata & geometries from TM API
maxar_feats = '/home/darby/github_repos/maxar-tools/data/tm_api_TEST.csv'                       # Polygon metadata & geometries from TM API saved to maxar-tools repo
maxar_md = '../data/imagery_availability/comb_img_availability_2025-02-26.csv'                  # Metadata for Maxar images corresponding to polygons

# Define thesholds
cloud_thresh = 50             # Threshold for removing cloudy imagery
off_nadir_thresh = 30         # Threshold for removing imagery too far off nadir
sun_elev_thresh = 30          # Threshold for removing imagery with too steep of a sun angle
img_count = 1                 # Threshold for identifying image availability
baseline_range = (-366, 0)    # Baseline window (1 year before plantstart date)
ev_range = (730, 1095)        # Early verification window (2-3 years after plant start date)

### Load & Preprocess Data
Inputs: 
- TM API csv
- Maxar metadata csv

In [None]:
# # Load TM API polygons & convert to dataframe
# polygons = pd.read_csv(feats)
# polygons.columns = polygons.columns.str.lower()    # Enforce lowercase column names
# poly_df = pd.DataFrame(polygons)
# poly_df.columns

# # Rename columns
# poly_df = poly_df.rename(columns={'name': 'poly_name','geometry': 'poly_geom'})

# # Convert 'plantstart' column to a datetime
# poly_df['plantstart'] = pd.to_datetime(poly_df['plantstart'], errors='coerce')

In [72]:
# Load TM API polygons and convert to a GeoDataFrame
polygons = pd.read_csv(feats)
polygons.columns = polygons.columns.str.lower()   # Enforce lowercase column names

# Rename 'name' and 'geometry' columns
poly_df = polygons.rename(columns={'name': 'poly_name', 'geometry': 'poly_geom'})  

# Convert 'plantstart' column to a datetime
poly_df['plantstart'] = pd.to_datetime(poly_df['plantstart'], errors='coerce')

# Convert stringified 'poly_geom' dictionaries into real dictionaries
poly_df['poly_geom'] = poly_df['poly_geom'].apply(lambda x: shape(ast.literal_eval(x)) if isinstance(x, str) else shape(x))

# Convert 'poly_geom' (polygon geometries) from WKT to Shapely objects
poly_df['poly_geom'] = poly_df['poly_geom'].apply(shape)

# Add a field for the polygon centroid
poly_df['poly_centroid'] = poly_df['poly_geom'].iloc[0].centroid

# Convert DataFrame to GeoDataFrame
poly_gdf = gpd.GeoDataFrame(poly_df, geometry='poly_geom', crs="EPSG:4326")

In [74]:
print(poly_gdf.shape)
poly_gdf.head()
len(poly_gdf['poly_id'].unique())
poly_gdf

(16, 17)


Unnamed: 0,poly_name,status,siteid,poly_geom,plantstart,plantend,practice,targetsys,distr,numtrees,calcarea,indicators,establishmenttreespecies,reportingperiods,poly_id,project_id,poly_centroid
0,"SAVE KENYA WATER TOWERS, MOROB SUB-LOCATION SITE",approved,ae5a9efd-66b0-4985-8c4c-7e733fa9363e,"POLYGON ((35.58453 0.32554, 35.58457 0.32598, ...",2024-05-01,2024-09-05,"tree-planting, assisted-natural-regeneration",agroforest,partial,45802.0,70.061969,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2024-07-30T00:00:00.000Z', 'submit...",a91435c7-a179-4c1d-9891-de0fe1741654,146b6912-62a1-4b58-b027-466dc3295731,POINT (35.587706972193 0.3331192744796512)
1,Mwambani_2 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.12436 -5.11594, 39.12478 -5.1153,...",2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",410696dc-9579-4412-9c7b-55194cb1867c,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
2,Mwambani_4 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.12075 -5.12238, 39.12106 -5.1222,...",2022-01-09,,tree-planting,mangrove,Null,,2.189894,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",f6871a61-a766-451a-be90-086219616cef,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
3,Mwambani_5 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.11226 -5.12444, 39.11242 -5.12405...",2022-01-09,,tree-planting,mangrove,Null,,5.10203,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",9e745667-0701-434a-8ecb-d917fe2bcf29,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
4,Mwambani_6 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.1083 -5.12901, 39.10894 -5.12781,...",2022-01-09,,tree-planting,mangrove,Null,,5.032552,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",9e508b07-4534-4e04-bb5b-bb0d3734a796,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
5,Mwambani_7 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.10328 -5.13694, 39.10355 -5.13711...",2022-01-09,,tree-planting,mangrove,Null,,1.851621,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",1cbca6da-0024-47dc-bb3a-06f8727d1cd6,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
6,Mwambani_8 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.10073 -5.14421, 39.10073 -5.14386...",2022-01-09,,tree-planting,mangrove,Null,,0.822402,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",e7223a4d-68c6-4d32-b140-f871dec62bd3,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
7,Mwambani_9 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.09877 -5.15152, 39.09865 -5.15083...",2022-01-09,,tree-planting,mangrove,Null,,4.84816,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",0b9ef620-327a-4be2-8b0c-50ec0fa06788,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
8,Mwambani_12 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.107 -5.1699, 39.10723 -5.16942, 3...",2022-01-09,,tree-planting,mangrove,Null,,0.996572,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",e7e42658-360a-4452-8be4-60ea8d1ef0e7,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)
9,Mwambani_1 (new),approved,b38fcde9-e336-4fb0-b4ae-21fd6762c852,"POLYGON ((39.12405 -5.11083, 39.12318 -5.11116...",2022-01-09,,tree-planting,mangrove,Null,,26.958225,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",e18c2562-7f73-4fd2-a361-a6eee01ed71a,3a860077-df4c-4e95-8fec-41520c551243,POINT (35.587706972193 0.3331192744796512)


In [92]:
# Load Maxar images metadata and convert to a GeoDataFrame
images = pd.read_csv(maxar_md)
print(images.columns)

# Select relevent columns
img_df = images[['title', 'project_id', 'poly_id', 'datetime', 'area:cloud_cover_percentage', 'eo:cloud_cover', 'area:avg_off_nadir_angle', 'view:sun_elevation', 'img_geom']]

# Convert 'datetime' column to a datetime and rename
img_df.loc[:, 'datetime'] = pd.to_datetime(img_df['datetime'], format='%Y-%m-%dT%H:%M:%S.%fZ', errors='coerce') # Convert to datetime type
img_df.loc[:, 'datetime'] = img_df['datetime'].apply(lambda x: x.replace(tzinfo=None) if pd.notna(x) else x)    # Remove time zone info
img_df = img_df.rename(columns={'datetime': 'img_date'})                                                        # Rename 'datetime' column 'img_date'

# Convert stringified 'poly_geom' dictionaries into real dictionaries
img_df['img_geom'] = img_df['img_geom'].apply(lambda x: shape(ast.literal_eval(x)) if isinstance(x, str) else shape(x))

# Convert 'img_geom' (image footprint geometries) from WKT to Shapely objects
img_df['img_geom'] = img_df['img_geom'].apply(shape)

# Add a field for the image centroid
img_df['img_centroid'] = img_df['img_geom'].iloc[0].centroid

# Convert DataFrame to GeoDataFrame
img_gdf = gpd.GeoDataFrame(img_df, geometry='img_geom', crs="EPSG:4326")

Index(['gsd', 'title', 'datetime', 'eo:bands', 'platform', 'utc_hour',
       'local_hour', 'instruments', 'associations', 'view:azimuth',
       'constellation', 'off_nadir_avg', 'off_nadir_end', 'off_nadir_max',
       'off_nadir_min', 'rda_available', 'utc_month_day', 'eo:cloud_cover',
       'scan_direction', 'view:off_nadir', 'local_month_day',
       'off_nadir_start', 'timezone_offset', 'utc_time_of_day',
       'collect_time_end', 'view:sun_azimuth', 'local_time_of_day',
       'raw_archive_state', 'collect_time_start', 'pan_resolution_avg',
       'pan_resolution_end', 'pan_resolution_max', 'pan_resolution_min',
       'view:sun_elevation', 'multi_resolution_avg', 'multi_resolution_end',
       'multi_resolution_max', 'multi_resolution_min', 'pan_resolution_start',
       'acquisition_rev_number', 'multi_resolution_start',
       'view:sun_elevation_max', 'view:sun_elevation_min',
       'geolocation_uncertainty', 'stereo_pair_identifiers',
       'area:avg_off_nadir_angle', '

In [77]:
print(img_gdf.shape)
img_gdf.head()

(229, 10)


Unnamed: 0,title,project_id,poly_id,img_date,area:cloud_cover_percentage,eo:cloud_cover,area:avg_off_nadir_angle,view:sun_elevation,img_geom,img_centroid
0,Maxar WV03 Image 10400100988B2F00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2024-07-27 07:41:56.710736,0.0,61.366533,18.931963,54.273473,"POLYGON ((39.12775 -3.95945, 38.99426 -3.96954...",POINT (39.06041989897945 -4.7406910399983815)
1,Maxar WV02 Image 10300100F27EE700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:39:49.467465,0.0,38.634718,20.688286,61.224778,"POLYGON ((38.96875 -3.9694, 38.96875 -3.97614,...",POINT (39.06041989897945 -4.7406910399983815)
2,Maxar WV02 Image 10300100F00AC700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:38:48.766730,0.0,39.623082,26.39833,61.016359,"POLYGON ((38.96218 -3.98494, 38.96219 -4.03341...",POINT (39.06041989897945 -4.7406910399983815)
3,Maxar WV02 Image 10300100ED672B00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:48:45.755427,1.587057,13.354023,20.696114,65.952015,"POLYGON ((39.11501 -5.5161, 39.27359 -5.51484,...",POINT (39.06041989897945 -4.7406910399983815)
4,Maxar WV02 Image 10300100EDCB5700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:47:37.555494,0.0,13.228893,12.65585,65.691295,"POLYGON ((39.11472 -4.54988, 39.11469 -4.57994...",POINT (39.06041989897945 -4.7406910399983815)


### Merge Images with Polygons
Inputs:
- poly_gdf: geodataframe of polygon metadata
- img_gdf: geodataframe of maxar image metadata

Outputs:
- merged: merged geodataframe of maxar image metadata + associated polygon metadata

In [34]:
# Merge the image data with the polygon data (preserving image data rows and adding associated polygon attributes)
merged_gdf = img_gdf.merge(poly_gdf, on=['project_id', 'poly_id'], how='left')

# Ensure correct datetime format
merged_gdf['plantstart'] = pd.to_datetime(merged_gdf['plantstart'], errors='coerce')
merged_gdf['img_date'] = pd.to_datetime(merged_gdf['img_date'], errors='coerce')

In [35]:
print(merged_gdf.shape)
print(merged_gdf.columns)
merged_gdf.head()

(229, 23)
Index(['title', 'project_id', 'poly_id', 'img_date',
       'area:cloud_cover_percentage', 'eo:cloud_cover',
       'area:avg_off_nadir_angle', 'view:sun_elevation', 'img_geom',
       'poly_name', 'status', 'siteid', 'poly_geom', 'plantstart', 'plantend',
       'practice', 'targetsys', 'distr', 'numtrees', 'calcarea', 'indicators',
       'establishmenttreespecies', 'reportingperiods'],
      dtype='object')


Unnamed: 0,title,project_id,poly_id,img_date,area:cloud_cover_percentage,eo:cloud_cover,area:avg_off_nadir_angle,view:sun_elevation,img_geom,poly_name,...,plantstart,plantend,practice,targetsys,distr,numtrees,calcarea,indicators,establishmenttreespecies,reportingperiods
0,Maxar WV03 Image 10400100988B2F00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2024-07-27 07:41:56.710736,0.0,61.366533,18.931963,54.273473,"POLYGON ((39.12775 -3.95945, 38.99426 -3.96954...",Mwambani_2 (new),...,2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit..."
1,Maxar WV02 Image 10300100F27EE700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:39:49.467465,0.0,38.634718,20.688286,61.224778,"POLYGON ((38.96875 -3.9694, 38.96875 -3.97614,...",Mwambani_2 (new),...,2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit..."
2,Maxar WV02 Image 10300100F00AC700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-12-05 07:38:48.766730,0.0,39.623082,26.39833,61.016359,"POLYGON ((38.96218 -3.98494, 38.96219 -4.03341...",Mwambani_2 (new),...,2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit..."
3,Maxar WV02 Image 10300100ED672B00,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:48:45.755427,1.587057,13.354023,20.696114,65.952015,"POLYGON ((39.11501 -5.5161, 39.27359 -5.51484,...",Mwambani_2 (new),...,2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit..."
4,Maxar WV02 Image 10300100EDCB5700,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2023-09-14 07:47:37.555494,0.0,13.228893,12.65585,65.691295,"POLYGON ((39.11472 -4.54988, 39.11469 -4.57994...",Mwambani_2 (new),...,2022-01-09,,tree-planting,mangrove,Null,,11.174912,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit..."


In [36]:
# Summarize the number of images by project and polygon for merged (unfiltered) images
merged_gdf_summary = (
    merged_gdf.groupby(['project_id', 'poly_id'])
    .size()
    .reset_index(name='merged_img_count')
)

merged_gdf_summary

Unnamed: 0,project_id,poly_id,merged_img_count
0,146b6912-62a1-4b58-b027-466dc3295731,a91435c7-a179-4c1d-9891-de0fe1741654,5
1,3a860077-df4c-4e95-8fec-41520c551243,0b9ef620-327a-4be2-8b0c-50ec0fa06788,17
2,3a860077-df4c-4e95-8fec-41520c551243,1cbca6da-0024-47dc-bb3a-06f8727d1cd6,17
3,3a860077-df4c-4e95-8fec-41520c551243,212d5966-2c94-4db7-98e9-4847cfdc4215,17
4,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,17
5,3a860077-df4c-4e95-8fec-41520c551243,4d13b994-be20-4392-9f0f-68709607e96b,15
6,3a860077-df4c-4e95-8fec-41520c551243,8bc43765-9e53-4702-ba97-13005b806126,17
7,3a860077-df4c-4e95-8fec-41520c551243,9e508b07-4534-4e04-bb5b-bb0d3734a796,16
8,3a860077-df4c-4e95-8fec-41520c551243,9e745667-0701-434a-8ecb-d917fe2bcf29,18
9,3a860077-df4c-4e95-8fec-41520c551243,c9b59851-e4b7-4271-ac99-f4e601f86e85,18


### Filter Images Based on Constraints
Inputs:
- merged: merged dataframe of maxar image metadata + associated polygon metadata

Outputs:
- filtered_merged: a filtered version of the merged dataframe of maxar image metadata + associated polygon metadata

In [43]:
# Create a date differential column
merged_gdf['date_diff'] = (merged_gdf['img_date'] - merged_gdf['plantstart']).dt.days

# Filter to retain only images within the desired time range, cloud cover, off nadir angle, and sun elevation parameters
img_gdf_filtered = merged_gdf[
    (merged_gdf['date_diff'] >= baseline_range[0]) &
    (merged_gdf['date_diff'] <= baseline_range[1]) &
    (merged_gdf['area:cloud_cover_percentage'] < cloud_thresh) &
    (merged_gdf['area:avg_off_nadir_angle'] <= off_nadir_thresh) &
    (merged_gdf['view:sun_elevation'] >= sun_elev_thresh)
].copy()    # Copy to avoid SettingWithCopyWarning

In [44]:
print('img_gdf_filtered Unique Polygons:', len(img_gdf_filtered['poly_id'].unique()))
img_gdf_filtered['poly_id'].value_counts()

img_gdf_filtered Unique Polygons: 15


poly_id
1cbca6da-0024-47dc-bb3a-06f8727d1cd6    3
212d5966-2c94-4db7-98e9-4847cfdc4215    3
f6871a61-a766-451a-be90-086219616cef    2
9e745667-0701-434a-8ecb-d917fe2bcf29    2
9e508b07-4534-4e04-bb5b-bb0d3734a796    2
e7223a4d-68c6-4d32-b140-f871dec62bd3    2
410696dc-9579-4412-9c7b-55194cb1867c    2
0b9ef620-327a-4be2-8b0c-50ec0fa06788    2
e7e42658-360a-4452-8be4-60ea8d1ef0e7    2
4d13b994-be20-4392-9f0f-68709607e96b    2
e18c2562-7f73-4fd2-a361-a6eee01ed71a    2
c9b59851-e4b7-4271-ac99-f4e601f86e85    2
8bc43765-9e53-4702-ba97-13005b806126    2
a91435c7-a179-4c1d-9891-de0fe1741654    1
e41e8d8a-efa3-4626-bbfe-5af48f23b6da    1
Name: count, dtype: int64

In [45]:
# Summarize  the number of images by project and polygon for filtered (by date and cloud cover) images
img_gdf_filtered_summary = (
    img_gdf_filtered.groupby(['project_id', 'poly_id'])
    .size()
    .reset_index(name='filtered_img_count')
)

img_gdf_filtered_summary

Unnamed: 0,project_id,poly_id,filtered_img_count
0,146b6912-62a1-4b58-b027-466dc3295731,a91435c7-a179-4c1d-9891-de0fe1741654,1
1,3a860077-df4c-4e95-8fec-41520c551243,0b9ef620-327a-4be2-8b0c-50ec0fa06788,2
2,3a860077-df4c-4e95-8fec-41520c551243,1cbca6da-0024-47dc-bb3a-06f8727d1cd6,3
3,3a860077-df4c-4e95-8fec-41520c551243,212d5966-2c94-4db7-98e9-4847cfdc4215,3
4,3a860077-df4c-4e95-8fec-41520c551243,410696dc-9579-4412-9c7b-55194cb1867c,2
5,3a860077-df4c-4e95-8fec-41520c551243,4d13b994-be20-4392-9f0f-68709607e96b,2
6,3a860077-df4c-4e95-8fec-41520c551243,8bc43765-9e53-4702-ba97-13005b806126,2
7,3a860077-df4c-4e95-8fec-41520c551243,9e508b07-4534-4e04-bb5b-bb0d3734a796,2
8,3a860077-df4c-4e95-8fec-41520c551243,9e745667-0701-434a-8ecb-d917fe2bcf29,2
9,3a860077-df4c-4e95-8fec-41520c551243,c9b59851-e4b7-4271-ac99-f4e601f86e85,2


In [46]:
# Check the # of images with different cloud cover percentages in the filtered imagery
print(img_gdf_filtered.shape)
img_gdf_filtered['area:cloud_cover_percentage'].value_counts()

(30, 24)


area:cloud_cover_percentage
0.000000     19
9.133758      1
1.945866      1
13.303457     1
25.365812     1
1.288262      1
15.639313     1
42.054866     1
29.062129     1
14.068180     1
40.883526     1
38.283611     1
Name: count, dtype: int64

### Compute Coverage for Each Polygon
Input:
- poly_gdf: geodataframe of polygon metadata
- img_gdf_filtered: a filtered version of the merged geodataframe of maxar image metadata + associated polygon metadata

Output:
- csv of percent imagery coverage by project

In [None]:
# SCRATCH IGNORE
#  Empty list to hold results
results = []

for _, polygon in poly_gdf.iterrows():
    poly_id = polygon['poly_id']
    project_id = polygon['project_id']
    poly_geom = polygon['poly_geom'] # Geometry column
    poly_area = polygon['calcarea']  # Precomputed area of the polygon in hectares

    # Create a filtered GeoDataFrame that contains only images that intersect the polygon
    images = img_gdf_filtered[img_gdf_filtered["img_geom"].intersects(poly_geom)]

images
    

Unnamed: 0,title,project_id,poly_id,img_date,area:cloud_cover_percentage,eo:cloud_cover,area:avg_off_nadir_angle,view:sun_elevation,img_geom,poly_name,...,plantend,practice,targetsys,distr,numtrees,calcarea,indicators,establishmenttreespecies,reportingperiods,date_diff
225,Maxar WV02 Image 10300100DF537E00,529e1bae-2187-473f-a2a3-17e577720aba,e41e8d8a-efa3-4626-bbfe-5af48f23b6da,2022-12-16 10:34:07.495606,38.283611,35.559627,21.58851,54.672399,"POLYGON ((0.50385 7.02087, 0.34321 7.01415, 0....",OESR Feature 2,...,2023-09-29,tree-planting,natural-forest,full,68000.0,70.538672,"[{'indicatorSlug': 'restorationByStrategy', 'y...",[],"[{'dueAt': '2022-09-30T00:00:00.000Z', 'submit...",-29


In [None]:
# SCRATCH IGNORE
# Create empty dictionary to store poly_ids for each prj_id
project_polygons = {}

# Extract a list of the unique project_id values from shp_gdf   
prj_keys = list(set(poly_gdf.project_id))

for key in prj_keys:
    print('project key:', key)
    # Extract all poly_ids associated with this project_id
    poly_keys = poly_gdf[poly_gdf['project_id'] == key]['poly_id'].tolist()
    #print('poly keys:', poly_keys)

    # Store it in a dictionary
    project_polygons[key] = poly_keys

# Print to check
for key, polys in list(project_polygons.items()):
    print(f"Project ID: {key}, Polygon IDs: {polys}")

In [None]:
# SCRATCH IGNORE
for project_id, poly_list in project_polygons.items():
    print(project_id, poly_list)

In [59]:
# Vectorized option to select a best image per polygon from Rhiannon's MSU code. But can't add complex logic like combining two images in it
mins_metadata = img_gdf_filtered.loc[img_gdf_filtered.groupby('poly_id')['area:cloud_cover_percentage'].idxmin()].reset_index(drop=True)
mins_metadata.shape

(15, 24)

### Reproject Polygons & Image Footprints

In [66]:
def get_utm_crs(long, lat):
    """
    Determine the best UTM CRS based on polygon centroid location
    """
    utm_zone = int((long + 180) / 6) + 1
    hemisphere = 32600 if lat >= 0 else 32700 # Northern vs Southern hemisphere
    return f"EPSG:{hemisphere+utm_zone}"

In [91]:
# Step 1: Create a dictionary mapping project_id --> list of poly_ids
project_polygons = poly_gdf.groupby("project_id")["poly_id"].apply(list).to_dict()

# Create an empty list to store
results = []

# Step 2: Iterate through each project_id
for project_id, poly_list in project_polygons.items():
    # First, filter img_gdf_filtered by 'project_id'
    project_images = img_gdf_filtered[img_gdf_filtered['project_id'] == project_id]
    print(f"There are {len(project_images)} images in the filtered image dataset associated with Project: {project_id}")

    # Step 3: Iterate through each polygon in this project
    for poly_id in poly_list:
        print(f"Checking polygon {poly_id}...")
        # Retrieve polygon geometry
        polygon_row = poly_gdf[poly_gdf['poly_id'] == poly_id].iloc[0]
        poly_geom = polygon_row["poly_geom"]
        poly_area = polygon_row["calcarea"] # For now, using precomputed area (in hectares)
        print(f"Polygon {poly_id}'s area is {poly_area} hectares")

        # Get the centroid of the polygon to determine the UTM zone
        #poly_centroid = poly_geom.centroid
        #print(f"Polygon Centroid for Polygon {poly_id} is {poly_centroid}")
        utm_crs = get_utm_crs(poly_centroid.x, poly_centroid.y) # Determine the best UTM CRS
        print('UTM CRS:', utm_crs)

        # Reproject the polygon to its correct UTM Zone
        poly_reprojected = gpd.GeoDataFrame([polygon_row], geometry=[poly_geom], crs="EPSG:4326").to_crs(utm_crs)
        poly_geom_reprojected = poly_reprojected.geometry.iloc[0] # Extract reprojected polygon

        # Now filter 'project_images' by 'poly_id'
        poly_images = project_images[project_images['poly_id'] == poly_id]
        print(f"Before reprojecting, there are {len(poly_images)} images in the filtered image dataset for Polygon {poly_id}")
        print(f"Original CRS: {poly_images.crs}")

        # If there are no images for this polygon, append that to the results list
        if poly_images.empty:
            # No imagery --> 0% coverage
            results.append((poly_id, project_id, poly_area, 0, 0, 0))
            continue

        # Reproject the images to the same UTM CRS
        #poly_images_reprojected = gpd.GeoDataFrame(poly_images, geometry=poly_images['img_geom'], crs='EPSG:4326').to_crs(utm_crs)
        # Remove invalid or empty geometries
        poly_images = poly_images[poly_images.is_valid]
        poly_images = poly_images[~poly_images.is_empty]

        # Reset index to prevent row loss
        poly_images = poly_images.reset_index(drop=True)        
        poly_images = poly_images.to_crs(utm_crs)
        print(f"After reprojecting, there are {len(images_reprojected)} reprojected images for Polygon {poly_id}")
        print(f"Reprojected CRS: {poly_images.crs}")

        # Step 4: Select best image(s) (lowest cloud cover OR union all images)
        sorted_images = poly_images_reprojected.sort_values(by='area:cloud_cover_percentage') # Sort by lowest cloud cover percentage
        best_image = sorted_images.iloc[0] # Take the 1 best image (for now)
        best_image_geom = best_image.geometry
        print(f"The best image for Polygon {poly_id} is {best_image['title']}")

        # Step 5: Compute actual coverage of polygon using the best image
        covered_area = best_image_geom.intersection(poly_geom_reprojected).area # In square meters (really?)
        convered_area_ha = covered_area / 10000 # Convert to hectares
        print(f"The calculated covered area for polygon {poly_id} is {convered_area_ha}")

There are 1 images in the filtered image dataset associated with Project: 146b6912-62a1-4b58-b027-466dc3295731
Checking polygon a91435c7-a179-4c1d-9891-de0fe1741654...
Polygon a91435c7-a179-4c1d-9891-de0fe1741654's area is 70.061968996093 hectares
UTM CRS: EPSG:32631
Before reprojecting, there are 1 images in the filtered image dataset for Polygon a91435c7-a179-4c1d-9891-de0fe1741654
Original CRS: EPSG:4326
After reprojecting, there are 1 reprojected images for Polygon a91435c7-a179-4c1d-9891-de0fe1741654
Reprojected CRS: EPSG:32631
The best image for Polygon a91435c7-a179-4c1d-9891-de0fe1741654 is Maxar WV02 Image 10300100DF537E00
The calculated covered area for polygon a91435c7-a179-4c1d-9891-de0fe1741654 is 0.0
There are 28 images in the filtered image dataset associated with Project: 3a860077-df4c-4e95-8fec-41520c551243
Checking polygon 410696dc-9579-4412-9c7b-55194cb1867c...
Polygon 410696dc-9579-4412-9c7b-55194cb1867c's area is 11.174912445761 hectares
UTM CRS: EPSG:32631
Before 