# Maxar Image Availability Analysis

The Maxar image availability workflow takes a list of TerraFund project IDs as input and returns a csv listing every project and how much of that project's area has Maxar imagery coverage.

#### Workflow:
1. Pull project characteristics for the entire portfolio from the TerraMatch (TM) API
    - Repo/notebook: terrafund-portfolio-analysis/tm-api.ipynb
    - Input: list of TerraFund project IDs
    - Output: csv of all project features
2. Using the TM API csv, pull Maxar metadata
    - Repo/notebook: maxar-tools/decision-tree-metadata.ipynb and maxar-tools/src/decision_tree.py (this may need updating to reflect recent additions to the `acquire_metadata` function)
    - Input: csv of project features
    - Output: csv of maxar metadata
3. Calculate the percent area of each project with available Maxar imagery
    - Repo/notebook: terrafund-portfolio-analysis/maxar-img-avail.ipynb and terrafund-portfolio-analysis/src/image_coverage.py
    - Input: csv of maxar metadata and csv of TM project features
    - Output: csv of project features and percent imagery coverage, csv of percent imagery coverage aggregated to project level, csv of polygons with low imagery coverage
4. Identify the projects with the highest imagery coverage for use in the RS image availability simulation

In [1]:
import pandas as pd
import geopandas as gpd
from tqdm import tqdm
import sys
sys.path.append('../src/')
import image_coverage as img_cover
import analyze_img_coverage as analyze
from datetime import datetime

### Parameters

In [4]:
# Naming convention
run_name = 'ppc_2025_tree_count_elig_round2'
run_dir = 'ppc_tree_count_elig'
analysis = 'baseline' # label used in output file names; update it whenever you change date_range

# Today's date
today = datetime.today().strftime('%Y-%m-%d')
#today = '2025-04-02'

# File paths
feats = f'../data/{run_dir}/tm_api_{run_name}_{today}.csv' # CSV of polygon metadata & geometries from TM API (infile)
maxar_md = f'../data/{run_dir}/imagery_availability/comb_img_availability_{run_name}_{today}.csv' # CSV of metadata for Maxar images corresponding to polygons (infile)
dropped_poly_path = f'../data/{run_dir}/dropped_poly_invalid_geom_{run_name}_{today}.csv'
results_path = f'../data/{run_dir}/results/{analysis}/' # File path to save results to

# Define filtering thresholds (stored in a dictionary)
filters = {
    'cloud_cover': 50,           # Remove images with >50% cloud cover
    'off_nadir': 30,             # Remove images with >30° off-nadir angle
    'sun_elevation': 30,         # Keep only images where sun elevation >30°
    #'date_range': (-366, 0),    # Date range of 1 year before plantstart (TerraFund baseline)
    'date_range': (-366, 90),   # Date range of 1 year before plantstart through 3 months after (PPC baseline)
    #'date_range': (730, 9999),  # Date range of 2 years post-plantstart through today (upper bound of maxar_md dataset is today's date) (early verification)
    #'date_range': (-151, 213),  # Custom all of 2022 with plantstart June 1 2022 (Rwanda, Mozambique Lidar)
    #'date_range': (579, 883),   # Custom May - Oct 2024 with plantstart June 1 2022 (Kenya lidar)
    #'date_range': (-59, 305),   # Custom all of 2023 with plantstart March 1 2023 (GEDI Landscapes & Global Lidar)
    'img_count': 1,             # Threshold for identifying image availability (REASSESS)
}
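The `date_range` values are day offsets relative to each polygon's plantstart date, as the comments in the filters dictionary describe. A minimal sketch, assuming both offsets are inclusive day counts added to plantstart, shows how a pair translates into a calendar window; `date_window` is a hypothetical helper, not part of `image_coverage.py`.

```python
from datetime import date, timedelta

def date_window(plantstart: date, date_range: tuple[int, int]) -> tuple[date, date]:
    """Convert (start_offset, end_offset) in days into calendar dates.

    Hypothetical helper: assumes offsets are inclusive day counts relative to plantstart.
    """
    start_offset, end_offset = date_range
    return plantstart + timedelta(days=start_offset), plantstart + timedelta(days=end_offset)

# Custom "all of 2022" preset with plantstart June 1 2022
print(date_window(date(2022, 6, 1), (-151, 213)))
# (datetime.date(2022, 1, 1), datetime.date(2022, 12, 31))

# Custom "all of 2023" preset with plantstart March 1 2023
print(date_window(date(2023, 3, 1), (-59, 305)))
# (datetime.date(2023, 1, 1), datetime.date(2023, 12, 31))
```

Running a preset through this kind of check before a run is cheap. For example, under the same assumption the Kenya lidar preset `(579, 883)` with a June 1 2022 plantstart resolves to 2024-01-01 through 2024-10-31.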

### Calculate Image Availability by Project

In [5]:
### 1. LOAD POLYGON AND IMAGE DATA ###
poly_df = pd.read_csv(feats)
img_df = pd.read_csv(maxar_md)

In [6]:
### 2.1. PREPROCESS POLYGON DATA ###
poly_gdf = img_cover.preprocess_polygons(poly_df, debug=False, save_dropped=True, dropped_output_path=dropped_poly_path)

Processing polygon data...
Cleaning geometries...

🧾 Geometry Cleaning Summary:
  ➤ Total geometries processed: 40
  ➤ Invalid geometries:         0
  ➤ Repaired with buffer(0):    0
  ➤ Dropped:                    0
  ✅ Final valid polygons:       40



In [7]:
### 2.2. PREPROCESS IMAGE DATA ###
img_gdf = img_cover.preprocess_images(img_df, debug=True)

Processing Maxar image data...
There are 1016 images for 39 polygons in 4 projects in this dataset.


In [8]:
### 3. MERGE POLYGON METADATA INTO IMAGE DATA ###
merged_gdf, missing_polygons_list = img_cover.merge_polygons_images(img_gdf, poly_gdf, debug=True)

Merging polygon metadata into image data...
Total images in img_gdf: 1016
Total polygons in poly_gdf: 40
Number of polygons removed from merged dataset due to invalid (unfixable) geometries: 0
Number of rows removed from image dataset because their polygons had invalid (unfixable) geometries: 0
Total rows in merged dataset: 1016
Unique polygons in merged dataset: 39
1 polygons were dropped from the merged dataset because they have no Maxar images
Polygons without images (dropped at this stage): [('aa8c2646-d4f0-4815-b4d3-bfd4fe22bc15', '02b3119e-9505-4dba-b58d-f2a967b71ef9')]
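The merge step reports polygons that have no Maxar images as `(poly_id, project_id)` pairs. Conceptually this is an anti-join: keep polygon keys that never appear in the image table. A minimal sketch on plain dicts, assuming records with `poly_id` and `project_id` keys (whether `merge_polygons_images` works exactly this way is not confirmed). In pandas the same check is usually written as `merge(..., how='left', indicator=True)` and keeping rows where `_merge == 'left_only'`.

```python
def polygons_without_images(polygons, images):
    """Return (poly_id, project_id) keys present in polygons but absent from images.

    polygons, images: iterables of dicts with 'poly_id' and 'project_id' keys
    (an assumed record shape standing in for rows of poly_gdf and img_gdf).
    """
    image_keys = {(img["poly_id"], img["project_id"]) for img in images}
    missing = []
    for poly in polygons:
        key = (poly["poly_id"], poly["project_id"])
        if key not in image_keys:
            missing.append(key)
    return missing

polygons = [
    {"poly_id": "p1", "project_id": "proj-a"},
    {"poly_id": "p2", "project_id": "proj-a"},
    {"poly_id": "p3", "project_id": "proj-b"},
]
images = [
    {"img_id": "img-1", "poly_id": "p1", "project_id": "proj-a"},
    {"img_id": "img-2", "poly_id": "p1", "project_id": "proj-a"},
    {"img_id": "img-3", "poly_id": "p3", "project_id": "proj-b"},
]
print(polygons_without_images(polygons, images))  # [('p2', 'proj-a')]
```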


In [9]:
### 4. FILTER IMAGES ###
img_gdf_filtered = img_cover.filter_images(merged_gdf, filters, debug=True)

Total images before filtering: 1016
Total images after filtering: 193
Polygons with at least one valid filtered image: 25
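Filtering drops most images (1016 to 193 here). A minimal sketch of how the thresholds in `filters` could be applied to a single image record, with comparison directions taken from the comments in the filters dictionary (cloud cover and off-nadir at or below the limit, sun elevation above it, days from plantstart inside `date_range`). The field names are simplified and hypothetical; the real metadata uses columns such as `area:cloud_cover_percentage`. `img_count` is left out because it is described as an availability threshold rather than a per-image filter.

```python
from datetime import datetime

FILTERS = {
    "cloud_cover": 50,        # keep images with <= 50% cloud cover
    "off_nadir": 30,          # keep images with <= 30 degrees off-nadir angle
    "sun_elevation": 30,      # keep images with sun elevation > 30 degrees
    "date_range": (-366, 90), # days relative to plantstart (PPC baseline)
}

def passes_filters(img, filters):
    """Return True if an image record satisfies every threshold.

    Assumed (hypothetical) record fields: cloud_cover, off_nadir, sun_elevation,
    img_date and plantstart (datetimes).
    """
    days_from_plantstart = (img["img_date"] - img["plantstart"]).days
    start, end = filters["date_range"]
    return (
        img["cloud_cover"] <= filters["cloud_cover"]
        and img["off_nadir"] <= filters["off_nadir"]
        and img["sun_elevation"] > filters["sun_elevation"]
        and start <= days_from_plantstart <= end
    )

plantstart = datetime(2022, 6, 1)
images = [
    {"img_id": "a", "cloud_cover": 10, "off_nadir": 12, "sun_elevation": 55,
     "img_date": datetime(2022, 3, 1), "plantstart": plantstart},  # passes
    {"img_id": "b", "cloud_cover": 80, "off_nadir": 12, "sun_elevation": 55,
     "img_date": datetime(2022, 3, 1), "plantstart": plantstart},  # too cloudy
    {"img_id": "c", "cloud_cover": 10, "off_nadir": 12, "sun_elevation": 55,
     "img_date": datetime(2023, 1, 1), "plantstart": plantstart},  # 214 days after plantstart
]
kept = [img["img_id"] for img in images if passes_filters(img, FILTERS)]
print(kept)  # ['a']
```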


In [11]:
### 5. COMPUTE POLYGON-LEVEL IMAGERY COVERAGE ###
# Initialize storage for results & low-coverage polygons list
low_img_coverage_log = []
results = []

# Iterate through all polygons and compute imagery coverage per polygon
for poly_id, project_id in zip(poly_gdf['poly_id'], poly_gdf['project_id']):
    result = img_cover.compute_polygon_image_coverage(poly_id, project_id, poly_gdf, img_gdf_filtered, low_img_coverage_log)
    results.append(result)

# Convert the results to a DataFrame
results_df = pd.DataFrame(results, columns=['poly_id', 'project_id', 'best_image', 'img_date', 'num_images',
                                            'poly_area_ha', 'overlap_area_ha', 'percent_img_cover'])
results_df['best_image'] = results_df['best_image'].fillna("None")

Computing coverage for polygon 08437fef-2f9e-4097-bf23-5b279092ec7a
Logging low covarage for polygon 08437fef-2f9e-4097-bf23-5b279092ec7a because there is no available imagery
Computing coverage for polygon a803f994-1d2c-4249-84c8-07e616cf1a9e
Logging low covarage for polygon a803f994-1d2c-4249-84c8-07e616cf1a9e because there is no available imagery
Computing coverage for polygon 91afb31c-1f6f-476f-87cd-c603475aa499
Found best image: img_id                                                          10300100D3A73F00
title                                          Maxar WV02 Image 10300100D3A73F00
project_id                                  02b3119e-9505-4dba-b58d-f2a967b71ef9
poly_id                                     91afb31c-1f6f-476f-87cd-c603475aa499
img_date                                              2022-06-04 13:49:03.568383
area:cloud_cover_percentage                                                  0.0
eo:cloud_cover                                                              
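Judging from the output columns (`best_image`, `poly_area_ha`, `overlap_area_ha`, `percent_img_cover`), each polygon's coverage is the overlap of one best image divided by the polygon area. The sketch below illustrates that arithmetic with axis-aligned rectangles in metres instead of real geometries, and it assumes "best" means largest overlap; whether `compute_polygon_image_coverage` also weighs cloud cover or date is not confirmed here. `Box` and `best_image_coverage` are hypothetical names.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned rectangle in projected metres (a simplification of real footprints)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(0.0, width) * max(0.0, height)

def best_image_coverage(polygon: Box, footprints: dict[str, Box]):
    """Return (best_image, poly_area_ha, overlap_area_ha, percent_img_cover)."""
    poly_area_ha = polygon.area() / 10_000  # 1 ha = 10,000 m²
    if not footprints:
        # No imagery: this is the case the notebook logs as low coverage
        return None, poly_area_ha, 0.0, 0.0
    best_id = max(footprints, key=lambda img_id: overlap_area(polygon, footprints[img_id]))
    overlap_ha = overlap_area(polygon, footprints[best_id]) / 10_000
    percent = 100 * overlap_ha / poly_area_ha if poly_area_ha else 0.0
    return best_id, poly_area_ha, overlap_ha, percent

polygon = Box(0, 0, 200, 100)            # 20,000 m² = 2 ha
footprints = {
    "img-1": Box(100, 0, 400, 100),      # covers the right half of the polygon
    "img-2": Box(-50, 0, 150, 100),      # covers 150 m of the 200 m width
}
print(best_image_coverage(polygon, footprints))  # ('img-2', 2.0, 1.5, 75.0)
```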

In [12]:
### 6. AGGREGATE TO PROJECT-LEVEL COVERAGE ###
project_results_df = img_cover.aggregate_project_image_coverage(results_df, debug=True)

There are 4 projects being analyzed.
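One natural way to roll polygon results up to a project is area weighting: total overlap area divided by total polygon area. This is an assumption about what `aggregate_project_image_coverage` does, sketched here on plain dicts that mirror the `results_df` columns; `aggregate_by_project` is a hypothetical name.

```python
from collections import defaultdict

def aggregate_by_project(poly_results):
    """Area-weighted project coverage: sum(overlap_area_ha) / sum(poly_area_ha) * 100.

    poly_results: iterable of dicts with project_id, poly_area_ha and overlap_area_ha
    (mirroring columns of results_df). The weighting scheme is an assumption.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # project_id -> [poly_area_ha, overlap_area_ha]
    for row in poly_results:
        totals[row["project_id"]][0] += row["poly_area_ha"]
        totals[row["project_id"]][1] += row["overlap_area_ha"]
    return {
        project_id: 100 * overlap / area if area else 0.0
        for project_id, (area, overlap) in totals.items()
    }

rows = [
    {"project_id": "proj-a", "poly_area_ha": 2.0, "overlap_area_ha": 1.5},
    {"project_id": "proj-a", "poly_area_ha": 6.0, "overlap_area_ha": 6.0},
    {"project_id": "proj-b", "poly_area_ha": 4.0, "overlap_area_ha": 0.0},
]
print(aggregate_by_project(rows))  # {'proj-a': 93.75, 'proj-b': 0.0}
```

Area weighting keeps a small, poorly covered polygon from dragging the project down as much as a simple mean would: averaging the per-polygon percentages for `proj-a` (75% and 100%) gives 87.5%, while the area-weighted figure is 93.75%.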


In [13]:
### 7. SAVE RESULTS ###
# Percent imagery coverage by polygon
results_df.to_csv(f"{results_path}polygon_imagery_coverage_{run_name}_{analysis}_{today}.csv", index=False)

# Percent imagery coverage by project
project_results_df.to_csv(f"{results_path}project_imagery_coverage_{run_name}_{analysis}_{today}.csv", index=False)

# Polygons with low imagery coverage
if low_img_coverage_log:
    low_coverage_polygons_df = pd.DataFrame(low_img_coverage_log)
    print(f"Logging low image coverage polygons to {results_path}.")
    low_coverage_polygons_df['best_image'] = low_coverage_polygons_df['best_image'].fillna("None")
    low_coverage_polygons_df.to_csv(f"{results_path}low_coverage_polygons_{run_name}_{analysis}_{today}.csv", index=False)

print(f"Imagery coverage results saved to {results_path}")

Logging low image coverage polygons to ../data/ppc_tree_count_elig/results/baseline/.
Imagery coverage results saved to ../data/ppc_tree_count_elig/results/baseline/
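The save cell writes into `results_path` with pandas `to_csv`, which does not create missing folders, so a new `run_dir` or `analysis` value fails unless the directory already exists. A minimal sketch using `pathlib` that builds file names with the notebook's naming convention and creates the folder first; `output_path` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def output_path(results_dir, kind, run_name, analysis, today):
    """Return results_dir/<kind>_<run_name>_<analysis>_<today>.csv, creating results_dir if needed.

    Hypothetical helper following the notebook's file-naming convention.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)  # to_csv will not create missing folders
    return results_dir / f"{kind}_{run_name}_{analysis}_{today}.csv"

# Demonstrate inside a throwaway directory so nothing is written to the real data folder
with tempfile.TemporaryDirectory() as tmp:
    path = output_path(Path(tmp) / "results" / "baseline", "polygon_imagery_coverage",
                       "ppc_2025_tree_count_elig_round2", "baseline", "2025-04-02")
    dir_created = path.parent.is_dir()
    print(path.name)    # polygon_imagery_coverage_ppc_2025_tree_count_elig_round2_baseline_2025-04-02.csv
    print(dir_created)  # True
```

In the notebook itself, calling `Path(results_path).mkdir(parents=True, exist_ok=True)` once before the first `to_csv` has the same effect.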


## Analyze Maxar Image Availability

In [14]:
# Read in files
# Image availability by project
project_img_avail = pd.read_csv(f"{results_path}project_imagery_coverage_{run_name}_{analysis}_{today}.csv")

# Image availability by polygon
poly_img_avail = pd.read_csv(f"{results_path}polygon_imagery_coverage_{run_name}_{analysis}_{today}.csv")

# Low coverage polygons
low_coverage_poly = pd.read_csv(f"{results_path}low_coverage_polygons_{run_name}_{analysis}_{today}.csv")

In [None]:
# Overall distribution of image availability
analyze.img_avail_hist(project_img_avail)
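`img_avail_hist` plots the distribution of project coverage. To get the same picture as numbers (for example, for a summary table), a minimal sketch bins coverage percentages into 10-point buckets. The bin edges and the choice to fold values at or slightly above 100% into the top bin are made for this example and are not necessarily what `img_avail_hist` does; `coverage_bins` is a hypothetical name.

```python
from collections import Counter

def coverage_bins(percentages, width=10):
    """Count coverage values per bin of `width` percentage points (width must divide 100).

    Values at or above 100% are folded into the top bin and negative values into the bottom one.
    """
    counts = Counter()
    for pct in percentages:
        lo = int(pct // width) * width
        lo = max(0, min(lo, 100 - width))
        counts[lo] += 1
    return {f"{lo}-{lo + width}": counts[lo] for lo in range(0, 100, width)}

project_coverage = [0.0, 12.5, 55.0, 93.75, 100.0, 100.4]  # illustrative values
bins = coverage_bins(project_coverage)
print({label: n for label, n in bins.items() if n})
# {'0-10': 1, '10-20': 1, '50-60': 1, '90-100': 3}
```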

In [None]:
# High image availability projects
qualifying_projects_list = analyze.count_projs_wi_img_avail(project_img_avail, 90)

In [None]:
analyze.analyze_low_coverage_issues(low_coverage_poly)
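The computation step logs a reason for each low-coverage polygon (for example, "there is no available imagery"). A minimal sketch of tallying those entries by reason and by project, assuming each log entry is a dict with `project_id` and a hypothetical `reason` field; the real columns of `low_coverage_poly`, and what `analyze_low_coverage_issues` reports, may differ. The "partial coverage" reason below is illustrative only.

```python
from collections import Counter

def summarize_low_coverage(log_entries):
    """Tally low-coverage polygons by reason and by project.

    log_entries: list of dicts with 'project_id' and 'reason' keys; the 'reason'
    field is a hypothetical stand-in for whatever the real log records.
    """
    by_reason = Counter(entry["reason"] for entry in log_entries)
    by_project = Counter(entry["project_id"] for entry in log_entries)
    return by_reason, by_project

log = [
    {"poly_id": "p1", "project_id": "proj-a", "reason": "no available imagery"},
    {"poly_id": "p2", "project_id": "proj-a", "reason": "no available imagery"},
    {"poly_id": "p3", "project_id": "proj-b", "reason": "partial coverage"},
]
by_reason, by_project = summarize_low_coverage(log)
print(by_reason.most_common())   # [('no available imagery', 2), ('partial coverage', 1)]
print(by_project.most_common())  # [('proj-a', 2), ('proj-b', 1)]
```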

In [None]:
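# Projects with more than 90% of their area covered by imagery
# (the upper bound of 101 rather than 100 also keeps projects whose coverage is slightly above 100%)
high_cov = project_img_avail[(project_img_avail['total_percent_area_covered'] > 90) & (project_img_avail['total_percent_area_covered'] <= 101)]
print(len(high_cov))
high_cov.sort_values('total_percent_area_covered', ascending=False)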

### For PPC, calculate image availability by task (project_id + plantstart_year)

In [22]:
# Merge the 'plantstart' column from poly_gdf into poly_img_avail (each polygon with its best Maxar image), then derive plantstart_year from it
poly_img_avail_wi_yrs = poly_img_avail.merge(poly_gdf[['poly_id', 'plantstart']], how='left', on='poly_id')
poly_img_avail_wi_yrs['plantstart_year'] = pd.to_datetime(poly_img_avail_wi_yrs['plantstart'], errors='coerce').dt.year
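For PPC, a task is identified by the pair `(project_id, plantstart_year)`. The cell above uses `errors='coerce'`, so an unparseable plantstart becomes a missing year instead of raising. A minimal stdlib sketch of the same idea, assuming ISO-style date strings; `plantstart_year` here is a hypothetical helper, not the pandas code path.

```python
from datetime import datetime

def plantstart_year(value):
    """Return the year of an ISO-style date string, or None if it cannot be parsed.

    Mirrors the intent of pd.to_datetime(..., errors='coerce').dt.year, where
    unparseable values become missing instead of raising.
    """
    try:
        return datetime.fromisoformat(value).year
    except (TypeError, ValueError):
        return None

rows = [
    {"poly_id": "p1", "project_id": "proj-a", "plantstart": "2022-06-01"},
    {"poly_id": "p2", "project_id": "proj-a", "plantstart": "2023-03-01"},
    {"poly_id": "p3", "project_id": "proj-b", "plantstart": None},
]
tasks = [(row["project_id"], plantstart_year(row["plantstart"])) for row in rows]
print(tasks)  # [('proj-a', 2022), ('proj-a', 2023), ('proj-b', None)]
```

In pandas, a single missing date typically turns the whole `plantstart_year` column into floats (2022.0), which matters if the year is later used in joins or labels; `.astype('Int64')` keeps it as a nullable integer.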

In [24]:
task_results_df = img_cover.aggregate_project_image_coverage_ppc(poly_img_avail_wi_yrs)

In [25]:
# Save percent imagery coverage by task (project_id + plantstart_year) to CSV
task_results_df.to_csv(f"{results_path}task_imagery_coverage_{run_name}_{analysis}_{today}.csv", index=False)