# Maxar Image Availability Analysis

The Maxar image availability workflow takes a list of TerraFund project IDs as input and returns a CSV listing every project and the percentage of that project's area covered by Maxar imagery.

#### Workflow:
1. Pull info on project characteristics for the entire portfolio using the TerraMatch API
    - Repo/notebook: terrafund-portfolio-analysis/tm-api.ipynb
    - Input: list of TerraFund project IDs
    - Output: csv of all project features
2. Using the TM API csv, pull Maxar metadata
    - Repo/notebook: maxar-tools/decision-tree-metadata.ipynb and maxar-tools/src/decision_tree.py (note: these references may need to change because of additions to the acquire_metadata function)
    - Input: csv of project features
    - Output: csv of maxar metadata
3. Calculate the percent area of each project with available Maxar imagery
    - Repo/notebook: terrafund-portfolio-analysis/maxar-img-avail.ipynb and terrafund-portfolio-analysis/src/image_coverage.py
    - Input: csv of maxar metadata and csv of TM project features
    - Output: csv of project features and percent imagery coverage, csv of percent imagery coverage aggregated to project level, csv of polygons with low imagery coverage
4. Identify projects with highest imagery coverage to use for the RS image availability simulation
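
Step 3 reduces to a geometric overlap calculation. The sketch below is a minimal illustration of that idea and not the code in image_coverage.py: the function name `percent_coverage` is hypothetical, the geometries are assumed to share a projected CRS with units in meters, and all image footprints are unioned before intersecting. The real `compute_polygon_image_coverage` may work differently, for example by selecting a single best image, as the `best_image` column in the results suggests.

```python
from shapely.geometry import box
from shapely.ops import unary_union


def percent_coverage(polygon, image_footprints):
    """Return (poly_area_ha, overlap_area_ha, percent_img_cover) for one polygon.

    Assumption: all geometries use a projected CRS with units in meters,
    so area / 10,000 gives hectares.
    """
    poly_area_ha = polygon.area / 10_000
    if not image_footprints:
        return poly_area_ha, 0.0, 0.0
    # Union the footprints first so overlapping images are not double-counted
    covered = unary_union(image_footprints).intersection(polygon)
    overlap_area_ha = covered.area / 10_000
    return poly_area_ha, overlap_area_ha, 100 * overlap_area_ha / poly_area_ha


# Hypothetical 200 m x 100 m polygon (2 ha) and two overlapping image footprints
polygon = box(0, 0, 200, 100)
footprints = [box(-50, -50, 100, 150), box(50, -50, 150, 150)]
poly_ha, overlap_ha, pct = percent_coverage(polygon, footprints)
print(f"{overlap_ha:.2f} of {poly_ha:.2f} ha covered ({pct:.1f}%)")
```

Unioning the footprints gives the total area with any imagery at all; a best-image approach would instead report the largest overlap from a single image.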

In [1]:
import pandas as pd
import geopandas as gpd
import sys
sys.path.append('../src/')
import image_coverage as img_cover
from datetime import datetime

### Parameters

In [2]:
# File paths
feats = '../data/tm_api_TEST.csv'                                                               # Polygon metadata & geometries from TM API
maxar_md = '../data/imagery_availability/comb_img_availability_2025-02-26.csv'                  # Metadata for Maxar images corresponding to polygons
results_path = '../data/results/'                                                               # File path to save results to

# Define filtering thresholds (stored in a dictionary)
filters = {
    'cloud_cover': 50,          # Remove images with >50% cloud cover
    'off_nadir': 30,            # Remove images with >30° off-nadir angle
    'sun_elevation': 30,        # Keep only images where sun elevation >30°
    'date_range': (-366, 0),    # Day offsets relative to plantstart (up to 1 year before)
    'img_count': 1,             # Threshold for identifying image availability (REASSESS)
}
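
To make the thresholds concrete, here is a rough sketch of how a dictionary like this could be applied to a table of image metadata. It is not the implementation of `img_cover.filter_images`: the function `apply_filters` and the column names (`img_id`, `img_date`, `plantstart`, `cloud_cover`, `off_nadir`, `sun_elevation`) are assumptions made for illustration, and `date_range` is assumed to be in days relative to the planting start date.

```python
import pandas as pd

# Same thresholds as above; 'img_count' is omitted because it is not a per-image filter
example_filters = {
    'cloud_cover': 50,
    'off_nadir': 30,
    'sun_elevation': 30,
    'date_range': (-366, 0),
}


def apply_filters(img_df, filters):
    """Keep only the images that pass every threshold (hypothetical helper)."""
    # Day offset of each image relative to the planting start date
    day_offset = (img_df['img_date'] - img_df['plantstart']).dt.days
    min_days, max_days = filters['date_range']
    keep = (
        (img_df['cloud_cover'] <= filters['cloud_cover'])
        & (img_df['off_nadir'] <= filters['off_nadir'])
        & (img_df['sun_elevation'] > filters['sun_elevation'])
        & day_offset.between(min_days, max_days)
    )
    return img_df[keep]


# Hypothetical metadata: only image 'a' passes all four checks
images = pd.DataFrame({
    'img_id': ['a', 'b', 'c', 'd'],
    'cloud_cover': [10, 80, 20, 5],
    'off_nadir': [12, 10, 35, 20],
    'sun_elevation': [55, 60, 45, 50],
    'img_date': pd.to_datetime(['2022-06-01', '2022-07-01', '2022-08-01', '2020-01-01']),
    'plantstart': pd.to_datetime(['2023-01-01'] * 4),
})
kept = apply_filters(images, example_filters)
print(kept['img_id'].tolist())
```

Image 'b' fails on cloud cover, 'c' on off-nadir angle, and 'd' on the date window.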

### Calculate Image Availability by Project
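
One plausible way to roll polygon-level results up to the project level is an area-weighted aggregation: sum the polygon and overlap areas per project, then take their ratio. The sketch below assumes that approach. The input column names match the results table built in this notebook, but the function `aggregate_to_project` and its output columns are hypothetical, and `img_cover.aggregate_project_image_coverage` may compute the project figure differently.

```python
import pandas as pd


def aggregate_to_project(results_df):
    """Area-weighted percent imagery coverage per project (hypothetical helper)."""
    totals = results_df.groupby('project_id', as_index=False).agg(
        num_polygons=('poly_id', 'count'),
        total_area_ha=('poly_area_ha', 'sum'),
        total_overlap_ha=('overlap_area_ha', 'sum'),
    )
    # Ratio of summed areas, so large polygons weigh more than small ones
    totals['percent_img_cover'] = (
        100 * totals['total_overlap_ha'] / totals['total_area_ha']
    )
    return totals


# Hypothetical polygon-level results for two projects
example_results = pd.DataFrame({
    'poly_id': ['a', 'b', 'c'],
    'project_id': ['P1', 'P1', 'P2'],
    'poly_area_ha': [2.0, 8.0, 4.0],
    'overlap_area_ha': [1.5, 2.0, 4.0],
})
project_cover = aggregate_to_project(example_results)
print(project_cover)
```

Weighting by area avoids the distortion that a simple mean of polygon percentages would introduce when polygon sizes vary widely within a project.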

In [11]:
### 1. LOAD POLYGON AND IMAGE DATA ###
poly_df = pd.read_csv(feats)
img_df = pd.read_csv(maxar_md)

### 2. PREPROCESS POLYGON AND IMAGE DATA ###
poly_gdf = img_cover.preprocess_polygons(poly_df, debug=True)
img_gdf = img_cover.preprocess_images(img_df, debug=True)

### 3. MERGE POLYGON METADATA INTO IMAGE DATA ###
merged_gdf, missing_polygons_list = img_cover.merge_polygons_images(img_gdf, poly_gdf, debug=True)

### 4. FILTER IMAGES ###
img_gdf_filtered = img_cover.filter_images(merged_gdf, filters, debug=True)

### 5. COMPUTE POLYGON-LEVEL IMAGERY COVERAGE ###
# Initialize storage for results & low-coverage polygons list
low_img_coverage_log = []
results = []

# Iterate through all polygons and compute imagery coverage per polygon
for poly_id, project_id in zip(poly_gdf['poly_id'], poly_gdf['project_id']):
    result = img_cover.compute_polygon_image_coverage(poly_id, project_id, poly_gdf, img_gdf_filtered, low_img_coverage_log)
    results.append(result)

# Convert the results to a DataFrame
results_df = pd.DataFrame(results, columns=['poly_id', 'project_id', 'best_image', 'num_images',
                                            'poly_area_ha', 'overlap_area_ha', 'percent_img_cover'])
results_df['best_image'] = results_df['best_image'].fillna("None")

# The low-coverage log is converted to a DataFrame in step 7, only if it is non-empty

### 6. AGGREGATE TO PROJECT-LEVEL COVERAGE ###
project_results_df = img_cover.aggregate_project_image_coverage(results_df, debug=True)

### 7. SAVE RESULTS ###
today = datetime.today().strftime('%Y-%m-%d')

# Percent imagery coverage by polygon
results_df.to_csv(f"{results_path}polygon_imagery_coverage_{today}.csv", index=False)

# Percent imagery coverage by project
project_results_df.to_csv(f"{results_path}project_imagery_coverage_{today}.csv", index=False)

# Polygons with low imagery coverage
if low_img_coverage_log:
    low_coverage_polygons_df = pd.DataFrame(low_img_coverage_log)
    print(f"Logging low image coverage polygons to {results_path}.")
    low_coverage_polygons_df['best_image'] = low_coverage_polygons_df['best_image'].fillna("None")
    low_coverage_polygons_df.to_csv(f"{results_path}low_coverage_polygons_{today}.csv", index=False)

print(f"Imagery coverage results saved to {results_path}!")

There are 16 unique polygons for 3 projects in this dataset.
There are 229 images for 16 polygons in 3 projects in this dataset.
Total images in img_gdf: 229
Total polygons in poly_gdf: 16
Total rows in merged dataset: 229
Unique polygons in merged dataset: 16
There 0 polygons without images in the merged dataset
Polygons without images (dropped at this stage): []
Total images before filtering: 229
Total images after filtering: 30
Polygons with at least one valid image: 15
Computing coverage for polygon a91435c7-a179-4c1d-9891-de0fe1741654
Computing coverage for polygon 410696dc-9579-4412-9c7b-55194cb1867c
Computing coverage for polygon f6871a61-a766-451a-be90-086219616cef
Computing coverage for polygon 9e745667-0701-434a-8ecb-d917fe2bcf29
Computing coverage for polygon 9e508b07-4534-4e04-bb5b-bb0d3734a796
Computing coverage for polygon 1cbca6da-0024-47dc-bb3a-06f8727d1cd6
Computing coverage for polygon e7223a4d-68c6-4d32-b140-f871dec62bd3
Computing coverage for polygon 0b9ef620-327a-4