<a href="https://colab.research.google.com/github/trchudley/GEOG2462/blob/main/Short_Scripts/Week_1_Filter_by_AOI_coverage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Filtering by the proportional coverage of your AOI

The default scripts in the GitHub repository leave something to be desired for some spatial queries. By default, the code searches as follows:

1. Search for all Landsat scenes _intersecting_ the AOI.
2. Filter to only those scenes between the start and end dates.
3. Pick the least cloudy image.

The problem comes with the word 'intersecting'. This selects every image that touches the AOI, even if it only brushes the edge. Therefore, for search regions that straddle the boundary between two different scenes, it may be desirable to add a filter that removes images that don't adequately cover your AOI.

This script shows how this can be done - you can paste the code into your own scripts to apply it.
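
To make the difference between 'intersecting' and 'adequately covering' concrete, here is a minimal sketch in plain Python. It is not Earth Engine code: the AOI and scene footprints are simplified to axis-aligned rectangles in arbitrary kilometre units, and the names `overlap_fraction`, `aoi`, `scene_brush` and `scene_full` are made up for illustration.

```python
def overlap_fraction(aoi, scene):
    """Fraction of the AOI rectangle covered by the scene rectangle.

    Both rectangles are (xmin, ymin, xmax, ymax) tuples.
    """
    ax0, ay0, ax1, ay1 = aoi
    sx0, sy0, sx1, sy1 = scene

    # Width and height of the overlapping rectangle (zero if they don't overlap)
    width = max(0.0, min(ax1, sx1) - max(ax0, sx0))
    height = max(0.0, min(ay1, sy1) - max(ay0, sy0))

    aoi_area = (ax1 - ax0) * (ay1 - ay0)
    return (width * height) / aoi_area


aoi = (0, 0, 10, 10)                # hypothetical 10 km x 10 km AOI
scene_brush = (9, -50, 150, 150)    # only overlaps the eastern 1 km strip
scene_full = (-100, -50, 60, 150)   # covers the whole AOI

for name, scene in [('scene_brush', scene_brush), ('scene_full', scene_full)]:
    fraction = overlap_fraction(aoi, scene)
    print(name, '| intersects:', fraction > 0, '| coverage:', fraction)
```

Both toy scenes pass an intersection test, but only one of them would pass a 60% coverage threshold.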

## Logging in to Google Earth Engine

Ensure the project name is your own, created upon registration with GEE. You can easily register one at the [following link](https://code.earthengine.google.com/register); just make sure to select `Unpaid Usage` > `Academia & Research`.

In [2]:
import ee
import geemap
import time

ee.Authenticate()  # Trigger the authentication flow.
ee.Initialize(project='ee-trchudley')    # Change to your own default project name.

## Define editable variables

This is the only cell you will need to edit in this notebook.

In [73]:

# Define search parameters
latitude = 70.405   # Degrees of latitude
longitude = -50.519  # Degrees of longitude
size = 10000  # Size of AOI, in metres
region_name = 'store_glacier_greenland'  # AOI name, for filename construction

# Define search range, within which the least cloudy image will be found
date_start = '2023-05-01'
date_end = '2023-09-30'

# Google Drive export folder
folder = 'scires_project_2A'


## Searching for data without the filter (an example of what's going wrong)

Let's visualise what's going wrong with this example:

In [74]:
# Get search region geometry
point = ee.Geometry.Point(longitude, latitude)  # Create a point
region = point.buffer(size/2).bounds()  # Buffer the point to a 2D shape

# Get Landsat 8 image collection
landsat8_collection = ee.ImageCollection("LANDSAT/LC08/C02/T1_TOA")

# Filter to desired region and date bounds
landsat8_collection = landsat8_collection.filterBounds(region)
landsat8_collection = landsat8_collection.filterDate(date_start, date_end)

print('Number of images after filtering:', landsat8_collection.size().getInfo())

# Get least cloudy image and clip to search region
image = landsat8_collection.sort('CLOUD_COVER').first()
image = image.clip(region)


Number of images after filtering: 49


In [None]:
image

When we visualise the image, we can see that the 'ideal' (least cloudy) image doesn't properly cover our AOI!

In [None]:
Map = geemap.Map()  # Create empty map

max_reflectance = 1.00 # Set the upper limit of reflectance to visualise.
                       # Play with this value (between 0-1) to see what it
                       # does. It will need to be higher for snowy/icy
                       # scenes

visParams = {'bands': ['B4', 'B3', 'B2'], 'max': max_reflectance}
Map.addLayer(region, {}, "Search Region")  # Add our AOI
Map.addLayer(image, visParams, 'Colour Composite Image')

Map.centerObject(region, zoom=11)
Map

## Employing a simple presence/absence filter to fix this

A relatively simple function can fix this for us. It looks slightly complicated, but you don't need to understand it: just paste it into the relevant parts of your code. Basically, it counts the number of pixels within the AOI that are not "`NaN`" (programming shorthand for 'Not a Number', indicating an invalid value in a collection of what are otherwise expected to be numbers). It then multiplies this count by 30<sup>2</sup>, because our images are 30 m resolution, so the pixel count $\times$ 30 $\times$ 30 gives the area of the valid pixels in square metres. Finally, it divides this valid area by the area of the geometry, which is retrievable via a simple function, to give the proportion of the AOI covered by valid pixels.
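
As a rough worked example of that arithmetic in plain Python (no Earth Engine calls), assume a square 10 km AOI and 30 m pixels. The function name `coverage_proportion` and the pixel counts are illustrative only, not values taken from a real scene.

```python
def coverage_proportion(valid_pixel_count, region_area_m2, resolution=30):
    """Proportion of a region covered by valid pixels of a given size."""
    valid_area_m2 = valid_pixel_count * resolution ** 2
    return valid_area_m2 / region_area_m2


region_area_m2 = 10_000 * 10_000  # 10 km x 10 km AOI, in square metres

# A scene that only fills part of the AOI (illustrative pixel count)
print(coverage_proportion(40_000, region_area_m2))   # 0.36

# A scene that fills nearly all of the AOI (illustrative pixel count)
print(coverage_proportion(110_000, region_area_m2))  # 0.99
```

The Earth Engine function that follows does the same calculation, but obtains the pixel count and the region area from the server.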

In [77]:
# Function to calculate the proportion of valid pixels in an image
def calculate_coverage(image, region=region, band='B4', resolution=30):

    # Get the count of valid (unmasked) pixels of the chosen band within the geometry
    valid_pixel_count = image.select(band).reduceRegion(
        reducer=ee.Reducer.count(),
        geometry=region,
        scale=resolution,  # Use the resolution suitable for your dataset
        maxPixels=1e13
    ).get(band)  # Access the chosen band's valid pixel count

    # Calculate the valid area and total area, then work out the proportion.
    valid_area = ee.Number(valid_pixel_count).multiply(ee.Number(resolution).pow(2))
    total_area = region.area(1)
    proportion = valid_area.divide(total_area)

    # Add the proportion as metadata to the image
    return image.set('valid_proportion', proportion)

This function creates a new property of the image called `valid_proportion`. Hence, we can apply this function to the imageCollection and filter based on this value. Look at the new code we introduce in the image filtering code here, where we filter to a `min_coverage` fraction of `0.6` (60%):

In [78]:
# Get search region geometry
point = ee.Geometry.Point(longitude, latitude)  # Create a point
region = point.buffer(size/2).bounds()  # Buffer the point to a 2D shape

# Get Landsat 8 image collection
landsat8_collection = ee.ImageCollection("LANDSAT/LC08/C02/T1_TOA")

# Filter to desired region and date bounds
landsat8_collection = landsat8_collection.filterBounds(region)
landsat8_collection = landsat8_collection.filterDate(date_start, date_end)

# --------------------------------------------------------------------------- #
# NEW CODE: FILTER TO ONLY IMAGES COVERING AOI BY A CERTAIN THRESHOLD

# Set your minimum coverage threshold
min_coverage = 0.6

# Apply the function to the image collection
landsat8_collection = landsat8_collection.map(calculate_coverage)

# Filter the image collection based on a minimum proportion of valid pixels.
# NB the `gte()` function: gte = greater than or equal to
landsat8_collection = landsat8_collection.filter(
    ee.Filter.gte('valid_proportion', min_coverage)
)

# END OF NEW CODE
# --------------------------------------------------------------------------- #

print('Number of images after filtering:', landsat8_collection.size().getInfo())

# Get least cloudy image and clip to search region
image = landsat8_collection.sort('CLOUD_COVER').first()
image = image.clip(region)

Number of images after filtering: 44
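
The order of operations in the new block (tag every image with its coverage, drop those below the threshold, then pick the least cloudy of what remains) can be mimicked in plain Python. This is only an analogy: the dictionaries below are invented stand-ins for images, not real Landsat metadata.

```python
min_coverage = 0.6

# Invented stand-ins for images, each with the two properties used above
images = [
    {'id': 'scene_a', 'CLOUD_COVER': 0.5, 'valid_proportion': 0.12},
    {'id': 'scene_b', 'CLOUD_COVER': 3.0, 'valid_proportion': 1.00},
    {'id': 'scene_c', 'CLOUD_COVER': 9.0, 'valid_proportion': 0.75},
]

# Without the coverage filter, the least cloudy image wins regardless of coverage
unfiltered_best = min(images, key=lambda img: img['CLOUD_COVER'])

# With the filter: keep images at or above the threshold, then sort by cloud cover
kept = [img for img in images if img['valid_proportion'] >= min_coverage]
filtered_best = sorted(kept, key=lambda img: img['CLOUD_COVER'])[0]

print('Without filter:', unfiltered_best['id'])
print('With filter:', filtered_best['id'])
```

In this toy case the least cloudy image overall barely covers the AOI, so the filter hands the win to the next least cloudy image that does.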


Look, we have fewer images this time around! Does this give us a different image?

In [None]:
image

Yes, it does. We can see what `valid_proportion` we have as well:

In [79]:
image.get('valid_proportion').getInfo()

1.006490253882831

Strangely, our `valid_proportion` is slightly greater than 1. This isn't too surprising: it is likely some combination of [floating point error](https://docs.python.org/3/tutorial/floatingpoint.html) and the fact that the pixel count works in whole pixels (a pixel on the edge of the geometry contributes its full 30 m $\times$ 30 m area even if part of it falls outside the bounds). Either way, it looks like we have a complete image... Let's test:

In [None]:
Map = geemap.Map()  # Create empty map

max_reflectance = 1.00 # Set the upper limit of reflectance to visualise.
                       # Play with this value (between 0-1) to see what it
                       # does. It will need to be higher for snowy/icy
                       # scenes

visParams = {'bands': ['B4', 'B3', 'B2'], 'max': max_reflectance}
Map.addLayer(region, {}, "Search Region")  # Add our AOI
Map.addLayer(image, visParams, 'Colour Composite Image')

Map.centerObject(region, zoom=11)
Map

Brilliant! We have a complete image. You can use this code within other scripts by copying and pasting the `calculate_coverage` function and the new code block within the imageCollection filtering script.
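
On the `valid_proportion` slightly above 1 seen earlier: one way such a value can arise is simply from counting whole pixels against an AOI whose width is not a multiple of the pixel size. The toy calculation below is a sketch under stated assumptions, not a reproduction of Earth Engine's internals: it counts pixels whose centres fall inside a square AOI on a flat grid, and the grid offsets are made up.

```python
def count_centres(aoi_size, resolution, offset):
    """Count pixel centres along one axis that fall within [0, aoi_size).

    Pixel centres sit at offset, offset + resolution, offset + 2 * resolution, ...
    """
    count = 0
    centre = offset
    while centre < aoi_size:
        count += 1
        centre += resolution
    return count


aoi_size = 10_000   # metres, as in the example AOI
resolution = 30     # metres

for offset in (5, 25):  # hypothetical positions of the first pixel centre
    n = count_centres(aoi_size, resolution, offset)
    proportion = (n * n * resolution ** 2) / aoi_size ** 2
    print('offset', offset, '->', n, 'pixels per side, proportion', proportion)
```

Depending on where the grid falls, the pixel-based area lands slightly above or slightly below the true area, so values close to 1 are best read as 'fully covered'.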

As an aside, it's notable that our new image is clearly taken at sunset, which might not be ideal for NDI operations. For now, I will leave additional time filtering operations as an exercise for you (perhaps in collaboration with ChatGPT or similar...).

## Download the image

In [None]:
# Get the date of the image from the metadata
date_string = image.get('DATE_ACQUIRED').getInfo()

# Now we will construct the filename automatically
filename = region_name + '_' + date_string + '_image'

# Visualise for testing
print("The image will be saved to your Google Drive at:\n" + folder + '/' + filename + '.tif')

# Export the image, specifying scale and region.
task = ee.batch.Export.image.toDrive(**{
    'image': image.select(['B4', 'B3', 'B2', 'B5', 'B6']),
    'description': filename,
    'folder': folder,
    'scale': 30,
    'region': region.getInfo()['coordinates']
})
task.start()

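while task.active():
    print('Task processing ongoing... (id: {}).'.format(task.id))
    time.sleep(5)

# Report the final state, since a task can also end as FAILED or CANCELLED
state = task.status()['state']
if state == 'COMPLETED':
    print('Finished processing. Image is exported to your Drive.')
else:
    print('Export did not complete. Final task state:', state)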