In [1]:
import os
import glob
from datetime import datetime
from geogapfiller import gapfiller
from geogapfiller import imgpredictor
from geogapfiller import utils


## 1. Gap-filling methods to reconstruct geospatial data
This code implements four gap-filling methods to reconstruct geospatial data: Median, Polynomial, Harmonic, and LightGBM. Gap-filling is crucial for reconstructing data, as clouds, shadows, and other atmospheric conditions often degrade the quality of the images. The methods are described below:

   - *Median approach*: The median is often favored over other statistical measures, such as the mean, because it is less affected by outliers that may result from atmospheric disturbances or sensor errors. This approach selects the median value from the cloud-free pixels in the time series, offering a straightforward solution. However, it does not account for the broader trends or seasonal variations in the data, which may limit its effectiveness in capturing long-term patterns.

   - *Polynomial approach*: The polynomial regression gap-filling approach fits a polynomial to the valid observations, modeling the dependent variable (here, the pixel value) as a function of one or more independent variables (here, time). Higher-degree polynomials can represent more complex relationships, allowing for better data reconstruction. However, as the polynomial degree increases, the model becomes harder to interpret and requires more processing time.

   - *Harmonic approach*: The harmonic gap-filling approach leverages a Fourier-like series, using a combination of sine and cosine functions to estimate missing data. This method is widely applied in remote sensing because of its strength in capturing periodic and seasonal variations. Harmonic models tend to excel when data gaps are evenly distributed, as they can smoothly interpolate across time. However, when gaps are uneven or concentrated in specific periods, the model’s accuracy may decline, resulting in less reliable gap-filling.

   - *LightGBM approach*: The LightGBM gap-filling approach utilizes a tree-based learning algorithm to model relationships in the data. Unlike harmonic models, which assume periodicity, LightGBM does not rely on any inherent patterns and instead learns from the provided training data. It is known for its efficiency and produces results comparable to the Gradient Boosting Machine. LightGBM excels at capturing complex, nonlinear relationships, making it especially useful when EVI data deviates from a simple sinusoidal pattern.
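
To make the difference between the first three approaches concrete, the sketch below fills two gaps in the time series of a single pixel using plain NumPy. It is only an illustration under stated assumptions and not the implementation used by `geogapfiller`: the data are synthetic, the harmonic model uses a single sine/cosine pair with an assumed annual period, and the helper `harmonic_design` is hypothetical. LightGBM is left out to keep the example free of extra dependencies.

```python
import numpy as np

# Day offsets of the observations for one pixel and their values (synthetic).
# NaN marks observations masked by clouds or shadows.
days = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40], dtype=float)
true_signal = 0.3 + 0.1 * np.sin(2 * np.pi * days / 365.0)
values = true_signal.copy()
values[[3, 4]] = np.nan

valid = ~np.isnan(values)

# Median: a single robust constant used for every gap.
median_fill = np.median(values[valid])

# Polynomial: least-squares fit of degree 2 on the valid observations.
coeffs = np.polyfit(days[valid], values[valid], deg=2)
poly_fill = np.polyval(coeffs, days[~valid])


def harmonic_design(t, period=365.0):
    """Design matrix with a constant term and one cosine/sine pair (assumed period)."""
    omega = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), np.cos(omega), np.sin(omega)])


# Harmonic: least-squares fit of the constant and the cosine/sine amplitudes.
beta, *_ = np.linalg.lstsq(harmonic_design(days[valid]), values[valid], rcond=None)
harmonic_fill = harmonic_design(days[~valid]) @ beta
```

Because the synthetic signal is itself a sinusoid, the harmonic fit recovers the missing values almost exactly, the quadratic fit comes close over this short span, and the median returns the same constant for both gaps.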


This code can be used to fill gaps in any geospatial dataset, provided there is a stack of raster images available. For each pixel, the models are calibrated using data from a 15-day window both before and after the target date. At the edges of the time series, the first observations are filled using the next available 15 days of data, while the last observations are filled using the preceding 15 days.
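
The windowing idea can be sketched for a single pixel with the standard library alone. This is a simplified illustration, not the library's code: the function name `fill_with_window_median`, the use of `None` for missing values, and the sample data are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import median


def fill_with_window_median(dates, values, window_days=15):
    """Fill None entries with the median of valid values within +/- window_days."""
    window = timedelta(days=window_days)
    filled = list(values)
    for i, (date, value) in enumerate(zip(dates, values)):
        if value is not None:
            continue
        neighbours = [
            v for d, v in zip(dates, values)
            if v is not None and abs(d - date) <= window
        ]
        # Leave the gap untouched when the window holds no valid observation.
        if neighbours:
            filled[i] = median(neighbours)
    return filled


# One observation every 5 days; None marks a cloudy acquisition.
dates = [datetime(2023, 6, 1) + timedelta(days=5 * k) for k in range(7)]
values = [None, 0.30, 0.32, None, 0.90, 0.34, None]
filled = fill_with_window_median(dates, values)
```

The first gap can only draw on later observations and the last gap only on earlier ones, which mirrors the edge behaviour described above. The outlier `0.90` has little influence on the middle gap because the median is used rather than the mean.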

To ensure accurate gap-filling, the user must extract the dates from the raster images correctly. In this code, dates are formatted as "YYYYMMDD" (e.g., "20230602") and are expected at the start of each filename, followed by an underscore. The following code snippet is used to extract these dates for filling gaps in the images.

In [2]:

# Function to extract the image dates and create a list of rasters
def img_pattern(img_path: str, pattern: str) -> tuple:
    """
    Extract the image dates and create a list of rasters.

    Args:
        img_path (str): Path to the folder containing the images
        pattern (str): Pattern to search for in the image filenames
    Returns:
        tuple: list of image paths and list of the matching dates as datetime objects
    """

    # Get all raster files matching the pattern, sorted by filename so that the
    # YYYYMMDD prefix puts them in chronological order (glob order is arbitrary)
    img_list = sorted(
        glob.glob(os.path.join(img_path, '**', f"*{pattern}.tif"), recursive=True),
        key=os.path.basename,
    )
    # Create an empty list to store the dates
    img_dates = []
    for img in img_list:
        img_date = os.path.basename(img).split("_")[0]  # Extract the date part from the filename
        convert_date = datetime.strptime(img_date, '%Y%m%d')
        img_dates.append(convert_date)  # Store the parsed date, aligned with img_list

    return img_list, img_dates
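
Note that `datetime.strptime` raises a `ValueError` when a filename does not start with a valid date. A defensive variant could check each name first, as in the minimal sketch below. The helper `parse_image_date`, the regular expression, and the sample paths are hypothetical and only serve as an illustration.

```python
import os
import re
from datetime import datetime

# Eight digits at the start of the filename, followed by an underscore.
DATE_PREFIX = re.compile(r"^(\d{8})_")


def parse_image_date(path):
    """Return the datetime encoded at the start of the filename, or None."""
    match = DATE_PREFIX.match(os.path.basename(path))
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        # Eight digits that do not form a real calendar date, e.g. month 13.
        return None


paths = ["data/20230602_NIR.tif", "data/mosaic_NIR.tif", "data/20231345_NIR.tif"]
parsed = {p: parse_image_date(p) for p in paths}
```

Files that map to `None` can then be skipped or reported before the gap-filling step runs.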


In [3]:
# Define the path to locate the images
img_path = '/geogapfiller/data'
output_path = '/geogapfiller'

# Extract the image dates and create a list of rasters
img_list, img_dates = img_pattern(img_path, pattern="NIR")

In [4]:
# Choose the method to fill the gaps (e.g., 'median', 'polynomial', 'harmonic', 'lightgbm')
filler = gapfiller.get_method("median")
# If the user wants to modify the parameters of the method, it can be done as follows:
# filler = gapfiller.MedianFiller(window_size=15)  # Median
# filler = gapfiller.PolynomialFiller(poly_degree=2, window_size=15)  # Polynomial
# filler = gapfiller.HarmonicFiller(window_size=15)  # Harmonic
# filler = gapfiller.LightGBMFiller(window_size=15, n_estimators=50, random_state=0)  # LightGBM

# Run the method to fill the gaps in the images
raster_filled = gapfiller.run_method(filler, img_list, img_dates)


In [5]:
# Export the filled raster to disk
utils.export_raster(output_path, img_list, raster_filled, img_dates, method="median", pattern="NIR")

## 2. Create a synthetic image time series
This code generates a synthetic image time series at a regular, user-defined time interval (e.g., every 1 to 3 days). The method keeps the original image values and creates new images at the desired interval using predictive models. Four techniques are available to generate the synthetic images: Linear, Polynomial, Harmonic, and LightGBM.
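
As a rough illustration of the idea, the sketch below builds a regular date grid that retains the original acquisition dates and fills the new dates for one pixel by linear interpolation. It does not reproduce how `imgpredictor` works internally: the helpers `build_date_grid` and `interpolate_linear` and the sample values are assumptions, and the acquisition dates are assumed to be unique.

```python
from datetime import datetime, timedelta


def build_date_grid(img_dates, interval=1):
    """Regular dates from the first to the last acquisition, plus the originals."""
    start, end = min(img_dates), max(img_dates)
    grid = set(img_dates)
    current = start
    while current <= end:
        grid.add(current)
        current += timedelta(days=interval)
    return sorted(grid)


def interpolate_linear(img_dates, values, target):
    """Linearly interpolate one pixel's value at target from the bracketing dates."""
    pairs = sorted(zip(img_dates, values))
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        if d0 <= target <= d1:
            weight = (target - d0) / (d1 - d0)  # timedelta ratio, between 0 and 1
            return v0 + weight * (v1 - v0)
    raise ValueError("target lies outside the observed date range")


# Three acquisitions five days apart, resampled to a 2-day interval.
sample_dates = [datetime(2023, 6, 1), datetime(2023, 6, 6), datetime(2023, 6, 11)]
sample_values = [0.30, 0.40, 0.35]
grid = build_date_grid(sample_dates, interval=2)
series = [interpolate_linear(sample_dates, sample_values, d) for d in grid]
```

At the original acquisition dates the interpolated series reproduces the observed values, while the dates in between receive predicted values.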

In [6]:
# Choose the method to predict the images (e.g., 'median', 'polynomial', 'harmonic', 'lightgbm')
filler = gapfiller.get_method("median")
# Run the method to predict the images, change the interval to the desired time interval
raster_predicted, dates_ranges = imgpredictor.run_method(filler, img_list, img_dates, interval=1)

In [7]:
# Export the predicted rasters to disk
utils.export_raster(output_path, img_list, raster_predicted, dates_ranges, method='median_predicted', pattern='NIR')