# Overview

This Jupyter notebook downloads and preprocesses Sentinel 1 and 2 tiles for large areas (at least 40 sq km). The workflow entails generating the tile coordinates, downloading the raw data, and processing (cloud and shadow removal, gap interpolation, indices, and superresolution).

The notebook is broken down into the following sections:

   * **Parameter definition**:
   * **Projection functions**
   * **Data download functions**
   * **Cloud and shadow removal functions**
   * **Superresoluttion functions**
   * **Tile and folder management functions**
   * **Function execution**

If you are planning to download new Sentinel data, you need to have an API key to use the data provider [Sentinel Hub](https://www.sentinel-hub.com). If you do not have an API key but have access to sentinel imagery, the input data for this notebook is an entire year of:
  * Cloud masks
  * L1C bands 2, 8A, 11
  * 10- and 20m L2A bands
  * VV-VH Sentinel 1 bands
  * Digital elevation model
  
  
The data are tiled into 6300m x 6300m windows. An example of the raw data can be downloaded by running the following cell. This data can be preprocessed (cloud interpolation, super resolution, smoothing, etcetera) by running the rest of the notebook. It can then also be predicted by running `4b-predict-large-area`.

In [1]:
# If using example raw data
import os
if not os.path.exists("../data/example/raw"):
    os.makedirs("../data/example/raw/")
    
landscape = 'example'
OUTPUT_FOLDER = '../data/{}/'.format(landscape)
coords = (13.727334, -90.015579)
coords = (coords[1], coords[0])

In [2]:
# Download example raw data - only if you don't have an API key!
!curl https://restoration-monitoring-external.s3.amazonaws.com/restoration-mapper/example/example.zip \
    -o ../data/example/raw/data.zip
!unzip ../data/example/raw/data.zip -d ../data/example/raw/

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  374M  100  374M    0     0  9308k      0  0:00:41  0:00:41 --:--:-- 14.2M0     0  6786k      0  0:00:56  0:00:23  0:00:33 10.8M  0  7110k      0  0:00:53  0:00:24  0:00:29 13.2M  0  8388k      0  0:00:45  0:00:29  0:00:16 14.2M    0  8621k      0  0:00:44  0:00:36  0:00:08 9178k
Archive:  ../data/example/raw/data.zip
   creating: ../data/example/raw/misc/
  inflating: ../data/example/raw/misc/dem_0_0.hkl  
  inflating: ../data/example/raw/misc/s1_dates_0_0.hkl  
  inflating: ../data/example/raw/misc/s2_dates_0_0.hkl  
   creating: ../data/example/raw/s2/
  inflating: ../data/example/raw/s2/0_0.hkl  
   creating: ../data/example/raw/clouds/
  inflating: ../data/example/raw/clouds/shadows_0_0.hkl  
  inflating: ../data/example/raw/clouds/clouds_0_0.hkl  
  inflating: ../data/example/raw/clouds/clean_steps_0_0.hkl  
   creating:

# 1.0 Package Imports

In [1]:
import pandas as pd
import numpy as np
from random import shuffle
from osgeo import ogr, osr
from sentinelhub import WmsRequest, WcsRequest, MimeType, CRS, BBox, constants
import logging
from collections import Counter
import datetime
import os
import yaml
from sentinelhub import DataSource
import scipy.sparse as sparse
import scipy
from scipy.sparse.linalg import splu
from skimage.transform import resize
from sentinelhub import CustomUrlParam
from time import time as timer
from time import sleep as sleep
import multiprocessing
import math
import reverse_geocoder as rg
import pycountry
import pycountry_convert as pc
import hickle as hkl
from shapely.geometry import Point, Polygon
import geopandas
from tqdm import tnrange, tqdm_notebook
import math
import boto3
from pyproj import Proj, transform
from timeit import default_timer as timer
from typing import Tuple, List
import warnings

In [2]:
if os.path.exists("../config.yaml"):
    with open("../config.yaml", 'r') as stream:
        key = (yaml.safe_load(stream))
        API_KEY = key['key']
        AWSKEY = key['awskey']
        AWSSECRET = key['awssecret']
else:
    API_KEY = "none"

In [3]:
%run ../src/preprocessing/slope.py
%run ../src/preprocessing/indices.py
%run ../src/downloading/utils.py
%run ../src/preprocessing/cloud_removal.py
%run ../src/preprocessing/whittaker_smoother.py
%run ../src/dsen2/utils/DSen2Net.py

Using TensorFlow backend.


# 1.1 Constants and Parameters

Currently the only years that can be downloaded from Sentinel Hub are 2018 and 2019. 2017 has an ETA of Summer 2020.

The `landscapes` dictionary has a key, value convention of the landscape name, and a `(lat, long)` tuple

In [27]:
year = 2017
if year > 2017:
    dates = ('{}-12-01'.format(str(year - 1)) , '{}-02-01'.format(str(year + 1)))
else: 
    dates = ('{}-01-01'.format(str(year)) , '{}-02-01'.format(str(year + 1)))
dates_sentinel_1 = ('{}-01-01'.format(str(year)) , '{}-12-31'.format(str(year)))
SIZE = 9*5
IMSIZE = (7*2) + (SIZE * 14)+2

days_per_month = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
starting_days = np.cumsum(days_per_month)

In [28]:
landscapes = {
    'ethiopia-tigray': (13.540810, 38.177220),
    'kenya-makueni-2': (-1.817109, 37.44563),
    'ghana': (9.259359, -0.83375),
    'niger-koure': (13.18158, 2.478),
    'cameroon-farnorth': (10.596, 14.2722),
    'mexico-campeche': (18.232495, -92.1234215),
    'malawi-rumphi-old': (-11.044, 33.818),
    'malawi-rumphi': (-11.15, 33.246),
    'ghana-sisala-east': (10.385, -1.765),
    'ghana-west-mamprusi': (10.390084, -0.846330),
    'ghana-kwahu': (6.518909, -0.826008),
    'senegal-16b': (15.82585, -15.34166),
    'india-kochi': (9.909, 76.254),
    'india-sidhi': (24.0705, 81.607),
    'brazil-esperito-santo': (-20.147, -40.837),
    'brazil-paraiba': (-22.559943, -44.186629),
    'brazil-goias': (-14.905595, -48.907399),
    'colombia-talima': (4.179529, -74.889171),
    'drc-kafubu': (-11.749636, 27.586622),
    'thailand-khon-kaen': (15.709725, 102.546518),
    'indonesia-west-java': (-6.721101, 108.280949),
    'madagascar': (-18.960152, 47.469587),
    'tanzania': (-6.272258, 36.679824),
    'chile': (-36.431237, -71.872030),
    'indonesia-jakarta': (-6.352580, 106.677072),
    'caf-baboua': (5.765917, 14.791618),   
    'honduras': (14.096664, -88.720304),
    'nicaragua': (12.398014, -86.963042),
    'china': (26.673679, 107.464231),
    'australia-west': (-32.666762, 117.411197),
    'mexico-sonora': (29.244288, -111.243230),
    'south-africa': (-30.981698, 28.727301),
    'maldonado-uraguay': (-34.629250, -55.004331),
    'dominican-rep-la-salvia': (18.872589, -70.462961),
    'guatemala-coban': (15.3, -90.8),
    'senegal-tucker-a': (15.350595, -15.459789),
    'elsalvador-imposible': (13.727334, -90.015579),
    'peru-shatoja-district': (-6.566366, -76.759752),
    'angola-galanga': (-12.104782, 15.151222),
    'morocco-chefchaouen': (34.942560, -4.772589),
    'georgia-imereti': (42.223069, 42.603353),
    'drc-mai-ndombe' : (-3.696119, 20.362077),
    'malawi-salima': (-13.6, 34.32),
    'brazil-para': (-2.064534, -56.578095),
    'brazil-para-2': (-7.351687, -48.457507),
    'pakistan-mardan': (34.355452, 71.945095),
    'botswana-kweneng': (-24.360968, 25.176526),
    'nicaragua-bonanza': (13.933745, -84.690842),
    'ghana-cocoa': (7.398111, -1.269223),
    'ghana-brong-ahafo': (7.70258, -0.70911),
    'mexico-change-det': (21.212083, -88.993677),
    'costa-rica-change-det': (8.47520, -82.94909),
    'honduras-colon': (15.617889, -85.447611),
    'mexico-campeche-change': (18.151747, -92.152278),
    'guatemala-gain': (16.464444, -89.479170),
    'guatemala-gain-2': (15.196480, -89.118290),
    'uganda-agroforestry': (2.353267, 32.681427),
    'uganda-agroforestry-2': (2.333234, 32.657905),
    'brazil-enrichment': (-26.116734, -51.415170),
    'rwanda-agroforestry': (-1.40, 30.459),
    'example': (13.727334, -90.015579),
    'brazil-buffer': (-20.117451, -50.364266),
    'brazil-restoration': (-22.228324, -50.963621),
    'brazil-native-p': (-21.785180, -47.280192),
    'brazil-patch-p': (-23.379173, -47.272593),
    'brazil-lake': (-23.675035, -46.315154),
    'brazil-enrichment-2': (-23.032334, -46.175910),
    'brazil-streambank': (-24.764847, -48.193073),
    'brazil-planting-2': (-23.301148, -45.649483),
    'brazil-gulley': (-22.054421, -52.055029),
    'brazil-planting': (-23.583940, -47.574845),
    'brazil-planting-3': (-23.388818, -47.286080),
    'brazil-tropics': (-24.775491, -48.072984),
    'brazil-small': (-22.040039, -47.826367),
    'brazil-coastal': (-23.837691, -45.456441),
    'brazil-gulley-east': (-21.971591, -51.829941),
    'pilot-1': (15.497, -90.40), # guatemala,
    'pilot-7': (18.648312, -71.794117),
    'pilot-kenya': (-0.051225, 37.14618),
    'pilot-madagascar': (-21.900845, 46.915008),
    'pilot-kenya-2': (-0.998198, 36.591918),
    'pilot-india': (25.717254, 91.957805),
    'pilot-tanzania': (-4.787902, 38.270604),
    'pilot-burkina-2': (12.937066, -1.399379),
    'pilot-kenya-3': (0.163074, 37.831647)
}

landscape = 'pilot-kenya-3'
OUTPUT_FOLDER = '../tile_data/{}/{}/'.format(landscape, str(year))
coords = landscapes[landscape]
coords = (coords[1], coords[0])
print(OUTPUT_FOLDER, coords)

../tile_data/pilot-kenya-3/2017/ (37.831647, 0.163074)


In [29]:
landscape_df = pd.DataFrame({'landscape': [x for x in landscapes.keys()], 
                             'latitude': [x[0] for x in landscapes.values()],
                             'longitude': [x[1] for x in landscapes.values()]
})

landscape_df.to_csv("../data/latlongs/landscapes.csv", index=False)

# 2.0 Projection functions

In [30]:
def calculate_bbx_pyproj(coord: Tuple[float, float],
                         step_x: int, step_y: int,
                         expansion: int, multiplier: int = 1.) -> (Tuple[float, float], 'CRS'):
    ''' Calculates the four corners of a bounding box
        [bottom left, top right] as well as the UTM EPSG using Pyproj
        
        Note: The input for this function is (x, y), not (lat, long)
        
        Parameters:
         coord (tuple): Initial (long, lat) coord
         step_x (int): X tile number of a 6300x6300 meter tile
         step_y (int): Y tile number of a 6300x6300 meter tile
         expansion (int): Typically 10 meters - the size of the border for the predictions
         multiplier (int): Currently deprecated
         
        Returns:
         coords (tuple):
         CRS (int):
    '''
    
    inproj = Proj('epsg:4326')
    outproj_code = calculate_epsg(coord)
    outproj = Proj('epsg:' + str(outproj_code))
    
    
    
    coord_utm =  transform(inproj, outproj, coord[1], coord[0])
    coord_utm_bottom_left = (coord_utm[0] + step_x*6300 - expansion,
                             coord_utm[1] + step_y*6300 - expansion)
    
    coord_utm_top_right = (coord_utm[0] + (step_x+multiplier) * 6300 + expansion,
                           coord_utm[1] + (step_y+multiplier) * 6300 + expansion)

    zone = str(outproj_code)[3:]
    direction = 'N' if coord[1] >= 0 else 'S'
    utm_epsg = "UTM_" + zone + direction
    return (coord_utm_bottom_left, coord_utm_top_right), CRS[utm_epsg]


def pts_in_geojson(lats: List[float], longs: List[float], geojson: 'geojson') -> bool:  
    """ Identifies whether candidate download tile is within an input geojson
        
        Parameters:
         lats (list): list of latitudes
         longs (list): list of longitudes
         geojson (float): path to input geojson
    
        Returns:
         bool 
    """
    polys = geopandas.read_file(geojson)['geometry']
    polys = geopandas.GeoSeries(polys)
    pnts = [Point(x, y) for x, y in zip(list(lats), list(longs))]
    
    def _contains(pt):
        return polys.contains(pt)[0]

    if any([_contains(pt) for pt in pnts]):
        return True
    else: return False

# 2.1 Data download functions

If using Sentinel hub, identify the following layers:
  * CLOUD: return [CLP / 255]
  * SHADOW: return [B02, B8A, B11]
  * DEM: return [DEM]
  * SENT: return [VV, VH]
  * L2A10: return [B02,B03,B04, B08]
  * L2A20: return [B05,B06,B07, B8A,B11,B12]

In [31]:
def identify_clouds(bbox: List[Tuple[float, float]],
                        epsg: 'CRS', dates: dict = dates) -> (np.ndarray, np.ndarray, np.ndarray):
    """ Downloads and calculates cloud cover and shadow
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (float): EPSG associated with bbox 
         dates (tuple): YY-MM-DD - YY-MM-DD bounds for downloading 
    
        Returns:
         cloud_img (np.array):
         shadows (np.array): 
         clean_steps (np.array):
    """
    box = BBox(bbox, crs = epsg)
    cloud_request = WcsRequest(
        layer='CLOUD_NEW',
        bbox=box,
        time=dates,
        resx='160m', 
        resy='160m',
        image_format = MimeType.TIFF_d8,
        maxcc=0.7,
        instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72),
    )


    shadow_request = WcsRequest(
        layer='SHADOW',
        bbox=box,
        time=dates,
        resx='20m',
        resy='20m',
        image_format =  MimeType.TIFF_d16,
        maxcc=0.7,
        instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72))

    cloud_img = cloud_request.get_data()
    cloud_img = np.array(cloud_img)
    
    if np.max(cloud_img > 10):
        cloud_img = cloud_img / 255
        
    assert np.max(cloud_img) <= 1., f'The max cloud probability is {np.max(cloud_img)}'
    c_probs_pus = ((40*40)/(512*512)) *(1/3)*cloud_img.shape[0]
    print(f"Cloud_probs used {round(c_probs_pus, 1)} processing units")
    
    cloud_img = resize(cloud_img, (cloud_img.shape[0], IMSIZE, IMSIZE), order = 0)
    n_cloud_px = np.array([len(np.argwhere(cloud_img[x, ...].reshape((IMSIZE)*(IMSIZE)) > 0.33))
                           for x in range(cloud_img.shape[0])])
    cloud_steps = np.argwhere(n_cloud_px > IMSIZE**2 / 5)
    clean_steps = [x for x in range(cloud_img.shape[0]) if x not in cloud_steps]
    shadow_img = shadow_request.get_data(data_filter = clean_steps)
    shadow_img = np.array(shadow_img)
    shadow_pus = (shadow_img.shape[1]*shadow_img.shape[2])/(512*512) * shadow_img.shape[0]
    shadow_img = resize(shadow_img, (shadow_img.shape[0], IMSIZE, IMSIZE, shadow_img.shape[-1]), order = 0)
    
    
    
    if np.max(shadow_img > 10):
        print(f"The max shadows is {np.max(shadow_img)}")
        shadow_img = shadow_img / 65535
 
    cloud_img = np.delete(cloud_img, cloud_steps, 0)
    shadows = mcm_shadow_mask(np.array(shadow_img), cloud_img)
    print(f"Shadows ({shadows.shape}) used {round(shadow_pus, 1)} processing units")
    return cloud_img, shadows, clean_steps

    
    
def download_dem(bbox: List[Tuple[float, float]], epsg: 'CRS') -> np.ndarray:
    """ Downloads the DEM layer from Sentinel hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (float): EPSG associated with bbox 
    
        Returns:
         dem_image (arr):
    """

    box = BBox(bbox, crs = epsg)
    dem_s = (630)+4+8+8
    dem_request = WmsRequest(data_source=DataSource.DEM,
                         layer='DEM',
                         bbox=box,
                         width=dem_s,
                         height=dem_s,
                         instance_id=API_KEY,
                         image_format=MimeType.TIFF_d32f,
                         custom_url_params={CustomUrlParam.SHOWLOGO: False})
    dem_image = dem_request.get_data()[0]
    dem_image = calcSlope(dem_image.reshape((1, dem_s, dem_s)),
                  np.full((dem_s, dem_s), 10), np.full((dem_s, dem_s), 10), zScale = 1, minSlope = 0.02)
    dem_image = dem_image.reshape((dem_s,dem_s, 1))
    dem_image = dem_image[1:dem_s-1, 1:dem_s-1, :]
    print(f"DEM used {round(((IMSIZE*IMSIZE)/(512*512))*2, 1)} processing units")
    return dem_image
 

def download_layer(bbox: List[Tuple[float, float]],
                   clean_steps: np.ndarray, epsg: 'CRS',
                   dates: dict = dates, year: int = year) -> (np.ndarray, np.ndarray):
    """ Downloads the L2A sentinel layer with 10 and 20 meter bands
        
        Parameters:
         bbox (list): output of calc_bbox
         clean_steps (list): list of steps to filter download request
         epsg (float): EPSG associated with bbox 
         time (tuple): YY-MM-DD - YY-MM-DD bounds for downloading 
    
        Returns:
         img (arr):
         img_request (obj): 
    """
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer='L2A20',
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.7,
            resx='20m', resy='20m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=72),
        )
    img_bands = image_request.get_data(data_filter = clean_steps)
    img_20 = np.stack(img_bands)
    #print(f"The max 20m is {np.max(img_20)}")
    if np.max(img_20) >= 10:
        img_20 = img_20 / 65535
    assert np.max(img_20) <= 2.
    
    s2_20_usage = (img_20.shape[1]*img_20.shape[2])/(512*512) * (6/3) * img_20.shape[0]
    print(f"Original 20 meter bands size: {img_20.shape}, using {round(s2_20_usage, 1)} PU")
    img_20 = resize(img_20, (img_20.shape[0], IMSIZE, IMSIZE, img_20.shape[-1]), order = 0)

    image_request = WcsRequest(
            layer='L2A10',
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.7,
            resx='10m', resy='10m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'BICUBIC',
                                constants.CustomUrlParam.UPSAMPLING: 'BICUBIC'},
            time_difference=datetime.timedelta(hours=72),
    )
    
    img_bands = image_request.get_data(data_filter = clean_steps)
    img_10 = np.stack(img_bands)
    if np.max(img_10) >= 10:
        img_10 = img_10 / 65535
    assert np.max(img_10) <= 2.
    
    s2_10_usage = (img_10.shape[1]*img_10.shape[2])/(512*512) * (4/3) * img_10.shape[0]
    print(f"Original 10 meter bands size: {img_10.shape}, using {round(s2_10_usage, 1)} PU")
    img_10 = resize(img_10, (img_10.shape[0], IMSIZE, IMSIZE, img_10.shape[-1]), order = 0)
    img = np.concatenate([img_10, img_20], axis = -1)
    print(f"Sentinel 2 used {round(s2_20_usage + s2_10_usage, 1)} PU")

    image_dates = []
    for date in image_request.get_dates():
        if date.year == year - 1:
            image_dates.append(-365 + starting_days[(date.month-1)] + date.day)
        if date.year == year:
            image_dates.append(starting_days[(date.month-1)] + date.day)
        if date.year == year + 1:
            image_dates.append(365 + starting_days[(date.month-1)]+date.day)
    image_dates = [val for idx, val in enumerate(image_dates) if idx in clean_steps]
    image_dates = np.array(image_dates)
    return img, image_dates

        
def download_sentinel_1(bbox: List[Tuple[float, float]],
                        epsg: 'CRS', imsize: int = IMSIZE, 
                        dates: dict = dates_sentinel_1, layer: str = "SENT",
                        year: int = year) -> (np.ndarray, np.ndarray):
    """ Downloads the GRD Sentinel 1 VV-VH layer from Sentinel Hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (float): EPSG associated with bbox 
         imsize (int):
         dates (tuple): YY-MM-DD - YY-MM-DD bounds for downloading 
         layer (str):
         year (int): 
    
        Returns:
         s1 (arr):
         image_dates (arr): 
    """
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer=layer,
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            data_source=DataSource.SENTINEL1_IW,
            maxcc=1.0,
            resx='10m', resy='5m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=96),
        )
    data_filter = None
    if len(image_request.download_list) > 50:
        data_filter = [x for x in range(len(image_request.download_list)) if x % 2 == 0]
    img_bands = image_request.get_data(data_filter = data_filter)
    s1 = np.stack(img_bands)
    if np.max(s1) >= 1000:
            s1 = s1 / 65535.
    
    s1_usage = (2/3) * s1.shape[0] * ((s1.shape[1]*s1.shape[2]) / (512*512))
    print(f"Sentinel 1 used {round(s1_usage, 1)} PU for \
          {s1.shape[0]} out of {len(image_request.download_list)} images")
    s1 = resize(s1, (s1.shape[0], imsize*2, imsize*2, s1.shape[-1]), order = 0)
    s1 = np.reshape(s1, (s1.shape[0], s1.shape[1]//2, 2, s1.shape[2] // 2, 2, s1.shape[-1]))
    s1 = np.mean(s1, (2, 4))

    image_dates = []
    for date in image_request.get_dates():
        if date.year == year - 1:
            image_dates.append(-365 + starting_days[(date.month-1)] + date.day)
        if date.year == year:
            image_dates.append(starting_days[(date.month-1)] + date.day)
        if date.year == year + 1:
            image_dates.append(365 + starting_days[(date.month-1)]+date.day)
    image_dates = np.array(image_dates)
    s1c = np.copy(s1)
    s1c[np.where(s1c < 1.)] = 0
    n_pix_oob = np.sum(s1c, axis = (1, 2, 3))
    to_remove = np.argwhere(n_pix_oob > (imsize*2*imsize*2)/50)
    s1 = np.delete(s1, to_remove, 0)
    image_dates = np.delete(image_dates, to_remove)
    return s1, image_dates


def identify_s1_layer(coords: Tuple[float, float]) -> str:
    """ Identifies whether to download ascending or descending 
        sentinel 1 orbit based upon predetermined geographic coverage
        
        Reference: https://sentinel.esa.int/web/sentinel/missions/
                   sentinel-1/satellite-description/geographical-coverage
        
        Parameters:
         coords (tuple): 
    
        Returns:
         layer (str): either of SENT, SENT_DESC 
    """
    results = rg.search(coords)
    country = results[-1]['cc']
    continent_name = pc.country_alpha2_to_continent_code(country)
    if continent_name in ['AF', 'OC']:
        layer = "SENT"
    if continent_name in ['SA']:
        if coords[0] > -7.11:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['AS']:
        if coords[0] > 23.3:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['NA']:
        layer = "SENT_DESC"
    print(f"The continent is: {continent_name}, and the sentinel 1 orbit is {layer}")
    return layer

# 2.2 Cloud and shadow removal

In [32]:
def remove_missed_clouds(img: np.ndarray) -> np.ndarray:
    """ Removes clouds that may have been missed by s2cloudless
        by looking at a temporal change outside of IQR
        
        Parameters:
         img (arr): 
    
        Returns:
         to_remove (arr): 
    """
    iqr = np.percentile(img[..., 0].flatten(), 75) - np.percentile(img[..., 0].flatten(), 25)
    thresh_t = np.percentile(img[..., 0].flatten(), 75) + iqr*1.5
    thresh_b = np.percentile(img[..., 0].flatten(), 25) - iqr*1.5
    outlier_percs = []
    for step in range(img.shape[0]):
        bottom = len(np.argwhere(img[step, ..., 0].flatten() > thresh_t))
        top = len(np.argwhere(img[step, ..., 0].flatten() < thresh_b))
        p = 100 * ((bottom + top) / (img.shape[1]*img.shape[2]))
        outlier_percs.append(p)
    to_remove = np.argwhere(np.array(outlier_percs) > 20)
    return to_remove


def calculate_bad_steps(sentinel2: np.ndarray, clouds: np.ndarray) -> np.ndarray:
    """ Calculates the timesteps to remove based upon cloud cover and missing data
        
        Parameters:
         sentinel2 (arr): 
         clouds (arr):
    
        Returns:
         to_remove (arr): 
    """
    n_cloud_px = np.array([len(np.argwhere(clouds[x, ...].reshape((IMSIZE)*(IMSIZE)) > 0.30)) for x in range(clouds.shape[0])])
    cloud_steps = np.argwhere(n_cloud_px > IMSIZE**2 / 7)
    missing_images = [np.argwhere(sentinel2[x, ..., :10].flatten() == 0.0) for x in range(sentinel2.shape[0])]
    missing_images = np.array([len(x) for x in missing_images])
    missing_images_p = [np.argwhere(sentinel2[x, ..., :10].flatten() >= 1) for x in range(sentinel2.shape[0])]
    missing_images_p = np.array([len(x) for x in missing_images_p])
    missing_images += missing_images_p
    missing_images = np.argwhere(missing_images >= (IMSIZE**2) / 20)
    to_remove = np.unique(np.concatenate([cloud_steps.flatten(), missing_images.flatten()]))
    return to_remove

# 2.3 Superresolution

In [33]:
MDL_PATH = "../src/dsen2/models/"

input_shape = ((4, None, None), (6, None, None))
model = s2model(input_shape, num_layers=6, feature_size=128)
predict_file = MDL_PATH+'s2_032_lr_1e-04.hdf5'
print('Symbolic Model Created.')

model.load_weights(predict_file)

def DSen2(d10: np.ndarray, d20: np.ndarray) -> np.ndarray:
    """Super resolves 20 meter bans using the DSen2 convolutional
       neural network, as specified in Lanaras et al. 2018
       https://github.com/lanha/DSen2

        Parameters:
         d10 (arr): (4, X, Y) shape array with 10 meter resolution
         d20 (arr): (6, X, Y) shape array with 20 meter resolution

        Returns:
         prediction (arr): (6, X, Y) shape array with 10 meter superresolved
                          output of DSen2 on d20 array
    """
    test = [d10, d20]
    input_shape = ((4, None, None), (6, None, None))
    prediction = _predict(test, input_shape, deep=False)
    return prediction


def _predict(test: np.ndarray, input_shape: Tuple, model: 'model' = model,
             deep: bool = False, run_60: bool = False) -> np.ndarray:
    """Wrapper function around Keras.model.predict

        Parameters:
         test (arr):
         input_shape (tuple)
         model (Keras.model)
         deep (bool):
         run_60 (bool):

        Returns:
         prediction (arr): (6, X, Y) shape array with 10 meter superresolved
                          output of DSen2 on d20 array
    """
    
    prediction = model.predict(test, verbose=0, batch_size = 8)
    return prediction


def superresolve(sentinel2: np.ndarray) -> np.ndarray:
    """Worker function to deal with types and shapes
       to superresolve a 10-band input array

        Parameters:
         sentinel2 (arr): (:, X, Y, 10) shape array with 10 meter resolution
                          bands in indexes 0-4, and 20 meter in 4- 10

        Returns:
         superresolved (arr): (:, X, Y, 10) shape array with 10 meter 
                              superresolved output of DSen2
    """
    d10 = sentinel2[..., 0:4]
    d20 = sentinel2[..., 4:10]

    d10 = np.swapaxes(d10, 1, -1)
    d10 = np.swapaxes(d10, 2, 3)
    d20 = np.swapaxes(d20, 1, -1)
    d20 = np.swapaxes(d20, 2, 3)
    superresolved = DSen2(d10, d20)
    superresolved = np.swapaxes(superresolved, 1, -1)
    superresolved = np.swapaxes(superresolved, 1, 2)
    sentinel2[..., 4:10] = superresolved
    return sentinel2 # returns band IDXs 3, 4, 5, 7, 8, 9


def superresolve_tile(arr: np.ndarray) -> np.ndarray:
    """Superresolves each 56x56 subtile in a 646x646 input tile
       by padding the subtiles to 64x64 and removing the pad after prediction,
       eliminating boundary artifacts

        Parameters:
         arr (arr): (?, 646, 646, 10) array

        Returns:
         superresolved (arr): (?, 646, 646, 10) array
    """
    print(f"The input array to superresolve is {arr.shape}")
    superresolved = np.copy(arr)
    tiles = tile_window(646, 646, 56, 56)
    for i in tnrange(len(tiles)):
        subtile = tiles[i]
        to_resolve = arr[:, subtile[0]:subtile[0]+56, subtile[1]:subtile[1]+56, :]
        to_resolve = np.pad(to_resolve, ((0, 0), (4, 4), (4, 4), (0, 0)), 'reflect')
        resolved = superresolve(to_resolve)
        resolved = resolved[:, 4:-4, 4:-4, :]
        superresolved[:, subtile[0]:subtile[0]+56, subtile[1]:subtile[1]+56] = resolved
    return superresolved

Symbolic Model Created.


# 2.4 Tiling and folder management functions

In [34]:
def make_output_and_temp_folders(idx: str, output_folder: str = OUTPUT_FOLDER) -> None:
    """Makes necessary folder structures for IO of raw and processed data

        Parameters:
         idx (str)
         output_folder (path)

        Returns:
         None
    """
    def _find_and_make_dirs(dirs):
        if not os.path.exists(os.path.realpath(dirs)):
            os.makedirs(os.path.realpath(dirs))
            
    _find_and_make_dirs(output_folder + "raw/")
    _find_and_make_dirs(output_folder + "raw/clouds/")
    _find_and_make_dirs(output_folder + "raw/s1/")
    _find_and_make_dirs(output_folder + "raw/s2/")
    _find_and_make_dirs(output_folder + "raw/misc/")
    _find_and_make_dirs(output_folder + "processed/")
    _find_and_make_dirs(output_folder + "interim/")
    

def check_contains(coord: tuple, step_x: int, step_y:
                   int, folder: str = OUTPUT_FOLDER) -> bool:
    """Given an input .geojson, identifies whether a given tile intersections
       the geojson

        Parameters:
         coord (tuple):
         step_x (int):
         step_y (int):
         folder (path):

        Returns:
         contains (bool)
    """
    contains = False
    bbx, epsg = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 80)
    inproj = Proj('epsg:' + str(str(epsg)[5:]))
    outproj = Proj('epsg:4326')
    bottomleft = transform(inproj, outproj, bbx[0][0], bbx[0][1])
    topright = transform(inproj, outproj, bbx[1][0], bbx[1][1])
    
    if os.path.exists(folder):
            if any([x.endswith(".geojson") for x in os.listdir(folder)]):
                geojson_path = folder + [x for x in os.listdir(folder) if x.endswith(".geojson")][0]
    
                bool_contains = pts_in_geojson(lats = [bottomleft[1], topright[1]], 
                                                       longs = [bottomleft[0], topright[0]],
                                                       geojson = geojson_path)
                contains = bool_contains
    return contains

def download_large_tile(coord: tuple,
                        step_x: int,
                        step_y: int,
                        folder: str = OUTPUT_FOLDER, 
                        year: int = year,
                        s1_layer: str = "SENT") -> None:
    """Wrapper function to download cloud probs, Sentinel 2, Sentinel 1, and DEM

        Parameters:
         coord (tuple):
         step_x (int):
         step_y (int):
         folder (path):
         year (int):
         s1_layer (str):

        Returns:
         None
    """
    bbx, epsg = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 80)
    dem_bbx, _ = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 90)
    idx = str(step_y) + "_" + str(step_x)
    idx = str(idx)
    make_output_and_temp_folders(idx)

    if not os.path.exists(folder + "output/" + str(step_y*5) + "/" + str(step_x*5) + ".npy"):
        if not os.path.exists(folder + "processed/" + str(step_y*5) + "/" + str(step_x*5) + ".hkl"):
            clouds_file = f'{folder}raw/clouds/clouds_{idx}.hkl'
            shadows_file = f'{folder}raw/clouds/shadows_{idx}.hkl'
            s1_file = f'{folder}raw/s1/{idx}.hkl'
            s1_dates_file = f'{folder}raw/misc/s1_dates_{idx}.hkl'
            s2_file = f'{folder}raw/s2/{idx}.hkl'
            s2_dates_file = f'{folder}raw/misc/s2_dates_{idx}.hkl'
            clean_steps_file = f'{folder}raw/clouds/clean_steps_{idx}.hkl'
            
            if not os.path.exists(clouds_file):
                print(f"Downloading clouds because {clouds_file} does not exist")
                cloud_probs, shadows, clean_steps = identify_clouds(bbx, epsg = epsg)
                hkl.dump(cloud_probs, clouds_file, mode='w', compression='gzip')
                hkl.dump(shadows, shadows_file, mode='w', compression='gzip')
                hkl.dump(clean_steps, clean_steps_file, mode='w', compression='gzip')
            
            if not os.path.exists(s1_file):
                print(f"Downloading S1 because {s1_file} does not exist")
                s1_layer = identify_s1_layer((coord[1], coord[0]))
                s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
                if s1.shape[0] == 0:
                    s1_layer = "SENT_DESC" if s1_layer == "SENT" else "SENT"
                    print(f'Switching to {s1_layer}')
                    s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
                s1 = process_sentinel_1_tile(s1, s1_dates)
                hkl.dump(s1, s1_file, mode='w', compression='gzip')
                hkl.dump(s1_dates, s1_dates_file, mode='w', compression='gzip')

            if not os.path.exists(s2_file):
                print(f"Downloading S2 because {s2_file} does not exist")
                if 'clean_steps' not in globals() or locals():
                    clean_steps = hkl.load(clean_steps_file)
                s2, s2_dates = download_layer(bbx, clean_steps = clean_steps, epsg = epsg)
                print(s2_dates)
                hkl.dump(s2, s2_file, mode='w', compression='gzip')
                hkl.dump(s2_dates, s2_dates_file, mode='w', compression='gzip')

            if not os.path.exists(folder + "raw/misc/dem_{}.hkl".format(idx)):
                dem = download_dem(dem_bbx, epsg = epsg) # get the DEM BBOX
                hkl.dump(dem, folder + "raw/misc/dem_{}.hkl".format(idx), mode='w', compression='gzip')

In [35]:
def reject_outliers(data, m = 4):
    d = data - np.median(data, axis = (0))
    mdev = np.median(data, axis = 0)
    s = d / mdev
    n_changed = 0
    for x in tnrange(data.shape[1]):
        for y in range(data.shape[2]):
            for band in range(data.shape[3]):
                to_correct = np.where(s[:, x, y, band] > m) 
                data[to_correct, x, y, band] = mdev[x, y, band]
                n_changed += len(to_correct[0])
    print(f"Rejected {n_changed} outliers")
    return data

In [36]:
def process_sentinel_1_tile(sentinel1: np.ndarray, dates: np.ndarray) -> np.ndarray:
    """Converts a (?, X, Y, 2) Sentinel 1 array to (24, X, Y, 2)

        Parameters:
         sentinel1 (np.array):
         dates (np.array):

        Returns:
         s1 (np.array)
    """
    s1, _ = calculate_and_save_best_images(sentinel1, dates)
    biweekly_dates = np.array([day for day in range(0, 360, 5)])
    to_remove = np.argwhere(biweekly_dates % 15 != 0)
    s1 = np.delete(s1, to_remove, 0)
    return s1


def convert_to_int16(array: np.array) -> np.array:
    '''Converts a float32 array to int16, reducing storage costs by three-fold'''
    return np.trunc(array * 65535).astype(int)


def make_folder_names(step_x: int, step_y: int) -> (list, list):
    '''Given an input tile location (step_x, step_y), identify the folder and file
       names for each 5x5 subtile
       
       Parameters:
         step_x (int):
         step_y (int):

        Returns:
         x_vals (list)
         y_vals (list)
    '''
    x_vals = []
    y_vals = []
    for i in range(25):
        y_val = (24 - i) // 5
        x_val = 5 - ((25 - i) % 5)
        x_val = 0 if x_val == 5 else x_val
        x_vals.append(x_val)
        y_vals.append(y_val)
    y_vals = [i + (5*step_y) for i in y_vals]
    x_vals = [i + (5*step_x) for i in x_vals]
    return x_vals, y_vals


def process_large_tile(coord: tuple,
                       step_x: int,
                       step_y: int,
                       folder: str = OUTPUT_FOLDER) -> None:
    '''Wrapper function to interpolate clouds and temporal gaps, superresolve tiles,
       calculate relevant indices, and save analysis-ready data to the output folder
       
       Parameters:
        coord (tuple)
        step_x (int):
        step_y (int):
        foldre (str):

       Returns:
        None
    '''
    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)

    processed = True
    for x, y in zip(x_vals, y_vals):
        folder_path = f"{str(y)}/{str(x)}"
        processed_exists = os.path.exists(folder + "processed/" + folder_path + ".hkl")
        output_exists = os.path.exists(folder + "output/" + folder_path + ".npy")
        if not (processed_exists or output_exists):
            processed = False
    if not processed:
        clouds = hkl.load(folder + "raw/clouds/clouds_{}.hkl".format(idx))
        sentinel1 = hkl.load(folder + "raw/s1/{}.hkl".format(idx))
        radar_dates = hkl.load(folder + "raw/misc/s1_dates_{}.hkl".format(idx))
        sentinel2 = hkl.load(folder + "raw/s2/{}.hkl".format(idx))
        print(sentinel2.shape)
        dem = hkl.load(folder + "raw/misc/dem_{}.hkl".format(idx))
        print(dem.shape)
        image_dates = hkl.load(folder + "raw/misc/s2_dates_{}.hkl".format(idx))
        if os.path.exists(folder + "raw/clouds/shadows_{}.hkl".format(idx)):
            shadows = hkl.load(folder + "raw/clouds/shadows_{}.hkl".format(idx))
        else:
            print("No shadows file, so calculating shadows with L2A")
            shadows = mcm_shadow_mask(sentinel2, clouds)

        to_remove = calculate_bad_steps(sentinel2, clouds)
        sentinel2 = np.delete(sentinel2, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
        print(f"{len(to_remove)} Cloudy and missing images removed, radar processed: {to_remove}")
        to_remove = remove_missed_clouds(sentinel2)
        sentinel2 = np.delete(sentinel2, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
        print(f"{len(to_remove)} missed cloudy images were removed: {to_remove}")
        x, interp = remove_cloud_and_shadows(sentinel2, clouds, shadows, image_dates)
        print("Clouds and shadows interpolated")    
                
        to_remove = np.argwhere(np.mean(interp, axis = (1, 2, 3)) > 0.5)
        print(f"{len(to_remove)} steps removed because of >50% interpolation rate")
        sentinel2 = np.delete(sentinel2, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
        
        index = 0
        print("Super resolving tile")
        x = superresolve_tile(x)
        
        dem_i = np.tile(dem[np.newaxis, 1:-1, 1:-1, :], (x.shape[0], 1, 1, 1))
        print(dem_i.shape)
        x = np.concatenate([x, dem_i / 90], axis = -1)
        x = evi(x, verbose = True)
        x = bi(x, verbose = True)
        x = msavi2(x, verbose = True)
        x = si(x, verbose = True)
        #x, _ = calculate_and_save_best_images(x, image_dates)
        interim_file = f"{folder}interim/{idx}.hkl"
        interim_dates = f"{folder}interim/dates_{idx}.hkl"
        hkl.dump(x, interim_file, mode = 'w', compression = 'gzip')
        hkl.dump(image_dates, interim_dates, mode = 'w', compression = 'gzip')
        
        
        tiles = tile_window(IMSIZE, IMSIZE, window_size = 142)
        for t in tiles:
            start_x, start_y = t[0], t[1]
            end_x = start_x + t[2]
            end_y = start_y + t[3]
            
            output_file = f"{folder}processed/{y_vals[index]}/{x_vals[index]}.hkl"
            out_y_folder = f"{folder}processed/{y_vals[index]}/"
            index += 1
            print(f"{index}: The output file is {output_file}")
            #if not os.path.exists(output_file):
                #subtile = x[:, start_x:end_x, start_y:end_y, :]
                #subtile = interpolate_array(subtile, dim = 142)
                #subtile = np.concatenate(
                #    [subtile, sentinel1[:, start_x:end_x, start_y:end_y, :]], axis = -1
                #)
                #if not os.path.exists(os.path.realpath(out_y_folder)):
                #    os.makedirs(os.path.realpath(out_y_folder))
                #subtile = convert_to_int16(subtile)
                #assert subtile.shape[1] == 142, f"subtile shape is {subtile.shape}"
                #hkl.dump(subtile, output_file, mode='w', compression='gzip')

# 2.5 Function execution

In [37]:
downloaded = 0

if not os.path.exists(os.path.realpath(OUTPUT_FOLDER)):
            os.makedirs(os.path.realpath(OUTPUT_FOLDER))
        
print(f"Downloading {year} for {landscape}")

max_x = 1
max_y = 1

for x_tile in range(0, max_x):
    for y_tile in range(0, max_y):
        contains = True
        #contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        if contains:
            print(f"Download {downloaded}/{max_x*max_y}; X: {x_tile} Y:{y_tile}")
            downloaded += 1
            download_large_tile(coord = coords, step_x = x_tile, step_y = y_tile)
            process_large_tile(coords, x_tile, y_tile)
            print("\n")

Downloading 2017 for pilot-kenya-3
Download 0/1; X: 0 Y:0
(13, 646, 646, 10)
(648, 648, 1)
2 Cloudy and missing images removed, radar processed: [ 4 11]
0 missed cloudy images were removed: []
Interpolated 1001860 px
Clouds and shadows interpolated
3 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (11, 646, 646, 10)


HBox(children=(IntProgress(value=0, max=144), HTML(value='')))


(11, 646, 646, 1)
bis error: -59.000000000003084, 81.99999999999562
sis error: 0.1207765751289976, 1.0017868157581125
1: The output file is ../tile_data/pilot-kenya-3/2017/processed/4/0.hkl
2: The output file is ../tile_data/pilot-kenya-3/2017/processed/4/1.hkl
3: The output file is ../tile_data/pilot-kenya-3/2017/processed/4/2.hkl
4: The output file is ../tile_data/pilot-kenya-3/2017/processed/4/3.hkl
5: The output file is ../tile_data/pilot-kenya-3/2017/processed/4/4.hkl
6: The output file is ../tile_data/pilot-kenya-3/2017/processed/3/0.hkl
7: The output file is ../tile_data/pilot-kenya-3/2017/processed/3/1.hkl
8: The output file is ../tile_data/pilot-kenya-3/2017/processed/3/2.hkl
9: The output file is ../tile_data/pilot-kenya-3/2017/processed/3/3.hkl
10: The output file is ../tile_data/pilot-kenya-3/2017/processed/3/4.hkl
11: The output file is ../tile_data/pilot-kenya-3/2017/processed/2/0.hkl
12: The output file is ../tile_data/pilot-kenya-3/2017/processed/2/1.hkl
13: The output

In [40]:
INPUT_FOLDER = f'../tile_data/{landscape}/'
def process_multiple_years(coord: tuple,
                       step_x: int,
                       step_y: int,
                       path: str = INPUT_FOLDER) -> None:
    '''Wrapper function to interpolate clouds and temporal gaps, superresolve tiles,
       calculate relevant indices, and save analysis-ready data to the output folder
       
       Parameters:
        coord (tuple)
        step_x (int):
        step_y (int):
        folder (str):

       Returns:
        None
    '''
    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)
    
    # load 
    x2017 = hkl.load(f"{path}/2017/interim/{idx}.hkl")
    x2018 = hkl.load(f"{path}/2018/interim/{idx}.hkl")
    x2019 = hkl.load(f"{path}/2019/interim/{idx}.hkl")
    
    d2017 = hkl.load(f"{path}/2017/interim/dates_{idx}.hkl")
    d2018 = hkl.load(f"{path}/2018/interim/dates_{idx}.hkl")
    d2019 = hkl.load(f"{path}/2019/interim/dates_{idx}.hkl")
    
    s1_2017 = hkl.load(f"{path}/2017/raw/s1/{idx}.hkl")
    s1_2018 = hkl.load(f"{path}2018/raw/s1/{idx}.hkl")
    s1_2019 = hkl.load(f"{path}2019/raw/s1/{idx}.hkl")
    
    
    s1_all = np.concatenate([s1_2017, s1_2018, s1_2019], axis = 0)
    index = 0
    tiles = tile_window(IMSIZE, IMSIZE, window_size = 142)
    for t in tiles:
        start_x, start_y = t[0], t[1]
        end_x = start_x + t[2]
        end_y = start_y + t[3]
        
        # 2017 
        s2017 = x2017[:, start_x:end_x, start_y:end_y, :]
        s2018 = x2018[:, start_x:end_x, start_y:end_y, :]
        s2019 = x2019[:, start_x:end_x, start_y:end_y, :]
        
        s2017, _  = calculate_and_save_best_images(s2017, d2017)
        s2018, _ = calculate_and_save_best_images(s2018, d2018)
        s2019, _ = calculate_and_save_best_images(s2019, d2019)
        
        subtile = np.concatenate([s2017, s2018, s2019], axis = 0)
        print(subtile.shape)
        
        
        out_17 = f"{path}/2017/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        out_18 = f"{path}/2018/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        out_19 = f"{path}/2019/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        
        index += 1
        print(f"{index}: The output file is {out_17}")
        
        subtile = interpolate_array(subtile, dim = 142)
        subtile = np.concatenate(
            [subtile, s1_all[:, start_x:end_x, start_y:end_y, :]], axis = -1
        )
        for folder in [out_17, out_18, out_19]:
            output_folder = "/".join(folder.split("/")[:-1])
            if not os.path.exists(os.path.realpath(output_folder)):
                os.makedirs(os.path.realpath(output_folder))
        subtile = convert_to_int16(subtile)
        assert subtile.shape[1] == 142, f"subtile shape is {subtile.shape}"
        
        hkl.dump(subtile[:24], out_17, mode='w', compression='gzip')
        hkl.dump(subtile[24:48], out_18, mode='w', compression='gzip')
        hkl.dump(subtile[48:], out_19, mode='w', compression='gzip')
        #hkl.dump(subtile, output_file, mode='w', compression='gzip')

In [41]:
max_x = 1
max_y = 1

for x_tile in range(0, max_x):
    for y_tile in range(0, max_y):
        contains = True
        #contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        if contains:
            process_multiple_years(coords, x_tile, y_tile)
            print("\n")

Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
1: The output file is ../tile_data/pilot-kenya-3//2017/processed/4/0.hkl
Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
2: The output file is ../tile_data/pilot-kenya-3//2017/processed/4/1.hkl
Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
3: The output file is ../tile_data/pilot-kenya-3//2017/processed/4/2.hkl
Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
4: The output file is ../tile_data/pilot-kenya-3//2017/processed/4/3.hkl
Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
5: The output file is ../tile_data/pilot-kenya-3//2017/processed/4/4.hkl
Maximum time distance: 180
Maximum time distance: 140
Maximum time distance: 160
(216, 142, 142, 15)
6: The output file is ../tile