# Overview

This Jupyter notebook downloads and preprocesses Sentinel-1 and Sentinel-2 tiles for large areas (at least 40 sq km). The workflow generates the tile coordinates, downloads the raw data, and processes it (cloud and shadow removal, gap interpolation, index calculation, and superresolution).

The notebook is broken down into the following sections:

   * **Parameter definition**
   * **Projection functions**
   * **Data download functions**
   * **Cloud and shadow removal functions**
   * **Superresolution functions**
   * **Tile and folder management functions**
   * **Function execution**

To download new Sentinel data, you need an API key for the data provider [Sentinel Hub](https://www.sentinel-hub.com). If you do not have an API key but have access to Sentinel imagery, the input data for this notebook is an entire year of:
  * Cloud masks
  * L1C bands 2, 8A, 11
  * 10 m and 20 m L2A bands
  * VV-VH Sentinel-1 bands
  * Digital elevation model
  
  
The data are tiled into 6300 m x 6300 m windows. An example of the raw data can be downloaded by running the following cells. The raw data can then be preprocessed (cloud interpolation, superresolution, smoothing, and so on) by running the rest of the notebook, and predictions can be generated by running `4b-predict-large-area`.

In [1]:
# If using example raw data
import os
if not os.path.exists("../data/example/raw"):
    os.makedirs("../data/example/raw/")
    
landscape = 'example'
OUTPUT_FOLDER = '../data/{}/'.format(landscape)
coords = (13.727334, -90.015579)
coords = (coords[1], coords[0])

In [2]:
# Download example raw data - only if you don't have an API key!
!curl https://restoration-monitoring-external.s3.amazonaws.com/restoration-mapper/example/example.zip \
    -o ../data/example/raw/data.zip
!unzip ../data/example/raw/data.zip -d ../data/example/raw/

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 71  374M   71  266M    0     0  7524k      0  0:00:50  0:00:36  0:00:14 8022k^C
Archive:  ../data/example/raw/data.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of ../data/example/raw/data.zip or
        ../data/example/raw/data.zip.zip, and cannot find ../data/example/raw/data.zip.ZIP, period.


# 1.0 Package Imports

In [1]:
import pandas as pd
import numpy as np
from random import shuffle
from osgeo import ogr, osr
from sentinelhub import WmsRequest, WcsRequest, MimeType, CRS, BBox, constants
import logging
from collections import Counter
import datetime
import os
import yaml
from sentinelhub import DataSource
import scipy.sparse as sparse
import scipy
from scipy.sparse.linalg import splu
from skimage.transform import resize
from sentinelhub import CustomUrlParam
from time import sleep
import multiprocessing
import math
import reverse_geocoder as rg
import pycountry
import pycountry_convert as pc
import hickle as hkl
from shapely.geometry import Point, Polygon
import geopandas
from tqdm import tnrange, tqdm_notebook
import boto3
from pyproj import Proj, transform
from timeit import default_timer as timer
from typing import Tuple, List
import warnings

In [2]:
if os.path.exists("../config.yaml"):
    with open("../config.yaml", 'r') as stream:
        key = (yaml.safe_load(stream))
        API_KEY = key['key']
        AWSKEY = key['awskey']
        AWSSECRET = key['awssecret']
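else:
    # No config file: define placeholders so later cells can still reference these names
    API_KEY = "none"
    AWSKEY = None
    AWSSECRET = None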

In [16]:
%run ../src/preprocessing/slope.py
%run ../src/preprocessing/indices.py
%run ../src/downloading/utils.py
%run ../src/preprocessing/cloud_removal.py
%run ../src/preprocessing/whittaker_smoother.py
%run ../src/dsen2/utils/DSen2Net.py
%run ../src/io/upload.py

# 1.1 Constants and Parameters

Currently the only years that can be downloaded from Sentinel Hub are 2018 and 2019. 2017 has an ETA of Summer 2020.

The `landscapes` dictionary maps each landscape name to a `(lat, long)` tuple.

In [17]:
year = 2019
landscape = 'kenya-makueni'

if year > 2017:
    dates = (f'{year - 1}-12-01', f'{year + 1}-02-01')
else:
    dates = (f'{year}-01-01', f'{year + 1}-02-01')
dates_sentinel_1 = (f'{year}-01-01', f'{year}-12-31')
SIZE = 9*5
IMSIZE = (7*2) + (SIZE * 14)+2

days_per_month = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
starting_days = np.cumsum(days_per_month)
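
As a worked check of the constants above, the sketch below reproduces the tile-size arithmetic and the day-of-year offsets that `download_layer` and `download_sentinel_1` derive from `starting_days`. The helper name `day_offset` is hypothetical and generalizes the three year cases handled in those functions. Like `days_per_month`, it ignores leap days. The 630-pixel interior presumably corresponds to the 6300 m window at 10 m resolution.

```python
import datetime
import numpy as np

SIZE = 9 * 5                        # 45 blocks of 14 pixels
IMSIZE = (7 * 2) + (SIZE * 14) + 2  # 14 + 630 + 2 = 646 pixels per side

days_per_month = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
starting_days = np.cumsum(days_per_month)  # days elapsed before each month


def day_offset(date: datetime.date, year: int) -> int:
    """Hypothetical helper: day-of-year offset of `date` relative to `year`."""
    base = int(starting_days[date.month - 1]) + date.day
    return base + 365 * (date.year - year)


print(IMSIZE)                                        # 646
print(day_offset(datetime.date(2019, 3, 1), 2019))   # 60
print(day_offset(datetime.date(2018, 12, 1), 2019))  # -30
```

Dates from the previous December come out negative and dates from the following January exceed 365, which keeps the time axis monotonic across the year boundary.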

In [18]:
database = pd.read_csv("../project-monitoring/database.csv")
coords = database[database['landscape'] == landscape]
path = coords['path'].tolist()[0]
coords = (float(coords['longitude']), float(coords['latitude']))

IO_PARAMS = {'prefix': '../',
             'bucket': 'restoration-monitoring',
             'coords': coords,
             'bucket-prefix': '',
             'path': path}

OUTPUT_FOLDER = IO_PARAMS['prefix'] + IO_PARAMS['path'] + str(year) + '/'
print(coords, OUTPUT_FOLDER)

(37.11402, -3.08768) ../project-monitoring//kenya/makueni/mtito-andei/2019/


In [19]:
#uploader = FileUploader(awskey = AWSKEY, awssecret = AWSSECRET)
#file = '../tile_data/processed/data_x_l2a_processed.hkl'
#key = get_folder_prefix(coordinates) + '2018/raw/s2/0_0.hkl'
#key = 'restoration-mapper/model-data/global/data_x_l2a_processed.hkl'
#uploader.upload(bucket = 'restoration-monitoring', key = key, file = file)

In [20]:
to_append = pd.DataFrame({'landscape': ['kenya-makueni-3'], 
                             'latitude': ['-2.657604'],
                             'longitude': ['38.044953'],
                             'path': [get_folder_prefix((-2.657604, 38.044953),
                                                        params = {'bucket-prefix': 'project-monitoring/'})]})
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
database = pd.concat([database, to_append])
database.to_csv("../project-monitoring/database.csv", index=False)

# 2.0 Projection functions

In [21]:
def pts_in_geojson(lats: List[float], longs: List[float], geojson: 'geojson') -> bool:  
    """ Identifies whether candidate download tile is within an input geojson
        
        Parameters:
         lats (list): list of latitudes
         longs (list): list of longitudes
         geojson (str): path to input geojson
    
        Returns:
         bool: True if any candidate point falls within the first geometry of the geojson
    """
    polys = geopandas.read_file(geojson)['geometry']
    polys = geopandas.GeoSeries(polys)
    pnts = [Point(x, y) for x, y in zip(list(lats), list(longs))]
    
    def _contains(pt):
        return polys.contains(pt)[0]

    return any(_contains(pt) for pt in pnts)

# 2.1 Data download functions

If using Sentinel Hub, the following layers are required:
  * CLOUD: return [CLP / 255]
  * SHADOW: return [B02, B8A, B11]
  * DEM: return [DEM]
  * SENT: return [VV, VH]
  * L2A10: return [B02,B03,B04, B08]
  * L2A20: return [B05,B06,B07, B8A,B11,B12]
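
The download functions in this notebook print a rough estimate of the Sentinel Hub processing units each request consumes. A minimal sketch of that estimate follows, assuming one unit corresponds to a 512 x 512 pixel, 3-band, single-image request. The function name and the `LAYER_BANDS` dictionary are hypothetical, and the formula reproduces the notebook's arithmetic rather than an official billing rule.

```python
def estimate_processing_units(width: int, height: int,
                              n_bands: int, n_images: int) -> float:
    """Notebook-style estimate: one unit per 512x512 pixels, 3 bands, 1 image."""
    return (width * height) / (512 * 512) * (n_bands / 3) * n_images


# Band counts taken from the layer list above
LAYER_BANDS = {"CLOUD": 1, "SHADOW": 3, "DEM": 1,
               "SENT": 2, "L2A10": 4, "L2A20": 6}

# Example: 30 images of a 646 x 646 pixel tile from the 10 m layer
usage = estimate_processing_units(646, 646, LAYER_BANDS["L2A10"], 30)
print(f"Estimated usage: {round(usage, 1)} PU")
```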

In [22]:
def identify_clouds(bbox: List[Tuple[float, float]],
                        epsg: 'CRS', dates: Tuple[str, str] = dates) -> (np.ndarray, np.ndarray, np.ndarray):
    """ Downloads and calculates cloud cover and shadow
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (CRS): CRS associated with bbox 
         dates (tuple): (YYYY-MM-DD, YYYY-MM-DD) bounds for downloading 
    
        Returns:
         cloud_img (np.array): cloud probabilities for the retained time steps
         shadows (np.array): shadow mask computed by mcm_shadow_mask
         clean_steps (list): indices of time steps with at most 5% cloudy pixels
    """
    box = BBox(bbox, crs = epsg)
    cloud_request = WcsRequest(
        layer='CLOUD_NEW',
        bbox=box,
        time=dates,
        resx='160m', 
        resy='160m',
        image_format = MimeType.TIFF_d8,
        maxcc=0.7,
        instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72),
    )

    shadow_request = WcsRequest(
        layer='SHADOW',
        bbox=box,
        time=dates,
        resx='20m',
        resy='20m',
        image_format =  MimeType.TIFF_d16,
        maxcc=0.7,
        instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72))

    cloud_img = np.array(cloud_request.get_data())
    
    if np.max(cloud_img) > 10:
        cloud_img = cloud_img / 255
        
    assert np.max(cloud_img) <= 1., f'The max cloud probability is {np.max(cloud_img)}'
    c_probs_pus = ((40*40)/(512*512)) *(1/3)*cloud_img.shape[0]
    print(f"Cloud_probs used {round(c_probs_pus, 1)} processing units")
    
    cloud_img = resize(cloud_img, (cloud_img.shape[0], IMSIZE, IMSIZE), order = 0)
    cloud_img_flat = cloud_img.reshape(cloud_img.shape[0], cloud_img.shape[1]*cloud_img.shape[2])
    n_cloud_px = np.sum(cloud_img_flat > 0.30, axis = 1)
    print(np.around(n_cloud_px / IMSIZE**2, 2))
    cloud_steps = np.argwhere(n_cloud_px > IMSIZE**2 / 20)
    clean_steps = [x for x in range(cloud_img.shape[0]) if x not in cloud_steps]
    shadow_img = shadow_request.get_data(data_filter = clean_steps)
    shadow_img = np.array(shadow_img)
    shadow_pus = (shadow_img.shape[1]*shadow_img.shape[2])/(512*512) * shadow_img.shape[0]
    shadow_img = resize(shadow_img, (shadow_img.shape[0], IMSIZE, IMSIZE, shadow_img.shape[-1]), order = 0)
    
    if np.max(shadow_img) > 10:
        print(f"The max shadows is {np.max(shadow_img)}")
        shadow_img = shadow_img / 65535
 
    cloud_img = np.delete(cloud_img, cloud_steps, 0)
    shadows = mcm_shadow_mask(np.array(shadow_img), cloud_img) # TODO: make sure this makes sense
    print(f"Shadows ({shadows.shape}) used {round(shadow_pus, 1)} processing units")
    return cloud_img, shadows, clean_steps
    
def download_dem(bbox: List[Tuple[float, float]], epsg: 'CRS') -> np.ndarray:
    """ Downloads the DEM layer from Sentinel hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (CRS): CRS associated with bbox 
    
        Returns:
         dem_image (arr): slope derived from the DEM, with a one-pixel border removed
    """

    box = BBox(bbox, crs = epsg)
    dem_s = (630)+4+8+8
    dem_request = WmsRequest(data_source=DataSource.DEM,
                         layer='DEM',
                         bbox=box,
                         width=dem_s,
                         height=dem_s,
                         instance_id=API_KEY,
                         image_format=MimeType.TIFF_d32f,
                         custom_url_params={CustomUrlParam.SHOWLOGO: False})
    dem_image = dem_request.get_data()[0]
    dem_image = calcSlope(dem_image.reshape((1, dem_s, dem_s)),
                  np.full((dem_s, dem_s), 10), np.full((dem_s, dem_s), 10), zScale = 1, minSlope = 0.02)
    dem_image = dem_image.reshape((dem_s,dem_s, 1))
    dem_image = dem_image[1:dem_s-1, 1:dem_s-1, :]
    print(f"DEM used {round(((IMSIZE*IMSIZE)/(512*512))*2, 1)} processing units")
    return dem_image
 

def download_layer(bbox: List[Tuple[float, float]],
                   clean_steps: np.ndarray, epsg: 'CRS',
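                   dates: Tuple[str, str] = dates, year: int = year) -> (np.ndarray, np.ndarray):
    """ Downloads the L2A sentinel layer with 10 and 20 meter bands
        
        Parameters:
         bbox (list): output of calc_bbox
         clean_steps (list): list of steps to filter download request
         epsg (CRS): CRS associated with bbox 
         dates (tuple): (YYYY-MM-DD, YYYY-MM-DD) bounds for downloading 
         year (int): target year, used to convert image dates to day-of-year offsets
    
        Returns:
         img (arr): 10 m and 20 m bands resized to (IMSIZE, IMSIZE) and concatenated
         image_dates (arr): day-of-year offset of each downloaded image
    """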
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer='L2A20',
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.7,
            resx='20m', resy='20m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=72),
        )
    img_bands = image_request.get_data(data_filter = clean_steps)
    img_20 = np.stack(img_bands).astype(np.float32)
    if np.max(img_20) >= 10:
        img_20 = img_20 / 65535
    assert np.max(img_20) <= 2.
    
    s2_20_usage = (img_20.shape[1]*img_20.shape[2])/(512*512) * (6/3) * img_20.shape[0]
    print(f"Original 20 meter bands size: {img_20.shape}, using {round(s2_20_usage, 1)} PU")
    img_20 = resize(img_20, (img_20.shape[0], IMSIZE, IMSIZE, img_20.shape[-1]), order = 0)

    image_request = WcsRequest(
            layer='L2A10',
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.7,
            resx='10m', resy='10m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'BICUBIC',
                                constants.CustomUrlParam.UPSAMPLING: 'BICUBIC'},
            time_difference=datetime.timedelta(hours=72),
    )
    
    img_bands = image_request.get_data(data_filter = clean_steps)
    img_10 = np.stack(img_bands).astype(np.float32)
    if np.max(img_10) >= 10:
        img_10 = img_10 / 65535
    assert np.max(img_10) <= 2.
    
    s2_10_usage = (img_10.shape[1]*img_10.shape[2])/(512*512) * (4/3) * img_10.shape[0]
    print(f"Original 10 meter bands size: {img_10.shape}, using {round(s2_10_usage, 1)} PU")
    img_10 = resize(img_10, (img_10.shape[0], IMSIZE, IMSIZE, img_10.shape[-1]), order = 0)
    img = np.concatenate([img_10, img_20], axis = -1)
    print(f"Sentinel 2 used {round(s2_20_usage + s2_10_usage, 1)} PU")

    image_dates = []
    for date in image_request.get_dates():
        if date.year == year - 1:
            image_dates.append(-365 + starting_days[(date.month-1)] + date.day)
        if date.year == year:
            image_dates.append(starting_days[(date.month-1)] + date.day)
        if date.year == year + 1:
            image_dates.append(365 + starting_days[(date.month-1)]+date.day)
    image_dates = [val for idx, val in enumerate(image_dates) if idx in clean_steps]
    image_dates = np.array(image_dates, dtype = np.float32)
    return img, image_dates

        
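def download_sentinel_1(bbox: List[Tuple[float, float]],
                        epsg: 'CRS', imsize: int = IMSIZE, 
                        dates: Tuple[str, str] = dates_sentinel_1, layer: str = "SENT",
                        year: int = year) -> (np.ndarray, np.ndarray):
    """ Downloads the GRD Sentinel 1 VV-VH layer from Sentinel Hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (CRS): CRS associated with bbox 
         imsize (int): width and height, in pixels, of the output tile
         dates (tuple): (YYYY-MM-DD, YYYY-MM-DD) bounds for downloading 
         layer (str): Sentinel Hub layer name, either SENT or SENT_DESC
         year (int): target year, used to convert image dates to day-of-year offsets
    
        Returns:
         s1 (arr): VV-VH images averaged down to (imsize, imsize)
         image_dates (arr): day-of-year offset of each retained image
    """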
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer=layer,
            bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            data_source=DataSource.SENTINEL1_IW,
            maxcc=1.0,
            resx='10m', resy='5m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=96),
        )
    data_filter = None
    if len(image_request.download_list) > 50:
        data_filter = [x for x in range(len(image_request.download_list)) if x % 2 == 0]
    img_bands = image_request.get_data(data_filter = data_filter)
    s1 = np.stack(img_bands).astype(np.float32)
    if np.max(s1) >= 1000:
        s1 = s1 / 65535.
    
    s1_usage = (2/3) * s1.shape[0] * ((s1.shape[1]*s1.shape[2]) / (512*512))
    print(f"Sentinel 1 used {round(s1_usage, 1)} PU for "
          f"{s1.shape[0]} out of {len(image_request.download_list)} images")
    s1 = resize(s1, (s1.shape[0], imsize*2, imsize*2, s1.shape[-1]), order = 0)
    s1 = np.reshape(s1, (s1.shape[0], s1.shape[1]//2, 2, s1.shape[2] // 2, 2, s1.shape[-1]))
    s1 = np.mean(s1, (2, 4))

    image_dates = []
    for date in image_request.get_dates():
        if date.year == year - 1:
            image_dates.append(-365 + starting_days[(date.month-1)] + date.day)
        if date.year == year:
            image_dates.append(starting_days[(date.month-1)] + date.day)
        if date.year == year + 1:
            image_dates.append(365 + starting_days[(date.month-1)]+date.day)
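    if data_filter is not None:
        # Keep only the dates of the images that were actually downloaded
        image_dates = [val for idx, val in enumerate(image_dates) if idx in data_filter]
    image_dates = np.array(image_dates)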
    s1c = np.copy(s1)
    s1c[np.where(s1c < 1.)] = 0
    n_pix_oob = np.sum(s1c, axis = (1, 2, 3))
    to_remove = np.argwhere(n_pix_oob > (imsize*2*imsize*2)/50)
    s1 = np.delete(s1, to_remove, 0)
    image_dates = np.delete(image_dates, to_remove)
    return s1, image_dates


def identify_s1_layer(coords: Tuple[float, float]) -> str:
    """ Identifies whether to download ascending or descending 
        sentinel 1 orbit based upon predetermined geographic coverage
        
        Reference: https://sentinel.esa.int/web/sentinel/missions/
                   sentinel-1/satellite-description/geographical-coverage
        
        Parameters:
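         coords (tuple): (lat, long) of the area of interest
    
        Returns:
         layer (str): either of SENT, SENT_DESC 
    """
    results = rg.search(coords)
    country = results[-1]['cc']
    continent_name = pc.country_alpha2_to_continent_code(country)
    # Fallback for continents not handled below (e.g. EU), which would otherwise
    # leave `layer` unassigned; download_large_tile switches to the other orbit
    # if this layer returns no images
    layer = "SENT"
    if continent_name in ['AF', 'OC']:
        layer = "SENT"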
    if continent_name in ['SA']:
        if coords[0] > -7.11:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['AS']:
        if coords[0] > 23.3:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['NA']:
        layer = "SENT_DESC"
    print(f"The continent is: {continent_name}, and the sentinel 1 orbit is {layer}")
    return layer
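
The orbit rule in `identify_s1_layer` can be read as a small lookup on continent code and latitude. The sketch below isolates that rule from the reverse geocoding step so it can be checked without network or package dependencies. The function name `choose_orbit_layer` is hypothetical, and returning `None` for continents the rule does not mention is a choice made for this example only.

```python
from typing import Optional


def choose_orbit_layer(continent: str, latitude: float) -> Optional[str]:
    """Hypothetical pure-function version of the rule in identify_s1_layer."""
    if continent in ("AF", "OC"):
        return "SENT"
    if continent == "SA":
        return "SENT" if latitude > -7.11 else "SENT_DESC"
    if continent == "AS":
        return "SENT" if latitude > 23.3 else "SENT_DESC"
    if continent == "NA":
        return "SENT_DESC"
    return None  # continent not covered by the rule (e.g. "EU")


for continent, lat in [("AF", -3.1), ("SA", -10.0), ("AS", 30.0), ("EU", 48.0)]:
    print(continent, lat, choose_orbit_layer(continent, lat))
```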

# 2.2 Cloud and shadow removal

In [23]:
def remove_missed_clouds(img: np.ndarray) -> np.ndarray:
    """ Removes clouds that may have been missed by s2cloudless
        by looking at a temporal change outside of IQR
        
        Parameters:
         img (arr): (time, X, Y, bands) Sentinel 2 array
    
        Returns:
         to_remove (arr): indices of time steps where more than 10% of
                          first-band pixels are outliers
    """
    first_band = img[..., 0].flatten()
    q25, q75 = np.percentile(first_band, 25), np.percentile(first_band, 75)
    iqr = q75 - q25
    thresh_t = q75 + iqr*1.5
    thresh_b = q25 - iqr*1.5
    outlier_percs = []
    for step in range(img.shape[0]):
        above = len(np.argwhere(img[step, ..., 0].flatten() > thresh_t))
        below = len(np.argwhere(img[step, ..., 0].flatten() < thresh_b))
        p = 100 * ((above + below) / (img.shape[1]*img.shape[2]))
        outlier_percs.append(p)
    to_remove = np.argwhere(np.array(outlier_percs) > 10)
    return to_remove


def calculate_bad_steps(sentinel2: np.ndarray, clouds: np.ndarray) -> np.ndarray:
    """ Calculates the timesteps to remove based upon cloud cover and missing data
        
        Parameters:
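         sentinel2 (arr): (time, X, Y, bands) Sentinel 2 array
         clouds (arr): (time, X, Y) cloud probability array
    
        Returns:
         to_remove (arr): indices of time steps that are too cloudy or have missing data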
    """
    n_cloud_px = np.array([len(np.argwhere(clouds[x, ...].reshape((IMSIZE)*(IMSIZE)) > 0.30)) for x in range(clouds.shape[0])])
    cloud_steps = np.argwhere(n_cloud_px > IMSIZE**2 / 10)
    missing_images = [np.argwhere(sentinel2[x, ..., :10].flatten() == 0.0) for x in range(sentinel2.shape[0])]
    missing_images = np.array([len(x) for x in missing_images])
    missing_images_p = [np.argwhere(sentinel2[x, ..., :10].flatten() >= 1) for x in range(sentinel2.shape[0])]
    missing_images_p = np.array([len(x) for x in missing_images_p])
    missing_images += missing_images_p
    missing_images = np.argwhere(missing_images >= (IMSIZE**2) / 20)
    to_remove = np.unique(np.concatenate([cloud_steps.flatten(), missing_images.flatten()]))
    return to_remove
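
To make the IQR test in `remove_missed_clouds` concrete, the sketch below applies the same thresholds (1.5 x IQR beyond the quartiles, more than 10% outlying pixels) to synthetic data. The helper name `flag_outlier_steps` and the synthetic values are assumptions for illustration, and the helper takes a single band rather than the full multi-band array.

```python
import numpy as np


def flag_outlier_steps(band: np.ndarray, max_outlier_pct: float = 10.0) -> list:
    """Hypothetical helper: indices of time steps with too many IQR outliers.

    band has shape (time, X, Y).
    """
    q25, q75 = np.percentile(band, [25, 75])
    iqr = q75 - q25
    upper, lower = q75 + 1.5 * iqr, q25 - 1.5 * iqr
    outliers = (band > upper) | (band < lower)
    # Percentage of outlying pixels in each time step
    pct = 100 * outliers.reshape(band.shape[0], -1).mean(axis=1)
    return [int(i) for i in np.flatnonzero(pct > max_outlier_pct)]


# Synthetic data: five time steps of a 10 x 10 tile, with step 2 much brighter
clear = np.linspace(0.05, 0.15, 100).reshape(10, 10)
band = np.stack([clear, clear, clear + 0.8, clear, clear])
print(flag_outlier_steps(band))  # [2]
```

Because the quartiles are computed over all time steps at once, a single bright image stands out against the rest of the series, while a tile that is bright all year would not be flagged.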

# 2.3 Superresolution

In [24]:
MDL_PATH = "../src/dsen2/models/"

input_shape = ((4, None, None), (6, None, None))
model = s2model(input_shape, num_layers=6, feature_size=128)
predict_file = MDL_PATH+'s2_032_lr_1e-04.hdf5'
print('Symbolic Model Created.')

model.load_weights(predict_file)

def superresolve_tile(arr: np.ndarray, model) -> np.ndarray:
    """Superresolves each 56x56 subtile in a 646x646 input tile
       by padding the subtiles to 64x64 and removing the pad after prediction,
       eliminating boundary artifacts

        Parameters:
         arr (arr): (?, 646, 646, 10) array

        Returns:
         superresolved (arr): (?, 646, 646, 10) array
    """
    print(f"The input array to superresolve is {arr.shape}")
    #superresolved = np.copy(arr)
    tiles = tile_window(646, 646, 56, 56)
    for i in tnrange(len(tiles)):
        subtile = tiles[i]
        to_resolve = arr[:, subtile[0]:subtile[0]+56, subtile[1]:subtile[1]+56, :]
        #to_resolve = np.pad(to_resolve, ((0, 0), (4, 4), (4, 4), (0, 0)), 'reflect')
        resolved = superresolve(
            np.pad(to_resolve, ((0, 0), (4, 4), (4, 4), (0, 0)), 'reflect'),
            model)
        resolved = resolved[:, 4:-4, 4:-4, :]
        arr[:, subtile[0]:subtile[0]+56, subtile[1]:subtile[1]+56] = resolved
    return arr


Symbolic Model Created.
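
The pad-then-crop step in `superresolve_tile` can be checked in isolation. In this minimal sketch, `identity_model` is a hypothetical stand-in for the superresolution network (it returns its input unchanged), so the cropped output should match the original subtile exactly.

```python
import numpy as np


def identity_model(batch: np.ndarray) -> np.ndarray:
    """Stand-in for the superresolution network; returns its input unchanged."""
    return batch


rng = np.random.default_rng(0)
subtile = rng.random((3, 56, 56, 10)).astype(np.float32)  # (time, X, Y, bands)

# Reflect-pad 4 pixels on each spatial side: 56 -> 64
padded = np.pad(subtile, ((0, 0), (4, 4), (4, 4), (0, 0)), 'reflect')
resolved = identity_model(padded)

# Crop the pad back off so the result aligns with the original 56 x 56 window
cropped = resolved[:, 4:-4, 4:-4, :]

print(padded.shape, cropped.shape)
```

With a real network, the 4-pixel margin gives the convolutions valid context at the subtile edges, and discarding it afterwards is what removes the boundary artifacts mentioned in the docstring.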


# 2.4 Tiling and folder management functions

In [25]:
def make_output_and_temp_folders(idx: str, output_folder: str = OUTPUT_FOLDER) -> None:
    """Makes necessary folder structures for IO of raw and processed data

        Parameters:
         idx (str)
         output_folder (path)

        Returns:
         None
    """
    def _find_and_make_dirs(dirs):
        if not os.path.exists(os.path.realpath(dirs)):
            os.makedirs(os.path.realpath(dirs))
            
    _find_and_make_dirs(output_folder + "raw/")
    _find_and_make_dirs(output_folder + "raw/clouds/")
    _find_and_make_dirs(output_folder + "raw/s1/")
    _find_and_make_dirs(output_folder + "raw/s2/")
    _find_and_make_dirs(output_folder + "raw/misc/")
    _find_and_make_dirs(output_folder + "processed/")
    _find_and_make_dirs(output_folder + "interim/")

    
def to_int32(array: np.array) -> np.array:
    '''Scales a float32 array by 65535 and truncates it to int32'''
    return np.trunc(array * 65535).astype(np.int32)

def to_int16(array: np.array) -> np.array:
    '''Scales a float32 array by 65535 and truncates it to uint16,
       halving storage relative to float32'''
    return np.trunc(array * 65535).astype(np.uint16)

def to_float32(array: np.array) -> np.array:
    '''Converts an array back to float32, dividing by 65535 if it is stored as integers'''
    divide = 1. if isinstance(array.flat[0], np.floating) else 65535
    print(divide, "divide")
    return np.float32(array) / divide
    

def download_large_tile(coord: tuple,
                        step_x: int,
                        step_y: int,
                        folder: str = OUTPUT_FOLDER, 
                        year: int = year,
                        s1_layer: str = "SENT") -> None:
    """Wrapper function to download cloud probs, Sentinel 2, Sentinel 1, and DEM

        Parameters:
         coord (tuple):
         step_x (int):
         step_y (int):
         folder (path):
         year (int):
         s1_layer (str):

        Returns:
         None
    """
    bbx, epsg = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 80)
    dem_bbx, _ = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 90)
    idx = str(step_y) + "_" + str(step_x)
    idx = str(idx)
    make_output_and_temp_folders(idx)

    if not os.path.exists(folder + "output/" + str(step_y*5) + "/" + str(step_x*5) + ".npy"):
        if not os.path.exists(folder + "processed/" + str(step_y*5) + "/" + str(step_x*5) + ".hkl"):
            clouds_file = f'{folder}raw/clouds/clouds_{idx}.hkl'
            shadows_file = f'{folder}raw/clouds/shadows_{idx}.hkl'
            s1_file = f'{folder}raw/s1/{idx}.hkl'
            s1_dates_file = f'{folder}raw/misc/s1_dates_{idx}.hkl'
            s2_file = f'{folder}raw/s2/{idx}.hkl'
            s2_dates_file = f'{folder}raw/misc/s2_dates_{idx}.hkl'
            clean_steps_file = f'{folder}raw/clouds/clean_steps_{idx}.hkl'
            
            if not os.path.exists(clouds_file):
                # All this needs to be int16, copied to cloud with io.save_file
                print(f"Downloading clouds because {clouds_file} does not exist")
                cloud_probs, shadows, clean_steps = identify_clouds(bbx, epsg = epsg)
                hkl.dump(cloud_probs, clouds_file, mode='w', compression='gzip')
                hkl.dump(shadows, shadows_file, mode='w', compression='gzip')
                hkl.dump(clean_steps, clean_steps_file, mode='w', compression='gzip')
            
            if not os.path.exists(s1_file):
                # All this needs to be int16, copied to cloud with io.save_file
                print(f"Downloading S1 because {s1_file} does not exist")
                s1_layer = identify_s1_layer((coord[1], coord[0]))
                s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
                if s1.shape[0] == 0:
                    s1_layer = "SENT_DESC" if s1_layer == "SENT" else "SENT"
                    print(f'Switching to {s1_layer}')
                    s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
                s1 = process_sentinel_1_tile(s1, s1_dates)
                hkl.dump(to_int16(s1), s1_file, mode='w', compression='gzip')
                hkl.dump(s1_dates, s1_dates_file, mode='w', compression='gzip')

            if not os.path.exists(s2_file):
                # All this needs to be int16, copied to cloud with io.save_file
                print(f"Downloading S2 because {s2_file} does not exist")
                # Reuse clean_steps from the cloud download above if it ran;
                # otherwise load it from disk
                if 'clean_steps' not in locals():
                    clean_steps = hkl.load(clean_steps_file)
                s2, s2_dates = download_layer(bbx, clean_steps = clean_steps, epsg = epsg)
                hkl.dump(to_int16(s2), s2_file, mode='w', compression='gzip')
                hkl.dump(s2_dates, s2_dates_file, mode='w', compression='gzip')

            if not os.path.exists(folder + "raw/misc/dem_{}.hkl".format(idx)):
                # All this needs to be int16, copied to cloud with io.save_file
                dem = download_dem(dem_bbx, epsg = epsg)
                hkl.dump(dem, folder + "raw/misc/dem_{}.hkl".format(idx), mode='w', compression='gzip')
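
The `to_int16` and `to_float32` helpers above store reflectance as scaled integers and recover it on load. The sketch below reproduces that round trip on synthetic values in [0, 1] to show the storage saving and the size of the quantization error. The variable names are hypothetical, and values above 1 would not fit in `uint16` under this scaling.

```python
import numpy as np

reflectance = np.linspace(0, 1, 11, dtype=np.float32)

# Same scaling as to_int16: multiply by 65535 and truncate to uint16
stored = np.trunc(reflectance * 65535).astype(np.uint16)

# Same recovery as to_float32 for integer input: divide by 65535
recovered = np.float32(stored) / 65535

max_error = float(np.max(np.abs(reflectance - recovered)))
print(stored.dtype, stored.nbytes, reflectance.nbytes)
print(max_error < 1 / 65535)
```

Truncation rather than rounding means the recovered value is never larger than the original, and the error stays below one quantization step.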

In [26]:
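def reject_outliers(data, m = 4):
    """Replaces values whose deviation from the per-pixel temporal median,
       relative to that median, exceeds m with the median itself"""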
    d = data - np.median(data, axis = (0))
    mdev = np.median(data, axis = 0)
    s = d / mdev
    n_changed = 0
    for x in tnrange(data.shape[1]):
        for y in range(data.shape[2]):
            for band in range(data.shape[3]):
                to_correct = np.where(s[:, x, y, band] > m) 
                data[to_correct, x, y, band] = mdev[x, y, band]
                n_changed += len(to_correct[0])
    print(f"Rejected {n_changed} outliers")
    return data
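
The triple loop in `reject_outliers` applies the same rule to every pixel and band, so it can also be written as a single array expression. The following is a sketch of a vectorized variant, not a drop-in replacement: `reject_outliers_vectorized` is a hypothetical name, it returns a new array instead of modifying the input, and it does not count the replaced values. As in the original, a median of zero would lead to a division by zero.

```python
import numpy as np


def reject_outliers_vectorized(data: np.ndarray, m: float = 4) -> np.ndarray:
    """Hypothetical vectorized variant of reject_outliers; returns a new array."""
    median = np.median(data, axis=0)   # per-pixel, per-band temporal median
    relative_dev = (data - median) / median
    return np.where(relative_dev > m, median, data)


# Synthetic (time, X, Y, bands) stack with a single spike at the first time step
stack = np.ones((5, 2, 2, 1), dtype=np.float32)
stack[0, 0, 0, 0] = 10.0
cleaned = reject_outliers_vectorized(stack)
print(cleaned[0, 0, 0, 0])  # 1.0
```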

In [27]:
def process_sentinel_1_tile(sentinel1: np.ndarray, dates: np.ndarray) -> np.ndarray:
    """Converts a (?, X, Y, 2) Sentinel 1 array to (24, X, Y, 2)

        Parameters:
         sentinel1 (np.array):
         dates (np.array):

        Returns:
         s1 (np.array)
    """
    s1, _ = calculate_and_save_best_images(sentinel1, dates)
    biweekly_dates = np.array([day for day in range(0, 360, 5)])
    to_remove = np.argwhere(biweekly_dates % 15 != 0)
    s1 = np.delete(s1, to_remove, 0)
    return s1


def make_folder_names(step_x: int, step_y: int) -> (list, list):
    '''Given an input tile location (step_x, step_y), identify the folder and file
       names for each 5x5 subtile
       
       Parameters:
         step_x (int):
         step_y (int):

        Returns:
         x_vals (list)
         y_vals (list)
    '''
    x_vals = []
    y_vals = []
    for i in range(25):
        y_val = (24 - i) // 5
        x_val = 5 - ((25 - i) % 5)
        x_val = 0 if x_val == 5 else x_val
        x_vals.append(x_val)
        y_vals.append(y_val)
    y_vals = [i + (5*step_y) for i in y_vals]
    x_vals = [i + (5*step_x) for i in x_vals]
    return x_vals, y_vals


def process_large_tile(coord: tuple,
                       step_x: int,
                       step_y: int,
                       folder: str = OUTPUT_FOLDER,
                       model: 'model' = model) -> None:
    '''Wrapper function to interpolate clouds and temporal gaps, superresolve tiles,
       calculate relevant indices, and save analysis-ready data to the output folder
       
       Parameters:
        coord (tuple)
        step_x (int):
        step_y (int):
        folder (str):

       Returns:
        None
    '''
    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)

    processed = True
    for x, y in zip(x_vals, y_vals):
        folder_path = f"{str(y)}/{str(x)}"
        processed_exists = os.path.exists(folder + "processed/" + folder_path + ".hkl")
        output_exists = os.path.exists(folder + "output/" + folder_path + ".npy")
        if not (processed_exists or output_exists):
            processed = False
    if not processed:
        # All this needs to be converted to float32
        clouds = hkl.load(f'{folder}raw/clouds/clouds_{idx}.hkl')
        sentinel1 = to_float32(hkl.load(f'{folder}raw/s1/{idx}.hkl'))
        radar_dates = hkl.load(f'{folder}raw/misc/s1_dates_{idx}.hkl')
        sentinel2 = to_float32(hkl.load(f'{folder}raw/s2/{idx}.hkl'))
        dem = hkl.load(f'{folder}raw/misc/dem_{idx}.hkl')
        image_dates = hkl.load(f'{folder}raw/misc/s2_dates_{idx}.hkl')
        print(image_dates)
        if os.path.exists(f'{folder}raw/clouds/shadows_{idx}.hkl'):
            shadows = hkl.load(f'{folder}raw/clouds/shadows_{idx}.hkl')
        else:
            print("No shadows file, so calculating shadows with L2A")
            shadows = mcm_shadow_mask(sentinel2, clouds)        
        
        to_remove = calculate_bad_steps(sentinel2, clouds)
        sentinel2 = np.delete(sentinel2, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
        print(f"{len(to_remove)} Cloudy and missing images removed, radar processed: {to_remove}")
        
        to_remove = remove_missed_clouds(sentinel2)
        sentinel2 = np.delete(sentinel2, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
        print(f"{len(to_remove)} missed cloudy images were removed: {to_remove}")
        x, interp = remove_cloud_and_shadows(sentinel2, clouds, shadows, image_dates)
        print("Clouds and shadows interpolated")    
                
        to_remove = np.argwhere(np.mean(interp, axis = (1, 2, 3)) > 0.5)
        print(f"{len(to_remove)} steps removed because of >50% interpolation rate")
        x = np.delete(x, to_remove, axis = 0)
        clouds = np.delete(clouds, to_remove, axis = 0)
        shadows = np.delete(shadows, to_remove, axis = 0)
        image_dates = np.delete(image_dates, to_remove)
                
        index = 0
        print("Super resolving tile")
        x = np.float32(x)
        x = superresolve_tile(x, model)
        
        dem_i = np.tile(dem[np.newaxis, 1:-1, 1:-1, :], (x.shape[0], 1, 1, 1))
        x = np.concatenate([x, dem_i / 90], axis = -1)
        x = evi(x, verbose = True)
        x = bi(x, verbose = True)
        x = msavi2(x, verbose = True)
        x = si(x, verbose = True)

        # Interpolate NaN values occasionally introduced by msavi2
        for x_loc in range(x.shape[1]):
            for y_loc in range(x.shape[2]):
                n_na = np.sum(np.isnan(x[:, x_loc, y_loc, :]), axis = 1)
                for date in range(x.shape[0]):
                    if n_na.flatten()[date] > 0:
                        before, after = calculate_proximal_steps(date, np.argwhere(n_na == 0))
                        x[date, x_loc, y_loc, :] = (x[date + before, x_loc, y_loc] + x[date + after, x_loc, y_loc]) / 2
        
        numb_na = np.sum(np.isnan(x), axis = (1, 2, 3))
        print(numb_na)

        interim_file = f"{folder}interim/{idx}.hkl"
        interim_dates = f"{folder}interim/dates_{idx}.hkl"
        hkl.dump(np.float32(x), interim_file, mode = 'w', compression = 'gzip')
        hkl.dump(image_dates, interim_dates, mode = 'w', compression = 'gzip')
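
The nested loop above replaces any time step that contains a NaN with the mean of the nearest valid steps on either side. A minimal sketch of that idea for a single pixel is shown below. The helper name `fill_nan_steps` is hypothetical, it does not reproduce `calculate_proximal_steps`, and the edge handling (reusing the only available neighbour at the start or end of the series) is an assumption.

```python
import numpy as np

def fill_nan_steps(series: np.ndarray) -> np.ndarray:
    """Fill time steps that contain NaN with the mean of the nearest
    valid steps before and after. `series` has shape (time, bands)."""
    filled = series.copy()
    bad = np.isnan(series).any(axis=1)       # steps with at least one NaN band
    valid = np.flatnonzero(~bad)             # indices of fully valid steps
    if valid.size == 0:
        return filled                        # nothing to interpolate from
    for t in np.flatnonzero(bad):
        before = valid[valid < t]
        after = valid[valid > t]
        # Assumption: at the edges, fall back to the only available neighbour
        left = before[-1] if before.size else after[0]
        right = after[0] if after.size else before[-1]
        filled[t] = (series[left] + series[right]) / 2
    return filled

# One pixel, three dates, two bands; the middle date has a NaN band
pixel = np.array([[0.2, 0.4], [np.nan, 0.5], [0.6, 0.8]])
print(fill_nan_steps(pixel))
```

As in the loop above, the whole band vector of an affected step is replaced, not only the NaN band.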

# 2.5 Function execution

In [44]:
downloaded = 0

if not os.path.exists(os.path.realpath(OUTPUT_FOLDER)):
    os.makedirs(os.path.realpath(OUTPUT_FOLDER))
        
print(f"Downloading {year} for {landscape}")

max_x = 50
max_y = 50

for y_tile in range(0, 25):
    for x_tile in range(0, 5):
        #contains = True
        contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        print(y_tile, x_tile, contains, downloaded)
        if contains:
            print(f"Download {downloaded}/{max_x*max_y}; X: {x_tile} Y:{y_tile}")
            downloaded += 1
            download_large_tile(coord = coords, step_x = x_tile, step_y = y_tile)
            process_large_tile(coords, x_tile, y_tile)
            print("\n")

Downloading 2019 for kenya-makueni
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
0 0 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
0 1 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
0 2 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
0 3 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
0 4 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
1 0 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
1 1 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
1 2 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
1 3 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
1 4 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
2 0 False 0
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.g



Cloud_probs used 0.1 processing units
[0.45 0.97 0.38 0.38 0.06 0.88 0.16 0.05 0.   0.03 0.99 0.   0.17 0.13
 0.   0.84 0.   0.23 0.71 0.01 0.05 0.22 0.31 0.44 0.37 0.75 0.46 0.57
 1.   1.   0.58 1.   0.98 1.   0.93 0.03 0.77 0.   0.   0.88 1.   0.99
 0.8  0.89 0.64 0.01 0.56 0.95 0.11 0.09 0.21 0.93 0.   0.4  0.98 0.96
 0.79 0.19 0.65 0.75 0.76 1.   0.6  0.   0.89 0.32 0.55 0.7  0.   0.8 ]




Shadows ((14, 646, 646)) used 5.6 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/18_3.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/18_3.hkl does not exist
Original 20 meter bands size: (14, 323, 323, 6), using 11.1 PU
Original 10 meter bands size: (14, 646, 646, 4), using 29.7 PU
Sentinel 2 used 40.9 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 19.  24.  34.  49.  59.  74.  79. 174. 199. 204. 244. 289. 354. 384.]
0 Cloudy and missing images removed, radar processed: []
1 missed cloudy images were removed: [[6]]
Interpolated 36171 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (13, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
18 4 True 1
Download 1/2500; X: 4 Y:18
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_18_4.hkl does not exist
Cloud_probs used 0.1 processing units
[0.87 0.99 0.47 0.18 0.09 1.   0.29 0.15 0.06 0.04 0.53 0.   0.41 0.12
 0.01 0.57 0.   0.34 0.87 0.01 0.11 0.21 0.18 0.58 0.34 0.69 0.7  0.61
 0.98 1.   0.71 1.   0.73 1.   0.97 0.12 0.91 0.   0.   0.71 0.98 0.67
 0.46 0.65 0.08 0.58 1.   0.21 0.09 0.29 0.54 0.   0.4  0.98 0.73 0.36
 0.58 0.44 0.63 0.95 0.38 0.   0.99 0.23 0.57 0.69 0.06 0.82]




Shadows ((9, 646, 646)) used 3.6 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/18_4.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/18_4.hkl does not exist
Original 20 meter bands size: (9, 323, 323, 6), using 7.2 PU
Original 10 meter bands size: (9, 646, 646, 4), using 19.1 PU
Sentinel 2 used 26.3 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 24.  34.  49.  59.  74. 199. 204. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 36829 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (9, 646, 646, 10)




[0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
19 0 False 2
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
19 1 False 2
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
19 2 True 2
Download 2/2500; X: 2 Y:19
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_19_2.hkl does not exist
Cloud_probs used 0.1 processing units
[0.53 0.97 0.34 0.28 0.07 0.83 0.1  0.26 0.11 0.03 0.33 0.   0.5  0.31
 0.   0.7  0.   0.17 0.07 0.15 0.   0.31 0.26 0.33 0.18 0.68 0.59 0.65
 1.   1.   0.45 1.   0.98 0.88 0.99 0.   0.95 0.02 0.   0.96 1.   0.99
 0.72 0.61 0.89 0.   0.56 0.98 0.05 0.24 0.36 0.87 0.   0.31 0.99 0.39
 0.23 0.6  0.83 0.97 0.8  1.   0.42 0.   1.   0.04 0.25 0.74 0.62 0.98]




Shadows ((13, 646, 646)) used 5.2 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/19_2.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/19_2.hkl does not exist
Original 20 meter bands size: (13, 323, 323, 6), using 10.3 PU
Original 10 meter bands size: (13, 646, 646, 4), using 27.6 PU
Sentinel 2 used 37.9 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 24.  34.  49.  59.  79. 174. 199. 204. 244. 259. 289. 354. 369.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 95605 px
Clouds and shadows interpolated
1 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (12, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
19 3 True 3
Download 3/2500; X: 3 Y:19
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_19_3.hkl does not exist
Cloud_probs used 0.1 processing units
[0.91 0.98 0.3  0.13 0.06 0.71 0.15 0.05 0.12 0.1  0.87 0.   0.08 0.05
 0.   0.33 0.   0.45 0.85 0.19 0.03 0.15 0.3  0.22 0.51 0.94 0.61 0.66
 0.98 1.   0.86 1.   0.96 1.   0.96 0.   0.95 0.08 0.   0.77 1.   0.99
 0.6  0.82 0.89 0.03 0.54 0.99 0.16 0.45 0.24 0.89 0.   0.33 0.99 0.96
 0.45 0.34 0.46 0.6  0.76 1.   0.07 0.   1.   0.08 0.57 0.88 0.07 0.99]




Shadows ((9, 646, 646)) used 3.6 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/19_3.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/19_3.hkl does not exist
Original 20 meter bands size: (9, 323, 323, 6), using 7.2 PU
Original 10 meter bands size: (9, 646, 646, 4), using 19.1 PU
Sentinel 2 used 26.3 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 34.  49.  59.  79. 174. 204. 244. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
2 missed cloudy images were removed: [[3]
 [6]]
Interpolated 3036 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (7, 646, 646, 10)




[0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
19 4 True 4
Download 4/2500; X: 4 Y:19
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_19_4.hkl does not exist
Cloud_probs used 0.1 processing units
[0.99 1.   0.35 0.14 0.26 1.   0.18 0.09 0.1  0.1  0.38 0.   0.2  0.21
 0.   0.41 0.   0.47 0.66 0.03 0.07 0.13 0.3  0.43 0.32 0.44 0.78 0.56
 0.98 1.   0.8  1.   0.47 1.   0.6  0.01 0.92 0.   0.   0.72 1.   0.27
 0.56 0.78 0.08 0.63 0.89 0.13 0.15 0.24 0.56 0.   0.5  0.97 0.7  0.12
 0.33 0.26 0.93 1.   0.27 0.   1.   0.19 0.63 0.6  0.01 0.95]




Shadows ((10, 646, 646)) used 4.0 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/19_4.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/19_4.hkl does not exist
Original 20 meter bands size: (10, 323, 323, 6), using 8.0 PU
Original 10 meter bands size: (10, 646, 646, 4), using 21.2 PU
Sentinel 2 used 29.2 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 34.  49.  59.  74. 174. 199. 204. 289. 354. 384.]
0 Cloudy and missing images removed, radar processed: []
1 missed cloudy images were removed: [[3]]
Interpolated 21503 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (9, 646, 646, 10)




[0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
20 0 True 5
Download 5/2500; X: 0 Y:20
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_20_0.hkl does not exist
Cloud_probs used 0.1 processing units
[0.95 0.45 0.49 0.03 1.   0.01 0.17 0.16 0.   0.26 0.   0.54 0.45 0.01
 0.84 0.   0.19 0.06 0.13 0.16 0.26 0.39 0.26 0.37 0.08 0.61 0.83 1.
 0.4  0.99 0.87 0.67 1.   0.06 0.91 0.   0.   0.77 0.86 0.95 0.35 0.33
 0.65 0.17 0.8  0.98 0.   0.09 0.13 0.97 0.   0.24 0.91 0.32 0.87 0.53
 0.67 1.   0.79 0.74 0.61 0.   0.97 0.37 0.3  0.55 0.55 0.58 0.98]




Shadows ((11, 646, 646)) used 4.4 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/20_0.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 65.8 PU for           31 out of 61 images
Maximum time distance: 0
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/20_0.hkl does not exist
Original 20 meter bands size: (11, 323, 323, 6), using 8.8 PU
Original 10 meter bands size: (11, 646, 646, 4), using 23.3 PU
Sentinel 2 used 32.1 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ -1.   9.  24.  34.  49.  59. 199. 204. 259. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 36707 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (11, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
20 1 True 6
Download 6/2500; X: 1 Y:20
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_20_1.hkl does not exist
Cloud_probs used 0.2 processing units
[1.   0.94 0.54 0.46 0.08 0.99 0.01 0.16 0.1  0.06 0.38 0.   0.41 0.28
 0.01 0.8  0.   0.37 0.02 0.01 0.17 0.97 0.25 0.16 0.15 0.3  0.14 0.67
 0.9  0.99 1.   0.2  1.   0.78 1.   1.   0.   1.   0.93 0.   0.   0.95
 1.   0.96 0.7  0.5  0.83 0.03 0.66 1.   0.02 0.11 0.19 0.92 0.   0.21
 0.99 0.29 0.39 0.81 0.42 1.   0.96 0.77 1.   0.67 0.   0.98 0.96 0.28
 0.48 0.76 0.12 0.96]




Shadows ((13, 646, 646)) used 5.2 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/20_1.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/20_1.hkl does not exist
Original 20 meter bands size: (13, 323, 323, 6), using 10.3 PU
Original 10 meter bands size: (13, 646, 646, 4), using 27.6 PU
Sentinel 2 used 37.9 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[  9.  34.  49.  59.  69.  74. 174. 199. 204. 244. 259. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 44336 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (13, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
20 2 True 7
Download 7/2500; X: 2 Y:20
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_20_2.hkl does not exist
Cloud_probs used 0.2 processing units
[0.86 1.   0.11 0.31 0.08 0.9  0.11 0.5  0.2  0.12 0.64 0.   0.51 0.04
 0.01 0.74 0.   0.59 0.02 0.05 0.11 0.93 0.22 0.52 0.13 0.4  0.35 0.73
 0.8  0.95 1.   0.58 1.   0.86 1.   1.   0.   1.   0.97 0.05 0.   0.94
 1.   0.97 0.71 0.75 0.9  0.01 0.76 0.99 0.01 0.24 0.29 0.99 0.   0.2
 1.   0.39 0.43 0.53 0.12 1.   1.   0.88 1.   0.44 0.   0.97 1.   0.02
 0.53 0.5  0.2  1.  ]




Shadows ((13, 646, 646)) used 5.2 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/20_2.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/20_2.hkl does not exist
Original 20 meter bands size: (13, 323, 323, 6), using 10.3 PU
Original 10 meter bands size: (13, 646, 646, 4), using 27.6 PU
Sentinel 2 used 37.9 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 34.  44.  49.  59.  69. 174. 199. 204. 244. 259. 289. 354. 369.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 90522 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (13, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
20 3 True 8
Download 8/2500; X: 3 Y:20
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_20_3.hkl does not exist
Cloud_probs used 0.2 processing units
[0.84 0.95 0.35 0.16 0.06 0.98 0.06 0.1  0.07 0.1  0.29 0.   0.26 0.1
 0.   0.46 0.   0.15 0.15 0.12 0.02 0.93 0.19 0.25 0.34 0.32 0.4  0.74
 0.54 0.92 1.   0.94 1.   0.56 1.   0.92 0.05 1.   0.97 0.18 0.   0.77
 1.   0.94 0.48 0.91 0.95 0.03 0.83 0.9  0.06 0.27 0.19 1.   0.   0.32
 0.99 0.98 0.77 0.47 0.09 1.   0.89 0.81 1.   0.25 0.   1.   1.   0.06
 0.47 0.8  0.22 0.94]




Shadows ((9, 646, 646)) used 3.6 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/20_3.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/20_3.hkl does not exist
Original 20 meter bands size: (9, 323, 323, 6), using 7.2 PU
Original 10 meter bands size: (9, 646, 646, 4), using 19.1 PU
Sentinel 2 used 26.3 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 34.  49.  59.  79. 174. 204. 244. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
2 missed cloudy images were removed: [[3]
 [6]]
Interpolated 15641 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (7, 646, 646, 10)




[0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
20 4 True 9
Download 9/2500; X: 4 Y:20
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_20_4.hkl does not exist
Cloud_probs used 0.1 processing units
[0.93 0.94 0.15 0.51 0.14 1.   0.18 0.04 0.01 0.32 0.28 0.   0.24 0.23
 0.   0.33 0.   0.39 0.04 0.04 0.08 0.93 0.05 0.31 0.38 0.25 0.45 0.74
 0.68 0.89 1.   0.8  1.   0.43 1.   0.37 0.05 1.   0.87 0.05 0.   0.75
 0.98 0.07 0.76 0.76 0.03 0.56 0.75 0.06 0.15 0.14 0.89 0.   0.25 0.99
 0.8  0.22 0.27 1.   0.49 0.96 1.   0.28 0.   1.   0.13 0.77 0.42 0.09
 0.99]




Shadows ((13, 646, 646)) used 5.2 processing units
Downloading S1 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s1/20_4.hkl does not exist
The continent is: AF, and the sentinel 1 orbit is SENT
Sentinel 1 used 57.3 PU for           27 out of 54 images
Maximum time distance: 48
Downloading S2 because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/s2/20_4.hkl does not exist
Original 20 meter bands size: (13, 323, 323, 6), using 10.3 PU
Original 10 meter bands size: (13, 646, 646, 4), using 27.6 PU
Sentinel 2 used 37.9 PU
DEM used 3.2 processing units
65535 divide
65535 divide
[ 14.  19.  34.  49.  59.  69.  74. 174. 199. 204. 244. 289. 354.]
0 Cloudy and missing images removed, radar processed: []
0 missed cloudy images were removed: []
Interpolated 97430 px
Clouds and shadows interpolated
0 steps removed because of >50% interpolation rate
Super resolving tile
The input array to superresolve is (13, 646, 646, 10)




[0 0 0 0 0 0 0 0 0 0 0 0 0]


../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
21 0 True 10
Download 10/2500; X: 0 Y:21
Downloading clouds because ../project-monitoring//kenya/makueni/mtito-andei/2019/raw/clouds/clouds_21_0.hkl does not exist


KeyboardInterrupt: 
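
The log above reports the processing units (PU) consumed by each request. To budget a larger run, the per-tile total can be summed straight from that output. The sketch below is one way to do it; the helper `total_processing_units` and the regular expression are illustrative and not part of the notebook. The `using ... PU` lines for the 10 m and 20 m bands are skipped deliberately, because in the log they already add up to the `Sentinel 2 used ...` total (7.2 + 19.1 = 26.3).

```python
import re

# Matches "... used 57.3 PU ..." and "... used 3.6 processing units".
# Lines that say "using 7.2 PU" are not matched, to avoid double counting.
PU_PATTERN = re.compile(r"\bused (\d+(?:\.\d+)?) (?:PU|processing units)")

def total_processing_units(log_text: str) -> float:
    """Sum the processing units reported in a block of log output."""
    return sum(float(m.group(1)) for m in PU_PATTERN.finditer(log_text))

# Log lines copied from tile 18_4 in the output above
log_18_4 = """\
Cloud_probs used 0.1 processing units
Shadows ((9, 646, 646)) used 3.6 processing units
Sentinel 1 used 57.3 PU for           27 out of 54 images
Original 20 meter bands size: (9, 323, 323, 6), using 7.2 PU
Original 10 meter bands size: (9, 646, 646, 4), using 19.1 PU
Sentinel 2 used 26.3 PU
DEM used 3.2 processing units
"""

print(round(total_processing_units(log_18_4), 1))  # 0.1 + 3.6 + 57.3 + 26.3 + 3.2
```

For this tile, Sentinel 1 accounts for well over half of the total, so the number of radar images requested is the main driver of cost.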

In [45]:
INPUT_FOLDER = "/".join(OUTPUT_FOLDER.split("/")[:-2]) + "/"
def process_multiple_years(coord: tuple,
                       step_x: int,
                       step_y: int,
                       path: str = INPUT_FOLDER) -> None:
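    '''Wrapper function that loads the 2017, 2018, and 2019 interim data for one
       large tile, selects the best images per subtile, interpolates temporal gaps,
       appends Sentinel 1, and saves analysis-ready subtiles for each year
       
       Parameters:
        coord (tuple): Coordinates of the area of interest (not used inside this function)
        step_x (int): X index of the large tile
        step_y (int): Y index of the large tile
        path (str): Input folder containing one subfolder per year

       Returns:
        None
    '''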

    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)
    
    d2017 = hkl.load(f"{path}/2017/interim/dates_{idx}.hkl")
    d2018 = hkl.load(f"{path}/2018/interim/dates_{idx}.hkl")
    d2019 = hkl.load(f"{path}/2019/interim/dates_{idx}.hkl")
    
    x2017 = hkl.load(f"{path}/2017/interim/{idx}.hkl").astype(np.float32)
    x2018 = hkl.load(f"{path}/2018/interim/{idx}.hkl").astype(np.float32)
    x2019 = hkl.load(f"{path}/2019/interim/{idx}.hkl").astype(np.float32)
  
    s1_all = np.empty((72, 646, 646, 2))
    s1_2017 = hkl.load(f"{path}/2017/raw/s1/{idx}.hkl")
    s1_all[:24] = s1_2017
    s1_2018 = hkl.load(f"{path}/2018/raw/s1/{idx}.hkl")
    s1_all[24:48] = s1_2018
    s1_2019 = hkl.load(f"{path}/2019/raw/s1/{idx}.hkl")
    s1_all[48:] = s1_2019
    

    index = 0
    tiles = tile_window(IMSIZE, IMSIZE, window_size = 142)
    for t in tiles:
        start_x, start_y = t[0], t[1]
        end_x = start_x + t[2]
        end_y = start_y + t[3]
        s2017 = x2017[:, start_x:end_x, start_y:end_y, :]
        s2018 = x2018[:, start_x:end_x, start_y:end_y, :]
        s2019 = x2019[:, start_x:end_x, start_y:end_y, :]
        s2017, _  = calculate_and_save_best_images(s2017, d2017)
        s2018, _ = calculate_and_save_best_images(s2018, d2018)
        s2019, _ = calculate_and_save_best_images(s2019, d2019)
        subtile = np.empty((72*3, 142, 142, 15))
        subtile[:72] = s2017
        subtile[72:144] = s2018
        subtile[144:] = s2019
        print(np.sum(np.isnan(subtile), axis = (1, 2, 3)))
        out_17 = f"{path}/2017/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        out_18 = f"{path}/2018/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        out_19 = f"{path}/2019/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        
        index += 1
        print(f"{index}: The output file is {out_17}")
        subtile = interpolate_array(subtile, dim = 142)
        subtile = np.concatenate([subtile, s1_all[:, start_x:end_x, start_y:end_y, :]], axis = -1)
        for folder in [out_17, out_18, out_19]:
            output_folder = "/".join(folder.split("/")[:-1])
            if not os.path.exists(os.path.realpath(output_folder)):
                os.makedirs(os.path.realpath(output_folder))
        subtile = to_int32(subtile)
        assert subtile.shape[1] == 142, f"subtile shape is {subtile.shape}"
        
        hkl.dump(subtile[:24], out_17, mode='w', compression='gzip')
        hkl.dump(subtile[24:48], out_18, mode='w', compression='gzip')
        hkl.dump(subtile[48:], out_19, mode='w', compression='gzip')
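
The function above cuts each large tile into 142-pixel subtiles with `tile_window`, which is defined earlier in the notebook. As an illustration of how such a split can cover the tile exactly, the sketch below generates overlapping windows. It is not the notebook's `tile_window`: the function name `sliding_windows`, the stride of 126, and the window ordering are assumptions. The image size of 646 is taken from the array shapes printed in the log, and 126 is chosen because 4 × 126 + 142 = 646.

```python
from typing import List, Tuple

def sliding_windows(size: int, window: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """Return (start_x, start_y, width, height) tuples for square windows
    that step across a size x size image with the given stride."""
    starts = range(0, size - window + 1, stride)
    return [(x, y, window, window) for x in starts for y in starts]

# Assumed values: 646 px image, 142 px window, 126 px stride
windows = sliding_windows(size=646, window=142, stride=126)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions each subtile has a 126-pixel core plus an 8-pixel border on every side, and neighbouring subtiles overlap by 16 pixels.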

In [18]:
max_x = 1
max_y = 1

for x_tile in range(1, 2):
    for y_tile in range(0, max_y):
        contains = True
        #contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        if contains:
            process_multiple_years(coords, x_tile, y_tile)
            print("\n")

KeyboardInterrupt: 

In [49]:
INPUT_FOLDER = "/".join(OUTPUT_FOLDER.split("/")[:-2]) + "/"
def process_single_year(coord: tuple,
                       step_x: int,
                       step_y: int,
                       year = 2019,
                       path: str = INPUT_FOLDER,
                       delete = False) -> None:
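    '''Wrapper function that loads one year of interim data for a large tile,
       selects the best images per subtile, smooths the time series, appends
       Sentinel 1, and saves analysis-ready subtiles to the processed folder
       
       Parameters:
        coord (tuple): Coordinates of the area of interest (not used inside this function)
        step_x (int): X index of the large tile
        step_y (int): Y index of the large tile
        year (int): Year of data to process
        path (str): Input folder containing one subfolder per year
        delete (bool): If True, remove the interim tile file after processing

       Returns:
        None
    '''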

    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)
    d2019 = hkl.load(f"{path}/{year}/interim/dates_{idx}.hkl")
    x2019 = hkl.load(f"{path}/{year}/interim/{idx}.hkl").astype(np.float32)
    s1_2019 = hkl.load(f"{path}/{year}/raw/s1/{idx}.hkl")
    

    index = 0
    tiles = tile_window(IMSIZE, IMSIZE, window_size = 142)
    for t in tiles:
        start_x, start_y = t[0], t[1]
        end_x = start_x + t[2]
        end_y = start_y + t[3]
        s2019 = x2019[:, start_x:end_x, start_y:end_y, :]
        subtile, _ = calculate_and_save_best_images(s2019, d2019)
        print(np.sum(np.isnan(subtile), axis = (1, 2, 3)))
        out_19 = f"{path}/{year}/processed/{y_vals[index]}/{x_vals[index]}.hkl"
        
        index += 1
        print(f"{index}: The output file is {out_19}")
        sm = Smoother(lmbd = 800, size = subtile.shape[0], nbands = 14, dim = subtile.shape[1])
        subtile = sm.interpolate_array(subtile)
        subtile = np.concatenate([subtile, s1_2019[:, start_x:end_x, start_y:end_y, :]], axis = -1)
       # for folder in [out_17, out_18, out_19]:
        output_folder = "/".join(out_19.split("/")[:-1])
        if not os.path.exists(os.path.realpath(output_folder)):
            os.makedirs(os.path.realpath(output_folder))
        subtile = np.float32(subtile)
        print(np.max(subtile))
        #subtile = to_int32(subtile)
        #print(np.max(subtile))
        assert subtile.shape[1] == 142, f"subtile shape is {subtile.shape}"
        
        hkl.dump(subtile, out_19, mode='w', compression='gzip')
    if delete:
        os.remove(f"{path}/{year}/interim/{idx}.hkl")

In [52]:
max_x = 5
max_y = 25

for x_tile in range(0, 1):
    for y_tile in range(19, 21):
        #contains = True
        contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        if contains:
            process_single_year(coords, x_tile, y_tile, year = 2019, delete = True)
            print("\n")

../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson
../project-monitoring//kenya/makueni/mtito-andei/2019/makueni.geojson


OSError: Unable to open file (unable to open file: name = '../project-monitoring//kenya/makueni/mtito-andei//2019/interim/20_0.hkl', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
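
The `OSError` above is raised by `hkl.load` when the interim file for a tile is missing. One possible cause (an assumption about this run, which the log does not confirm) is that an earlier call with `delete = True` had already removed the file. A minimal guard, sketched below with the hypothetical helpers `interim_paths` and `has_interim`, checks for the interim files before processing so that a missing tile is skipped instead of stopping the loop.

```python
import os
from typing import Tuple

def interim_paths(path: str, year: int, idx: str) -> Tuple[str, str]:
    """Build the interim data and dates paths that process_single_year reads."""
    base = os.path.join(path, str(year), "interim")
    return os.path.join(base, f"{idx}.hkl"), os.path.join(base, f"dates_{idx}.hkl")

def has_interim(path: str, year: int, idx: str) -> bool:
    """Return True only if both interim files exist for this tile."""
    return all(os.path.exists(p) for p in interim_paths(path, year, idx))

# Hypothetical usage inside the execution loop:
# idx = f"{y_tile}_{x_tile}"
# if has_interim(INPUT_FOLDER, 2019, idx):
#     process_single_year(coords, x_tile, y_tile, year = 2019, delete = True)
# else:
#     print(f"Skipping {idx}: interim files not found")
```

Because `delete = True` removes the interim tile after a successful run, a guard like this also makes the cell safe to re-run over tiles that were already processed.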