## Note: This notebook is deprecated, as it uses UTM instead of WGS84

If you are planning to download new Sentinel data, you need an API key for the data provider [Sentinel Hub](https://www.sentinel-hub.com). If you do not have an API key but have access to Sentinel imagery, the input data for this notebook is an entire year of:
  * Cloud masks
  * L1C bands 2, 8A, 11
  * 10- and 20m L2A bands
  * VV-VH Sentinel 1 bands
  * Digital elevation model
  
  
The data are tiled into 6300m x 6300m windows. An example of the raw data can be downloaded by running the following cell. The raw data can be preprocessed (cloud interpolation, super resolution, smoothing, etc.) by running the rest of the notebook. Predictions can then be generated by running `4b-predict-large-area`.

## Processing units per tile

There are an average of 21.3 dates per year across the training data. 

- Cloud probabilities: 0.2
- Shadows: 2.4
- S210: 45.3
- S220: 16.3
- S1: 12
- DEM: 3
- Total: 79
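
The per-layer figures above are roughly consistent with the formula printed by the download functions later in this notebook: image area relative to a 512 x 512 reference, scaled by the number of bands divided by 3 and by the number of dates. The sketch below reproduces that arithmetic for the 10 m bands. The helper name `estimate_processing_units` is hypothetical, and the result is an estimate, not Sentinel Hub's billed usage.

```python
def estimate_processing_units(width_px, height_px, n_bands, n_dates):
    """Approximate processing units with the formula used in this notebook's
    print statements (hypothetical helper, not a Sentinel Hub API)."""
    area_factor = (width_px * height_px) / (512 * 512)
    band_factor = n_bands / 3
    return area_factor * band_factor * n_dates

# 10 m L2A bands: 646 x 646 px, 4 bands, 21.3 dates on average
s2_10_pu = estimate_processing_units(646, 646, 4, 21.3)
print(round(s2_10_pu, 1))
```

The result lands close to the S210 entry in the list; the other entries depend on the actual image sizes and date counts returned for each layer.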



# 1.0 Package Imports

In [1]:
import pandas as pd
import numpy as np
from random import shuffle
from osgeo import ogr, osr
from sentinelhub import WmsRequest, WcsRequest, MimeType, CRS, BBox, constants
import logging
from collections import Counter
import datetime
import os
import yaml
from sentinelhub import DataSource
import scipy.sparse as sparse
from scipy.sparse.linalg import splu
from skimage.transform import resize
from sentinelhub import CustomUrlParam
from time import time as timer
import multiprocessing
import math
import reverse_geocoder as rg
import pycountry
import pycountry_convert as pc
import hickle as hkl
from shapely.geometry import Point, Polygon
import geopandas
from tqdm import tnrange, tqdm_notebook
import boto3
from pyproj import Proj, transform
from timeit import default_timer as timer
from typing import Tuple, List
import warnings
from scipy.ndimage import median_filter

In [2]:
if os.path.exists("../config.yaml"):
    with open("../config.yaml", 'r') as stream:
        key = (yaml.safe_load(stream))
        API_KEY = key['key']
        AWSKEY = key['awskey']
        AWSSECRET = key['awssecret']
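else:
    # No config file: use placeholders so later references do not raise NameError
    API_KEY = "none"
    AWSKEY = "none"
    AWSSECRET = "none"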

In [3]:
%run ../src/preprocessing/slope.py
%run ../src/preprocessing/indices.py
%run ../src/downloading/utils.py
%run ../src/preprocessing/cloud_removal.py
%run ../src/preprocessing/whittaker_smoother.py
%run ../src/io/upload.py

# 1.1 Constants and Parameters

Years can vary from 2017 to 2020. The value of `landscape` selects the coordinates and output path listed in `../project-monitoring/database.csv`.
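
The next cell derives the Sentinel 2 download window from `year`: for years after 2017 the window starts on 15 November of the previous year, otherwise on 1 January, and in both cases it ends on 15 February of the following year. A minimal sketch of that rule as a function (the name `sentinel2_date_window` is hypothetical):

```python
def sentinel2_date_window(year):
    """Return (start, end) date strings for the Sentinel 2 request window."""
    start = f"{year - 1}-11-15" if year > 2017 else f"{year}-01-01"
    end = f"{year + 1}-02-15"
    return start, end

window_2019 = sentinel2_date_window(2019)
window_2017 = sentinel2_date_window(2017)
```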

In [4]:
year = 2019
landscape = 'cameroon-mogazang'

if year > 2017:
    dates = (f'{str(year - 1)}-11-15' , f'{str(year + 1)}-02-15')
else: 
    dates = (f'{str(year)}-01-01' , f'{str(year + 1)}-02-15')
    
dates_sentinel_1 = (f'{str(year)}-01-01' , f'{str(year)}-12-31')
SIZE = 9*5
IMSIZE = (7*2) + (SIZE * 14)+2 # 646 px: 630 px (6300 m at 10 m) plus a 16 px border

days_per_month = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
starting_days = np.cumsum(days_per_month)

In [5]:
database = pd.read_csv("../project-monitoring/database.csv")
coords = database[database['landscape'] == landscape]
path = coords['path'].tolist()[0]
coords = (float(coords['longitude']), float(coords['latitude']))

IO_PARAMS = {'prefix': '../',
             'bucket': 'restoration-monitoring',
             'coords': coords,
             'bucket-prefix': '',
             'path': path}

OUTPUT_FOLDER = IO_PARAMS['prefix'] + IO_PARAMS['path'] + str(year) + '/'
print(coords, OUTPUT_FOLDER)

(14.210755, 10.633414) ../project-monitoring/cameroon/far-north/maroua/2019/


In [6]:
append = False
if append:
    to_append = pd.DataFrame({'landscape': ['rwanda-country'],
                                 'latitude': ['-2.91'],
                                 'longitude': ['29.02'],
                                 'path': [get_folder_prefix((-2.91, 29.02),
                                          params = {'bucket-prefix': 'project-monitoring'})]})
    database = pd.concat([database, to_append])  # DataFrame.append was removed in pandas 2.0
    database.to_csv("../project-monitoring/database.csv", index = False)

In [7]:
from functools import wraps
from time import time

def timing(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()
        result = f(*args, **kw)
        te = time()
        print(f'{f.__name__}, {np.around(te-ts, 2)}')
        return result
    return wrap

# 2.1 Data download functions

If using Sentinel Hub, configure the following layers:
  * CLOUD: return [CLP / 255]
  * SHADOW: return [B02, B8A, B11]
  * DEM: return [DEM]
  * SENT: return [VV, VH]
  * L2A10: return [B02,B03,B04, B08]
  * L2A20: return [B05,B06,B07, B8A,B11,B12]
  
The following code block contains:
  * `extract_dates` - return a list of calendar dates of imagery
  * `to_int16` - convert a floating point array to uint16
  * `to_float32` - convert a uint16 array to float32
  * `make_folder_names` - identify the folder and file names for each 5x5 subtile
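
`extract_dates` expresses each acquisition as a day count relative to 1 January of the target year, so dates from the previous year are negative and dates from the following year exceed 365. A minimal standalone sketch of the same idea using `datetime` follows. The function name `relative_day` is hypothetical, and unlike the notebook's fixed 365-day table this version accounts for leap years, so the two can differ by a day.

```python
from datetime import date

def relative_day(d, year):
    """Day count relative to the target year, with 1 January == 1."""
    return (d - date(year, 1, 1)).days + 1

first_day = relative_day(date(2019, 1, 1), 2019)
previous_year = relative_day(date(2018, 11, 15), 2019)   # negative
following_year = relative_day(date(2020, 2, 15), 2019)   # greater than 365
```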

In [8]:
def extract_dates(date_dict: dict, year: int) -> List:
    """ Transforms a SentinelHub date dictionary to a
         list of integer calendar dates
    """
    dates = []
    days_per_month = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
    starting_days = np.cumsum(days_per_month)
    for date in date_dict:
        if date.year == year - 1:
            dates.append(-365 + starting_days[(date.month-1)] + date.day)
        if date.year == year:
            dates.append(starting_days[(date.month-1)] + date.day)
        if date.year == year + 1:
            dates.append(365 + starting_days[(date.month-1)]+date.day)
    return dates

def to_int16(array: np.array) -> np.array:
    '''Converts a float32 array in [0, 1] to uint16, halving storage costs'''
    assert np.min(array) >= 0, np.min(array)
    assert np.max(array) <= 1, np.max(array)
    
    array = np.clip(array, 0, 1)
    array = np.trunc(array * 65535)
    assert np.min(array) >= 0
    assert np.max(array) <= 65535
    
    return array.astype(np.uint16)

@timing
def to_float32(array: np.array) -> np.array:
    """Converts an int_x array to float32"""
    print(f'The original max value is {np.max(array)}')
    if not isinstance(array.flat[0], np.floating):
        assert np.max(array) > 1
        array = np.float32(array) / 65535.
    assert np.max(array) <= 1
    assert array.dtype == np.float32
    return array

def make_folder_names(step_x: int, step_y: int) -> (list, list):
    '''Given an input tile location (step_x, step_y), identify the folder and file
       names for each 5x5 subtile
       
       Parameters:
         step_x (int):
         step_y (int):

        Returns:
         x_vals (list)
         y_vals (list)
    '''
    x_vals = []
    y_vals = []
    for i in range(25):
        y_val = (24 - i) // 5
        x_val = 5 - ((25 - i) % 5)
        x_val = 0 if x_val == 5 else x_val
        x_vals.append(x_val)
        y_vals.append(y_val)
    y_vals = [i + (5*step_y) for i in y_vals]
    x_vals = [i + (5*step_x) for i in x_vals]
    return x_vals, y_vals
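
The storage conversion in `to_int16` and `to_float32` is a quantization round trip: values in [0, 1] are scaled to the uint16 range, truncated, and later divided by 65535 again. The sketch below illustrates the size saving and the rounding error on a few sample values. The function names are hypothetical stand-ins, not the notebook's own helpers.

```python
import numpy as np

def quantize_uint16(array):
    """Scale values in [0, 1] to 0..65535 and store them as uint16."""
    scaled = np.trunc(np.clip(array, 0, 1) * 65535)
    return scaled.astype(np.uint16)

def dequantize_float32(array):
    """Map uint16 values back to float32 in [0, 1]."""
    return array.astype(np.float32) / np.float32(65535)

values = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
stored = quantize_uint16(values)
restored = dequantize_float32(stored)

# Truncation loses at most one quantization step per value
max_error = float(np.max(np.abs(values - restored)))
```

Because truncation is used instead of rounding, restored values are never larger than the originals.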

### Cloud and cloud shadow

Identify clouds using s2cloudless and cloud shadows using the multitemporal method of Candra et al. (2020).
Returns cloud and shadow masks and a list of imagery dates that have <15% cloud/shadow cover.
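
In the cell below, a pixel counts as cloudy when its cloud probability exceeds 0.33, and a date is dropped when more than 15% of its pixels are cloudy. A minimal sketch of that filter on synthetic data (the function name and default arguments are illustrative assumptions):

```python
import numpy as np

def clean_date_indices(cloud_probs, prob_thresh=0.33, max_cover=0.15):
    """Indices of time steps whose cloudy-pixel fraction is at most max_cover.

    cloud_probs: (time, height, width) array of probabilities in [0, 1].
    """
    cloudy_fraction = np.mean(cloud_probs > prob_thresh, axis=(1, 2))
    return np.flatnonzero(cloudy_fraction <= max_cover)

probs = np.zeros((3, 10, 10), dtype=np.float32)
probs[1, :5, :] = 0.9   # half of the second image is cloudy
probs[2, :1, :] = 0.9   # a tenth of the third image is cloudy
kept = clean_date_indices(probs)
```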

In [9]:
@timing
def identify_clouds(bbox: List[Tuple[float, float]],
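                epsg: 'CRS', dates: Tuple[str, str] = dates) -> (np.ndarray, np.ndarray, list, np.ndarray):
    """ Downloads and calculates cloud cover and shadow
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (CRS): coordinate reference system associated with bbox 
         dates (tuple): YYYY-MM-DD - YYYY-MM-DD bounds for downloading 
    
        Returns:
         cloud_img (np.array): cloud probabilities in [0, 1] for the retained dates
         shadows (np.array): shadow masks calculated by mcm_shadow_mask
         clean_steps (list): indices of the dates that pass the cloud cover threshold
         cloud_dates (np.array): calendar dates of the retained images
    """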
    # Download 160 x 160 meter cloud masks, 0 - 255
    box = BBox(bbox, crs = epsg)
    cloud_request = WcsRequest(
        layer='CLOUD_NEW',
        bbox=box, time=dates,
        resx='160m',resy='160m',
        image_format = MimeType.TIFF_d8,
        maxcc=0.75, instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72),
    )

    # Download 60 x 60 meter bands for shadow masking, 0 - 65535
    shadow_request = WcsRequest(
        layer='SHADOW',
        bbox=box, time=dates,
        resx='60m', resy='60m',
        image_format = MimeType.TIFF_d16,
        maxcc=0.75, instance_id=API_KEY,
        custom_url_params = {constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
        time_difference=datetime.timedelta(hours=72))
    
    # Convert to np.array, upscale to IMSIZE
    cloud_img = np.array(cloud_request.get_data())
    cloud_img = resize(cloud_img, (cloud_img.shape[0], IMSIZE, IMSIZE), order = 0,
                       anti_aliasing = False,
                       preserve_range = True).astype(np.uint8)
    
    # Identify steps where more than 15% of pixels exceed a cloud probability of 0.33
    n_cloud_px = np.sum(cloud_img > int(0.33 * 255), axis = (1, 2))
    cloud_steps = np.argwhere(n_cloud_px > (IMSIZE**2 * 0.15))
    clean_steps = [x for x in range(cloud_img.shape[0]) if x not in cloud_steps]
    cloud_img = np.delete(cloud_img, cloud_steps, 0)
    
    # Align cloud and shadow imagery dates
    cloud_dates_dict = [x for x in cloud_request.get_dates()]
    cloud_dates = extract_dates(cloud_dates_dict, year)
    cloud_dates = [val for idx, val in enumerate(cloud_dates) if idx in clean_steps]
    shadow_dates_dict = [x for x in shadow_request.get_dates()]
    shadow_dates = extract_dates(shadow_dates_dict, year)
    shadow_steps = [idx for idx, val in enumerate(shadow_dates) if val in cloud_dates]
    
    # Convert to np.array, upscale shadow to IMSIZE
    shadow_img = np.array(shadow_request.get_data(data_filter = shadow_steps))
    shadow_pus = (shadow_img.shape[1]*shadow_img.shape[2])/(512*512) * shadow_img.shape[0] * (6 / 3)
    shadow_img = shadow_img.repeat(6,axis=1).repeat(6,axis=2)
    shadow_img = shadow_img[:, 1:-1, 1:-1, :]
    
    # Type assertions, size assertions
    if not isinstance(cloud_img.flat[0], np.floating):
        assert np.max(cloud_img) > 1
        cloud_img = np.float32(cloud_img) / 255.
    assert np.max(cloud_img) <= 1
    assert cloud_img.dtype == np.float32
    assert shadow_img.dtype == np.uint16
    assert shadow_img.shape[0] == cloud_img.shape[0], (shadow_img.shape, cloud_img.shape)
    
    # Calculate shadow+cloud masks with multitemporal images (Candra et al. 2020)
    print(f"Shadows ({shadow_img.shape}) used {round(shadow_pus, 1)} processing units")
    shadows = mcm_shadow_mask(shadow_img, cloud_img)
    
    return cloud_img, shadows, clean_steps, np.array(cloud_dates)

### Digital elevation model, slope

In [10]:
@timing
def download_dem(bbox: List[Tuple[float, float]], epsg: 'CRS') -> np.ndarray:
    """ Downloads the DEM layer from Sentinel hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (float): EPSG associated with bbox 
    
        Returns:
         dem_image (arr):
    """
    # Download imagery
    box = BBox(bbox, crs = epsg)
    dem_size = 650
    dem_request = WmsRequest(data_source=DataSource.DEM,
                         layer='DEM', bbox=box,
                         width=dem_size, height=dem_size,
                         instance_id=API_KEY,
                         image_format=MimeType.TIFF_d32f,
                         custom_url_params={CustomUrlParam.SHOWLOGO: False})
    dem_image = dem_request.get_data()[0]
    
    # Apply median filter, calculate slope
    dem_image = median_filter(dem_image, size = 5)
    dem_image = calcSlope(dem_image.reshape((1, dem_size, dem_size)),
                          np.full((dem_size, dem_size), 10), 
                          np.full((dem_size, dem_size), 10), zScale = 1, minSlope = 0.02)
    dem_image = dem_image.reshape((dem_size,dem_size, 1))
    dem_image = dem_image[1:dem_size-1, 1:dem_size-1, :]
    print(f"DEM used {round(((IMSIZE*IMSIZE)/(512*512))*2, 1)} processing units")
    return dem_image
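
`calcSlope` is defined in `../src/preprocessing/slope.py` and is not shown in this notebook. As a rough illustration of what a slope calculation on a 10 m grid involves, the sketch below uses generic finite differences from `np.gradient`. It is an assumption-laden stand-in, not the notebook's implementation, and it ignores the `zScale` and `minSlope` arguments.

```python
import numpy as np

def slope_rise_over_run(dem, cell_size=10.0):
    """Slope magnitude (rise over run) of a 2D elevation grid.

    Generic finite-difference sketch, not the notebook's calcSlope.
    """
    d_row, d_col = np.gradient(dem.astype(np.float64), cell_size)
    return np.hypot(d_row, d_col)

# A ramp that rises 1 m per 10 m cell along the columns
ramp = np.tile(np.arange(5.0), (5, 1))
slope = slope_rise_over_run(ramp)
```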

###  Sentinel 2 L2A, 10 and 20 meter bands

In [11]:
@timing
def download_layer(bbox: List[Tuple[float, float]],
                   clean_steps: np.ndarray, epsg: 'CRS',
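                   dates: Tuple[str, str] = dates, year: int = year) -> (np.ndarray, np.ndarray, np.ndarray):
    """ Downloads the L2A sentinel layer with 10 and 20 meter bands
        
        Parameters:
         bbox (list): output of calc_bbox
         clean_steps (list): calendar dates of clean images, used to filter the download request
         epsg (CRS): coordinate reference system associated with bbox 
         dates (tuple): YYYY-MM-DD - YYYY-MM-DD bounds for downloading 
         year (int): target year used to convert image dates to calendar days
    
        Returns:
         img_10 (arr): 10 meter bands, clipped to [0, 1]
         img_20 (arr): 20 meter bands, clipped to [0, 1]
         dates_to_download (arr): calendar dates of the downloaded images
    """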
    
    # Download 20 meter bands
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer='L2A20',
            bbox=box, time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.75, resx='20m', resy='20m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=72),
        )
    image_dates_dict = [x for x in image_request.get_dates()]
    image_dates = extract_dates(image_dates_dict, year)
    steps_to_download = [i for i, val in enumerate(image_dates) if val in clean_steps]
    dates_to_download = [val for i, val in enumerate(image_dates) if val in clean_steps]
    img_20 = np.array(image_request.get_data(data_filter = steps_to_download))
    s2_20_usage = (img_20.shape[1]*img_20.shape[2])/(512*512) * (6/3) * img_20.shape[0]
    
    # Convert 20m bands to np.float32, ensure correct dimensions
    if not isinstance(img_20.flat[0], np.floating):
        print(f"Converting S2, 20m to float32, with {np.max(img_20)} max and"
              f" {len(np.unique(img_20))} unique values")
        assert np.max(img_20) > 1
        img_20 = np.float32(img_20) / 65535.
        assert np.max(img_20) <= 1
        assert img_20.dtype == np.float32
    
    print(f"Original 20 meter bands size: {img_20.shape}, using {round(s2_20_usage, 1)} PU")
    if img_20.shape[1]*img_20.shape[2] != 323*323:
        print(f"Reshaping: {img_20.shape}")
        img_20 = resize(img_20, (img_20.shape[0], 323, 323, img_20.shape[-1]), order = 0)

    # Download 10 meter bands
    image_request = WcsRequest(
            layer='L2A10',
            bbox=box, time=dates,
            image_format = MimeType.TIFF_d16,
            maxcc=0.75, resx='10m', resy='10m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'BICUBIC',
                                constants.CustomUrlParam.UPSAMPLING: 'BICUBIC'},
            time_difference=datetime.timedelta(hours=72),
    )
    img_10 = np.array(image_request.get_data(data_filter = steps_to_download))
    s2_10_usage = (img_10.shape[1]*img_10.shape[2])/(512*512) * (4/3) * img_10.shape[0]
    
    # Convert 10 meter bands to np.float32, ensure correct dimensions
    if not isinstance(img_10.flat[0], np.floating):
        print(f"Converting S2, 10m to float32, with {np.max(img_10)} max and"
                  f" {s2_10_usage} PU")
        assert np.max(img_10) > 1
        img_10 = np.float32(img_10) / 65535.
        assert np.max(img_10) <= 1
        assert img_10.dtype == np.float32

    if img_10.shape[2]*img_10.shape[1] != IMSIZE*IMSIZE:
        print(f"Reshaping: {img_10.shape}")
        img_10 = resize(img_10, (img_10.shape[0], IMSIZE, IMSIZE, img_10.shape[-1]), order = 0)
    
    # Ensure output is within correct range
    img_10 = np.clip(img_10, 0, 1)
    img_20 = np.clip(img_20, 0, 1)
    return img_10, img_20, np.array(dates_to_download)

### Sentinel 1 IW bands

In [12]:
def identify_dates_to_download(dates):
    dates = np.array(dates)
    dates_to_download = []
    for i in starting_days:
        s1_month = dates[dates > i]
        s1_month = s1_month[s1_month < (i + 30)]
        if len(s1_month) > 0:
            dates_to_download.append(s1_month[0])
    return dates_to_download

@timing
def download_sentinel_1(bbox: List[Tuple[float, float]],
                        epsg: 'CRS', imsize: int = IMSIZE, 
                        dates: dict = dates_sentinel_1, layer: str = "SENT",
                        year: int = year) -> (np.ndarray, np.ndarray):
    """ Downloads the GRD Sentinel 1 VV-VH layer from Sentinel Hub
        
        Parameters:
         bbox (list): output of calc_bbox
         epsg (float): EPSG associated with bbox 
         imsize (int):
         dates (tuple): YY-MM-DD - YY-MM-DD bounds for downloading 
         layer (str):
         year (int): 
    
        Returns:
         s1 (arr):
         image_dates (arr): 
    """
    # Identify the S1 orbit, imagery dates
    source = DataSource.SENTINEL1_IW_DES if layer == "SENT_DESC" else DataSource.SENTINEL1_IW_ASC
    box = BBox(bbox, crs = epsg)
    image_request = WcsRequest(
            layer=layer, bbox=box,
            time=dates,
            image_format = MimeType.TIFF_d16,
            data_source=source, maxcc=1.0,
            resx='10m', resy='10m',
            instance_id=API_KEY,
            custom_url_params = {constants.CustomUrlParam.DOWNSAMPLING: 'NEAREST',
                                constants.CustomUrlParam.UPSAMPLING: 'NEAREST'},
            time_difference=datetime.timedelta(hours=72),
        )
    
    s1_dates_dict = [x for x in image_request.get_dates()]
    s1_dates = extract_dates(s1_dates_dict, year)
    dates_to_download = identify_dates_to_download(s1_dates)
    steps_to_download = [i for i, val in enumerate(s1_dates) if val in dates_to_download]
    print(f"The following dates will be downloaded: {dates_to_download}")
    data_filter = steps_to_download   
    
    # If the correct orbit is selected, download imagery
    if len(image_request.download_list) > 0:
        s1 = np.array(image_request.get_data(data_filter = data_filter))
        print(f'The original s1 max value is {np.max(s1)}')
        if not isinstance(s1.flat[0], np.floating):
            assert np.max(s1) > 1
            s1 = np.float32(s1) / 65535.
        assert np.max(s1) <= 1

        s1_usage = (2/3) * s1.shape[0] * ((s1.shape[1]*s1.shape[2]) / (512*512))
        print(f"Sentinel 1 used {round(s1_usage, 1)} PU for "
              f" {s1.shape[0]} out of {len(image_request.download_list)} images")

        image_dates_dict = [x for x in image_request.get_dates()]
        image_dates = extract_dates(image_dates_dict, year)
        image_dates = [val for idx, val in enumerate(image_dates) if idx in data_filter]
        image_dates = np.array(image_dates)

        s1c = np.copy(s1)
        s1c[np.where(s1c < 1.)] = 0
        s1c[np.where(s1c >= 1.)] = 1.
        n_pix_oob = np.sum(s1c, axis = (1, 2, 3))
        to_remove = np.argwhere(n_pix_oob > (imsize*2*imsize*2)/10)
        s1 = np.delete(s1, to_remove, 0)
        image_dates = np.delete(image_dates, to_remove)
        
        s1_med = np.median(s1, axis = 0)
        s1_med = np.tile(s1_med[np.newaxis, ...], (s1.shape[0], 1, 1, 1,))
        s1[s1 == 1] = s1_med[s1 == 1]        
        s1 = np.clip(s1, 0, 1)
        return s1, image_dates
    else: 
        return np.empty((0,)), np.empty((0,))


def identify_s1_layer(coords: Tuple[float, float]) -> str:
    """ Identifies whether to download ascending or descending 
        sentinel 1 orbit based upon predetermined geographic coverage
        
        Reference: https://sentinel.esa.int/web/sentinel/missions/
                   sentinel-1/satellite-description/geographical-coverage
        
        Parameters:
         coords (tuple): (latitude, longitude)
    
        Returns:
         layer (str): either of SENT, SENT_DESC 
    """
    results = rg.search(coords)
    country = results[-1]['cc']
    continent_name = pc.country_alpha2_to_continent_code(country)
    if continent_name in ['AF', 'OC']:
        layer = "SENT"
    if continent_name in ['SA']:
        if coords[0] > -7.11:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['AS']:
        if coords[0] > 23.3:
            layer = "SENT"
        else:
            layer = "SENT_DESC"
    if continent_name in ['NA']:
        layer = "SENT_DESC"
    print(f"The continent is: {continent_name}, and the sentinel 1 orbit is {layer}")
    return layer
 
    
def process_sentinel_1_tile(sentinel1: np.ndarray, dates: np.ndarray) -> np.ndarray:
    """Converts a (?, X, Y, 2) Sentinel 1 array to (12, X, Y, 2)

        Parameters:
         sentinel1 (np.array):
         dates (np.array):

        Returns:
         s1 (np.array)
    """
    s1, _ = calculate_and_save_best_images(sentinel1, dates)
    monthly = np.empty((12, sentinel1.shape[1], sentinel1.shape[2], 2))
    index = 0
    for start, end in zip(range(0, 72 + 6, 72 // 12), #0, 72, 6
                          range(72 // 12, 72 + 6, 72 // 12)): # 6, 72, 6
        monthly[index] = np.median(s1[start:end], axis = 0)
        index += 1
        
    return monthly
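
The loop in `process_sentinel_1_tile` takes the median of consecutive blocks of 6 time steps, which turns 72 steps into 12 monthly composites. Assuming the interpolated series has exactly 72 evenly spaced steps, as the loop bounds suggest, the same reduction can be sketched with a reshape. The function name `monthly_medians` is hypothetical.

```python
import numpy as np

def monthly_medians(series, steps_per_month=6):
    """Median of consecutive blocks of time steps along axis 0."""
    n_months = series.shape[0] // steps_per_month
    trimmed = series[: n_months * steps_per_month]
    grouped = trimmed.reshape((n_months, steps_per_month) + series.shape[1:])
    return np.median(grouped, axis=1)

# Synthetic (72, 2, 2, 2) series whose value equals the time step index
steps = np.arange(72, dtype=np.float32).reshape(72, 1, 1, 1)
series = steps * np.ones((1, 2, 2, 2), dtype=np.float32)
monthly = monthly_medians(series)
```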
    

# 2.3 Superresolution

In [13]:
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)

MDL_PATH = "../models/supres/"

model = tf.train.import_meta_graph(MDL_PATH + 'model.meta')
model.restore(sess, tf.train.latest_checkpoint(MDL_PATH))

logits = tf.get_default_graph().get_tensor_by_name("Add_6:0")
inp = tf.get_default_graph().get_tensor_by_name("Placeholder:0")
inp_bilinear = tf.get_default_graph().get_tensor_by_name("Placeholder_1:0")

def superresolve(input_data, bilinear_upsample):
    """ Worker function to run predictions on input data
    """
    x = sess.run([logits], 
                 feed_dict={inp: input_data,
                            inp_bilinear: bilinear_upsample})
    return x[0]

@timing
def superresolve_tile(arr: np.ndarray) -> np.ndarray:
    """Superresolves each 60x60 subtile in a 646x646 input tile
       by padding the subtiles to 64x64 and removing the pad after prediction,
       eliminating boundary artifacts

        Parameters:
         arr (arr): (?, 646, 646, 10) array

        Returns:
         superresolved (arr): (?, 646, 646, 10) array
    """
    print(f"The input array to superresolve is {arr.shape}")
    tiles = tile_window(646, 646, 60, 60)
    for i in tnrange(len(tiles)):
        subtile = tiles[i]
        pad_l = 0 if subtile[0] >= 2 else 2
        pad_r = 0 if subtile[0] < (644 - 60) else 2
        pad_u = 0 if subtile[1] >= 2 else 2
        pad_d = 0 if subtile[1] < (644 - 60) else 2
        to_resolve = arr[:, np.max([subtile[0]-2, 0]):subtile[0]+62,
                            np.max([subtile[1]-2, 0]):subtile[1]+62, :]
        to_resolve = np.pad(to_resolve, ((0, 0), (pad_l, pad_r), (pad_u, pad_d), (0, 0)), 'reflect')
        
        bilinear = to_resolve[..., 4:]
        
        resolved = superresolve(
            to_resolve, bilinear)
        resolved = resolved[:, 2:-2, 2:-2, :]
        arr[:, subtile[0]:subtile[0]+60, subtile[1]:subtile[1]+60, 4:] = resolved
    return arr

Using TensorFlow backend.
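
The tiling in `superresolve_tile` follows a pad, predict, crop pattern: each 60 x 60 subtile is read with a 2 px halo, reflect-padded where the halo falls outside the image, passed to the model as a 64 x 64 window, and cropped back to 60 x 60. The sketch below shows the pattern with an identity function in place of the model. The helper `predict_with_halo` and its arguments are hypothetical.

```python
import numpy as np

def predict_with_halo(arr, row, col, size=60, halo=2, predict=lambda x: x):
    """Run predict on a haloed window and return the central size x size block.

    arr: (time, height, width, bands) array.
    """
    height, width = arr.shape[1:3]
    r0, r1 = max(row - halo, 0), min(row + size + halo, height)
    c0, c1 = max(col - halo, 0), min(col + size + halo, width)
    window = arr[:, r0:r1, c0:c1, :]
    # Reflect-pad only the sides where the halo was cut off by the image edge
    pad_rows = (halo - (row - r0), halo - (r1 - (row + size)))
    pad_cols = (halo - (col - c0), halo - (c1 - (col + size)))
    window = np.pad(window, ((0, 0), pad_rows, pad_cols, (0, 0)), mode="reflect")
    out = predict(window)
    return out[:, halo:-halo, halo:-halo, :]

image = np.arange(120 * 120, dtype=np.float32).reshape(1, 120, 120, 1)
corner = predict_with_halo(image, 0, 0)
interior_edge = predict_with_halo(image, 60, 60)
```

With the identity function the cropped output equals the original subtile, which is a quick way to check that the padding and cropping offsets line up.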


# 2.4 Tiling and folder management functions

In [14]:
# move to src/utils/pathing.py
def make_output_and_temp_folders(idx: str, output_folder: str = OUTPUT_FOLDER) -> None:
    """Makes necessary folder structures for IO of raw and processed data

        Parameters:
         idx (str)
         output_folder (path)

        Returns:
         None
    """
    def _find_and_make_dirs(dirs):
        if not os.path.exists(os.path.realpath(dirs)):
            os.makedirs(os.path.realpath(dirs))
            
    folders = ['raw/', 'raw/clouds/', 'raw/s1/', 'raw/s2_10/', 'raw/s2_20/',
              'raw/misc/', 'processed/', 'interim']
    
    for folder in folders:
        _find_and_make_dirs(output_folder + folder)


def id_missing_px(sentinel2: np.ndarray, thresh: int = 11) -> np.ndarray:
    """Identifies missing (na) values in input array
    """
    missing_images_0 = np.sum(sentinel2[..., :10] == 0.0, axis = (1, 2, 3))
    missing_images_p = np.sum(sentinel2[..., :10] >= 1., axis = (1, 2, 3))
    missing_images = missing_images_0 + missing_images_p
    
    missing_images = np.argwhere(missing_images >= (sentinel2.shape[1]**2) / thresh)
    missing_images = missing_images.flatten()
    if len(missing_images) > 0:
        print(f"The missing image bands (0) are: {missing_images_0}")
        print(f"The missing image bands (1.0) are: {missing_images_p}")
    return missing_images
 

# Download worker function

In [15]:
def download_large_tile(coord: tuple,
                        step_x: int,
                        step_y: int,
                        folder: str = OUTPUT_FOLDER, 
                        year: int = year,
                        s1_layer: str = "SENT") -> None:
    """Wrapper function to download cloud probs, Sentinel 2, Sentinel 1, and DEM

        Parameters:
         coord (tuple):
         step_x (int):
         step_y (int):
         folder (path):
         year (int):
         s1_layer (str):

        Returns:
         None
    """
    bbx, epsg = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 80)
    dem_bbx, _ = calculate_bbx_pyproj(coord, step_x, step_y, expansion = 90)
    idx = str(str(step_y) + "_" + str(step_x))
    make_output_and_temp_folders(idx)
    
    output_path = f"{folder}output/{str(step_y*5)}/{str(step_x*5)}.npy"
    process_path = f"{folder}processed/{str(step_y*5)}/{str(step_x*5)}.npy"
    processed = (os.path.exists(output_path) or os.path.exists(process_path))
                 
    clouds_file = f'{folder}raw/clouds/clouds_{idx}.hkl'
    shadows_file = f'{folder}raw/clouds/shadows_{idx}.hkl'
    s1_file = f'{folder}raw/s1/{idx}.hkl'
    s1_dates_file = f'{folder}raw/misc/s1_dates_{idx}.hkl'
    s2_10_file = f'{folder}raw/s2_10/{idx}.hkl'
    s2_20_file = f'{folder}raw/s2_20/{idx}.hkl'
    s2_dates_file = f'{folder}raw/misc/s2_dates_{idx}.hkl'
    s2_file = f'{folder}raw/s2/{idx}.hkl'
    clean_steps_file = f'{folder}raw/clouds/clean_steps_{idx}.hkl'

    if not (os.path.exists(clouds_file) or processed):
        print(f"Downloading {clouds_file}")
        cloud_probs, shadows, _, image_dates = identify_clouds(bbx, epsg = epsg)
        to_remove, _ = calculate_cloud_steps(cloud_probs, image_dates)

        if len(to_remove) > 0:
            clean_dates = np.delete(image_dates, to_remove)
            cloud_probs = np.delete(cloud_probs, to_remove, 0)
            shadows = np.delete(shadows, to_remove, 0)
        else:
            clean_dates = image_dates
            
        to_remove = subset_contiguous_sunny_dates(clean_dates)
        if len(to_remove) > 0:
            clean_dates = np.delete(clean_dates, to_remove)
            cloud_probs = np.delete(cloud_probs, to_remove, 0)
            shadows = np.delete(shadows, to_remove, 0)
        
        hkl.dump(cloud_probs, clouds_file, mode='w', compression='gzip')
        hkl.dump(shadows, shadows_file, mode='w', compression='gzip')
        hkl.dump(clean_dates, clean_steps_file, mode='w', compression='gzip')

    if not (os.path.exists(s1_file) or processed):
        print(f"Downloading {s1_file}")
        s1_layer = identify_s1_layer((coord[1], coord[0]))
        s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
        if s1.shape[0] == 0:
            s1_layer = "SENT_DESC" if s1_layer == "SENT" else "SENT"
            print(f'Switching to {s1_layer}')
            s1, s1_dates = download_sentinel_1(bbx, layer = s1_layer, epsg = epsg)
        s1 = process_sentinel_1_tile(s1, s1_dates)
        hkl.dump(to_int16(s1), s1_file, mode='w', compression='gzip')
        hkl.dump(s1_dates, s1_dates_file, mode='w', compression='gzip')

    if not (os.path.exists(s2_10_file) or processed):
        print(f"Downloading {s2_10_file}")
        clean_steps = list(hkl.load(clean_steps_file))
        cloud_probs = hkl.load(clouds_file)
        shadows = hkl.load(shadows_file)    
        s2_10, s2_20, s2_dates = download_layer(bbx, clean_steps = clean_steps, epsg = epsg)

        # Steps to ensure that L2A, L1C derived products have exact matching dates
        print(f"Shadows {shadows.shape}, clouds {cloud_probs.shape},"
              f" S2, {s2_10.shape}, S2d, {s2_dates.shape}")
        to_remove_clouds = [i for i, val in enumerate(clean_steps) if val not in s2_dates]
        to_remove_dates = [val for i, val in enumerate(clean_steps) if val not in s2_dates]
        if len(to_remove_clouds) >= 1:
            print(f"Removing {to_remove_dates} from clouds because not in S2")
            cloud_probs = np.delete(cloud_probs, to_remove_clouds, 0)
            shadows = np.delete(shadows, to_remove_clouds, 0)
            print(f"Shadows {shadows.shape}, clouds {cloud_probs.shape}"
                  f" S2, {s2_10.shape}, S2d, {s2_dates.shape}")
            hkl.dump(cloud_probs, clouds_file, mode='w', compression='gzip')
            hkl.dump(shadows, shadows_file, mode='w', compression='gzip')

        assert cloud_probs.shape[0] == s2_10.shape[0], "There is a date mismatch"
        hkl.dump(to_int16(s2_10), s2_10_file, mode='w', compression='gzip')
        hkl.dump(to_int16(s2_20), s2_20_file, mode='w', compression='gzip')
        hkl.dump(s2_dates, s2_dates_file, mode='w', compression='gzip')

    if not (os.path.exists(folder + "raw/misc/dem_{}.hkl".format(idx)) or processed):
        dem = download_dem(dem_bbx, epsg = epsg)
        hkl.dump(dem, folder + "raw/misc/dem_{}.hkl".format(idx), mode='w', compression='gzip')

In [16]:
def process_large_tile(coord: tuple,
                       step_x: int,
                       step_y: int,
                       folder: str = OUTPUT_FOLDER,
                       model: 'model' = model) -> None:
    '''Wrapper function to interpolate clouds and temporal gaps, superresolve tiles,
       calculate relevant indices, and save analysis-ready data to the output folder
       
       Parameters:
        coord (tuple)
        step_x (int):
        step_y (int):
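        folder (str):

       Returns:
        x, image_dates, interp: the processed tile, its image dates, and the second
        output of remove_cloud_and_shadows; (None, None, None) if already processed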
    '''
    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)

    processed = True
    for x, y in zip(x_vals, y_vals):
        folder_path = f"{str(y)}/{str(x)}"
        processed_exists = os.path.exists(folder + "processed/" + folder_path + ".hkl")
        output_exists = os.path.exists(folder + "output/" + folder_path + ".npy")
        if not (processed_exists or output_exists):
            processed = False
    if not processed:
        print(f"Processing because folder {folder_path}.npy does not exist")

        clouds = hkl.load(f'{folder}raw/clouds/clouds_{idx}.hkl')
        sentinel2_10 = to_float32(hkl.load(f'{folder}raw/s2_10/{idx}.hkl'))
        sentinel2_20 = to_float32(hkl.load(f'{folder}raw/s2_20/{idx}.hkl'))
        dem = hkl.load(f'{folder}raw/misc/dem_{idx}.hkl')
        image_dates = hkl.load(f'{folder}raw/misc/s2_dates_{idx}.hkl')
        shadows = hkl.load(f'{folder}raw/clouds/shadows_{idx}.hkl')  
        
        sentinel2 = np.empty((sentinel2_10.shape[0], 646, 646, 10))
        sentinel2[..., :4] = sentinel2_10
        for band in range(6):
            for time in range(sentinel2.shape[0]):
                sentinel2[time, ..., band + 4] = resize(sentinel2_20[time,..., band], (646, 646), 1)

        missing_px = id_missing_px(sentinel2, 3)
        if len(missing_px) > 0:
            print(f"Removing {missing_px} dates due to missing data")
            clouds = np.delete(clouds, missing_px, axis = 0)
            shadows = np.delete(shadows, missing_px, axis = 0)
            image_dates = np.delete(image_dates, missing_px)
            sentinel2 = np.delete(sentinel2, missing_px, axis = 0)
                    
        x, interp = remove_cloud_and_shadows(sentinel2, clouds, shadows, image_dates) 
         
        x = superresolve_tile(np.float32(x))
        dem_i = np.tile(dem[np.newaxis, 1:-1, 1:-1, :], (x.shape[0], 1, 1, 1))
        dem_i = dem_i / 90
        x = np.concatenate([x, dem_i], axis = -1)
        x = np.clip(x, 0, 1)
        return x, image_dates, interp
    else:
        print(f"Skipping because folder {folder_path}.npy exists")
        return None, None, None
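
The band-stacking step in `process_large_tile` can be illustrated in isolation. The sketch below is a minimal, dependency-free stand-in, not the notebook's implementation. It assumes the 20 m arrays are exactly half the width and height of the 10 m arrays, and it upsamples by nearest-neighbour pixel repetition with `np.repeat`, whereas the function above calls `skimage.transform.resize` with interpolation order 1. The helper name `stack_bands` and the toy shapes are hypothetical.

```python
import numpy as np

def stack_bands(s2_10: np.ndarray, s2_20: np.ndarray) -> np.ndarray:
    """Stack 10 m bands (T, H, W, 4) with 20 m bands (T, H/2, W/2, 6).

    Hypothetical helper: nearest-neighbour upsampling stands in for the
    order-1 `resize` call used in the notebook.
    """
    # Repeat every 20 m pixel twice along both spatial axes -> (T, H, W, 6)
    upsampled = np.repeat(np.repeat(s2_20, 2, axis=1), 2, axis=2)
    assert upsampled.shape[:3] == s2_10.shape[:3], "20 m grid must be half the 10 m grid"
    # 10 m bands fill channels 0-3, upsampled 20 m bands fill channels 4-9
    return np.concatenate([s2_10, upsampled], axis=-1).astype(np.float32)

# Toy data: 3 dates, an 8 x 8 pixel 10 m grid and a 4 x 4 pixel 20 m grid
rng = np.random.default_rng(0)
s2_10 = rng.random((3, 8, 8, 4), dtype=np.float32)
s2_20 = rng.random((3, 4, 4, 6), dtype=np.float32)

stacked = stack_bands(s2_10, s2_20)
print(stacked.shape)  # (3, 8, 8, 10)
```

Concatenating along the last axis keeps the same channel layout as the preallocated `sentinel2` array in the function above.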
        

In [17]:
INPUT_FOLDER = "/".join(OUTPUT_FOLDER.split("/")[:-2]) + "/"

def interpolate_na_vals(s2):
    '''Fills NA values at each pixel with the mean of the two nearest
       valid time steps returned by calculate_proximal_steps, since
       calculating the indices can occasionally produce NA values'''
    for x_loc in range(s2.shape[1]):
        for y_loc in range(s2.shape[2]):
            n_na = np.sum(np.isnan(s2[:, x_loc, y_loc, :]), axis = 1)
            for date in range(s2.shape[0]):
                if n_na.flatten()[date] > 0:
                    before, after = calculate_proximal_steps(date, np.argwhere(n_na == 0))
                    s2[date, x_loc, y_loc, :] = ((s2[date + before, x_loc, y_loc] + 
                                                 s2[date + after, x_loc, y_loc]) / 2)
    numb_na = np.sum(np.isnan(s2), axis = (1, 2, 3))
    if np.sum(numb_na) > 0:
        print(f"NA values remaining per date: {numb_na}")
    return s2

def process_subtiles(coord: tuple,
                     step_x: int,
                     step_y: int,
                     year: int = 2019,
                     path: str = INPUT_FOLDER,
                     s2: np.ndarray = None,
                     dates: np.ndarray = None,
                     interp: np.ndarray = None) -> None:
    '''Wrapper function to interpolate clouds and temporal gaps, superresolve tiles,
       calculate relevant indices, and save analysis-ready data to the output folder
       
       Parameters:
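        coord (tuple): passed by the caller; not used in this function
        step_x (int): x index of the large tile
        step_y (int): y index of the large tile
        year (int): year subfolder under path
        path (str): root folder that holds the per-year raw and processed data
        s2 (np.ndarray): Sentinel-2 array returned by process_large_tile
        dates (np.ndarray): image dates returned by process_large_tile
        interp (np.ndarray): interpolation mask returned by process_large_tile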

       Returns:
        None
    '''
    idx = str(step_y) + "_" + str(step_x)
    x_vals, y_vals = make_folder_names(step_x, step_y)
    s1 = hkl.load(f"{path}/{year}/raw/s1/{idx}.hkl")

    s2 = evi(s2, verbose = True)
    s2 = bi(s2, verbose = True)
    s2 = msavi2(s2, verbose = True)
    s2 = ndvi(s2, verbose = True)
    s2 = interpolate_na_vals(s2)

    index = 0
    tiles = tile_window(IMSIZE, IMSIZE, window_size = 142)
    for t in tiles:
        start_x, start_y = t[0], t[1]
        end_x = start_x + t[2]
        end_y = start_y + t[3]
        subset = s2[:, start_x:end_x, start_y:end_y, :]
        interp_tile = interp[:, start_x:end_x, start_y:end_y]
        interp_tile = np.sum(interp_tile, axis = (1, 2))

        dates_tile = np.copy(dates)
        to_remove = np.argwhere(interp_tile > ((142*142) / 10)).flatten()
        if len(to_remove) > 0:
            dates_tile = np.delete(dates_tile, to_remove)
            subset = np.delete(subset, to_remove, 0)
            print(f"Removing {to_remove} interp, leaving {len(dates_tile)} / {len(dates)}")

        missing_px = id_missing_px(subset)
        if len(missing_px) > 0:
            dates_tile = np.delete(dates_tile, missing_px)
            subset = np.delete(subset, missing_px, 0)
            print(f"Removing {missing_px} missing, leaving {len(dates_tile)}")

        to_remove = remove_missed_clouds(subset) 
        if len(to_remove) > 0:
            subset = np.delete(subset, to_remove, axis = 0)
            dates_tile = np.delete(dates_tile, to_remove)
            print(f"{len(to_remove)} missed clouds, leaving {len(dates_tile)}")

        subtile, _ = calculate_and_save_best_images(subset, dates_tile)
        output = f"{path}/{year}/processed/{y_vals[index]}/{x_vals[index]}.hkl"

        index += 1
        
        median = np.median(subtile, axis = 0)
        median_s1 = np.median(s1[:, start_x:end_x, start_y:end_y, :], axis = 0)
        median_s1 = median_s1 / 65535
        median = np.concatenate([median, median_s1], axis = -1)
        
        sm = Smoother(lmbd = 800, size = subtile.shape[0], nbands = 14, dim = subtile.shape[1])
        subtile = sm.interpolate_array(subtile)
        subtile = np.concatenate([subtile, s1[:, start_x:end_x, start_y:end_y, :]], axis = -1)
        subtile[..., -2:] = subtile[..., -2:] / 65535
        
        output_folder = "/".join(output.split("/")[:-1])
        if not os.path.exists(os.path.realpath(output_folder)):
            os.makedirs(os.path.realpath(output_folder))
        subtile = np.concatenate([subtile, median[np.newaxis]], axis = 0)
        #The indices can range from -1 to 1, convert to 0-1 for uint16
        subtile[..., 11:15] = np.clip(subtile[..., 11:15], -1, 1)
        subtile[..., 11:15] = (subtile[..., 11:15] + 1) / 2
        
        subtile = np.clip(subtile, 0, 1)
        subtile = to_int16(subtile)
        print(f"{index}: Writing {output}")
        assert subtile.shape[1] == 142, f"subtile shape is {subtile.shape}"
        assert subtile.shape[0] == 13, f"subtile shape is {subtile.shape}"

        hkl.dump(subtile, output, mode='w', compression='gzip')
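
Two of the array manipulations in `process_subtiles` are easy to check on toy data: dropping dates where more than 10% of a subtile's pixels were interpolated, and rescaling index bands from [-1, 1] to [0, 1]. The following is a sketch under stated assumptions. It takes `interp` to hold one 0/1 flag per pixel and date, the helper names `drop_heavily_interpolated` and `rescale_indices` are hypothetical, and the toy arrays use 4 x 4 pixels rather than 142 x 142.

```python
import numpy as np

def drop_heavily_interpolated(subset, dates, interp, max_fraction=0.1):
    """Remove dates whose interpolated-pixel count exceeds max_fraction of the window."""
    n_pixels = interp.shape[1] * interp.shape[2]
    per_date = np.sum(interp, axis=(1, 2))
    to_remove = np.argwhere(per_date > n_pixels * max_fraction).flatten()
    return np.delete(subset, to_remove, axis=0), np.delete(dates, to_remove)

def rescale_indices(arr, start, stop):
    """Clip bands start:stop to [-1, 1], then map them linearly onto [0, 1]."""
    out = arr.copy()
    out[..., start:stop] = (np.clip(out[..., start:stop], -1, 1) + 1) / 2
    return out

# Toy inputs: 3 dates, 4 x 4 pixels, 2 bands; date 1 is fully interpolated
subset = np.zeros((3, 4, 4, 2), dtype=np.float32)
subset[..., 1] = -1.5  # out-of-range index value, clipped to -1 before rescaling
dates = np.array([10, 40, 70])
interp = np.zeros((3, 4, 4))
interp[1] = 1

kept, kept_dates = drop_heavily_interpolated(subset, dates, interp)
scaled = rescale_indices(kept, 1, 2)
print(kept_dates, scaled.shape)
```

Working on a copy in `rescale_indices` leaves the input untouched, which differs from the in-place assignment used in the function above.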

# 2.5 Function execution

In [None]:
print(f"Downloading {year} for {landscape}")

downloaded = 0
max_x = 50
max_y = 50
if not os.path.exists(os.path.realpath(OUTPUT_FOLDER)):
    os.makedirs(os.path.realpath(OUTPUT_FOLDER))
    
for y_tile in range(1, 4):
    for x_tile in range(1, 4):
        contains = True
        #contains = check_contains(coords, x_tile, y_tile, OUTPUT_FOLDER)
        if contains:
            print(f"Download {downloaded}/{max_x*max_y}; X: {x_tile} Y:{y_tile}")
            downloaded += 1
            # timer is timeit.default_timer, imported in section 1.0
            time1 = timer()
            download_large_tile(coord = coords, step_x = x_tile, step_y = y_tile)
            s2, image_dates, interp = process_large_tile(coords, x_tile, y_tile)
            if s2 is not None:
                process_subtiles(coords, x_tile, y_tile, year = year,
                                    s2 = s2, dates = image_dates, interp = interp)
                print("\n")
            time2 = timer()
            print(f"Finished in {np.around(time2 - time1, 1)} seconds")
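
The loop above visits each tile of a 3 x 3 grid and reports how long it took. A minimal sketch of that pattern, using `timeit.default_timer` (imported as `timer` at the top of the notebook), is shown here. The function `process_tile` is a hypothetical stand-in for the download and processing calls and does no real work.

```python
from timeit import default_timer as timer

def process_tile(x_tile: int, y_tile: int) -> str:
    """Hypothetical placeholder for the download and processing calls."""
    return f"{y_tile}_{x_tile}"  # same "<y>_<x>" index format used for the raw files

processed = []
for y_tile in range(1, 4):
    for x_tile in range(1, 4):
        start = timer()
        processed.append(process_tile(x_tile, y_tile))
        elapsed = timer() - start
        print(f"Tile {processed[-1]} finished in {elapsed:.1f} seconds")
```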