<font face="Calibri" size="2"> <i>eSBAE - Notebook Series - Part 3, version 0.4, March 2023. Andreas Vollrath, UN Food and Agriculture Organization, Rome</i>
</font>

![title](images/header.png)

# III - eSBAE Time-Series Extraction
### Extract various time-series data for large sets of points from Google Earth Engine
-------

This notebook takes you through the process of extracting time-series information for a set of points using [Google Earth Engine](https://earthengine.google.com/). The script is optimized to deal with thousands of points and will use parallelization to efficiently extract the information from the platform.

**You will need**:
- a valid Earth Engine account ([sign up here](https://code.earthengine.google.com/register))
- an uploaded table of points (Feature Collection) 
- the table needs a unique point identifier (Point ID)

**You should be aware that:**

- As a SEPAL user: this notebook does **not need huge resources**, as processing is done on the platform. An **m2 instance** is best suited.  
- The extraction can take several days for large point sets (>100,000 points). If you are on SEPAL, make use of the **"keep instance running"** option within the user report dashboard.
  - You do this by clicking on the cost per hour shown at the bottom right of your screen. Select the edit button on the right side under "sessions", then move the slider to the right until several days are selected and close the window. However, **do not forget** to shut down your machine once processing has finished or you will continue to be charged. 
- Interruption of connectivity to the SEPAL server may block the output of the Jupyter notebook. **This does not mean the processing stopped.** A logfile is created within your "tmp" folder where you can check if there is an issue.
    - Go to your "tmp" folder by making sure the File Browser icon is selected from the four tabs on the left of your Jupyter Notebooks screen, then click on the folder icon on the far left of the displayed path to your working folder. This will take you one directory up from your working folder where the "tmp" folder is located. Inside the "tmp" folder you can see the "Last Modified" times. Check to see that the last modified time was within the last few minutes when you ran the code cell (proving the processing is still ongoing). If the last modified time seems far too long ago, try checking your instance is still active (there should be a non-zero cost per hour on the bottom of your screen) and then **restarting the kernel and running all the cells again**.  
- If you restart the kernel and execute all cells, extraction will **resume where it stopped**. This also applies if your instance was shut down before processing had finished.
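
The manual "Last Modified" check described above can also be done from a notebook cell. The following is a minimal sketch, not part of the eSBAE package: the helper names, the location of the "tmp" folder (`~/tmp`) and the `*.log` file pattern are assumptions that you will need to adapt to your own setup.

```python
import os
import time
from pathlib import Path


def minutes_since_modified(path):
    """Return how many minutes ago `path` was last modified."""
    return (time.time() - os.path.getmtime(path)) / 60.0


def newest_file(folder, pattern="*"):
    """Return the most recently modified file matching `pattern`, or None."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    return max(files, key=os.path.getmtime, default=None)


# Hypothetical location and file pattern: adjust both to your own "tmp" folder.
log_dir = Path.home() / "tmp"
if log_dir.is_dir():
    latest = newest_file(log_dir, "*.log")
    if latest is not None:
        age = minutes_since_modified(latest)
        print(f"{latest.name} was last modified {age:.1f} minutes ago")
```

If the reported age keeps growing well beyond a few minutes while you expect processing to be ongoing, that is the situation in which restarting the kernel and re-running all cells is worth trying.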

### 1 - Import libraries

This cell will provide us with the functionality we need for running the subsequent cells of the notebook.

In [1]:
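# initialize EE using the high-volume endpoint
import ee
try:
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
except Exception:
    # no valid credentials found yet: authenticate first, then initialize again
    ee.Authenticate()
    ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')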
    
from sampling_handler import TimeSeriesExtraction

  warn("cupy is not available in this environment, GPU fonctionnalities won't be available")


### 2 - Basic Input Variables

Here a class instance is initialized. The class instance needs some parameters to be set and is written into the *esbae* variable.

In [4]:
esbae = TimeSeriesExtraction(
     # your project name that you use for all of the notebooks
    project_name  = 'my_first_esbae_project',
    
    # your start and end date 
    # NOTE: the start date should lie further back in the past than the 
    # envisaged monitoring period, for calibration purposes
    ts_start      = '2015-01-01',      # YYYY-MM-DD format
    ts_end        = '2023-01-01',      # YYYY-MM-DD format
    
    # satellite platform (for now, only Landsat is supported)
    satellite     = 'Landsat',
    
    # at what resolution in metres you want to extract (should conform with forest definition MMU)
    scale         = 70, # pixel size in metres
    
    # whether the time series is extracted over a bounding box of width `scale` around each point,
    # using the underlying data at its original resolution (e.g. 30 m for Landsat) (True),
    # or whether the underlying data is first rescaled to `scale` (False)
    # setting it to True might be more accurate, but tends to be slower
    bounds_reduce = False,
    
    # bands
    bands         =  [
        'green', 'red', 'nir', 'swir1', 'swir2',   # reflectance bands
        'ndfi', #'ndmi', 'ndvi',                    # indices
        'brightness', 'greenness', 'wetness'       # Tasseled Cap 
    ], 
    # Uncomment the line below if you have not run notebooks 1 and 2 and want to start directly from here, with an aoi defined by the geometry around an existing set of points
    # aoi = ee.FeatureCollection(ee.FeatureCollection('users/username/my_points').geometry().convexHull(100))
    
)

INFO: Using existing project directory at /home/sepal-user/module_results/esbae/my_first_esbae_project
INFO: Using existent config file from project directory /home/sepal-user/module_results/esbae/my_first_esbae_project
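
The note on `ts_start` above asks for a start date that lies before the monitoring period, so that some years remain for calibration. As a quick sanity check, the length of that calibration window can be computed as sketched below. The function name and the monitoring start date are hypothetical and only serve as illustration; they are not part of the `TimeSeriesExtraction` class.

```python
from datetime import date


def calibration_years(ts_start, monitoring_start):
    """Return the approximate number of years between the time-series
    start and the start of the monitoring period (YYYY-MM-DD strings)."""
    start = date.fromisoformat(ts_start)
    monitoring = date.fromisoformat(monitoring_start)
    if monitoring <= start:
        raise ValueError("ts_start must lie before the monitoring period")
    return (monitoring - start).days / 365.25


# Hypothetical monitoring period starting in 2020
print(f"{calibration_years('2015-01-01', '2020-01-01'):.1f} years of calibration data")
```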


### 3 - Landsat parameters

Here you can select which satellites of the Landsat mission you want to include.
In addition, you can enable the BRDF correction and set a filter for maximum cloud cover. Note that the bands parameter was already set during initialization and is taken from the class attribute. 

In [None]:
# landsat related parameters
lsat_params = {
    'l9': True,
    'l8': True,
    'l7': True,
    'l5': True,
    'l4': True,
    'brdf': True,
    'bands': esbae.bands,
    'max_cc': 75    # percent
} 

# apply the basic configuration set in the cell above
esbae.lsat_params = lsat_params

### 4 - Processing parameters

Here you can refine the parallelization options. For efficient extraction, the time series are extracted in chunks of data, defined by square grid cells of given sizes. The routine checks how many points fall into each chunk. If that number is below `max_points_per_chunk`, it processes those points. Otherwise, it tries to process them at the next, smaller grid size level. Some optimized settings are given below; comment and uncomment them as appropriate.

In [None]:
esbae.workers = 10                   # this defines how many parallel requests will be sent to Earth Engine at a time
esbae.max_points_per_chunk = 100     # this defines the maximum number of points sent to Earth Engine per request

# this defines the chunk sizes (in degrees) used to create the requests
#esbae.grid_size_levels = [0.1, 0.075, 0.05]   # optimized for 1km systematic grid
esbae.grid_size_levels = [0.2, 0.15, 0.1]    # optimized for 2km systematic grid
#esbae.grid_size_levels = [0.4, 0.3, 0.2]     # optimized for 4km systematic grid
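
To illustrate the chunking logic described above, here is a minimal, self-contained sketch in plain Python. It is not the implementation used by `sampling_handler`: the function name is hypothetical, points are simple `(lon, lat)` tuples, crowded cells are re-gridded together at the next level, and cells that are still too crowded at the last level are accepted as they are. All of these are assumptions made for illustration.

```python
import math
from collections import defaultdict


def assign_chunks(points, grid_size_levels, max_points_per_chunk):
    """Group (lon, lat) points into square grid cells, refining crowded cells.

    Returns a list of (grid_size, cell_index, points_in_cell) tuples.
    """
    chunks = []
    pending = list(points)
    for level, size in enumerate(grid_size_levels):
        # bin all pending points into cells of the current size (in degrees)
        cells = defaultdict(list)
        for lon, lat in pending:
            cell = (math.floor(lon / size), math.floor(lat / size))
            cells[cell].append((lon, lat))

        is_last_level = level == len(grid_size_levels) - 1
        pending = []
        for cell, members in cells.items():
            if len(members) <= max_points_per_chunk or is_last_level:
                chunks.append((size, cell, members))
            else:
                # too crowded: retry these points at the next, finer level
                pending.extend(members)
    return chunks


# Example: a dense, regular set of 900 points
points = [(i * 0.013, j * 0.013) for i in range(30) for j in range(30)]
chunks = assign_chunks(points, [0.2, 0.15, 0.1], max_points_per_chunk=100)
print(f"{len(chunks)} chunks for {len(points)} points")
```

Each resulting chunk would correspond to one request. Smaller grid sizes produce more, lighter requests, which is why denser sampling grids are paired with smaller `grid_size_levels` in the settings above.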

### 5 - Set a custom grid 

This step is only necessary if you skipped notebook 2. In that case, you need to define an Earth Engine feature collection as well as the unique point identifier. Uncomment the lines below by removing the `#`.

In [None]:
#esbae.sample_asset = 'users/username/my_already_existing_points'
#esbae.pid = 'my_unique_point_id'

### 6 - Check for already processed data (optional)

This is useful for large point sets and when the connection to SEPAL gets interrupted. Processing usually continues in the background, but tracking its progress is not straightforward. 
You can instead restart the kernel, execute all cells, and check whether processing has finished with the following line of code.

In [None]:
esbae.check_if_completed()

### 7 - Run the time-series data extraction *(only execute this)*

In [None]:
esbae.get_time_series_data()
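
As noted in the introduction, re-running the notebook resumes the extraction where it stopped. A common way to obtain this behaviour, combined with a fixed number of parallel workers, is sketched below. This is an assumption about the general pattern, not the actual code behind `get_time_series_data()`: `extract_chunk` is a hypothetical placeholder for a single request to Earth Engine, and the one-JSON-file-per-chunk layout is chosen only for illustration.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def extract_chunk(chunk_id, points):
    """Hypothetical placeholder for one request to Earth Engine."""
    return {"chunk": chunk_id, "n_points": len(points)}


def run_extraction(chunks, out_dir, workers=10):
    """Process chunks in parallel, skipping those whose result file exists.

    Returns the sorted ids of the chunks processed in this run.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # only chunks without a result file still need processing
    todo = {cid: pts for cid, pts in chunks.items()
            if not (out_dir / f"{cid}.json").exists()}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(extract_chunk, cid, pts): cid
                   for cid, pts in todo.items()}
        for future in as_completed(futures):
            cid = futures[future]
            # the file is written only after the request succeeded,
            # so a failed request leaves no result file behind
            (out_dir / f"{cid}.json").write_text(json.dumps(future.result()))
    return sorted(todo)


# Running twice on the same folder: the second run finds nothing left to do
chunks = {"a": [(0.0, 0.0)], "b": [(1.0, 1.0), (2.0, 2.0)]}
with tempfile.TemporaryDirectory() as tmp:
    first = run_extraction(chunks, tmp, workers=2)
    second = run_extraction(chunks, tmp, workers=2)
print(first, second)
```

Because completed chunks are detected from what is already on disk, the same function call serves both as the initial run and as the restart after an interruption.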