# Data Download

### How to Install

1. Install conda environment.

```
conda env create -f processing_environment.yml
conda activate ee
```
  
2. Install kernel.

```
python -m ipykernel install --user --name ee --display-name "ee kernel"
```

3. In a new notebook in JupyterLab, select the kernel 'ee kernel'.

Reference for installing the Earth Engine Python API (`ee`) with conda: https://developers.google.com/earth-engine/python_install-conda

### How to Add New Areas

In `utils/gee_settings.py`:
1. In the `areas` list, add the area name with spaces removed, e.g. Villa del Rosario becomes villadelrosario.
2. In the `BBOX` dict, add the bounding box as a list of 4 numbers: the upper-left corner followed by the lower-right corner.
3. In the `CLOUD_PARAMS` dict, specify the cloud-cover filter and whether clouds will be masked.
4. In the `admin2RefN` dict, add the area's name as it appears in the admin boundary shapefile.
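
The layout of `utils/gee_settings.py` is not reproduced here. Judging only from how this notebook reads it (`CLOUD_PARAMS[area][year]` unpacks into `cloud_pct, mask`), a new entry might look roughly like the sketch below. The coordinates, the cloud threshold, the `admin2RefN` value and the `check_area` helper are hypothetical placeholders, and the (longitude, latitude) corner order is an assumption.

```python
# Hypothetical settings for a new area; all values are placeholders, not the project's real settings.
YEARS = ['2015-2016', '2017-2018', '2019-2020']

BBOX = {
    # [upper-left lon, upper-left lat, lower-right lon, lower-right lat] (assumed order)
    'villadelrosario': [-72.52, 7.86, -72.44, 7.80],
}

CLOUD_PARAMS = {
    # year window -> (max cloud cover %, apply cloud mask?)
    'villadelrosario': {year: (40, True) for year in YEARS},
}

admin2RefN = {
    # name as it appears in the boundary layer's 'admin2RefN' column (placeholder)
    'villadelrosario': 'Villa del Rosario',
}


def check_area(area, years=YEARS):
    """Return a list of problems with the settings for `area` (empty if none found)."""
    problems = []
    if ' ' in area:
        problems.append('area key must not contain spaces')
    bbox = BBOX.get(area)
    if bbox is None or len(bbox) != 4:
        problems.append('BBOX needs 4 numbers')
    else:
        ulx, uly, lrx, lry = bbox
        if not (ulx < lrx and uly > lry):
            problems.append('BBOX corners are not upper-left then lower-right')
    for year in years:
        params = CLOUD_PARAMS.get(area, {}).get(year)
        if params is None or len(params) != 2:
            problems.append(f'CLOUD_PARAMS missing (cloud_pct, mask) for {year}')
    if area not in admin2RefN:
        problems.append('admin2RefN entry missing')
    return problems


print(check_area('villadelrosario'))  # []
print(check_area('tame'))             # lists the missing BBOX, CLOUD_PARAMS and admin2RefN entries
```

Running a check like this before starting the download loop avoids a `KeyError` halfway through, after some export tasks have already been submitted.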

Once the downloaded file shows up in gs://immap-gee:
1. Check whether the area was split into multiple files.
2. If so, add the area to the `multipart` list in the Input params section.
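
For step 1, one option is to list the bucket (for example with `gsutil ls gs://immap-gee/`) and look for names that carry tile offsets. The sketch assumes tiled exports follow the pattern this notebook expects in the Deflate and crop section, `gee_{area}_{year}` followed by two 10-digit offsets such as `0000000000-0000000000`, plus a `.tif` extension. `find_multipart` and the sample listing are hypothetical.

```python
import re
from collections import defaultdict


def find_multipart(object_names):
    """Map each export prefix that was split into tiles to its sorted tile names.

    Assumes tile names look like 'gee_<area>_<years>0000000000-0000000000.tif'.
    """
    tile_re = re.compile(r'^(gee_.+?_\d{4}-\d{4})(\d{10}-\d{10})\.tif$')
    tiles = defaultdict(list)
    for name in object_names:
        base = name.rsplit('/', 1)[-1]  # drop any gs://bucket/ prefix
        match = tile_re.match(base)
        if match:
            tiles[match.group(1)].append(base)
    return {prefix: sorted(names) for prefix, names in tiles.items()}


# made-up listing for illustration
listing = [
    'gs://immap-gee/gee_arauca_2015-20160000000000-0000000000.tif',
    'gs://immap-gee/gee_arauca_2015-20160000000000-0000032768.tif',
    'gs://immap-gee/gee_pereira_2015-2016.tif',
]
multipart_areas = sorted({prefix.split('_')[1] for prefix in find_multipart(listing)})
print(multipart_areas)  # ['arauca']
```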

## Load tools

```python
import geopandas as gpd
import pathlib
from tqdm import tqdm

import sys
sys.path.insert(0, '../utils')
from gee import sen2median, deflatecrop1
from gee_settings import BBOX, CLOUD_PARAMS, admin2RefN

data_dir = "../data/"
```

```python
adm_dir = data_dir + 'admin_bounds/'
img_dir = data_dir + 'images/'
tmp_dir = data_dir + 'tmp/'

# create the working directories if they do not exist yet
for dir_ in [adm_dir, img_dir, tmp_dir]:
    pathlib.Path(dir_).mkdir(parents=True, exist_ok=True)

# get the admin boundaries (gdf is needed in "Deflate and crop")
# uncomment the download on the first run:
# !gsutil cp gs://immap-masks/admin_boundaries/admin_bounds.gpkg {adm_dir}
gdf = gpd.read_file(adm_dir + 'admin_bounds.gpkg')
gdf = gdf.set_crs('EPSG:4326', allow_override=True)
```

## Input params

```python
PRODUCT = 'COPERNICUS/S2' # L1C
years = ['2015-2016', '2017-2018', '2019-2020']

def get_minmaxdt(year_str):
    # '2015-2016' -> ('2015-01-01', '2016-12-31')
    list_ = year_str.split('-')
    return list_[0] + '-01-01', list_[1] + '-12-31'

areas = [
#     'riohacha', 'maicao', 'uribia', 
#     'arauca', 'arauquita', 'cucuta', 'tibu', 'soacha', #'villadelrosario', 'saravena',
#     'bogota', 'sabanalarga', 'soledad', 'santamarta', 'barranquilla',
#     'inirida','puertocarreno2', 'bucaramanga', 'monteria', 'fonseca',
#     'fortul', 'fundacion', 'malambo', 'manaure', 'ocana', 'pasto', 'puertosantander', 'saravena', 'villadelrosario', 'tame', 'yopal',  
#     'sabanalargaatlantico', 'cumbal', 'cali',
#     'valledupar', 'cienaga', 'sanjuandelcesar', 'baranoa', 'convencion', 'albania', 'santotomas', 'polonuevo', 'elbanco', 'dibulla', 'turbaco', 'cartagena', 'planadas', 'medellin', 'puertocolombia',
#     'facatativa','bosconia','puertogaitan','tubara','lapazcesar','cota','sanmarcos','pitalito','agustincodazzi','floridablanca','piedecuesta','itagui','sincelejo','palmira','bello',

    'pereira',
    'chia',
    'pamplona',
    'rionegroantioquia',
    'lospatios',
    'envigado',
    'magangue',
    'armenia',
    'jamundi',
    'barrancabermeja',
    'zipaquira',
    'ibague',
    'chinacota',
    'barrancas',
    'tunja',
    'dosquebradas',
    'tumaco',
    'mosquera',
    'manizales',
    'ipiales',
    'giron',
    'villavicencio',
    'madrid',
]

# areas whose GEE export was split into multiple files
multipart = ['arauca', 'tibu', 'bogota', 'puertocarreno2']
```

## Download from GEE

```python
for area in areas:
    for year in years:
        cloud_pct, mask = CLOUD_PARAMS[area][year]
        min_dt, max_dt = get_minmaxdt(year)
        sen2median(
            BBOX[area], 
            FILENAME = f'gee_{area}_{year}', 
            min_dt = min_dt, 
            max_dt = max_dt,
            cloud_pct = cloud_pct, 
            mask = mask,
            PRODUCT = PRODUCT,
            verbose = 1
        )
```

```
Processing gee_pereira_2015-2016
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_pereira_2017-2018
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_pereira_2019-2020
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_chia_2015-2016
using COPERNICUS/S2
Filtering to images with cloud cover < 20
with mask
Task started
Processing gee_chia_2017-2018
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_chia_2019-2020
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_pamplona_2015-2016
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_pamplona_2017-2018
using COPERNICUS/S2
Filtering to images with cloud cover < 40
with mask
Task started
Processing gee_pamplona_2019-2020
using COPERNICUS/S2
Filtering
```
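
`sen2median` lives in `utils/gee.py` and is not shown here, and the real composite is computed server-side in Earth Engine. For orientation only: a common way to implement a `mask` option on Sentinel-2 L1C is to drop pixels flagged in the `QA60` band (bit 10 = opaque clouds, bit 11 = cirrus) before taking the per-pixel median over time. The NumPy sketch below shows that logic on tiny arrays; treating it as what `sen2median` does is an assumption, and `masked_median` is a hypothetical name.

```python
import numpy as np

CLOUD_BIT = 1 << 10   # QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11  # QA60 bit 11: cirrus clouds


def masked_median(bands, qa60, mask=True):
    """Per-pixel median over time of `bands` (time, y, x), optionally skipping cloudy pixels.

    `qa60` has the same (time, y, x) shape and holds the QA60 bit flags.
    """
    data = bands.astype(float)
    if mask:
        cloudy = (qa60 & (CLOUD_BIT | CIRRUS_BIT)) != 0
        data = np.where(cloudy, np.nan, data)
    # nanmedian ignores masked observations; pixels cloudy in every scene stay NaN
    return np.nanmedian(data, axis=0)


bands = np.array([[[100]], [[5000]], [[120]]])  # three scenes of a 1x1 image
qa60 = np.array([[[0]], [[CLOUD_BIT]], [[0]]])  # the middle scene is flagged cloudy
print(masked_median(bands, qa60))              # [[110.]]
print(masked_median(bands, qa60, mask=False))  # [[120.]]
```

The median already suppresses many bright cloud outliers on its own; masking mainly helps where clouds appear in a large share of the scenes.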

## Deflate and crop

```python
# create shapefiles for cropping
for area in areas:
    area1 = gdf[gdf['admin2RefN'] == admin2RefN[area]]
    area1.to_file(adm_dir + area + '.shp')
```

```python
# collect filenames to be processed
files_ = []

for area in areas:
    for year in years:
        if area in multipart:
            # just get the largest part
            files_.append(f'gee_{area}_{year}0000000000-0000000000')
        else:
            files_.append(f'gee_{area}_{year}')
```
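
The cell above assumes the `0000000000-0000000000` tile is the largest part of a multipart export. If that does not hold for an area, one way to pick the biggest tile is to read object sizes from `gsutil ls -l`, whose lines hold the size in bytes, a timestamp and the object URL, followed by a `TOTAL` summary line. `largest_tile` is a hypothetical helper and the listing is made-up sample text.

```python
def largest_tile(ls_output, prefix):
    """Return the URL of the largest object whose name starts with `prefix`, or None.

    `ls_output` is the text printed by `gsutil ls -l`.
    """
    best_url, best_size = None, -1
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank lines and the TOTAL summary
        size, url = int(fields[0]), fields[2]
        if url.rsplit('/', 1)[-1].startswith(prefix) and size > best_size:
            best_url, best_size = url, size
    return best_url


sample = """\
   734003200  2020-05-18T10:00:00Z  gs://immap-gee/gee_tibu_2019-20200000000000-0000000000.tif
    12582912  2020-05-18T10:00:00Z  gs://immap-gee/gee_tibu_2019-20200000000000-0000032768.tif
TOTAL: 2 objects, 746586112 bytes (712 MiB)"""
print(largest_tile(sample, 'gee_tibu_2019-2020'))
```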

```python
for f in tqdm(files_):
    deflatecrop1(
        raw_filename = f, 
        output_dir = img_dir, 
        adm_dir = adm_dir,
        tmp_dir = tmp_dir,
        bucket = 'gs://immap-images/20200518/',
        clear_local = True
    )
```

```
100%|██████████| 9/9 [05:56<00:00, 39.63s/it]
```
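
`deflatecrop1` is defined in `utils/gee.py` and is not reproduced here. Going by its name and arguments, it appears to clip each raster to the area's shapefile in `adm_dir` and rewrite it with DEFLATE compression. One way to do that is GDAL's `gdalwarp`; the sketch below builds the command line with a hypothetical `build_crop_cmd` helper so it can be inspected before running it through `subprocess.run`. Whether `deflatecrop1` actually uses `gdalwarp` is an assumption, as are the example file paths.

```python
import subprocess


def build_crop_cmd(src_tif, cutline_shp, dst_tif):
    """Build a gdalwarp call that clips `src_tif` to `cutline_shp` and writes a DEFLATE-compressed GeoTIFF."""
    return [
        'gdalwarp',
        '-cutline', cutline_shp,   # polygon(s) used as the clip mask
        '-crop_to_cutline',        # shrink the output extent to the polygon bounds
        '-of', 'GTiff',
        '-co', 'COMPRESS=DEFLATE',
        '-co', 'TILED=YES',
        '-overwrite',
        src_tif,
        dst_tif,
    ]


def crop_and_deflate(src_tif, cutline_shp, dst_tif, dry_run=True):
    """Print the command (dry run) or execute it; returns the argument list either way."""
    cmd = build_crop_cmd(src_tif, cutline_shp, dst_tif)
    if dry_run:
        print(' '.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # requires GDAL's gdalwarp on PATH
    return cmd


# hypothetical paths following the notebook's directory layout
crop_and_deflate('../data/tmp/gee_pereira_2015-2016.tif',
                 '../data/admin_bounds/pereira.shp',
                 '../data/images/gee_pereira_2015-2016.tif')
```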
