# Data Download

### How to Install

1. Install the Earth Engine (GEE) Python API on GCE by following: https://developers.google.com/earth-engine/python_install-conda

  - If you encounter 'jupyter: command not found', add conda to your PATH:
  
```
    export PATH=~/anaconda3/bin:$PATH
```

2. Create a new kernel to use in JupyterLab

```
conda install ipykernel
ipython kernel install --name ee --user
```

3. Update kernel.json to point to the Python 3.8.2 interpreter (the version GEE uses)

```
jupyter kernelspec list
vim /home/cholo/.local/share/jupyter/kernels/ee/kernel.json
```

  - In vim, replace the Python path in kernel.json with this interpreter:
  
```
    /home/cholo/anaconda3/envs/ee/bin/python
```

4. Install gdal

```
conda activate ee
conda install gdal
```

5. In a new JupyterLab notebook, select the 'ee' kernel
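
Step 3 can also be scripted instead of editing the file by hand. The following is a minimal sketch, assuming the kernelspec follows the usual Jupyter layout in which the first entry of `argv` is the interpreter; `set_kernel_python` is a hypothetical helper, not part of this repo.

```python
import json
from pathlib import Path


def set_kernel_python(kernel_json, python_path):
    """Point a Jupyter kernelspec at a specific Python interpreter."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    # argv[0] is the executable Jupyter launches for this kernel
    # (assumes the standard kernel.json layout).
    spec["argv"][0] = str(python_path)
    path.write_text(json.dumps(spec, indent=1))
    return spec


# Example with the paths from step 3 (adjust the user name to your machine):
# set_kernel_python(
#     "/home/cholo/.local/share/jupyter/kernels/ee/kernel.json",
#     "/home/cholo/anaconda3/envs/ee/bin/python",
# )
```

Restart the kernel from JupyterLab afterwards so the new interpreter is picked up.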

To skip ee.Authenticate in the notebook, run 'earthengine authenticate' in a terminal beforehand.

### How to Add New Areas

In utils/gee_settings.py:
1. In the 'areas' list, add the area name in lowercase with spaces removed, e.g. Villa del Rosario > villadelrosario
2. In the BBOX dict, add the bounding box as a list of 4 numbers: the upper-left and lower-right corners
3. In the CLOUD_PARAMS dict, specify the cloud-cover filter and whether a cloud mask will be applied
4. In admin2RefN, add the area's name as it appears in the admin boundary shapefile
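
As an illustration, the four entries for a new area might look roughly like the sketch below. Everything here is an assumption for demonstration: `area_key` is a hypothetical helper, the coordinates and their ordering are placeholders, and the `(cloud_pct, mask)` layout is inferred only from how `CLOUD_PARAMS[area][year]` is unpacked later in this notebook. Check the existing entries in gee_settings.py for the real conventions.

```python
def area_key(name):
    """Lowercase an area name and drop all whitespace."""
    return "".join(name.split()).lower()


key = area_key("Villa del Rosario")  # 'villadelrosario'

# Placeholder values only: replace with the real extent and boundary name.
areas = [key]

# Upper-left then lower-right corner; the lon/lat ordering is assumed.
BBOX = {key: [-72.60, 7.95, -72.40, 7.75]}

# Per period: (maximum cloud cover in percent, whether to apply a cloud mask).
CLOUD_PARAMS = {key: {"2015-2016": (10, False)}}

# Name as spelled in the admin boundary shapefile (assumed spelling).
admin2RefN = {key: "Villa del Rosario"}
```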

Once the downloaded file appears in gs://immap-gee:
1. Check whether the area is split into multiple files
2. If so, add the area to the 'multipart' list in the Input params section
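
The multi-file check can be done by eye, but a small script helps when many areas are exported. This is a sketch under assumptions: it expects object names shaped like `gee_<area>_<years>` with an optional tile suffix such as `0000000000-0000000000` (the pattern used later in this notebook) and a `.tif` extension, which is a guess. `find_multipart` is a hypothetical helper.

```python
import re

# Assumed naming: gee_<area>_<YYYY-YYYY>[<10 digits>-<10 digits>].tif
# The tile suffix is only present when an export was split into parts.
PATTERN = re.compile(r"^gee_([a-z]+)_(\d{4}-\d{4})(\d{10}-\d{10})?\.tif$")


def find_multipart(filenames):
    """Return the sorted area keys that have at least one tiled file."""
    multipart = set()
    for name in filenames:
        match = PATTERN.match(name)
        if match and match.group(3):
            multipart.add(match.group(1))
    return sorted(multipart)


# Hypothetical listing, e.g. copied from the bucket browser.
listing = [
    "gee_arauca_2015-20160000000000-0000000000.tif",
    "gee_arauca_2015-20160000000000-0000009984.tif",
    "gee_villadelrosario_2015-2016.tif",
]
print(find_multipart(listing))  # ['arauca']
```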

## Load tools

In [1]:
import geopandas as gpd
from fiona.crs import to_string
import pathlib
from tqdm import tqdm

import sys
sys.path.insert(0, '../utils')
from gee import sen2median, deflatecrop1
from gee_settings import areas, BBOX, CLOUD_PARAMS, admin2RefN

data_dir = "../data/"

Enter verification code:  4/zQEo1-ZcYGrP-qFqUJ1g3QMH7E6TVqJ7E7xBT1R5aLuPWm1PezxWwcw



Successfully saved authorization token.


In [2]:
adm_dir = data_dir + 'admin_bounds/'
img_dir = data_dir + 'images/'
tmp_dir = data_dir + 'tmp/'

dirs = [adm_dir, img_dir, tmp_dir]
for dir_ in dirs:
    # exist_ok=True makes a separate exists() check unnecessary;
    # Path is not used as a context manager (removed in Python 3.13)
    pathlib.Path(dir_).mkdir(parents=True, exist_ok=True)

# get area shape file
# !gsutil cp gs://immap-masks/admin_boundaries/admin_bounds.gpkg {adm_dir}
gdf = gpd.read_file(adm_dir + 'admin_bounds.gpkg')
fcrs = to_string({'init': 'epsg:4326', 'no_defs': True})
gdf.crs = fcrs

## Input params

In [37]:
PRODUCT = 'COPERNICUS/S2' # L1C
years = ['2015-2016', '2017-2018', '2019-2020']
def get_minmaxdt(year_str):
    list_ = year_str.split('-')
    return list_[0] + '-01-01', list_[1] + '-12-31'

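# subset: restrict this run to one period and one area
# (overrides the full lists defined and imported above)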
years = ['2015-2016']
areas = ['arauca']

# areas whose GEE export is split into multiple files
multipart = ['arauca', 'tibu', 'bogota']
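
For reference, a standalone copy of the date helper shows how a period label maps to the date range passed to the download step. The docstring and variable names are added here for readability; the behavior is the same as in the cell above.

```python
def get_minmaxdt(year_str):
    """Turn 'YYYY-YYYY' into (first day of first year, last day of last year)."""
    start_year, end_year = year_str.split('-')
    return start_year + '-01-01', end_year + '-12-31'


print(get_minmaxdt('2015-2016'))  # ('2015-01-01', '2016-12-31')
```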

## Download from GEE

In [8]:
for area in areas:
    for year in years:
        cloud_pct, mask = CLOUD_PARAMS[area][year]
        min_dt, max_dt = get_minmaxdt(year)
        sen2median(
            BBOX[area], 
            FILENAME = f'gee_{area}_{year}', 
            min_dt = min_dt, 
            max_dt = max_dt,
            cloud_pct = cloud_pct, 
            mask = mask,
            PRODUCT = PRODUCT,
            verbose = 0
        )

Processing gee_arauca_2015-2016
using COPERNICUS/S2
Filtering to images with cloud cover < 10
with no mask
Task started


## Deflate and crop

In [4]:
for area in areas:   
    area1 = gdf[gdf['admin2RefN'] == admin2RefN[area]]
    area1.to_file(adm_dir + area + '.shp')

In [38]:
files_ = []

for area in areas:
    for year in years:
        if area in multipart:
            # just get the largest part
            files_.append(f'gee_{area}_{year}0000000000-0000000000')
        else:
            files_.append(f'gee_{area}_{year}')

In [42]:
for f in tqdm(files_):
    deflatecrop1(
        raw_filename = f, 
        output_dir = img_dir, 
        adm_dir = adm_dir,
        tmp_dir = tmp_dir,
        bucket = 'gs://immap-images/20200501/',
        clear_local = False
    )

100%|██████████| 1/1 [06:29<00:00, 389.49s/it]
