## Generate Tilesets JSON

---

`earthengine upload image --manifest` does not accept `.tif` files from different EPSG zones in a single manifest. This notebook generates a file, `tilesets.json`, such that running

```bash
# https://github.com/wri/dl_exporter
dl_exporter manifest manifest_config.yaml tilesets.json gee_manifest.json
```

will generate a manifest file `gee_manifest.json-<i>` for each (i-th) EPSG zone. Once you have those files, you can create an `image_collection` and upload each EPSG-specific image into it.
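The create-then-upload step can be sketched as below. This is a minimal sketch, not part of `dl_exporter`: the helper `build_upload_commands` and the collection id `users/example/lulc_india_2019` are hypothetical, and it assumes the asset name inside each manifest already points into that collection. It only builds the `earthengine` command lines; nothing is executed.

```python
import glob


def build_upload_commands(collection_id, manifest_paths):
    """Return argv lists for the earthengine CLI (hypothetical helper, runs nothing)."""
    # One command to create the image collection that will hold every zone.
    cmds = [["earthengine", "create", "collection", collection_id]]
    # One upload per EPSG-specific manifest file.
    for path in manifest_paths:
        cmds.append(["earthengine", "upload", "image", "--manifest", path])
    return cmds


# Pick up whatever gee_manifest.json-<i> files exist in the working directory.
manifests = sorted(glob.glob("gee_manifest.json-*"))
for cmd in build_upload_commands("users/example/lulc_india_2019", manifests):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.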

---

In [1]:
import re
import math
import pandas as pd
import mproc
from pprint import pprint
from descarteslabs.scenes import DLTile
import dl_exporter.utils as utils

---

### HELPER METHODS

In [2]:
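# Prefix (H) and suffix (T) stripped from each GCS path to recover the tile key.
# Raw strings with an escaped dot so that '.tif' only matches a literal dot.
H=r'^gs://dl-exports/wri_urban_india-2019/wri:ulu-india_'
T=r'_2019\.tif$'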

def extract_tile_id(path):
    path=re.sub(H,'',path)
    return re.sub(T,'',path)
    
    
def get_crs(tile_key):
    return DLTile.from_key(tile_key).crs


def get_crs_list(tile_keys):
    return [get_crs(k) for k in tile_keys]


def flatten(lst):
    l=[]
    for _l in lst:
        l+=_l
    return l
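As a quick check of the path-to-key helper, here is a standalone sketch that applies the same two substitutions to the sample path printed later in this notebook. The names `HEAD` and `TAIL` are illustrative stand-ins for `H` and `T`, and the suffix pattern escapes the dot.

```python
import re

# Same prefix/suffix as H and T above, written as raw strings.
HEAD = r'^gs://dl-exports/wri_urban_india-2019/wri:ulu-india_'
TAIL = r'_2019\.tif$'


def extract_tile_id(path):
    """Strip the bucket prefix and the year/extension suffix, leaving the tile key."""
    return re.sub(TAIL, '', re.sub(HEAD, '', path))


path = 'gs://dl-exports/wri_urban_india-2019/wri:ulu-india_6144:0:5.0:46:5:100_2019.tif'
tile_key = extract_tile_id(path)
print(tile_key)  # 6144:0:5.0:46:5:100
```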

---

### GENERATE DATA CSV

In [3]:
"""
1. fetch file list 
2. insert into dataframe
3. add tile_key column
"""
!echo path > gcs_data.csv
!gsutil ls gs://dl-exports/wri_urban_india-2019/*.tif >> gcs_data.csv
df=pd.read_csv('gcs_data.csv')
df['tile_key']=df.path.apply(extract_tile_id)
r=df.sample().iloc[0]
print(df.shape[0],r.path,r.tile_key)

3449 gs://dl-exports/wri_urban_india-2019/wri:ulu-india_6144:0:5.0:46:5:100_2019.tif 6144:0:5.0:46:5:100


---

### ADD CRS COLUMN

In [4]:
"""
1. create batched list of tile_keys
2. generate crs using multiprocessing
3. add crs column
"""
NB_BATCHES=16
TOTAL=df.shape[0]
BATCH_SIZE=int(math.ceil(TOTAL/NB_BATCHES))
BATCHED_KEYS=[df.tile_key[i*BATCH_SIZE:(i+1)*BATCH_SIZE].tolist() for i in range(NB_BATCHES)]
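The batching arithmetic above can be checked without the dataframe. The sketch below uses a plain list of 3449 placeholder integers (the row count printed earlier) in place of `df.tile_key`; `make_batches` is an illustrative name, not a function from this notebook.

```python
import math


def make_batches(items, nb_batches):
    """Split items into nb_batches contiguous slices of ceil(len/nb_batches) each."""
    batch_size = int(math.ceil(len(items) / nb_batches))
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(nb_batches)]


keys = list(range(3449))          # stand-in for the 3449 tile keys
batches = make_batches(keys, 16)
sizes = [len(b) for b in batches]
print(sizes[0], sizes[-1], sum(sizes))  # 216 209 3449
```

Every row lands in exactly one batch. With far fewer rows than batches the trailing slices can come out empty, which is harmless here because `get_crs_list` returns an empty list for an empty batch.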

In [5]:
%time crs_lists=mproc.map_with_threadpool(get_crs_list,BATCHED_KEYS,max_processes=NB_BATCHES)

AuthError: Could not find client_id

In [None]:
print('nb_batches:',len(crs_lists))
crs_list=flatten(crs_lists)
print('nb_crs_rows:',len(crs_list))
crs_list[:3]

In [None]:
df['crs']=crs_list
df.to_csv('lulc-india-2019.csv')
df.sample(10)

---

### CREATE URIS-JSON FOR MULTI-MANIFEST RUN

In [None]:
def to_tilesets(df):
    """ uris-json
    
    Creates a list of objects, each of which generates a separate
    gee-manifest for sources in a single epsg zone.
    """
    jsn={ c: []  for c in df.crs.unique().tolist() }
    for _,r in df.iterrows():
        jsn[r.crs].append(r.path)
    lst=[]
    for c,u in jsn.items():
        lst.append({
            'id': re.sub(":","",c),
            'crs': c,
            'sources': [{'uris': _u} for _u in u] })
    return lst
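To show the shape of the resulting list, here is a pandas-free sketch of the same grouping. The function `group_by_crs`, the bucket `example-bucket`, and the three paths are hypothetical; the real input is the `path` and `crs` columns of `df`.

```python
from collections import defaultdict
from pprint import pprint


def group_by_crs(rows):
    """Group (path, crs) pairs into one tileset object per CRS."""
    grouped = defaultdict(list)
    for path, crs in rows:
        grouped[crs].append(path)
    return [
        {
            'id': crs.replace(':', ''),                   # 'EPSG:32643' -> 'EPSG32643'
            'crs': crs,
            'sources': [{'uris': p} for p in paths],      # same layout as to_tilesets
        }
        for crs, paths in grouped.items()
    ]


# Hypothetical rows: two tiles in one zone, one tile in another.
rows = [
    ('gs://example-bucket/a.tif', 'EPSG:32643'),
    ('gs://example-bucket/b.tif', 'EPSG:32644'),
    ('gs://example-bucket/c.tif', 'EPSG:32643'),
]
tilesets = group_by_crs(rows)
pprint(tilesets)
```

Each object in the list corresponds to one `gee_manifest.json-<i>` file in the multi-manifest run.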

In [None]:
jsn=to_tilesets(df)

In [None]:
utils.save_json(jsn,'../tilesets.json')

In [None]:
pprint(utils.read_json('../tilesets.json'))