# Create Layer Config Backup

This notebook describes how to create a remote backup of the GFW layers.

Rough process:

- Run this notebook from the `gfw/data` folder
- Wait...
- Check `_metadata.json` files in the `production` and `staging` folders for changes
- If everything looks good, make a PR
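The check of the `_metadata.json` files can be sketched as a small helper. This is a minimal sketch, not part of the original workflow: `summarize_changes` is a hypothetical name, and it assumes each `_metadata.json` holds the `differences` dictionary (`changed`, `added`, `removed`) written by the backup cell later in this notebook.

```python
import json
from pathlib import Path


def summarize_changes(layers_dir, envs=('staging', 'production')):
    """Return {env: {change_type: count}} read from each env's _metadata.json.

    Hypothetical helper: assumes the 'differences' layout used by this notebook.
    """
    summary = {}
    for env in envs:
        meta_file = Path(layers_dir) / env / '_metadata.json'
        with open(meta_file) as f:
            meta = json.load(f)
        # Count the ids listed under each change type
        summary[env] = {kind: len(ids) for kind, ids in meta['differences'].items()}
    return summary
```

Calling `summarize_changes('./layers')` after a run gives a quick count of changed, added and removed datasets per environment before opening a PR.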

First, install the latest version of LMIPy

In [1]:
!pip install --upgrade LMIPy

from IPython.display import clear_output
clear_output()

print('LMI ready!')

LMI ready!


Next, import the relevant modules.

In [23]:
import LMIPy as lmi
import os
import json
from pprint import pprint
from datetime import datetime
import shutil

Then pull the gfw repo and check that the path below points to the `data/layers` folder, which should contain a `production` and a `staging` folder.

In [3]:
envs = ['staging', 'production']

In [4]:
path = './layers'

In [27]:
# Archive the previous backup, named after its last update time
# (the `archived` directory is created if it does not exist yet)
with open(path + '/metadata.json') as f:
    date = json.load(f)[0]['updatedAt']
    
shutil.make_archive(path + f'/archived/archive_{date}', 'zip', path)

'/Users/vizzuality/Workspace/gfw/data/layers/archived/archive_2019-06-21@09h-31m-18s.zip'
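Because the zip is written into `archived`, which sits inside the folder being zipped, each new archive is likely to also contain the earlier archives. If that becomes a problem, one option is to zip a temporary copy that leaves `archived` out. The following is a minimal sketch under that assumption; `archive_layers` and its parameters are hypothetical names, not part of the original notebook.

```python
import shutil
import tempfile
from pathlib import Path


def archive_layers(layers_dir, label):
    """Zip layers_dir, excluding its 'archived' folder, and return the zip path.

    Hypothetical helper: the zip is still written to layers_dir/archived.
    """
    layers_dir = Path(layers_dir).resolve()
    archive_dir = layers_dir / 'archived'
    archive_dir.mkdir(exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / 'snapshot'
        # Copy the tree, skipping any entry named 'archived'
        shutil.copytree(layers_dir, snapshot,
                        ignore=shutil.ignore_patterns('archived'))
        # Zip the copy so earlier archives are not nested in the new one
        return shutil.make_archive(str(archive_dir / f'archive_{label}'),
                                   'zip', snapshot)
```

In the cell above, this would be called as `archive_layers(path, date)` in place of the direct `shutil.make_archive` call.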

In [28]:
# Check correct folders are found

if not all([folder in os.listdir(path) for folder in envs]):
    print(f'Boo! Incorrect path: {path}')
else:
    print('Good to go!')

Good to go!


Run the following cell to save the current datasets as `.json` files and log what changed since the last backup.

In [29]:
%%time
for env in envs:
    
    # Get all old ids
    old_ids = [file.split('.json')[0] for file in os.listdir(path + f'/{env}') if '_metadata' not in file]
    
    old_datasets = []
    files = os.listdir(path + f'/{env}')
    
    # Extract all old datasets
    for file in files:
        if '_metadata' not in file:
            with open(path + f'/{env}/{file}') as f:
                old_datasets.append(json.load(f))
    
    # Now pull all current gfw datasets and save
    col = lmi.Collection(app=['gfw'], env=env)
    col.save(path + f'/{env}')
    
    # Get all new ids
    new_ids = [file.split('.json')[0] for file in os.listdir(path + f'/{env}') if '_metadata' not in file]
    
    # See which are new, and which have been removed
    added = list(set(new_ids) - set(old_ids))
    removed = list(set(old_ids) - set(new_ids))
    changed = []
    
    # Compare old and new, logging those that have changed
    for old_dataset in old_datasets:
        ds_id = old_dataset['id']
        new_file = path + f'/{env}/{ds_id}.json'
        # Skip ids with no saved file (e.g. datasets that were removed)
        if not os.path.exists(new_file):
            continue
        with open(new_file) as f:
            new_dataset = json.load(f)
        
        if old_dataset != new_dataset:
            changed.append(ds_id)
    
    # Create metadata json
    with open(path + f'/{env}/_metadata.json', 'w') as f:
        
        meta = {
            'updatedAt': datetime.today().strftime('%Y-%m-%d@%Hh-%Mm-%Ss'),
            'env': env,
            'differences': {
                'changed': changed,
                'added': added,
                'removed': removed
            }
        }
        
        # And save it too!
        json.dump(meta,f)
        
print('Done!')

  0%|          | 0/15 [00:00<?, ?it/s]

Saving to path: ./layers/staging


100%|██████████| 15/15 [00:04<00:00,  3.33it/s]
  0%|          | 0/298 [00:00<?, ?it/s]

Saving to path: ./layers/production


100%|██████████| 298/298 [01:22<00:00,  3.63it/s]

Done!
CPU times: user 6.31 s, sys: 564 ms, total: 6.87 s
Wall time: 1min 29s
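The comparison logic in the cell above boils down to set differences on ids plus an equality check on datasets present in both snapshots. The sketch below shows that idea in isolation on in-memory dictionaries; `diff_snapshots` and the sample data are hypothetical and only illustrate the technique.

```python
def diff_snapshots(old, new):
    """Compare two {id: dataset_dict} mappings and report the differences."""
    old_ids, new_ids = set(old), set(new)
    return {
        'added': sorted(new_ids - old_ids),      # only in the new snapshot
        'removed': sorted(old_ids - new_ids),    # only in the old snapshot
        # present in both, but with different content
        'changed': sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


# Made-up sample data, for illustration only
old = {'a': {'id': 'a', 'name': 'Tree cover'}, 'b': {'id': 'b', 'name': 'Fires'}}
new = {'a': {'id': 'a', 'name': 'Tree cover 2019'}, 'c': {'id': 'c', 'name': 'Mining'}}
print(diff_snapshots(old, new))
# {'added': ['c'], 'removed': ['b'], 'changed': ['a']}
```

Restricting the `changed` check to ids found in both snapshots avoids looking up a dataset that no longer exists.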





In [None]:
# Generate rich metadata

metadata = []
for env in envs:
    with open(path + f'/{env}/_metadata.json') as f:
        metadata.append(json.load(f))
        
for env_meta in metadata:
    for change_type, ds_list in env_meta['differences'].items():
        tmp = []
        for dataset in ds_list:
            # generate Dataset entity to get name etc...
            tmp.append(str(lmi.Dataset(dataset)))
        env_meta['differences'][change_type] = tmp
        
with open(path + '/metadata.json', 'w') as f:
    # And save it too!
    json.dump(metadata, f)

In [None]:
pprint(metadata)