# Sentinel-2 Image Processing 

This notebook contains the script that generates the training data.

### About Informal Settlement Dataset
The Informal Settlement Dataset was received from iMMAP on March 5, 2020. It contains ground-validated locations of informal migrant settlements in Northern Colombia. Through visual interpretation, we generated ground-truth polygons of the informal settlements. This notebook converts these vector polygons (stored as GeoPackage files) into raster masks.

### About Sentinel-2 Imagery

SENTINEL-2 is a wide-swath, high-resolution, multi-spectral imaging mission, supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas ([Source](https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/overview)). 

**Note**:
- For 2016 and 2017, we obtained Level-1C (L1C) Sentinel-2 imagery.
- For 2018 to 2020, we obtained Level-2A (L2A) Sentinel-2 imagery.
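L1C products contain top-of-atmosphere reflectance, while L2A products contain atmospherically corrected bottom-of-atmosphere (surface) reflectance, so pixel values from the two periods are not strictly comparable. Both levels are distributed as integer digital numbers (DN) that are divided by a quantification value of 10000 to obtain reflectance. The sketch below shows that conversion. The function name `dn_to_reflectance` is ours, and the sketch assumes the exported GeoTIFFs still hold raw DN, which this notebook does not confirm. Products from processing baseline 04.00 onwards (January 2022) also apply an additive offset of -1000, exposed here as the `offset` parameter.

```python
import numpy as np

# Assumption: bands hold ESA digital numbers (DN) with a quantification value
# of 10000. Baseline 04.00+ products also need offset=-1000. Neither value is
# confirmed for this project's exported GeoTIFFs.
QUANTIFICATION_VALUE = 10000.0


def dn_to_reflectance(dn, offset=0.0, quantification=QUANTIFICATION_VALUE):
    """Convert Sentinel-2 L1C (TOA) or L2A (BOA) digital numbers to reflectance."""
    dn = np.asarray(dn, dtype=np.float32)
    return (dn + offset) / quantification


# Toy 2 x 2 band of digital numbers
band = np.array([[1200, 2500], [0, 10000]], dtype=np.uint16)
print(dn_to_reflectance(band))
```

Converting to reflectance puts both product levels on the same 0 to 1 scale, but it does not remove the atmospheric-correction difference between them.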

## Imports and Setup

In [1]:
```python
# Standard library
import os
import sys
import operator
import logging
import warnings

# Third-party
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterio as rio
import matplotlib.pyplot as plt
from tqdm import tqdm

# Project utilities live in ../utils
sys.path.insert(0, '../utils')
import geoutils

# Treat +/-inf as NA in pandas (this option is deprecated as of pandas 2.1)
pd.set_option('mode.use_inf_as_na', True)

# Silence non-error log messages and warnings
logging.getLogger().setLevel(logging.ERROR)
warnings.filterwarnings("ignore")

%matplotlib inline
%load_ext autoreload
%autoreload 2
```

## File Locations

In [2]:
```python
data_dir = "../data/"
images_dir = data_dir + 'images/'
indices_dir = data_dir + 'indices/'
pos_mask_dir = data_dir + 'pos_masks/'
neg_mask_dir = data_dir + 'neg_masks/'

# Create the data directories if they do not exist yet
for directory in [data_dir, images_dir, indices_dir, pos_mask_dir, neg_mask_dir]:
    os.makedirs(directory, exist_ok=True)

areas = ['maicao', 'riohacha', 'uribia', 'arauca', 'cucuta', 'tibu', 'arauquita', 'soacha', 'bogota']
```

## Download Files from GCS

In [3]:
```python
# Uncomment the lines below to download the inputs from Google Cloud Storage
#!gsutil -q -m cp gs://immap-images/20200501/*.tif {images_dir}
#!gsutil -m cp gs://immap-indices/20200421/*.tif {indices_dir}
#!gsutil -m cp gs://immap-masks-pos/20200421/*.gpkg {pos_mask_dir}
#!gsutil -m cp gs://immap-masks-neg/20200421/*.gpkg {neg_mask_dir}
print('Operations completed.')
```

Operations completed.


## Area Filepath Dictionary
The following cell returns a dictionary containing, for each area, the filepaths of its images, derived indices, and positive and negative masks.

In [4]:
```python
area_dict = geoutils.get_filepaths(areas, images_dir, indices_dir, pos_mask_dir, neg_mask_dir)
print("Image filepaths for Bogota:")
area_dict['bogota']
```

```
Image filepaths for Bogota:

{'pos_mask_gpkg': '../data/pos_masks/bogota_pos.gpkg',
 'neg_mask_gpkg': '../data/neg_masks/bogota_neg.gpkg',
 'images': ['../data/images/bogota_2015-2016.tif',
  '../data/images/bogota_2017-2018.tif',
  '../data/images/bogota_2019-2020.tif'],
 'indices': ['../data/indices/indices_bogota_2015-2016.tif',
  '../data/indices/indices_bogota_2017-2018.tif',
  '../data/indices/indices_bogota_2019-2020.tif']}
```
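The naming conventions visible in this output suggest how such a dictionary can be assembled. The following is a hypothetical stand-in for `geoutils.get_filepaths`, not its actual implementation; `build_area_paths` is our own name, and it assumes images are named `<area>_<period>.tif`, indices `indices_<area>_<period>.tif`, and masks `<area>_pos.gpkg` and `<area>_neg.gpkg`.

```python
import glob
import os


def build_area_paths(areas, images_dir, indices_dir, pos_mask_dir, neg_mask_dir):
    """Hypothetical sketch of an area -> filepaths dictionary builder.

    Assumes the file naming conventions shown in the notebook output.
    """
    area_dict = {}
    for area in areas:
        area_dict[area] = {
            'pos_mask_gpkg': os.path.join(pos_mask_dir, f'{area}_pos.gpkg'),
            'neg_mask_gpkg': os.path.join(neg_mask_dir, f'{area}_neg.gpkg'),
            # sorted() orders the periods chronologically (2015-2016, 2017-2018, ...)
            'images': sorted(glob.glob(os.path.join(images_dir, f'{area}_*.tif'))),
            'indices': sorted(glob.glob(os.path.join(indices_dir, f'indices_{area}_*.tif'))),
        }
    return area_dict
```

Sorting the globbed paths keeps the periods in chronological order because period strings such as `2015-2016` and `2017-2018` sort lexicographically.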

In [6]:
```python
# Upload the positive and negative mask GeoPackages to GCS
for area in area_dict:
    pos_mask = area_dict[area]['pos_mask_gpkg']
    neg_mask = area_dict[area]['neg_mask_gpkg']
    !gsutil -q -m cp {pos_mask} gs://immap-masks-pos/20200504/
    !gsutil -q -m cp {neg_mask} gs://immap-masks-neg/20200504/
```

## Generate TIFF Files for Indices
The following cell generates a TIFF file of derived indices for each image. Skip it if the indices have already been generated.

In [5]:
```python
# areas[3:] skips maicao, riohacha and uribia; use `areas` to process every area
for area in areas[3:]:
    area_dict = geoutils.write_indices(area_dict, area, indices_dir)
```

```
100%|██████████| 3/3 [23:35<00:00, 471.71s/it]
100%|██████████| 3/3 [07:07<00:00, 142.62s/it]
100%|██████████| 3/3 [17:46<00:00, 355.36s/it]
100%|██████████| 3/3 [19:48<00:00, 396.28s/it]
100%|██████████| 3/3 [01:11<00:00, 23.95s/it]
100%|██████████| 3/3 [10:53<00:00, 217.88s/it]
```
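The notebook does not state which indices `geoutils.write_indices` computes. Many common spectral indices are normalized differences of two bands; for example, NDVI uses Sentinel-2 B8 (near-infrared) and B4 (red), and NDBI uses B11 (shortwave infrared) and B8. A minimal numpy sketch of such an index follows, with `normalized_difference` as an illustrative name and toy reflectance values rather than real pixels.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b), returning NaN where a + b == 0."""
    a = np.asarray(band_a, dtype=np.float64)
    b = np.asarray(band_b, dtype=np.float64)
    total = a + b
    # Pixels with a zero denominator stay NaN instead of raising warnings
    out = np.full_like(total, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


# Toy reflectances for B8 (NIR) and B4 (red)
nir = np.array([[0.45, 0.30], [0.0, 0.25]])
red = np.array([[0.05, 0.10], [0.0, 0.25]])
ndvi = normalized_difference(nir, red)
print(ndvi)
```

Returning NaN for empty pixels (both bands zero) keeps them distinguishable from genuine zero-valued index pixels.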


## Generate Target Raster Masks
The following cells rasterize the vector GPKG files into TIFF masks for both positive samples (new informal settlements) and negative samples (formal settlements and unoccupied land).
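Converting polygons to a mask amounts to burning a class value into every pixel whose centre falls inside a polygon. In practice this is usually done with `rasterio.features.rasterize`; we have not checked how `geoutils.get_pos_raster_mask` does it. The pure-numpy sketch below shows the underlying idea using even-odd ray casting. It handles a single polygon without holes, and its simplified `transform` tuple is an assumption standing in for a rasterio `Affine` transform.

```python
import numpy as np


def rasterize_polygon(vertices, transform, shape, value=1):
    """Burn one polygon into a uint8 mask by testing each pixel centre.

    vertices:  sequence of (x, y) map coordinates (exterior ring, no holes).
    transform: (x_origin, pixel_width, y_origin, pixel_height) for a north-up
               grid whose origin is the top-left corner (simplified assumption).
    shape:     (rows, cols) of the output mask.
    """
    x0, dx, y0, dy = transform
    rows, cols = shape
    # Map coordinates of every pixel centre
    xs = x0 + (np.arange(cols) + 0.5) * dx
    ys = y0 - (np.arange(rows) + 0.5) * dy
    px, py = np.meshgrid(xs, ys)

    inside = np.zeros(shape, dtype=bool)
    verts = np.asarray(vertices, dtype=float)
    # Even-odd rule: toggle 'inside' each time a ray cast from the pixel
    # centre towards +x crosses a polygon edge
    for (xa, ya), (xb, yb) in zip(verts, np.roll(verts, -1, axis=0)):
        crosses = (ya > py) != (yb > py)
        with np.errstate(divide='ignore', invalid='ignore'):
            x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
        inside ^= crosses & (px < x_cross)

    mask = np.zeros(shape, dtype=np.uint8)
    mask[inside] = value
    return mask


# A 2 x 2 square on a 4 x 4 grid of 1-unit pixels with origin (0, 4)
mask = rasterize_polygon([(1, 1), (3, 1), (3, 3), (1, 3)],
                         transform=(0.0, 1.0, 4.0, 1.0), shape=(4, 4))
print(mask)
```

On this example only the two central rows and columns are burned in, because only their pixel centres lie inside the square.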

### Positive Labels: Informal Settlements

In [7]:
```python
area_dict = geoutils.get_pos_raster_mask(area_dict)
for area in areas:
    print("Raster filepath for {}: {}".format(area, area_dict[area]['pos_mask_tiff']))
```

```
Raster filepath for maicao: ../data/pos_masks/maicao_pos.tiff
Raster filepath for riohacha: ../data/pos_masks/riohacha_pos.tiff
Raster filepath for uribia: ../data/pos_masks/uribia_pos.tiff
Raster filepath for arauca: ../data/pos_masks/arauca_pos.tiff
Raster filepath for cucuta: ../data/pos_masks/cucuta_pos.tiff
Raster filepath for tibu: ../data/pos_masks/tibu_pos.tiff
Raster filepath for arauquita: ../data/pos_masks/arauquita_pos.tiff
Raster filepath for soacha: ../data/pos_masks/soacha_pos.tiff
Raster filepath for bogota: ../data/pos_masks/bogota_pos.tiff
```


### Negative Labels: Formal Settlements and Unoccupied Land

In [9]:
```python
area_dict, target_dict = geoutils.get_neg_raster_mask(area_dict)
print("Target value codes: {}".format(target_dict))
for area in areas:
    print("Raster filepath for {}: {}".format(area, area_dict[area]['neg_mask_tiff']))
```

```
Target value codes: {'Formal Settlement': 2, 'Unoccupied Land': 3, 'informal settlement': 1}
Raster filepath for maicao: ../data/neg_masks/maicao_neg.tiff
Raster filepath for riohacha: ../data/neg_masks/riohacha_neg.tiff
Raster filepath for uribia: ../data/neg_masks/uribia_neg.tiff
Raster filepath for arauca: ../data/neg_masks/arauca_neg.tiff
Raster filepath for cucuta: ../data/neg_masks/cucuta_neg.tiff
Raster filepath for tibu: ../data/neg_masks/tibu_neg.tiff
Raster filepath for arauquita: ../data/neg_masks/arauquita_neg.tiff
Raster filepath for soacha: ../data/neg_masks/soacha_neg.tiff
Raster filepath for bogota: ../data/neg_masks/bogota_neg.tiff
```
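With the target codes known, one quick way to check class balance before building the training set is to count pixels per code. In this sketch, `count_pixels_per_class` is a hypothetical helper, 0 is assumed to mean unlabelled, and the mask is a toy array; a real mask could be read with `rio.open(path).read(1)`.

```python
import numpy as np


def count_pixels_per_class(mask, codes):
    """Count labelled pixels per class name in a mask raster.

    Assumes 0 marks unlabelled pixels; unknown codes are reported as 'code N'.
    """
    values, counts = np.unique(mask, return_counts=True)
    code_to_name = {code: name for name, code in codes.items()}
    return {code_to_name.get(int(v), f'code {int(v)}'): int(c)
            for v, c in zip(values, counts) if v != 0}


# Codes copied from the output above; the mask itself is a toy example
example_codes = {'Formal Settlement': 2, 'Unoccupied Land': 3, 'informal settlement': 1}
toy_mask = np.array([[0, 2, 2], [3, 0, 2], [1, 1, 0]], dtype=np.uint8)
print(count_pixels_per_class(toy_mask, example_codes))
```

For the toy mask this reports 2 informal-settlement, 3 formal-settlement and 1 unoccupied-land pixels.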


## Generate Training Set

In [None]:
```python
data, area_code = geoutils.generate_training_data(area_dict)
print('Area code: {}'.format(area_code))
print('Data dimensions: {}'.format(data.shape))
data.head(3)
```
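`geoutils.generate_training_data` is not shown here, but a tabular training set built from rasters commonly has one row per labelled pixel and one column per band or index. The hypothetical `stack_to_table` below sketches that reshaping for a single area, assuming the bands are already aligned with the label mask and that 0 marks unlabelled pixels; the column names `label` and `area` are our own.

```python
import numpy as np
import pandas as pd


def stack_to_table(bands, band_names, label_mask, area_code):
    """Flatten a (bands, rows, cols) stack into one row per labelled pixel.

    Hypothetical sketch: pixels where label_mask == 0 are dropped.
    """
    bands = np.asarray(bands)
    n_bands = bands.shape[0]
    if len(band_names) != n_bands:
        raise ValueError('expected one name per band')
    # (bands, rows, cols) -> (rows * cols, bands): one row per pixel, row-major
    table = pd.DataFrame(bands.reshape(n_bands, -1).T, columns=band_names)
    # ravel() uses the same row-major order, so labels line up with pixels
    table['label'] = np.asarray(label_mask).ravel()
    table['area'] = area_code
    return table[table['label'] != 0].reset_index(drop=True)


stack = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # toy B4 and B8 bands
labels = np.array([[0, 1], [2, 0]])
print(stack_to_table(stack, ['B4', 'B8'], labels, area_code=0))
```

Dropping unlabelled pixels early keeps the table small, since most pixels in a scene fall outside any labelled polygon.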

## Save and Upload Final Dataset

In [11]:
```python
output_file = data_dir + '20200504_dataset.csv'
data.to_csv(output_file, index=False)
```

In [12]:
```python
!gsutil -q -m cp {output_file} gs://immap-training/
```