## **The Training Data**
<p>The Landslide4Sense dataset consists of training, validation, and test sets containing 3799, 245, and 800 image patches, respectively. Each image patch is a composite of 14 bands:

> **Multispectral data from Sentinel-2: B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11, B12**.
>
> **Slope data from ALOS PALSAR: B13**.
>
> **Digital elevation model (DEM) from ALOS PALSAR: B14**.

All bands in the competition dataset are resized to a resolution of ~10 m per pixel. Each image patch is 128 x 128 pixels and is labeled pixel-wise.</p>
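The band layout above can be sketched with plain NumPy slicing. This is a minimal illustration on a synthetic patch, assuming the bands are stored channels-last in the order B1 to B14; the helper `band` is hypothetical and not part of the dataset tooling. In Sentinel-2, B4, B3, and B2 are the red, green, and blue bands.

```python
import numpy as np

# Synthetic stand-in for one patch, channels last: (height, width, bands).
patch = np.random.default_rng(0).random((128, 128, 14), dtype=np.float32)


def band(patch, k):
    """Return band Bk, assuming Bk sits at zero-based index k - 1."""
    return patch[..., k - 1]


# Sentinel-2 true-colour composite: B4 (red), B3 (green), B2 (blue).
rgb = np.stack([band(patch, 4), band(patch, 3), band(patch, 2)], axis=-1)
slope = band(patch, 13)  # slope from ALOS PALSAR
dem = band(patch, 14)    # elevation from ALOS PALSAR

print(rgb.shape, slope.shape, dem.shape)
```

Stacking on the last axis keeps the channels-last layout, so `rgb` has the (128, 128, 3) shape used in the pre-processing steps.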

## **The Testing Data**
<p>The testing data consists of UAV imagery of the Van Yen and Mu Cang Chai districts:
    
> **High resolution (~0.17 m)**.
>
> **3 bands - Red, Green, Blue**.</p>
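The UAV imagery is much finer than the ~10 m training patches, so a 128 x 128 window covers a very different ground area in each. A quick arithmetic sketch, using only the resolutions stated above and assuming that ~0.17 m is the size of one pixel (the variable names are illustrative):

```python
PATCH_SIZE_PX = 128
TRAIN_RES_M = 10.0  # ~10 m per pixel (training data)
UAV_RES_M = 0.17    # ~0.17 m per pixel (assumed meaning of the UAV resolution)

# Side length on the ground covered by one 128-pixel patch.
train_footprint_m = PATCH_SIZE_PX * TRAIN_RES_M
uav_footprint_m = PATCH_SIZE_PX * UAV_RES_M

# How many UAV pixels fit along one side of a training pixel.
scale_ratio = TRAIN_RES_M / UAV_RES_M

print(f"training patch side: {train_footprint_m:.0f} m")
print(f"UAV patch side: {uav_footprint_m:.2f} m")
print(f"one training pixel spans about {scale_ratio:.0f} UAV pixels per side")
```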

# **Pre-processing**
<p>
    This step prepares the data for training. The way the data are stored changes, but each patch keeps the shape (128, 128, 3). Patches can also be flipped, rotated, or trimmed to augment the training set, which helps the model recognize landslides under more varied conditions.
</p>

In [1]:
# set environment variables
import os
from utils import *

%set_env LOCAL_TRAINDATA_DIR=Data/TrainData
%set_env LOCAL_VALIDDATA_DIR=Data/ValidData
%set_env LOCAL_PARENT_DIR=Input
%set_env LOCAL_PREDICT_SET_DIR=Predict_Set

# set paths for training images, masks, validation images, the prediction set, and the parent directory
image_dir=os.path.join(os.getenv('LOCAL_TRAINDATA_DIR'), 'img')
mask_dir=os.path.join(os.getenv('LOCAL_TRAINDATA_DIR'), 'mask')
val_dir = os.path.join(os.getenv('LOCAL_VALIDDATA_DIR'), 'img')
pred_dir = os.path.join(os.getenv('LOCAL_PREDICT_SET_DIR'),'')
parent_dir = os.path.join(os.getenv('LOCAL_PARENT_DIR'), '')

env: LOCAL_TRAINDATA_DIR=Data/TrainData
env: LOCAL_VALIDDATA_DIR=Data/ValidData
env: LOCAL_PARENT_DIR=Input
env: LOCAL_PREDICT_SET_DIR=Predict_Set
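`%set_env` is an IPython magic, so it only works inside a notebook or IPython session. A minimal sketch of the same setup in a plain Python script, assuming the same directory values; `data_path` is a hypothetical helper introduced here for illustration:

```python
import os

# Same values as the %set_env lines in the cell above.
os.environ["LOCAL_TRAINDATA_DIR"] = "Data/TrainData"
os.environ["LOCAL_VALIDDATA_DIR"] = "Data/ValidData"
os.environ["LOCAL_PARENT_DIR"] = "Input"
os.environ["LOCAL_PREDICT_SET_DIR"] = "Predict_Set"


def data_path(env_name, *parts):
    """Join the directory stored in `env_name` with optional sub-paths.

    Raises KeyError if the variable is not set, instead of silently
    passing None to os.path.join as os.getenv would.
    """
    return os.path.join(os.environ[env_name], *parts)


image_dir = data_path("LOCAL_TRAINDATA_DIR", "img")
mask_dir = data_path("LOCAL_TRAINDATA_DIR", "mask")
val_dir = data_path("LOCAL_VALIDDATA_DIR", "img")

print(image_dir, mask_dir, val_dir)
```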


### **#1 - Finding data**

In [2]:
image_files, image_holder = pp.find_data(image_dir) # find training images

label_files, label_holder = pp.find_data(mask_dir) # find training labels
mask_holder = pp.h5label_to_array(label_holder) # extracting label array

val_files, val_holder = pp.find_data(val_dir) # find validating images

Files are found in folder Data/TrainData\img in 6.886787176132202 seconds
Files are found in folder Data/TrainData\mask in 16.099828720092773 seconds
Extracting...


Progress:   0%|          | 0/3799 [00:00<?, ?it/s]

Convert h5py to numpy array done in 2.25927472114563 seconds
Files are found in folder Data/ValidData\img in 1.3205757141113281 seconds
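The internals of `pp.find_data` are not shown here. As a rough sketch of the file-discovery part only, the hypothetical helper below lists `.h5` files in numeric order, assuming file names end in an index such as `image_1.h5` and `mask_1.h5`. Numeric ordering matters because a plain alphabetical sort places `image_10.h5` before `image_2.h5`, which can misalign images and masks.

```python
import os
import re


def list_h5_sorted(folder):
    """Return paths of the .h5 files in `folder`, ordered by trailing index."""

    def numeric_key(name):
        # Look for digits in the stem only, so the "5" in ".h5" is ignored.
        stem = os.path.splitext(name)[0]
        digits = re.findall(r"\d+", stem)
        return int(digits[-1]) if digits else -1

    names = [n for n in os.listdir(folder) if n.endswith(".h5")]
    return [os.path.join(folder, n) for n in sorted(names, key=numeric_key)]
```

Calling `list_h5_sorted(image_dir)` and `list_h5_sorted(mask_dir)` would then give two lists whose entries line up index by index, provided both folders use the same numbering.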


### **#2 - Normalize data**

In [3]:
image_holder_norms = pp.rgb_norm_data(image_holder, data_type='float32')
val_holder_norms = pp.rgb_norm_data(val_holder, data_type='float32')

Start normalization...


Normalization:   0%|          | 0/3799 [00:00<?, ?it/s]

Channel-goes-last (height, width, channel)
Free RAM space


Loading...:   0%|          | 0/3799 [00:00<?, ?it/s]

Normalization is processed in 243.88164925575256 seconds
Start normalization...


Normalization:   0%|          | 0/245 [00:00<?, ?it/s]

Channel-goes-last (height, width, channel)
Free RAM space


Loading...:   0%|          | 0/245 [00:00<?, ?it/s]

Normalization is processed in 15.383058786392212 seconds
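The exact scaling used by `pp.rgb_norm_data` is not shown above. One common choice is per-patch, per-channel min-max scaling of the RGB bands, sketched below. The function name `minmax_rgb`, the band indices, and the scaling scheme are all assumptions for illustration, not the notebook's actual method.

```python
import numpy as np


def minmax_rgb(patch, rgb_idx=(3, 2, 1), dtype="float32"):
    """Select RGB bands from a channels-last patch and scale each to [0, 1].

    rgb_idx assumes bands are ordered B1..B14, so B4, B3, B2 sit at
    zero-based indices 3, 2, 1.
    """
    rgb = patch[..., list(rgb_idx)].astype(dtype)
    lo = rgb.min(axis=(0, 1), keepdims=True)
    hi = rgb.max(axis=(0, 1), keepdims=True)
    # Use a span of 1 for constant channels to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1)
    return ((rgb - lo) / span).astype(dtype)


# Synthetic 14-band patch as a stand-in for real data.
demo_patch = np.random.default_rng(1).random((128, 128, 14)) * 1000
demo_norm = minmax_rgb(demo_patch)
print(demo_norm.shape, demo_norm.dtype)
```

Scaling each patch independently removes brightness differences between scenes, at the cost of losing absolute reflectance values; dataset-wide statistics are an alternative if those matter.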


### **#3 - Data Augmentation for Training dataset**

In [4]:
# flip data
flipped_images, flipped_masks = pp.flip_image(image_holder_norms, label_holder, axis_to_flip=None)
# rotate data by 30 degrees
rotated_images, rotated_masks = pp.rotate_image(image_holder_norms, label_holder, 30, reshape=False)
# trim data
trim_images, trim_masks = pp.trim_array(image_holder_norms, mask_holder, 128)

Flipping...


In Flipping loop:   0%|          | 0/3799 [00:00<?, ?it/s]

Flipping done in 2.0563392639160156 seconds
Rotating...


In Rotating loop:   0%|          | 0/3799 [00:00<?, ?it/s]

Rotating done in 109.38085913658142 seconds
Padding...


In Padding loop:   0%|          | 0/3799 [00:00<?, ?it/s]

Trimming...


In Trimming loop:   0%|          | 0/3799 [00:00<?, ?it/s]

Trimming done in 15.271093845367432 seconds
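Whatever augmentation is applied, the image and its mask must receive the same geometric transform so that labels stay aligned with pixels. The sketch below illustrates this with NumPy only, using flips and 90-degree rotations; `flip_pair` and `rot90_pair` are hypothetical helpers, not the `pp` functions used above, and treating `axis=None` as "flip both spatial axes" is an assumption. Arbitrary angles such as 30 degrees need interpolation (for example with `scipy.ndimage.rotate`), where nearest-neighbour interpolation is usually preferred for masks so labels stay binary.

```python
import numpy as np


def flip_pair(image, mask, axis=None):
    """Flip an (H, W, C) image and its (H, W) mask along the same axes.

    axis=None is treated here as "both spatial axes"; the channel axis
    of the image is never flipped.
    """
    axes = (0, 1) if axis is None else axis
    return np.flip(image, axis=axes), np.flip(mask, axis=axes)


def rot90_pair(image, mask, k=1):
    """Rotate image and mask by k * 90 degrees in the spatial plane."""
    return np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))


# Tiny example: a single labelled pixel in the top-left corner.
image = np.arange(2 * 2 * 3).reshape(2, 2, 3)
mask = np.array([[1, 0], [0, 0]])

flipped_image, flipped_mask = flip_pair(image, mask)
rotated_image, rotated_mask = rot90_pair(image, mask)
print(flipped_mask)
print(rotated_mask)
```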


### **#4 - Save to Numpy file**

In [5]:
# save arrays to .npy files
import os
import time
import numpy as np

os.chdir(os.getenv('LOCAL_PARENT_DIR')) # change to the output directory

init_time = time.time()


# stack the originals and their augmented copies into single arrays
# (the holders are assumed to be Python lists, so `+` concatenates them)
image_stack = np.stack(image_holder_norms + flipped_images + rotated_images + trim_images)
label_stack = np.stack(mask_holder + flipped_masks + rotated_masks + trim_masks)
valid_stack = np.stack(val_holder_norms) # stack the normalized validation patches

# save now
np.save('image_patch128.npy', image_stack)
np.save('mask_patch128.npy', label_stack)
np.save('valid_patch128.npy', valid_stack)

final_time = time.time()
tot_time = final_time - init_time
#----------------------------------------------------------------
print(f'Numpy files are created in {tot_time} seconds')


Numpy files are created in 57.7584433555603 seconds
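The stacking step relies on `+` joining Python lists. If the holders were NumPy arrays instead, `+` would add them element-wise and silently produce the wrong result. The toy sketch below shows the difference and a save/load round trip; the array sizes and the file name `image_patch_demo.npy` are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Toy lists of arrays standing in for the original and augmented patches.
originals = [np.zeros((4, 4, 3), dtype=np.float32) for _ in range(2)]
flipped = [np.ones((4, 4, 3), dtype=np.float32) for _ in range(2)]

# list + list concatenates, so np.stack sees all 4 patches.
stack_from_lists = np.stack(originals + flipped)

# array + array adds element-wise: still only 2 patches.
a, b = np.stack(originals), np.stack(flipped)
summed = a + b

# For arrays, np.concatenate along axis 0 gives the intended result.
joined = np.concatenate([a, b], axis=0)

# Round trip through a .npy file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image_patch_demo.npy")
    np.save(path, joined)
    reloaded = np.load(path)

print(stack_from_lists.shape, summed.shape, reloaded.shape)
```

Checking the first dimension of the stacked arrays before saving is a cheap way to confirm that the image and mask stacks contain the same number of patches.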
