<h1> <center> Exploiting Sentinel-1 imagery time series to detect grasslands in northern Brazil tropical plains</center> </h1>
<h3> <center> Part 1 - Data preparation </center> </h3>
<center> Arian Ferreira Carneiro </center>
<center>Willian Vieira de Oliveira </center>

---
[Tips from Sidnei] To justify the work:

- look for optical images
- show the limitations of these images and the importance of using radar images in the region

<b>Websites that might be useful: </b>

https://gee-python-api.readthedocs.io/en/latest/ee.html

   - Tips to work with large datasets in Pandas:
   
    https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c


   - Converting from javascript to python api
   
    https://gis.stackexchange.com/questions/336080/converting-map-from-javascript-api-to-python-api
    
    https://github.com/GreenInfo-Network/earthengine-prototyping/issues/6
    
    
   - GEE data in Pandas
    
    https://mygeoblog.com/2017/01/13/your-gee-data-in-pandas/
    
    https://mygeoblog.com/2017/10/06/from-gee-to-numpy-to-geotiff/
    
    
   - Others:
    
    https://www.linkedin.com/pulse/cloud-computing-land-cover-classification-part-1-how-jo%C3%A3o-otavio/
    
    
---

## Import required packages

In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
from osgeo import gdal, gdal_array, ogr
#from osgeo import osr
#import matplotlib.pyplot as plt
#import time

***

## Input parameters

### Image data cubes

In [2]:
# Directories for the image cubes
dir_NL = "DATA/ee_export/NL.tif"
dir_Ratio = "DATA/ee_export/Ratio.tif"
dir_RGI = "DATA/ee_export/RGI.tif"
dir_VH = "DATA/ee_export/VH.tif"
dir_VV = "DATA/ee_export/VV.tif"

dir_datacubes = [dir_NL, dir_Ratio, dir_RGI, dir_VH, dir_VV]

dir_output = "OUTPUT/"
filenames = ['NL', 'Ratio', 'RGI', 'VH', 'VV']

columns = ['2017-09-22', '2017-10-04','2017-10-16','2017-10-28','2017-11-09','2017-11-21','2017-12-03','2017-12-15',
           '2017-12-27','2018-01-08','2018-01-20','2018-02-01','2018-02-13','2018-02-25','2018-03-09','2018-03-21',
           '2018-04-02','2018-04-14','2018-04-26','2018-05-08','2018-05-20','2018-06-01','2018-06-13','2018-06-25',
           '2018-07-07','2018-07-19','2018-07-31','2018-08-12','2018-08-24','2018-09-05','2018-09-17']

# Would you like to write the dataframes to CSV files? ['YES', 'NO']
#write_CSV = 'NO'
write_CSV = 'YES'

### Samples

In [3]:
## Shapefiles with point locations of samples of different classes
shp_pasto = "DATA/ee_export/Biomas/2_pasto/8_centroid/pPasto.shp"
shp_floresta = "DATA/ee_export/Biomas/3_floresta/8_centroid/centroid_fl.shp"
shp_agricultura = "DATA/ee_export/Biomas/4_agricultura/8_centroid/cent_agri.shp"

dir_samples = [shp_pasto, shp_floresta, shp_agricultura]
filenames_samples = ['Pasto', 'Floresta', 'Agricultura']

***

## Procedure 1 - Generation of a dataframe that describes the time series of each pixel

We obtained one dataframe for each of the image data cubes.

First, we removed a 3-pixel-wide border around each data cube, because these border pixels contain null values. We read the data cubes as n-dimensional arrays and then flattened them, so that each row represents a pixel and each column holds the value observed in one band (date) of the time series. Finally, we converted each flattened array to a dataframe and wrote it to file.
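The border removal and flattening described above can be sketched on a small synthetic array, without opening any raster. The cube dimensions, the 1-pixel border and the names `cube`, `inner` and `df_toy` are assumptions made only for this illustration; the notebook itself uses a 3-pixel border and reads the window directly with GDAL.

```python
import numpy as np
import pandas as pd

# Toy cube with 2 bands (dates), 5 rows and 6 columns; the values are arbitrary.
n_bands, n_rows, n_cols = 2, 5, 6
cube = np.arange(n_bands * n_rows * n_cols, dtype=float).reshape(n_bands, n_rows, n_cols)

# Drop a border around the cube (the notebook uses 3 pixels; 1 keeps the toy small).
border = 1
inner = cube[:, border:n_rows - border, border:n_cols - border]

# Collapse rows x cols into a single pixel axis, then transpose so that
# each row is a pixel and each column is a band of the time series.
pixels = inner.reshape(n_bands, -1).T

toy_dates = ['2017-09-22', '2017-10-04']  # one label per band
df_toy = pd.DataFrame(pixels, columns=toy_dates)
print(df_toy.shape)
```

The pixel order is row-major, the same order produced by calling `flatten()` on each band separately.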

In [4]:
# FUNCTIONS

def openImage(filepath):
    data = gdal.Open(filepath)
    return data

In [5]:
file_id = 0
df_datacubes = []

for filepath in dir_datacubes:
    print("Processing file: ", filepath)
    datacube = openImage(filepath)

    Nrows = datacube.RasterYSize - 6 # We do not consider border pixels. We removed both the first and the last three rows.
    Ncols = datacube.RasterXSize - 6 # We do not consider border pixels. We removed both the first and the last three columns.
    Nbands = datacube.RasterCount
    
    arr = datacube.ReadAsArray(3, 3, Ncols, Nrows) # xoff, yoff, xcount, ycount
    #print(arr.shape)
    
    df_list = []
    for band in range(0, Nbands):
        array = arr[band].flatten()
        df = pd.DataFrame(array, columns=[columns[band]])
        df_list.append(df)
       
    df_datacube = pd.concat(df_list, axis=1)    
    
    # Writing the dataframe to CSV file
    if (write_CSV == 'YES'):
        filename = dir_output+filenames[file_id]+'.csv'
        try:
            #df_datacube.to_csv(filename, sep=',', index=False, encoding='utf-8')
            # The 'utf-8-sig' encoding adds a byte-order mark, which helps Excel detect UTF-8 when opening the file.
            df_datacube.to_csv(filename, sep=',', index=False, encoding='utf-8-sig')
            print("The dataframe was written to file!")
        except Exception as e:
            print(str(e))
    
    df_datacubes.append(df_datacube)
    
    # Closing file
    datacube = None
    
    file_id = file_id + 1
    print("\n")

Processing file:  DATA/ee_export/NL.tif
The dataframe was written to file!


Processing file:  DATA/ee_export/Ratio.tif
The dataframe was written to file!


Processing file:  DATA/ee_export/RGI.tif
The dataframe was written to file!


Processing file:  DATA/ee_export/VH.tif
The dataframe was written to file!


Processing file:  DATA/ee_export/VV.tif
The dataframe was written to file!




***

## Procedure 2 - Generation of metrics for classification

We used the dataframes obtained previously to compute different metrics. For each pixel, the metrics are computed from the values observed in its time series.
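As an illustration of what these per-pixel metrics are, the sketch below computes them with NumPy operations along the time axis instead of a row-wise `apply`. The helper name `compute_metrics_vectorized` and the toy dataframe are hypothetical. It assumes the time series contain no missing values, since NumPy reductions, unlike the pandas ones, do not skip NaN.

```python
import numpy as np
import pandas as pd

def compute_metrics_vectorized(df):
    """Per-row time-series metrics (hypothetical helper, assumes no NaN values)."""
    values = df.to_numpy(dtype=float)
    metrics = pd.DataFrame(index=df.index)
    metrics['Mean'] = values.mean(axis=1)
    metrics['Std'] = values.std(axis=1, ddof=1)  # ddof=1 matches the pandas default
    metrics['Sum'] = values.sum(axis=1)
    metrics['Min'] = values.min(axis=1)
    metrics['Max'] = values.max(axis=1)
    metrics['Amplitude'] = metrics['Max'] - metrics['Min']
    metrics['CoefVariation'] = metrics['Std'] / metrics['Mean']
    # Largest absolute change between consecutive dates
    metrics['FirstSlope'] = np.abs(np.diff(values, axis=1)).max(axis=1)
    return metrics

# Two toy pixels observed on three dates
toy = pd.DataFrame([[1.0, 3.0, 2.0], [4.0, 4.0, 10.0]], columns=['t1', 't2', 't3'])
toy_metrics = compute_metrics_vectorized(toy)
print(toy_metrics)
```

On large data cubes this form is usually much faster than applying a lambda to every row, because each metric is computed in a single array operation.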

In [6]:
# FUNCTIONS

def ComputeMetrics(df):
    # Header of the new dataframe
    metrics_header = np.array(['Mean', 'Std', 'Sum', 'Min', 'Max', 'Amplitude', 'CoefVariation', 'FirstSlope'])
    
    # Dataframe, composed only by the header
    df_metrics = pd.DataFrame(columns=metrics_header)
    
    # Metrics
    df_metrics['Mean'] = df.apply(lambda row : row.mean(), axis = 1)
    df_metrics['Std'] = df.apply(lambda row : row.std(), axis = 1)
    df_metrics['Sum'] = df.apply(lambda row : row.sum(), axis = 1)
    df_metrics['Min'] = df.apply(lambda row : row.min(), axis = 1)
    df_metrics['Max'] = df.apply(lambda row : row.max(), axis = 1)
    df_metrics['Amplitude'] = df.apply(lambda row : row.max()-row.min(), axis = 1)
    df_metrics['CoefVariation'] = df.apply(lambda row : row.std()/row.mean(), axis = 1)
    df_metrics['FirstSlope'] = df.apply(lambda row : np.max(abs(np.diff(row))), axis = 1) # First slope maximum
        
    return df_metrics

In [7]:
file_id = 0
list_df_metrics = []

for df in df_datacubes:
    print("Processing metrics for: ", filenames[file_id])
    df_metrics = ComputeMetrics(df)
    
    # Writing the dataframe to CSV file
    if (write_CSV == 'YES'):
        filename = dir_output+filenames[file_id]+'_metrics'+'.csv'
        try:
            #df_metrics.to_csv(filename, sep=',', index=False, encoding='utf-8')
            # The 'utf-8-sig' encoding adds a byte-order mark, which helps Excel detect UTF-8 when opening the file.
            df_metrics.to_csv(filename, sep=',', index=False, encoding='utf-8-sig')
            print("The dataframe was written to file!")
        except Exception as e:
            print(str(e))
    
    list_df_metrics.append(df_metrics)
    
    file_id = file_id + 1
    print("\n")

Processing metrics for:  NL
The dataframe was written to file!


Processing metrics for:  Ratio
The dataframe was written to file!


Processing metrics for:  RGI
The dataframe was written to file!


Processing metrics for:  VH
The dataframe was written to file!


Processing metrics for:  VV
The dataframe was written to file!




***

## Procedure 3 - Definition of sample sets

We used the function 'ExtractSamples' to extract the time series of the pixels identified by the point locations stored in the shapefiles. We applied this procedure to each of the data cubes.
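The key step in this extraction is converting a point's map coordinates into a pixel column and row using the raster geotransform. The sketch below isolates that step. The helper name `map_to_pixel` and the geotransform values are hypothetical, and the conversion is only valid for north-up rasters with no rotation terms.

```python
def map_to_pixel(mx, my, gt):
    """Convert map coordinates to (column, row) for a north-up GDAL geotransform."""
    # gt = (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not supported by this sketch")
    px = int((mx - gt[0]) / gt[1])  # column index
    py = int((my - gt[3]) / gt[5])  # row index (pixel_height is negative for north-up)
    return px, py

# Hypothetical geotransform: top-left corner at (500000, 9000000), 10 m pixels
toy_gt = (500000.0, 10.0, 0.0, 9000000.0, 0.0, -10.0)
px, py = map_to_pixel(500125.0, 8999975.0, toy_gt)
print(px, py)
```

Points that fall outside the raster extent would need a bounds check before reading, which this sketch omits.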

In [9]:
def ExtractSamples(raster, shp):
    
    #Define header
    #header=['X', 'Y'] # In case we desire to include the coordinates in the dataframe
    header_dates = ['2017-09-22', '2017-10-04','2017-10-16','2017-10-28','2017-11-09','2017-11-21','2017-12-03','2017-12-15',
           '2017-12-27','2018-01-08','2018-01-20','2018-02-01','2018-02-13','2018-02-25','2018-03-09','2018-03-21',
           '2018-04-02','2018-04-14','2018-04-26','2018-05-08','2018-05-20','2018-06-01','2018-06-13','2018-06-25',
           '2018-07-07','2018-07-19','2018-07-31','2018-08-12','2018-08-24','2018-09-05','2018-09-17']
    
    #create dataframe
    #df = pd.DataFrame(columns=header+header_dates)
    df = pd.DataFrame(columns=header_dates)
    
    #df_temp = pd.DataFrame(np.nan, index=range(1), columns=header+header_dates)
    df_temp = pd.DataFrame(np.nan, index=range(1), columns=header_dates)
    
    src_ds = gdal.Open(raster)
    gt = src_ds.GetGeoTransform()
    
    Nbands = src_ds.RasterCount
      
    ds = ogr.Open(shp)
    lyr = ds.GetLayer()
    
    row = 0
    for feat in lyr:
        geom = feat.GetGeometryRef()
        mx, my = geom.GetX(), geom.GetY() # coord in map units
        
        # Convert from map to pixel coordinates
        # Only works for geotransforms with no rotation
        px = int((mx - gt[0]) / gt[1]) # x pixel
        py = int((my - gt[3]) / gt[5]) # y pixel
        
        #df_temp['X'].loc[0] = float(mx)
        #df_temp['Y'].loc[0] = float(my)
        
        for band in range(0, Nbands):
            rb = src_ds.GetRasterBand(band+1)
            intval = rb.ReadAsArray(px, py, 1, 1)
            
            df_temp.loc[0, header_dates[band]] = float(intval[0, 0]) # pixel value cast to float; .loc[row, col] avoids chained assignment
        
        df.loc[row] = df_temp.loc[0]
        row = row + 1
        
    # Closing files
    src_ds = None
    ds = None
    
    return df

We used the function defined above to extract pixel values at several point locations and compose the sample sets for the classes analysed in this study. This procedure yields one dataframe of samples for each class.

In [10]:
list_df_samples = []

raster_id = 0
for raster in dir_datacubes:
    file_id = 0
    print("Defining samples for raster: ", filenames[raster_id])
    
    for shp in dir_samples:
        print("  Class: ", filenames_samples[file_id])
        
        # Extracting sample values
        df = ExtractSamples(raster, shp)
        
        # Writing the dataframe to file
        if (write_CSV == 'YES'):
            filename = dir_output+filenames[raster_id]+'_Samples_'+filenames_samples[file_id]+'.csv'
        
            try:
                #df.to_csv(filename, sep=',', index=False, encoding='utf-8')
                # The 'utf-8-sig' encoding adds a byte-order mark, which helps Excel detect UTF-8 when opening the file.
                df.to_csv(filename, sep=',', index=False, encoding='utf-8-sig')
                print("    The dataframe was written to file!")
            except Exception as e:
                print(str(e))

        list_df_samples.append(df)
        
        file_id = file_id + 1
    raster_id = raster_id + 1
    print("\n")

Defining samples for raster:  NL
  Class:  Pasto
    The dataframe was written to file!
  Class:  Floresta
    The dataframe was written to file!
  Class:  Agricultura
    The dataframe was written to file!


Defining samples for raster:  Ratio
  Class:  Pasto
    The dataframe was written to file!
  Class:  Floresta
    The dataframe was written to file!
  Class:  Agricultura
    The dataframe was written to file!


Defining samples for raster:  RGI
  Class:  Pasto
    The dataframe was written to file!
  Class:  Floresta
    The dataframe was written to file!
  Class:  Agricultura
    The dataframe was written to file!


Defining samples for raster:  VH
  Class:  Pasto
    The dataframe was written to file!
  Class:  Floresta
    The dataframe was written to file!
  Class:  Agricultura
    The dataframe was written to file!


Defining samples for raster:  VV
  Class:  Pasto
    The dataframe was written to file!
  Class:  Floresta
    The dataframe was written to file!
  Class:  Agri

### Extracting metrics for all sample sets
After obtaining the sample sets, we computed for them the same metrics that we computed for the data cubes (Procedure 2).

In [11]:
list_df_samples_metrics = []

i = 0
raster_id = 0
for raster in dir_datacubes:
    file_id = 0
    print("Processing metrics for: ", filenames[raster_id])
    
    for shp in dir_samples:
        print("  Class: ", filenames_samples[file_id])
        df_metrics = ComputeMetrics(list_df_samples[i])

        # Writing the dataframe to CSV file
        if (write_CSV == 'YES'):
            filename = dir_output+filenames[raster_id]+'_Samples_'+filenames_samples[file_id]+'_metrics'+'.csv'
            
            try:
                #df_metrics.to_csv(filename, sep=',', index=False, encoding='utf-8')
                # The 'utf-8-sig' encoding adds a byte-order mark, which helps Excel detect UTF-8 when opening the file.
                df_metrics.to_csv(filename, sep=',', index=False, encoding='utf-8-sig')
                print("The dataframe was written to file!")
            except Exception as e:
                print(str(e))

        list_df_samples_metrics.append(df_metrics)

        i = i + 1
        file_id = file_id + 1
    raster_id = raster_id + 1
    print("\n")

Processing metrics for:  NL
  Class:  Pasto
The dataframe was written to file!
  Class:  Floresta
The dataframe was written to file!
  Class:  Agricultura
The dataframe was written to file!


Processing metrics for:  Ratio
  Class:  Pasto
The dataframe was written to file!
  Class:  Floresta
The dataframe was written to file!
  Class:  Agricultura
The dataframe was written to file!


Processing metrics for:  RGI
  Class:  Pasto
The dataframe was written to file!
  Class:  Floresta
The dataframe was written to file!
  Class:  Agricultura
The dataframe was written to file!


Processing metrics for:  VH
  Class:  Pasto
The dataframe was written to file!
  Class:  Floresta
The dataframe was written to file!
  Class:  Agricultura
The dataframe was written to file!


Processing metrics for:  VV
  Class:  Pasto
The dataframe was written to file!
  Class:  Floresta
The dataframe was written to file!
  Class:  Agricultura
The dataframe was written to file!




***

## Procedure 4 - Preparation of the files required for classification

The classification method implemented in this study requires the samples of all classes to be described in a single dataframe. Therefore, we concatenated the dataframes obtained in Procedure 3. In addition, we added a new column to these dataframes to identify the class associated with each pixel/sample in the final dataframe.

We performed this procedure to obtain three different dataframes. The first dataframe presents the time series of each sample as pixel values, while the second describes the time series of each sample using the metrics. The third dataframe presents the class identifiers associated with the samples described in the other two dataframes.

In the classification stage, we will use these dataframes to perform two different classifications, using the pixel values and the metrics.

### Class identifiers

We used the following numbers to identify the analysed classes.
    
    1 - Agricultura
    2 - Floresta
    3 - Pasto
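A minimal sketch of this labelling and concatenation step, assuming the identifiers listed above. The dictionary `class_ids` and the toy sample values are made up for the illustration and do not come from the study data.

```python
import pandas as pd

# Class identifiers as listed above
class_ids = {'Agricultura': 1, 'Floresta': 2, 'Pasto': 3}

# Toy sample sets, one dataframe per class (values are invented)
toy_samples = {
    'Pasto': pd.DataFrame({'2017-09-22': [0.1, 0.2], '2017-10-04': [0.3, 0.4]}),
    'Floresta': pd.DataFrame({'2017-09-22': [0.5], '2017-10-04': [0.6]}),
    'Agricultura': pd.DataFrame({'2017-09-22': [0.7], '2017-10-04': [0.8]}),
}

# Add the 'Class' column to a copy of each sample set, then stack them
labelled = [df.assign(Class=class_ids[name]) for name, df in toy_samples.items()]
df_all = pd.concat(labelled, ignore_index=True)

# Class identifiers alone, aligned row by row with df_all
df_classes = df_all[['Class']]
print(df_all)
```

Using a dictionary lookup keeps the class mapping in one place, and `assign` returns a labelled copy, so the original sample dataframes are left unchanged.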

In [13]:
i = 0
raster_id = 0
for raster in dir_datacubes:
    file_id = 0
    print("Processing pixel values and metrics for: ", filenames[raster_id])
    
    List_metrics = []
    List_pixelValues = []
    List_classes = []
    for shp in dir_samples:
        print("  Class: ", filenames_samples[file_id])
        
        if (filenames_samples[file_id] == 'Agricultura'):
            list_df_samples[i]['Class'] = 1
            list_df_samples_metrics[i]['Class'] = 1
        elif (filenames_samples[file_id] == 'Floresta'):
            list_df_samples[i]['Class'] = 2
            list_df_samples_metrics[i]['Class'] = 2
        elif (filenames_samples[file_id] == 'Pasto'):
            list_df_samples[i]['Class'] = 3
            list_df_samples_metrics[i]['Class'] = 3
        else: # Only to avoid errors. This condition is unlikely to be required
            list_df_samples[i]['Class'] = 0
            list_df_samples_metrics[i]['Class'] = 0
        
        List_pixelValues.append(list_df_samples[i])
        List_metrics.append(list_df_samples_metrics[i])
        List_classes.append(list_df_samples_metrics[i][['Class']])
        
        i = i + 1
        file_id = file_id + 1
        
    df_Samples_all_pixelValues = pd.concat(List_pixelValues, ignore_index=True)
    df_Samples_all_metrics = pd.concat(List_metrics, ignore_index=True)
    df_Samples_all_classes = pd.concat(List_classes, ignore_index=True)

    # Writing the dataframe to CSV file
    if (write_CSV == 'YES'):
        filename_pValues = dir_output+filenames[raster_id]+'_AllSamples_pValues'+'.csv'
        filename_metrics = dir_output+filenames[raster_id]+'_AllSamples_metrics'+'.csv'
        filename_classes = dir_output+filenames[raster_id]+'_AllSamples_classes'+'.csv'

        try:
            df_Samples_all_pixelValues.to_csv(filename_pValues, sep=',', index=False, encoding='utf-8-sig')
            df_Samples_all_metrics.to_csv(filename_metrics, sep=',', index=False, encoding='utf-8-sig')
            df_Samples_all_classes.to_csv(filename_classes, sep=',', index=False, encoding='utf-8-sig')
            print("    The dataframes were written to file!")
        except Exception as e:
            print(str(e))
    
    raster_id = raster_id + 1
    print("\n")

Processing pixel values and metrics for:  NL
  Class:  Pasto
  Class:  Floresta
  Class:  Agricultura
    The dataframes were written to file!


Processing pixel values and metrics for:  Ratio
  Class:  Pasto
  Class:  Floresta
  Class:  Agricultura
    The dataframes were written to file!


Processing pixel values and metrics for:  RGI
  Class:  Pasto
  Class:  Floresta
  Class:  Agricultura
    The dataframes were written to file!


Processing pixel values and metrics for:  VH
  Class:  Pasto
  Class:  Floresta
  Class:  Agricultura
    The dataframes were written to file!


Processing pixel values and metrics for:  VV
  Class:  Pasto
  Class:  Floresta
  Class:  Agricultura
    The dataframes were written to file!


