<p><strong><font size="6">WALOUS</font></strong></p>

<p><strong><font size="6">Land Use Classification</font></strong></p>

<p><strong><font size="6">Validation</font></strong></p>

This python code implement the method developed by ANAGEO (ULB). 

Code developped on Linux Mint 18.1 (Ubuntu Xenial 16.04), PostgreSQL 9.6.3, PostGIS 2.4.4 (r16526), GRASS GIS 7.3.svn (r71315) and GDAL 1.10.1

## List of dependencies

- PostgreSQL installed on local computer or remote server, with PostGIS installed. A database should already have been created and postgis extension created on it.
- "shp2pgsql" program which should already be installed by postgis installation process. See [this quick guide](http://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg) for more information on the use of this program. 

# Table of Contents

<div id="toc"></div>

The following cell is a Javascript section of code for building the Jupyter notebook's table of content.

In [None]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

# Define working environment

**Import libraries**

In [None]:
# Import libraries needed for setting parameters of operating system 
import os
import sys
import csv
import tempfile
import glob

In [None]:
## Import Psycopg2 library (interection with postgres database)
import psycopg2
## Import Subprocess
import subprocess

In [None]:
## Import Pandas library (View and manipulaiton of tables)
import pandas as pd
pd.set_option('display.max_columns', 100)
import pandas.io.sql as sqlio

In [None]:
## Import multiprocessing and functools libraries
import multiprocessing
from multiprocessing import Pool
from functools import partial

**Add folder with SCR provided belong to this notebook**

In [None]:
# Add local module to the path
src = os.path.abspath('../SRC')
if src not in sys.path:
    sys.path.append(src)

**Setup environment variables**

Please edit the file in `../SRC/config.py`, containing the configuration parameters, according to your own computer setup. The following cell is used to run this file.



In [None]:
run ../SRC/config.py

In [None]:
print(config_parameters)

In [None]:
# Import functions that setup the environmental variables
import environ_variables as envi

In [None]:
# Set environmental variables
envi.setup_environmental_variables() 
# Display current environment variables of your computer
envi.print_environmental_variables()

**Other functions**

In [None]:
# Import functions for processing time information
import time
from processing_time import start_processing, print_processing_time
# Import function that check and create folder
from mkdir import check_create_dir

**Custom functions: Psycopg2 and Postgresql functions**

In [None]:
# Import function that display postgresql's table header
from display_header import display_header
# Import function to creation connection to Postgresql database 
from postgres_functions import create_pg_connexion
# Import function to creation of Postgresql schema 
from postgres_functions import create_pg_schema
# Import function to give rights to user on a specific schema
from postgres_functions import grant_user

In [None]:
# Import function that manage importation of a Shapefile into postgresql database
from postgres_import import shp2pgsql

# Create new directory for validation results

In [None]:
# Check and create folder if needed
check_create_dir(config_parameters['validationfolder'])

# Create new schema and import validation set

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Create new schema
create_pg_schema(con, 'validation', overwrite=False)
grant_user(con, 'validation', 'bbeaumont')
# Close connexion to postgres database
con.close()

In [None]:
# Import shapefile into postgis database
shp2pgsql(data['validation'], 'validation', config_parameters, from_srid='31370', to_srid='31370', 
          create_opt='-d', psql_stdout=True, quiet=True) 

# Create table with reference label and predicted label 

In [None]:
#from __main__ import *
import sys
import psycopg2
import time
from processing_time import start_processing, print_processing_time
   
def CreateTableReferencePredicted(con, ref_schema, ref_table, pred_schema, pred_table):
    '''
    Function to create table with both reference label and classification prediction  
    
    Args: 
    'con' 
    'result_table_schema' 
    'result_table_name' 
    
    Returns:
    
    '''
    try:
        # Time at starting
        begintime = time.time() 
        # Create cursor
        cursor = con.cursor()
        # Drop table if exists
        query = 'DROP TABLE IF EXISTS %s.valid_pred_ref;'%(ref_schema)
        print(query + "\n")
        cursor.execute(query)
        con.commit()
        # Subquery
        subquery = "SELECT a.geom, a.capakey, walousmajv, b.walousmaj, \
        left(a.walousmajv,1) as ref_l1, left(a.walousmajv,3) as ref_l2, \
        a.codesecond, a.certitude, a.proportion, \
        left(b.walousmaj,1) as pred_l1, left(b.walousmaj,3) as pred_l2 \
        FROM %s.%s AS a LEFT JOIN %s.%s AS b \
        ON a.capakey = b.capakey"%(ref_schema, ref_table, pred_schema, pred_table)
        # Create table
        query = 'CREATE TABLE %s.valid_pred_ref AS(%s);'%(ref_schema,subquery)
        print(query + "\n")
        cursor.execute(query)
        con.commit()
        # Add columns
        queries = []
        queries.append('ALTER TABLE %s.valid_pred_ref ADD COLUMN IF NOT EXISTS agreement_l1 text'%ref_schema)
        queries.append('ALTER TABLE %s.valid_pred_ref ADD COLUMN IF NOT EXISTS agreement_l2 text'%ref_schema)
        print(";\n".join(queries)+";\n")
        cursor.execute("; ".join(queries))
        con.commit() 
        # Update column
        queries = []
        queries.append("UPDATE %s.valid_pred_ref SET ref_l2 = NULL WHERE LENGTH(ref_l2) < 3"%ref_schema)
        queries.append("UPDATE %s.valid_pred_ref SET pred_l2 = NULL WHERE LENGTH(pred_l2) < 3"%ref_schema)
        queries.append("UPDATE %s.valid_pred_ref SET agreement_l1 = \
        CASE WHEN ref_l1 = pred_l1 THEN True ELSE False END"%ref_schema)
        queries.append("UPDATE %s.valid_pred_ref SET agreement_l2 = \
        CASE WHEN ref_l2 = pred_l2 THEN True ELSE \
        CASE WHEN ref_l2 IS NOT NULL AND pred_l2 IS NOT NULL THEN False ELSE NULL END END"%ref_schema)
        print(";\n".join(queries)+";\n")
        cursor.execute("; ".join(queries))
        con.commit()          
        ## Print processing time
        print(print_processing_time(begintime, "Creation of table with reference and prediction achieved in "))
    except (Exception, psycopg2.DatabaseError) as error:
        sys.exit(error)

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Create table with 
CreateTableReferencePredicted(con, 'validation', data['validation'][0], 'validation', 'walousmaj_stratif_l1_200pt')
# Close connexion to postgres database
con.close()

In [None]:
## Import libraries
import sys
import psycopg2
import time
import matplotlib.pyplot as plt
import itertools
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score

def GetAccuracyMeasure(con, schema, table, ref_column, pred_column, 
                       classes, output_folder, condition=None, weight=False):
    
    def plot_confusion_matrix(cm, classes,
                              normalize=False,
                              title='Confusion matrix',
                              cmap=plt.cm.Blues):
        """
        This function prints and plots the confusion matrix.
        Normalization can be applied by setting `normalize=True`.
        """
        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.title(title)
        plt.colorbar()
        tick_marks = np.arange(len(classes))
        plt.xticks(tick_marks, classes, ha='right', rotation=45)
        plt.yticks(tick_marks, classes)

        if normalize:
            cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
            print("Normalized confusion matrix")
        else:
            print('Confusion matrix, without normalization')

        #print(cm)

        thresh = cm.max() / 2.
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            if normalize:
                plt.text(j, i, round(cm[i, j],2),
                horizontalalignment="center",
                color="white" if cm[i, j] > thresh else "black")
            else:
                plt.text(j, i, cm[i, j],
                horizontalalignment="center",
                color="white" if cm[i, j] > thresh else "black")

        plt.tight_layout()
        plt.ylabel('True label')
        plt.xlabel('Predicted label')

    def GetRefPredLists(con, schema, table, ref_column, pred_column, condition):
        '''
        Function that return lists with reference label and prediction labels

        Args: 
        'con' 
        'result_table_schema' 
        'result_table_name' 

        Returns:

        '''
        try:
            # Query
            query = "SELECT %s, %s, ROUND(ST_Area(geom)) FROM %s.%s "%(ref_column, pred_column, schema, table)
            if condition:
                query += "WHERE %s"%condition  
            cursor = con.cursor()
            cursor.execute(query)
            con.commit()
            return zip(*cursor.fetchall())
            cursor.close()
        except (Exception, psycopg2.DatabaseError) as error:
            sys.exit(error)               
    
    # Check and create folder if needed
    check_create_dir(output_folder)
    
    # Get list with reference labels, prediction labels and area for weighting samples 
    ref_list, pred_list, area_list = GetRefPredLists(con,schema,table,ref_column,pred_column,condition)
    
    ##### Confusion matrix #####
    # Compute confusion matrix
    if weight:
        cnf_matrix = confusion_matrix(ref_list, pred_list, sample_weight=area_list)
    else:
        cnf_matrix = confusion_matrix(ref_list, pred_list)
    ## Export the row confusion matrix to output folder
    output_rowconfmat = os.path.join(output_folder,"rowconfusionmatrix.txt")
    np.savetxt(output_rowconfmat, cnf_matrix.astype(np.int), fmt='%d', delimiter=",")
    
    # Plot non-normalized confusion matrix
    plot_title = 'Confusion matrix'
    plotnorm_title = 'Confusion matrix (normalized)'
    if condition:
        plot_title += ' - Condition: %s'%condition
        plotnorm_title += ' - Condition: %s'%condition
    if weight:
        plot_title += ' - Area weighted'
        plotnorm_title += ' - Area weighted'
    fig_cm = plt.figure(figsize=(15,10))
    plot_confusion_matrix(cnf_matrix, classes=classes,title=plot_title)
    # Plot normalized confusion matrix
    fig_cm_normal=plt.figure(figsize=(15,10))
    plot_confusion_matrix(cnf_matrix, classes=classes, normalize=True,title=plotnorm_title)
    ## Set the path to the output
    output_confmat_pdf = os.path.join(output_folder,"confusionmatrix.pdf")
    output_confmatA_png = os.path.join(output_folder,"confusionmatrixA.png")
    output_confmatB_png = os.path.join(output_folder,"confusionmatrixB.png")
    # Export in PDF
    from matplotlib.backends.backend_pdf import PdfPages
    pp = PdfPages(output_confmat_pdf)
    pp.savefig(fig_cm)
    pp.savefig(fig_cm_normal)
    pp.close()
    # Export in PNG
    fig_cm.savefig(output_confmatA_png, format='png', dpi=300)
    fig_cm_normal.savefig(output_confmatB_png, format='png', dpi=300)
    
    ##### Classification repport #####
    # Define dataset to take into account
    y_true = ref_list
    y_pred = pred_list
    class_label = classes
    # Compute precision accuracy
    if weight:
        accuracy = accuracy_score(y_true, y_pred, normalize=True, sample_weight=area_list)
    else:
        accuracy = accuracy_score(y_true, y_pred, normalize=True)
    # Compute Cohen's Kappa
    if weight:
        cohen_kappa = cohen_kappa_score(y_true, y_pred, sample_weight=area_list)
    else:
        cohen_kappa = cohen_kappa_score(y_true, y_pred)    
    # Compute f1-score
    if weight:
        f_1 = f1_score(y_true, y_pred, average='weighted', sample_weight=area_list)
    else:
        f_1 = f1_score(y_true, y_pred, average='weighted')
    # Compute 'classification report'
    if weight:
        classif_report = classification_report(y_true, y_pred, target_names=class_label, sample_weight=area_list)
    else:
        classif_report = classification_report(y_true, y_pred, target_names=class_label)    
    # Save as .txt file
    output = os.path.join(output_folder,"classif_report.txt")
    f = open(output, 'w')
    f.write("Folder name: '%s' \n"%output_folder)
    f.write("\n\n")
    f.write("Filter condition: '%s' \n"%condition)
    f.write("\n\n")
    f.write("----- Accuracy measures -----\n")
    f.write("Overall Accuracy: "+str(accuracy)+"\n")
    f.write("Cohen's Kappa: "+str(cohen_kappa)+"\n")
    f.write("F1-score: "+str(f_1)+"\n")
    f.write("\n\n")
    f.write("----- Classification report -----\n")
    f.write(classif_report)
    f.close()
    # Show file content
    f = open(output,'r')
    file_contents=f.read()
    print(file_contents)
    f.close()

# Compute accuracy measures - level 1

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l1_validation")
# List of labels
classes = ['1_ProductionPrimaire', '2_ProductionSecondaire', 
           '3_ProductionTertiaire', '4_Reseaux', '5_Residentiel', '6_Autres']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l1', pred_column='pred_l1', 
                   classes=classes, output_folder=output_folder, condition='certitude::int > 80', weight=False)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 1 - surface weighted

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l1_validation_weighted")
# List of labels
classes = ['1_ProductionPrimaire', '2_ProductionSecondaire', 
           '3_ProductionTertiaire', '4_Reseaux', '5_Residentiel', '6_Autres']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l1', pred_column='pred_l1', 
                   classes=classes, output_folder=output_folder, condition='certitude::int > 80', weight=True)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 1 - ALL

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l1_validation_all")
# List of labels
classes = ['1_ProductionPrimaire', '2_ProductionSecondaire', 
           '3_ProductionTertiaire', '4_Reseaux', '5_Residentiel', '6_Autres']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l1', pred_column='pred_l1', 
                   classes=classes, output_folder=output_folder, weight=False)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 1 - surface weighted - ALL

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l1_validation_weighted_all")
# List of labels
classes = ['1_ProductionPrimaire', '2_ProductionSecondaire', 
           '3_ProductionTertiaire', '4_Reseaux', '5_Residentiel', '6_Autres']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l1', pred_column='pred_l1', 
                   classes=classes, output_folder=output_folder, weight=True)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 2

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l2_validation")
# List of labels
classes = ['1_1','1_2','1_3',
           '2_1','2_2','2_3','2_4',
           '3_1','3_2','3_3','3_4',
           '4_1','4_3',
           '5_1','5_2','5_3',
           '6_1','6_2','6_3','6_6']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l2', pred_column='pred_l2', 
                   classes=classes, output_folder=output_folder, 
                   condition='ref_l2 is not null and pred_l2 is not null and certitude::int > 80', weight=False)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 2 - surface weighted

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l2_validation_weighted")
# List of labels
classes = ['1_1','1_2','1_3',
           '2_1','2_2','2_3','2_4',
           '3_1','3_2','3_3','3_4',
           '4_1','4_3',
           '5_1','5_2','5_3',
           '6_1','6_2','6_3','6_6']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l2', pred_column='pred_l2', 
                   classes=classes, output_folder=output_folder, 
                   condition='ref_l2 is not null and pred_l2 is not null and certitude::int > 80', weight=True)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 2 - ALL

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l2_validation_all")
# List of labels
classes = ['1_1','1_2','1_3',
           '2_1','2_2','2_3','2_4',
           '3_1','3_2','3_3','3_4',
           '4_1','4_2','4_3',
           '5_1','5_2','5_3',
           '6_1','6_2','6_3','6_6']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l2', pred_column='pred_l2', 
                   classes=classes, output_folder=output_folder, 
                   condition='ref_l2 is not null and pred_l2 is not null', weight=False)
# Close connexion to postgres database
con.close()

# Compute accuracy measures - level 2 - surface weighted

In [None]:
# Create connexion to postgres database
con = create_pg_connexion(config_parameters)
# Define output folder
output_folder = os.path.join(config_parameters['validationfolder'],"l2_validation_weighted_all")
# List of labels
classes = ['1_1','1_2','1_3',
           '2_1','2_2','2_3','2_4',
           '3_1','3_2','3_3','3_4',
           '4_1','4_2','4_3',
           '5_1','5_2','5_3',
           '6_1','6_2','6_3','6_6']
# Compute accuracy measures 
GetAccuracyMeasure(con, schema='validation', table='valid_pred_ref', ref_column='ref_l2', pred_column='pred_l2', 
                   classes=classes, output_folder=output_folder, 
                   condition='ref_l2 is not null and pred_l2 is not null', weight=True)
# Close connexion to postgres database
con.close()

# Stratified random selection of cadastral plots for visual validation