<p><strong><font size="6">WALOUS</font></strong></p>

<p><strong><font size="6">D_Descriptive_statistics</font></strong></p>

WALOUS_UTS - Copyright (C) <2020> <Service Public de Wallonie (SWP), Belgique,
					          		Institut Scientifique de Service Public (ISSeP), Belgique,
									Université catholique de Louvain (UCLouvain), Belgique,
									Université Libre de Bruxelles (ULB), Belgique>						 		
	
List of the contributors to the development of WALOUS_UTS: see LICENSE file.


Description and complete License: see LICENSE file.
	
This program (WALOUS_UTS) is free software:
you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program (see COPYING file).  If not,
see <http://www.gnu.org/licenses/>.

---------
Jupyter Notebook containing the preprocessing steps consisting of: 
- Computing some descriptive statistics about the coherence of information in the input dataset

# Table of Contents

<div id="toc"></div>

The following cell is a JavaScript snippet that builds the notebook's table of contents.

In [None]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

# Define working environment

**Import libraries**

In [None]:
# Import libraries needed for setting parameters of operating system 
import os
import sys
import csv
import tempfile
import glob

In [None]:
## Import Psycopg2 library (interaction with postgres database)
import psycopg2
## Import Subprocess
import subprocess

In [None]:
## Import Pandas library (viewing and manipulation of tables)
import pandas as pd
pd.set_option('display.max_columns', 100)

In [None]:
## Import multiprocessing and functools libraries
import multiprocessing
from multiprocessing import Pool
from functools import partial

**Add the folder containing the SRC modules provided with this notebook**

In [None]:
# Add local module to the path
src = os.path.abspath('../SRC')
if src not in sys.path:
    sys.path.append(src)

**Setup environment variables**

Please edit the file `../SRC/config.py`, which contains the configuration parameters, to match your own computer setup. The following cell runs this file.



In [None]:
%run ../SRC/config.py

In [None]:
print(config_parameters)

In [None]:
# Import functions that set up the environment variables
import environ_variables as envi

In [None]:
# Set environment variables
envi.setup_environmental_variables() 
# Display current environment variables of your computer
envi.print_environmental_variables()

**Other functions**

In [None]:
# Import functions for processing time information
import time
from processing_time import start_processing, print_processing_time

**psycopg2 + Postgresql functions**

In [None]:
# Import function that displays a PostgreSQL table's header
from display_header import display_header
# Import function that creates a connection to the PostgreSQL database
from postgres_functions import create_pg_connexion

In [None]:
# Import function for computation of descriptive statistics
from descript_stats import get_count_area, descript_stats_proportion

# Compute descriptive statistics

In [None]:
# Name of the table containing all the information
final_table = "capa_statistics_wall_a"

**Get the total count and total area of cadastral parcels in Wallonia**

In [None]:
# Get values
total, total_area = get_count_area(config_parameters, 'results', final_table)

## Completeness of information in the database

**Records with no information at all**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE all_hilucs is null")
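The cells in this section all report the share of a filtered subset relative to the totals returned by `get_count_area`. The implementation of `descript_stats_proportion` is not shown in this notebook, so the following is only a minimal sketch of the arithmetic it presumably performs. The function name `proportions` and the numbers used are hypothetical and serve only as illustration.

```python
def proportions(count, area, total, total_area):
    """Return (share of records, share of area), both as percentages.

    Hypothetical helper: `count` and `area` describe the filtered subset,
    `total` and `total_area` describe the whole table.
    """
    if total == 0 or total_area == 0:
        raise ValueError("totals must be non-zero")
    return 100.0 * count / total, 100.0 * area / total_area


# Made-up values, not results from the WALOUS database
pct_count, pct_area = proportions(count=250, area=1200.0,
                                  total=1000, total_area=4800.0)
print(f"{pct_count:.2f}% of records, {pct_area:.2f}% of area")
```

Reporting both shares is useful because a small number of large parcels can account for a large part of the area.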

**Records with no cadastral information**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE nat_lu_maj is null")

**Records with only 1 piece of information**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE cardinality(all_hilucs) = 1")

**Records with only the unknown class (8_8)**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE '8_8' = ALL(all_hilucs)")
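The filter above relies on PostgreSQL's `= ALL(array)` comparison, which is true when every element of the array equals the given value. It is also true for an empty array, and it yields NULL (so the row is excluded by `WHERE`) when the array itself is NULL. A rough Python analogue, ignoring NULL elements inside the array, might look like this; `equals_all` is a hypothetical name used only for illustration.

```python
def equals_all(value, array):
    """Mimic `value = ALL(array)` for arrays without NULL elements."""
    if array is None:
        return None  # SQL NULL: such a row does not pass the WHERE filter
    return all(item == value for item in array)


print(equals_all('8_8', ['8_8', '8_8']))  # every element is 8_8
print(equals_all('8_8', ['8_8', '1_1']))  # one element differs
print(equals_all('8_8', []))              # empty array
print(equals_all('8_8', None))            # NULL array
```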

**Records with only 1 piece of cadastral information, which is uncertain (corresponding to 8_8)**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE nat_lu_maj = '8_8' AND nat_nb_dist_lu = 1")

**Records with 2 pieces of information**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE cardinality(all_hilucs) = 2")

**Records with 3 pieces of information**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE cardinality(all_hilucs) = 3")

**Records with more than 3 pieces of information**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE cardinality(all_hilucs) > 3")

**Mean number of pieces of information per parcel**

In [None]:
con = create_pg_connexion(config_parameters) # Create connexion
cursor = con.cursor() # Create cursor
# SQL query
query = "SELECT AVG(Cardinality(all_hilucs)) FROM {schema}.{table}"
cursor.execute(query.format(schema='results',table=final_table))
i = cursor.fetchone()[0] # fetch the first row
# Close the cursor
cursor.close()
# Close the connection to the Postgres database
con.close()
print(i)

**Maximum number of pieces of information for a parcel**

In [None]:
con = create_pg_connexion(config_parameters) # Create connexion
cursor = con.cursor() # Create cursor
# SQL query
query = "SELECT MAX(Cardinality(all_hilucs)) FROM {schema}.{table}"
cursor.execute(query.format(schema='results',table=final_table))
i = int(cursor.fetchone()[0]) # fetch the first row
# Close the cursor
cursor.close()
# Close the connection to the Postgres database
con.close()
print(i)
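The two cells above repeat the same connect, execute, fetch and close sequence. As a possible simplification, the pattern could be wrapped in a small helper that also closes the cursor and connection when the query fails. The sketch below is not part of the WALOUS sources: `fetch_scalar` and `demo_connect` are hypothetical names, and an in-memory SQLite table stands in for the PostgreSQL table so that the example runs without a database server.

```python
import sqlite3
from contextlib import closing


def fetch_scalar(connect, query, params=None):
    """Run `query` and return the first column of the first row.

    `connect` is any zero-argument callable returning a DB-API connection.
    The cursor and connection are closed even if the query raises.
    """
    with closing(connect()) as con:
        with closing(con.cursor()) as cursor:
            if params is None:
                cursor.execute(query)
            else:
                cursor.execute(query, params)
            row = cursor.fetchone()
            return row[0] if row is not None else None


def demo_connect():
    """Stand-in connection factory with made-up data (illustration only)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE parcels (n_codes INTEGER)")
    con.executemany("INSERT INTO parcels VALUES (?)", [(1,), (2,), (3,)])
    return con


print(fetch_scalar(demo_connect, "SELECT AVG(n_codes) FROM parcels"))
print(fetch_scalar(demo_connect, "SELECT MAX(n_codes) FROM parcels"))
```

In this notebook the factory would presumably be `lambda: create_pg_connexion(config_parameters)`, assuming that function returns a standard psycopg2 connection, and the query would be the formatted `AVG` or `MAX` statement from the cells above.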

**Records with uncertain cadastral information but with other information available**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE nat_lu_maj = '8_8' AND Cardinality(all_hilucs) > 1")

## Agreement of information across databases

### HILUCS agreement for all levels (all HILUCS correspondences agree)

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_all_hilucs is TRUE")

### HILUCS Level 3 agreement

**Records for which all pieces of information agree at HILUCS level 3**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l3_hilucs is TRUE")

### HILUCS Level 2 agreement

**Records for which all pieces of information agree at HILUCS level 2**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l2_hilucs is TRUE")

**Records with multiple pieces of information only at level 2, all of which agree**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l2_hilucs is TRUE AND agreement_l3_hilucs is NULL")

**Records for which all pieces of information agree at HILUCS level 2 but not at level 3**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l2_hilucs is TRUE AND agreement_l3_hilucs is FALSE")
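The filters in this section distinguish three states of an agreement column: `TRUE`, `FALSE` and `NULL`. How the `agreement_l*_hilucs` columns are actually computed is not shown in this notebook. The sketch below is only one plausible reading, under two assumptions: HILUCS codes are underscore-separated (as in `8_8`), and agreement at a level is undefined (`None`) when fewer than two codes are detailed enough to reach that level. The function name `agreement_at_level` is hypothetical.

```python
def agreement_at_level(codes, level):
    """Return True/False if at least two codes reach `level`, else None.

    Assumed logic, not the WALOUS implementation: each code is truncated
    to its first `level` components and the truncated codes are compared.
    """
    prefixes = []
    for code in codes:
        parts = code.split('_')
        if len(parts) >= level:
            prefixes.append(tuple(parts[:level]))
    if len(prefixes) < 2:
        return None  # not enough information at this level to compare
    return len(set(prefixes)) == 1


codes = ['1_1_1', '1_1_2', '1_1']  # made-up example codes
for level in (1, 2, 3):
    print(level, agreement_at_level(codes, level))
```

Under this reading, the example codes agree at levels 1 and 2 but disagree at level 3, which corresponds to the case counted by the cell above.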

### HILUCS Level 1 agreement

**Records for which all pieces of information agree at HILUCS level 1**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l1_hilucs is TRUE")

**Records with multiple pieces of information only at level 1, all of which agree**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l1_hilucs is TRUE AND agreement_l2_hilucs is NULL")

**Records for which all pieces of information agree at HILUCS level 1 but not at level 2**

In [None]:
# Print proportions
descript_stats_proportion(config_parameters, 'results', final_table, total, total_area, 
                          where="WHERE agreement_l1_hilucs is TRUE AND agreement_l2_hilucs is FALSE")