# UE09 Geopandas

#### Find the GeoPandas documentation under: http://geopandas.org/index.html

#### Find a helpful tutorial under: https://github.com/jorisvandenbossche/geopandas-tutorial


In [None]:
# author: 
# date: 
# content: 

## Topic Overview: GIS
**Definition:** "A GIS is a system of hardware, software and procedures to facilitate the management, manipulation, analysis, modelling, representation and display of georeferenced data to solve complex problems regarding planning and management of resources" (National Centre of Geographic Information and Analysis, 1990). This implies that, in a set of maps (layers) of the same territory, a given location has the same coordinates in every map.

![Layers](./Input/fig_1_capas_gis.jpg)

Image: https://geopaisa.blog/2017/03/08/que-es-un-sig/ <br>
Source: http://www.geogra.uah.es/patxi/gisweb/GISModule/GISTheory.pdf


In [None]:
pip install mapclassify

### Import packages (numpy, pandas, geopandas and matplotlib)

In [2]:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

## Geospatial data
Geospatial data is often available from specific GIS file formats or data stores, like ESRI shapefiles, GeoJSON files, geopackage files, PostGIS (PostgreSQL) database, ... We can use the GeoPandas library to read many of those GIS file formats.

Download the GeoJSON file from Eurostat:
https://ec.europa.eu/eurostat/de/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts

In [66]:
# https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download/ref-nuts-2016-60m.geojson.zip

In [13]:
# read the .geojson-file and assign it to a new variable. The result is a geodataframe


### GeoDataFrame
A GeoDataFrame is a tabular data structure that contains a **GeoSeries**. <br>

The most important property of a GeoDataFrame is that it always has one GeoSeries, which is referred to as the GeoDataFrame’s **“geometry”**. 

The other columns are the **attributes** that describe each of the geometries.

Such a GeoDataFrame is just like a pandas DataFrame, but with some additional functionality for working with geospatial data:
- A .geometry attribute that always returns the column with the geometry information (returning a GeoSeries). The column name itself does not necessarily need to be 'geometry', but it will always be accessible as the .geometry attribute.
- It has some extra methods for working with spatial data (area, distance, buffer, intersection, ...)
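The two points above can be tried on a small hand-made GeoDataFrame. This is only a sketch: the squares and the column `name` are invented, and no coordinate reference system is set, so areas are in squared coordinate units.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up squares with one attribute column.
squares = gpd.GeoDataFrame(
    {"name": ["small", "large"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    ],
)

print(type(squares.geometry).__name__)  # the geometry column is a GeoSeries
print(squares.area.tolist())            # area of each row's geometry

buffered = squares.buffer(0.5)          # grow every shape by 0.5 units
print((buffered.area > squares.area).all())
```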

### GeoSeries

A GeoSeries is essentially a vector where each entry in the vector is a set of shapes. An entry may consist of only one shape (like a single polygon) or multiple shapes that are meant to be thought of as one observation (like the many polygons that make up the State of Hawaii or a country like Indonesia).

geopandas has three basic classes of geometric objects:

- Points / Multi-Points
- Lines / Multi-Lines
- Polygons / Multi-Polygons
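To make the geometry classes concrete, here is a small sketch that builds a GeoSeries holding one object of several types and asks each entry for its type. All coordinates are arbitrary; the multi-polygon simply combines two separate squares into a single observation.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiPolygon, Point, Polygon

unit = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])

shapes = gpd.GeoSeries([
    Point(0, 0),                    # a single location
    LineString([(0, 0), (1, 1)]),   # a line through two vertices
    unit,                           # one polygon
    MultiPolygon([unit, shifted]),  # two polygons treated as one entry
])

print(shapes.geom_type.tolist())
```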

#### Take a look at the GeoDataFrame. Which column contains GeoSeries?

#### Plot the GeoDataFrame. 

#### Plot Germany. 

In [14]:
# find countries


In [15]:
# plot Germany
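One possible way to select rows is a boolean mask on an attribute column, exactly as in pandas. The sketch below uses a toy table; the column names `CNTR_CODE` and `LEVL_CODE` are an assumption about the NUTS file and should be checked against the columns of the GeoDataFrame you actually loaded. The boxes are rough placeholders, not real borders.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the NUTS table (column names are assumed, shapes are made up).
regions = gpd.GeoDataFrame(
    {
        "NUTS_ID": ["DE", "DE1", "FR"],
        "CNTR_CODE": ["DE", "DE", "FR"],
        "LEVL_CODE": [0, 1, 0],
    },
    geometry=[box(6, 47, 15, 55), box(7.5, 47.5, 10.5, 49.8), box(-5, 42, 8, 51)],
    crs="EPSG:4326",
)

countries = regions[regions["LEVL_CODE"] == 0]       # country-level rows only
germany = countries[countries["CNTR_CODE"] == "DE"]  # keep Germany

print(germany["NUTS_ID"].tolist())
```

Calling `germany.plot()` on the filtered GeoDataFrame then draws only the selected geometries.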


#### Import a new GeoDataFrame with airport locations

In [12]:
# from: https://ec.europa.eu/eurostat/de/web/gisco/geodata/reference-data/transport-networks
airports = gpd.read_file("./SHAPE/AIRP_PT_2013.shp")

#### Take a look at the DataFrame and plot it.

#### Find the airports in Germany

In [16]:
# find airports in Germany
# plot them
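If the airport table has no usable country column, a spatial test is one alternative: keep the points that lie within the country's geometry. The sketch below assumes both layers share the same coordinate reference system; the rectangle standing in for Germany and the three points are invented.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Rough rectangle used as a placeholder for Germany's outline (not a real border).
germany_shape = box(6, 47, 15, 55)

airports_toy = gpd.GeoDataFrame(
    {"name": ["inside_1", "inside_2", "outside"]},
    geometry=[Point(13.4, 52.5), Point(8.6, 50.0), Point(2.5, 49.0)],
    crs="EPSG:4326",
)

# within() returns a boolean Series: True where the point lies inside the shape.
in_germany = airports_toy[airports_toy.within(germany_shape)]
print(in_germany["name"].tolist())
```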


#### Plot both DataFrames in the same plot. Each DataFrame represents a layer.

#### We need to align the coordinate reference systems (CRS) of both DataFrames
Access the coordinate reference system of each DataFrame with the help of the .crs attribute

#### Transform the coordinate systems with the help of the .to_crs() function.
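A minimal sketch of inspecting and changing the CRS, using one made-up point. EPSG:4326 (longitude/latitude) and EPSG:3035 (a projected European system in metres) are chosen here only as examples; the layers in this exercise may use other codes.

```python
import geopandas as gpd
from shapely.geometry import Point

points = gpd.GeoDataFrame(
    {"name": ["p"]},
    geometry=[Point(13.4, 52.5)],
    crs="EPSG:4326",
)
print(points.crs.to_epsg())            # EPSG code of the current CRS

projected = points.to_crs(epsg=3035)   # returns a new, reprojected GeoDataFrame
print(projected.crs.to_epsg())
print(points.crs == projected.crs)     # the two layers no longer match

# Aligning one layer to another: pass the other layer's .crs directly.
aligned = projected.to_crs(points.crs)
print(aligned.crs == points.crs)
```

Note that `to_crs` does not modify the original object, so the result has to be assigned to a variable.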

#### Plot both DataFrames in the same plot again.
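Layers can be combined by drawing them onto the same matplotlib axes through the `ax` argument of `plot()`. The sketch below uses two invented layers that already share a CRS; colours and sizes are arbitrary choices.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, box

# Made-up polygon layer and point layer in the same CRS.
regions = gpd.GeoDataFrame(geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)], crs="EPSG:4326")
places = gpd.GeoDataFrame(geometry=[Point(1, 1), Point(3, 1)], crs="EPSG:4326")

fig, ax = plt.subplots(figsize=(6, 4))
regions.plot(ax=ax, color="lightgrey", edgecolor="black")  # bottom layer
places.plot(ax=ax, color="red", markersize=20)             # drawn on top
ax.set_title("Two layers on one axes")
```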

## Part 2: Add additional data to the GeoDataFrame - Population

In [338]:
# read data about population
csv = pd.read_csv('./demo_r_pjangrp3_1_Data.csv', sep=',', encoding='iso-8859-1')
csv = csv[csv['SEX'] == 'Insgesamt' ]
csv.head()

| | TIME | GEO | SEX | UNIT | AGE | Value | Flag and Footnotes |
|---|---|---|---|---|---|---|---|
| 0 | 2018 | EU28 | Insgesamt | Anzahl | Insgesamt | 512.379.225 | |
| 3 | 2018 | EU27 | Insgesamt | Anzahl | Insgesamt | 508.273.732 | |
| 6 | 2018 | BE | Insgesamt | Anzahl | Insgesamt | 11.398.589 | |
| 9 | 2018 | BE1 | Insgesamt | Anzahl | Insgesamt | 1.205.492 | |
| 12 | 2018 | BE10 | Insgesamt | Anzahl | Insgesamt | 1.205.492 | |


In [339]:
# tidy up, convert to integer values
pop = pd.Series(csv['Value'].values, index=csv['GEO'].values)
pop = pop.apply(lambda x: x.replace('.',''))
pop = pop[ pop != ':' ]
pop = pop.astype(np.int64)
pop.name = 'population'
pop.head()

EU28    512379225
EU27    508273732
BE       11398589
BE1       1205492
BE10      1205492
Name: population, dtype: int64

#### When joining DataFrames it can be very helpful if both use the same index
Look at the GeoDataFrame and set the column "NUTS_ID" as the index.

In [341]:
# filter population for country = Germany
# popDE = pop.loc[geoDE.index]
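Selecting with `.loc` and a list of labels raises a `KeyError` if any label is missing from the Series. The sketch below shows two hedged alternatives on invented numbers: restricting to the labels both sides share, or reindexing so that missing regions become `NaN`. The names `pop_toy` and `region_index` are hypothetical stand-ins for `pop` and `geoDE.index`.

```python
import pandas as pd

# Made-up population values indexed by region code.
pop_toy = pd.Series(
    {"DE1": 11_000_000, "DE2": 13_000_000, "FR1": 12_000_000},
    name="population",
)
# Stand-in for the index of the German regions; "DE3" has no population entry.
region_index = pd.Index(["DE1", "DE2", "DE3"], name="NUTS_ID")

# Option 1: keep only labels present on both sides.
common = region_index.intersection(pop_toy.index)
pop_de = pop_toy.loc[common]

# Option 2: keep every region and mark missing ones as NaN.
pop_de_full = pop_toy.reindex(region_index)

print(pop_de.index.tolist())
print(int(pop_de_full.isna().sum()))
```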

#### Join the two DataFrames together
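A minimal sketch of an index-based join, assuming the GeoDataFrame is indexed by `NUTS_ID` and the population is a named Series with the same labels. Shapes and numbers are invented.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

regions = gpd.GeoDataFrame(
    {"NUTS_ID": ["DE1", "DE2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
).set_index("NUTS_ID")

# The Series name becomes the name of the new column.
population = pd.Series({"DE1": 100, "DE2": 250}, name="population")

joined = regions.join(population)  # matches rows on the shared index
print(type(joined).__name__)
print(joined["population"].tolist())
```

Calling `join` on the GeoDataFrame (rather than on the Series) keeps the result a GeoDataFrame, so it can still be plotted as a map.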

#### Create a new column for population per km² and fill it 
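Population density needs areas in real-world units, so the geometries should be in a projected CRS measured in metres before `.area` is used. The sketch below assumes EPSG:3035 (an equal-area projection for Europe) and uses invented rectangles and population values; the coordinates are arbitrary and only serve to give round areas.

```python
import geopandas as gpd
from shapely.geometry import box

regions = gpd.GeoDataFrame(
    {"population": [50_000, 200_000]},
    geometry=[
        box(0, 0, 10_000, 10_000),       # 10 km x 10 km
        box(10_000, 0, 30_000, 10_000),  # 20 km x 10 km
    ],
    crs="EPSG:3035",
)

regions["area_km2"] = regions.area / 1e6  # m² to km²
regions["pop_per_km2"] = regions["population"] / regions["area_km2"]

print(regions[["area_km2", "pop_per_km2"]])
```

For data stored in longitude/latitude, a call to `to_crs` would be needed first, because areas computed in degrees are not meaningful.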

#### Plot the new map

#### Suitable coloring depends on the dataset.

Look at the histogram of the population
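A histogram can be drawn straight from a pandas Series. The values below are invented and deliberately skewed, which is the typical situation where a linear colour scale hides most of the variation.

```python
import pandas as pd

# Made-up, right-skewed density values.
density = pd.Series([90, 120, 150, 180, 230, 400, 950, 4100], name="pop_per_km2")

ax = density.plot.hist(bins=5)      # one bar per bin
ax.set_xlabel("population per km²")

print(density.describe())
```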

#### Different schemes can be selected in the plot() function 