[Information Visualization Tutorials](https://infovis.fh-potsdam.de/tutorials/) · FH Potsdam · Summer 2020

# Tutorial 8: Geovisualization

In this last instalment of the information visualization tutorials we will be analyzing and visualizing geographic data, i.e., data that refers to geospatial entities. Geospatial entities can, for example, be particular places such as schools and libraries, or political boundaries of cities or countries. Of course, this tutorial only scratches the surface. Consider it a teaser for geovisualization, which has itself become a branch of research and practice at the intersection of geography and visualization. We will only touch on a few basic steps to get your feet wet and hands dirty.


## 🛒 1. Prepare 

As you have come to expect by now, we first assemble our tools and then prepare the data.

In [39]:
import altair as alt
import pandas as pd

### Install packages

For this tutorial we will continue to rely on Altair and Pandas, but add **GeoPandas**, which will help us to work with DataFrames that contain spatial entities to carry out geometric analysis on them. As before, the pip install command is carried out via the shell, which is indicated by the exclamation mark at the beginning of the line:

In [40]:
!pip install geopandas
import geopandas as gpd



To access OpenStreetMap data, we will install the handy package **OSMPythonTools**:

In [41]:
!pip install OSMPythonTools



And finally, we will include **GeoPy**, which will help us turn addresses into geographic coordinates:

In [42]:
!pip install geopy



Once we have the tools assembled, we can get started working with geospatial data. There are actually plenty of formats used to record geospatial data, but GeoJSON has become an important standard for exchanging geospatial data on the web. However, please note that GeoPandas can actually load many other vector-based data formats used in digital cartography, such as shapefiles and GeoPackage.
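To give a rough idea of what a GeoJSON document contains, here is a minimal, hand-written sketch parsed with Python's built-in `json` module. The feature names and coordinates are made up for illustration and are not taken from the datasets used below.

```python
import json

# A hand-written GeoJSON FeatureCollection (illustrative values only)
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example library"},
      "geometry": {"type": "Point", "coordinates": [13.05, 52.41]}
    },
    {
      "type": "Feature",
      "properties": {"name": "Example district"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[13.0, 52.4], [13.1, 52.4], [13.1, 52.5], [13.0, 52.5], [13.0, 52.4]]]
      }
    }
  ]
}
"""

collection = json.loads(geojson_text)

# Each feature couples attribute data (properties) with a geometry
summary = [
    (feature["properties"]["name"], feature["geometry"]["type"])
    for feature in collection["features"]
]
print(summary)
```

GeoJSON lists each position in longitude, latitude order, and a polygon ring repeats its first position at the end to close the shape.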

### Import GeoJSON

Suppose we would like to get the geographic boundaries of Potsdam's districts, which happen to be published on Potsdam's Open Data Portal. Akin to how we would read a JSON file with Pandas, we can also use `read_file()` provided by GeoPandas simply by passing the URL to the data set of interest and get a geographic DataFrame back:

In [43]:
districts = gpd.read_file("https://opendata.potsdam.de/explore/dataset/statistische-bezirke-in-potsdam/download/?format=geojson")

If you replace `geojson` with `shp` at the end of the above URL, you can also load this data from a shapefile. Either way, the data is loaded into a GeoDataFrame. The main difference from a regular Pandas DataFrame is that a GeoDataFrame features a `geometry` column, which is a GeoSeries containing the points, paths, and polygons for each row. For example, if each row represents one district, the respective geometries would probably contain the geospatial boundaries…

✏️ *Are you curious what the districts dataframe actually looks like? Take a look at it with the methods you know by now:* 

Geographically speaking, the districts are defined by their geographic shapes, which are represented as polygons, each of which is a list of tuples of geographic coordinates. Next we add information about schools in Potsdam:

In [44]:
schools = gpd.read_file("https://opendata.potsdam.de/explore/dataset/schulen/download/?format=geojson")

✏️ *Have a look at the schools as well, and compare the contents of the `geometry` columns in schools and districts. Do you notice anything?*

### Query OpenStreetMap

OpenStreetMap (OSM) is "a collaborative project to create a free editable map of the world". As such it has millions of contributing users who have been collecting, updating and refining map data for over 15 years, which has generated a vastly comprehensive source of geographic data. It is by no means complete—whatever this would mean—but it is an impressively large geographic database and, of course, a map in itself, too.

To get a list of libraries in Potsdam (according to the OSM), we first need to find the right Potsdam. For this we use the geocoding powers of OSM through the `Nominatim` search service:

In [45]:
from OSMPythonTools.nominatim import Nominatim
nominatim = Nominatim()
city = nominatim.query('Potsdam, Germany')
city.areaId()

3600062369
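The returned number is an Overpass area id rather than a plain OSM id. By Overpass convention, the area id of a relation is the relation id plus 3,600,000,000 (for ways the offset is 2,400,000,000). The helper below is a hypothetical illustration of that arithmetic and is not part of OSMPythonTools:

```python
# Offsets used by the Overpass API to derive area ids from OSM ids
RELATION_AREA_OFFSET = 3_600_000_000
WAY_AREA_OFFSET = 2_400_000_000


def relation_id_from_area_id(area_id: int) -> int:
    """Recover the OSM relation id behind an Overpass area id (hypothetical helper)."""
    return area_id - RELATION_AREA_OFFSET


potsdam_relation = relation_id_from_area_id(3600062369)
print(potsdam_relation)  # 62369
```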

OpenStreetMap has its own kind of query language, which is quite compact and can also be a source of errors. To make query formulation easier, you can either use the web interface [overpass turbo](http://overpass-turbo.eu) or the `overpassQueryBuilder`, which provides access to the main parameters:

In [46]:
from OSMPythonTools.overpass import overpassQueryBuilder

library_query = overpassQueryBuilder(
    area=city.areaId(), # the query can be constrained by the area of an item
    elementType='node', # which are points (OSM also has ways and relations)
    # the selector in the next line is really the heart of the query:
    selector='"amenity"="library"', # we're looking for libraries
    out='body', # body indicates that we want the data, not just the count
    includeGeometry=True # and we want the geometric information, too
)

library_query

'area(3600062369)->.searchArea;(node["amenity"="library"](area.searchArea);); out body geom;'
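To see how the builder arguments map onto the string above, the following sketch assembles the same query by hand. The function `build_area_query` is a hypothetical stand-in written for this illustration; it only covers the parameter combination used here and does not claim to reflect how OSMPythonTools works internally.

```python
def build_area_query(area_id: int, element_type: str, selector: str) -> str:
    """Assemble a compact Overpass QL query for one element type within an area."""
    return (
        f"area({area_id})->.searchArea;"                    # name the search area
        f"({element_type}[{selector}](area.searchArea););"  # filter elements by tag
        " out body geom;"                                   # return tags and geometry
    )


manual_query = build_area_query(3600062369, "node", '"amenity"="library"')
print(manual_query)
```

Pasting such a string into overpass turbo is a convenient way to check a query before running it from Python.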

The string returned by `overpassQueryBuilder` is the compact version of the query, which is carried out in the next step:



In [47]:
from OSMPythonTools.overpass import Overpass
overpass = Overpass()

lib_data = overpass.query(library_query)

The variable `lib_data` now already contains the result from the query against OSM. Let's have a look at it. With `nodes()` we can access the retrieved points. Let's take a look at the first entry:

In [48]:
lib_data.nodes()[0].tags()

{'addr:city': 'Potsdam',
 'addr:country': 'DE',
 'addr:housenumber': '5',
 'addr:postcode': '14469',
 'addr:street': 'Kiepenheuerallee',
 'amenity': 'library',
 'contact:email': 'bibliothek@fh-potsdam.de',
 'contact:fax': '+49 331 580 2229',
 'contact:phone': '+49 331 580 2211',
 'contact:website': 'https://www.fh-potsdam.de/informieren/organisation/wiss-einrichtungen/bibliothek/bibliothek-news/',
 'name': 'Hochschulbibliothek',
 'opening_hours': 'Mo-Fr 09:00-19:00; Sa 09:00-14:30; PH,Su off; 2019 Dec 21-2020 Jan 01 off',
 'opening_hours:url': 'https://www.fh-potsdam.de/informieren/organisation/wiss-einrichtungen/bibliothek/wir-ueber-uns/oeffnungszeiten/',
 'operator': 'Fachhochschule Potsdam',
 'operator:type': 'public',
 'operator:wikidata': 'Q896706',
 'operator:wikipedia': 'de:Fachhochschule Potsdam',
 'ref:isil': 'DE-525',
 'wheelchair': 'limited'}

Similarly, we can also access the geometry, which in this case is just a point:

In [49]:
lib_data.nodes()[0].geometry()

{"coordinates": [13.051358, 52.41372], "type": "Point"}
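Note the order of the two numbers: GeoJSON stores a position as longitude first, then latitude. A small sketch, using the point above as a plain dictionary, shows how to unpack it; the variable names are chosen for this illustration only.

```python
# The geometry printed above, written as a plain Python dictionary
point = {"coordinates": [13.051358, 52.41372], "type": "Point"}

# GeoJSON order is (longitude, latitude), i.e. (x, y)
lon, lat = point["coordinates"]

# Many geocoders and mapping tools report (latitude, longitude) instead
lat_lon = (lat, lon)
print(lat_lon)  # (52.41372, 13.051358)
```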

Next, we use the compact form of a list comprehension to extract the libraries' names and coordinates:


In [50]:
libraries = [ (lib.tag("name"), lib.geometry() ) for lib in lib_data.nodes()]

… which we turn into a GeoDataFrame. By naming the second column `geometry` we tell GeoPandas to interpret the coordinates as geographic locations:

In [51]:
libraries = gpd.GeoDataFrame(libraries, columns = ['name', 'geometry'])

Let's repeat the process to retrieve Berlin's trees (as recorded by the OSM community):

In [52]:
# 1. prepare query (and directly include the location lookup)
tree_query = overpassQueryBuilder(
    area=nominatim.query('Berlin, Germany').areaId(),
    elementType='node',
    selector='"natural"="tree"', 
    out='body', 
    includeGeometry=True
)

# 2. execute query (and give it a bit more time to finish)
tree_data = overpass.query(tree_query, timeout=60)

# 3. get ids and coordinates of trees
tree_locs = [ (tree.id(), tree.geometry()) for tree in tree_data.nodes()]

# 4. create GeoDataFrame
trees = gpd.GeoDataFrame(tree_locs, columns=["id", "geometry"])

trees.head()

| | id | geometry |
|---|---|---|
| 0 | 21487172 | POINT (13.35177 52.51431) |
| 1 | 26908663 | POINT (13.34627 52.47173) |
| 2 | 27306554 | POINT (13.30086 52.52252) |
| 3 | 27306733 | POINT (13.30062 52.52235) |
| 4 | 30429119 | POINT (13.30101 52.52255) |


What are you interested in? You may want to consult the [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) or play around with [overpass turbo](http://overpass-turbo.eu): 

✏️ *Copy the above code into the cell below and adjust the query and variable names according to your interest:*

## 📍 2. Process

<!-- Two important processing steps are turning addresses into geolocations and aggregating geospatial data to feed into choropleth maps. -->

One important processing step is turning addresses into geolocations.

### Geocoding addresses

A typical challenge before actually visualizing geographic data is to extract the geographic coordinates of items of interest (be they trees or libraries). When loading GeoJSON files or querying OpenStreetMap, the geometries are naturally already included. However, oftentimes we may only have street addresses or the names of geographic entities such as cities or points of interest, which cannot be directly used to derive positions on a map. Yet, intuitively speaking, names of places and street addresses are unique ways of identifying locations. To turn such geographic strings into geographic tuples we rely on geocoding. In short: geocoding translates an address or place name into geographic coordinates.

There are plenty of commercial geocoders out there, but for the purpose of this tutorial we simply use OpenStreetMap's Nominatim service via the handy GeoPy package. (We could also stick to the OSMPythonTools that we already used above, but GeoPy has some handy ways of carrying out multiple geocoding steps in a batch.)


In [53]:
# import Nominatim as geopy geocoder
from geopy.geocoders import Nominatim

# register custom user agent (commercial services may also require an API key)
geocoder = Nominatim(user_agent="Information Visualization Tutorial · FH Potsdam")

Let's start with an address that students of FH Potsdam might be familiar with:

In [54]:
loc = geocoder.geocode("Kiepenheuerallee 5, 14469 Potsdam")
print(loc)

Campus Fachhochschule Potsdam, 5, Kiepenheuerallee, Bornstedt, Potsdam Nord, Potsdam, Brandenburg, 14469, Deutschland


We can access the geographic coordinates one by one:

In [55]:
print((loc.latitude, loc.longitude))

(52.4123583, 13.050748548573427)


Note that we can also use a name of a place; however, in this case the coordinates do not refer to the street address, but the center of the place:

In [56]:
loc = geocoder.geocode("Fachhochschule Potsdam")
print((loc.latitude, loc.longitude))

(52.4121432, 13.0507812)
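To get a feeling for how far apart the two results are, we can compute the great-circle distance between them. The sketch below uses the haversine formula with a mean Earth radius of 6371 km as an approximation; the function name is our own, and the coordinates are the two results printed above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) pairs given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


by_address = (52.4123583, 13.050748548573427)  # result for the street address
by_name = (52.4121432, 13.0507812)             # result for the place name

distance = haversine_m(*by_address, *by_name)
print(f"{distance:.0f} m")
```

The two points lie only a few tens of metres apart, which is negligible on a city map but can matter when placing markers at building level.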


✏️ *Give the geocoder a try and issue a geocoding request for an address or placename of your choice:* 

Let's proceed with a dataset containing multiple places. Do you remember the childcare dataset we loaded in the data wrangling tutorial? The dataset actually did not include geospatial coordinates, but street addresses! Let's load the CSV again into a DataFrame:

In [57]:
kitas = pd.read_csv("https://opendata.potsdam.de/explore/dataset/kitaboerse-20161108/download/", sep=";")

The `hausnummer` column contains some non-numerical items such as months (for no apparent reason). This will cause errors later during geocoding. In the following we extract the first number encountered in the `hausnummer` cells. Some kitas have multiple house numbers; for the purpose of this tutorial one will suffice.

In [58]:
# the parameter of extract is a regular expression that matches the first set of digits
# (raw string avoids an invalid-escape warning; expand=False returns a Series)
kitas.hausnummer = kitas.hausnummer.str.extract(r'(\d+)', expand=False)

# in some cases the value is not a number, which we fill with zeros
kitas.hausnummer = kitas.hausnummer.fillna(0)
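To see what the regular expression does, here is a plain-Python sketch using the `re` module. The sample values are invented to mimic the kinds of entries described above and are not taken from the dataset.

```python
import re

# Invented examples: plain number, number with letter, range, and a month name
samples = ["12", "7a", "10-12", "Mai"]


def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"(\d+)", text)
    return match.group(1) if match else None


extracted = [first_number(value) for value in samples]
print(extracted)  # ['12', '7', '10', None]
```

In Pandas, the cells without a match come back as missing values, which is why the cell above fills them with zeros afterwards.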

Geocoding can take a bit of time, so we limit ourselves to just a few random entries here:


In [59]:
kitas = kitas.sample(20)

✏️ *If you have a bit of time, increase the above number or simply comment out the line to analyze all kitas in Potsdam*

Note that the address information is spread across three columns: `strasse`, `hausnummer`, `postleitzahl`.

We will turn them into one column, with which we query the geocoder. For the purpose of this tutorial, we keep the remaining data at a minimum and thus only keep the names of the kitas and their capacities:

In [60]:
kitas

Unnamed: 0,name_der_kindertagesbetreuungseinrichtung,stand_vom,barrierefreie_einrichtung,kinderkrippe_0_3_j,tagespflege_0_3_j,padagogisch_begleitete_spielgruppe_0_3_j,kindergarten_3_j_schuleintritt,hort_ab_schuleintritt,andere_kinderbetreuung_ab_3_klasse,strasse,hausnummer,postleitzahl,kartenansicht,tel_nr,fax_nr,e_mail,homepage_der_einrichtung,stadtteil,trager,tel_nr_trager,e_mail_trager,homepage_des_tragers,platze_unbefristet,darunter_betriebskita_unbefristet,befristete_betriebserlaubnis_erlaubnis,platze_befristet,darunter_betriebskita_befristet,betrieb_e,befristete_betriebserlaubnis_erlaubnis_gultig_bis_monat,befristete_betriebserlaubnis_erlaubnis_gultig_bis_jahr,nur_tagespflege_erste_hilfe_kurs_gultig_bis_monat,nur_tagespflege_erste_hilfe_kurs_gultig_bis_jahr,integrationseinrichtung,inklusionshort,einrichtung_mit_besonderem_padagogischen_angebot,offnungszeiten_mo_fr,offnungszeiten_ab_6_00_uhr,offnungszeiten_nach_17_30_uhr,wochenendoffnung,abweichende_offnungszeiten,ubernachtung_moglich,schliesstage_von_bis,fruhstuck,mittag,vesper,abendessen,versorgungsart,link_zum_bereich_kinder_und_jugend_der_landeshauptstadt_potsdam,link_zu_anmeldeinformationen_der_einrichtung
132,Spielgruppe Waldstadt,08.11.2016,Nein,Nein,Nein,Ja,Nein,Nein,Nein,Ginsterweg,3,14478 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.81 23 51,,spielgruppe-waldstadt@pbhev.de,http://www.pbhev.de/?page_id=19,Waldstadt II,Potsdamer Betreuungshilfe e.V.,0331.81 23 51,pbhev@t-online.de,http://www.pbhev.de,15.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,9 bis 15 Uhr,Nein,Nein,Nein,,Nein,,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,http://pbhev.de/?page_id=21
21,"Rappelkiste, Kita",17.05.2016,Nein,Ja,Nein,Nein,Ja,Ja,Nein,Liefelds Grund,23,14478 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.273 30 35,0331.273 30 39,kita@rappelkiste-potsdam.de,http://www.rappelkiste-potsdam.de,Waldstadt II,Elternverein Kinderladen Rapperlkiste e.V.,0331.273 30 34,post@aktive-schule-potsdam.de,http://www.rappelkiste-potsdam.de/Rappelkiste/...,70.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,7:30 bis 15:30 Uhr,Nein,Nein,Nein,,Nein,,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,http://www.rappelkiste-potsdam.de
17,"Flotowkids, Hort",01.11.2015,Nein,Nein,Nein,Nein,Nein,Ja,Nein,Flotowstr.,10,14480 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.20 02 97 65,0331.20 02 97 67,traetow.c@gesa-ag.de,http://www.kita-flotow-kids.de,Stern,ASG Anerkannte Schulgesellschaft mbH,03733.426 72 00,info@anerkannte-schulgesellschaft.de,http://www.anerkannte-schulgesellschaft.de,404.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Ja,Nein,6 bis 18 Uhr,Ja,Ja,Nein,in den Ferien 7 bis 17 Uhr,Nein,,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,http://www.kita-flotow-kids.de
28,"Die Buntstifte, Hort",07.11.2016,Ja,Nein,Nein,Nein,Nein,Ja,Nein,Steinstr.,104,14480 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.61 11 19,0331.88 74 47 55,hort.buntstifte@stiftung-spi.de,http://www.lindenpark.de/SPI_Potsdam/die-bunts...,Stern,Stiftung SPI NL Brandenburg,0335.387 27 80,brandenburg@stiftung-spi.de,http://www.stiftung-spi.de/index_1.html,112.0,0.0,Ja,75.0,0.0,Röhrenstr. 6 /14480 Potsdam / Galileistr. 6 / ...,8,2018.0,--,,Nein,Nein,Nein,06.00 bis 18.00 Uhr,Ja,Ja,Nein,,Nein,15.08.2016 - 26.08.2016,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,http://www.lindenpark.de/SPI_Potsdam/die-bunts...
135,Tagespflegepersonen FidL - Frauen in der Leben...,01.11.2015,Nein,Nein,Ja,Nein,Nein,Nein,Nein,Alleestr.,1,14469 Potsdam,,0331.86 75 00 87,0331.86 75 00 92,tagespflege@fidl.de,http://fidl.de/kinderbetreuung/tagespflege,Tagespflege-Standorte,FidL - Frauen in der Lebensmitte e.V.,0331.86 75 00 87,tagespflege@fidl.de,www.fidl.de,90.0,0.0,Ja,90.0,0.0,,--,,--,,Nein,Nein,Nein,Beratungszeiten: Di 8-16 Uhr und Do 14-18 Uhr,Nein,Nein,Nein,Terminvereinbarung per Mail info@fidl.de oder ...,Nein,,Ja,Ja,Ja,Nein,keine Angabe,http://www.potsdam.de/kita-tipp,http://fidl.de/kinderbetreuung/tagespflege
106,"Nimmerland, Hort",08.09.2016,Nein,Nein,Nein,Nein,Nein,Ja,Nein,Karl-Marx-Str.,72,14482 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.704 75 80,0331.740 63 89,nimmerland@elternverein-zwergenland.de,http://www.elternverein-zwergenland.de/?page_i...,Babelsberg Nord,Elternverein Zwergenland e.V.,0331.740 63 91,buero@elternverein-zwergenland.de,http://www.elternverein-zwergenland.de/?page_i...,30.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,10.30 bis 17 Uhr,Nein,Nein,Nein,,Nein,Weihnachtsferien,Nein,Ja,Ja,Nein,Eigenversorgung geliefert,http://www.potsdam.de/kita-tipp,http://www.elternverein-zwergenland.de/wp-cont...
109,"Zauberstein, Kita",12.09.2016,Nein,Ja,Nein,Nein,Ja,Nein,Nein,Berliner Str.,27,14467 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.88 71 98 00,0331.88 71 98 02,zauberstein@lsb-sportservice.de,http://www.lsb-sportservice.de/einrichtungen/k...,Berliner Vorstadt,LSB SportService Brandenburg gGmbH,0331.971 98 83,geschaeftsstelle@lsb-sportservice.de,http://www.lsb-sportservice.de,145.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,6.30 bis 17.30 Uhr,Nein,Nein,Nein,,Nein,"06.05.2016, 08.08.2016 -22.08.2016, 04.10.16, ...",Ja,Ja,Ja,Nein,Mischversorgung,http://www.potsdam.de/kita-tipp,http://www.lsb-sportservice.de/files/antrag_au...
99,Evangelische Kindertagesstätte Regenbogenland,15.05.2016,Nein,Ja,Nein,Nein,Ja,Ja,Nein,Hubertusdamm,50,14480 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.600 42 86,0331.730 94 16,sabine.hintze@hoffbauer-kinder.de,http://www.hoffbauer-bildung.de/kita-regenboge...,Stern,Hoffbauer Kinder gGmbH,0331.231 31 00,Julia.Meike@hoffbauer-stiftung.de,http://www.hoffbauer-bildung.de,164.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,6 bis 17.30 Uhr,Ja,Nein,Nein,Fr bis 17 Uhr,Nein,,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,http://www.hoffbauer-bildung.de/kita-regenboge...
67,"Am Heiligen See, Kita",28.04.2016,Nein,Ja,Nein,Nein,Ja,Nein,Nein,Seestr.,43,14467 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.29 28 59,0331.817 00 69,kita-am-heiligen-see@ejf.de,http://www.ejf.de/?id=115,Berliner Vorstadt,EJF - gAG,030.76 88 42 56,jordan-nimsch.sigrid@ejf.de,http://www.ejf.de,121.0,0.0,Nein,0.0,0.0,,--,,--,,Nein,Nein,Nein,6.30 bis 17.30 Uhr,Nein,Nein,Nein,,Nein,,Ja,Ja,Ja,Nein,Eigenversorgung am Standort,http://www.potsdam.de/kita-tipp,https://www.ejf.de/einrichtungen/kindertagesst...
5,"Nuthegeister, AWO, Hort für hör-, sprach-, ler...",22.04.2016,Ja,Nein,Nein,Nein,Ja,Nein,Nein,Bisamkiez,107,14478 Potsdam,https://lhp.maps.arcgis.com/apps/webappviewer/...,0331.871 31 36,0331.87 00 00 14,nuthegeister@awo-potsdam.de,http://www.awo-potsdam.de/einrichtungen-und-di...,Schlaatz,AWO Kinder- und Jugendhilfe Potsdam gGmbH,0331.58 14 80,info-kjh@awo-potsdam.de,http://www.awo-potsdam.de/awo-bezirksverband/a...,130.0,0.0,Nein,0.0,0.0,,--,,--,,Ja,Nein,Ja,6 bis 17.30 Uhr,Ja,Nein,Nein,Ferien:8-17 Uhr und nach Bedarf ab 6 Uhr,Nein,keine,Nein,Ja,Ja,Nein,Fremdversorgung,http://www.potsdam.de/kita-tipp,http://www.awo-potsdam.de/einrichtungen-und-di...


In [61]:
# names of the childcare places
names = kitas["name_der_kindertagesbetreuungseinrichtung"]

# the capacity of the kitas; which we turn into integers
capac = kitas["platze_unbefristet"].fillna(0).astype(int)

# columns containing address information
addr = ['strasse', 'hausnummer', 'postleitzahl']

# we join the values in the three columns for each row
addresses = kitas[addr].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)

# the dataframe we will use
kitas = pd.DataFrame({'name': names, 'capacity': capac, 'address': addresses})
kitas

| | name | capacity | address |
|---|---|---|---|
| 132 | Spielgruppe Waldstadt | 15 | Ginsterweg 3 14478 Potsdam |
| 21 | Rappelkiste, Kita | 70 | Liefelds Grund 23 14478 Potsdam |
| 17 | Flotowkids, Hort | 404 | Flotowstr. 10 14480 Potsdam |
| 28 | Die Buntstifte, Hort | 112 | Steinstr. 104 14480 Potsdam |
| 135 | Tagespflegepersonen FidL - Frauen in der Leben... | 90 | Alleestr. 1 14469 Potsdam |
| 106 | Nimmerland, Hort | 30 | Karl-Marx-Str. 72 14482 Potsdam |
| 109 | Zauberstein, Kita | 145 | Berliner Str. 27 14467 Potsdam |
| 99 | Evangelische Kindertagesstätte Regenbogenland | 164 | Hubertusdamm 50 14480 Potsdam |
| 67 | Am Heiligen See, Kita | 121 | Seestr. 43 14467 Potsdam |
| 5 | Nuthegeister, AWO, Hort für hör-, sprach-, ler... | 130 | Bisamkiez 107 14478 Potsdam |


Now let's turn addresses into geometries! For each cell in the address column, a query against OSM will be triggered; to spread out the load we use a RateLimiter provided by GeoPy:

In [62]:
from geopy.extra.rate_limiter import RateLimiter

# add a delay of one second between each geocoding request
geocode = RateLimiter(geocoder.geocode, min_delay_seconds=1)
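To illustrate the idea behind rate limiting, here is a minimal sketch in plain Python. It is not GeoPy's implementation, just a hypothetical wrapper that makes sure consecutive calls are at least a given number of seconds apart; the wrapped function is a harmless stand-in for a geocoder.

```python
import time


def rate_limited(func, min_delay_seconds):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)  # pause until the minimum delay has passed
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# Stand-in for a geocoding function; it just returns the address in upper case
slow_upper = rate_limited(str.upper, min_delay_seconds=0.05)

start = time.monotonic()
results = [slow_upper(address) for address in ["a", "b", "c"]]
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```

With three calls, the wrapper pauses twice, so the total run time is roughly two delays. At one second per request, geocoding a few hundred addresses therefore takes several minutes.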

Next we invoke the geocoder and apply it to the address column (depending on how many entries are to be geocoded, this can take a while):

In [63]:
# apply geocoding to address column; store responses in location column 
kitas['location'] = kitas['address'].apply(geocode)

There might be some kitas without location information, i.e., for which the geocoder was not able to identify latitude and longitude. We only keep those rows that have location information, i.e., that are `notnull()`:

In [64]:
kitas = kitas[kitas['location'].notnull()]

After that we use one list comprehension to extract latitudes and longitudes from the locations column, which we will then use to transform the DataFrame into a GeoDataFrame featuring its own `geometry` column:

In [65]:
# create empty columns for coordinates
kitas["lat"] = None
kitas["lon"] = None

# extract lat and lon from locations via one list comprehension
kitas[['lat', 'lon']] = [ (loc.latitude, loc.longitude) for loc in kitas['location'] ]

# create GeoDataFrame, pointing explicitly to lon (x) and lat (y) columns
kitas = gpd.GeoDataFrame(kitas, geometry=gpd.points_from_xy(kitas.lon, kitas.lat))

# remove superfluous columns that are not needed anymore
kitas = kitas.drop(columns=['location', 'lat', 'lon'])

kitas

| | name | capacity | address | geometry |
|---|---|---|---|---|
| 132 | Spielgruppe Waldstadt | 15 | Ginsterweg 3 14478 Potsdam | POINT (13.09033 52.36659) |
| 21 | Rappelkiste, Kita | 70 | Liefelds Grund 23 14478 Potsdam | POINT (13.09383 52.35947) |
| 17 | Flotowkids, Hort | 404 | Flotowstr. 10 14480 Potsdam | POINT (13.13760 52.38271) |
| 28 | Die Buntstifte, Hort | 112 | Steinstr. 104 14480 Potsdam | POINT (13.14418 52.38368) |
| 135 | Tagespflegepersonen FidL - Frauen in der Leben... | 90 | Alleestr. 1 14469 Potsdam | POINT (13.05937 52.40886) |
| 106 | Nimmerland, Hort | 30 | Karl-Marx-Str. 72 14482 Potsdam | POINT (13.12186 52.39552) |
| 109 | Zauberstein, Kita | 145 | Berliner Str. 27 14467 Potsdam | POINT (13.07188 52.40378) |
| 99 | Evangelische Kindertagesstätte Regenbogenland | 164 | Hubertusdamm 50 14480 Potsdam | POINT (13.13799 52.38445) |
| 67 | Am Heiligen See, Kita | 121 | Seestr. 43 14467 Potsdam | POINT (13.07233 52.41026) |
| 5 | Nuthegeister, AWO, Hort für hör-, sprach-, ler... | 130 | Bisamkiez 107 14478 Potsdam | POINT (13.09907 52.37246) |


## 🗺 3. Present

Now that we have geospatial data readily available as GeoDataFrames, we can map them with Altair.

(There are other mapping libraries for Python, such as [Folium](https://python-visualization.github.io/folium/), that provide additional functionalities. Altair's geovis features are basic, but do provide some variety of techniques and have the benefit of working consistently with the other chart types we covered.)


### Markers on maps

A simple start is placing locations on a base map and adding a bit of further information via tooltips. Let's do this with Potsdam's schools! First, we can have another look at the attributes:


In [66]:
schools.head(3)

| | status | schulnum_1 | schulnumme | x_e89_rbs | trager | sozialraum | ort | plz | standort | y_e89_rbs | schulname | strasse | planungsra | schulform | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aktiv | 120790 | 1 | 369185.2527 | öffentlich | VI | Potsdam | 14473 | Hauptstandort | 5805288.0 | Humboldt-Gymnasium | Heinrich-Mann-Allee 103 | 601 | OG | POINT (13.07798 52.38221) |
| 1 | Aktiv | 401262 | 10/30 | 370220.0 | öffentlich | VI | Potsdam | 14478 | Hauptstandort | 5804340.0 | Schule am Nuthetal | An der Alten Zauche 2 c | 602 | F | POINT (13.09354 52.37393) |
| 2 | Aktiv | 600027 | 15 | 367879.9999 | öffentlich | III | Potsdam | 14467 | Hauptstandort | 5807450.0 | Schule des Zweiten Bildungsweges "Heinrich von... | Friedrich-Ebert-Straße 17 | 302 | ZBW | POINT (13.05796 52.40132) |


This gives us plenty of aspects to visualize. 

We will now create a simple map with markers in the form of an Altair chart consisting of two layers:

1.   The `districts` form the lower layer representing their boundaries and the overall geographic shape of Potsdam
2.   The `schools` are the points of interest that are displayed on top

When putting the two layers together they should actually refer to the same geographic location to make sense. Here the districts and schools both refer to Potsdam. Also note that the order when the charts are added together determines the vertical order: first the basemap and then markers on top.

In [67]:
# 1.  mark_geoshape transparently uses the geometry column
basemap = alt.Chart(districts).mark_geoshape(
    # add some styling to reduce the salience of the basemap
    fill="lightgray", stroke="darkgray"
).properties(width=600, height=600)

# 2.  we use mark_circle to have more control over visual variables
markers = alt.Chart(schools).mark_circle(opacity=1).encode(
    # point latitude & longitude to coordinates in geometry column
    longitude='geometry.coordinates[0]:Q',
    latitude='geometry.coordinates[1]:Q',
    tooltip=['schulname', 'strasse', 'trager'],
)

# combine the two layers 
basemap + markers

✏️ *How about changing the color of the dots according to `trager` or `schulform`?*

### Graduated symbol maps

This technique adjusts the visual features of markers to encode quantitative data dimensions. For example, we can use varying sizes of circles to represent the capacities of Potsdam's kitas. We use the same two-layer structure we used above:

In [68]:
basemap = alt.Chart(districts).mark_geoshape(
    fill="lightgray", stroke="darkgray"
).properties(width=600, height=600)

markers = alt.Chart(kitas).mark_circle(opacity=1).encode(
    longitude='geometry.coordinates[0]:Q',    
    latitude='geometry.coordinates[1]:Q',    
    tooltip=['name:N', 'address:N', 'capacity:Q'],
    size="capacity:Q"
)

basemap + markers

### Dot density maps

When we are dealing with thousands of elements, we reach perceptual and technical limitations. One way to mitigate the technical limitations is to take a sample of the elements that is large enough to reveal overall patterns. This is what we do now with Berlin's trees: there are far too many to display individually, but a sample can still be informative:

In [69]:
# by default Altair only handles a maximum of 5000 rows
# the following line disables this limitation, read more here:
# https://altair-viz.github.io/user_guide/faq.html#maxrowserror-how-can-i-plot-large-datasets
alt.data_transformers.disable_max_rows()

treemap = alt.Chart(trees.sample(n=10000)).mark_circle(
    # reduce the visual presence of each element
    size=5,
    # with a low dot opacity we can use overplotting to indicate densities
    opacity=.25,
    # a natural choice
    color="green"
).encode(
    longitude='geometry.coordinates[0]:Q', 
    latitude='geometry.coordinates[1]:Q'    
).properties(width=600, height=600)

treemap
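The sampling step can be sketched in plain Python as follows. Note that Pandas' `sample(n=...)` raises an error when the table has fewer rows than requested (unless sampling with replacement), so the hypothetical helper below caps the sample size; the list of ids is a stand-in for the real tree data, and the seed makes the selection reproducible.

```python
import random

# Stand-in for a table of tree ids; the real data would come from the query above
tree_ids = list(range(25_000))


def sample_at_most(items, n, seed=None):
    """Return up to n randomly chosen items without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, min(n, len(items)))


subset = sample_at_most(tree_ids, 10_000, seed=42)
small = sample_at_most(tree_ids[:100], 10_000, seed=42)
print(len(subset), len(small))  # 10000 100
```

In Pandas, the corresponding pattern would be to pass `min(n, len(trees))` as the sample size and, if reproducibility matters, a fixed `random_state`.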

✏️  *Add a baselayer underneath with Berlin's districts; the Technologiestiftung Berlin offers [spatial data in various shapes and sizes](https://lab.technologiestiftung-berlin.de/projects/spatial-units/en/). Hint: you might have to flip the [winding order](https://altair-viz.github.io/user_guide/data.html#winding-order) of the geometries.*
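As background for the hint: the winding order of a polygon ring is the direction (clockwise or counter-clockwise) in which its positions are listed. One common way to determine it is the sign of the area computed with the shoelace formula. The sketch below treats coordinates as planar x/y values, which is a simplification, and the function name is our own; which orientation a renderer expects is described in the linked Altair documentation.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2


# A closed unit square, listed counter-clockwise (x = longitude, y = latitude)
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

# Reversing the list of positions flips the winding order
flipped = square[::-1]

print(signed_area(square), signed_area(flipped))  # 1.0 -1.0
```

If a filled shape covers everything except the intended polygon, a wrong winding order is a likely cause, and reversing each ring is the usual remedy.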

### Choropleth maps

Finally, let's create the geovisualization that uses the fill color of geospatial shapes to encode quantitative data. To illustrate this, we will visualize the population densities around the world. We will use area and population information from GeoNames and get the geographic shapes of countries from DataHub.

In [70]:
# load country data from geonames 
geonames = pd.read_csv("https://www.geonames.org/countryInfoCSV", sep='\t')
# select four columns
geonames = geonames[['name', 'iso alpha3', 'areaInSqKm', 'population']]
# set index to country code
geonames = geonames.set_index("iso alpha3")

geonames.head()

| iso alpha3 | name | areaInSqKm | population |
|---|---|---|---|
| AND | Andorra | 468.0 | 77006 |
| ARE | United Arab Emirates | 82880.0 | 9630959 |
| AFG | Afghanistan | 647500.0 | 37172386 |
| ATG | Antigua and Barbuda | 443.0 | 96286 |
| AIA | Anguilla | 102.0 | 13254 |


Next we collect the geographic boundaries and `simplify` them a bit, as they have more detail than what we need here:

In [71]:
# load country polygons from DataHub
polygons = gpd.read_file("https://datahub.io/core/geo-countries/r/countries.geojson")
# remove country names, as we have them already
polygons = polygons.drop(columns=["ADMIN"])
# set index to country code
polygons = polygons.set_index("ISO_A3")
# reduce the complexity of the shapes
polygons.geometry = polygons.geometry.simplify(.1)

polygons.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 255 entries, ABW to ZWE
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   geometry  255 non-null    geometry
dtypes: geometry(1)
memory usage: 4.0+ KB


As both DataFrames use the three-letter country codes as indices we can join them like this:

In [72]:
# inner means that we keep only those countries
# for which we have geometric and attribute data
countries = polygons.join(geonames, how='inner')

countries.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 238 entries, ABW to ZWE
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   geometry    238 non-null    geometry
 1   name        238 non-null    object  
 2   areaInSqKm  238 non-null    float64 
 3   population  238 non-null    int64   
dtypes: float64(1), geometry(1), int64(1), object(1)
memory usage: 9.3+ KB
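The inner join explains why the joined table has fewer rows (238) than the polygon table (255): only codes present in both tables survive. The following plain-Python sketch mimics this with two small dictionaries; all values are invented for illustration.

```python
# Invented miniature versions of the two tables, keyed by ISO code
shapes = {"DEU": "polygon-de", "FRA": "polygon-fr", "XXX": "polygon-unknown"}
attributes = {
    "DEU": {"name": "Germany"},
    "FRA": {"name": "France"},
    "AND": {"name": "Andorra"},
}

# An inner join keeps only the keys present in both tables
shared_codes = sorted(shapes.keys() & attributes.keys())
joined = {code: {"geometry": shapes[code], **attributes[code]} for code in shared_codes}
print(shared_codes)  # ['DEU', 'FRA']
```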


Visualizing area or population in a choropleth map, arguably, makes little sense. So let's compute population densities:

In [73]:
countries["density"] = countries["population"] / countries["areaInSqKm"]

We keep only those countries with a valid density value and turn these densities into integers:

In [74]:
countries = countries[countries['density'].notna()]
countries.density = countries.density.round(0).astype(int)

There is one ‘country’ that is not really one, which is Antarctica. We'll remove this from the list here.

In [75]:
countries = countries.drop("ATA")

Finally, we draw the chart using Altair's `mark_geoshape()` method. The distribution of densities is highly skewed, due to very small countries with relatively high population numbers, such as Monaco. To spread out the low and high density values we use a logarithmic scale and set the domain between 1 and 1000. Note that the domain has to end in a multiple of the base, which is by default 10.

In [76]:
alt.Chart(countries).mark_geoshape().encode(
    color=alt.Color('density', scale=alt.Scale(type="log", domain=[1,1000] )),
    tooltip=['name', 'areaInSqKm', 'population', 'density']
).project(
  # enter different projection here
).properties(
    width=800,
    height=600
)
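The effect of the logarithmic scale can be illustrated with a few lines of plain Python. The sketch compares where a value falls within the domain from 1 to 1000 on a linear and on a base-10 logarithmic scale; the function names and the density values are made up for this illustration.

```python
from math import log10


def linear_position(value, lo=1, hi=1000):
    """Relative position of value within [lo, hi] on a linear scale."""
    return (value - lo) / (hi - lo)


def log_position(value, lo=1, hi=1000):
    """Relative position of value within [lo, hi] on a base-10 logarithmic scale."""
    return (log10(value) - log10(lo)) / (log10(hi) - log10(lo))


# Invented densities (people per square kilometre) spanning three orders of magnitude
for density in [1, 10, 100, 1000]:
    print(density, round(linear_position(density), 3), round(log_position(density), 3))
```

On the linear scale, densities of 10 and 100 both end up in the lowest tenth of the color range, whereas the logarithmic scale places them at one third and two thirds, which makes differences among sparsely populated countries visible.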

The map is shown in the default Mercator projection, which particularly distorts the area sizes of North America, Europe and Russia in contrast to Africa, Southern Asia and parts of South America.
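How strong this distortion is can be estimated with a short calculation. Assuming a Mercator projection as described above, the local scale factor at latitude φ is 1/cos(φ), so areas are inflated by 1/cos²(φ) relative to the equator. The function name below is our own:

```python
from math import cos, radians


def mercator_area_factor(latitude_deg):
    """Factor by which Mercator inflates areas at a latitude, relative to the equator."""
    return 1 / cos(radians(latitude_deg)) ** 2


for place, latitude in [("Equator", 0), ("Potsdam", 52.4), ("60 degrees north", 60)]:
    print(place, round(mercator_area_factor(latitude), 2))
```

At 60 degrees north, shapes appear about four times as large as equally sized shapes at the equator, which is why high-latitude countries dominate the map visually.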

✏️ *Change the projection used above to one that does not distort area sizes as much ([see this list for options](https://vega.github.io/vega-lite/docs/projection.html#projection-types)).* 

## Sources

Tutorials & Documentation
- [Specifying Geospatial Data in Altair — Altair 4.1.0 documentation](https://altair-viz.github.io/user_guide/data.html#geospatial-data)
- [GeoPandas](https://geopandas.org)
- [OSMPythonTools](https://github.com/mocnik-science/osm-python-tools)
- [GeoPy](https://geopy.readthedocs.io/)

Data
- [Potsdam Open Data-Portal](https://opendata.potsdam.de)
- [OpenStreetMap](https://www.openstreetmap.org/)
- [GeoNames](https://www.geonames.org)
- [DataHub](https://datahub.io)
