<a href="https://colab.research.google.com/github/yoba7/WikidataExampleOfUse/blob/master/Get_Airports_locations_from_Wikidata.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Get airport locations from Wikidata

This notebook extracts airport locations from [Wikidata](https://www.wikidata.org/).

It can be executed directly in [Google Colab](https://github.com/googlecolab/colabtools) using Chrome and the appropriate [extension](https://github.com/googlecolab/open_in_colab).

Keywords: Airports, IATA, ICAO, Latitude, Longitude, Coordinates, GeoDataFrame, Geopandas, GeoJSON, SPARQL, Wikidata, Linked Open Data, URIs, Google Colab, GIS

Author: Youri Baeyens



In [0]:
%%capture
# This step installs missing packages; it can take a while.
# %%capture has to be the first line of the cell; it hides the pip output.

!pip install SPARQLWrapper
!pip install geopandas
!pip install folium

In [0]:
import pandas as pd
import json
from SPARQLWrapper import SPARQLWrapper, JSON
from geopandas import GeoDataFrame
from shapely.geometry import Point
from shapely.wkt import loads

## Part 1 - Extract information from Wikidata


### Background information

#### Wikidata

*"[Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) is a free and open knowledge base that can be read and edited by both humans and machines."*

*"Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others."*

#### Wikidata query service: the SPARQL endpoint of Wikidata

It is possible to query Wikidata data through the [Wikidata Query Service (WQS)](https://query.wikidata.org/). The query language used by WQS is [SPARQL](https://en.wikipedia.org/wiki/SPARQL).

Here is [an example of a SPARQL query](http://tinyurl.com/y6zdf4zn) that returns information on Belgian airports. If you prefer, you can view the results as a map with [this query](http://tinyurl.com/y6fxydeg).


#### SPARQL: General introduction

If you want a first introduction to SPARQL, take a look at this video (11 minutes):

<a href="http://www.youtube.com/watch?feature=player_embedded&v=FvGndkpa4K0" target="_blank"><img src="http://img.youtube.com/vi/FvGndkpa4K0/0.jpg" alt="SPARQL in 11 minutes" width="240" height="180" border="2" /></a>


#### Wikidata SPARQL query tutorial

If you want a good introduction to the Wikidata Query Service, take a look at this video (16 minutes):

<a href="http://www.youtube.com/watch?feature=player_embedded&v=1jHoUkj_mKw" target="_blank"><img src="http://img.youtube.com/vi/1jHoUkj_mKw/0.jpg" alt="Wikidata SPARQL query tutorial" width="240" height="180" border="2" /></a>

### Function to submit a SPARQL query

Function signature: *get_sparql_dataframe(service: str, query: str) -> pd.DataFrame*

This function was written by Ted Lawless and is published on his [GitHub Pages site](https://lawlesst.github.io/notebook/sparql-dataframe.html). It runs a [SPARQL query](https://fr.wikipedia.org/wiki/SPARQL) against a SPARQL endpoint.

Arguments:
- service: SPARQL endpoint URL
- query: SPARQL query text

Returns: a [pandas](https://pandas.pydata.org/) dataframe containing the results of the query.


In [0]:
def get_sparql_dataframe(service, query):
    """
    Helper function to convert SPARQL results into a Pandas data frame.
    """
    sparql = SPARQLWrapper(service)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    result = sparql.query()

    processed_results = json.load(result.response)
    cols = processed_results['head']['vars']

    out = []
    for row in processed_results['results']['bindings']:
        item = []
        for c in cols:
            item.append(row.get(c, {}).get('value'))
        out.append(item)

    return pd.DataFrame(out, columns=cols)
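
To see what the function does without contacting an endpoint, the flattening step can be tried on a hand-written response. The sketch below assumes the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`). The sample dictionary and the name `bindings_to_dataframe` are illustrative and not part of the original notebook. Variables that are unbound in a row, such as a missing IATA code, simply do not appear in that binding and end up as missing values.

```python
import pandas as pd

# Hand-written sample in the SPARQL JSON results layout (not fetched from Wikidata).
# The labels and ICAO codes are those of two Belgian airports listed in this notebook.
sample = {
    "head": {"vars": ["airportLabel", "IATA", "ICAO"]},
    "results": {
        "bindings": [
            {
                "airportLabel": {"type": "literal", "value": "Saint-Hubert Airport"},
                "ICAO": {"type": "literal", "value": "EBSH"},
            },
            {
                "airportLabel": {"type": "literal", "value": "Maillen Airport"},
                "ICAO": {"type": "literal", "value": "EBML"},
            },
        ]
    },
}


def bindings_to_dataframe(results):
    """Flatten a parsed SPARQL JSON result into a DataFrame (illustrative helper)."""
    cols = results["head"]["vars"]
    rows = [
        # A variable missing from a binding yields None for that cell.
        [binding.get(col, {}).get("value") for col in cols]
        for binding in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=cols)


sample_df = bindings_to_dataframe(sample)
print(sample_df)
```

Every value comes back as a string (or as a missing value), so numeric columns need an explicit conversion if they are used for arithmetic.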

### Submit SPARQL Query to get Belgian airports

Here is the SPARQL query used to get the coordinates (longitude, latitude) of all Belgian airports known to Wikidata, along with their [IATA codes](https://en.wikipedia.org/wiki/IATA_airport_code), [ICAO codes](https://en.wikipedia.org/wiki/ICAO_airport_code) and [GeoNames](https://en.wikipedia.org/wiki/GeoNames) identifiers.

Coordinates are expressed in [well-known text (WKT)](https://en.wikipedia.org/wiki/Well-known_text) form.
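
As a small illustration of the WKT layout, a point literal such as `Point(5.40417 50.0358)` lists the longitude first and the latitude second. The notebook itself parses these strings with `shapely.wkt.loads`. The sketch below is only a standard-library stand-in based on a regular expression, and the names `WKT_POINT` and `parse_wkt_point` are made up for this example. It handles plain decimal numbers only.

```python
import re

# Matches literals such as "Point(5.40417 50.0358)": longitude, then latitude.
WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (longitude, latitude) for a WKT point, or None if the text is not a point."""
    match = WKT_POINT.match(wkt)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


lon, lat = parse_wkt_point("Point(5.40417 50.0358)")
print(lon, lat)
```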

In [10]:
# Web address of the SPARQL endpoint

wds = "https://query.wikidata.org/sparql"

# The query

query="""
SELECT DISTINCT ?airport ?airportLabel ?coor ?geonamesId ?IATA ?ICAO
WHERE {
       # ?airport must be an instance (wdt:P31) of airport (wd:Q1248784) or of one of its sub-classes (wdt:P279*).
       # Examples of sub-classes of airport:
       #   -  international airport (Q644371) 
       #   -  commercial airport (Q20977786) 
       ?airport wdt:P31/wdt:P279* wd:Q1248784.
  
       # ?airport must be linked to Belgium (wd:Q31) by some property (?range).
       # Remove the following line if you want all airports.
       ?airport ?range            wd:Q31.
  
       # The coordinate location (wdt:P625) of ?airport is stored in ?coor.
       ?airport wdt:P625          ?coor.
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  optional { ?airport wdt:P1566 ?geonamesId}
  optional { ?airport wdt:P238  ?IATA}
  optional { ?airport wdt:P239  ?ICAO}
}
"""

# Run the query

df = get_sparql_dataframe(wds, query)

# Data quality: Only consider coor that are real Point objects.

df=df[df['coor'].str.contains('Point')]

# Inspect DataFrame
# coor is a column containing WKT of locations

df.head()

| | airport | airportLabel | coor | geonamesId | IATA | ICAO |
|---|---|---|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q2875612 | Saint-Hubert Airport | Point(5.40417 50.0358) | | | EBSH |
| 1 | http://www.wikidata.org/entity/Q16894473 | Maillen Airport | Point(4.9265 50.375) | | | EBML |
| 2 | http://www.wikidata.org/entity/Q16890695 | Couthuin Airport | Point(5.10719 50.5378) | 9035031.0 | | EBHE |
| 3 | http://www.wikidata.org/entity/Q2875370 | Amougies Airfield | Point(3.48494 50.7401) | | | EBAM |
| 4 | http://www.wikidata.org/entity/Q2930455 | Ursel Airbase | Point(3.475555 51.144167) | | | EBUL |
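
The comment in the query says the Belgium line can be removed to get all airports. One way to make that switch explicit is to build the query text from a parameter, as in the rough sketch below. The function name `build_airport_query` and its `country_qid` parameter are hypothetical. The sketch also assumes the country is stated with `wdt:P17` (country), which is narrower than the notebook's `?airport ?range wd:Q31` pattern and may therefore return fewer rows.

```python
def build_airport_query(country_qid=None):
    """Return a SPARQL query for airports, optionally limited to one country."""
    # Assumption: the country is expressed with wdt:P17 ("country"), which is
    # stricter than the "?airport ?range wd:Q31" wildcard used in the notebook.
    country_filter = f"?airport wdt:P17 wd:{country_qid}." if country_qid else ""
    return f"""
SELECT DISTINCT ?airport ?airportLabel ?coor ?geonamesId ?IATA ?ICAO
WHERE {{
  ?airport wdt:P31/wdt:P279* wd:Q1248784.
  {country_filter}
  ?airport wdt:P625 ?coor.
  OPTIONAL {{ ?airport wdt:P1566 ?geonamesId }}
  OPTIONAL {{ ?airport wdt:P238 ?IATA }}
  OPTIONAL {{ ?airport wdt:P239 ?ICAO }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
"""


belgian_query = build_airport_query("Q31")
print(belgian_query)
```

The returned string can be passed to `get_sparql_dataframe(wds, belgian_query)`. Calling `build_airport_query()` without an argument drops the country filter; such a worldwide query is much heavier and may run into the query service's time limit.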


## Part 2 - GIS

### Transform our DataFrame into a GeoDataFrame

[GeoPandas](http://geopandas.org/) adds GIS functionality to [pandas](https://pandas.pydata.org/).

In [0]:
# loads transforms WKT strings into Shapely points.
# EPSG:4326 is WGS 84 (longitude/latitude in degrees); the {'init': ...} form is deprecated.
crs = "EPSG:4326"
gdf = GeoDataFrame(df, crs=crs, geometry=df['coor'].apply(loads))

### Build a map with folium

In [13]:
import folium

points = folium.features.GeoJson(gdf.to_json())

mapa = folium.Map([50.46, 4.45],zoom_start=8,tiles='cartodbpositron')
mapa.add_child(points)
mapa




### Function to generate URIs

This function transforms a GeoNames identifier into a URI.

- Argument:  _geonames identifier_ 
- Returns: [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier). 

The GeoNames web server redirects this URI to a web page that documents the concept. Brussels South Charleroi Airport, for example, is identified by this GeoNames URI: http://www.geonames.org/6296489. This URI is documented by this web page: http://www.geonames.org/6296489/brussels-south-charleroi-airport.html


In [14]:
def geonamesLink(geonamesId):
    """Return the GeoNames URI for an identifier, or None if the identifier is missing."""
    if pd.isna(geonamesId):
        return None
    return 'https://www.geonames.org/' + str(geonamesId)

# Example of use      
print(geonamesLink(6296489))

https://www.geonames.org/6296489
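
One caveat: `str(geonamesId)` reproduces whatever type the identifier has. If the column has been converted to floats, for instance after a round trip through CSV, a value such as `9035031.0` would yield a URI ending in `.0`. The sketch below is a hypothetical, standard-library variant (the name `geonames_link_normalized` is not from the notebook) that tolerates such values. It is one possible approach, not the notebook's method.

```python
import math


def geonames_link_normalized(geonames_id):
    """Hypothetical variant of geonamesLink that tolerates float-typed identifiers."""
    if geonames_id is None:
        return None
    if isinstance(geonames_id, float):
        if math.isnan(geonames_id):
            return None
        # 9035031.0 -> 9035031
        geonames_id = int(geonames_id)
    text = str(geonames_id).strip()
    # A string such as "9035031.0" can appear after a CSV round trip.
    if text.endswith(".0"):
        text = text[:-2]
    return "https://www.geonames.org/" + text if text else None


print(geonames_link_normalized(9035031.0))
print(geonames_link_normalized("6296489"))
```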


In [0]:
gdf['geonamesLink']=gdf['geonamesId'].apply(geonamesLink)

In [16]:
gdf.head()

| | airport | airportLabel | coor | geonamesId | IATA | ICAO | geometry | geonamesLink |
|---|---|---|---|---|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q2875612 | Saint-Hubert Airport | Point(5.40417 50.0358) | | | EBSH | POINT (5.40417 50.0358) | |
| 1 | http://www.wikidata.org/entity/Q16894473 | Maillen Airport | Point(4.9265 50.375) | | | EBML | POINT (4.9265 50.375) | |
| 2 | http://www.wikidata.org/entity/Q16890695 | Couthuin Airport | Point(5.10719 50.5378) | 9035031.0 | | EBHE | POINT (5.10719 50.5378) | https://www.geonames.org/9035031 |
| 3 | http://www.wikidata.org/entity/Q2875370 | Amougies Airfield | Point(3.48494 50.7401) | | | EBAM | POINT (3.48494 50.7401) | |
| 4 | http://www.wikidata.org/entity/Q2930455 | Ursel Airbase | Point(3.475555 51.144167) | | | EBUL | POINT (3.475555 51.144167) | |


### Export GeoJSON for later use in GIS tools

Export the GeoJSON file from Google Colab. You can then import it into [QGIS](https://www.qgis.org/en/site/), for example, to build your own maps.

In [0]:
from google.colab import files

gdf.to_file(filename="airports.geojson", driver="GeoJSON", encoding="utf-8")

files.download('airports.geojson')
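
For readers unfamiliar with GeoJSON, the exported file is a `FeatureCollection` in which each airport is a `Feature` with a `Point` geometry (longitude first) and its attributes under `properties`. The sketch below builds a reduced version of that structure with the standard `json` module only. The helper name `to_feature_collection` and the choice of two properties are assumptions made for illustration, and the file written by GeoPandas contains more fields.

```python
import json


def to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection from (label, icao, lon, lat) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"airportLabel": label, "ICAO": icao},
        }
        for label, icao, lon, lat in rows
    ]
    return {"type": "FeatureCollection", "features": features}


# Two airports taken from the result table in this notebook.
rows = [
    ("Saint-Hubert Airport", "EBSH", 5.40417, 50.0358),
    ("Maillen Airport", "EBML", 4.9265, 50.375),
]
geojson_text = json.dumps(to_feature_collection(rows), ensure_ascii=False, indent=2)
print(geojson_text)
```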

