# ICAO airport codes not matching between routes and geolocation

We have two data sets: flight routes (from airport to airport) and the geolocation (latitude, longitude) of airports around the world. The two are inconsistent: not every airport in the "routes" data is found in the "geo" data.

To make progress, we define a function that calculates the percentage of routes affected out of the total number of routes and returns a list of the specific airport codes that are missing.

With that list, every missing airport can then be inspected one by one.

This notebook uses Germany as an example.

The steps are:

- Definition of function calculating missing airports

- Apply function to sample from Germany

- Dictionary of Airports

Some airports were found in the geo file under a different code: for whatever reason, some airports have changed their ICAO code or have two different ones.
Other airports were not found at all.

In [1]:
import pandas as pd

pd.set_option('display.max_rows', 10)

### Definition of function


In [2]:
# Function which calculates the percentage of routes affected by "missing" airports
# (printed as a fraction, e.g. 0.0193 = 1.93 %).
# Routes are aggregated by month in the original source,
# so a route flown all year appears 12 times, whereas a route active for only 1 month appears once.
# This matters when estimating the impact of the percentage,
# since routes with a missing airport are typically not very regular destinations.

def missing_airports(routes, geo):
    # Series of from/to airports, and the airports available in geo
    fr_a = routes.fr_airport
    to_a = routes.to_airport
    airp = geo.icao
    # Boolean masks: True where the airport code is found in geo
    fr_check = fr_a.isin(airp)
    to_check = to_a.isin(airp)
    # Share of rows (route-months) whose airport is not matched; boolean masks replace the
    # positional iloc lookup, which gave wrong rows whenever routes had a non-default index
    print("percentage of missing 'from airports' routes out of the total routes ", round((~fr_check).mean(), 4))
    print("percentage of missing 'to airports' routes out of the total routes ", round((~to_check).mean(), 4))
    # airports affected "From airports"
    missing_fr_airports = fr_a[~fr_check].unique()
    # airports affected "To airports"
    missing_to_airports = to_a[~to_check].unique()
    # Concatenate both lists, dropping codes missing in both directions but keeping first-seen order
    return list(dict.fromkeys(list(missing_fr_airports) + list(missing_to_airports)))
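Because each row is one month of a route, the shares printed by `missing_airports` weight a route by how many months it was active. To see how many *distinct* routes are affected instead, a minimal sketch could collapse the monthly rows first. `missing_share_distinct` and the toy data are hypothetical; it only assumes the `fr_airport`, `to_airport` and `icao` columns used above.

```python
import pandas as pd

def missing_share_distinct(routes, geo):
    """Share of distinct (fr_airport, to_airport) pairs with an endpoint absent from geo."""
    # Collapse the monthly rows so each route counts once, however many months it ran
    pairs = routes[["fr_airport", "to_airport"]].drop_duplicates()
    known = set(geo["icao"])
    # A route is affected if either of its endpoints is missing from geo
    affected = ~pairs["fr_airport"].isin(known) | ~pairs["to_airport"].isin(known)
    return round(float(affected.mean()), 4)

# Hypothetical toy data: EDDF-KJFK active in 3 months, EDDF-XX99 in 1 month
toy_routes = pd.DataFrame({
    "fr_airport": ["EDDF", "EDDF", "EDDF", "EDDF"],
    "to_airport": ["KJFK", "KJFK", "KJFK", "XX99"],
})
toy_geo = pd.DataFrame({"icao": ["EDDF", "KJFK"]})

print(missing_share_distinct(toy_routes, toy_geo))  # 0.5: 1 of 2 distinct routes
```

On the same toy data, the row-based share for "to airports" would be 0.25 (1 of 4 rows), so comparing both figures shows how much the monthly aggregation dilutes rarely flown routes.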


### Apply function to sample from Germany

In [4]:
# load the routes from Germany and the geolocation of airports worldwide
path = '../data/'
geo = pd.read_csv(path + "world_airports.csv")
routes = pd.read_csv(path + "flight_data_de.csv")


In [5]:
missing_airports(routes, geo)

percentage of missing 'from airports' routes out of the total routes  0.0001
percentage of missing 'to airports' routes out of the total routes  0.0193


['ED00',
 'LSZM',
 'LYPR',
 'FAJS',
 'DTNZ',
 'SABA',
 'CUUP',
 'ZSSA',
 'RJNN',
 'GMMC',
 'OR99',
 'UAFF',
 'OT99',
 'K999',
 'EN00',
 'ED99',
 'LECP',
 'ENOS',
 'LIVT',
 'LPFU',
 'ES99',
 'ESMM']
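The function's comments suggest that routes with a missing airport are not very regular destinations. One way to check this, sketched here under the assumption of the same column names, is to count how many monthly rows reference each missing code: a code that appears in only a few rows is likely a seldom-flown destination. `missing_airport_counts` and the toy data are hypothetical.

```python
import pandas as pd

def missing_airport_counts(routes, geo):
    """Number of monthly route rows that reference each airport absent from geo."""
    known = set(geo["icao"])
    # Stack both endpoints into one Series so an airport is counted wherever it appears
    endpoints = pd.concat([routes["fr_airport"], routes["to_airport"]], ignore_index=True)
    return endpoints[~endpoints.isin(known)].value_counts()

# Hypothetical toy data
toy_routes = pd.DataFrame({
    "fr_airport": ["EDDF", "EDDF", "EDDM", "EDDM"],
    "to_airport": ["KJFK", "XX99", "YY99", "XX99"],
})
toy_geo = pd.DataFrame({"icao": ["EDDF", "EDDM", "KJFK"]})

counts = missing_airport_counts(toy_routes, toy_geo)
print(counts)  # XX99 appears in 2 rows, YY99 in 1
```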

### Dictionary of airports

Some airports were found in the geo file under a different code: for whatever reason, some airports have changed their ICAO code or have two different ones. The equivalents found after a manual search are stored in a dictionary:
- key: ICAO airport code as it appears in the routes (the "missing" value)
- value: equivalent ICAO airport code identified in the geo file

Replacing these codes in the routes (or adding them to the geo file) will decrease the percentage of missing values.
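A minimal sketch of the replacement option on toy data: `Series.replace` with a dict swaps every key for its value and leaves unmapped codes untouched. The mapping excerpt `icao_map`, the helper `to_share_missing` and the toy frames are hypothetical, assuming the same column names as above.

```python
import pandas as pd

# Hypothetical excerpt of the mapping (code in routes -> code in geo)
icao_map = {"LSZM": "LFSB", "ENOS": "ENGM"}

toy_routes = pd.DataFrame({
    "fr_airport": ["EDDF", "EDDM", "EDDF"],
    "to_airport": ["LSZM", "ENOS", "ZZ99"],
})
toy_geo = pd.DataFrame({"icao": ["EDDF", "EDDM", "LFSB", "ENGM"]})

def to_share_missing(routes, geo):
    # Fraction of rows whose destination airport is absent from geo
    return (~routes["to_airport"].isin(geo["icao"])).mean()

before = to_share_missing(toy_routes, toy_geo)
# Rewrite both endpoint columns; codes not in icao_map stay as they are
fixed = toy_routes.assign(
    fr_airport=toy_routes["fr_airport"].replace(icao_map),
    to_airport=toy_routes["to_airport"].replace(icao_map),
)
after = to_share_missing(fixed, toy_geo)
print(round(before, 4), round(after, 4))  # 1.0 0.3333
```

Rewriting the routes leaves the geo file untouched; the alternative of adding rows to geo under the old code (copying the coordinates of the equivalent airport) would produce the same matches.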

In the example of Germany, 11 airports were found and 11 were not; the latter are kept in a list.

In [6]:
# dictionary of airports found under another code (key: code in routes, value: code in geo)

icao_dict = {"SABA":"TNCS", # Saba, Netherlands Antilles
             "LSZM":"LFSB", # Basel, Switzerland
             "LYPR":"BKPR", # Pristina, Kosovo
             "FAJS":"FAOR", # Johannesburg, South Africa
             "DTNZ":"DTNH", # Enfidha, Tunisia
             "ZSSA":"ZSSS", # Hongqiao, China
             "RJNN":"RJNA", # Nagoya, Japan
             "GMMC":"CMMN", # Casablanca, Morocco
             "LECP":"LEPA", # Palma de Mallorca, Spain
             "ENOS":"ENGM", # Oslo, Norway
             "LPFU":"LPMA"} # Madeira, Portugal


# list of those not found

icao_not_found = ["ED00", # somewhere in Germany
                  "CUUP", # somewhere in Canada
                  "OR99", # somewhere in Iraq
                  "UAFF", # somewhere in Kyrgyzstan
                  "OT99", # somewhere in Qatar
                  "K999", # somewhere in USA
                  "EN00", # somewhere in Norway
                  "ED99", # somewhere in Germany
                  "LIVT", # somewhere in Italy
                  "ES99", # somewhere in Sweden
                  "ESMM"] # somewhere in Sweden
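Every code returned by `missing_airports` should end up either as a key of the dictionary or in the not-found list. A quick consistency check, sketched with a hypothetical helper `unaccounted_codes`, catches entries written the wrong way round (keyed by the geo code instead of the route code). The toy data below deliberately contains such a reversed entry.

```python
def unaccounted_codes(missing, mapping, not_found):
    """Return missing codes that are neither a key of mapping nor listed in not_found."""
    handled = set(mapping) | set(not_found)
    # Keep the original order so the result can be compared with the missing list
    return [code for code in missing if code not in handled]

# Hypothetical toy data: the Pristina entry is keyed by the geo code by mistake
toy_missing = ["LYPR", "LSZM", "ED00"]
toy_mapping = {"LSZM": "LFSB", "BKPR": "LYPR"}
toy_not_found = ["ED00"]

print(unaccounted_codes(toy_missing, toy_mapping, toy_not_found))  # ['LYPR']
```

Applied to the notebook's own objects, `unaccounted_codes(missing_airports(routes, geo), icao_dict, icao_not_found)` should return an empty list once every missing code is accounted for.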