
In [89]:
# This notebook demonstrates the use of Neo4j for airline route optimization.

# It uses datasets from Kaggle: airports.csv and routes.csv
# (https://www.kaggle.com/datasets/thedevastator/global-air-transportation-network-mapping-the-wo)

# The notebook covers the following:
# - Installing necessary libraries (neo4j, kaggle)
# - Importing libraries (numpy, pandas, collections, os, math, itertools)
# - Setting pandas display options
# - Setting up environment variables for credentials (NEO4J_CONNECTION_URL, NEO4J_USER, NEO4J_PASSWORD)
# - Downloading and unzipping datasets using kaggle API
# - Loading airport data into a pandas DataFrame (airport_df)
# - Cleaning and preprocessing airport data
# - Loading route data into a pandas DataFrame (routes_df)
# - Cleaning and preprocessing route data
# - Calculating distances (in miles) between airports using the Haversine formula
# - Adding distance, revenue (value), and duration columns to the routes DataFrame
# - Connecting to Neo4j database
# - Resetting the database (deleting nodes, relationships, indexes, constraints, and properties)
# - Loading airport data into Neo4j
# - Loading route data into Neo4j
# - Defining functions for printing graph data and finding shortest paths
# - Finding shortest path between two airports (with direct connectivity)
# - Finding shortest path between two airports (with multiple stops)
# - Adding the ability to add stops in the route
# - Finding shortest path with distance, duration, and value
# - Accepting user input for source and destination airports
# - Finding the shortest path between the source and destination airports

# Neo4j Aura console:
# https://console.neo4j.io/

# Neo4j Workspace:
# https://workspace-preview.neo4j.io/connection/data-source

# APOC (Awesome Procedures On Cypher) procedures are not used because the APOC plugin is not installed on my Neo4j Aura instance.
# Graph Data Science (GDS) weighted shortest-path algorithms are not used because GDS is not available on my Neo4j Aura instance.

!pip install neo4j kaggle -q

In [90]:

import numpy as np
import pandas as pd
import collections
import os
import math
from itertools import permutations
import itertools
from neo4j import GraphDatabase

pd.set_option('display.width', 0)
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.max_rows', 50)

In [91]:
import os
import getpass
credential_names = ["NEO4J_CONNECTION_URL","NEO4J_USER","NEO4J_PASSWORD"]
# Prompt (without echoing the input) for any credential not already set in the environment
for credential in credential_names:
  if credential not in os.environ:
    os.environ[credential] = getpass.getpass(f"Provide your {credential}: ")

In [92]:
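# The kaggle CLI reads KAGGLE_USERNAME and KAGGLE_KEY from the environment
# (or from ~/.kaggle/kaggle.json); these two lines only read them back
kaggle_username = os.environ.get('KAGGLE_USERNAME')
kaggle_api_key = os.environ.get('KAGGLE_KEY')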

!kaggle datasets download thedevastator/global-air-transportation-network-mapping-the-wo -f airports.csv
!kaggle datasets download thedevastator/global-air-transportation-network-mapping-the-wo -f routes.csv
!unzip -o routes.csv.zip



Dataset URL: https://www.kaggle.com/datasets/thedevastator/global-air-transportation-network-mapping-the-wo
License(s): other
airports.csv: Skipping, found more recently modified local copy (use --force to force download)
Dataset URL: https://www.kaggle.com/datasets/thedevastator/global-air-transportation-network-mapping-the-wo
License(s): other
routes.csv.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  routes.csv.zip
  inflating: routes.csv              


In [93]:
airport_df = pd.read_csv("airports.csv")
airport_df = airport_df.drop('index', axis=1)
airport_df = airport_df[airport_df.IATA != '\\N']
print(airport_df.shape)

airport_all = airport_df[['Name','City','Country','Latitude', 'Longitude', 'IATA']]
IATA_array = airport_all["IATA"].tolist()

(6072, 14)
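The filter above drops airports whose `IATA` field is the literal `\N`, the placeholder this dataset uses for a missing value. For comparison, a minimal stdlib sketch of the same filtering with `csv.DictReader`; the sample rows and the `load_airports_with_iata` helper are made up for illustration:

```python
import csv
import io

# Hypothetical sample in the same shape as airports.csv; "\N" marks a missing value
SAMPLE = """Name,City,Country,IATA,Latitude,Longitude
Goroka Airport,Goroka,Papua New Guinea,GKA,-6.08169,145.391998
Example Strip,Nowhere,Nowhere,\\N,0.0,0.0
Madang Airport,Madang,Papua New Guinea,MAG,-5.20708,145.789001
"""

def load_airports_with_iata(text):
    """Return rows whose IATA code is present (not the '\\N' placeholder and not empty)."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["IATA"] not in ("\\N", "")]

airports = load_airports_with_iata(SAMPLE)
print([a["IATA"] for a in airports])  # ['GKA', 'MAG']
```

With pandas, a similar effect can be had by passing `na_values=['\\N']` to `pd.read_csv` and then calling `dropna(subset=['IATA'])`; note that this also turns `\N` in every other column into `NaN`.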


In [94]:


route_cols = ['airline', 'airlineID', 'source', 'sourceAirportID',
              'dest', 'destAirportID', 'codeshare', 'stops', 'equipment']
routes_df = pd.read_csv("routes.csv", skiprows=1, names = route_cols)
routes_df['sourceAirportID'] = pd.to_numeric(routes_df['sourceAirportID'].astype(str), errors='coerce')
routes_df['destAirportID'] = pd.to_numeric(routes_df['destAirportID'].astype(str), errors='coerce')

# Note: this is an alias, not a copy, so the new column is also added to routes_df
routes_df_explore = routes_df
routes_df_explore['source_dest'] = routes_df_explore['source'] + routes_df_explore['dest']

routes_df['flightID'] = routes_df.index + 1

# Count how many route records share each (source, dest) pair
routes_all = routes_df.groupby(['source', 'dest']).size().reset_index(name='counts')
# Only keep routes whose airports have an IATA code
routes_all = routes_all[routes_all['source'].isin(IATA_array)]
routes_all = routes_all[routes_all['dest'].isin(IATA_array)]

print(routes_df.shape)

# Only keep routes whose airports have an IATA code
routes_df = routes_df[routes_df['source'].isin(IATA_array)]
routes_df = routes_df[routes_df['dest'].isin(IATA_array)]
print(routes_df.shape)
routes_df.head()



(67663, 11)
(66934, 11)


,airline,airlineID,source,sourceAirportID,dest,destAirportID,codeshare,stops,equipment,source_dest,flightID
0,2B,410,AER,2965.0,KZN,2990.0,,0,CR2,AERKZN,1
1,2B,410,ASF,2966.0,KZN,2990.0,,0,CR2,ASFKZN,2
2,2B,410,ASF,2966.0,MRV,2962.0,,0,CR2,ASFMRV,3
3,2B,410,CEK,2968.0,KZN,2990.0,,0,CR2,CEKKZN,4
4,2B,410,CEK,2968.0,OVB,4078.0,,0,CR2,CEKOVB,5
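`routes_all` counts how many route records share each `(source, dest)` pair and keeps only pairs whose endpoints both have an IATA code. The same aggregation in plain Python with `collections.Counter`, on a few made-up rows standing in for `routes_df` and `IATA_array`:

```python
from collections import Counter

# Hypothetical (source, dest) pairs; the real data comes from routes_df
routes = [("AER", "KZN"), ("ASF", "KZN"), ("DME", "KZN"), ("DME", "KZN"), ("XXX", "KZN")]
known_iata = {"AER", "ASF", "DME", "KZN"}

# Equivalent of groupby(['source', 'dest']).size(), restricted to known airports
route_counts = Counter(
    (src, dst) for src, dst in routes if src in known_iata and dst in known_iata
)
print(route_counts[("DME", "KZN")])  # 2
```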


In [95]:
# Get the dataframe to calculate distance for all routes
lat_long_df = routes_df[['source', 'dest']]
merge_df = pd.merge(lat_long_df, airport_df[['IATA', 'Latitude', 'Longitude']], left_on = 'source', right_on = 'IATA')
merge_df = merge_df.rename(columns = {'Latitude': 'source_lat', 'Longitude': 'source_long'})
merge_df_2 = pd.merge(merge_df, airport_df[['IATA', 'Latitude', 'Longitude']], left_on = 'dest', right_on = 'IATA')
merge_df_2 = merge_df_2.rename(columns = {'Latitude': 'dest_lat', 'Longitude': 'dest_long'})
lat_long_df = merge_df_2.drop(['IATA_x', 'IATA_y'], axis = 1)
lat_long_df.head()

,source,dest,source_lat,source_long,dest_lat,dest_long
0,AER,KZN,43.449902,39.9566,55.606201,49.278702
1,ASF,KZN,46.283298,48.006302,55.606201,49.278702
2,CEK,KZN,55.305801,61.5033,55.606201,49.278702
3,DME,KZN,55.408798,37.9063,55.606201,49.278702
4,DME,KZN,55.408798,37.9063,55.606201,49.278702


In [96]:
# Add a distance column: great-circle distance via the Haversine formula,
# converted from kilometres to miles and rounded to 0.1 mile
R = 6371  # mean Earth radius in km
KM_TO_MILES = 0.621371
distances = []

for index, row in lat_long_df.iterrows():
    phi1 = math.radians(row['source_lat'])
    phi2 = math.radians(row['dest_lat'])
    delta_phi = math.radians(row['dest_lat'] - row['source_lat'])
    delta_lambda = math.radians(row['dest_long'] - row['source_long'])
    a = math.sin(delta_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance_km = R * c
    distances.append(round(distance_km * KM_TO_MILES, 1))

lat_long_df['distance'] = distances
lat_long_df['source_dest'] = lat_long_df['source'] + lat_long_df['dest']

# Keep one row per (source, dest) pair for the merge below
lat_long_df_dup = lat_long_df.drop_duplicates()
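The loop above computes the Haversine distance row by row. Pulled out into a standalone function (the name `haversine_miles` is mine), the same formula can be checked on its own: one degree of latitude along a meridian should come out at about 69.1 miles, and the AER to KZN coordinates should reproduce the 936.3 miles listed for that route in the `routes_df.head()` output further down.

```python
import math

EARTH_RADIUS_KM = 6371
KM_TO_MILES = 0.621371

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, rounded to 0.1 mile as in the loop above."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = math.sin(delta_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return round(EARTH_RADIUS_KM * c * KM_TO_MILES, 1)

# AER -> KZN coordinates from lat_long_df above
print(haversine_miles(43.449902, 39.9566, 55.606201, 49.278702))
```

Applied as `lat_long_df.apply(lambda r: haversine_miles(r['source_lat'], r['source_long'], r['dest_lat'], r['dest_long']), axis=1)`, it should give the same column as the loop.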

In [97]:
# Add the distance column (miles) to routes_df
routes_df = pd.merge(routes_df, lat_long_df_dup[['source_dest', 'distance']], on='source_dest')
print(routes_df.shape)

# Add a revenue column (value), assuming $0.70 per mile
routes_df['value'] = routes_df['distance'] * 0.70

# Add a duration column (hours), assuming an average speed of 500 mph
routes_df['duration'] = routes_df['distance'] / 500

routes_df.shape

(66934, 12)


(66934, 14)

In [98]:
routes_df.head()

,airline,airlineID,source,sourceAirportID,dest,destAirportID,codeshare,stops,equipment,source_dest,flightID,distance,value,duration
0,2B,410,AER,2965.0,KZN,2990.0,,0,CR2,AERKZN,1,936.3,655.41,1.8726
1,2B,410,ASF,2966.0,KZN,2990.0,,0,CR2,ASFKZN,2,646.5,452.55,1.293
2,2B,410,ASF,2966.0,MRV,2962.0,,0,CR2,ASFMRV,3,278.5,194.95,0.557
3,2B,410,CEK,2968.0,KZN,2990.0,,0,CR2,CEKKZN,4,478.8,335.16,0.9576
4,2B,410,CEK,2968.0,OVB,4078.0,,0,CR2,CEKOVB,5,831.8,582.26,1.6636
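Each route's `value` and `duration` are linear in its distance. A small sketch of the per-leg arithmetic, reusing the notebook's assumed $0.70 per mile and 500 mph (the constant and function names are mine), reproduces the AER to KZN row in the table above:

```python
REVENUE_PER_MILE = 0.70   # the notebook's assumed revenue per mile (USD)
AVERAGE_SPEED_MPH = 500   # the notebook's assumed average speed

def leg_metrics(distance_miles):
    """Return (value_usd, duration_hours) for one flight leg."""
    value = distance_miles * REVENUE_PER_MILE
    duration = distance_miles / AVERAGE_SPEED_MPH
    return value, duration

value, duration = leg_metrics(936.3)  # AER -> KZN from the table above
print(round(value, 2), round(duration, 4))  # 655.41 1.8726
```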


In [99]:
from neo4j import GraphDatabase
import os

# Environment variables for connection details
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# GraphDatabase.driver() does not open a connection by itself;
# verify_connectivity() raises if the server is unreachable or the credentials are wrong
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()
print("Connection to Neo4j Successful")

# Run a trivial query to confirm the database answers
def check_connection():
    with driver.session() as session:
        record = session.run("RETURN 1 AS result").single()
        print("Neo4j is connected, query result:", record["result"])

check_connection()

# Close the driver connection after operations
driver.close()


Connection to Neo4j Successful
Neo4j is connected, query result: 1


In [100]:
# Take a small subset (first 500 airports, first 2000 routes) to keep the demo graph small

airport_all_first_500 = airport_all.head(500)
routes_df_first_2000 = routes_df.head(2000)
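Because the flight loader uses `MATCH` on both endpoints, any of the first 2,000 routes whose source or destination is not among the first 500 airports is silently skipped. A stdlib sketch for counting how many routes survive; `split_routes_by_coverage` and its inputs are hypothetical stand-ins for `airport_all_first_500['IATA']` and `routes_df_first_2000[['source', 'dest']]`:

```python
def split_routes_by_coverage(loaded_iata, routes):
    """Split (source, dest) routes into those whose endpoints are both loaded and the rest."""
    loaded = set(loaded_iata)
    kept, dropped = [], []
    for src, dst in routes:
        (kept if src in loaded and dst in loaded else dropped).append((src, dst))
    return kept, dropped

# Hypothetical inputs
loaded_iata = ["YZV", "YKL", "YQB", "YUL", "YGL"]
routes = [("YZV", "YKL"), ("YKL", "YQB"), ("YUL", "LHR"), ("AER", "KZN")]
kept, dropped = split_routes_by_coverage(loaded_iata, routes)
print(len(kept), len(dropped))  # 2 2
```

In pandas the same check is `routes_df_first_2000['source'].isin(codes) & routes_df_first_2000['dest'].isin(codes)` with `codes = airport_all_first_500['IATA']`.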


In [67]:
from neo4j import GraphDatabase
import os

# Environment variables for connection details
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

# Delete all nodes together with their relationships
def delete_all(tx):
    tx.run("MATCH (n) DETACH DELETE n")

# Names of indexes to drop: skip the built-in LOOKUP indexes and indexes that
# belong to a constraint (those are removed when the constraint is dropped)
def get_index_names(tx):
    result = tx.run(
        "SHOW INDEXES YIELD name, type, owningConstraint "
        "WHERE type <> 'LOOKUP' AND owningConstraint IS NULL "
        "RETURN name"
    )
    return [record['name'] for record in result]

# Names of all constraints
def get_constraint_names(tx):
    result = tx.run("SHOW CONSTRAINTS YIELD name RETURN name")
    return [record['name'] for record in result]

# Remove all properties from all nodes (a no-op once every node has been deleted)
def remove_all_node_properties(tx):
    tx.run("MATCH (n) SET n = {}")

# Remove all properties from all relationships (likewise a no-op after DETACH DELETE)
def remove_all_relationship_properties(tx):
    tx.run("MATCH ()-[r]->() SET r = {}")

# Reset the database
def reset_database(driver):
    with driver.session() as session:
        # Delete all nodes and relationships
        session.execute_write(delete_all)
        print("All nodes and relationships deleted.")

        # Drop indexes one at a time in auto-commit transactions;
        # backticks allow names that are not plain identifiers
        for name in session.execute_read(get_index_names):
            session.run(f"DROP INDEX `{name}` IF EXISTS").consume()
        print("All indexes dropped.")

        # Drop all constraints (this also removes their backing indexes)
        for name in session.execute_read(get_constraint_names):
            session.run(f"DROP CONSTRAINT `{name}` IF EXISTS").consume()
        print("All constraints dropped.")

        session.execute_write(remove_all_node_properties)
        print("All properties from nodes removed.")

        session.execute_write(remove_all_relationship_properties)
        print("All properties from relationships removed.")

        print("Database reset completed.")


# Run the function to reset the database
reset_database(driver)

# Close the driver connection
driver.close()


All nodes and relationships deleted.
All indexes dropped.
All constraints dropped.
All properties from nodes removed.
All properties from relationships removed.
Database reset completed.
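Schema names returned by `SHOW INDEXES` and `SHOW CONSTRAINTS` can contain characters that are not valid in a bare Cypher identifier, so wrapping them in backticks is safer; a literal backtick inside the name is escaped by doubling it. A stdlib sketch that only builds the statements (the helper names and the sample index names are hypothetical):

```python
def quote_identifier(name):
    """Quote a Cypher schema name with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

def drop_statements(kind, names):
    """Build DROP INDEX / DROP CONSTRAINT statements; IF EXISTS makes re-runs harmless."""
    if kind not in ("INDEX", "CONSTRAINT"):
        raise ValueError("kind must be 'INDEX' or 'CONSTRAINT'")
    return [f"DROP {kind} {quote_identifier(n)} IF EXISTS" for n in names]

for stmt in drop_statements("INDEX", ["airport_iata", "index-343aff4e"]):
    print(stmt)
```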


In [68]:
# prompt: loading airports into graph

# Environment variables for connection details
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize the Neo4j driver and check that the server is reachable
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()
print("Connection to Neo4j Successful")

# MERGE on the IATA code only and SET the other properties, so re-running the cell
# updates existing airports instead of creating duplicates.
# UNWIND sends all rows in a single query instead of one query per row.
def load_airports(tx, rows):
    tx.run("""
        UNWIND $rows AS row
        MERGE (a:Airport {iata: row.IATA})
        SET a.name = row.Name, a.city = row.City, a.country = row.Country,
            a.latitude = row.Latitude, a.longitude = row.Longitude
        """, rows=rows)

airport_rows = airport_all_first_500.to_dict('records')
with driver.session() as session:
    session.execute_write(load_airports, airport_rows)

driver.close()


Connection to Neo4j Successful


ERROR:neo4j.io:Failed to write data to connection ResolvedIPv4Address(('34.126.114.186', 7687)) (ResolvedIPv4Address(('34.126.114.186', 7687)))
ERROR:neo4j.io:Failed to write data to connection IPv4Address(('ba89e61b.databases.neo4j.io', 7687)) (ResolvedIPv4Address(('34.126.114.186', 7687)))
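The two `ERROR:neo4j.io` lines above show the driver losing its connection while writing. The cause is not visible from this output; one common way to make a bulk load more robust (an assumption here, not a diagnosis) is to send rows in fixed-size batches, each through an `UNWIND $rows AS row ...` query in its own `session.execute_write(...)` call, instead of one long transaction. A stdlib sketch of the batching part only; `chunked` and the sample records are hypothetical:

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` items from an iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical airport records in the shape produced by DataFrame.to_dict("records")
records = [{"IATA": code} for code in ["GKA", "MAG", "HGU", "LAE", "POM"]]
batches = list(chunked(records, 2))
print([len(b) for b in batches])  # [2, 2, 1]
```

On Python 3.12 and later, `itertools.batched(records, 2)` yields the same groups as tuples.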


In [31]:
# prompt: print all nodes in the graph db

# Environment variables for connection details
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize the Neo4j driver and check that the server is reachable
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()
print("Connection to Neo4j Successful")


def print_all_nodes(tx):
    result = tx.run("MATCH (n) RETURN n")
    for record in result:
        print(record["n"])


with driver.session() as session:
    session.execute_read(print_all_nodes)

driver.close()


Connection to Neo4j Successful
<Node element_id='4:5a742b83-41e6-45c3-b1a5-aabb9f6fa592:2994' labels=frozenset({'Airport'}) properties={'country': 'Papua New Guinea', 'iata': 'GKA', 'city': 'Goroka', 'latitude': -6.0816898345900015, 'name': 'Goroka Airport', 'longitude': 145.391998291}>
<Node element_id='4:5a742b83-41e6-45c3-b1a5-aabb9f6fa592:2995' labels=frozenset({'Airport'}) properties={'country': 'Papua New Guinea', 'iata': 'MAG', 'city': 'Madang', 'latitude': -5.20707988739, 'name': 'Madang Airport', 'longitude': 145.789001465}>
<Node element_id='4:5a742b83-41e6-45c3-b1a5-aabb9f6fa592:2996' labels=frozenset({'Airport'}) properties={'country': 'Papua New Guinea', 'iata': 'HGU', 'city': 'Mount Hagen', 'latitude': -5.826789855957031, 'name': 'Mount Hagen Kagamuga Airport', 'longitude': 144.29600524902344}>
<Node element_id='4:5a742b83-41e6-45c3-b1a5-aabb9f6fa592:2997' labels=frozenset({'Airport'}) properties={'country': 'Papua New Guinea', 'iata': 'LAE', 'city': 'Nadzab', 'latitude':

In [69]:
# prompt: # Load flights into graph

# Environment variables for connection details
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize the Neo4j driver and check that the server is reachable
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()
print("Connection to Neo4j Successful")

# Routes whose source or destination airport was not loaded are silently skipped by MATCH.
# MERGE on the whole pattern creates one FLIGHT per distinct (airline, distance, value, duration)
# combination between two airports. UNWIND sends all rows in a single query.
def load_flights(tx, rows):
    tx.run("""
        UNWIND $rows AS row
        MATCH (source:Airport {iata: row.source}), (dest:Airport {iata: row.dest})
        MERGE (source)-[:FLIGHT {airline: row.airline, distance: row.distance, value: row.value, duration: row.duration}]->(dest)
        """, rows=rows)

flight_rows = routes_df_first_2000[['source', 'dest', 'airline', 'distance', 'value', 'duration']].to_dict('records')
with driver.session() as session:
    session.execute_write(load_flights, flight_rows)

driver.close()


Connection to Neo4j Successful
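Because `MERGE` matches on the full relationship pattern, route rows that differ only in columns not stored on the relationship (such as `equipment`, `codeshare` or `stops`) collapse into a single `FLIGHT`, while each airline serving a pair still gets its own edge. Since `distance`, `value` and `duration` depend only on the airport pair, the number of `FLIGHT` relationships created should equal the number of distinct `(source, dest, airline)` keys among routes whose airports were loaded. A stdlib sketch on hypothetical rows (airline `ZZ` is made up):

```python
def merged_flight_keys(rows):
    """Distinct (source, dest, airline) keys; distance/value/duration depend only on (source, dest)."""
    return {(r["source"], r["dest"], r["airline"]) for r in rows}

# Hypothetical route rows
rows = [
    {"source": "OUA", "dest": "BOY", "airline": "2J", "equipment": "CR2"},
    {"source": "OUA", "dest": "BOY", "airline": "2J", "equipment": "73H"},
    {"source": "OUA", "dest": "BOY", "airline": "ZZ", "equipment": "CR2"},
]
print(len(merged_flight_keys(rows)))  # 2
```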


In [70]:
# Function to print the entire graph (nodes and relationships)
from neo4j import GraphDatabase

def print_entire_graph(tx):
    # Cypher query to retrieve all nodes and their relationships
    query = """
    MATCH (a)-[r]->(b)
    RETURN a, r, b
    """
    result = tx.run(query)

    # Loop through the result and print nodes and relationships
    for record in result:
        source = record['a']
        relationship = record['r']
        destination = record['b']
        print(f"Node 1: {source}, Relationship: {relationship.type}, Node 2: {destination}")

def print_airports_graph(driver):
    with driver.session() as session:
        session.execute_read(print_entire_graph)

# Assuming your Neo4j connection details are stored in environment variables
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

# Call the function to print the entire graph
print_airports_graph(driver)

# Close the driver
driver.close()


Node 1: <Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:56' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YGL', 'city': 'La Grande Riviere', 'latitude': 53.62530136108398, 'name': 'La Grande Rivière Airport', 'longitude': -77.7042007446289}>, Relationship: FLIGHT, Node 2: <Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:138' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YUL', 'city': 'Montreal', 'latitude': 45.4706001282, 'name': 'Montreal / Pierre Elliott Trudeau International Airport', 'longitude': -73.7407989502}>
Node 1: <Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:74' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YKL', 'city': 'Schefferville', 'latitude': 54.80530166625977, 'name': 'Schefferville Airport', 'longitude': -66.8052978515625}>, Relationship: FLIGHT, Node 2: <Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:104' labels=frozenset({'Airport'}) properties={'cou

In [71]:
# Find the shortest direct (non-stop) flight between two airports.
# Every FLIGHT between the same pair has the same distance (it is derived from the airport
# coordinates), so in practice this returns one of the direct flights, if any exists.
def find_shortest_path(tx, source_iata, dest_iata):
    query = """
    MATCH (source:Airport {iata: $source_iata})-[r:FLIGHT]->(dest:Airport {iata: $dest_iata})
    WITH source, dest, r ORDER BY r.distance ASC
    RETURN source, dest, r LIMIT 1
    """
    result = tx.run(query, source_iata=source_iata, dest_iata=dest_iata)

    record = result.single()
    if record:
        source = record['source']
        destination = record['dest']
        relationship = record['r']
        print(f"Shortest Path from {source['iata']} to {destination['iata']}:")
        # distance is stored in miles (see the Haversine cell)
        print(f"  Airline: {relationship['airline']}, Distance: {relationship['distance']} miles")
    else:
        print(f"No direct flight exists between {source_iata} and {dest_iata}.")


In [78]:
# Find the shortest flight between two airports that have a direct connection
from neo4j import GraphDatabase
import os

# Neo4j connection details from environment variables
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

# Function to print all relationships between two nodes and find the shortest path
def print_airports_and_find_shortest_path(driver, source_iata, dest_iata):
    with driver.session() as session:
        print("Printing Entire Graph:")
        #session.execute_read(print_entire_graph)

        print("\nFinding Shortest Flight Between Airports:")
        session.execute_read(find_shortest_path, source_iata, dest_iata)

# Example IATA codes to find the shortest path between
source_iata = "OUA"  # example source airport (alternative: "YZV")
dest_iata = "BOY"    # example destination airport (alternative: "YGL")

# Run the function to find the shortest path
print_airports_and_find_shortest_path(driver, source_iata, dest_iata)

# Close the driver connection
driver.close()


Printing Entire Graph:

Finding Shortest Flight Between Airports:
Shortest Path from OUA to BOY:
  Airline: 2J, Distance: 207.7 km


In [82]:
from neo4j import GraphDatabase
import os

# Neo4j connection details from environment variables
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

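def find_all_paths_no_apoc(tx, source_iata, dest_iata, max_depth=6):
    # Variable-length bounds cannot be passed as query parameters, so max_depth is
    # formatted into the query; int() ensures only a number ends up there.
    # Doubled braces {{ }} produce literal braces in the f-string.
    query = f"""
    MATCH (source:Airport {{iata: $source_iata}})
    MATCH (dest:Airport {{iata: $dest_iata}})
    WITH source, dest
    MATCH path = (source)-[:FLIGHT*..{int(max_depth)}]->(dest)
    RETURN path
    """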
    result = tx.run(query, source_iata=source_iata, dest_iata=dest_iata)
    for record in result:
        path = record['path']
        print(f"Path: {path}")

with driver.session() as session:
    session.execute_read(find_all_paths_no_apoc, "YZV", "YGL")

driver.close()



Path: <Path start=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:194' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YZV', 'city': 'Sept-iles', 'latitude': 50.22330093383789, 'name': 'Sept-Îles Airport', 'longitude': -66.26560211181639}> end=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:56' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YGL', 'city': 'La Grande Riviere', 'latitude': 53.62530136108398, 'name': 'La Grande Rivière Airport', 'longitude': -77.7042007446289}> size=6>
Path: <Path start=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:194' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YZV', 'city': 'Sept-iles', 'latitude': 50.22330093383789, 'name': 'Sept-Îles Airport', 'longitude': -66.26560211181639}> end=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:56' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YGL', 'city': 'La Grande Riviere', '
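`[:FLIGHT*..6]` asks Cypher for every path of up to six flights. Cypher will not reuse a relationship within one path but can revisit an airport, so the number of paths can grow quickly with `max_depth`. For intuition, a stdlib depth-first search over a hypothetical adjacency list that enumerates paths of up to `max_depth` hops while also forbidding repeated airports (a stricter rule than Cypher's default); `all_simple_paths` is my name, not a library function:

```python
def all_simple_paths(graph, source, dest, max_depth):
    """Yield airport sequences from source to dest with at most max_depth hops and no repeated airport."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == dest and len(path) > 1:
            yield path
            continue
        if len(path) - 1 == max_depth:
            continue  # hop limit reached
        for nxt in graph.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

# Hypothetical directed adjacency list (not the real route network)
graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}
paths = sorted(all_simple_paths(graph, "A", "D", max_depth=3))
print(paths)  # [['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'D']]
```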

In [83]:
from neo4j import GraphDatabase
import os

# Neo4j connection details from environment variables
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

def find_shortest_path_distance(tx, source_iata, dest_iata, max_depth=6):
    # Enumerate every path of up to max_depth flights, sum the leg distances (miles),
    # and keep the path with the smallest total (exhaustive, so cost grows with max_depth)
    query = f"""
    MATCH (source:Airport {{iata: $source_iata}})
    MATCH (dest:Airport {{iata: $dest_iata}})
    WITH source, dest
    MATCH path = (source)-[:FLIGHT*..{int(max_depth)}]->(dest)
    RETURN path, reduce(distance = 0, r IN relationships(path) | distance + r.distance) AS total_distance
    ORDER BY total_distance ASC
    LIMIT 1
    """
    result = tx.run(query, source_iata=source_iata, dest_iata=dest_iata)
    record = result.single()
    if record:
        path = record['path']
        total_distance = record['total_distance']
        print(f"Shortest path: {path}")
        print(f"Total distance: {total_distance}")
    else:
        print(f"No path found between {source_iata} and {dest_iata}")

with driver.session() as session:
    session.execute_read(find_shortest_path_distance, "YZV", "YGL")

driver.close()

Shortest path: <Path start=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:194' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YZV', 'city': 'Sept-iles', 'latitude': 50.22330093383789, 'name': 'Sept-Îles Airport', 'longitude': -66.26560211181639}> end=<Node element_id='4:f3583ea5-8511-40e0-b5ce-072163d094c7:56' labels=frozenset({'Airport'}) properties={'country': 'Canada', 'iata': 'YGL', 'city': 'La Grande Riviere', 'latitude': 53.62530136108398, 'name': 'La Grande Rivière Airport', 'longitude': -77.7042007446289}> size=4>
Total distance: 1641.2
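The query above enumerates every path of up to six flights and then sorts by total distance, which is exhaustive and can get expensive on a larger graph. Cypher's built-in `shortestPath()` minimizes the number of hops, not the summed `distance`. Since neither APOC nor GDS is available here, one alternative (a swapped-in technique, not what the notebook does) is to pull the `FLIGHT` edges into Python, for example with `MATCH (a:Airport)-[r:FLIGHT]->(b:Airport) RETURN a.iata, b.iata, r.distance`, and run Dijkstra's algorithm with `heapq`. A minimal sketch over a hypothetical edge list; unlike the Cypher query it applies no hop limit:

```python
import heapq

def dijkstra_path(edges, source, dest):
    """Return (total_distance, [airports]) for the minimum-distance route, or (None, []) if unreachable.

    edges: iterable of (source, dest, distance) tuples for directed flights.
    """
    graph = {}
    for src, dst, dist in edges:
        graph.setdefault(src, []).append((dst, dist))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist_so_far, node = heapq.heappop(heap)
        if node == dest:
            break  # the first time dest is popped its distance is final
        if dist_so_far > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, leg in graph.get(node, []):
            candidate = dist_so_far + leg
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(heap, (candidate, nxt))

    if dest not in best:
        return None, []
    path = [dest]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[dest], path[::-1]

# Hypothetical leg distances in miles (not taken from the dataset)
edges = [("A", "B", 100.0), ("B", "D", 300.0), ("A", "C", 150.0), ("C", "D", 120.0), ("A", "D", 500.0)]
print(dijkstra_path(edges, "A", "D"))  # (270.0, ['A', 'C', 'D'])
```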


In [101]:
from neo4j import GraphDatabase
import os

# Neo4j connection details from environment variables
HOST = os.environ['NEO4J_CONNECTION_URL']
USERNAME = os.environ['NEO4J_USER']
PASSWORD = os.environ['NEO4J_PASSWORD']

# Initialize Neo4j driver
driver = GraphDatabase.driver(HOST, auth=(USERNAME, PASSWORD))

def find_shortest_path_distance(tx, source_iata, dest_iata, max_depth=6):
    query = f"""
    MATCH (source:Airport {{iata: $source_iata}})
    MATCH (dest:Airport {{iata: $dest_iata}})
    WITH source, dest
    MATCH path = (source)-[:FLIGHT*..{max_depth}]->(dest)
    RETURN
        path,
        reduce(distance = 0, r IN relationships(path) | distance + r.distance) AS total_distance,
        reduce(duration = 0, r IN relationships(path) | duration + r.duration) AS total_duration,
        reduce(value = 0, r IN relationships(path) | value + r.value) AS total_value
    ORDER BY total_distance ASC
    LIMIT 1
    """
    result = tx.run(query, source_iata=source_iata, dest_iata=dest_iata)
    record = result.single()
    if record:
        path = record['path']
        total_distance = record['total_distance']
        total_duration = record['total_duration']
        total_value = record['total_value']

        # Extract nodes from the path
        nodes = [node['iata'] for node in path.nodes]
        path_str = ' -> '.join(nodes)

        # Units: distance in miles, duration in hours, value in USD
        print(f"Shortest path: {path_str}")
        print(f"Total distance: {total_distance}")
        print(f"Total duration: {total_duration}")
        print(f"Total value: {total_value}")
    else:
        print(f"No path found between {source_iata} and {dest_iata}")

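with driver.session() as session:
    # Strip whitespace and upper-case the codes so "yzv " matches the stored "YZV"
    source_iata = input("Enter source airport IATA code (e.g. YZV): ").strip().upper()
    dest_iata = input("Enter destination airport IATA code (e.g. YGL): ").strip().upper()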

    print(f"\nFinding Shortest Flight Between Airports {source_iata} to {dest_iata}:")
    session.execute_read(find_shortest_path_distance, source_iata, dest_iata)

driver.close()

Enter source airport IATA code: i.e YZV YZV
Enter destination airport IATA code: i.e YGL YGL

Finding Shortest Flight Between Airports YZV to YGL:
Shortest path: YZV -> YKL -> YQB -> YUL -> YGL
Total distance: 1641.2
Total duration: 3.2824
Total value: 1148.84
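Because `value` ($0.70 per mile) and `duration` (500 mph) are both proportional to distance, the path with the smallest `total_distance` also has the smallest `total_duration` and `total_value`, and the three totals above are consistent: 1641.2 × 0.70 = 1148.84 and 1641.2 / 500 = 3.2824 hours. A small sketch that derives the totals from the summed distance and renders the duration in hours and minutes (the helper names are hypothetical):

```python
def path_totals(total_miles, rate_per_mile=0.70, speed_mph=500):
    """Return (total_value_usd, total_duration_hours) for a path's summed distance."""
    return total_miles * rate_per_mile, total_miles / speed_mph

def format_hours(hours):
    """Render fractional hours as e.g. '3h 17m' (minutes rounded to the nearest whole minute)."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h {total_minutes % 60}m"

value, duration = path_totals(1641.2)
print(round(value, 2), round(duration, 4), format_hours(duration))  # 1148.84 3.2824 3h 17m
```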
