# Neo4j GDS

https://neo4j.com/docs/graph-data-science/current/algorithms/centrality/

## Centrality
Centrality algorithms are used to determine the importance of distinct nodes in a network. The Neo4j GDS library includes the following centrality algorithms, grouped by quality tier:

## Production-quality

- Article Rank
- Betweenness Centrality
- CELF
- Closeness Centrality
- Degree Centrality
- Eigenvector Centrality
- Page Rank

## Alpha

- Harmonic Centrality
- HITS

In [1]:
from dotenv import dotenv_values
config = dotenv_values(".env")

In [61]:
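from langchain.graphs import Neo4jGraph

# Read the connection details from the .env file loaded above instead of hard-coding credentials.
# The key names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD are assumptions: match them to your .env.
graph = Neo4jGraph(
    url=config["NEO4J_URI"],
    username=config["NEO4J_USERNAME"],
    password=config["NEO4J_PASSWORD"],
)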

r = graph.query("""MATCH (n:Airport {city:"Los Angeles"}) RETURN n""")
print(r)

[{'n': {'altitude': 127, 'descr': 'Los Angeles International Airport', 'longest': 12091, 'iata': 'LAX', 'city': 'Los Angeles', 'icao': 'KLAX', 'location': POINT(-118.4079971 33.94250107), 'id': '13', 'pagerank': 8.193558075446687, 'runways': 4}}]


# PageRank
https://neo4j.com/docs/graph-data-science/current/algorithms/page-rank/

The PageRank algorithm measures the importance of each node within the graph, based on the number of incoming relationships and the importance of the corresponding source nodes. Roughly speaking, the underlying assumption is that a page is only as important as the pages that link to it.
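To make the iteration concrete, here is a minimal pure-Python sketch of unweighted PageRank by power iteration. It follows the non-normalized form PR(v) = (1 - d) + d * Σ PR(u) / outdeg(u) shown in the GDS documentation, where scores are not rescaled to sum to 1 (which is why the airport scores further down are around 8 to 12 rather than tiny fractions). The starting value of 1 - d, the tolerance check, and the choice not to redistribute the rank of nodes without outgoing relationships are assumptions of this sketch; it is not the GDS implementation and will not reproduce its numbers exactly.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    max_iterations: int = 20,
    tolerance: float = 1e-7,
) -> Dict[str, float]:
    """Unweighted, non-normalized PageRank computed by power iteration.

    `graph` maps each node to the list of nodes it has relationships to.
    """
    # Collect every node, including nodes that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}

    # Assumption for this sketch: every node starts at the base score (1 - d).
    scores = {node: 1.0 - damping for node in nodes}

    for _ in range(max_iterations):
        # Sum the shares each node receives from the nodes linking to it.
        incoming = {node: 0.0 for node in nodes}
        for source, targets in graph.items():
            if not targets:
                continue  # no outgoing relationships: nothing is passed on
            share = scores[source] / len(targets)
            for target in targets:
                incoming[target] += share

        # PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u)) over all u linking to v.
        new_scores = {node: (1.0 - damping) + damping * incoming[node] for node in nodes}

        # Stop early once no score changes by more than the tolerance.
        delta = max(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < tolerance:
            break
    return scores


# Toy route network: C is reached from both A and B.
routes = {"A": ["C"], "B": ["C"], "C": ["A"]}
print(pagerank(routes))
```

The `max_iterations` and `damping` defaults mirror the `maxIterations: 20` and `dampingFactor: 0.85` values used in the GDS calls below.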


In [3]:
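# Cypher to remove the num_airports property from all Country nodes (kept as a string for reference; this cell does not run it)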
"""MATCH (m:Country)
remove m.num_airports
return m
"""

'MATCH (m:Country)\nremove m.num_airports\nreturn m\n'

In [62]:
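# create a new property in Country with the count of Airports in that country,
# and copy that count onto each of the country's airports.
# Count per country first: grouping by both m and p (WITH m, p, count(r)) would make every count 1.
r = graph.query("""MATCH (m:Country)
OPTIONAL MATCH (a:Airport)-[:IN_COUNTRY]->(m)
WITH m, count(a) AS num_airports
SET m.num_airports = num_airports
WITH m, num_airports
OPTIONAL MATCH (p:Airport)-[:IN_COUNTRY]->(m)
SET p.country_airports = num_airports
RETURN m""")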
print(len(r))

3503


In [64]:
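# create a new property in Region with the count of Airports in that Region,
# and copy that count onto each of the region's airports.
# Count per region first: grouping by both m and p (WITH m, p, count(r)) would make every count 1.
r = graph.query("""MATCH (m:Region)
OPTIONAL MATCH (a:Airport)-[:IN_REGION]->(m)
WITH m, count(a) AS num_airports
SET m.num_airports = num_airports
WITH m, num_airports
OPTIONAL MATCH (p:Airport)-[:IN_REGION]->(m)
SET p.region_airports = num_airports
RETURN m""")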
print(len(r))

3503


In [None]:
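# Create HAS_ROUTE_WEIGHT relationships between airports connected by HAS_ROUTE, with a weight computed from the
# source airport's region_airports, runways and country_airports (kept as a string; this cell does not run it).
# Note: when all three values are integers, Cypher uses integer division, so the weight is truncated.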
"""
MATCH (p:Airport)-[r:HAS_ROUTE]->(m:Airport)
with  r, p, m, p.region_airports as region_airports, p.country_airports as country_airports, p.runways as runways
merge (p)-[:HAS_ROUTE_WEIGHT {weight : (region_airports* runways)/country_airports }]->(m) 

return p limit 25
"""

In [6]:
# create Projection Num Airports

In [None]:
# Call gds.graph.drop("Graph_Airports")

In [138]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports',
  'Airport',
  'HAS_ROUTE_WEIGHT',
  {
    relationshipProperties: 'weight'
  }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Graph_Airports', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 31}]


In [66]:
len(r)

1

In [139]:
r[-1]

{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}},
 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT',
   'orientation': 'NATURAL',
   'indexInverse': False,
   'properties': {'weight': {'aggregation': 'DEFAULT',
     'property': 'weight',
     'defaultValue': None}},
   'type': 'HAS_ROUTE_WEIGHT'}},
 'graphName': 'Graph_Airports',
 'nodeCount': 3503,
 'relationshipCount': 46389,
 'projectMillis': 31}

## Memory Estimation


In [None]:
# Call gds.graph.drop("AirportGraph")

In [140]:
r = graph.query("""CALL gds.pageRank.write.estimate('Graph_Airports', {
  writeProperty: 'pagerank_weight',
  maxIterations: 20,
  dampingFactor: 0.85
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
""")

print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 85336, 'bytesMax': 85336, 'requiredMemory': '83 KiB'}]


## Run the algorithm in stream mode

In the stream execution mode, the algorithm returns the score for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects. For example, we can order the results to find the nodes with the highest PageRank score.

In [141]:
# unweighted
r = graph.query("""CALL gds.pageRank.stream('Graph_Airports')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS code, score
ORDER BY score DESC""")

print(r[:5])

[{'code': 'Dallas/Fort Worth International Airport', 'score': 11.97978260670334}, {'code': "Chicago O'Hare International Airport", 'score': 11.162988178920267}, {'code': 'Denver International Airport', 'score': 10.997299338126385}, {'code': 'Hartsfield - Jackson Atlanta International Airport', 'score': 10.389948350302957}, {'code': 'Istanbul International Airport', 'score': 8.425801217705782}]



### Weighted
By default, the algorithm treats the relationships of the graph as unweighted. To change this behaviour, set the configuration parameter `relationshipWeightProperty`. In the weighted case, the score a node sends to each neighbor is its previous score multiplied by the relationship weight and divided by the sum of the weights of its outgoing relationships. Relationships with a negative weight are ignored during the computation.
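The weighting rule can be illustrated with a hypothetical helper, `weighted_shares`, that splits one node's score across its outgoing relationships in proportion to their weights and skips negative weights. It is a sketch of the rule as described above, not GDS code; the damping factor is applied afterwards, exactly as in the unweighted formula.

```python
from typing import Dict


def weighted_shares(score: float, out_weights: Dict[str, float]) -> Dict[str, float]:
    """Split a node's score over its outgoing relationships in proportion to their weights.

    `out_weights` maps each target node to the weight of the relationship pointing to it.
    Relationships with a negative weight are ignored.
    """
    usable = {target: w for target, w in out_weights.items() if w >= 0}
    total = sum(usable.values())
    if total == 0:
        # Assumption for this sketch: with no positive weight, nothing is sent along.
        return {target: 0.0 for target in usable}
    return {target: score * w / total for target, w in usable.items()}


# Different weights: the heavier relationship receives the larger share; D is ignored.
print(weighted_shares(1.0, {"B": 1.0, "C": 3.0, "D": -2.0}))  # → {'B': 0.25, 'C': 0.75}
# Equal weights on every outgoing relationship: identical to an unweighted split.
print(weighted_shares(1.0, {"B": 5.0, "C": 5.0}))  # → {'B': 0.5, 'C': 0.5}
```

The second call shows a consequence worth keeping in mind here: the `weight` on `HAS_ROUTE_WEIGHT` is computed only from properties of the source airport (`region_airports`, `runways`, `country_airports`), so all outgoing relationships of an airport carry the same weight. That is likely why the weighted ranking below is practically identical to the unweighted one.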

In [127]:
r = graph.query("""CALL gds.pageRank.stream('Graph_Airports', {
  maxIterations: 20,
  dampingFactor: 0.85,
  relationshipWeightProperty: 'weight'
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS code, score
ORDER BY score DESC

""")

print(r[:25])

[{'code': 'Dallas/Fort Worth International Airport', 'score': 11.979782606703338}, {'code': "Chicago O'Hare International Airport", 'score': 11.162988178920267}, {'code': 'Denver International Airport', 'score': 10.997299338126385}, {'code': 'Hartsfield - Jackson Atlanta International Airport', 'score': 10.389948350302959}, {'code': 'Istanbul International Airport', 'score': 8.425801217705784}, {'code': 'Paris Charles de Gaulle', 'score': 8.401469085296544}, {'code': 'George Bush Intercontinental', 'score': 8.34114108524013}, {'code': 'Frankfurt am Main', 'score': 8.203204538770198}, {'code': 'Los Angeles International Airport', 'score': 8.193558075446688}, {'code': 'Charlotte Douglas International Airport', 'score': 7.873302960818333}, {'code': 'Amsterdam Airport Schiphol', 'score': 7.812313354381448}, {'code': 'Toronto Pearson International Airport', 'score': 7.378619943321167}, {'code': 'Minneapolis-St.Paul International Airport', 'score': 7.093881625001813}, {'code': 'Dubai Interna

## Write the results
If the results of the algorithm are as expected, the next step can be to write them back to the Neo4j database.

In [82]:
r = graph.query("""CALL gds.pageRank.write('Graph_Airports', {
  maxIterations: 20,
  dampingFactor: 0.85,
  relationshipWeightProperty: 'weight',
  writeProperty: 'pagerank_weight'
})
YIELD nodePropertiesWritten, ranIterations
""")

print(r)

[{'nodePropertiesWritten': 3503, 'ranIterations': 20}]


## Query the Neo4j database

In [143]:
r = graph.query("""MATCH (n:Airport) RETURN n.pagerank_weight, n.descr LIMIT 10
""")

print(r)

[{'n.pagerank_weight': 10.38994835030296, 'n.descr': 'Hartsfield - Jackson Atlanta International Airport'}, {'n.pagerank_weight': 2.651157253813937, 'n.descr': 'Anchorage Ted Stevens'}, {'n.pagerank_weight': 3.3970361709221706, 'n.descr': 'Austin Bergstrom International Airport'}, {'n.pagerank_weight': 2.913092915746176, 'n.descr': 'Nashville International Airport'}, {'n.pagerank_weight': 5.587850199484761, 'n.descr': 'Boston Logan'}, {'n.pagerank_weight': 3.3294218606289148, 'n.descr': 'Baltimore/Washington International Airport'}, {'n.pagerank_weight': 3.6967971368788204, 'n.descr': 'Ronald Reagan Washington National Airport'}, {'n.pagerank_weight': 11.979782606703338, 'n.descr': 'Dallas/Fort Worth International Airport'}, {'n.pagerank_weight': 5.990039569972173, 'n.descr': 'Fort Lauderdale/Hollywood International Airport'}, {'n.pagerank_weight': 5.825059084187616, 'n.descr': 'Washington Dulles International Airport'}]


In [144]:
r = graph.query("""Call gds.graph.drop("Graph_Airports")""")
print(r)

[{'graphName': 'Graph_Airports', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 46389, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'a654b482-84d3-4fa8-9d64-a03f3e65c773', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 8, 18, 643827085, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.0037814532146957986, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 8, 18, 643827085, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 20, 8, 18, 677384145, tzinfo=<UTC>), 'schema

# Article Rank

https://neo4j.com/docs/graph-data-science/current/algorithms/article-rank/

ArticleRank is a variant of the Page Rank algorithm, which measures the transitive influence of nodes.

Page Rank follows the assumption that relationships originating from low-degree nodes have a higher influence than relationships from high-degree nodes. Article Rank lowers the influence of low-degree nodes by lowering the scores being sent to their neighbors in each iteration.
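The GDS documentation expresses ArticleRank as PageRank with an extra term in each denominator, roughly AR(v) = (1 - d) + d * Σ AR(u) / (C(u) + C_avg), where C(u) is the out-degree of u and C_avg is the average out-degree of all nodes. The sketch below follows that formula as we understand it; the starting value and stopping rule are assumptions carried over from a plain PageRank sketch, and it is not the GDS implementation.

```python
from typing import Dict, List


def article_rank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    max_iterations: int = 20,
    tolerance: float = 1e-7,
) -> Dict[str, float]:
    """ArticleRank: like PageRank, but each share is divided by (out-degree + average out-degree)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}

    out_degree = {node: len(graph.get(node, [])) for node in nodes}
    avg_degree = sum(out_degree.values()) / len(nodes)

    scores = {node: 1.0 - damping for node in nodes}  # assumed starting value
    for _ in range(max_iterations):
        incoming = {node: 0.0 for node in nodes}
        for source, targets in graph.items():
            if not targets:
                continue
            # Adding the average degree shrinks the share sent by low-degree nodes the most.
            share = scores[source] / (out_degree[source] + avg_degree)
            for target in targets:
                incoming[target] += share
        new_scores = {node: (1.0 - damping) + damping * incoming[node] for node in nodes}

        # Stop early once no score changes by more than the tolerance.
        delta = max(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < tolerance:
            break
    return scores


print(article_rank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```

Because every denominator grows by C_avg, each share is smaller than its PageRank counterpart, which fits the much smaller ArticleRank scores (around 1.9) in the output below compared with the PageRank scores (around 12) above.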

In [129]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_AR',
  'Airport',
  'HAS_ROUTE_WEIGHT',
  {
    relationshipProperties: 'weight'
  }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Graph_Airports_AR', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 57}]


In [130]:
r = graph.query("""CALL gds.articleRank.write.estimate('Graph_Airports_AR', {
  writeProperty: 'centrality',
  maxIterations: 20
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 85336, 'bytesMax': 85336, 'requiredMemory': '83 KiB'}]


In [132]:
r = graph.query("""CALL gds.articleRank.stream('Graph_Airports_AR')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS name, score
ORDER BY score DESC""")
print(r[:25])

[{'name': "Chicago O'Hare International Airport", 'score': 1.8939755133996963}, {'name': 'Dallas/Fort Worth International Airport', 'score': 1.8904536902822802}, {'name': 'Frankfurt am Main', 'score': 1.8876473521634138}, {'name': 'Paris Charles de Gaulle', 'score': 1.84972402662708}, {'name': 'Hartsfield - Jackson Atlanta International Airport', 'score': 1.8030872573050167}, {'name': 'Amsterdam Airport Schiphol', 'score': 1.774709069851866}, {'name': 'Istanbul International Airport', 'score': 1.7638492155308716}, {'name': 'Munich International Airport', 'score': 1.6746206317067545}, {'name': 'Denver International Airport', 'score': 1.640439360372361}, {'name': 'Dubai International Airport', 'score': 1.5430537022298554}, {'name': 'George Bush Intercontinental', 'score': 1.4945608138148059}, {'name': 'Los Angeles International Airport', 'score': 1.4787718312744296}, {'name': 'London Gatwick', 'score': 1.4667146715720172}, {'name': 'Beijing Capital International Airport', 'score': 1.4405

In [None]:
# Call gds.graph.drop("Graph_Airports")

In [133]:
r = graph.query("""Call gds.graph.drop("Graph_Airports_AR")""")
print(r)

[{'graphName': 'Graph_Airports_AR', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 46389, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'a7c684a8-a969-42af-aef4-07b928443a86', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 18, 58, 54, 198268154, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.0037814532146957986, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 18, 58, 54, 198268154, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 18, 58, 54, 256329738, tzinfo=<UTC>), '

# Betweenness Centrality
https://neo4j.com/docs/graph-data-science/current/algorithms/betweenness-centrality/

Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.

The algorithm calculates shortest paths between all pairs of nodes in a graph. Each node receives a score, based on the number of shortest paths that pass through the node. Nodes that more frequently lie on shortest paths between other nodes will have higher betweenness centrality scores.
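Computing this naively (enumerating every shortest path for every pair) is expensive; the standard exact method is Brandes' algorithm, which runs one breadth-first search per source node and then accumulates path "dependencies" backwards. The following is a minimal pure-Python sketch of Brandes' algorithm for unweighted, directed graphs, without sampling or relationship weights. It counts every ordered (source, target) pair and is not the GDS code, so absolute values may differ from the scores below.

```python
from collections import deque
from typing import Dict, List


def betweenness(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Exact betweenness centrality for an unweighted, directed graph (Brandes' algorithm)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    centrality = {node: 0.0 for node in nodes}

    for source in nodes:
        # BFS from source: count shortest paths (sigma) and record predecessors.
        stack: List[str] = []
        predecessors: Dict[str, List[str]] = {node: [] for node in nodes}
        sigma = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        sigma[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, []):
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)

        # Accumulate dependencies in order of decreasing distance from source.
        delta = {node: 0.0 for node in nodes}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


# Two routes from A to D, one through B and one through C.
print(betweenness({"A": ["B", "C"], "B": ["D"], "C": ["D"]}))
```

In the example, B and C each lie on one of the two shortest A to D paths, so they share the credit for that pair.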

In [145]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_BC',
  'Airport',
  {HAS_ROUTE_WEIGHT:{properties: 'weight'}}
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Graph_Airports_BC', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 101}]


In [146]:
r = graph.query("""CALL gds.betweenness.write.estimate('Graph_Airports_BC', { writeProperty: 'betweenness' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 2327664, 'bytesMax': 2327664, 'requiredMemory': '2273 KiB'}]


In [147]:
r = graph.query("""CALL gds.betweenness.stream('Graph_Airports_BC')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS descr, score
ORDER BY score DESC""")
print(r[:5])

[{'descr': 'Dubai International Airport', 'score': 390958.5850961311}, {'descr': 'Los Angeles International Airport', 'score': 368734.10410234286}, {'descr': 'Paris Charles de Gaulle', 'score': 365259.6507074962}, {'descr': 'Beijing Capital International Airport', 'score': 340393.582883937}, {'descr': 'Istanbul International Airport', 'score': 339441.88600743026}]


In [148]:
r = graph.query("""Call gds.graph.drop("Graph_Airports_BC")""")
print(r)

[{'graphName': 'Graph_Airports_BC', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 46389, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'f3844e00-a841-4ac3-be1e-859888046097', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 23, 40, 544517750, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.0037814532146957986, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 23, 40, 544517750, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 20, 23, 40, 647084331, tzinfo=<UTC>), '

# CELF
https://neo4j.com/docs/graph-data-science/current/algorithms/celf/

The influence maximization problem asks for a set of k nodes that maximize the expected spread of influence in the network. The set of these k initial nodes is called the seed set.
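CELF (Cost-Effective Lazy Forward selection) tackles this greedily: it repeatedly adds the node with the largest marginal gain in expected spread, and it avoids re-evaluating candidates whose previously computed gain is already known to be too small. The sketch below estimates spread under the Independent Cascade model with Monte Carlo simulation. The function names, the default propagation probability of 0.1, the 100 simulations and the random seed are choices of this sketch; GDS exposes comparable settings (`propagationProbability`, `monteCarloSimulations`), but this code will not reproduce its numbers.

```python
import heapq
import random
from typing import Dict, List, Set, Tuple


def simulate_spread(
    graph: Dict[str, List[str]],
    seeds: List[str],
    probability: float,
    simulations: int,
    rng: random.Random,
) -> float:
    """Average number of nodes activated (seeds included) under the Independent Cascade model."""
    total = 0
    for _ in range(simulations):
        active: Set[str] = set(seeds)
        frontier = list(seeds)
        while frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph.get(node, []):
                    # Each newly active node gets one chance to activate each inactive neighbor.
                    if neighbor not in active and rng.random() < probability:
                        active.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        total += len(active)
    return total / simulations


def celf(
    graph: Dict[str, List[str]],
    k: int,
    probability: float = 0.1,
    simulations: int = 100,
    seed: int = 42,
) -> List[Tuple[str, float]]:
    """Greedy seed selection with CELF lazy evaluation; returns (node, marginal gain) pairs."""
    rng = random.Random(seed)
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Max-heap (via negated gains) of (-gain, node, seed-set size when the gain was computed).
    heap = [(-simulate_spread(graph, [n], probability, simulations, rng), n, 0) for n in sorted(nodes)]
    heapq.heapify(heap)

    seeds: List[Tuple[str, float]] = []
    current_spread = 0.0
    while heap and len(seeds) < k:
        neg_gain, node, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # The gain is up to date for the current seed set, so this node is the best choice.
            seeds.append((node, -neg_gain))
            current_spread += -neg_gain
        else:
            # Stale gain: recompute the marginal gain against the current seed set and re-queue.
            chosen = [s for s, _ in seeds]
            gain = simulate_spread(graph, chosen + [node], probability, simulations, rng) - current_spread
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds


print(celf({"a": ["b", "c", "d"], "e": ["f"]}, k=2, probability=0.5))
```

The pairs returned are marginal gains, so their sum is the estimated spread of the whole seed set. The GDS stream output further down appears to behave the same way: 1143.31 + 4.89 + 3.88 = 1152.08, the `totalSpread` reported in stats mode.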


In [100]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_CELF',
  'Airport',
  'HAS_ROUTE_WEIGHT')""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Graph_Airports_CELF', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 105}]


In [101]:
r = graph.query("""CALL gds.influenceMaximization.celf.write.estimate('Graph_Airports_CELF', {
  writeProperty: 'spread',
  seedSetSize: 3
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 257232, 'bytesMax': 257232, 'requiredMemory': '251 KiB'}]


In [103]:
# In the stats execution mode, the algorithm returns a single row containing a summary of the algorithm result.
r = graph.query("""CALL gds.influenceMaximization.celf.stats('Graph_Airports_CELF', {seedSetSize: 3})
YIELD totalSpread""")
print(r)

[{'totalSpread': 1152.08}]


In [105]:
# In the stream execution mode, the algorithm returns one row per selected seed node together with its spread.
r = graph.query("""CALL gds.influenceMaximization.celf.stream('Graph_Airports_CELF', {seedSetSize: 3})
YIELD nodeId, spread
RETURN gds.util.asNode(nodeId).descr AS name, spread
ORDER BY spread DESC, name ASC""")
print(r[:3])

[{'name': 'Hartsfield - Jackson Atlanta International Airport', 'spread': 1143.31}, {'name': 'Dallas/Fort Worth International Airport', 'spread': 4.8900000000001}, {'name': 'Anchorage Ted Stevens', 'spread': 3.8799999999998818}]


In [106]:
r = graph.query("""Call gds.graph.drop("Graph_Airports_CELF")""")
print(r)

[{'graphName': 'Graph_Airports_CELF', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 46389, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': '6fc74ed9-ab99-4551-bb8c-8d8b16f7aee4', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 17, 30, 12, 640343474, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.0037814532146957986, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 17, 30, 12, 640343474, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 17, 30, 12, 746903031, tzinfo=<UTC>), 'schema': {'graphProperties': {}, 'nodes': {'Airport': {}}, 'relationships': {'

# Closeness Centrality
https://neo4j.com/docs/graph-data-science/current/algorithms/closeness-centrality/

Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph.

The closeness centrality of a node measures its average farness (inverse distance) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.

For each node u, the Closeness Centrality algorithm calculates the sum of its distances to all other nodes, based on calculating the shortest paths between all pairs of nodes. The resulting sum is then inverted to determine the closeness centrality score for that node.

### Use-cases - when to use the Closeness Centrality algorithm
- Closeness centrality is used to research organizational networks, where individuals with high closeness centrality are in a favourable position to control and acquire vital information and resources within the organization. One such study is "Mapping Networks of Terrorist Cells" by Valdis E. Krebs.

- Closeness centrality can be interpreted as an estimated time of arrival of information flowing through telecommunications or package delivery networks where information flows through shortest paths to a predefined target. It can also be used in networks where information spreads through all shortest paths simultaneously, such as infection spreading through a social network. Find more details in "Centrality and network flow" by Stephen P. Borgatti.

- Closeness centrality has been used to estimate the importance of words in a document, based on a graph-based keyphrase extraction process. This process is described by Florian Boudin in "A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction".

In [149]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_CC',
  'Airport',
  'HAS_ROUTE')""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE'}}, 'graphName': 'Graph_Airports_CC', 'nodeCount': 3503, 'relationshipCount': 92778, 'projectMillis': 31}]


In [150]:
r = graph.query("""CALL gds.closeness.stream('Graph_Airports_CC')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS descr, score
ORDER BY score DESC""")
print(r[:25])

[{'descr': 'Paris Charles de Gaulle', 'score': 0.4843442084519378}, {'descr': 'Frankfurt am Main', 'score': 0.48212728857890147}, {'descr': 'London Heathrow', 'score': 0.4760060253927265}, {'descr': 'Amsterdam Airport Schiphol', 'score': 0.47539222007307114}, {'descr': 'Los Angeles International Airport', 'score': 0.46804909013965296}, {'descr': 'Dubai International Airport', 'score': 0.46686365555086534}, {'descr': 'New York John F. Kennedy International Airport', 'score': 0.46656823454967306}, {'descr': 'Munich International Airport', 'score': 0.46656823454967306}, {'descr': "Chicago O'Hare International Airport", 'score': 0.4633431085043988}, {'descr': 'Beijing Capital International Airport', 'score': 0.4598752598752599}, {'descr': 'Toronto Pearson International Airport', 'score': 0.45977967158594885}, {'descr': 'Newark, Liberty', 'score': 0.45806585214330087}, {'descr': 'Istanbul International Airport', 'score': 0.45693038628382565}, {'descr': 'Adolfo Suarez Barajas Airport Interna

In [151]:
r = graph.query("""Call gds.graph.drop("Graph_Airports_CC")""")
print(r)

[{'graphName': 'Graph_Airports_CC', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 92778, 'configuration': {'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'e8dda7fd-192b-4c9e-bd91-d30ae3df6f0e', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 27, 14, 749330454, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.007562906429391597, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 27, 14, 749330454, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 20, 27, 14, 781587794, tzinfo=<UTC>), 'schema': {'graphProperties': {}, 'nodes': {'Airport': {}}, 'relationships': {'HAS_ROUTE': {}}},

# Degree Centrality
https://neo4j.com/docs/graph-data-science/current/algorithms/degree-centrality/

The Degree Centrality algorithm can be used to find popular nodes within a graph. Degree centrality measures the number of incoming or outgoing (or both) relationships from a node, depending on the orientation of a relationship projection.

The Degree Centrality algorithm has been shown to be useful in many different applications. For example:

- Degree centrality is an important component of any attempt to determine the most important people in a social network. For example, in BrandWatch’s most influential men and women on Twitter 2017, the top 5 people in each category have over 40m followers each, which is far higher than the average degree.

- Weighted degree centrality has been used to help separate fraudsters from legitimate users of an online auction. The weighted centrality for fraudsters is significantly higher because they tend to collude with each other to artificially increase the price of items. Read more in "Two Step graph-based semi-supervised Learning for Online Auction Fraud Detection".

In [157]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_DC',
  'Airport',
   {
    HAS_ROUTE: {
      orientation: 'REVERSE',
      properties: ['distance']
    }
  }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'REVERSE', 'indexInverse': False, 'properties': {'distance': {'aggregation': 'DEFAULT', 'property': 'distance', 'defaultValue': None}}, 'type': 'HAS_ROUTE'}}, 'graphName': 'Graph_Airports_DC', 'nodeCount': 3503, 'relationshipCount': 92778, 'projectMillis': 142}]


In [153]:
r = graph.query("""CALL gds.degree.write.estimate('Graph_Airports_DC', { writeProperty: 'degree' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 92778, 'bytesMin': 64, 'bytesMax': 64, 'requiredMemory': '64 Bytes'}]


In [154]:
# Unweighted
r = graph.query("""CALL gds.degree.stream('Graph_Airports_DC')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS name, score AS score
ORDER BY score DESC, name DESC""")
print(r[:10])

[{'name': 'Frankfurt am Main', 'score': 606.0}, {'name': 'Paris Charles de Gaulle', 'score': 582.0}, {'name': 'Amsterdam Airport Schiphol', 'score': 560.0}, {'name': 'Istanbul International Airport', 'score': 536.0}, {'name': 'Munich International Airport', 'score': 530.0}, {'name': "Chicago O'Hare International Airport", 'score': 514.0}, {'name': 'Dallas/Fort Worth International Airport', 'score': 496.0}, {'name': 'Hartsfield - Jackson Atlanta International Airport', 'score': 484.0}, {'name': 'Dubai International Airport', 'score': 474.0}, {'name': 'London Gatwick', 'score': 452.0}]


In [115]:
# Weighted Degree Centrality example
r = graph.query("""CALL gds.degree.stream('Graph_Airports_DC', { relationshipWeightProperty: 'distance' })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS name, score AS score
ORDER BY score DESC, name DESC""")
print(r[:10])

[{'name': 'Paris Charles de Gaulle', 'score': 689778.0}, {'name': 'Frankfurt am Main', 'score': 684595.0}, {'name': 'Dubai International Airport', 'score': 635287.0}, {'name': 'Los Angeles International Airport', 'score': 610727.0}, {'name': 'London Heathrow', 'score': 600022.0}, {'name': 'Amsterdam Airport Schiphol', 'score': 596370.0}, {'name': 'New York John F. Kennedy International Airport', 'score': 562902.0}, {'name': 'Beijing Capital International Airport', 'score': 528600.0}, {'name': 'Istanbul International Airport', 'score': 513494.0}, {'name': 'Doha, Hamad International Airport', 'score': 485087.0}]


In [156]:
r = graph.query("""Call gds.graph.drop("Graph_Airports_DC")""")
print(r)

[{'graphName': 'Graph_Airports_DC', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 92778, 'configuration': {'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'REVERSE', 'indexInverse': False, 'properties': {'distance': {'aggregation': 'DEFAULT', 'property': 'distance', 'defaultValue': None}}, 'type': 'HAS_ROUTE'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': '670fd800-43af-4225-8f37-f2adf3565e94', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 28, 41, 8776300, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.007562906429391597, 'creationTime': neo4j.time.DateTime(2024, 4, 3, 20, 28, 41, 8776300, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 3, 20, 28, 41, 146646496, tzinfo=<UTC>), 'schema': {'grap

# Eigenvector Centrality

https://neo4j.com/docs/graph-data-science/current/algorithms/eigenvector-centrality/

Eigenvector Centrality is an algorithm that measures the transitive influence of nodes. Relationships originating from high-scoring nodes contribute more to the score of a node than connections from low-scoring nodes. A high eigenvector score means that a node is connected to many nodes who themselves have high scores.
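A minimal power-iteration sketch of this idea: each node's new score is the sum of the current scores of the nodes that link to it, and the score vector is rescaled to unit L2 norm after every iteration so that only relative importance remains. Plain power iteration like this can oscillate on periodic graphs (bipartite ones, for example), and the convergence and normalization details GDS uses may differ; treat it as an illustration, not the GDS implementation.

```python
import math
from typing import Dict, List


def eigenvector_centrality(
    graph: Dict[str, List[str]], max_iterations: int = 20, tolerance: float = 1e-7
) -> Dict[str, float]:
    """Eigenvector centrality by power iteration with L2 normalization."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}

    # Start from a uniform vector that already has unit L2 norm.
    scores = {node: 1.0 / math.sqrt(len(nodes)) for node in nodes}
    for _ in range(max_iterations):
        # Each node's new score is the sum of the scores of the nodes linking to it.
        new_scores = {node: 0.0 for node in nodes}
        for source, targets in graph.items():
            for target in targets:
                new_scores[target] += scores[source]

        norm = math.sqrt(sum(v * v for v in new_scores.values()))
        if norm == 0:
            return new_scores  # no relationships to propagate along
        new_scores = {node: v / norm for node, v in new_scores.items()}

        # Stop early once no score changes by more than the tolerance.
        delta = max(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < tolerance:
            break
    return scores


print(eigenvector_centrality({"A": ["B", "C"], "B": ["C"], "C": ["A", "B"]}, max_iterations=100))
```

In the example, B and C each receive links from two nodes that are themselves well connected, so they end up with equal, higher scores than A.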

    

In [117]:
r = graph.query("""CALL gds.graph.project(
  'Graph_Airports_EC',
  'Airport',
  'HAS_ROUTE_WEIGHT',
  {relationshipProperties: 'weight'}
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Graph_Airports_EC', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 143}]


In [118]:
r = graph.query("""CALL gds.eigenvector.write.estimate('Graph_Airports_EC', {
  writeProperty: 'centrality',
  maxIterations: 20
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 85336, 'bytesMax': 85336, 'requiredMemory': '83 KiB'}]


In [119]:
r = graph.query("""CALL gds.eigenvector.stream('Graph_Airports_EC')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).descr AS name, score
ORDER BY score DESC, name ASC""")
print(r[:15])

[{'name': 'Frankfurt am Main', 'score': 0.1495017186916459}, {'name': 'Amsterdam Airport Schiphol', 'score': 0.1441701063262446}, {'name': 'Paris Charles de Gaulle', 'score': 0.14230832410157165}, {'name': 'Munich International Airport', 'score': 0.14145049423228861}, {'name': 'Leonardo da Vinci-Fiumicino International Airport', 'score': 0.12035136383222038}, {'name': 'Barcelona International Airport', 'score': 0.11995566418115934}, {'name': 'London Heathrow', 'score': 0.11936862692715622}, {'name': 'Adolfo Suarez Barajas Airport International Airport', 'score': 0.11933953567783778}, {'name': 'Vienna International Airport', 'score': 0.11921058374435377}, {'name': 'Manchester Airport', 'score': 0.11776363194734268}, {'name': 'London Gatwick', 'score': 0.11745830797344789}, {'name': 'Istanbul International Airport', 'score': 0.1153900648713748}, {'name': 'Copenhagen Kastrup Airport', 'score': 0.11524796935424696}, {'name': 'Berlin Brandenburg International Airport', 'score': 0.1144834155