Code to collect and analyze traceroute data within a network topology

zaratsian/network_topology_analysis

Network Topology Analysis

This repo contains the code used to collect network topology data (using a traceroute script), the Apache Spark code used for real-time analysis, and the Zeppelin notebook used for data visualization.

Background:

A network topology consists of many nodes (or hosts) and edges (connections) that link each of the nodes. In communication systems, there are typically many routes that we can take to get from Point A to Point B.

For example, if you are on your home wifi (Point A) and you request a webpage from Google.com (Point B), then your request will be relayed through many hosts along the route. Each time you make the request, a slightly different path may be used based on how the network is optimized, timeouts, failed nodes, etc.
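As a minimal sketch of the idea above, a traceroute result can be treated as an ordered list of hops, and the topology as the set of edges between consecutive hops. The IP addresses and the `build_edges` / `build_topology` helpers below are hypothetical and are not taken from this repo's traceroute script.

```python
from collections import defaultdict


def build_edges(hops):
    """Return the (source, destination) pairs between consecutive hops."""
    return list(zip(hops, hops[1:]))


def build_topology(traces):
    """Merge several traces into an adjacency map: node -> set of next hops."""
    topology = defaultdict(set)
    for hops in traces:
        for src, dst in build_edges(hops):
            topology[src].add(dst)
    return topology


# Two hypothetical traces from the same source to the same destination
# (documentation-range IPs, not real measurements).
trace_1 = ["192.0.2.1", "198.51.100.7", "203.0.113.20", "203.0.113.99"]
trace_2 = ["192.0.2.1", "198.51.100.9", "203.0.113.20", "203.0.113.99"]

topology = build_topology([trace_1, trace_2])

# The first hop fans out to two different routers, which is how two
# requests to the same destination show up as different paths.
print(sorted(topology["192.0.2.1"]))
```

Merging many such traces yields the node-and-edge structure that the rest of the pipeline analyzes.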

For communication providers, these failed nodes create a problem. Isolating and resolving the issue is critical, since a failed node reduces performance, may cause downtime, and costs money if a truck or technician has to be sent out to troubleshoot it manually.

This example focuses on a network topology for telecom, but the process and technology can be extended to any use case that involves a topology or hierarchy of information that needs to be analyzed in real-time.

Technology Stack:

Apache Kafka: Streams in the real-time health status of each device in the network topology.
Apache Spark: Spark Streaming was used to process and analyze the device health status in real-time. I also maintained the device state, or current health status, of all devices using mapWithState so that recomputation and roll-ups / aggregations could be performed quickly on the real-time stream.
Apache HBase: The NoSQL distributed database, where all health status values are stored. This enables real-time read/write access to large database tables.
Apache Phoenix: Phoenix is the SQL interface to HBase, allowing SQL syntax to be used on top of the NoSQL DB.
Apache Zeppelin: Browser-based code editor and visualization tool (see screenshots below). Zeppelin has many interpreters, or code plugins, that enable a variety of languages/protocols to be used. These include Python, Spark, HBase, JDBC, Hive, Angular, etc. For this example, Angular was used to render the Google Maps view, by leveraging Zeppelin's front-end Angular API and some tricks (thanks Randerzander) used to bind several backend JS variables to a globally accessible object.
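
The state-maintenance idea can be illustrated without Spark. The sketch below is plain Python, not Spark's mapWithState API: it keeps the latest health status per device in a dictionary and updates a roll-up count incrementally as each event arrives, rather than recomputing from the full history. The `DeviceState` class, the device names, and the status values ("ok", "failed") are assumptions made for illustration.

```python
from collections import Counter


class DeviceState:
    """Tracks the latest status per device plus a running roll-up."""

    def __init__(self):
        self.latest = {}         # device_id -> most recent status
        self.counts = Counter()  # status -> number of devices currently in it

    def update(self, device_id, status):
        """Apply one health event and adjust the roll-up in place."""
        previous = self.latest.get(device_id)
        if previous == status:
            return  # nothing changed, so the roll-up is already correct
        if previous is not None:
            self.counts[previous] -= 1
        self.latest[device_id] = status
        self.counts[status] += 1

    def unhealthy(self):
        """Return the devices whose current status is not 'ok'."""
        return sorted(d for d, s in self.latest.items() if s != "ok")


# Hypothetical event stream: (device_id, status) pairs in arrival order.
events = [
    ("router-1", "ok"),
    ("router-2", "ok"),
    ("router-2", "failed"),
    ("router-3", "ok"),
]

state = DeviceState()
for device_id, status in events:
    state.update(device_id, status)

print(dict(state.counts))
print(state.unhealthy())
```

Because each event only touches one device's entry and two counters, the aggregate stays current at a cost that does not grow with the length of the stream.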

To run this project:

1. Clone this repo
2. Navigate to the docker directory
3. Execute ./run.sh (You'll need to have Docker installed on your machine)
4. Enter the Zeppelin container bash (docker exec -it zeppelin bash)
5. cd SparkNetworkAnalysis
6. Build the Spark streaming project (/apache-maven-3.3.9/bin/mvn clean package)
7. Start the Spark streaming project (/spark/bin/spark-submit --master local[*] --class "SparkNetworkAnalysis" --jars /phoenix-spark-4.8.1-HBase-1.1.jar target/SparkStreaming-0.0.1.jar phoenix.dev:2181 mytestgroup dztopic1 1 kafka.dev:9092)
8. Start the Kafka stream, which will simulate the health status for each device (docker exec kafka python stream_kafka.py)
9. View the results as a Google Map within Zeppelin (also run interactive queries on data stored in HBase, via Phoenix)
   • Open your browser and go to http://localhost:8079/
   • Select the "Dashboard" notebook
   • Run the notebook, and enter new IP addresses (points of interest, or POI) as desired.
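
For orientation, step 8 streams simulated device health into Kafka. The sketch below shows one way such messages might be generated. It does not reproduce stream_kafka.py, and the device list, field names, and failure rate are all hypothetical. It only builds JSON payloads and leaves out the Kafka producer, so it runs without a broker.

```python
import json
import random
import time

# Hypothetical device identifiers (documentation-range IPs).
DEVICES = ["192.0.2.1", "198.51.100.7", "203.0.113.20"]


def simulate_health(device_id, rng, failure_rate=0.1):
    """Build one simulated health record for a device."""
    status = "failed" if rng.random() < failure_rate else "ok"
    return {
        "device_id": device_id,
        "status": status,
        "timestamp": int(time.time()),
    }


def generate_messages(devices, rng, failure_rate=0.1):
    """Yield one JSON-encoded health message per device."""
    for device_id in devices:
        yield json.dumps(simulate_health(device_id, rng, failure_rate))


# A seeded generator keeps the simulation repeatable between runs.
rng = random.Random(42)
messages = list(generate_messages(DEVICES, rng))

for message in messages:
    # In a real stream, each message would be sent to the Kafka topic here.
    print(message)
```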

Screenshot #1: Zeppelin notebook showing the user input, where IP addresses (or points of interest) can be entered within Zeppelin. This input is fed into a Spark job that fetches the data from HBase, performs data processing, then feeds the results to Angular, where they are rendered within Google Maps.
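
The data flow described for this screenshot can be sketched in plain Python: filter stored hop records by the entered point of interest, order them by hop number, and serialize the route for the front end. The row layout, field names, and coordinates below are hypothetical and do not reflect the notebook's actual Spark code or the Phoenix table schema.

```python
import json

# Hypothetical hop records, as they might come back from a query.
ROWS = [
    {"poi": "203.0.113.99", "hop": 2, "ip": "198.51.100.7", "lat": 38.90, "lon": -77.04},
    {"poi": "203.0.113.99", "hop": 1, "ip": "192.0.2.1", "lat": 35.78, "lon": -78.64},
    {"poi": "203.0.113.99", "hop": 3, "ip": "203.0.113.99", "lat": 37.39, "lon": -122.08},
    {"poi": "203.0.113.50", "hop": 1, "ip": "192.0.2.1", "lat": 35.78, "lon": -78.64},
]


def route_for_poi(rows, poi):
    """Return the ordered list of map points for one point of interest."""
    hops = sorted((r for r in rows if r["poi"] == poi), key=lambda r: r["hop"])
    return [{"ip": r["ip"], "lat": r["lat"], "lng": r["lon"]} for r in hops]


# JSON is a convenient format to hand to a browser-side map component.
payload = json.dumps(route_for_poi(ROWS, "203.0.113.99"))
print(payload)
```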




Screenshot #2: Zeppelin notebook screenshot showing the IP traceroute from my home wifi in Raleigh to Google.com servers (in Mountain View, CA).



References:
Apache Zeppelin - Angular (front-end API)
Apache Zeppelin - Angular (back-end API)
Randerzander's Data Apps in Zeppelin
Apache Spark - mapWithState
