# Station Ranks

We have seen how the PageRank algorithm works on the trivial problem of identification of a celebrity from a list of people invited to a party. Its now time to put our PageRank algorithm through a real task. The task of obtaining a ranking of all the railway stations of India, based on the amount of traffic flowing through them.

![](assets/Images/railway_station.jpg)

There are more than 8000 railway stations in India. Over 10000 trains pass through these railawy stations to form a large connected network. Our task requires us to rank all the railway stations of India in the order of how busy they are throughout the year.

As you might have obsereved, given such a large sized network, consisting of numerous trains and railway stations, it might prove difficult to assign a metric of ranking to each railway staton which defines how busy it remains throughout the year.
Luckily for us, the PageRank algorithm is quite capable of solving such complex ranking problems, and can provide us with a pretty good ranking of all the train stations in India.

### Importing datasets

To build our network, we make use of two datasets given to us in the form of csv files, <b>trains.csv</b> and <b>stations.csv</b>. These dataset were made available to us courtesy of <a href="https://github.com/ayushdubey003/">Ayush Dubey</a>.

In [1]:
import numpy as np
import pandas as pd

stations_dataset = pd.read_csv('assets/Datasets/stations.csv')
trains_dataset = pd.read_csv('assets/Datasets/trains.csv')

Pre-viewing stations datatset

In [2]:
stations_dataset.head()

Unnamed: 0,Station Names,Station Codes,Latitude,Longitude
0,A N DEV NAGAR,ACND,26.7753° N,82.1575° E
1,ABADA,ABB,22.5488° N,88.2035° E
2,ABHAIPUR,AHA,25.2167° N,86.3206° E
3,ABHANPUR JN,AVP,21.0529° N,81.7441° E
4,ABHAYAPURI ASAM,AYU,26.3589° N,90.6485° E


Pre-viewing trains dataset

In [3]:
trains_dataset.iloc[:, :6].head()

Unnamed: 0,Train Number,Train Name,Running Days,Available Classes,Type,Zone
0,12723,TELANGANA EXP,Daily,1A 2A 3A SL GN,Super Fast,SCR
1,22416,A P EXP,Daily,1A 2A 3A,Super Fast,NR
2,12724,TELANGANA EXP,Daily,1A 2A 3A SL GN,Super Fast,SCR
3,12707,A P SMPRK KRNTI,MON WED FRI,2A 3A SL GN,Super Fast,SCR
4,54582,DLPC NLDM PASS,Daily,UNRESERVED,Passenger,NR


### Building our railway network

We first need to list all the stations in our railway network.

In [4]:
station_table = stations_dataset.values
train_table = trains_dataset.values

station_mapping = {}
for i in range(np.size(station_table, axis=0)):
    station_mapping[station_table[i, 1]] = {'name': station_table[i, 0]}

Next, we will build our railway network by using all the train routes.

In [5]:
edges = []
for i in range(np.size(train_table, axis=0)):
    station_list = train_table[i, 7][:-1].split('$')
    for i in range(len(station_list)):
        if station_list[i] not in station_mapping:
            continue
        else:   
            for j in range(i + 1, len(station_list)):
                if station_list[j] not in station_mapping:
                    continue
                else:
                    edges.append((station_list[i], station_list[j], 1))
print(len(edges))

2424929


### Modeling our graph

In [6]:
import sys
sys.path.append('../Implementation')
from graph import Graph

graph = Graph(list(station_mapping.keys()), edges)

### Getting top 100 Rankings

In [7]:
ranks = graph.rank(50, 0.85)
rank_list = sorted([(vertex, ranks[vertex]) for vertex in ranks], key=lambda x: -x[1])
for vertex, rank in rank_list[:100]:
    print(station_mapping[vertex]['name'], ":", round(rank, 5))

HOWRAH JN : 0.00166
VIJAYAWADA JN : 0.00144
KALYAN JN : 0.00131
KANPUR CENTRAL : 0.00131
ITARSI JN : 0.00126
LUCKNOW NR : 0.00122
AHMEDABAD JN : 0.00119
VADODARA JN : 0.00119
AMBALA CANT JN : 0.00117
THANE : 0.00114
GHAZIABAD : 0.00113
SURAT : 0.00111
LUDHIANA JN : 0.00109
JHANSI JN : 0.00107
BHOPAL  JN : 0.00104
MORADABAD : 0.00104
MATHURA JN : 0.00102
DELHI : 0.00102
VARANASI JN : 0.00101
DD UPADHYAYA JN : 0.00101
BHUSAVAL JN : 0.001
BARDDHAMAN JN : 0.00098
DADAR : 0.00098
H NIZAMUDDIN : 0.00095
NEW DELHI : 0.00095
ANAND JN : 0.00093
GORAKHPUR JN : 0.00092
ASANSOL JN : 0.00092
VISAKHAPATNAM : 0.00091
NAGPUR : 0.00091
SEALDAH : 0.0009
BORIVALI : 0.00089
PATNA JN : 0.00087
KHURDA ROAD JN : 0.00086
JAIPUR : 0.00084
KATPADI JN : 0.00084
BAREILLY : 0.00084
MANMAD JN : 0.00084
C SHIVAJI MAH T : 0.00084
BINA JN : 0.00083
BHUBANESWAR : 0.00083
KHARAGPUR JN : 0.00083
SALEM JN : 0.00083
PUNE JN : 0.00083
GWALIOR : 0.00082
AGRA CANTT : 0.00081
THRISUR : 0.00079
ERODE JN : 0.00079
BANDEL JN : 0.

As we can see, the PageRank algorithm does a pretty decent job of ranking the railway stations. It ranks <b>Howrah Jn</b> as the busiest railway station of India, closely followed by stations like <b>Kanpur Central</b>, <b>Kalyan Jn</b>, <b>Vijaywada Jn</b>, <b>Itarsi Jn</b>, <b>Ahemdabad Jn</b>, <b>Vadodara Jn</b> and <b>Lucknow NR</b>. These predictions seem to be pretty reasonable, and infact align very nicely with the rankings provided by this [website](http://www.walkthroughindia.com/walkthroughs/trains/top-12-busiest-railway-stations-india/).

### Making PageRank specific for our problem

Originating and terminating stations of trains are more likely to be important stations. Further, unreserved trains are more likely to pass through less important, relatively smaller stations, and less likely to pass through a bunch of important stations. On the other hand, reserved trains are much more likely to pass through important stations. The PageRank algorithm cannot capture all these properties of our data on its own. However, we can model our graph of the railway network in a specific way, such that it can incorporate all of this information, so that our PageRank algorithm can provide us with better results.

Specifically, we can add self loop edges with relatively larger weights to mark originating and terminating stations of trains. Further, we can also add extra multipliers to the weights of our graph, depending on whether the train we are considering for adding edges to our graph is unreserved or not.  

#### Modeling new graph

In [8]:
edges = []
for i in range(np.size(train_table, axis=0)):
    station_list = train_table[i, 7][:-1].split('$')
    for i in range(len(station_list)):
        if station_list[i] not in station_mapping:
            continue
        else:
            if i == 0 or i == len(station_list) - 1:
                if train_table[i, 3] != 'UNRESERVED':
                    edges.append((station_list[i], station_list[i], 10 * 10))
                else:
                    edges.append((station_list[i], station_list[i], 10 * 2))
            for j in range(i + 1, len(station_list)):
                if station_list[j] not in station_mapping:
                    continue
                else:
                    if train_table[i, 3] != 'UNRESERVED':
                        edges.append((station_list[i], station_list[j], 1 * 2))
                    else:
                        edges.append((station_list[i], station_list[j], 1 * 1))

#### Getting new rankings

In [9]:
graph = Graph(list(station_mapping.keys()), edges)
ranks = graph.rank(50, 0.85)
rank_list = sorted([(vertex, ranks[vertex]) for vertex in ranks], key=lambda x: -x[1])
for vertex, rank in rank_list[:100]:
    print(station_mapping[vertex]['name'], ":", round(rank, 10))

HOWRAH JN : 0.0032299914
NEW DELHI : 0.0019528751
SEALDAH : 0.0019370082
THANE : 0.0018996189
DELHI : 0.0018904029
VIJAYAWADA JN : 0.001728963
AHMEDABAD JN : 0.001698128
KANPUR CENTRAL : 0.0016731561
H NIZAMUDDIN : 0.0016620263
LUCKNOW NR : 0.0016309597
C SHIVAJI MAH T : 0.0015898468
KALYAN JN : 0.0015394498
PUNE JN : 0.0015036637
GORAKHPUR JN : 0.001480323
BORIVALI : 0.0014553636
AMBALA CANT JN : 0.0014540411
PATNA JN : 0.001433949
BARDDHAMAN JN : 0.0014216573
VISAKHAPATNAM : 0.0014024297
VARANASI JN : 0.0013835786
SECUNDERABAD JN : 0.0013380281
BANDEL JN : 0.0013250892
ASANSOL JN : 0.0013169761
KSR BENGALURU : 0.0012933683
AMRITSAR JN : 0.0012816097
YESVANTPUR JN : 0.0012684241
VADODARA JN : 0.0012499792
MORADABAD : 0.0012385353
NAINPUR JN : 0.0012338774
MGR CHENNAI CTL : 0.0012255331
LUDHIANA JN : 0.0012160656
JAIPUR : 0.0012070135
ITARSI JN : 0.0011942137
ANAND JN : 0.0011781346
JHANSI JN : 0.0011696259
GHAZIABAD : 0.0011598021
MATHURA JN : 0.0011517399
SURAT : 0.0011370754
BHOPAL 

<i>Indeed, these set of rankings are better than the previous set. New Delhi, a pretty important railway station of India, appeared much lower in the previous set of rankings. This is rectified by our current set of rankings.