# Label Correcting Algorithm

*Feb, 2017 by Kanglong Wu*

In this report, two datasets are involved:
- [**/data/Label-correcting-test.csv**](https://github.com/wklchris/Reports/blob/master/data/Label-correcting-test.csv): Test data based on a small network.
- [**/data/Label-correcting-data.csv**](https://github.com/wklchris/Reports/blob/master/data/Label-correcting-data.csv): Data based on a relatively larger network. *(Data Source: anaheim.xlsx)*

I use Python 3 to implement this algorithm, and [the Python code file can be accessed here](https://github.com/wklchris/Reports/blob/master/Label-correcting-algo.py). To run the code on another computer, install:
- Python 3
- Packages: NumPy and pandas

## Import data

In the test dataset, we only need to know:

- Start node of each link
- End node of each link
- Cost of each link

**Costs are constant** in this test, so we use the "Free Flow Travel Time" column as the cost of each link.

```python
import os
import numpy as np
import pandas as pd

dt_raw = pd.read_csv(os.path.join(os.getcwd(), "data", "Label-correcting-test.csv"))
dt_raw.head()
```

| | start | end | Capacity | Length (ft) | Speed (ft/min) | Free Flow Travel Time (min) | Free Flow Speed (mph) |
|---|---|---|---|---|---|---|---|
| 0 | a | b | 0 | 0 | 0 | 3 | 0 |
| 1 | a | c | 0 | 0 | 0 | 8 | 0 |
| 2 | a | d | 0 | 0 | 0 | 5 | 0 |
| 3 | b | c | 0 | 0 | 0 | 5 | 0 |
| 4 | b | f | 0 | 0 | 0 | 7 | 0 |


```python
# Keep only the needed columns and rename the cost column
dt_test = dt_raw.loc[:, ["start", "end", "Free Flow Travel Time (min)"]].rename(
    columns={"Free Flow Travel Time (min)": "cost"})
dt_test.head()
```

| | start | end | cost |
|---|---|---|---|
| 0 | a | b | 3 |
| 1 | a | c | 8 |
| 2 | a | d | 5 |
| 3 | b | c | 5 |
| 4 | b | f | 7 |


```python
dt_test.shape  # (number of rows, number of columns)
```

```
(18, 3)
```

## Implement the Algorithm

Next, we define a function that implements the label-correcting algorithm.
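The function below follows the generic label-correcting scheme. Each node $j$ carries a distance label $d(j)$ and a predecessor label ("Front-node"); initially $d(s) = 0$ for the origin $s$, $d(j) = \infty$ for every other node, and the scan list contains only $s$. While the list is nonempty, a node $i$ is removed and every link $(i, j)$ leaving it is checked:

$$
d(j) > d(i) + c_{ij} \;\Longrightarrow\; d(j) \leftarrow d(i) + c_{ij}, \quad \mathrm{pred}(j) \leftarrow i, \quad \text{add } j \text{ to the list.}
$$

When the list becomes empty (assuming no negative-cost cycle is reachable from $s$), every link satisfies $d(j) \le d(i) + c_{ij}$, and each finite $d(j)$ is the shortest distance from $s$ to $j$.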

```python
def label_correcting_algo(dt, ori_node, des_node, do_return=False):
    """
    Find the shortest path from an origin to a destination in a network
    with constant link costs.

    Args:
        dt: Network representation with at least 3 columns:
              "start": start nodes of links
              "end": end nodes of links
              "cost": constant costs of links
        ori_node: Origin node.
        des_node: Destination node.
        do_return: Boolean.
            If True, print the shortest-path string and return the
            DataFrame described below.
            If False, return the shortest-path string only.

    Returns:
        If do_return is True, a DataFrame indexed by node with two columns:
            "Front-node": the node visited just before this node on the
                shortest path from the origin.
            "Distance": total distance from the origin to this node.
        Otherwise, a string with the total distance and the path.
    """
    # Work on a copy so the caller's DataFrame is not modified,
    # and convert all node labels to strings
    dt = dt.copy()
    dt[["start", "end"]] = dt[["start", "end"]].astype(str)
    ori = str(ori_node)
    des = str(des_node)

    # Initialization: d(origin) = 0, d(j) = inf for every other node
    nodes = set(dt["start"].unique()) | set(dt["end"].unique())
    dist = dict.fromkeys(nodes, np.inf)
    dist[ori] = 0
    points = dict.fromkeys(nodes, ori)  # Predecessor ("front node") labels
    iter_set = {ori}                    # Scan-eligible list

    # Main algorithm
    while iter_set:
        i = iter_set.pop()  # Remove an arbitrary node i from the list
        A_i = dt[dt["start"] == i]  # Links leaving node i
        for j, c_ij in zip(A_i["end"], A_i["cost"]):
            if dist[j] > dist[i] + c_ij:  # Link (i, j) violates optimality
                dist[j] = dist[i] + c_ij
                points[j] = i
                iter_set.add(j)

    # Collect the labels into a DataFrame
    x = pd.concat([pd.Series(points), pd.Series(dist)], axis=1)
    x.columns = ["Front-node", "Distance"]

    # Trace the path backwards from the destination to the origin
    if np.isinf(x.loc[des, "Distance"]):
        sp = "Node {} is unreachable from node {}\n".format(des, ori)
    else:
        current_node = des
        sp = des
        while current_node != ori:
            front_node = str(x.loc[current_node, "Front-node"])
            sp = "{} -> {}".format(front_node, sp)
            current_node = front_node
        sp = "From node {} to node {}, total Distance: {}\n{}\n".format(
            ori, des, x.loc[des, "Distance"], sp)

    if do_return:
        print(sp)
        return x
    return sp
```
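The function above filters the whole link table every time a node is scanned, and removes nodes from a set in arbitrary order. A common variant, often called the FIFO label-correcting algorithm (a queue-based form of Bellman-Ford), builds an adjacency list once and scans nodes in first-in, first-out order; with a FIFO list it runs in $O(nm)$ time when no negative-cost cycle exists. The sketch below is plain Python and independent of the notebook's DataFrame; the name `label_correcting_fifo` and the `(start, end, cost)` tuple input are illustrative assumptions, not part of this report's code.

```python
from collections import deque
import math


def label_correcting_fifo(edges, origin):
    """FIFO label-correcting shortest paths (illustrative sketch).

    edges: iterable of (start, end, cost) tuples. Negative costs are allowed
    as long as no negative-cost cycle is reachable from the origin.
    Returns (dist, pred) dictionaries keyed by node.
    """
    # Build the adjacency list once instead of filtering the links per scan
    adj = {}
    for i, j, c in edges:
        adj.setdefault(i, []).append((j, c))
        adj.setdefault(j, [])
    adj.setdefault(origin, [])

    dist = {node: math.inf for node in adj}
    pred = {node: None for node in adj}
    dist[origin] = 0

    queue = deque([origin])  # Scan list, processed first-in, first-out
    in_queue = {origin}      # Keeps each node in the list at most once
    while queue:
        i = queue.popleft()
        in_queue.discard(i)
        for j, c_ij in adj[i]:
            if dist[i] + c_ij < dist[j]:  # Link (i, j) violates optimality
                dist[j] = dist[i] + c_ij
                pred[j] = i
                if j not in in_queue:
                    queue.append(j)
                    in_queue.add(j)
    return dist, pred


edges = [("a", "b", 3), ("a", "c", 8), ("b", "c", 4), ("c", "d", 1), ("b", "d", 7)]
dist, pred = label_correcting_fifo(edges, "a")
print(dist["d"], pred["d"])  # 8 c
```

Unlike a label-setting method such as Dijkstra's algorithm, this scheme also accepts negative link costs: for links `s->t` (4), `s->u` (2), `u->t` (-1) it returns a distance of 1 to `t`.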

Use this function to compute the shortest path between node $a$ and node $j$:

```python
label_correcting_algo(dt_test, "a", "j", do_return=True)
```

```
From node a to node j, total Distance: 18
a -> b -> f -> h -> j
```

| | Front-node | Distance |
|---|---|---|
| a | a | 0 |
| b | a | 3 |
| c | a | 8 |
| d | a | 5 |
| e | f | 15 |
| f | b | 10 |
| g | d | 9 |
| h | f | 16 |
| i | g | 13 |
| j | h | 18 |
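The labels in this table can be checked independently of the algorithm. At termination they must satisfy $d(\text{origin}) = 0$ and $d(j) \le d(i) + c_{ij}$ for every link $(i, j)$; since each finite label is the length of an actual path (the one traced through Front-node), labels that pass this check are shortest-path distances. A minimal sketch of such a check follows; the helper name `satisfies_optimality` and its `(start, end, cost)` tuple input are illustrative assumptions, and the tolerance `tol` guards against floating-point round-off with fractional costs.

```python
import math


def satisfies_optimality(edges, dist, origin, tol=1e-9):
    """Check d(origin) == 0 and d(j) <= d(i) + c_ij for every link (i, j)."""
    if dist[origin] != 0:
        return False
    for i, j, c_ij in edges:
        # inf + c stays inf, so links leaving unreachable nodes never fail
        if dist[i] + c_ij < dist[j] - tol:
            return False
    return True


edges = [("a", "b", 3), ("a", "c", 8), ("b", "c", 4), ("c", "d", 1), ("b", "d", 7)]
print(satisfies_optimality(edges, {"a": 0, "b": 3, "c": 7, "d": 8}, "a"))  # True
print(satisfies_optimality(edges, {"a": 0, "b": 3, "c": 8, "d": 9}, "a"))  # False
```

For the test network, one could pass `dt_test[["start", "end", "cost"]].itertuples(index=False)` as `edges` and `result["Distance"].to_dict()` as `dist`, where `result` is a hypothetical name for the DataFrame returned with `do_return=True`.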


## Apply to a larger network

The result above is only a sanity check of the implementation. Next, we apply the same function to the larger network built from the Anaheim data.

```python
dt_raw = pd.read_csv(os.path.join(os.getcwd(), "data", "Label-correcting-data.csv"))
# Keep only the needed columns and rename the cost column
dt_app = dt_raw.loc[:, ["start", "end", "Free Flow Travel Time (min)"]].rename(
    columns={"Free Flow Travel Time (min)": "cost"})
dt_app.head()
```

| | start | end | cost |
|---|---|---|---|
| 0 | 416 | 300 | 1.09 |
| 1 | 415 | 330 | 1.09 |
| 2 | 414 | 343 | 1.09 |
| 3 | 413 | 184 | 1.09 |
| 4 | 412 | 252 | 1.09 |


We pick four nodes: each origin is one of the first two nodes below, and each destination is one of the last two.

- Node 1
- Node 2
- Node 411
- Node 300

Compute these four O-D pairs:

```python
app = []
app.append(label_correcting_algo(dt_app, 1, 411))
app.append(label_correcting_algo(dt_app, 1, 300))
app.append(label_correcting_algo(dt_app, 2, 411))
app.append(label_correcting_algo(dt_app, 2, 300))
```

The final results are as follows:

```python
print("\n".join(app))
```

```
From node 1 to node 411, total Distance: 6.26
1 -> 10 -> 9 -> 8 -> 250 -> 251 -> 411

From node 1 to node 300, total Distance: 13.93
1 -> 10 -> 9 -> 206 -> 207 -> 208 -> 25 -> 24 -> 23 -> 381 -> 39 -> 56 -> 384 -> 80 -> 388 -> 109 -> 122 -> 123 -> 124 -> 328 -> 329 -> 416 -> 300

From node 2 to node 411, total Distance: 6.56
2 -> 11 -> 379 -> 10 -> 9 -> 8 -> 250 -> 251 -> 411

From node 2 to node 300, total Distance: 14.23
2 -> 11 -> 379 -> 10 -> 9 -> 206 -> 207 -> 208 -> 25 -> 24 -> 23 -> 381 -> 39 -> 56 -> 384 -> 80 -> 388 -> 109 -> 122 -> 123 -> 124 -> 328 -> 329 -> 416 -> 300
```
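Each run of `label_correcting_algo` computes distance labels for all nodes before it looks at the destination, so calls that share an origin repeat the same work: the runs for (1, 411) and (1, 300) build the same shortest-path tree, as do the two runs from node 2. One run per origin would suffice, with each destination's path read off the predecessor labels. The sketch below shows that read-off step in plain Python; `trace_path` is a hypothetical helper name, and its input is the Front-node column of the test-network table above (origin $a$), typed in by hand.

```python
def trace_path(front_node, origin, dest):
    """Walk predecessor ("Front-node") labels back from dest to origin.

    front_node maps each node to the node before it on its shortest path.
    Returns the node list from origin to dest, or None if the walk reaches
    a node without a usable predecessor (dest unreachable).
    """
    path = [dest]
    while path[-1] != origin:
        prev = front_node.get(path[-1])
        if prev is None or prev == path[-1]:
            return None
        path.append(prev)
    return path[::-1]


# Front-node labels of the test network with origin "a", copied from the table above
front_node = {"a": "a", "b": "a", "c": "a", "d": "a", "e": "f",
              "f": "b", "g": "d", "h": "f", "i": "g", "j": "h"}

# One shortest-path tree serves every destination from the same origin
for dest in ["j", "i"]:
    print(" -> ".join(trace_path(front_node, "a", dest)))
# a -> b -> f -> h -> j
# a -> d -> g -> i
```

The first line reproduces the path printed for the test network; the second comes from the same tree at no extra cost.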

