### Welcome to the Data Creation Notebook

`Authors`: Victor RADERMECKER, Marco ANTONIOLI

`Date`: Friday, October 14, 2022.

For the data creation part of the project, we will use the `Python` programming language. The `Julia` language will be used for the optimization part.

*Undergraduates:*

- The data about residences can be found here:
https://studentlife.mit.edu/housing/undergraduate-housing/residence-halls
- Capacities are available here:
https://mitguidetoresidences.mit.edu/housing-grid

*Graduates:*
- The data about residences can be found here: 
https://studentlife.mit.edu/grad-residences
- Capacities are available here:



In [57]:
import pandas as pd
import plotly.express as px
import random

# remove warnings
import warnings
warnings.filterwarnings('ignore')

In [58]:
# read origins and destinations
origins = pd.read_csv('../data/origins.csv')
destinations = pd.read_csv('../data/destinations.csv')
origins

Unnamed: 0,Name,Long,Lat
0,Baker House,-71.095697,42.356768
1,Burton Conner,-71.097894,42.356145
2,East Campus,-71.088397,42.360036
3,MacGregor House,-71.099563,42.355645
4,Maseeh Hall,-71.093448,42.357843
5,McCormick Hall,-71.094512,42.35752
6,New House,-71.100627,42.355291
7,Next House,-71.101894,42.35497
8,New Vassar,-71.097559,42.359233
9,Random Hall,-71.098345,42.362004


### Data Generation

We need to generate the demand. How can we do that? The demand is modeled as Origin-Destination (OD) pairs with time windows between 7am and 10am. The origin is always one of the MIT residences. For the destination, we will select specific points on MIT's campus. These points can be:

- Another MIT Residence
- An MIT building
- An MIT Shuttle stop
- Important MIT buildings (to be selected manually)

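### Now, we need to compute the travel time for each OD pair.

This is done using the Bing Maps API. The resulting times are stored in `origin_destination_times.csv` and are treated as minutes in the rest of the notebook.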
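
As a rough offline cross-check of the API results, the straight-line distance between two locations can be computed from their coordinates. This is a minimal sketch of the haversine formula, not the Bing Maps computation itself; the helper name `haversine_km` is introduced here for illustration, and the two coordinate pairs are copied from the `origins` table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (Lat, Long) pairs taken from origins.csv
baker_house = (42.356768, -71.095697)
east_campus = (42.360036, -71.088397)

print(round(haversine_km(*baker_house, *east_campus), 2))
```

The straight-line distance is a lower bound on the road distance, so an OD pair whose API travel time looks implausibly short for its haversine distance is worth inspecting by hand.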

In [59]:
times = pd.read_csv("../data/origin_destination_times.csv")
times.head()

Unnamed: 0,Name,Kendall Square T,Wadsworth at Amherst,Media Lab,Media Lab at Ames,Amherst at Kresge,Burton/MacGregor,Tang/Westgate,W98 at Vassar St,W92 at Amesbury St,...,Veterans Memorial Pool,Great Dome,MIT Sloan School of Management,MIT Medical Urgent Care Service,Walker Memorial,MIT Department of Mathematics,MIT Sailing Pavilion,The Muddy Charles Pub,Tudor Dog Park,MIT Plasma Science and Fusion Center
0,Baker House,7.6167,7.7833,8.5167,8.35,6.8,3.2833,4.2667,3.3333,3.4833,...,5.0333,7.65,8.4167,10.15,8.6167,6.9167,6.7833,8.35,6.1833,7.1833
1,Burton Conner,8.0167,8.1833,8.9167,8.7667,7.2,1.7667,2.7333,3.75,3.8833,...,5.4333,8.0667,8.8167,10.5667,9.0167,7.3167,7.2,8.7667,6.5833,7.5833
2,East Campus,1.85,1.85,0.6,0.7667,2.9833,3.2833,5.1667,4.1167,4.25,...,5.8333,4.75,3.3333,2.15,0.2333,1.2333,4.4833,0.7667,7.25,5.0167
3,MacGregor House,7.9667,8.1333,8.8667,8.7,7.15,0.1167,0.9667,3.7,3.8333,...,5.3833,8.0,8.7667,10.5,8.9667,7.2667,7.15,8.7,6.5333,7.5333
4,Maseeh Hall,3.9333,3.2333,3.9667,4.2,0.2667,2.65,4.55,3.5,3.6333,...,5.2167,3.0167,3.8667,5.6167,4.6,2.4167,2.3,4.2,5.5167,3.2833


### Generate the data

Time windows are defined as follows:

- From 7am to 11am, in minutes.
- We will draw the desired departure time from a Gaussian distribution, centered at 9am (minutes = 120), with a standard deviation of 30 minutes.
- We consider that the shuttle can take at most 10 minutes to arrive (from 5 minutes before to 5 minutes after the desired time).
- We consider that the trip cannot take more than three times the direct driving time.
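
The rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the notebook's own cell: the names `sample_pick_time` and `time_windows` are hypothetical, clipping the sample to the 7am to 11am horizon is an added assumption, and the half-width defaults to the 5 minutes stated in the list (the generation cell further down uses 15 instead). The earliest drop-off time is set to the desired pick-up time, as in that cell.

```python
import random

HORIZON = 240.0  # 7am to 11am, in minutes after 7am


def sample_pick_time(rng, mean=120.0, std=30.0, horizon=HORIZON):
    """Draw a desired departure time; clipping to [0, horizon] is an assumption."""
    return min(max(rng.gauss(mean, std), 0.0), horizon)


def time_windows(pick_time, direct_time, half_width=5.0, max_ratio=3.0):
    """Return (pickup_window, dropoff_window) as (earliest, latest) tuples in minutes."""
    pickup = (pick_time - half_width, pick_time + half_width)
    # The trip may take at most max_ratio times the direct driving time.
    dropoff = (pick_time, pick_time + max_ratio * direct_time)
    return pickup, dropoff


rng = random.Random(0)  # seeded generator, so the draw is reproducible
desired = sample_pick_time(rng)
# 7.6167 is the Baker House -> Kendall Square T entry of the times table
pickup, dropoff = time_windows(desired, direct_time=7.6167)
print(pickup, dropoff)
```

Passing an explicit `random.Random` instance keeps the generated demand reproducible from one run to the next.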


In [60]:
# get a gaussian distribution centered on 120 with a standard deviation of 30
def get_time():
    return random.gauss(120, 30)

# plot a gaussian distribution of mean 120 and  standard deviation of 30
fig = px.histogram(x=[get_time() for i in range(10000)])
fig.show()


In [61]:
# N nodes indexed 0..N-1: the depot (0) plus N - 1 = 29 origin-destination pairs
N = 30
origins_names = times["Name"]
origins_names = origins_names[origins_names != "Depot"]

destinations_names = times.iloc[:, 1:].columns
destinations_names = destinations_names[destinations_names != "Depot"]

depot_coords = [42.36381606868144, -71.0885840857672]

# Initialize a dataframe with columns: passenger, loc_x, loc_y, load,
# service_duration, pick_time, pick_time_down, pick_time_up, name
pick_ups = pd.DataFrame(
    columns=[
        "passenger",
        "loc_x",
        "loc_y",
        "load",
        "service_duration",
        "pick_time",
        "pick_time_down",
        "pick_time_up",
        "name"
    ]
)
drop_offs = pick_ups.copy(deep=True)
pick_ups.loc[0] = [int(0), depot_coords[0], depot_coords[1], 0, 0.0, 0, 0, 1000, "Depot"]

for i in range(1, N):

    # get a random element of the origins list
    orr = random.choice(origins_names)
    des = random.choice(destinations_names)

    # draw a party size between 1 and 3, with the highest probability for 1
    num_passengers = random.choices([1, 2, 3], weights=[0.9, 0.08, 0.02], k=1)[0]

    orr_lat = origins[origins["Name"] == orr]["Lat"].values[0]
    orr_lon = origins[origins["Name"] == orr]["Long"].values[0]

    des_lat = destinations[destinations["Name"] == des]["Lat"].values[0]
    des_lon = destinations[destinations["Name"] == des]["Long"].values[0]

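    pick_time = get_time()
    # pick-up window: 15 minutes either side of the desired time
    pick_time_down = pick_time - 15
    pick_time_up = pick_time + 15

    # direct travel time from origin to destination
    trip_length = times[times["Name"] == orr][des].values[0]
    # drop_time is not used below; the drop-off window uses 3 * trip_length
    drop_time = pick_time + 5 * trip_length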

    # add the pick-up row (pd.concat is used because DataFrame.append
    # was removed in pandas 2.0)
    pick_up_row = {
        "passenger": int(i),
        "loc_x": orr_lat,
        "loc_y": orr_lon,
        "load": num_passengers,
        "service_duration": 0.5,
        "pick_time": pick_time,
        "pick_time_down": pick_time_down,
        "pick_time_up": pick_time_up,
        "name": orr,
    }
    pick_ups = pd.concat([pick_ups, pd.DataFrame([pick_up_row])], ignore_index=True)

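    # add the matching drop-off row: negative load, window from the desired
    # pick-up time up to three times the direct travel time
    drop_off_row = {
        "passenger": int(i),
        "loc_x": des_lat,
        "loc_y": des_lon,
        "load": -num_passengers,
        "service_duration": 0.5,
        "pick_time": pick_time + trip_length,
        "pick_time_down": pick_time,
        "pick_time_up": pick_time + 3 * trip_length,
        "name": des,
    }
    drop_offs = pd.concat([drop_offs, pd.DataFrame([drop_off_row])], ignore_index=True)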

# concatenate the pick ups and drop offs
demand = pd.concat([pick_ups, drop_offs], ignore_index=True)

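# add a last row to demand: the return to the depot
# (pd.concat is used because DataFrame.append was removed in pandas 2.0)
depot_row = {
    "passenger": int(0),
    "loc_x": depot_coords[0],
    "loc_y": depot_coords[1],
    "load": 0,
    "service_duration": 0.0,
    "pick_time": 0,
    "pick_time_down": 0,
    "pick_time_up": 1000,
    "name": "Depot",
}
demand = pd.concat([demand, pd.DataFrame([depot_row])], ignore_index=True)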

In [62]:
# save demand to CSV
demand.to_csv("../data/demand.csv", index=False)
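
Before the file is handed to the optimization part, a few invariants of the table can be checked. The following is a minimal sketch, assuming the rows are available as a list of dicts (for example from `demand.to_dict("records")`); `check_demand` is a hypothetical helper and not part of the notebook. It verifies that each target time lies inside its window and that the pick-up and drop-off loads of every passenger cancel out.

```python
def check_demand(rows):
    """Return a list of problems found in a demand table given as a list of dicts."""
    problems = []
    net_load = {}
    for r in rows:
        # The target time must lie inside its [down, up] window.
        if not r["pick_time_down"] <= r["pick_time"] <= r["pick_time_up"]:
            problems.append(f"{r['name']}: target time outside its window")
        net_load[r["passenger"]] = net_load.get(r["passenger"], 0) + r["load"]
    for passenger, load in net_load.items():
        # A pick-up of +k passengers must be matched by a drop-off of -k.
        if load != 0:
            problems.append(f"passenger {passenger}: loads do not cancel")
    return problems


# Tiny hand-made example using the same column names as demand.csv
example = [
    {"passenger": 1, "name": "Baker House", "load": 2,
     "pick_time": 120.0, "pick_time_down": 105.0, "pick_time_up": 135.0},
    {"passenger": 1, "name": "Great Dome", "load": -2,
     "pick_time": 127.65, "pick_time_down": 120.0, "pick_time_up": 142.95},
]
print(check_demand(example))
```

An empty list means that no problem was found.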

In [63]:
origins_names

0             Baker House
1           Burton Conner
2             East Campus
3         MacGregor House
4             Maseeh Hall
5          McCormick Hall
6               New House
7              Next House
8              New Vassar
9             Random Hall
10      70 Amherst Street
11          Ashdown House
12         Edgerton House
13         Graduate Tower
14         Sidney-Pacific
15              Tang Hall
16          The Warehouse
17    Westgate Apartments
Name: Name, dtype: object