### Importing Libraries

We start by importing `pandas` for data manipulation and `KeplerGl` for geospatial visualization, and we silence `UserWarning` messages to keep the notebook output readable.


In [2]:
# 1. Import necessary libraries
import pandas as pd
from keplergl import KeplerGl
import warnings
warnings.filterwarnings("ignore", category=UserWarning)


### Loading Trip Data

We load a cleaned sample of CitiBike trips from 2022. Each row is one trip, with start and end station names and IDs, coordinates, start and end timestamps, bike type (`rideable_type`), and rider type (`member_casual`).


In [3]:
# 2. Load CitiBike trip data (absolute path on the author's machine; adjust it to where the CSV lives locally)
df = pd.read_csv('/Users/muhammaddildar/Desktop/citibike_2022_dashboard/citibike_trip_sample.csv')
df.head()


|   | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AD280D4AE55D3506 | electric_bike | 2022-06-17 17:32:55.309 | 2022-06-17 17:45:46.076 | E 47 St & 2 Ave | 6498.1 | E 2 St & Avenue C | 5476.03 | 40.753231 | -73.970325 | 40.720874 | -73.980858 | member |
| 1 | 734318BA808A46DC | electric_bike | 2022-09-20 17:04:00.975 | 2022-09-20 17:18:40.884 | Monroe St & Bedford Ave | 4368.05 | Wythe Ave & Metropolitan Ave | 5348.02 | 40.685129 | -73.953813 | 40.716887 | -73.963198 | member |
| 2 | DE53B4E2A0F3A27A | classic_bike | 2022-10-20 19:05:14.263 | 2022-10-20 19:12:11.338 | 8 Ave & W 38 St | 6526.05 | W 35 St & Dyer Ave | 6569.08 | 40.75461 | -73.99177 | 40.754692 | -73.997402 | member |
| 3 | E39D5C4183A3403C | electric_bike | 2022-02-03 17:04:12.668 | 2022-02-03 17:13:41.827 | E 84 St & Park Ave | 7243.04 | Columbus Ave & W 95 St | 7520.07 | 40.778627 | -73.957721 | 40.791956 | -73.968087 | member |
| 4 | 4C7D7975092F14F7 | electric_bike | 2022-03-15 12:47:17.204 | 2022-03-15 12:54:00.503 | Greenwich St & Hubert St | 5470.1 | Centre St & Chambers St | 5207.01 | 40.721319 | -74.010065 | 40.712733 | -74.004607 | member |
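The load step above lets pandas guess every column type, so `started_at` and `ended_at` stay as plain text, and the station IDs appear to be parsed as floats (note `6498.1` and `5470.1` next to two-decimal IDs such as `5476.03`). A minimal sketch, assuming the same column names as the sample file, parses the timestamps and keeps the IDs exactly as written. The inline CSV reuses two rows from the output above, and `duration_min` is a hypothetical derived column, not one from the original data.

```python
import io

import pandas as pd

# Two rows copied from the df.head() output above; with the real data, pass the CSV path instead.
sample_csv = """ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
AD280D4AE55D3506,electric_bike,2022-06-17 17:32:55.309,2022-06-17 17:45:46.076,E 47 St & 2 Ave,6498.1,E 2 St & Avenue C,5476.03,40.753231,-73.970325,40.720874,-73.980858,member
734318BA808A46DC,electric_bike,2022-09-20 17:04:00.975,2022-09-20 17:18:40.884,Monroe St & Bedford Ave,4368.05,Wythe Ave & Metropolitan Ave,5348.02,40.685129,-73.953813,40.716887,-73.963198,member
"""

trips = pd.read_csv(
    io.StringIO(sample_csv),
    parse_dates=["started_at", "ended_at"],  # parse timestamps as datetime64
    dtype={"start_station_id": str, "end_station_id": str},  # keep IDs as text, no float conversion
)

# Hypothetical derived column: trip duration in minutes
trips["duration_min"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60
print(trips[["ride_id", "start_station_id", "duration_min"]])
```

With the real file, `io.StringIO(sample_csv)` is swapped for the CSV path used in the loading cell.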


### Aggregating Route Data

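To count trips between stations, we add a column `trip` with the constant value `1`.  
We then group by start and end station names and coordinates and count the rows in each group, which gives the number of trips for each route. By default, `groupby` drops rows where any grouping key is missing (for example, a trip without end coordinates).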


In [4]:
# 3. Add a column with value 1 to count trips
df['trip'] = 1

# 4. Aggregate by route: start and end station + coordinates
df_routes = df.groupby(
    ['start_station_name', 'end_station_name', 'start_lat', 'start_lng', 'end_lat', 'end_lng']
)['trip'].count().reset_index(name='trip_count')

df_routes.head()


|   | start_station_name | end_station_name | start_lat | start_lng | end_lat | end_lng | trip_count |
|---|---|---|---|---|---|---|---|
| 0 | 1 Ave & E 110 St | 2 Ave & E 104 St | 40.792327 | -73.9383 | 40.789211 | -73.943708 | 1 |
| 1 | 1 Ave & E 110 St | E 106 St & Madison Ave | 40.792327 | -73.9383 | 40.793434 | -73.94945 | 1 |
| 2 | 1 Ave & E 110 St | E 114 St & 1 Ave | 40.792327 | -73.9383 | 40.794566 | -73.936254 | 1 |
| 3 | 1 Ave & E 110 St | E 91 St & Park Ave | 40.792327 | -73.9383 | 40.783502 | -73.955327 | 1 |
| 4 | 1 Ave & E 110 St | Lenox Ave & W 130 St | 40.792428 | -73.938206 | 40.810792 | -73.943068 | 1 |
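In the output above, `1 Ave & E 110 St` appears with two slightly different start coordinates (rows 0 to 3 versus row 4), so grouping on coordinates as well as names can split one station pair into several rows with smaller counts. One alternative, sketched below under the assumption that station names identify stations reliably, groups on the names only, averages each pair's coordinates, and counts trips with `size`. The toy DataFrame uses values adapted from the output above and is illustrative only.

```python
import pandas as pd

# Toy trips: the same station pair recorded with slightly different start coordinates.
trips = pd.DataFrame({
    "start_station_name": ["1 Ave & E 110 St"] * 3,
    "end_station_name": ["2 Ave & E 104 St", "2 Ave & E 104 St", "E 114 St & 1 Ave"],
    "start_lat": [40.792327, 40.792428, 40.792327],
    "start_lng": [-73.938300, -73.938206, -73.938300],
    "end_lat": [40.789211, 40.789211, 40.794566],
    "end_lng": [-73.943708, -73.943708, -73.936254],
})

# Grouping on names *and* coordinates keeps the first two trips apart (3 groups).
n_groups_with_coords = trips.groupby(list(trips.columns)).ngroups

# Grouping on names only: one row per station pair, mean coordinates, and a trip count.
routes = (
    trips.groupby(["start_station_name", "end_station_name"], as_index=False)
    .agg(
        start_lat=("start_lat", "mean"),
        start_lng=("start_lng", "mean"),
        end_lat=("end_lat", "mean"),
        end_lng=("end_lng", "mean"),
        trip_count=("start_lat", "size"),
    )
    .sort_values("trip_count", ascending=False, ignore_index=True)
)
print(n_groups_with_coords)
print(routes)
```

Averaging is one choice; taking each station's most frequent coordinate, or joining a station lookup table, would also give a single point per station.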


### Initializing Kepler.gl Map

We create a KeplerGl map and add the aggregated route data to it.  
This will allow us to visualize bike trip flows across New York City using arc layers.
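`save_to_html` embeds the map data in the exported file, so trimming the DataFrame before `add_data` can keep the HTML smaller. A hedged sketch, assuming the arc layer and tooltip only need the columns in `MAP_COLUMNS` (a hypothetical name), drops rows without coordinates and rounds coordinates to 5 decimal places (roughly 1 m of latitude).

```python
import pandas as pd

# Columns used by the arc layer and tooltip described in this notebook (assumption)
MAP_COLUMNS = ["start_station_name", "end_station_name",
               "start_lat", "start_lng", "end_lat", "end_lng", "trip_count"]
COORD_COLUMNS = ["start_lat", "start_lng", "end_lat", "end_lng"]


def prepare_for_map(routes: pd.DataFrame, decimals: int = 5) -> pd.DataFrame:
    """Keep only map columns, drop rows missing a coordinate, and round coordinates."""
    out = routes[MAP_COLUMNS].dropna(subset=COORD_COLUMNS).copy()
    out[COORD_COLUMNS] = out[COORD_COLUMNS].round(decimals)
    return out.reset_index(drop=True)


# Illustrative input: the second row lacks an end latitude, and `trip` is not needed on the map
demo = pd.DataFrame({
    "start_station_name": ["1 Ave & E 110 St", "1 Ave & E 110 St"],
    "end_station_name": ["2 Ave & E 104 St", "E 114 St & 1 Ave"],
    "start_lat": [40.792327, 40.792327],
    "start_lng": [-73.9383, -73.9383],
    "end_lat": [40.789211, None],
    "end_lng": [-73.943708, -73.936254],
    "trip_count": [1, 1],
    "trip": [1, 1],
})
map_df = prepare_for_map(demo)
print(map_df)
```

In the notebook this would look like `map_routes.add_data(data=prepare_for_map(df_routes), name='Trip Routes')`.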


In [8]:
# 5. Initialize Kepler map
map_routes = KeplerGl(height=700)

# 6. Add route data to the map
map_routes.add_data(data=df_routes, name='Trip Routes')
map_routes


User Guide: https://docs.kepler.gl/docs/keplergl-jupyter



### Map Customization: Arc Layer & Colors

In the Kepler.gl interface, I added an **Arc Layer** to visualize the flow of trips between stations.  
Settings used:

- **Source Position**: `start_lat`, `start_lng`
- **Target Position**: `end_lat`, `end_lng`
- **Color scale**: `trip_count` (orange-red palette)
- **Tooltip**: Enabled to show station names and trip count
- **Filter**: Added slider on `trip_count` to isolate high-volume routes

These settings help identify the most popular bike routes and commuting patterns in NYC.
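The arc-layer settings above end up in `map_routes.config`, so they can be checked in code after the UI work. The helper below is a sketch that assumes Kepler.gl's v1 config layout (`config` → `visState` → `layers`, with arc columns named `lat0`, `lng0`, `lat1`, `lng1` and the color field under `visualChannels`). `summarize_layers` is a hypothetical name, and the example dict is hand-written, not exported from this notebook.

```python
def summarize_layers(kepler_config: dict) -> list:
    """List type, dataset, columns and color field for each layer in a Kepler.gl config.

    Assumes the v1 layout: {"version": "v1", "config": {"visState": {"layers": [...]}}}.
    """
    layers = kepler_config.get("config", {}).get("visState", {}).get("layers", [])
    summary = []
    for layer in layers:
        layer_cfg = layer.get("config", {})
        # colorField may be missing or None when the layer uses a single color
        color_field = (layer.get("visualChannels", {}).get("colorField") or {}).get("name")
        summary.append({
            "type": layer.get("type"),
            "dataId": layer_cfg.get("dataId"),
            "columns": layer_cfg.get("columns", {}),
            "color_by": color_field,
        })
    return summary


# Hand-written fragment shaped like an exported config (illustrative only).
example_config = {
    "version": "v1",
    "config": {
        "visState": {
            "layers": [{
                "type": "arc",
                "config": {
                    "dataId": "Trip Routes",
                    "columns": {"lat0": "start_lat", "lng0": "start_lng",
                                "lat1": "end_lat", "lng1": "end_lng"},
                },
                "visualChannels": {"colorField": {"name": "trip_count", "type": "integer"}},
            }]
        }
    },
}

for row in summarize_layers(example_config):
    print(row)
```

In the notebook, `summarize_layers(map_routes.config)` would list the layers actually configured in the UI.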


### Observations and Insights

Filtering on `trip_count` shows that the most frequent trips are concentrated in **Midtown Manhattan**, near transit hubs such as Penn Station and Grand Central.  
Other busy zones include:

- Downtown Brooklyn and Lower Manhattan
- Central Park area during weekends

This reflects known commuter and tourism patterns in NYC.  
According to NYC Open Data, weekday rush hours are the busiest for CitiBike usage.
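The rush-hour pattern can be checked against this notebook's own trips, since `df` still carries `started_at`. A minimal sketch counts trips per start hour, split into weekday and weekend columns; `trips_by_hour` is a hypothetical helper, and the example uses the five start times from the `df.head()` output plus one hypothetical Saturday trip. On a sample of trips, the counts indicate rather than confirm the citywide pattern.

```python
import pandas as pd


def trips_by_hour(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips per start hour (0-23), with separate weekday and weekend columns."""
    started = pd.to_datetime(trips["started_at"])
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
    day_type = started.dt.dayofweek.map(lambda d: "weekend" if d >= 5 else "weekday")
    table = pd.crosstab(index=started.dt.hour, columns=day_type,
                        rownames=["hour"], colnames=["day_type"])
    # Make sure both columns exist even if the data has no weekend (or weekday) trips
    return table.reindex(columns=["weekday", "weekend"], fill_value=0)


example = pd.DataFrame({"started_at": [
    "2022-06-17 17:32:55.309", "2022-09-20 17:04:00.975", "2022-10-20 19:05:14.263",
    "2022-02-03 17:04:12.668", "2022-03-15 12:47:17.204",
    "2022-07-09 11:15:00.000",  # hypothetical Saturday trip
]})
hourly = trips_by_hour(example)
print(hourly)
```

In the notebook, `trips_by_hour(df)` would give the same table for the full sample.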


In [6]:
# 7. Save the map configuration (layers, colors, filters set in the UI) as JSON
import json

config = map_routes.config
with open('kepler_trip_routes_config.json', 'w') as f:
    json.dump(config, f, indent=2)

# 8. Export the map (data and config) as a standalone HTML file
map_routes.save_to_html(file_name='citibike_trip_routes.html')


Map saved to citibike_trip_routes.html!


### Saving Map Output

We save the customized map's configuration as a `.json` file and export the entire interactive map as an `.html` file to be included in the project submission and dashboard.
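The saved JSON is most useful when it is fed back in: the `KeplerGl` constructor accepts `data` and `config`, so the styled map can be rebuilt without redoing the UI steps. A hedged sketch follows; `load_kepler_config` and `rebuild_map` are hypothetical helpers, and the dataset key must stay `'Trip Routes'` because layers in the config refer to their data by that name (`dataId`). The demo writes and reads back a small hand-written config in a temporary folder.

```python
import json
import os
import tempfile


def load_kepler_config(path: str) -> dict:
    """Read a config previously written with json.dump(map_routes.config, f)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rebuild_map(routes, config: dict, height: int = 700):
    """Recreate the styled map from the route data and a saved config (needs keplergl)."""
    from keplergl import KeplerGl  # imported here so load_kepler_config works without keplergl
    # The key must match the dataId used in the saved layers ('Trip Routes' in this notebook)
    return KeplerGl(height=height, data={"Trip Routes": routes}, config=config)


# Demo: round-trip a small hand-written config through a temporary JSON file.
example = {"version": "v1", "config": {"visState": {"layers": [], "filters": []}}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kepler_trip_routes_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f, indent=2)
    reloaded = load_kepler_config(path)

print(reloaded == example)
```

In the notebook: `rebuild_map(df_routes, load_kepler_config('kepler_trip_routes_config.json'))`.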


### Final Submission Notes

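Before pushing to GitHub, large files such as the dataset (over 25 MB) were removed from the folder to stay within GitHub's file-size limits. As a result, the notebook cannot be rerun from the repository alone; the CSV has to be placed at the path used in the loading cell.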

Files submitted:
- This notebook (`kepler_trip_routes_dashboard.ipynb`)
- Map HTML file (`citibike_trip_routes.html`)
- Config JSON file (`kepler_trip_routes_config.json`)
- GitHub link: *[insert your repo URL here]*

The mentor can open the HTML file directly in a browser to view the interactive Kepler.gl map.
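Oversized files can be found before committing with a short scan of the project folder. This sketch uses the 25 MB figure from the note above (treated as MiB); `files_over_limit` is a hypothetical helper, and the demo uses a tiny limit so it runs instantly. Listing the dataset in a `.gitignore` file is another way to keep it out of commits.

```python
import tempfile
from pathlib import Path


def files_over_limit(folder, limit_mb: float = 25.0) -> list:
    """Return files under `folder` larger than `limit_mb` MiB, skipping the .git directory."""
    limit_bytes = limit_mb * 1024 * 1024
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and ".git" not in p.parts and p.stat().st_size > limit_bytes
    )


# Demo with a tiny limit (0.001 MiB, about 1 KB) so no large files are needed.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "small.txt").write_bytes(b"x" * 10)
    Path(tmp, "big.csv").write_bytes(b"x" * 2000)
    flagged = [p.name for p in files_over_limit(tmp, limit_mb=0.001)]

print(flagged)
```

Run from the project folder, `files_over_limit('.')` would list any file above the 25 MB threshold.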
