### Explore the GTFS data in Philadelphia

The General Transit Feed Specification (GTFS) is an open standard that transit agencies use to publish routes, stops, and schedules for riders and applications. The GTFS data for Philadelphia (SEPTA) can be downloaded from https://transitfeeds.com/p/septa, which provides both the bus and the regional rail datasets.

This tutorial is based on https://medium.com/@sureshd_731/working-with-gtfs-how-to-find-schedule-for-a-bus-5bd95fdf646a. The goal of this example is to extract the route details, stops, and schedule for one trip of a particular route.

Last modified by Xiaojiang Li, Temple University, Dec 2, 2023



In [1]:
import pandas as pd

#### Bus and rail data folders for Philadelphia

In [2]:
bus_dir_path = "gtfs_bus"
rail_dir_path = "gtfs_rail"

#### Check the bus routes

In [3]:
routes_file = bus_dir_path+"/routes.txt"
df = pd.read_csv(routes_file)

df

Unnamed: 0,route_id,route_short_name,route_long_name,route_type,route_color,route_text_color,route_url
0,1,1,Parx Casino to 54th-City,3,,,
1,10,10,13th-Market to 63rd-Malvern,0,66CC33,000000,
2,101,101,Media to 69th St TC,0,66CC33,000000,
3,102,102,Sharon Hill to 69th St TC,0,66CC33,000000,
4,103,103,Ardmore to 69th St TC,3,,,
...,...,...,...,...,...,...,...
161,MFL,MFL,Frankford TC to 69th St TC,1,0000DF,FFFFFF,
162,MFO,MFO,Midnight-5am Service (Bus) for MFL,3,,,
163,NHSL,NHSL,Norristown TC to 69th St TC,0,A033B3,FFFFFF,
164,R,R,Henry-Mid or WissTC to FrankfordTC,3,,,
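
The `route_type` column in the table above takes the values 0, 1, and 3, so this feed mixes several modes. As a rough sketch, the codes can be mapped to readable labels using the basic GTFS route types; the dictionary below lists only a subset of the codes, and the small `routes` frame is a hand-copied sample of the rows above rather than the full file.

```python
import pandas as pd

# Basic GTFS route_type codes (subset; see the GTFS reference for the full list).
ROUTE_TYPE_LABELS = {
    0: "tram/streetcar/light rail",
    1: "subway/metro",
    2: "rail",
    3: "bus",
}

# A few rows copied by hand from the routes table above.
routes = pd.DataFrame({
    "route_id": ["1", "10", "MFL", "NHSL"],
    "route_type": [3, 0, 1, 0],
})

# Add a readable label and keep only the routes coded as bus.
routes["mode"] = routes["route_type"].map(ROUTE_TYPE_LABELS)
bus_only = routes[routes["route_type"] == 3]
print(routes)
```

On the real data, the same two lines can be applied to `df` after loading `routes.txt`.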


#### Let's load the trips.txt file.

In [5]:
trips_file = bus_dir_path + "/trips.txt"
bus_trips = pd.read_csv(trips_file)

print(bus_trips.shape)
bus_trips

(49321, 7)


Unnamed: 0,route_id,service_id,trip_id,trip_headsign,block_id,direction_id,shape_id
0,1,10,87152,Decatur-Drummond,1003,0,275147
1,1,10,87153,Parx Casino,2151,0,275145
2,1,10,87154,Parx Casino via Decatur-Drummond,2155,0,275152
3,1,10,87155,Decatur-Drummond,1426,0,275151
4,1,10,87156,Parx Casino via Decatur-Drummond,2153,0,275152
...,...,...,...,...,...,...,...
49316,15,55,36251,Richmond-Westmoreland,9007,1,274394
49317,15,55,36252,Richmond-Westmoreland,9009,1,274394
49318,15,55,36253,Richmond-Westmoreland,9008,1,274394
49319,15,55,36254,Richmond-Westmoreland,9003,1,274394


##### Get the trips for route 10 (route_id is stored as text, so compare against '10', not 10)

In [6]:
# Keep route 10 trips in direction 0; a headsign condition can be added if needed:
#   & (bus_trips['trip_headsign'] == 'Temple University')
selected_trip = bus_trips[(bus_trips['route_id'] == '10') & (bus_trips['direction_id'] == 0)]
print(selected_trip.shape)

(397, 7)
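
The comparison uses `'10'` because some route IDs in this feed (for example MFL, NHSL, and R) are not numeric, so pandas reads the column as text. One way to make that explicit is to pass `dtype` to `pd.read_csv`. This is a sketch on a tiny in-memory stand-in for `trips.txt`; the MFL row and its `trip_id` are made up for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for trips.txt; the MFL row is illustrative, not from the real feed.
sample = io.StringIO(
    "route_id,service_id,trip_id,direction_id\n"
    "10,52,33584,0\n"
    "MFL,52,40001,0\n"
    "10,55,34639,1\n"
)

# Force route_id to be read as text so the filter never depends on type inference.
trips = pd.read_csv(sample, dtype={"route_id": str})

route_10 = trips[(trips["route_id"] == "10") & (trips["direction_id"] == 0)]
print(route_10)
```

With the real file, the equivalent call would be `pd.read_csv(trips_file, dtype={"route_id": str})`.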


In [7]:
selected_trip

Unnamed: 0,route_id,service_id,trip_id,trip_headsign,block_id,direction_id,shape_id
47813,10,52,33584,13th-Market,9062,0,274383
47814,10,52,33585,13th-Market,9063,0,274383
47815,10,52,33586,13th-Market,9062,0,274383
47816,10,52,33587,13th-Market,9061,0,274383
47817,10,52,33588,13th-Market,9051,0,274383
...,...,...,...,...,...,...,...
49110,10,55,34639,40th-Market,9060,0,274385
49111,10,55,34640,40th-Market,9058,0,274385
49112,10,55,34641,40th-Market,9059,0,274385
49113,10,55,34642,40th-Market,9060,0,274385


Pick a single trip of route 10 (here, the third row of the selection) and look at its details.

In [8]:
a_bus_trip = selected_trip.iloc[2]
a_bus_trip

route_id                  10
service_id                52
trip_id                33586
trip_headsign    13th-Market
block_id                9062
direction_id               0
shape_id              274383
Name: 47815, dtype: object

#### Get the stop time information

In [10]:
stop_times_file = bus_dir_path + "/stop_times.txt"

stop_times_df = pd.read_csv(stop_times_file)
stop_times_df

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence
0,33584,07:14:00,07:14:00,31294,1
1,33584,07:14:00,07:14:00,20610,2
2,33584,07:15:00,07:15:00,20611,3
3,33584,07:16:00,07:16:00,20612,4
4,33584,07:18:00,07:18:00,20613,5
...,...,...,...,...,...
3007413,89980,15:17:00,15:17:00,23718,53
3007414,89980,15:17:00,15:17:00,23633,54
3007415,89980,15:18:00,15:18:00,22402,55
3007416,89980,15:19:00,15:19:00,31356,56


In [23]:
# Extract the rows of the stop_times dataframe that belong to this particular trip.
a_bus_trip_stop_times = stop_times_df.loc[stop_times_df.trip_id == a_bus_trip.trip_id]

# print(a_bus_trip_stop_times.iloc[0].arrival_time)
a_bus_trip_stop_times.head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence
72,33586,08:37:00,08:37:00,31294,1
73,33586,08:37:00,08:37:00,20610,2
74,33586,08:38:00,08:38:00,20611,3
75,33586,08:39:00,08:39:00,20612,4
76,33586,08:41:00,08:41:00,20613,5
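
The `arrival_time` and `departure_time` values are plain `HH:MM:SS` strings, and GTFS allows hours of 24 or more for trips that run past midnight of the service day, so ordinary clock-time parsing may reject them. A minimal sketch of one workaround is to convert each string to seconds since the start of the service day; `gtfs_time_to_seconds` is a hypothetical helper name, and the `25:10:00` value is an invented example.

```python
def gtfs_time_to_seconds(hms):
    """Convert a GTFS 'HH:MM:SS' string to seconds after the service day starts."""
    hours, minutes, seconds = (int(part) for part in hms.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Times of stops 1 and 5 from the trip above.
first = gtfs_time_to_seconds("08:37:00")
fifth = gtfs_time_to_seconds("08:41:00")
print((fifth - first) // 60)  # minutes between stop 1 and stop 5

# Hypothetical after-midnight time, which has an hour value above 23.
print(gtfs_time_to_seconds("25:10:00"))
```

On the DataFrame, the helper could be applied with `a_bus_trip_stop_times["arrival_time"].map(gtfs_time_to_seconds)`.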


#### Get the stops location information
Now that we have the arrival and departure times of this trip, we need the names of the stops along the way. The stops of this trip are identified by the stop_id column of the data above, and the stop names are stored in stops.txt. Let's load that file.

In [13]:
stops_txt = bus_dir_path + "/stops.txt"

stops_df = pd.read_csv(stops_txt)
stops_df.head()

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station,zone_id,wheelchair_boarding
0,2,Ridge Av & Wissahickon Transfer Center,40.014986,-75.206826,,31032.0,1,1
1,4,Roosevelt Blvd & Broad St - FS,40.018128,-75.148887,,,1,1
2,5,Roosevelt Blvd & 5th St,40.023465,-75.133487,,,1,1
3,7,Roosevelt Blvd & Pratt St,40.028754,-75.089524,,,1,1
4,8,Roosevelt Blvd & Cottman Av - FS,40.04461,-75.053657,,,1,1


Let's look up the stops served by our selected route 10 trip. Note that the rows come back in the order of stops.txt, not in the order the trip visits them.

In [18]:
stops = stops_df[stops_df.stop_id.isin(a_bus_trip_stop_times.stop_id)]

stops.head()

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station,zone_id,wheelchair_boarding
157,277,Lancaster Av & Girard Av,39.973059,-75.218615,,,1,2
159,283,13th St Trolley Station,39.952532,-75.162559,,,1,2
2007,15271,Lansdowne Av & 60th St,39.975851,-75.240378,,,1,2
5839,20610,Malvern Av & 63rd St,39.983697,-75.247009,,,1,2
5840,20611,63rd St & Lebanon Av,39.980912,-75.246661,,,1,2


#### Keep only the ID, name, and coordinates of the bus stops

In [19]:
stop_names = stops_df[['stop_id','stop_name', 'stop_lat', 'stop_lon']]

In [21]:
stop_names.head()

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon
0,2,Ridge Av & Wissahickon Transfer Center,40.014986,-75.206826
1,4,Roosevelt Blvd & Broad St - FS,40.018128,-75.148887
2,5,Roosevelt Blvd & 5th St,40.023465,-75.133487
3,7,Roosevelt Blvd & Pratt St,40.028754,-75.089524
4,8,Roosevelt Blvd & Cottman Av - FS,40.04461,-75.053657


So now we have the stop IDs and arrival/departure times in one data frame, and the stop IDs, names, and coordinates in another. Let's merge them on stop_id to get the full schedule for this particular trip.

In [22]:
a_bus_trip_stop_times.merge(stop_names, on='stop_id')

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_name,stop_lat,stop_lon
0,33586,08:37:00,08:37:00,31294,1,Malvern Av & 63rd St Loop - 2 onsite,39.983838,-75.245957
1,33586,08:37:00,08:37:00,20610,2,Malvern Av & 63rd St,39.983697,-75.247009
2,33586,08:38:00,08:38:00,20611,3,63rd St & Lebanon Av,39.980912,-75.246661
3,33586,08:39:00,08:39:00,20612,4,63rd St & Jefferson St,39.97835,-75.246184
4,33586,08:41:00,08:41:00,20613,5,Lansdowne Av & 63rd - FS,39.975332,-75.244986
5,33586,08:41:00,08:41:00,20614,6,Lansdowne Av & 62nd St,39.975499,-75.243651
6,33586,08:42:00,08:42:00,20615,7,Lansdowne Av & 61st St,39.975693,-75.241879
7,33586,08:43:00,08:43:00,15271,8,Lansdowne Av & 60th St,39.975851,-75.240378
8,33586,08:43:00,08:43:00,20616,9,Lansdowne Av & 59th St,39.976054,-75.238653
9,33586,08:44:00,08:44:00,20617,10,Lansdowne Av & 58th St,39.976248,-75.236904
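
The merge above happens to return the stops in visiting order, but that is not something the join is meant to guarantee. A slightly more defensive sketch sorts by `stop_sequence` explicitly and asks pandas to check that each `stop_id` appears only once in the stop table. The two small frames below are copied by hand from the outputs above and deliberately shuffled, so they are a stand-in for the real data.

```python
import pandas as pd

# First three stops of trip 33586, copied from the tables above and shuffled on purpose.
trip_stop_times = pd.DataFrame({
    "trip_id": [33586, 33586, 33586],
    "arrival_time": ["08:38:00", "08:37:00", "08:37:00"],
    "stop_id": [20611, 31294, 20610],
    "stop_sequence": [3, 1, 2],
})
stop_names = pd.DataFrame({
    "stop_id": [20610, 20611, 31294],
    "stop_name": [
        "Malvern Av & 63rd St",
        "63rd St & Lebanon Av",
        "Malvern Av & 63rd St Loop - 2 onsite",
    ],
})

schedule = (
    trip_stop_times
    # validate raises if a stop_id is duplicated in stop_names
    .merge(stop_names, on="stop_id", how="left", validate="many_to_one")
    .sort_values("stop_sequence")
    .reset_index(drop=True)
)
print(schedule[["stop_sequence", "arrival_time", "stop_name"]])
```

Using `how="left"` keeps every stop time even if a stop were missing from the stop table, in which case its `stop_name` would show up as NaN rather than the row silently disappearing.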


Voilà! There we have it: the schedule, with stop names and coordinates, for one trip of route 10.