# Steps:
- Import series & Read in data
- Basic cleaning & Series Formatting. 
- Remove outliers and abnormal counts (i.e. from subway maintenance)
- Group by unique turnstile
- Find the "traffic" amount by taking the difference between consecutive counts (the ENTRIES and EXITS counters are cumulative)
- Group turnstiles that belong to the same station. Note: station names alone are not enough; the station lines must match too (e.g. there are a few 42nd streets)
- Output
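
As a rough illustration of the per-turnstile difference and the station grouping listed above, here is a minimal sketch on made-up counts. It assumes that `C/A`, `UNIT` and `SCP` together identify one turnstile and that `STATION` plus `LINENAME` identify one station; the names `toy`, `ENTRY_TRAFFIC` and `station_traffic` are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Toy data (invented): two turnstiles at one station, three cumulative readings each.
toy = pd.DataFrame({
    "C/A": ["A002"] * 6,
    "UNIT": ["R051"] * 6,
    "SCP": ["02-00-00"] * 3 + ["02-00-01"] * 3,
    "STATION": ["59 ST"] * 6,
    "LINENAME": ["NQR456"] * 6,
    "DATE_TIME": pd.to_datetime(
        ["2016-03-12 03:00", "2016-03-12 07:00", "2016-03-12 11:00"] * 2
    ),
    "ENTRIES": [100, 135, 330, 5000, 5010, 5050],
})

# Assumption: these three columns together identify a single turnstile.
turnstile_keys = ["C/A", "UNIT", "SCP"]

# Difference between consecutive cumulative readings of the same turnstile.
# The first reading of each turnstile has no predecessor, so it becomes NaN.
toy = toy.sort_values(turnstile_keys + ["DATE_TIME"])
toy["ENTRY_TRAFFIC"] = toy.groupby(turnstile_keys)["ENTRIES"].diff()

# Sum turnstiles per station, keyed on name AND lines so that
# same-named stations on different lines stay separate.
# min_count=1 keeps a slot as NaN when no turnstile has a valid difference.
station_traffic = (
    toy.groupby(["STATION", "LINENAME", "DATE_TIME"])["ENTRY_TRAFFIC"]
    .sum(min_count=1)
    .reset_index()
)
print(station_traffic)
```

Keying the station grouping on both columns is what separates, say, two different "42 ST" stations; whether `LINENAME` is always sufficient for that in the real data is not verified here.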

# - Import series & Read in data

In [51]:
import pandas as pd
from datetime import datetime  # the class; a plain `import datetime` would be shadowed by this line
from dateutil.parser import parse

In [99]:
subway_week = pd.read_csv("../Resources/Data/Raw/mta-subway-volume/turnstile_160319_mod.csv")

# - Basic cleaning & Series Formatting. 
a) remove entries which aren't "Regular" (i.e. counts which are made for maintenance purposes and not on the hourly schedule)

b) create date-type objects

In [100]:
# a) remove entries which aren't "Regular" (i.e. counts which are made for maintenance purposes and not on the hourly schedule)
subway_week = subway_week[subway_week["DESC"]=="REGULAR"]

# Filtering leaves gaps in the index (e.g. 14733 is missing), which causes problems down the road,
# so we renumber the rows from 0.
# subway_week[14730:14740]
subway_week = subway_week.reset_index(drop=True)
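
To see why the index needs renumbering, here is a small sketch on invented rows. Boolean filtering keeps the original row labels, so label-based lookups such as `subway_week["DATE"][i]` would fail on a dropped label. The `"OTHER"` value is a hypothetical stand-in for any non-regular `DESC` entry.

```python
import pandas as pd

# Invented rows; "OTHER" stands in for any non-"REGULAR" description.
toy = pd.DataFrame({
    "DESC": ["REGULAR", "OTHER", "REGULAR"],
    "ENTRIES": [10, 11, 12],
})

regular = toy[toy["DESC"] == "REGULAR"]
print(list(regular.index))  # [0, 2] -> label 1 is gone

# Renumber the rows 0..n-1 and discard the old labels.
regular = regular.reset_index(drop=True)
print(list(regular.index))  # [0, 1]
```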

In [115]:
# Improvement Opportunity: Make date parsing faster via: http://ze.phyr.us/faster-strptime/
subway_week["DATE"] = [parse(date_string).date() for date_string in subway_week["DATE"]]
subway_week["TIME"] = [parse(time_string).time() for time_string in subway_week["TIME"]]

In [116]:
# Pair each date with its time positionally, so the result does not depend on the index labels.
subway_week["DATE_TIME"] = [datetime.combine(d, t) for d, t in zip(subway_week["DATE"], subway_week["TIME"])]
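
The cells above call `parse` once per row, which is the slow part flagged as an improvement opportunity. One possible alternative, sketched here on invented rows, is to let pandas parse the concatenated strings in a single vectorized call with an explicit format. This assumes the raw `DATE` and `TIME` columns are strings shaped like `MM/DD/YYYY` and `HH:MM:SS`; that format is an assumption and is not confirmed by the notebook, and the call would have to run on the raw strings, before they are converted to date and time objects.

```python
import pandas as pd

# Invented rows; the string formats are an assumption about the raw file.
toy = pd.DataFrame({
    "DATE": ["03/12/2016", "03/12/2016"],
    "TIME": ["03:00:00", "07:00:00"],
})

# One vectorized parse with an explicit format instead of per-row guessing.
toy["DATE_TIME"] = pd.to_datetime(
    toy["DATE"] + " " + toy["TIME"],
    format="%m/%d/%Y %H:%M:%S",
)
print(toy["DATE_TIME"])
```

If separate date and time columns are still needed, `toy["DATE_TIME"].dt.date` and `toy["DATE_TIME"].dt.time` recover them from the combined column.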

In [84]:
subway_week.to_pickle("../Resources/Pickles/subway_traffic.pickle")
#subway_week = pd.read_pickle("../Resources/Pickles/subway_traffic.pickle")

In [117]:
small = subway_week.head()
small

Unnamed: 0,C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS,DATE_TIME
0,A002,R051,02-00-00,59 ST,NQR456,BMT,2016-03-12,03:00:00,REGULAR,7468622,1884949,2016-03-12 03:00:00
1,A002,R051,02-00-00,59 ST,NQR456,BMT,2016-03-12,07:00:00,REGULAR,7468657,1884968,2016-03-12 07:00:00
2,A002,R051,02-00-00,59 ST,NQR456,BMT,2016-03-12,11:00:00,REGULAR,7468852,1885067,2016-03-12 11:00:00
3,A002,R051,02-00-00,59 ST,NQR456,BMT,2016-03-12,15:00:00,REGULAR,7469194,1885157,2016-03-12 15:00:00
4,A002,R051,02-00-00,59 ST,NQR456,BMT,2016-03-12,19:00:00,REGULAR,7469732,1885250,2016-03-12 19:00:00
