# MTA Data Challenges

In [1]:
import csv
from datetime import datetime
from collections import Counter
from collections import defaultdict
from itertools import islice
import matplotlib.pyplot as plt
%matplotlib inline

**Challenge 1**

- Open up a new IPython notebook.
- Download a few MTA turnstile data files.
- Open up a file, use `csv.reader` to read it, and make a Python dict with a key for each (C/A, UNIT, SCP, STATION); these are the first four columns. The value for each key should be a list of lists, where each inner list holds the rest of the columns from one row. For example, one key-value pair should look like:
{ ('A002','R051','02-00-00','LEXINGTON AVE'): [ ['NQR456', 'BMT', '01/03/2015', '03:00:00', 'REGULAR', '0004945474', '0001675324'], ['NQR456', 'BMT', '01/03/2015', '07:00:00', 'REGULAR', '0004945478', '0001675333'], ['NQR456', 'BMT', '01/03/2015', '11:00:00', 'REGULAR', '0004945515', '0001675364'], ... ] }
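
A minimal, self-contained sketch of that grouping, run on two rows copied from the 59 ST output further down. `io.StringIO` stands in for the downloaded file so the block needs no network access, and `defaultdict(list)` starts each new key with an empty list. The variable names are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Header plus two data rows in the turnstile CSV layout (trailing padding removed)
sample = io.StringIO(
    "C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS\n"
    "A002,R051,02-00-00,59 ST,NQR456W,BMT,05/06/2017,00:00:00,REGULAR,0006166390,0002088519\n"
    "A002,R051,02-00-00,59 ST,NQR456W,BMT,05/06/2017,04:00:00,REGULAR,0006166420,0002088525\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

turnstiles = defaultdict(list)
for row in reader:
    # The first four columns (C/A, UNIT, SCP, STATION) identify one turnstile
    key = tuple(row[:4])
    # The remaining columns are that turnstile's reading for one time slot
    turnstiles[key].append(row[4:])

print(dict(turnstiles))
```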

In [64]:
!curl -O http://web.mta.info/developers/data/nyct/turnstile/turnstile_170513.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24.5M    0 24.5M    0     0   9.9M      0 --:--:--  0:00:02 --:--:-- 10.1M
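
The `curl` cell fetches a single week, while the challenge asks for a few files. A small sketch that builds several weekly URLs, assuming (from the one URL above) that each file is named `turnstile_YYMMDD.txt` after a Saturday and that files are published one week apart; `weekly_urls` and `BASE_URL` are hypothetical names.

```python
from datetime import date, timedelta

# Assumed naming scheme, taken from the single URL used in the curl cell
BASE_URL = "http://web.mta.info/developers/data/nyct/turnstile/turnstile_{:%y%m%d}.txt"

def weekly_urls(last_saturday, weeks):
    """Hypothetical helper: URLs for `weeks` consecutive weekly files, newest first.

    Assumes one file per week, named with the YYMMDD date of a Saturday.
    """
    return [BASE_URL.format(last_saturday - timedelta(weeks=i)) for i in range(weeks)]

for url in weekly_urls(date(2017, 5, 13), 3):
    print(url)
```

In IPython, a loop over these URLs can call `!curl -O {url}`, since `{url}` in a shell escape is expanded from the Python variable.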


In [65]:
with open('turnstile_170513.txt', newline='') as file:
    reader = csv.reader(file)
    header = next(reader)  # consume the header row so rows holds data only
    # Fields are kept as-is here; the EXITS values keep their trailing padding
    rows = list(reader)

    

In [66]:
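# The header was already consumed by next(reader), so check it directly;
# rows.pop(0) would discard the first data row instead. The raw EXITS
# header may carry trailing spaces, hence the strip().
assert [h.strip() for h in header] == ['C/A', 'UNIT', 'SCP', 'STATION', 'LINENAME',
                                       'DIVISION', 'DATE', 'TIME', 'DESC', 'ENTRIES',
                                       'EXITS']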

In [67]:
mta_dict = defaultdict(list)

In [68]:
rows[:2]

[['A002',
  'R051',
  '02-00-00',
  '59 ST',
  'NQR456W',
  'BMT',
  '05/06/2017',
  '00:00:00',
  'REGULAR',
  '0006166390',
  '0002088519                                            '],
 ['A002',
  'R051',
  '02-00-00',
  '59 ST',
  'NQR456W',
  'BMT',
  '05/06/2017',
  '04:00:00',
  'REGULAR',
  '0006166420',
  '0002088525                                            ']]
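
The last field of each row carries trailing spaces. A minimal sketch that strips every field as rows are read, shown on an in-memory row standing in for the downloaded file:

```python
import csv
import io

# One data row whose EXITS value is padded with trailing spaces, as in the output above
raw = io.StringIO(
    "A002,R051,02-00-00,59 ST,NQR456W,BMT,05/06/2017,00:00:00,REGULAR,0006166390,0002088519      \n"
)

# Strip surrounding whitespace from every field of every row
clean_rows = [[field.strip() for field in row] for row in csv.reader(raw)]
print(clean_rows[0][-1])  # prints 0002088519
```

Using the same comprehension inside the `with open(...)` block would remove the padding from `rows`, and therefore from the values later stored in `mta_dict`.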

In [69]:
for row in rows:
    # Key: (C/A, UNIT, SCP, STATION); value: the remaining columns of each row
    mta_dict[tuple(row[:4])].append(row[4:])

In [70]:
list(mta_dict.items())[0]

(('A002', 'R051', '02-00-00', '59 ST'),
 [['NQR456W',
   'BMT',
   '05/06/2017',
   '00:00:00',
   'REGULAR',
   '0006166390',
   '0002088519                                            '],
  ['NQR456W',
   'BMT',
   '05/06/2017',
   '04:00:00',
   'REGULAR',
   '0006166420',
   '0002088525                                            '],
  ['NQR456W',
   'BMT',
   '05/06/2017',
   '08:00:00',
   'REGULAR',
   '0006166433',
   '0002088557                                            '],
  ['NQR456W',
   'BMT',
   '05/06/2017',
   '12:00:00',
   'REGULAR',
   '0006166514',
   '0002088663                                            '],
  ['NQR456W',
   'BMT',
   '05/06/2017',
   '16:00:00',
   'REGULAR',
   '0006166791',
   '0002088720                                            '],
  ['NQR456W',
   'BMT',
   '05/06/2017',
   '20:00:00',
   'REGULAR',
   '0006167090',
   '0002088765                                            '],
  ['NQR456W',
   'BMT',
   '05/07/2017',
   '00:00:00',
   'REGULA

**Challenge 2**

Let's turn this into a time series.
For each key (the control area, unit, device address, and station of a specific turnstile), build a list again, but let each entry consist of just the point in time and the count of entries.

In other words, keep only the date, time, and entries fields in each list. You can convert the date and time into `datetime` objects; `datetime` is a Python class that represents a point in time. Combine the date and time fields into one string and use the `dateutil` module to parse it into a `datetime` object. For an example, see this StackOverflow question.
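
A sketch of the date/time conversion. It swaps in the standard library's `datetime.strptime` with an explicit format instead of `dateutil`, assuming the `MM/DD/YYYY` and `HH:MM:SS` layouts seen in the rows above; `dateutil.parser.parse` on the same combined string should also work if that package is installed. `to_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

def to_timestamp(date_str, time_str):
    """Hypothetical helper: combine the DATE and TIME fields into one datetime."""
    # e.g. '05/06/2017' + ' ' + '04:00:00' -> '05/06/2017 04:00:00'
    return datetime.strptime(date_str + " " + time_str, "%m/%d/%Y %H:%M:%S")

print(to_timestamp("05/06/2017", "04:00:00"))  # prints 2017-05-06 04:00:00
```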

Your new dict should look something like

{ ('A002','R051','02-00-00','LEXINGTON AVE'): [ [datetime.datetime(2013, 3, 2, 3, 0), 3788], [datetime.datetime(2013, 3, 2, 7, 0), 2585], [datetime.datetime(2013, 3, 2, 12, 0), 10653], [datetime.datetime(2013, 3, 2, 17, 0), 11016], [datetime.datetime(2013, 3, 2, 23, 0), 10666], [datetime.datetime(2013, 3, 3, 3, 0), 10814], [datetime.datetime(2013, 3, 3, 7, 0), 10229], ... ], .... }
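
A minimal sketch of the whole transformation, assuming each value list is laid out as in `mta_dict` above (`[LINENAME, DIVISION, DATE, TIME, DESC, ENTRIES, EXITS]`). A small hypothetical `sample_mta` dict stands in for the real one so the block runs on its own; ENTRIES is converted with `int()`, which drops the leading zeros and tolerates surrounding whitespace.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample shaped like mta_dict:
# key -> list of [LINENAME, DIVISION, DATE, TIME, DESC, ENTRIES, EXITS]
sample_mta = {
    ("A002", "R051", "02-00-00", "59 ST"): [
        ["NQR456W", "BMT", "05/06/2017", "00:00:00", "REGULAR", "0006166390", "0002088519  "],
        ["NQR456W", "BMT", "05/06/2017", "04:00:00", "REGULAR", "0006166420", "0002088525  "],
    ]
}

time_series = defaultdict(list)
for key, readings in sample_mta.items():
    for _linename, _division, date_str, time_str, _desc, entries, _exits in readings:
        # Merge DATE and TIME into a single point in time
        timestamp = datetime.strptime(date_str + " " + time_str, "%m/%d/%Y %H:%M:%S")
        # Keep only [timestamp, entries] for this turnstile
        time_series[key].append([timestamp, int(entries)])
```

Running the same loop over `mta_dict.items()` should give the dict described above for every turnstile in the file.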