# More Unit Testing

In [None]:
import numpy as np
import matplotlib
%matplotlib inline
import pandas as pd
import sys
libraries = (('Matplotlib', matplotlib), ('Numpy', np), ('Pandas', pd))

print("Python Version:", sys.version, '\n')
for lib in libraries:
    print('{0} Version: {1}'.format(lib[0], lib[1].__version__))

In [None]:
!mkdir -p data
!wget -nc -P data https://s3.amazonaws.com/gamma-datasets/P2/mta_turnstile_160903.txt https://s3.amazonaws.com/gamma-datasets/P2/mta_turnstile_160910.txt https://s3.amazonaws.com/gamma-datasets/P2/mta_turnstile_160917.txt

## Exercise: Unit Testing with Real Data

We're going to revisit the MTA data and start building some unit tests together. I'm providing the tests in the `TestDataLoader` class; you need to write a function that
* takes in a list of week IDs as input
* loads the dataframe corresponding to those week IDs (check out the data folder) and combines them
* returns the single dataframe

Your function should pass all of the tests. Note that some of them require minimal cleaning of the data before the dataframe is returned!
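
One possible shape for such a function is sketched below. It is not the reference solution: the `data_dir` parameter, the `mta_turnstile_<week_id>.txt` file-name pattern (inferred from the download URLs above), and the choice to strip whitespace from column names are all assumptions made for illustration.

```python
import os

import pandas as pd


def load_data_into_dataframe(week_ids, data_dir="data"):
    """Load one turnstile file per week ID and combine them (illustrative sketch)."""
    # Reject anything that is not a list or tuple, e.g. a bare integer week ID.
    if not isinstance(week_ids, (list, tuple)):
        raise TypeError("week_ids must be a list of week IDs")

    frames = []
    for week_id in week_ids:
        # Assumed file-name pattern, based on the files downloaded into data/.
        path = os.path.join(data_dir, "mta_turnstile_{}.txt".format(week_id))
        frames.append(pd.read_csv(path))

    # Stack the weekly frames and renumber the rows from 0.
    df = pd.concat(frames, ignore_index=True)

    # Assumed minimal cleaning: remove stray whitespace around column names.
    df.columns = [col.strip() for col in df.columns]
    return df
```

Calling the function with no arguments raises `TypeError` on its own, because `week_ids` is a required positional argument; the explicit check only covers the case where a non-list value is passed.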

In [None]:
def load_data_into_dataframe(week_ids):
    pass

In [None]:
import unittest

class TestDataLoader(unittest.TestCase):
    
    def test_fails_without_file_list(self):
        with self.assertRaises(TypeError):
            load_data_into_dataframe()
        with self.assertRaises(TypeError):
            load_data_into_dataframe(160903)
    
    def test_output_type(self):
        self.assertIs(type(load_data_into_dataframe([160903])), type(pd.DataFrame()))
        
    def test_column_names(self):
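        df = load_data_into_dataframe([160903])
        expected = ['C/A', 'UNIT', 'SCP', 'STATION', 'LINENAME', 'DIVISION', 'DATE', 'TIME',
                    'DESC', 'ENTRIES', 'EXITS']
        # Compare plain lists so a wrong name or column count is reported as a test failure
        # rather than as an error raised by the element-wise comparison.
        self.assertListEqual(list(df.columns), expected)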
        
    def test_multiple_files_of_data(self):
        df = load_data_into_dataframe([160903,160910])
        self.assertIs(type(df), type(pd.DataFrame()))

unittest.main(TestDataLoader(), argv=['first-arg-is-ignored'], exit=False)
# Note that this time I passed a TestDataLoader instance as the first argument, so only that
# test class runs instead of every test class currently defined!
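
As an alternative to the call above, a single test class can also be run by building the suite explicitly with the standard `unittest` loader and runner. The sketch below uses a toy class, `TestToy`, which is hypothetical and stands in for `TestDataLoader`.

```python
import unittest


class TestToy(unittest.TestCase):
    """Toy test case used only to demonstrate running one class."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_string_upper(self):
        self.assertEqual("mta".upper(), "MTA")


# Build a suite from exactly one TestCase class, then run only that suite.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToy)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The returned `result` object records how many tests ran and whether they all passed, which can be handy in a notebook where no exit code is available.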

## Exercise 2: Writing the Function and the Tests

Now your goal is to write both the function and the tests. The function cleans and prepares our data. It should:

* Take in a dataframe that already contains DATE and TIME columns
* Create a DATE_TIME column using the DATE and TIME columns
* Make sure that each combination of ["C/A", "UNIT", "SCP", "STATION", "DATE_TIME"] appears only once

Your tests should check the output types of the columns, check that duplicate groupings are handled properly, and cover any other cases you can think of.
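
To make the requirements concrete, here is a minimal sketch of one way the cleaner and a few tests could look. Everything in it is an assumption for illustration: the function name `clean_turnstile_data`, the `MM/DD/YYYY` and `HH:MM:SS` formats for DATE and TIME, the decision to keep the first row of each duplicate group, and the tiny hand-made dataframe used as test input.

```python
import unittest

import pandas as pd

KEY_COLUMNS = ["C/A", "UNIT", "SCP", "STATION", "DATE_TIME"]


def clean_turnstile_data(df):
    """Hypothetical cleaner: add DATE_TIME and drop duplicate readings."""
    cleaned = df.copy()  # leave the caller's dataframe untouched
    # Assumed formats: DATE like 09/10/2016 and TIME like 04:00:00.
    cleaned["DATE_TIME"] = pd.to_datetime(
        cleaned["DATE"] + " " + cleaned["TIME"], format="%m/%d/%Y %H:%M:%S"
    )
    # Keeping the first row per key is a judgment call, not a requirement.
    cleaned = cleaned.drop_duplicates(subset=KEY_COLUMNS, keep="first")
    return cleaned.reset_index(drop=True)


class TestDataCleanerSketch(unittest.TestCase):

    def setUp(self):
        # Three readings for one turnstile; the first two share a timestamp.
        self.raw = pd.DataFrame({
            "C/A": ["A002"] * 3,
            "UNIT": ["R051"] * 3,
            "SCP": ["02-00-00"] * 3,
            "STATION": ["59 ST"] * 3,
            "DATE": ["09/10/2016"] * 3,
            "TIME": ["00:00:00", "00:00:00", "04:00:00"],
            "ENTRIES": [100, 100, 150],
            "EXITS": [40, 40, 60],
        })

    def test_date_time_is_datetime(self):
        cleaned = clean_turnstile_data(self.raw)
        self.assertTrue(pd.api.types.is_datetime64_any_dtype(cleaned["DATE_TIME"]))

    def test_keys_are_unique(self):
        cleaned = clean_turnstile_data(self.raw)
        self.assertFalse(cleaned.duplicated(subset=KEY_COLUMNS).any())
        self.assertEqual(len(cleaned), 2)

    def test_input_is_not_modified(self):
        clean_turnstile_data(self.raw)
        self.assertNotIn("DATE_TIME", self.raw.columns)
```

Building the input by hand in `setUp` keeps the tests fast and independent of the downloaded files; tests against the real data can be added alongside them.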

In ~15 minutes, we'll have someone come up and present both their code and their tests and other folks can chime in about the types of tests they've written as well.

In [None]:
df = load_data_into_dataframe([160917])

In [None]:
# YOU NEED TO CODE THIS!
class TestDataCleaner(unittest.TestCase):
    
    pass

unittest.main(TestDataCleaner(), argv=['first-arg-is-ignored'], exit=False)