## T. Martz-Oberlander, 2015-11-13 
## Test-driven development (TDD) for checking if data is properly aligned in columns

My pitch data did not line up properly: some rows were shifted into different columns.
So I want to know whether my data is lined up. I can find that out by asking a more specific question: are the data in column 'n' floats? I assert that they are; otherwise the test reports "Data is not a float".
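
To illustrate why the dtype of a column answers the alignment question, here is a minimal sketch using a made-up two-row CSV (an assumption for illustration, not my real pitch data). When one row is shifted so that a note name lands in the frequency column, pandas can no longer read that column as numbers.

```python
import io
import pandas as pd

# Hypothetical toy data, not the real pitches.csv.
# In the last row of `shifted`, the values sit one column to the right,
# so the text 'c4' ends up in the 'freq' column.
aligned = "div,note,freq\npedal,c3,131.17\npedal,c4,262.08\n"
shifted = "div,note,freq\npedal,c3,131.17\n,pedal,c4\n"

good = pd.read_csv(io.StringIO(aligned))
bad = pd.read_csv(io.StringIO(shifted))

print(good['freq'].dtype)  # float64
print(bad['freq'].dtype)   # no longer a float dtype
```

So a column that should hold frequencies but is not `float64` is a sign that at least one row is out of place.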

In [118]:
#I call in necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import re
import numpy as np

%matplotlib inline
from pandas import set_option

#I want to be able to easily scroll through this notebook so I limit the length of the appearance of my dataframes 
set_option('display.max_rows', 5)

#First, I'll import my pitch data and define a variable name for it 
pitches = pd.read_csv('pitches.csv', sep=',')

#display pitches
pitches

Unnamed: 0,time,div,note,freq,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11
0,2010-04-13 8:37,pedal,c3,131.17,131.20,131.18,131.11,131.17,131.14,131.21,,
1,2010-04-13 8:37,pedal,c4,262.08,262.12,262.09,262.05,262.07,262.10,262.08,,
...,...,...,...,...,...,...,...,...,...,...,...,...
55,2010-04-17 10:35,pedal,c4,,261.95,261.95,262.02,262.00,261.97,262.01,261.95,261.97
56,2010-04-17 10:37,great,c4,,261.69,261.69,261.68,261.71,261.74,261.66,261.68,261.69


In [119]:
#Changing column types:

#Convert the time column to datetime
pitches['time']= pd.to_datetime(pitches['time'])

#I check the type of data in the dataframe columns
print(pitches.dtypes)


time           datetime64[ns]
div                    object
                    ...      
Unnamed: 10           float64
Unnamed: 11           float64
dtype: object


All columns appear to be in order; however, if I were importing a larger file I might not be able to tell. So I write a test function to check whether the column at position 3 (freq, the first frequency column) really is filled with floats. This tells me whether I can go on to do math on the values in the dataframe.


In [142]:
#I define a variable for one column
freq = pitches.iloc[:,3]

#I check the data type of the freq variable
freq.dtype

#I know this column should pass my test, because it is made of floats

dtype('float64')

In [143]:
#TEST FUNCTION
# I pass the freq column data through the test function
# (data_type is defined in cell In [133] below, which was run earlier)

def test_data_type(freq):
    '''Check to see if a column contains only floats'''
    obs = data_type(freq) #I pass the dtype checking function through my test function
    #print(obs)
    exp = 'float64'
    assert obs == exp, 'Data is not a float'
    
test_data_type(freq)

In [133]:
#return data type function

def data_type(freq):
    '''Display data type of a column'''
    freq_type = freq.dtype
    return freq_type

print('Data type is:', data_type(freq))
    

Data type is: float64
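
The same check can also be written as one self-contained cell, with the helper defined before the test that calls it. This is only a sketch: `is_float_column` is a name I made up, it relies on `pandas.api.types.is_float_dtype`, and the short Series stands in for the real freq column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype

def is_float_column(column):
    '''Return True if a pandas Series has a float dtype'''
    return is_float_dtype(column)

def test_is_float_column():
    '''A float column should pass, even when it has missing values'''
    sample = pd.Series([131.17, 262.08, np.nan])  # stand-in values
    assert is_float_column(sample), 'Data is not a float'

test_is_float_column()  # silent when the assertion holds
```

Missing values (NaN) do not change the result, because pandas stores NaN as a float.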


## Test my test function with a known non-float column

In [161]:
#test function for a non-float column


#I define the variable div, for the 2nd column
div = pitches.iloc[:,1]
print(div)
#I can see div is made of words, not floats so it should fail my test

#My function should tell me "O" for object
div.dtype 

0     pedal
1     pedal
      ...  
55    pedal
56    great
Name: div, dtype: object


dtype('O')

In [162]:
#I run the test, reusing the data_type function from cell In [133]
def test_data_type(div):
    '''Check to see if a column contains only floats'''
    obs2 = data_type(div)
    print(obs2)
    exp = 'float64'
    assert obs2 == exp, 'Data is not a float'
    
test_data_type(div)

object


AssertionError: Data is not a float


Because my test passes silently when I give it a 'float64' column, and fails with my assert message when I give it an 'object' column, I know that my test function and main function work. Now I can test whether my dataframe columns are usable for computational analysis.
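
As a possible next step, the check could be run over several columns at once, and the entries that break a column could be listed. The sketch below uses a small made-up dataframe; `non_float_columns` and `rows_not_numeric` are hypothetical helper names, and `pd.to_numeric(..., errors='coerce')` is used to turn unreadable entries into NaN.

```python
import pandas as pd

def non_float_columns(df, columns):
    '''Return the names of the given columns whose dtype is not float64'''
    return [name for name in columns if df[name].dtype != 'float64']

def rows_not_numeric(column):
    '''Return the entries of a column that cannot be read as numbers'''
    as_numbers = pd.to_numeric(column, errors='coerce')
    # keep entries that were present but failed to convert
    return column[as_numbers.isna() & column.notna()]

# Hypothetical toy frame, not the real pitch data
toy = pd.DataFrame({
    'div': ['pedal', 'pedal', 'great'],
    'freq': [131.17, 262.08, 261.69],
    'shifted': ['131.20', 'c4', '261.69'],
})

print(non_float_columns(toy, ['freq', 'shifted']))  # ['shifted']
print(rows_not_numeric(toy['shifted']))             # the row holding 'c4'
```

The first helper says which columns need attention; the second points to the rows that were shifted.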