### Data Prep

Write a script to transform input CSV to desired output CSV. 

You will find a CSV file in the files folder under [data.csv](files/data.csv). There are two steps (plus an optional bonus - Date offset) to this part of the test. Each step concerns manipulating the values for a single field according to the step's requirements. The steps are as follows:

**String cleaning** - The bio field contains text with arbitrary padding, spacing and line breaks. Normalize these values to a space-delimited string.

**Code swap** - There is a supplementary CSV in the files folder under [state_abbreviations.csv](files/state_abbreviations.csv). This "data dictionary" contains state abbreviations alongside state names. For the state field of the input CSV, replace each state abbreviation with its associated state name from the data dictionary.

(Optional) **Date offset** - The start_date field contains data in a variety of formats. These may include e.g., "June 23, 1912" or "5/11/1930" (month, day, year). But not all values are valid dates. Invalid dates may include e.g., "June 2018", "3/06" (incomplete dates) or even arbitrary natural language. Add a start_date_description field adjacent to the start_date column to filter invalid date values into. Normalize all valid date values in start_date to ISO 8601 (i.e., YYYY-MM-DD).

Your script should take [data.csv](files/data.csv) as input and produce a cleansed "enriched.csv" file according to the step requirements above. 

_Please commit a _enriched.csv_ file along with your solution code in the `solution/csv` folder._



In [19]:
import sys
import csv
import pandas as pd
from dateutil import parser
from dateutil.parser import isoparse
import datetime
import re


data = pd.read_csv('../../files/data.csv')
state = pd.read_csv('../../files/state_abbreviations.csv')
data['start_date_description']="na"

for index, row in data.iterrows():
# String cleaning
    data.at[index,'bio'] = re.sub( '\s+', ' ', row['bio'] ).strip()
# Code Swap
    data.at[index,'state'] = state.loc[state['state_abbr'] == row['state']].state_name.to_string().split()[1]
# Date Offset
    try:
        d = parser.parse(row['start_date'])
        data.at[index,'start_date_description'] = d.strftime("%Y-%m-%d")
    except ValueError:
        data.at[index,'start_date_description'] = "Invalid Date"
    if len(row['start_date'].split()) == 2:
        data.at[index,'start_date_description'] = "Invalid Date"
    if len(row['start_date']) < 8:
        data.at[index,'start_date_description'] = "Invalid Date"
        
data.to_csv('enriched.csv')
data.head(100)


Unnamed: 0,name,gender,birthdate,address,city,state,zipcode,email,bio,job,start_date,start_date_description
0,Leslee Corwin,M,1974-02-01,4933 Weber Walks,Lake Carey,Kansas,32725,hansen.kennedy@yahoo.com,At aut velit unde minus recusandae molestias. ...,Education administrator,10/06,Invalid Date
1,Orris Kuvalis,M,1997-01-08,092 Kanye Forge,South Doshiamouth,Tennessee,8955,nicky.brown@yahoo.com,Corporis non harum doloribus ab provident. Ali...,Industrial buyer,Voluptatem odio.,Invalid Date
2,Afton Hirthe,M,1970-08-25,355 Shaquille Centers Suite 834,Lorriborough,Oklahoma,12027,noelle.gibson@lebsack.biz,Sed vitae dolorem quae totam sequi fuga odit. ...,Multimedia specialist,Est sed et suscipit.,Invalid Date
3,Olinda Wisoky,F,2007-01-11,48328 Rudolph Harbors,Braunport,Nebraska,19620,desirae.ritchie@yahoo.com,Nostrum impedit nulla vero ullam ad repudianda...,Financial controller,06/71,Invalid Date
4,Dr. Annmarie Schmitt PhD,M,1995-09-29,47861 Satterfield Meadow Suite 420,Bergstromshire,North,44200,johnson.betsey@yahoo.com,Commodi quia facere dolores facere. Sed culpa ...,Chartered management accountant,05/02,Invalid Date
5,Mathew Grady,F,1973-04-30,59765 Berge Coves Suite 085,Port Jarretbury,South,94969,legros.kamryn@yahoo.com,Non omnis fugit molestias. Dolor eum et evenie...,"Designer, multimedia",10/95,Invalid Date
6,Ailene Abernathy,F,2002-06-19,26158 Tea Crest Suite 735,Ivannachester,Minnesota,41350,diallo72@huels.com,Eius quas expedita ut culpa doloribus. Et labo...,Personal assistant,Ea nostrum et.,Invalid Date
7,Justen Carroll,M,2003-08-27,2194 Parker Cove Apt. 737,North Marlo,Maryland,936,floy.adams@lindgrenmoen.net,Optio molestias accusamus quos aut beatae laud...,"Engineer, mining",10/20/1994,1994-10-20
8,Nikita Torphy,M,1979-08-02,65938 Alvira Prairie,Mariyahfort,New,97295,ocarter@hotmail.com,Recusandae quod sed provident consequatur. Ad ...,Photographer,09/74,Invalid Date
9,Murry Waelchi PhD,F,2013-12-03,4431 Sheridan Divide,Port Markellview,Connecticut,14333,georgia.rice@hotmail.com,Perspiciatis aut autem ea et odit. Quo nulla q...,Software engineer,Quibusdam similique.,Invalid Date
