### Gather data: 
We are going to use [credit card data](https://archive.ics.uci.edu/ml/datasets/credit+approval) from UCI's Machine Learning Repository for this project. Let us gather the required data into a folder named `datasets`.

Let's first load the Python modules needed to gather the credit card data.

In [3]:
# Modules required to gather data
import wget
import os 
import pandas as pd

Here, I store the base weblink in the variable `url` and the names of the two files in `data_file` and `names_file`.

In [4]:
# Weblink directing to credit card dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening'

# Names of files in credit card dataset
data_file = 'crx.data'
names_file = 'crx.names'

Organizing data from the beginning saves time for data analysis in the later stages of the project. The two downloaded data files keep the same names as specified above.

In [8]:
# Local directory to save data
path_data = '../datasets'

OK. We are all set to gather the data we need for this project. I am going to write a very simple Python function that checks whether a data file is already present and downloads it if it is not.

In [22]:
# Function to download data files
def check_download(url, filename, path_data):
    """
        READS: url, filename, and path to download data 
        Checks for datafile, and downloads if NOT present
    """
    # Build the full weblink and the local path of the file
    url_data = url + '/' + filename
    path_file = os.path.join(path_data, filename)
    
    # Check and create directory if doesn't exist
    if not os.path.exists(path_data):
        os.makedirs(path_data)
        print(" Folder created:", path_data)
    else:
        print(" Folder exists:", path_data)
    
    # Check and download file if doesn't exist
    if os.path.isfile(path_file):
        print(" Datafile already present:", path_file)
    else:
        print(" Downloading data... <START>")
        wget.download(url_data, path_data)
        print(" - {}".format(filename))
        print(" Downloading data... <FINISH>")

Let us call the `check_download` function twice, once for each of the two data files we need.

In [23]:
# Call function on data_file
check_download(url, data_file, path_data)

 Folder exists: ../datasets
 Downloading data... <START>
 - crx.data
 Downloading data... <FINISH>


In [24]:
# Call function on names_file
check_download(url, names_file, path_data)

 Folder exists: ../datasets
 Downloading data... <START>
 - crx.names
 Downloading data... <FINISH>
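
As an aside, `wget` is a third-party module, so the same check-then-download pattern can also be sketched with only the standard library. The function name `fetch_if_missing` and its return value are hypothetical choices for this illustration, not part of the notebook above; `urllib.request.urlretrieve` is the standard-library call doing the download.

```python
import os
import urllib.request


def fetch_if_missing(base_url, filename, dest_dir):
    """Download base_url/filename into dest_dir unless it is already there.

    Returns a (path, downloaded) tuple; `downloaded` is False when the
    file was already present and no request was made.
    """
    # Create the folder if needed; no error if it already exists
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, filename)

    # Skip the download when the file is already on disk
    if os.path.isfile(target):
        return target, False

    # Join the base weblink and the file name with exactly one slash
    source = base_url.rstrip('/') + '/' + filename
    urllib.request.urlretrieve(source, target)
    return target, True
```

With the variables defined earlier, a call would look like `fetch_if_missing(url, data_file, path_data)`. Returning a flag instead of printing makes it easy to tell a fresh download from a file that was already present.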


The required data files for this project are now downloaded into the `datasets` folder. How about a quick sneak peek into the `datasets` folder?

In [42]:
def show_files_in_datasets(path):
    """
        Reads the path and prints out file names if present 
    """
    print(" datasets/")
    for ifile in os.listdir(path):
        print("       -",ifile)

In [43]:
show_files_in_datasets(path_data)

 datasets/
       - crx.data
       - crx.names
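
A listing like the one above confirms that the files exist, but not that they are non-empty. As a minimal sketch, the file sizes can be collected alongside the names with `pathlib`; the helper name `list_files_with_sizes` is hypothetical and only meant to illustrate the idea.

```python
from pathlib import Path


def list_files_with_sizes(folder):
    """Return (name, size_in_bytes) pairs for the regular files in folder, sorted by name."""
    return sorted(
        (entry.name, entry.stat().st_size)
        for entry in Path(folder).iterdir()
        if entry.is_file()  # skip sub-folders
    )
```

Calling `list_files_with_sizes(path_data)` would return one pair per downloaded file; a size of 0 would hint at an interrupted download.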


***

### Assess data:
Here, we load the credit card data and assess it for `Quality` and `Tidiness`.

In [47]:
# crx.data is comma-separated, so the separator is passed explicitly
ccdata = pd.read_table(path_data + '/' + data_file, sep=',')
ccdata.head(5)

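|   | b | 30.83 | 0 | u | g | w | v | 1.25 | t | t.1 | 01 | f | g.1 | 00202 | 0.1 | + |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 58.67 | 4.46 | u | g | q | h | 3.04 | t | t | 6 | f | g | 43 | 560 | + |
| 1 | a | 24.5 | 0.5 | u | g | q | h | 1.5 | t | f | 0 | f | g | 280 | 824 | + |
| 2 | b | 27.83 | 1.54 | u | g | w | v | 3.75 | t | t | 5 | t | g | 100 | 3 | + |
| 3 | b | 20.17 | 5.625 | u | g | w | v | 1.71 | t | f | 0 | f | s | 120 | 0 | + |
| 4 | b | 32.08 | 4.0 | u | g | m | v | 2.5 | t | f | 0 | t | g | 360 | 0 | + |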
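
The column labels in the output above (`b`, `30.83`, `0`, ...) look like a data record rather than names, which suggests the file has no header row and its first record was consumed as the header. One way to avoid this, sketched below, is to pass `header=None` together with explicit column names. The labels `A1` to `A16` and the use of `?` as the missing-value marker are assumptions here and should be checked against `crx.names`; the function name `read_credit_data` is hypothetical. Two records taken from the output above stand in for the real file so that the sketch runs on its own.

```python
import io

import pandas as pd

# Assumed placeholder labels A1 ... A16; verify against crx.names
COLUMN_NAMES = ["A{}".format(i) for i in range(1, 17)]


def read_credit_data(source):
    """Read crx.data-style rows without treating the first record as a header."""
    return pd.read_csv(
        source,
        header=None,         # the file has no header row
        names=COLUMN_NAMES,  # placeholder labels (assumption, see above)
        na_values="?",       # assumption: '?' marks a missing value
    )


# First two records, reconstructed from the output above
sample = io.StringIO(
    "b,30.83,0,u,g,w,v,1.25,t,t,01,f,g,00202,0,+\n"
    "a,58.67,4.46,u,g,q,h,3.04,t,t,6,f,g,43,560,+\n"
)
ccdata_fixed = read_credit_data(sample)
print(ccdata_fixed.shape)  # (2, 16)
```

For the real data, the same function could be called as `read_credit_data(path_data + '/' + data_file)`.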
