# Deliverables:

- Submit a single zip-compressed file named YourLastName_Exercise_1 that contains the following files:

 1. Your **PDF document** containing your source code and output
 2. Your **ipynb script** containing your source code and output


# Objectives:

In this exercise, you will:

 - Perform data analysis tasks on data read from a CSV file and loaded into a DataFrame object
 - Use sqlalchemy to load data stored in a DataFrame object into a SQLite database engine
 - Use sqlalchemy to connect to a SQLite database engine and execute SQL queries
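
The three objectives chain into a single round trip: CSV to DataFrame, DataFrame to SQLite, SQLite back to a DataFrame through a query. The following is a minimal sketch only, assuming a tiny made-up CSV string in place of the real file and an in-memory SQLite database; the table name `demo_cust` and all values are hypothetical.

```python
import io

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical stand-in for a CSV file on disk.
csv_text = (
    "ACCTNO,ZIP,YTD_SALES_2009\n"
    "A1,60056,1200.0\n"
    "B2,60091,300.0\n"
    "C3,60056,50.0\n"
)
cust = pd.read_csv(io.StringIO(csv_text))

# In-memory SQLite engine; nothing is written to disk.
engine = create_engine("sqlite://")

# Load the DataFrame into a table, then query it back with SQL.
cust.to_sql("demo_cust", engine, index=False)
big_spenders = pd.read_sql_query(
    "SELECT ACCTNO FROM demo_cust WHERE YTD_SALES_2009 > 1000", engine
)
print(big_spenders["ACCTNO"].tolist())
```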




# Submission Formats:

Create a folder (directory) whose name begins with your last name and place all supplementary files in it. Compress that folder with zip and post the zip archive under the assignment link in Canvas. The archive, uploaded as a single zip-compressed file, should include the files listed below. (Use zip, not StuffIt, 7z, or any other compression method.)


1. A complete IPYNB script containing the Python source code used to access and analyze the data. The code should be submitted as an IPYNB script that can be loaded and run in Jupyter Notebook for Python.
2. Output from the program, such as console listing/logs, text files, and graphics output for visualizations. If you use the Data Science Computing Cluster or School of Professional Studies database servers or systems, include Linux logs of your sessions as plain text files. Linux logs may be generated by using the script process at the beginning of your session, as demonstrated in tutorial handouts for the DSCC servers.
3. A list of the file names, with descriptions, of the files in the zip-compressed folder/directory.


# Formatting Python Code:

When programming in Python, refer to Kenneth Reitz's presentation of PEP 8, the Style Guide for Python Code:
http://pep8.org/
There is also the Google style guide for Python at
https://google.github.io/styleguide/pyguide.html
Comment often and in detail.


In [129]:
import os
import pickle

import numpy as np
import pandas as pd  # pandas is conventionally aliased as pd
from pandas import DataFrame, Series  # imported for convenience


In [130]:
xyzcust10=pd.read_csv('xyzcust10.csv')

In [131]:
(xyzcust10).dtypes

ACCTNO                    object
ZIP                        int64
ZIP4                       int64
LTD_SALES                float64
LTD_TRANSACTIONS           int64
YTD_SALES_2009           float64
YTD_TRANSACTIONS_2009      int64
CHANNEL_ACQUISITION       object
BUYER_STATUS              object
ZIP9_Supercode             int64
ZIP9_SUPERCODE             int64
dtype: object

In [132]:
type(xyzcust10)

pandas.core.frame.DataFrame

In [133]:
pickle.dump(xyzcust10,open('xyzcust10.p','wb'))

In [134]:
xyzcust10 = pickle.load(open('xyzcust10.p', 'rb'))

xyzcust10red = xyzcust10.copy()  # copy() makes a "deep" copy by default

xyzcust10rev1 = xyzcust10.copy()  # copy() makes a "deep" copy by default
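
Wrapping the pickle file handles in `with` blocks closes them deterministically, and pandas also offers `to_pickle` / `read_pickle` for the same round trip. A minimal sketch, assuming a small made-up DataFrame and a temporary directory rather than the real `xyzcust10.p` file:

```python
import os
import pickle
import tempfile

import pandas as pd

# Made-up rows standing in for the customer data.
frame = pd.DataFrame({"ACCTNO": ["A1", "B2"], "ZIP": [60056, 60091]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.p")

    # The with-blocks close the file even if dump or load raises.
    with open(path, "wb") as fh:
        pickle.dump(frame, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

    # pandas provides an equivalent pair of helpers.
    frame.to_pickle(path)
    restored_pd = pd.read_pickle(path)

print(restored.equals(frame), restored_pd.equals(frame))
```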

In [135]:
type(xyzcust10)

pandas.core.frame.DataFrame

In [136]:
xyzcust10.head()

Unnamed: 0,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_Supercode,ZIP9_SUPERCODE
0,WDQQLLDQL,60084,5016,90.0,1,0.0,0,IB,INACTIVE,600845016,600845016
1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750,600911750
2,GSHAPLHAW,60067,900,420.0,3,129.0,1,RT,ACTIVE,600670900,600670900
3,PGGYDYWAD,60068,3838,6552.0,6,0.0,0,RT,INACTIVE,600683838,600683838
4,LWPSGPLLS,60090,3932,189.0,3,72.0,1,RT,ACTIVE,600903932,600903932


In [137]:
xyzcust10.columns

Index(['ACCTNO', 'ZIP', 'ZIP4', 'LTD_SALES', 'LTD_TRANSACTIONS',
       'YTD_SALES_2009', 'YTD_TRANSACTIONS_2009', 'CHANNEL_ACQUISITION',
       'BUYER_STATUS', 'ZIP9_Supercode', 'ZIP9_SUPERCODE'],
      dtype='object')

In [138]:
xyzcust10.dtypes

ACCTNO                    object
ZIP                        int64
ZIP4                       int64
LTD_SALES                float64
LTD_TRANSACTIONS           int64
YTD_SALES_2009           float64
YTD_TRANSACTIONS_2009      int64
CHANNEL_ACQUISITION       object
BUYER_STATUS              object
ZIP9_Supercode             int64
ZIP9_SUPERCODE             int64
dtype: object

In [139]:
(xyzcust10.ZIP9_Supercode!=xyzcust10.ZIP9_SUPERCODE).sum()

0

In [140]:
xyzcust10['ZIP9_Supercode']

0        600845016
1        600911750
2        600670900
3        600683838
4        600903932
           ...    
30466    600983951
30467    600989681
30468    600983858
30469    600987927
30470    600984160
Name: ZIP9_Supercode, Length: 30471, dtype: int64

In [141]:
del xyzcust10['ZIP9_Supercode']
del xyzcust10rev1['ZIP9_Supercode']

In [142]:
xyzcust10red.drop('ZIP9_Supercode',axis=1,inplace=True)
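
The comparison and the column removal above can also be written as one guarded step. This is a sketch on made-up rows, not the assignment data: `Series.equals` checks the two columns in a single call, and `drop(columns=...)` without `inplace=True` returns a new DataFrame so the original stays available.

```python
import pandas as pd

# Made-up rows that mimic the two redundant ZIP9 columns.
frame = pd.DataFrame(
    {
        "ACCTNO": ["A1", "B2", "C3"],
        "ZIP9_Supercode": [600845016, 600911750, 600670900],
        "ZIP9_SUPERCODE": [600845016, 600911750, 600670900],
    }
)

# Series.equals compares values and dtype, and treats NaNs in the
# same positions as equal (an elementwise != would count them as different).
is_redundant = frame["ZIP9_Supercode"].equals(frame["ZIP9_SUPERCODE"])

# drop returns a new DataFrame; the original is left untouched.
if is_redundant:
    trimmed = frame.drop(columns="ZIP9_Supercode")
else:
    trimmed = frame

print(list(trimmed.columns))
```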

In [143]:
os.getcwd()

'/Users/zachtsouprakos/Documents/MSDS/MSDS-420/Module-5/Exercise_3'

In [144]:
import sqlalchemy

In [145]:
from sqlalchemy import create_engine

In [146]:
engine=create_engine('sqlite:///xyz.db')

In [147]:
xyztrans=pd.read_sql('xyztrans', engine)

In [148]:
xyztrans.dtypes

index             int64
ACCTNO           object
QTY               int64
TRANDATE         object
TRAN_CHANNEL     object
PRICE           float64
TOTAMT          float64
ORDERNO          object
DEPTDESCR        object
dtype: object

In [149]:
xyztrans.columns

Index(['index', 'ACCTNO', 'QTY', 'TRANDATE', 'TRAN_CHANNEL', 'PRICE', 'TOTAMT',
       'ORDERNO', 'DEPTDESCR'],
      dtype='object')
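
TRANDATE is read back as `object` (plain text). If date arithmetic is needed, the strings can be parsed explicitly. A minimal sketch, assuming the values follow the day, abbreviated English month, four-digit year pattern seen in this data (for example `09JUN2009`); the column name `TRANDATE_parsed` is hypothetical.

```python
import pandas as pd

# Sample strings in the same layout as the TRANDATE column.
trans = pd.DataFrame({"TRANDATE": ["09JUN2009", "28NOV2009", "14NOV2009"]})

# %d = day of month, %b = abbreviated month name, %Y = four-digit year.
trans["TRANDATE_parsed"] = pd.to_datetime(trans["TRANDATE"], format="%d%b%Y")

print(trans["TRANDATE_parsed"].dt.month.tolist())
```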

In [150]:
from sqlalchemy import schema

In [151]:
xyzMetaData=schema.MetaData(bind=engine)
xyzMetaData.reflect()

In [152]:
xyzMetaData.tables

immutabledict({'xyztrans': Table('xyztrans', MetaData(bind=Engine(sqlite:///xyz.db)), Column('index', BIGINT(), table=<xyztrans>), Column('ACCTNO', TEXT(), table=<xyztrans>), Column('QTY', BIGINT(), table=<xyztrans>), Column('TRANDATE', TEXT(), table=<xyztrans>), Column('TRAN_CHANNEL', TEXT(), table=<xyztrans>), Column('PRICE', FLOAT(), table=<xyztrans>), Column('TOTAMT', FLOAT(), table=<xyztrans>), Column('ORDERNO', TEXT(), table=<xyztrans>), Column('DEPTDESCR', TEXT(), table=<xyztrans>), schema=None)})

In [153]:
xyzMetaData.tables.keys()

dict_keys(['xyztrans'])

In [154]:
xyzcust10rev1.duplicated().sum()

292

In [155]:
xyzcustUnDup=xyzcust10rev1.drop_duplicates()

xyzcustUnDup.duplicated().sum()

0

In [156]:
xyzcust10rev1.duplicated('ACCTNO').sum()

292

In [157]:
xyzcust10rev1.ACCTNO.duplicated().sum()

292
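
Both counts above are 292. Since every full-row duplicate also repeats its ACCTNO, equal counts mean each repeated ACCTNO is an exact copy of an earlier row. The two checks can differ in general, as this sketch on made-up rows illustrates:

```python
import pandas as pd

# Made-up rows: A1 is repeated exactly, B2 is repeated with a different value.
frame = pd.DataFrame(
    {
        "ACCTNO": ["A1", "A1", "B2", "B2", "C3"],
        "LTD_SALES": [90.0, 90.0, 420.0, 999.0, 189.0],
    }
)

# Rows identical to an earlier row in every column.
full_row_dups = int(frame.duplicated().sum())

# Rows whose ACCTNO already appeared, whatever the other columns hold.
key_dups = int(frame.duplicated("ACCTNO").sum())

# Repeated keys with conflicting data survive a plain drop_duplicates().
conflicting = key_dups - full_row_dups
print(full_row_dups, key_dups, conflicting)
```

When the difference is nonzero, `drop_duplicates()` alone would leave more than one row per account, and a rule for choosing between the conflicting rows would be needed.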

In [158]:
xyzcustUnDup.to_sql('xyzcust', engine)

In [159]:
pd.read_sql_table('xyzcust', engine).columns

Index(['index', 'ACCTNO', 'ZIP', 'ZIP4', 'LTD_SALES', 'LTD_TRANSACTIONS',
       'YTD_SALES_2009', 'YTD_TRANSACTIONS_2009', 'CHANNEL_ACQUISITION',
       'BUYER_STATUS', 'ZIP9_SUPERCODE'],
      dtype='object')
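
The extra `index` column in the table comes from `to_sql` writing the DataFrame index by default, and rerunning a plain `to_sql` against an existing table raises an error. A sketch of the relevant options follows, assuming made-up rows and a standard-library `sqlite3` in-memory connection swapped in for the SQLAlchemy engine to keep the example self-contained; the table names are hypothetical.

```python
import sqlite3

import pandas as pd

frame = pd.DataFrame({"ACCTNO": ["A1", "B2"], "ZIP": [60056, 60091]})  # made-up rows
con = sqlite3.connect(":memory:")

# Default behaviour: the DataFrame index is written as an "index" column.
frame.to_sql("with_index", con)
cols_with_index = pd.read_sql_query("SELECT * FROM with_index", con).columns.tolist()

# index=False omits it; if_exists="replace" makes the call safe to rerun.
frame.to_sql("no_index", con, index=False, if_exists="replace")
frame.to_sql("no_index", con, index=False, if_exists="replace")
cols_no_index = pd.read_sql_query("SELECT * FROM no_index", con).columns.tolist()

# With the default if_exists="fail", a second write raises ValueError.
try:
    frame.to_sql("with_index", con)
    rerun_failed = False
except ValueError:
    rerun_failed = True

con.close()
print(cols_with_index, cols_no_index, rerun_failed)
```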

In [160]:
xyzMetaData.tables.keys()

dict_keys(['xyztrans'])

In [161]:
xyzMetaData

MetaData(bind=Engine(sqlite:///xyz.db))

In [162]:
from sqlalchemy import inspect

In [163]:
insp=inspect(engine)

In [164]:
insp.get_table_names()

['xyzcust', 'xyztrans']
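
The reflected metadata still lists only `xyztrans` after `xyzcust` was written, while the inspector sees both tables. Reflected metadata is a snapshot taken when `reflect()` runs; a fresh inspector queries the database directly. A minimal sketch, assuming an in-memory SQLite database and hypothetical table names `first` and `second`:

```python
import pandas as pd
from sqlalchemy import MetaData, create_engine, inspect

engine = create_engine("sqlite://")  # in-memory database
pd.DataFrame({"a": [1]}).to_sql("first", engine, index=False)

# Snapshot of the schema at this moment.
metadata = MetaData()
metadata.reflect(bind=engine)
before = sorted(metadata.tables)

# A table created afterwards is not added to the existing snapshot.
pd.DataFrame({"b": [2]}).to_sql("second", engine, index=False)
stale = sorted(metadata.tables)

# A new inspector reads the current table list from the database.
live = sorted(inspect(engine).get_table_names())

# Reflecting again picks up the new table.
metadata.reflect(bind=engine)
refreshed = sorted(metadata.tables)

print(before, stale, live, refreshed)
```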

In [165]:
rttrans=pd.read_sql_query("SELECT * FROM xyztrans WHERE TRAN_CHANNEL='RT'", engine)

In [166]:
rttrans

Unnamed: 0,index,ACCTNO,QTY,TRANDATE,TRAN_CHANNEL,PRICE,TOTAMT,ORDERNO,DEPTDESCR
0,0,WGDQLA,1,09JUN2009,RT,599.85,599.85,CCXXNNXXXXUX,Home Audio
1,1,WGDQLA,1,09JUN2009,RT,39.00,39.00,CCXXNNXXXXUX,Small Appliances
2,2,WGDQLA,1,28NOV2009,RT,15.00,15.00,CCXNXXKXXXRI,Small Appliances
3,3,WGDQLA,1,28NOV2009,RT,69.00,69.00,CCXNXXKXXXRI,Small Appliances
4,4,WGDQLA,1,28NOV2009,RT,84.00,84.00,CCXNXXKXXXRI,Small Appliances
...,...,...,...,...,...,...,...,...,...
53806,62376,GYLYSQQSG,1,14NOV2009,RT,45.00,45.00,CCXCXIKXXXNI,Mobile Electronic Accessories
53807,62377,GYLYSQQSG,1,14NOV2009,RT,15.00,15.00,CCXCXIKXXXNI,Mobile Electronics
53808,62378,GYLYSQQSG,1,29NOV2009,RT,42.00,42.00,CCXCRZEXXXNI,Mobile Electronic Accessories
53809,62379,GYLYSQQSG,1,29NOV2009,RT,74.85,74.85,CCXCRZIXXXNI,Small Appliances
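
When the filter value comes from a variable rather than a literal, passing it through `params` avoids building SQL by string concatenation. This sketch assumes made-up rows, a hypothetical table `demo_trans`, and a standard-library `sqlite3` connection (which uses `?` placeholders) in place of the SQLAlchemy engine; the placeholder style can differ with other drivers.

```python
import sqlite3

import pandas as pd

# Made-up transactions in the shape of xyztrans.
trans = pd.DataFrame(
    {
        "ACCTNO": ["A1", "A1", "B2"],
        "TRAN_CHANNEL": ["RT", "IB", "RT"],
        "TOTAMT": [39.0, 15.0, 84.0],
    }
)
con = sqlite3.connect(":memory:")
trans.to_sql("demo_trans", con, index=False)

channel = "RT"  # value supplied at run time

# The driver substitutes the value for "?" safely.
rt_only = pd.read_sql_query(
    "SELECT * FROM demo_trans WHERE TRAN_CHANNEL = ?", con, params=(channel,)
)
con.close()
print(len(rt_only))
```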


In [167]:
custtrans=pd.read_sql_query("SELECT * FROM xyzcust", engine)

In [168]:
custtrans.head()

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
0,0,WDQQLLDQL,60084,5016,90.0,1,0.0,0,IB,INACTIVE,600845016
1,1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750
2,2,GSHAPLHAW,60067,900,420.0,3,129.0,1,RT,ACTIVE,600670900
3,3,PGGYDYWAD,60068,3838,6552.0,6,0.0,0,RT,INACTIVE,600683838
4,4,LWPSGPLLS,60090,3932,189.0,3,72.0,1,RT,ACTIVE,600903932


In [169]:
allrttrans=pd.read_sql_query("SELECT * FROM xyztrans", engine)

In [170]:
allrttrans.head()

Unnamed: 0,index,ACCTNO,QTY,TRANDATE,TRAN_CHANNEL,PRICE,TOTAMT,ORDERNO,DEPTDESCR
0,0,WGDQLA,1,09JUN2009,RT,599.85,599.85,CCXXNNXXXXUX,Home Audio
1,1,WGDQLA,1,09JUN2009,RT,39.0,39.0,CCXXNNXXXXUX,Small Appliances
2,2,WGDQLA,1,28NOV2009,RT,15.0,15.0,CCXNXXKXXXRI,Small Appliances
3,3,WGDQLA,1,28NOV2009,RT,69.0,69.0,CCXNXXKXXXRI,Small Appliances
4,4,WGDQLA,1,28NOV2009,RT,84.0,84.0,CCXNXXKXXXRI,Small Appliances


# Requirements:
1. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000
2. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000 and CHANNEL_ACQUISITION = 'RT'
3. What is the total number of records in the xyzcust table where YTD_SALES_2009 > 1000, CHANNEL_ACQUISITION = 'RT', and ZIP = 60056?


In [179]:
# Write your python code that meets the above requirements in this cell
# 1. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000
# Python Version
custtrans[custtrans['YTD_SALES_2009'] > 1000]

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
1,1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750
12,12,WLDAYHQLW,60091,2813,3240.0,7,2064.0,3,RT,ACTIVE,600912813
24,24,ASDHAYAW,60062,6077,3411.0,19,1875.0,5,RT,ACTIVE,600626077
31,31,HDWAWLH,60069,3402,25476.0,93,1623.0,4,RT,ACTIVE,600693402
40,40,GSHLHGHWW,60070,2352,3576.0,10,1398.0,3,IB,ACTIVE,600702352
...,...,...,...,...,...,...,...,...,...,...,...
30066,30358,LWWAWAPQD,60098,8091,21030.0,20,5322.0,5,RT,ACTIVE,600988091
30087,30379,AYQWWQLHY,60098,7943,4092.0,9,2625.0,3,RT,ACTIVE,600987943
30114,30406,WWQYYPSA,60098,3133,2100.0,3,1800.0,2,IB,ACTIVE,600983133
30116,30408,WLLWDLLYD,60098,7807,1827.0,2,1827.0,2,RT,ACTIVE,600987807


In [180]:
# Write your python code that meets the above requirements in this cell
# 1. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000
# SQL Version
pd.read_sql_query('''SELECT * 
                    FROM xyzcust 
                    WHERE YTD_SALES_2009 > 1000''', engine)

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
0,1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750
1,12,WLDAYHQLW,60091,2813,3240.0,7,2064.0,3,RT,ACTIVE,600912813
2,24,ASDHAYAW,60062,6077,3411.0,19,1875.0,5,RT,ACTIVE,600626077
3,31,HDWAWLH,60069,3402,25476.0,93,1623.0,4,RT,ACTIVE,600693402
4,40,GSHLHGHWW,60070,2352,3576.0,10,1398.0,3,IB,ACTIVE,600702352
...,...,...,...,...,...,...,...,...,...,...,...
1628,30358,LWWAWAPQD,60098,8091,21030.0,20,5322.0,5,RT,ACTIVE,600988091
1629,30379,AYQWWQLHY,60098,7943,4092.0,9,2625.0,3,RT,ACTIVE,600987943
1630,30406,WWQYYPSA,60098,3133,2100.0,3,1800.0,2,IB,ACTIVE,600983133
1631,30408,WLLWDLLYD,60098,7807,1827.0,2,1827.0,2,RT,ACTIVE,600987807
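
The pandas and SQL results above contain the same records but carry different row labels: the boolean mask keeps the original labels (1, 12, 24, ...), while the SQL result is relabelled from 0. To confirm the two approaches agree, the labels can be reset before comparing. A sketch on made-up rows, assuming a `sqlite3` in-memory connection and a hypothetical table `demo_cust`, and relying on SQLite returning rows in insertion order for this simple query:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "ACCTNO": ["A1", "B2", "C3", "D4"],
        "YTD_SALES_2009": [1263.0, 0.0, 2064.0, 129.0],
    }
).to_sql("demo_cust", con, index=False)

# Same filter expressed in pandas and in SQL.
cust = pd.read_sql_query("SELECT * FROM demo_cust", con)
via_pandas = cust[cust["YTD_SALES_2009"] > 1000]
via_sql = pd.read_sql_query(
    "SELECT * FROM demo_cust WHERE YTD_SALES_2009 > 1000", con
)
con.close()

# Row labels differ, so a direct comparison reports a mismatch.
same_as_is = via_pandas.equals(via_sql)

# After resetting the labels, the contents can be compared directly.
same_after_reset = via_pandas.reset_index(drop=True).equals(via_sql)
print(same_as_is, same_after_reset)
```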


In [181]:
# 2. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000 and CHANNEL_ACQUISITION = 'RT'
# Python Version
custtrans[(custtrans['YTD_SALES_2009'] > 1000) & (custtrans['CHANNEL_ACQUISITION'] == 'RT')]

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
1,1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750
12,12,WLDAYHQLW,60091,2813,3240.0,7,2064.0,3,RT,ACTIVE,600912813
24,24,ASDHAYAW,60062,6077,3411.0,19,1875.0,5,RT,ACTIVE,600626077
31,31,HDWAWLH,60069,3402,25476.0,93,1623.0,4,RT,ACTIVE,600693402
76,77,LGDGQPGDH,60061,4540,2364.0,17,1359.0,7,RT,ACTIVE,600614540
...,...,...,...,...,...,...,...,...,...,...,...
30044,30336,PLHHGGQYH,60098,8075,6681.0,16,2985.0,7,RT,ACTIVE,600988075
30066,30358,LWWAWAPQD,60098,8091,21030.0,20,5322.0,5,RT,ACTIVE,600988091
30087,30379,AYQWWQLHY,60098,7943,4092.0,9,2625.0,3,RT,ACTIVE,600987943
30116,30408,WLLWDLLYD,60098,7807,1827.0,2,1827.0,2,RT,ACTIVE,600987807
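
In the boolean-mask form, each condition needs its own parentheses because `&` binds more tightly than `>` and `==`. `DataFrame.query` is an alternative that reads closer to the SQL WHERE clause. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows in the shape of the customer table.
cust = pd.DataFrame(
    {
        "ACCTNO": ["A1", "B2", "C3", "D4"],
        "YTD_SALES_2009": [1263.0, 1398.0, 129.0, 2064.0],
        "CHANNEL_ACQUISITION": ["RT", "IB", "RT", "RT"],
    }
)

# Boolean-mask form: parentheses around each condition are required.
masked = cust[(cust["YTD_SALES_2009"] > 1000) & (cust["CHANNEL_ACQUISITION"] == "RT")]

# query form: column names are used directly inside the expression string.
queried = cust.query("YTD_SALES_2009 > 1000 and CHANNEL_ACQUISITION == 'RT'")

print(masked["ACCTNO"].tolist())
```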


In [182]:
# 2. Get a list of all records in the xyzcust table where YTD_SALES_2009 > 1000 and CHANNEL_ACQUISITION = 'RT'
# SQL Version
pd.read_sql_query('''SELECT * 
                    FROM xyzcust 
                    WHERE YTD_SALES_2009 > 1000 
                    AND  CHANNEL_ACQUISITION = 'RT' ''', engine)

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
0,1,WQWAYHYLA,60091,1750,4227.0,9,1263.0,3,RT,ACTIVE,600911750
1,12,WLDAYHQLW,60091,2813,3240.0,7,2064.0,3,RT,ACTIVE,600912813
2,24,ASDHAYAW,60062,6077,3411.0,19,1875.0,5,RT,ACTIVE,600626077
3,31,HDWAWLH,60069,3402,25476.0,93,1623.0,4,RT,ACTIVE,600693402
4,77,LGDGQPGDH,60061,4540,2364.0,17,1359.0,7,RT,ACTIVE,600614540
...,...,...,...,...,...,...,...,...,...,...,...
1202,30336,PLHHGGQYH,60098,8075,6681.0,16,2985.0,7,RT,ACTIVE,600988075
1203,30358,LWWAWAPQD,60098,8091,21030.0,20,5322.0,5,RT,ACTIVE,600988091
1204,30379,AYQWWQLHY,60098,7943,4092.0,9,2625.0,3,RT,ACTIVE,600987943
1205,30408,WLLWDLLYD,60098,7807,1827.0,2,1827.0,2,RT,ACTIVE,600987807


In [183]:
# 3. What is the total number of records in the xyzcust table where YTD_SALES_2009 > 1000,
#    CHANNEL_ACQUISITION = 'RT', and ZIP = 60056?
# Python Version
custtrans[(custtrans['YTD_SALES_2009'] > 1000) & (custtrans['CHANNEL_ACQUISITION'] == 'RT') & (custtrans['ZIP'] == 60056)]

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
1002,1012,AGDDLWSWL,60056,2137,1806.0,5,1806.0,5,RT,ACTIVE,600562137
1249,1263,GLAPQGYWQ,60056,3610,2559.0,5,1164.0,2,RT,ACTIVE,600563610
3815,3847,WAAYQSSWL,60056,3122,5895.0,38,1863.0,10,RT,ACTIVE,600563122
3880,3912,WGQHYLAWY,60056,3217,3753.0,6,1926.0,2,RT,ACTIVE,600563217
4553,4592,ADGGWWHHL,60056,3707,5958.0,5,1677.0,4,RT,ACTIVE,600563707
5539,5586,APSGALYPL,60056,4343,2619.0,6,2271.0,3,RT,ACTIVE,600564343
5847,5896,SWYDPWSWH,60056,3657,1776.0,7,1536.0,5,RT,ACTIVE,600563657
7162,7235,LQQLHSHAQ,60056,3245,1689.0,4,1347.0,1,RT,ACTIVE,600563245
9490,9588,WGSLDLWL,60056,2120,1968.0,8,1467.0,3,RT,ACTIVE,600562120
10378,10484,PSGHWQADH,60056,2509,2319.0,2,1923.0,1,RT,ACTIVE,600562509


In [184]:
# 3. What is the total number of records in the xyzcust table where YTD_SALES_2009 > 1000,
#    CHANNEL_ACQUISITION = 'RT', and ZIP = 60056?
# SQL Version
pd.read_sql_query(''' SELECT * 
                  FROM xyzcust 
                  WHERE YTD_SALES_2009 > 1000 
                  AND  CHANNEL_ACQUISITION = 'RT'
                  AND ZIP = 60056 ''', engine)

Unnamed: 0,index,ACCTNO,ZIP,ZIP4,LTD_SALES,LTD_TRANSACTIONS,YTD_SALES_2009,YTD_TRANSACTIONS_2009,CHANNEL_ACQUISITION,BUYER_STATUS,ZIP9_SUPERCODE
0,1012,AGDDLWSWL,60056,2137,1806.0,5,1806.0,5,RT,ACTIVE,600562137
1,1263,GLAPQGYWQ,60056,3610,2559.0,5,1164.0,2,RT,ACTIVE,600563610
2,3847,WAAYQSSWL,60056,3122,5895.0,38,1863.0,10,RT,ACTIVE,600563122
3,3912,WGQHYLAWY,60056,3217,3753.0,6,1926.0,2,RT,ACTIVE,600563217
4,4592,ADGGWWHHL,60056,3707,5958.0,5,1677.0,4,RT,ACTIVE,600563707
5,5586,APSGALYPL,60056,4343,2619.0,6,2271.0,3,RT,ACTIVE,600564343
6,5896,SWYDPWSWH,60056,3657,1776.0,7,1536.0,5,RT,ACTIVE,600563657
7,7235,LQQLHSHAQ,60056,3245,1689.0,4,1347.0,1,RT,ACTIVE,600563245
8,9588,WGSLDLWL,60056,2120,1968.0,8,1467.0,3,RT,ACTIVE,600562120
9,10484,PSGHWQADH,60056,2509,2319.0,2,1923.0,1,RT,ACTIVE,600562509
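
The query above returns the 10 matching rows. To have the database report the total itself, `COUNT(*)` can replace `SELECT *`. A sketch on made-up rows, assuming a `sqlite3` in-memory connection, a hypothetical table `demo_cust`, and a hypothetical column alias `n_records`:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "ACCTNO": ["A1", "B2", "C3", "D4"],
        "ZIP": [60056, 60056, 60091, 60056],
        "YTD_SALES_2009": [1806.0, 1164.0, 2064.0, 129.0],
        "CHANNEL_ACQUISITION": ["RT", "RT", "RT", "IB"],
    }
).to_sql("demo_cust", con, index=False)

# COUNT(*) returns a single row holding the number of matching records.
count_df = pd.read_sql_query(
    """SELECT COUNT(*) AS n_records
       FROM demo_cust
       WHERE YTD_SALES_2009 > 1000
         AND CHANNEL_ACQUISITION = 'RT'
         AND ZIP = 60056""",
    con,
)
con.close()

n_records = int(count_df.loc[0, "n_records"])
print(n_records)
```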
