# Exercise 1
Import the `SalesData.csv` file into an appropriate data structure in pandas. Using the functions available in pandas and NumPy (where they are more appropriate) output answers to the following questions:

- What were the sales for P2 and B8 in Nov-18, Feb-19 and Mar-19?
- In the third quarter (Oct-18 to Dec-18) what were the sales figures for London (L3 & 2) and what was the monthly percentage increase?
- What were the top three months for New York (N6 and N4) and in which stores did they occur?
- What was the overall lowest sales figure, and in which store and month did it occur?

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv("data/SalesData.csv", index_col=0)

In [None]:
P2_B8 = data.loc[['P2', 'B8'], ['Nov-18', 'Feb-19', 'Mar-19']]
print(f"Q1:\n{P2_B8}")

Q1:
    Nov-18  Feb-19  Mar-19
P2     489     298     269
B8     464     459     659


In [23]:
London = data.loc[['L3','L1'], ['Oct-18', 'Nov-18', 'Dec-18']]
print(London)
percent_changes = London.transpose().pct_change()
print(percent_changes.replace(np.nan, 0))

    Oct-18  Nov-18  Dec-18
L3     363     459     539
L1     340     446     573
              L3        L1
Oct-18  0.000000  0.000000
Nov-18  0.264463  0.311765
Dec-18  0.174292  0.284753
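
As a cross-check on the figures above, `pct_change` is just `(current - previous) / previous` applied down each column. The sketch below rebuilds the London slice by hand from the printed values (it does not read `SalesData.csv`) and compares the two calculations; `manual` and `previous` are names introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# London figures copied from the printed output above (not read from the CSV)
london = pd.DataFrame(
    {"Oct-18": [363, 340], "Nov-18": [459, 446], "Dec-18": [539, 573]},
    index=["L3", "L1"],
)

monthly = london.T                 # months as rows, stores as columns
by_pandas = monthly.pct_change()   # what the cell above does

# The same calculation written out: (current - previous) / previous
previous = monthly.shift(1)
manual = (monthly - previous) / previous

# Oct-18 is NaN in both (there is no earlier month in this slice)
print(np.allclose(by_pandas.to_numpy(), manual.to_numpy(), equal_nan=True))
print((by_pandas * 100).round(2))  # expressed as percentages
```

Multiplying by 100 (or formatting with `:.2%`) is what turns the fractions into percentages; the raw `pct_change` values are fractions.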


In [43]:
nyc = data.loc[['N6', 'N4']]
# Ranks months by N6 first; N4 is only used to break ties
nyc.transpose().nlargest(3, ['N6', 'N4'])

         N6   N4
Dec-18  640  676
Jan-19  495  408
May-18  480  480
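
Passing two columns to `nlargest` sorts by the first and uses the second only as a tie-breaker, so a strong N4 month can be missed. One possible alternative is to stack the frame so every (store, month) pair is ranked together. The sketch below uses only the New York values visible in the output above, so its result is illustrative and is not the answer for the full data set; `nyc_subset` and `top3` are names made up for this example.

```python
import pandas as pd

# Only the three months shown in the output above, not the full year
nyc_subset = pd.DataFrame(
    {"May-18": [480, 480], "Dec-18": [640, 676], "Jan-19": [495, 408]},
    index=["N6", "N4"],
)

# stack() gives one value per (store, month); nlargest then ranks them all together
top3 = nyc_subset.stack().nlargest(3)
print(top3)
```

The index of `top3` carries both the store and the month, which answers the "in which stores did they occur" part directly.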


In [63]:
# idxmin on the stacked Series returns the (store, month) index pair of the smallest value (215)
min_store, min_month = data.stack().idxmin()
min_value = data.loc[min_store, min_month]
print(f'Store, Month of minimum value: {min_store}, {min_month}\nMinimum value: {min_value}')

Store, Month of minimum value: N6, Feb-19
Minimum value: 215
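
Since the exercise also mentions NumPy, the same lookup can be sketched with `np.argmin` and `np.unravel_index`. This is an alternative under assumptions, not the method used above: the small frame reuses the P2 and B8 figures printed for Q1 rather than the full file, and `sample` is a made-up name.

```python
import numpy as np
import pandas as pd

# P2 and B8 figures copied from the Q1 output (a subset of the real data)
sample = pd.DataFrame(
    {"Nov-18": [489, 464], "Feb-19": [298, 459], "Mar-19": [269, 659]},
    index=["P2", "B8"],
)

values = sample.to_numpy()
# argmin works on the flattened array; unravel_index maps it back to (row, column)
row, col = np.unravel_index(np.argmin(values), values.shape)
min_store, min_month = sample.index[row], sample.columns[col]
print(f"{min_store}, {min_month}: {values[row, col]}")
```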


# Exercise 2
Using the data set from Exercise 1, above, create a new data table, including headings, that gives the percentage increase for each month, based on the store's average for the year. From this ascertain:

- Which store had the largest and smallest increase and in which months did they occur?
- Across all the stores which months showed the smallest average increase and the largest average increase?

In [68]:
percents = data.transpose().pct_change()
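
The cell above measures month-on-month change. The exercise wording ("based on the store's average for the year") can also be read as each month's difference from that store's own mean. A minimal sketch of that reading follows, assuming stores are rows and months are columns as in `data`; it uses the three London months printed earlier instead of a full year, and `store_mean` and `vs_average` are hypothetical names.

```python
import pandas as pd

# Three months of London figures from Exercise 1; a real run would use every month
sample = pd.DataFrame(
    {"Oct-18": [363, 340], "Nov-18": [459, 446], "Dec-18": [539, 573]},
    index=["L3", "L1"],
)

store_mean = sample.mean(axis=1)   # one average per store (row)

# (month - store average) / store average, aligned row by row
vs_average = sample.sub(store_mean, axis=0).div(store_mean, axis=0)
print((vs_average * 100).round(2))  # as percentages
```

With this definition each store's values sum to zero across the period, so the largest and smallest entries show which months sat furthest above or below that store's typical level.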

In [76]:
max_month, max_store = percents.stack().idxmax()
min_month, min_store = percents.stack().idxmin()

print(f'''Month on Month Percentage changes:
      
Maximum values:
    Month: {max_month}
    Store: {max_store}
    Value: {percents.loc[max_month, max_store]:.2%}

Minimum values:
    Month: {min_month}
    Store: {min_store}
    Value: {percents.loc[min_month, min_store]:.2%}''')

Month on Month Percentage changes:
      
Maximum values:
    Month: Nov-18
    Store: P2
    Value: 59.80%

Minimum values:
    Month: Feb-19
    Store: N6
    Value: -56.57%


In [88]:
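max_avg_increase = percents.mean(axis=1).idxmax()
min_avg_increase = percents.mean(axis=1).idxmin()

# The months are picked by their mean change across all stores; the value printed
# for each is the largest (or smallest) single-store change within that month,
# not the mean itself (percents.mean(axis=1).max() / .min() would give the mean).
print(f'''Mean monthly percent increases:
Maximum:
    Month: {max_avg_increase}
    Largest store change: {percents.loc[max_avg_increase].max():.2%}

Minimum:
    Month: {min_avg_increase}
    Smallest store change: {percents.loc[min_avg_increase].min():.2%}
''')

Mean monthly percent increases:
Maximum:
    Month: Nov-18
    Largest store change: 59.80%

Minimum:
    Month: Feb-19
    Smallest store change: -56.57%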



# Exercise 1 - Data Cleaning and Preparation
Examine the `Favourites.csv` file to determine how the data needs to be cleaned. This file contains duplicates and inconsistencies. Selecting appropriate structures and techniques, perform the following tasks to prepare the data:

- Anonymize the data by removing the customer names (first and last name) replacing with a unique identifier.
- A number of different symbols (including space) have been used for null values; select and apply a consistent approach to their representation.
- Remove any customer record that is missing any of the following: first name, last name or email.
- Once complete, export the data to a new file.

In [147]:
faves = pd.read_csv('data/Favourites.csv')

In [148]:
# Convert to Category data type, get unique code for each name
faves['cust_id'] = (faves['First Name'] + faves['Last Name']).astype('category').cat.codes
faves = faves.drop(['First Name', 'Last Name'], axis=1)
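
Concatenating the two name columns without a separator could, in principle, merge two different people whose first and last names happen to join into the same string. A possible variation, sketched on made-up names and not on the real file, adds a separator and uses `pd.factorize`, which numbers values in order of first appearance and gives `-1` for missing names (the same sentinel that `cat.codes` uses).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the real column names are assumed to match Favourites.csv
people = pd.DataFrame({
    "First Name": ["Ann", "Anne", "Ann", np.nan],
    "Last Name": ["Ebel", "Bel", "Ebel", "Cole"],
})

# A separator keeps the first/last boundary; NaN in either part makes the whole name NaN
full_name = people["First Name"] + "|" + people["Last Name"]

# factorize returns integer codes (and the unique values, unused here)
codes, _ = pd.factorize(full_name)
people["cust_id"] = codes
anonymised = people.drop(columns=["First Name", "Last Name"])
print(anonymised)
```

Repeated customers get the same identifier, so duplicates can still be found after the names are gone.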

In [149]:
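# Some entries include 'none' and 0 as null values
# Note: this also turns a cust_id of 0 into NaN (hence the float cust_id column
# in the output below), and it does not catch the string '0' or whitespace-only entries
faves.replace(['none', 0], np.nan, inplace=True)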
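
A broader way to normalise null markers, sketched here on a tiny in-memory CSV and not on the real file, is to read everything as text, strip whitespace, and mask a list of tokens. The token list below is an assumption: `'none'` and `0` come from the comment above, while `'-'` is a hypothetical extra symbol; whether a zero income really means "missing" is a judgement call for the actual data.

```python
import io
import pandas as pd

# Made-up rows standing in for Favourites.csv
raw = io.StringIO(
    "Email,Food,Income\n"
    "a@example.com,none,100\n"
    "b@example.com, ,0\n"
    "-,Heinz,250\n"
)

NULL_TOKENS = ["none", "-", "0", ""]  # assumed list of null markers

# Read as plain strings so '0' and ' ' are kept as-is for inspection
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

# Strip surrounding whitespace, then blank out every cell that is a null token
text = df.apply(lambda col: col.str.strip())
cleaned = text.mask(text.isin(NULL_TOKENS))
print(cleaned)
```

Applying this only to the original data columns, before any identifier column is added, avoids wiping out a legitimate `cust_id` of 0.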

In [151]:
# Remove entries with no name (cust_id = -1) or email address
faves = faves[~( # Not
    (faves['cust_id'] == -1) | # No name
    faves['Email'].isna())] # No email address
faves
# Feels a bit silly leaving email addresses in since those
# are personally identifiable indicators anyway

Unnamed: 0,Job Title,Interests,Interests.1,Education Level,City,Food,Food.1,Car,Income,Movie,Movie.1,Email,cust_id
0,Chef Manager,Music,Accounting,Junior college,Fremont,McDonald's,Sun-Pat,Audi Q7,801401.0,Animation,Animation,Gil_Abbey1862@irrepsy.com,369.0
1,Machine Operator,Sociology,Health,Technical college,Honolulu,Heinz,Magnum,Citroen Nemo,29159.0,Animation,Romance,Noah_Abbey3942@ubusive.com,745.0
3,Executive Director,Modern Literature,,High school,Boston,KFC,,Audi Q1,316511.0,Adventure,0,Judith_Adams7583@irrepsy.com,500.0
4,Accountant,Latin,Language Arts,Middle school,Fayetteville,Domino,Papadopoulos,BMW X6,442632.0,Musical,Musical,Alessia_Addis5500@cispeto.com,31.0
5,Systems Administrator,Speech,,Junior college,Houston,Betty Crocker,,BMW X3,235058.0,Drama,,Melania_Addis8320@guentu.biz,684.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,Stockbroker,Modern Literature,Economics,Gymnasium,Ontario,Baskin robbins,Baskin robbins,Opel Astra,348532.0,Musical,Drama,Mason_York2697@fuliss.net,661.0
998,Auditor,Design and technology,Grammar,Graduate school,Valetta,Taco Bell,LEonidas,Fiat 500,81510.0,Drama,Musical,Elijah_York8639@yahoo.com,301.0
999,Cook,Accounting,Science,Junior college,Berna,Lavazza,Subway,Hyundai Tucson,642577.0,Animation,Thriller,Ryan_Young5697@vetan.org,870.0
1000,Food Technologist,Spanish,Art,Middle school,Dallas,Bewley's,Baskin robbins,Hyundai i20,467918.0,Family,Horror,Paige_Young7597@bauros.biz,768.0
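
The filter above relies on the `-1` code standing in for a missing name. If the check is done while the name columns still exist, `dropna(subset=...)` expresses the same rule directly. A minimal sketch with made-up rows; `required` and `complete` are names introduced for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical customers, each missing a different required field (or none)
customers = pd.DataFrame({
    "First Name": ["Gil", np.nan, "Judith", "Paige"],
    "Last Name": ["Abbey", "Abbey", "Adams", np.nan],
    "Email": ["gil@example.com", "noah@example.com", np.nan, "paige@example.com"],
})

required = ["First Name", "Last Name", "Email"]

# Drop a row if ANY of the required fields is missing
complete = customers.dropna(subset=required)
print(complete)
```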


In [154]:
# Export to file - pickle
pd.to_pickle(faves, 'out/Favourites.pkl')

re_read = pd.read_pickle('out/Favourites.pkl')
re_read

Unnamed: 0,Job Title,Interests,Interests.1,Education Level,City,Food,Food.1,Car,Income,Movie,Movie.1,Email,cust_id
0,Chef Manager,Music,Accounting,Junior college,Fremont,McDonald's,Sun-Pat,Audi Q7,801401.0,Animation,Animation,Gil_Abbey1862@irrepsy.com,369.0
1,Machine Operator,Sociology,Health,Technical college,Honolulu,Heinz,Magnum,Citroen Nemo,29159.0,Animation,Romance,Noah_Abbey3942@ubusive.com,745.0
3,Executive Director,Modern Literature,,High school,Boston,KFC,,Audi Q1,316511.0,Adventure,0,Judith_Adams7583@irrepsy.com,500.0
4,Accountant,Latin,Language Arts,Middle school,Fayetteville,Domino,Papadopoulos,BMW X6,442632.0,Musical,Musical,Alessia_Addis5500@cispeto.com,31.0
5,Systems Administrator,Speech,,Junior college,Houston,Betty Crocker,,BMW X3,235058.0,Drama,,Melania_Addis8320@guentu.biz,684.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,Stockbroker,Modern Literature,Economics,Gymnasium,Ontario,Baskin robbins,Baskin robbins,Opel Astra,348532.0,Musical,Drama,Mason_York2697@fuliss.net,661.0
998,Auditor,Design and technology,Grammar,Graduate school,Valetta,Taco Bell,LEonidas,Fiat 500,81510.0,Drama,Musical,Elijah_York8639@yahoo.com,301.0
999,Cook,Accounting,Science,Junior college,Berna,Lavazza,Subway,Hyundai Tucson,642577.0,Animation,Thriller,Ryan_Young5697@vetan.org,870.0
1000,Food Technologist,Spanish,Art,Middle school,Dallas,Bewley's,Baskin robbins,Hyundai i20,467918.0,Family,Horror,Paige_Young7597@bauros.biz,768.0
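
Pickle keeps dtypes exactly but is only readable from Python. If the cleaned file needs to be opened elsewhere, CSV is a common alternative. The sketch below writes a small made-up frame to a temporary directory and reads it back; the file name `Favourites_clean.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the cleaned frame
cleaned = pd.DataFrame({"cust_id": [369, 745], "City": ["Fremont", "Honolulu"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "Favourites_clean.csv"
    # index=False stops the row index being written as an extra unnamed column
    cleaned.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)

print(round_trip.equals(cleaned))
```

Without `index=False`, reading the file back produces an extra `Unnamed: 0` column.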
