**Phase 1: Basic Exploration and Cleaning**

1.  **"Convert the 'Date' column to datetime objects. This will allow us to perform time-based analysis. Then tell me the earliest and latest date in the dataset."**
    * Use `pd.to_datetime()` to convert the column.
    * Use `.min()` and `.max()` on the datetime column.

2.  **"Check for unique values in the 'Country', 'Product', and 'Sales Person' columns. How many unique values are there in each column?"**
    * Use `.nunique()` to get the count.
    * Use `.unique()` to see the unique values themselves if needed.

3.  **"Analyze the 'Amount' and 'Boxes Shipped' columns. Calculate the descriptive statistics (mean, median, standard deviation, min, max, etc.). Are there any outliers?"**
    * Use `.describe()` to get the statistics.
    * Use histograms or box plots to visualize the distributions and identify outliers.

4.  **"Check for any inconsistencies in the 'Currency' column. Are all transactions in the same currency? If not, investigate and handle the differences if needed. If there are multiple currencies, inform me of the conversion rates to a single currency, for example, USD."**
    * Use `.unique()` and `.value_counts()` to understand the currencies.

5.  **"Create a new column called 'Amount per Box' by dividing the 'Amount' column by the 'Boxes Shipped' column. Then analyze the distribution of this new column."**
    * The result is the average price per box for each shipment.
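Steps 3 to 5 are not carried out in the cells that follow, so here is a minimal sketch of how they might look. It uses only the five rows visible in the `df.head()` output, with the column named `Amount($)` as it appears in the file rather than `Amount`. The helper `iqr_outliers` and its 1.5 × IQR threshold are assumptions for illustration (a common convention, not something the dataset prescribes).

```python
import pandas as pd

# Five sample rows copied from the df.head() output; the real file has more.
sample = pd.DataFrame({
    "Amount($)": [5320, 7896, 4501, 12726, 13685],
    "Boxes Shipped": [180, 94, 91, 342, 184],
})

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]

# Step 3: descriptive statistics (the "50%" row is the median).
stats = sample[["Amount($)", "Boxes Shipped"]].describe()
print(stats)

# Step 3: flag candidate outliers per column.
outliers = {col: iqr_outliers(sample[col]).tolist()
            for col in ["Amount($)", "Boxes Shipped"]}
print(outliers)

# Step 4: the head() output shows no 'Currency' column, so check before using it.
has_currency = "Currency" in sample.columns
print("Currency column present:", has_currency)

# Step 5: price per box for each shipment, then its distribution.
sample["Amount per Box"] = sample["Amount($)"] / sample["Boxes Shipped"]
print(sample["Amount per Box"].describe())
```

On a five-row sample the outlier flags mean little; the same calls applied to the full `df` would be the informative version. A row with zero boxes shipped would produce an infinite `Amount per Box`, so it may be worth checking `(df['Boxes Shipped'] == 0).any()` first.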

In [6]:
import pandas as pd

In [4]:
df = pd.read_csv('Cleaned Chocolate Sales.csv')
df.head()

Unnamed: 0,Sales Person,Country,Product,Date,Amount($),Boxes Shipped
0,Jehu Rudeforth,UK,Mint Chip Choco,04-Jan-22,5320,180
1,Van Tuxwell,India,85% Dark Bars,01-Aug-22,7896,94
2,Gigi Bohling,India,Peanut Butter Cubes,07-Jul-22,4501,91
3,Jan Morforth,Australia,Peanut Butter Cubes,27-Apr-22,12726,342
4,Jehu Rudeforth,UK,Peanut Butter Cubes,24-Feb-22,13685,184


## Date Conversion

In [10]:
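# Dates look like '04-Jan-22', so state the format explicitly instead of relying on inference
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')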
df.head()

Unnamed: 0,Sales Person,Country,Product,Date,Amount($),Boxes Shipped
0,Jehu Rudeforth,UK,Mint Chip Choco,2022-01-04,5320,180
1,Van Tuxwell,India,85% Dark Bars,2022-08-01,7896,94
2,Gigi Bohling,India,Peanut Butter Cubes,2022-07-07,4501,91
3,Jan Morforth,Australia,Peanut Butter Cubes,2022-04-27,12726,342
4,Jehu Rudeforth,UK,Peanut Butter Cubes,2022-02-24,13685,184


### Earliest Shipment Date

In [13]:
df['Date'].min()

Timestamp('2022-01-03 00:00:00')

### Latest Shipment Date

In [14]:
df['Date'].max()

Timestamp('2022-08-31 00:00:00')
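As a cross-check on the conversion, the sketch below parses the five date strings from the `df.head()` output with an explicit format and counts any values that fail to parse. The format string `%d-%b-%y` is an assumption based on those sample values, and `%b` expects English month abbreviations, so this presumes an English or C locale.

```python
import pandas as pd

# Date strings copied from the first five rows shown earlier.
dates = pd.Series(
    ["04-Jan-22", "01-Aug-22", "07-Jul-22", "27-Apr-22", "24-Feb-22"],
    name="Date",
)

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(dates, format="%d-%b-%y", errors="coerce")

n_bad = int(parsed.isna().sum())          # rows that did not match the format
earliest, latest = parsed.min(), parsed.max()

print("unparseable rows:", n_bad)
print("range:", earliest.date(), "to", latest.date())
```

These five rows span a narrower range than the full dataset, so the earliest and latest values printed here differ from the ones reported above.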

## Unique Values Check

### Number of Unique Countries and the Unique Values

In [15]:
df['Country'].nunique()

6

In [16]:
df['Country'].unique()

array(['UK', 'India', 'Australia', 'New Zealand', 'USA', 'Canada'],
      dtype=object)

### Number of Unique Products and the Unique Values

In [17]:
df['Product'].nunique()

22

In [18]:
df['Product'].unique()

array(['Mint Chip Choco', '85% Dark Bars', 'Peanut Butter Cubes',
       'Smooth Sliky Salty', '99% Dark & Pure', 'After Nines',
       '50% Dark Bites', 'Orange Choco', 'Eclairs', 'Drinking Coco',
       'Organic Choco Syrup', 'Milk Bars', 'Spicy Special Slims',
       'Fruit & Nut Bars', 'White Choc', 'Manuka Honey Choco',
       'Almond Choco', 'Raspberry Choco', 'Choco Coated Almonds',
       "Baker's Choco Chips", 'Caramel Stuffed Bars', '70% Dark Bites'],
      dtype=object)

### Number of Unique Sales Persons and the Unique Values

In [19]:
df['Sales Person'].nunique()

25

In [20]:
df['Sales Person'].unique()

array(['Jehu Rudeforth', 'Van Tuxwell', 'Gigi Bohling', 'Jan Morforth',
       'Oby Sorrel', 'Gunar Cockshoot', 'Brien Boise',
       'Rafaelita Blaksland', 'Barr Faughny', 'Mallorie Waber',
       'Karlen McCaffrey', "Marney O'Breen", 'Beverie Moffet',
       'Roddy Speechley', 'Curtice Advani', 'Husein Augar', 'Kaine Padly',
       'Dennison Crosswaite', "Wilone O'Kielt", 'Andria Kimpton',
       'Kelci Walkden', 'Camilla Castle', 'Madelene Upcott',
       'Dotty Strutley', 'Ches Bonnell'], dtype=object)
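The six cells above could also be condensed: `DataFrame.nunique()` returns the count for several columns in one call, and `value_counts()` shows how often each value occurs. The sketch below demonstrates this on the five sample rows from `df.head()`, so its counts are much smaller than the 6, 22, and 25 found in the full data.

```python
import pandas as pd

# Five sample rows copied from the df.head() output.
sample = pd.DataFrame({
    "Sales Person": ["Jehu Rudeforth", "Van Tuxwell", "Gigi Bohling",
                     "Jan Morforth", "Jehu Rudeforth"],
    "Country": ["UK", "India", "India", "Australia", "UK"],
    "Product": ["Mint Chip Choco", "85% Dark Bars", "Peanut Butter Cubes",
                "Peanut Butter Cubes", "Peanut Butter Cubes"],
})

cols = ["Country", "Product", "Sales Person"]

# One Series with the number of distinct values per column.
unique_counts = sample[cols].nunique()
print(unique_counts)

# Frequency of each product, most common first.
product_counts = sample["Product"].value_counts()
print(product_counts)
```

With the full dataset loaded, `df[cols].nunique()` would give the three counts in a single output.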