### Activity 1: Aggregating Data

#### Description

In this activity, students will learn how to aggregate data using the `groupby` method in Pandas. They will practice calculating summary statistics such as the sum, mean, and count for different groups in the dataset.

#### Title

**Aggregating Sales Data**

In [10]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert Date column to datetime
df['Date'] = #YOUR CODE

# Display the DataFrame
print("Initial DataFrame:")
display(df)


Initial DataFrame:


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Group the data by `Store` and `Product`.
2. Calculate the total sales for each group.
3. Calculate the mean sales for each group.
4. Count the number of sales records for each group.
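
As a warm-up before filling in the cell, here is a minimal sketch of the same pattern on a separate toy frame. The frame `toy` and its columns `Team`, `Item`, and `Units` are hypothetical and not part of the exercise data. The grouped object is built once and reused, and `reset_index()` turns the group keys back into ordinary columns so that a later `rename(columns=...)` has a column to rename.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Y'],
    'Item': ['pen', 'pen', 'pen', 'ink', 'ink'],
    'Units': [3, 5, 2, 4, 6],
})

# Group once, then reuse the grouped object for several aggregations
grouped = toy.groupby(['Team', 'Item'])['Units']

# reset_index() converts the group keys from index levels into columns
totals = grouped.sum().reset_index()
means = grouped.mean().reset_index()
counts = grouped.count().reset_index()

print(totals)
print(means)
print(counts)
```

Groups appear in sorted key order by default, so the row order of the result can differ from the order in which groups first occur in the data.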

In [3]:
# Group by Store and Product
#YOUR CODE

# Calculate total sales for each group
total_sales = #YOUR CODE
total_sales.rename(columns={'Sales': 'Total Sales'}, inplace=True)

# Calculate mean sales for each group
mean_sales = #YOUR CODE
mean_sales.rename(columns={'Sales': 'Mean Sales'}, inplace=True)

# Count the number of sales records for each group
sales_count = #YOUR CODE
sales_count.rename(columns={'Sales': 'Sales Count'}, inplace=True)

# Display the aggregated results
print("Total Sales:")
display(total_sales)
print("\nMean Sales:")
display(mean_sales)
print("\nSales Count:")
display(sales_count)


Total Sales:


Unnamed: 0,Store,Product,Total Sales
0,A,Apples,230
1,A,Oranges,150
2,B,Apples,200
3,B,Oranges,230
4,C,Apples,185
5,C,Oranges,80



Mean Sales:


Unnamed: 0,Store,Product,Mean Sales
0,A,Apples,115.0
1,A,Oranges,150.0
2,B,Apples,200.0
3,B,Oranges,115.0
4,C,Apples,92.5
5,C,Oranges,80.0



Sales Count:


Unnamed: 0,Store,Product,Sales Count
0,A,Apples,2
1,A,Oranges,1
2,B,Apples,1
3,B,Oranges,2
4,C,Apples,2
5,C,Oranges,1


### Activity 2: Feature Engineering

#### Description

In this activity, students will practice creating new columns or features based on existing data. They will use operations such as basic arithmetic, conditional statements, and functions to derive new features.

#### Title

**Creating New Features from Sales Data**

In [9]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = #YOUR CODE

# Convert Date column to datetime
df['Date'] = #YOUR CODE

# Display the DataFrame
print("Initial DataFrame:")
display(df)


Initial DataFrame:


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Create a new column `Sales_Difference` which is the difference between each `Sales` value and the average sales of the corresponding `Product`.
2. Create a new column `High_Sales` which indicates whether the sales are higher than 100 (True) or not (False).
3. Create a new column `Sales_Per_Day` which is the sales divided by the number of days since the start of the month.
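
The three kinds of derived column can be sketched on a separate toy frame. All names here (`toy`, `Item`, `Units`, `When`, and the new columns) are hypothetical. This sketch uses `transform('mean')` to align each group mean with its rows; computing the means separately and mapping them onto the rows is an equally valid route.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Item': ['pen', 'ink', 'pen', 'ink'],
    'Units': [3, 4, 5, 8],
    'When': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-03']),
})

# transform('mean') returns the group mean aligned to every original row
item_mean = toy.groupby('Item')['Units'].transform('mean')
toy['Diff_From_Item_Mean'] = toy['Units'] - item_mean

# A comparison on a column yields a boolean column
toy['Many_Units'] = toy['Units'] > 4

# Day of the month, used here as the day count since the month began
toy['Day_Number'] = toy['When'].dt.day
toy['Units_Per_Day'] = toy['Units'] / toy['Day_Number']

print(toy)
```

Using `.dt.day` assumes all dates fall in the same month. Subtracting the earliest date and adding 1 gives the same numbers only when the earliest date is the first of the month.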

In [6]:
# Calculate the average sales for each product
product_avg_sales = #YOUR CODE

# Create Sales_Difference column
df['Sales_Difference'] = #YOUR CODE

# Create High_Sales column
df['High_Sales'] = #YOUR CODE

# Calculate the number of days since the start of the month
# (the earliest date in this dataset is the 1st, so Date.min() marks the month start)
df['Days_Since_Start'] = (df['Date'] - df['Date'].min()).dt.days + 1

# Create Sales_Per_Day column
df['Sales_Per_Day'] = #YOUR CODE

# Display the DataFrame with new features
print("DataFrame with New Features:")
display(df)


DataFrame with New Features:


Unnamed: 0,Store,Product,Sales,Date,Sales_Difference,High_Sales,Days_Since_Start,Sales_Per_Day
0,A,Apples,100,2024-07-01,-23.0,False,1,100.0
1,A,Oranges,150,2024-07-02,35.0,True,2,75.0
2,B,Apples,200,2024-07-01,77.0,True,1,200.0
3,B,Oranges,120,2024-07-02,5.0,True,2,60.0
4,C,Apples,90,2024-07-01,-33.0,False,1,90.0
5,C,Oranges,80,2024-07-02,-35.0,False,2,40.0
6,A,Apples,130,2024-07-03,7.0,True,3,43.333333
7,B,Oranges,110,2024-07-03,-5.0,True,3,36.666667
8,C,Apples,95,2024-07-03,-28.0,False,3,31.666667


### Activity 3: Using Rolling Window

#### Description

In this activity, students will practice using the rolling window method to calculate moving averages. They will apply a rolling window to smooth out sales data and understand trends over time.

#### Title

**Calculating Moving Averages for Sales Data**

In [35]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = #YOUR CODE

# Convert Date column to datetime
df['Date'] = #YOUR CODE

# Sort DataFrame by Date
df = #YOUR CODE

# Display the DataFrame
print("Initial DataFrame:")
display(df)


Initial DataFrame:


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
2,B,Apples,200,2024-07-01
4,C,Apples,90,2024-07-01
1,A,Oranges,150,2024-07-02
3,B,Oranges,120,2024-07-02
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

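1. Calculate the 3-day moving average of sales for each product. The window covers the 3 most recent records of that product, and shorter windows at the start are averaged as-is.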
2. Add the moving average as a new column `Moving_Avg_3_Days`.
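
The effect of the window size and of `min_periods` can be seen in a small sketch on hypothetical toy data (`toy`, `Item`, `Units` are illustrative names). An integer window counts rows, not calendar days, so the data should already be in the intended order.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Item': ['pen', 'ink', 'pen', 'ink', 'pen'],
    'Units': [2, 10, 4, 20, 6],
})

# Default min_periods equals the window size, so the first
# (window - 1) rows of each group are NaN
strict = toy.groupby('Item')['Units'].transform(
    lambda s: s.rolling(2).mean()
)

# min_periods=1 averages whatever rows are available so far
lenient = toy.groupby('Item')['Units'].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)

print(pd.DataFrame({'strict': strict, 'lenient': lenient}))
```

The expected output of this activity contains no missing values, which is consistent with a window that uses `min_periods=1`.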

In [36]:
# Calculate the 3-day moving average for each product
# Hint: consider using groupby and .transform with a rolling window inside
# Hint (template only): df.groupby(<some column>)['<some column>'].transform(lambda x: x.rolling(<some values>).mean())

df['Moving_Avg_3_Days'] = #YOUR CODE

# Display the DataFrame with the moving average
print("DataFrame with 3-Day Moving Average:")

display(#YOUR CODE)


DataFrame with 3-Day Moving Average:


Unnamed: 0,Store,Product,Sales,Date,Moving_Avg_3_Days
0,A,Apples,100,2024-07-01,100.0
2,B,Apples,200,2024-07-01,150.0
4,C,Apples,90,2024-07-01,130.0
6,A,Apples,130,2024-07-03,140.0
8,C,Apples,95,2024-07-03,105.0
1,A,Oranges,150,2024-07-02,150.0
3,B,Oranges,120,2024-07-02,135.0
5,C,Oranges,80,2024-07-02,116.666667
7,B,Oranges,110,2024-07-03,103.333333


### Activity 4: Using Shift for Lag in Time Series Data

#### Description

In this activity, students will practice using the `shift` method to create lagged features in time series data. They will create a new column holding the previous sales record for each product.

#### Title

**Creating Lagged Features in Sales Data**

In [39]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = #YOUR CODE

# Convert Date column to datetime
df['Date'] = #YOUR CODE

# Sort DataFrame by Date
df = #YOUR CODE

# Display the DataFrame
display(df)


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
2,B,Apples,200,2024-07-01
4,C,Apples,90,2024-07-01
1,A,Oranges,150,2024-07-02
3,B,Oranges,120,2024-07-02
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Create a new column `Prev_Day_Sales` that contains the previous sales record (in date order) for each product.
2. Handle missing values appropriately for the new column.
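
A minimal sketch of a grouped lag on hypothetical toy data (`toy`, `Item`, `Units`, and the new column names are illustrative). Filling with 0 is only one option; leaving the gaps as `NaN` or dropping those rows may suit other analyses better.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Item': ['pen', 'ink', 'pen', 'ink', 'pen'],
    'Units': [2, 10, 4, 20, 6],
})

# shift(1) within each group moves values down one row;
# the first row of every group has no predecessor and becomes NaN
toy['Prev_Units'] = toy.groupby('Item')['Units'].shift(1)

# One possible way to handle the gaps: fill them with 0
toy['Prev_Units_Filled'] = toy['Prev_Units'].fillna(0)

print(toy)
```

The lag refers to the previous row of the same group, so two records of one product on the same date still count as separate steps.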

In [42]:
# Create Prev_Day_Sales column
# Hint: group by product, then shift

df['Prev_Day_Sales'] = #YOUR CODE

# Handle missing values by filling with 0 (or you can choose another appropriate method)
df['Prev_Day_Sales'] = #YOUR CODE

# Display the DataFrame with the new lagged feature
display(#YOUR CODE)


Unnamed: 0,Store,Product,Sales,Date,Prev_Day_Sales
0,A,Apples,100,2024-07-01,0.0
2,B,Apples,200,2024-07-01,100.0
4,C,Apples,90,2024-07-01,200.0
6,A,Apples,130,2024-07-03,90.0
8,C,Apples,95,2024-07-03,130.0
1,A,Oranges,150,2024-07-02,0.0
3,B,Oranges,120,2024-07-02,150.0
5,C,Oranges,80,2024-07-02,120.0
7,B,Oranges,110,2024-07-03,80.0


### Activity 5: Datetime Manipulation

#### Description

In this activity, students will practice manipulating datetime data. They will extract different components from a datetime column and create new columns based on these components.

#### Title

**Manipulating Datetime Data in Sales Data**

In [43]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
#YOUR CODE

# Convert Date column to datetime
#YOUR CODE

# Display the DataFrame
#YOUR CODE


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Create a new column `Year` that extracts the year from the `Date` column.
2. Create a new column `Month` that extracts the month from the `Date` column.
3. Create a new column `Day` that extracts the day from the `Date` column.
4. Create a new column `Day_of_Week` that extracts the day of the week from the `Date` column.
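
The `.dt` accessor exposes these components on a datetime column. The sketch below uses a hypothetical frame `toy` with a column `When`; the new column names are illustrative.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'When': pd.to_datetime(['2024-01-01', '2024-02-15', '2023-12-31']),
})

# Numeric components are attributes of the .dt accessor
toy['Year'] = toy['When'].dt.year
toy['Month'] = toy['When'].dt.month
toy['Day'] = toy['When'].dt.day

# day_name() is a method and returns names such as 'Monday'
toy['Weekday_Name'] = toy['When'].dt.day_name()

# dayofweek is numeric, with Monday as 0 and Sunday as 6
toy['Weekday_Number'] = toy['When'].dt.dayofweek

print(toy)
```

The accessor only works on datetime columns, so string dates need `pd.to_datetime` first.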

In [16]:
# Extract year from Date
#YOUR CODE

# Extract month from Date
#YOUR CODE

# Extract day from Date
#YOUR CODE

# Extract day of the week from Date
#YOUR CODE

# Display the DataFrame with new datetime features
display(df)


Unnamed: 0,Store,Product,Sales,Date,Year,Month,Day,Day_of_Week
0,A,Apples,100,2024-07-01,2024,7,1,Monday
1,A,Oranges,150,2024-07-02,2024,7,2,Tuesday
2,B,Apples,200,2024-07-01,2024,7,1,Monday
3,B,Oranges,120,2024-07-02,2024,7,2,Tuesday
4,C,Apples,90,2024-07-01,2024,7,1,Monday
5,C,Oranges,80,2024-07-02,2024,7,2,Tuesday
6,A,Apples,130,2024-07-03,2024,7,3,Wednesday
7,B,Oranges,110,2024-07-03,2024,7,3,Wednesday
8,C,Apples,95,2024-07-03,2024,7,3,Wednesday


### Activity 6: Stacking Data

#### Description

In this activity, students will practice stacking data using the `stack` method in Pandas. They will transform a DataFrame from a wide format to a long format.

#### Title

**Stacking Sales Data**

In [44]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Product_A_Sales': [100, 150, 200, 120, 90, 80],
    'Product_B_Sales': [130, 160, 210, 130, 95, 85]
}

# Create DataFrame
#YOUR CODE

# Display the DataFrame
#YOUR CODE


Unnamed: 0,Store,Product_A_Sales,Product_B_Sales
0,A,100,130
1,A,150,160
2,B,200,210
3,B,120,130
4,C,90,95
5,C,80,85


#### Task

1. Stack the sales data so that the products are in a single column and their corresponding sales in another column.
2. Reset the index after stacking to convert the resulting Series back to a DataFrame.
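
A minimal sketch of the stack-then-reset pattern on hypothetical toy data (`toy`, `Team`, `Q1`, `Q2`, and the renamed columns are illustrative). The identifier column is moved into the index first so that only the value columns get stacked.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Team': ['X', 'Y'],
    'Q1': [1, 3],
    'Q2': [2, 4],
})

# With Team in the index, stack() moves the remaining column labels
# into a new innermost index level and returns a Series
stacked = toy.set_index('Team').stack()

# reset_index() turns both index levels back into columns
long_df = stacked.reset_index()
long_df.columns = ['Team', 'Quarter', 'Value']

print(long_df)
```

Stacking walks through the frame row by row, so the values of one original row stay next to each other in the result.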

In [18]:
# Stack the data
# Hint: watch the index, and try to understand how stack() works
stacked_df = #YOUR CODE

# Rename the columns for better understanding
stacked_df.columns = #YOUR CODE

# Display the stacked DataFrame
display(stacked_df)


Unnamed: 0,Store,Product,Sales
0,A,Product_A_Sales,100
1,A,Product_B_Sales,130
2,A,Product_A_Sales,150
3,A,Product_B_Sales,160
4,B,Product_A_Sales,200
5,B,Product_B_Sales,210
6,B,Product_A_Sales,120
7,B,Product_B_Sales,130
8,C,Product_A_Sales,90
9,C,Product_B_Sales,95
10,C,Product_A_Sales,80
11,C,Product_B_Sales,85


### Activity 7: Melting Data

#### Description

In this activity, students will practice melting data using the `melt` method in Pandas. They will transform a DataFrame from a wide format to a long format, focusing on converting multiple columns into a single column.

#### Title

**Melting Sales Data**

In [45]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Product_A_Sales': [100, 150, 200, 120, 90, 80],
    'Product_B_Sales': [130, 160, 210, 130, 95, 85]
}

# Create DataFrame
#YOUR CODE

# Display the DataFrame
#YOUR CODE


Unnamed: 0,Store,Product_A_Sales,Product_B_Sales
0,A,100,130
1,A,150,160
2,B,200,210
3,B,120,130
4,C,90,95
5,C,80,85


#### Task

1. Melt the data so that `Product_A_Sales` and `Product_B_Sales` columns are converted into a single `Product` column with their respective sales in another column `Sales`.
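
The roles of the `pd.melt` arguments can be sketched on hypothetical toy data (`toy`, `Team`, `Q1`, `Q2`, `Quarter`, and `Value` are illustrative names).

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Team': ['X', 'Y'],
    'Q1': [1, 3],
    'Q2': [2, 4],
})

long_df = pd.melt(
    toy,
    id_vars='Team',           # column(s) repeated on every output row
    value_vars=['Q1', 'Q2'],  # columns to unpivot
    var_name='Quarter',       # name for the column holding old column labels
    value_name='Value',       # name for the column holding the values
)

print(long_df)
```

Melting works column by column: all rows for the first melted column come first, followed by all rows for the next one.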

In [20]:
# Melt the data
melted_df = pd.melt(#YOUR CODE)

# Display the melted DataFrame
display(melted_df)

Unnamed: 0,Store,Product,Sales
0,A,Product_A_Sales,100
1,A,Product_A_Sales,150
2,B,Product_A_Sales,200
3,B,Product_A_Sales,120
4,C,Product_A_Sales,90
5,C,Product_A_Sales,80
6,A,Product_B_Sales,130
7,A,Product_B_Sales,160
8,B,Product_B_Sales,210
9,B,Product_B_Sales,130
10,C,Product_B_Sales,95
11,C,Product_B_Sales,85


### Activity 8: Pivoting Data

#### Description

In this activity, students will practice pivoting data using the `pivot` method in Pandas. They will transform a DataFrame from a long format to a wide format, focusing on creating a matrix-like structure.

#### Title

**Pivoting Sales Data**

In [21]:
import pandas as pd

# Sample melted sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Product': ['Product_A', 'Product_B', 'Product_A', 'Product_B', 'Product_A', 'Product_B'],
    'Sales': [100, 150, 200, 120, 90, 80]
}

# Create DataFrame
#YOUR CODE

# Display the DataFrame
#YOUR CODE


Unnamed: 0,Store,Product,Sales
0,A,Product_A,100
1,A,Product_B,150
2,B,Product_A,200
3,B,Product_B,120
4,C,Product_A,90
5,C,Product_B,80


#### Task

1. Pivot the data so that the `Product` column becomes columns, and their corresponding sales are the values.
2. Use the `Store` column as the index.
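
A minimal sketch of `pivot` on hypothetical toy data (`toy`, `Team`, `Quarter`, `Value` are illustrative names). The optional `reset_index()` at the end is an assumption about presentation: it turns the index back into a regular column.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Value': [1, 2, 3, 4],
})

# index -> row labels, columns -> new column labels, values -> cell contents
wide = toy.pivot(index='Team', columns='Quarter', values='Value')

# Optional: move Team from the index back into a regular column
flat = wide.reset_index()

print(wide)
print(flat)
```

`pivot` raises a `ValueError` when an index and column pair occurs more than once; `pivot_table` with an aggregation function is the usual alternative in that case.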

In [22]:
# Pivot the data
pivoted_df = #YOUR CODE

# Display the pivoted DataFrame
display(pivoted_df)


Product,Store,Product_A,Product_B
0,A,100,150
1,B,200,120
2,C,90,80


### Activity 9: Using `nlargest` and `nsmallest`

#### Description

In this activity, students will practice using the `nlargest` and `nsmallest` methods in Pandas. They will identify the top and bottom sales records in the dataset.

#### Title

**Finding Top and Bottom Sales Records**

In [46]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
display(df)


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Find the top 3 sales records using `nlargest`.
2. Find the bottom 3 sales records using `nsmallest`.
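
Both methods take the number of rows and the column to order by. The sketch below uses hypothetical toy data (`toy`, `Item`, `Units` are illustrative names).

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Item': ['a', 'b', 'c', 'd', 'e'],
    'Units': [5, 9, 1, 7, 3],
})

# Rows with the 2 largest Units values, largest first
top2 = toy.nlargest(2, 'Units')

# Rows with the 2 smallest Units values, smallest first
bottom2 = toy.nsmallest(2, 'Units')

print(top2)
print(bottom2)
```

The original index labels are kept, which makes it easy to trace the selected rows back to the source frame.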

In [24]:
# Find the top 3 sales records
top_3_sales = #YOUR CODE

# Find the bottom 3 sales records
bottom_3_sales = #YOUR CODE

# Display the top 3 sales records
display(top_3_sales)

# Display the bottom 3 sales records
display(bottom_3_sales)


Unnamed: 0,Store,Product,Sales,Date
2,B,Apples,200,2024-07-01
1,A,Oranges,150,2024-07-02
6,A,Apples,130,2024-07-03


Unnamed: 0,Store,Product,Sales,Date
5,C,Oranges,80,2024-07-02
4,C,Apples,90,2024-07-01
8,C,Apples,95,2024-07-03


### Activity 10: Ranking Data

#### Description

In this activity, students will practice ranking data using the `rank` method in Pandas. They will create a new column that ranks the sales within each store.

#### Title

**Ranking Sales Data**

In [47]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
display(df)


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Create a new column `Sales_Rank` that ranks the sales within each store in descending order.
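
Ranking within groups combines `groupby` with `rank`. The sketch below uses hypothetical toy data (`toy`, `Team`, `Units`, and the rank column names are illustrative) and includes a tie to show how the `method` argument matters.

```python
import pandas as pd

# Hypothetical toy data with a tie inside team X
toy = pd.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y'],
    'Units': [5, 9, 5, 2, 4],
})

# ascending=False gives rank 1 to the largest value in each group;
# the default method='average' shares the mean rank among ties
toy['Rank_Avg'] = toy.groupby('Team')['Units'].rank(ascending=False)

# method='min' gives tied rows the lowest rank of the tied block
toy['Rank_Min'] = toy.groupby('Team')['Units'].rank(ascending=False, method='min')

print(toy)
```

Ranks are returned as floats, which is why the expected output of this activity shows values such as `1.0`.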

In [26]:
# Rank the sales within each store
df['Sales_Rank'] = #YOUR CODE

# Display the DataFrame with the Sales_Rank column
display(df)


Unnamed: 0,Store,Product,Sales,Date,Sales_Rank
0,A,Apples,100,2024-07-01,3.0
1,A,Oranges,150,2024-07-02,1.0
2,B,Apples,200,2024-07-01,1.0
3,B,Oranges,120,2024-07-02,2.0
4,C,Apples,90,2024-07-01,2.0
5,C,Oranges,80,2024-07-02,3.0
6,A,Apples,130,2024-07-03,2.0
7,B,Oranges,110,2024-07-03,3.0
8,C,Apples,95,2024-07-03,1.0


### Activity 11: Binning Data

#### Description

In this activity, students will practice binning data using the `pd.cut` function in Pandas. They will create bins for the sales data and categorize the sales into different levels.

#### Title

**Binning Sales Data**

In [48]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03']
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
display(df)


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03


#### Task

1. Create bins for the `Sales` data with the following categories: 'Low', 'Medium', 'High'.
2. Add a new column `Sales_Level` to categorize each sale into one of the bins.
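
The activity does not state the bin edges, so the edges in this sketch are purely illustrative, as are the names `toy`, `Units`, and `Size`. It shows how `pd.cut` treats values that sit exactly on an edge.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({'Units': [10, 50, 51, 100, 150]})

# Illustrative edges: n + 1 edges define n bins
bins = [0, 50, 100, 200]
labels = ['Small', 'Mid', 'Large']

# By default intervals are right-inclusive: (0, 50], (50, 100], (100, 200]
toy['Size'] = pd.cut(toy['Units'], bins=bins, labels=labels)

print(toy)
```

Values outside the outermost edges become `NaN`, so the edges should span the full range of the data. Passing `right=False` switches to left-inclusive intervals.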

In [28]:
# Define the bins and labels
bins = #YOUR CODE
labels = #YOUR CODE

# Create the Sales_Level column
df['Sales_Level'] = #YOUR CODE

# Display the DataFrame with the Sales_Level column
display(df)


Unnamed: 0,Store,Product,Sales,Date,Sales_Level
0,A,Apples,100,2024-07-01,Low
1,A,Oranges,150,2024-07-02,Medium
2,B,Apples,200,2024-07-01,High
3,B,Oranges,120,2024-07-02,Medium
4,C,Apples,90,2024-07-01,Low
5,C,Oranges,80,2024-07-02,Low
6,A,Apples,130,2024-07-03,Medium
7,B,Oranges,110,2024-07-03,Medium
8,C,Apples,95,2024-07-03,Low


### Challenge Activity: Comprehensive Sales Analysis

#### Description

In this final challenge, students will apply multiple data transformation concepts they have learned to perform a comprehensive analysis of sales data. They will aggregate data, create new features, manipulate datetime data, use rolling windows, create lagged features, rank data, and categorize sales into bins.

#### Title

**Comprehensive Sales Data Analysis**

In [49]:
import pandas as pd

# Sample sales data
data = {
    'Store': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Product': ['Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges', 'Apples', 'Oranges'],
    'Sales': [100, 150, 200, 120, 90, 80, 130, 110, 95, 105, 210, 70],
    'Date': ['2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-01', '2024-07-02', '2024-07-03', '2024-07-03', '2024-07-03', '2024-07-04', '2024-07-04', '2024-07-04']
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Display the DataFrame
display(df)


Unnamed: 0,Store,Product,Sales,Date
0,A,Apples,100,2024-07-01
1,A,Oranges,150,2024-07-02
2,B,Apples,200,2024-07-01
3,B,Oranges,120,2024-07-02
4,C,Apples,90,2024-07-01
5,C,Oranges,80,2024-07-02
6,A,Apples,130,2024-07-03
7,B,Oranges,110,2024-07-03
8,C,Apples,95,2024-07-03
9,A,Oranges,105,2024-07-04
10,B,Apples,210,2024-07-04
11,C,Oranges,70,2024-07-04


#### Task

1. Aggregate the data to calculate total sales for each store and product combination.
2. Create new features:
   - `Sales_Difference` which is the difference between each `Sales` value and the average sales of the corresponding `Product`.
   - `High_Sales` which indicates whether the sales are higher than 100 (True) or not (False).
3. Manipulate datetime data to extract `Year`, `Month`, `Day`, and `Day_of_Week`.
4. Calculate a 3-day moving average of sales for each product.
5. Create a lagged feature `Prev_Day_Sales` that contains the sales value of the previous day for each product.
6. Rank the sales within each store in descending order.
7. Categorize sales into bins with the categories: 'Low', 'Medium', 'High'.
8. Create a summary report showing total sales, average sales, highest sale, and lowest sale for each store.
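
Most steps reuse techniques from the earlier activities. For the summary report in step 8, one possible approach is named aggregation, sketched here on hypothetical toy data (`toy`, `Team`, `Units`, and the output column names are illustrative).

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise dataset
toy = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Y'],
    'Units': [3, 5, 2, 4, 6],
})

# Named aggregation: output_column=(input_column, aggregation)
report = (
    toy.groupby('Team')
    .agg(
        Total=('Units', 'sum'),
        Average=('Units', 'mean'),
        Highest=('Units', 'max'),
        Lowest=('Units', 'min'),
    )
    .reset_index()
)

print(report)
```

Named aggregation sets the output column names directly, so no separate `rename` step is needed.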

In [50]:
# 1. Aggregate the data to calculate total sales for each store and product combination
aggregated_sales = #YOUR CODE
aggregated_sales.rename(columns={'Sales': 'Total_Sales'}, inplace=True)

# 2. Create new features
# Calculate the average sales for each product
product_avg_sales = #YOUR CODE

# Create Sales_Difference column
df['Sales_Difference'] = #YOUR CODE

# Create High_Sales column
df['High_Sales'] = #YOUR CODE

# 3. Manipulate datetime data
df['Year'] = #YOUR CODE
df['Month'] = #YOUR CODE
df['Day'] = #YOUR CODE
df['Day_of_Week'] = #YOUR CODE

# 4. Calculate a 3-day moving average of sales for each product
df['Moving_Avg_3_Days'] = #YOUR CODE

# 5. Create a lagged feature Prev_Day_Sales
df['Prev_Day_Sales'] = #YOUR CODE
df['Prev_Day_Sales'] = #YOUR CODE

# 6. Rank the sales within each store in descending order
df['Sales_Rank'] = #YOUR CODE

# 7. Categorize sales into bins
bins = #YOUR CODE
labels = #YOUR CODE
df['Sales_Level'] = #YOUR CODE

# 8. Create a summary report
summary_report = #YOUR CODE

# Display the DataFrame with all transformations
print('Fully Transformed Data:')
display(df)

# Display the summary report
print('Summary Report:')
display(summary_report)


Fully Transformed Data:


Unnamed: 0,Store,Product,Sales,Date,Sales_Difference,High_Sales,Year,Month,Day,Day_of_Week,Moving_Avg_3_Days,Prev_Day_Sales,Sales_Rank,Sales_Level
0,A,Apples,100,2024-07-01,-37.5,False,2024,7,1,Monday,100.0,0.0,4.0,Low
1,A,Oranges,150,2024-07-02,44.166667,True,2024,7,2,Tuesday,150.0,0.0,1.0,Medium
2,B,Apples,200,2024-07-01,62.5,True,2024,7,1,Monday,150.0,100.0,2.0,High
3,B,Oranges,120,2024-07-02,14.166667,True,2024,7,2,Tuesday,135.0,150.0,3.0,Medium
4,C,Apples,90,2024-07-01,-47.5,False,2024,7,1,Monday,130.0,200.0,2.0,Low
5,C,Oranges,80,2024-07-02,-25.833333,False,2024,7,2,Tuesday,116.666667,120.0,3.0,Low
6,A,Apples,130,2024-07-03,-7.5,True,2024,7,3,Wednesday,140.0,90.0,2.0,Medium
7,B,Oranges,110,2024-07-03,4.166667,True,2024,7,3,Wednesday,103.333333,80.0,4.0,Medium
8,C,Apples,95,2024-07-03,-42.5,False,2024,7,3,Wednesday,105.0,130.0,1.0,Low
9,A,Oranges,105,2024-07-04,-0.833333,True,2024,7,4,Thursday,98.333333,110.0,3.0,Medium


Summary Report:


Unnamed: 0,Store,Total_Sales,Average_Sales,Highest_Sale,Lowest_Sale
0,A,485,121.25,150,100
1,B,640,160.0,210,110
2,C,335,83.75,95,70
