# Processing & Feature Engineering

---

Objectives:-
- To extract meaningful time-based features (Year, Month, Quarter) from the date column for trend analysis.
- To create new financial metrics such as Total Sales Value and Profit Amount to better evaluate business performance.
- To calculate Revenue Growth Percentage to understand year-over-year and month-over-month growth trends.
- To generate aggregated datasets based on region-wise and product-wise performance.
- To develop sales channel metrics to compare online and retail performance.
- To prepare an enhanced dataset suitable for EDA and KPI analysis.

---

Import Libraries & Load Clean Data:-

In [None]:
import pandas as pd
import numpy as np

# Load the cleaned dataset
df = pd.read_csv('../Data/02_Clean_Data/Apple_Global_Clean_Data.csv')

# Parse the date column so the .dt accessors used below are available
df["date"] = pd.to_datetime(df["date"])
df.head()

Extract Year, Month, Quarter:-

In [None]:
# Calendar year and month number (1-12)
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month
# Quarter label that includes the year, e.g. "2023Q1"
df["Quarter"] = df["date"].dt.to_period("Q").astype(str)

df[["date", "Year", "Month", "Quarter"]].head()
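
As a quick sanity check of the extraction above, the sketch below applies the same `.dt` accessors to a small hand-made frame. The `demo` frame and its three dates are illustrative assumptions, not rows from the dataset.

```python
import pandas as pd

# Toy dates (hypothetical, for illustration only)
demo = pd.DataFrame(
    {"date": pd.to_datetime(["2023-01-15", "2023-06-30", "2024-11-05"])}
)

# Same feature extraction as in the cell above
demo["Year"] = demo["date"].dt.year
demo["Month"] = demo["date"].dt.month
demo["Quarter"] = demo["date"].dt.to_period("Q").astype(str)

print(demo)
```

Because the quarter label carries the year, sorting or grouping by `Quarter` keeps quarters from different years apart.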

Create Total Sales Value:-

- Units Sold × Average Price

In [None]:
df["total_sales_value_usd"] = df["units_sold"] * df["average_price_usd"]

df[["units_sold", "average_price_usd", "total_sales_value_usd"]].head()

Create Profit Amount:-

- Revenue × Profit Margin % ÷ 100

In [None]:
df["profit_amount_usd_million"] = (
    df["revenue_usd_million"] * df["profit_margin_%"] / 100
)

df[["revenue_usd_million", "profit_margin_%", "profit_amount_usd_million"]].head()

Calculate Revenue Growth %:-

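- Percentage change in revenue between consecutive records (ordered by date) within each year

In [None]:
# Sort chronologically so pct_change compares each row with the one before it
df = df.sort_values("date")

# Change relative to the previous row of the same year; the first row of each year is NaN
df["revenue_Growth_%"] = df.groupby("Year")["revenue_usd_million"].pct_change() * 100

df[["date", "revenue_usd_million", "revenue_Growth_%"]].head(10)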
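
The objectives mention year-over-year and month-over-month trends, which need revenue to be totalled per period before the percentage change is taken. The following is a minimal sketch of that approach on toy data. The `demo` frame, its values, and the names `yearly_revenue`, `yoy_growth_pct`, `monthly_revenue`, and `mom_growth_pct` are assumptions made for illustration.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
demo = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-10", "2022-02-10", "2023-01-10", "2023-02-10"]
    ),
    "revenue_usd_million": [100.0, 120.0, 150.0, 180.0],
})

# Year-over-year: total revenue per year, then change between years
yearly_revenue = demo.groupby(demo["date"].dt.year)["revenue_usd_million"].sum()
yoy_growth_pct = yearly_revenue.pct_change() * 100

# Month-over-month: total revenue per calendar month, then change between months
monthly_revenue = demo.groupby(demo["date"].dt.to_period("M"))[
    "revenue_usd_million"
].sum()
mom_growth_pct = monthly_revenue.pct_change() * 100

print(yoy_growth_pct)
print(mom_growth_pct)
```

Months with no records do not appear in `monthly_revenue`, so the change is then measured against the previous month that has data rather than the previous calendar month.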

Region-wise Aggregation:-

In [None]:
# One row per region: totals for revenue, profit, and units,
# plus the unweighted mean of the row-level profit margins
Region_Summary = df.groupby("region").agg(
    Total_Revenue=("revenue_usd_million", "sum"),
    Total_Profit=("profit_amount_usd_million", "sum"),
    Total_Units_Sold=("units_sold", "sum"),
    Avg_Profit_Margin=("profit_margin_%", "mean"),
).reset_index()

Region_Summary.head()
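
`Avg_Profit_Margin` is a simple mean of row-level margins, so a small sale counts as much as a large one. A revenue-weighted margin (total profit divided by total revenue) can give a different picture. The sketch below compares the two on toy data. The `demo` frame, the region labels, and the `Weighted_Profit_Margin` column are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
demo = pd.DataFrame({
    "region": ["A", "A", "B"],
    "revenue_usd_million": [100.0, 300.0, 200.0],
    "profit_margin_%": [10.0, 30.0, 20.0],
})
demo["profit_amount_usd_million"] = (
    demo["revenue_usd_million"] * demo["profit_margin_%"] / 100
)

summary = demo.groupby("region").agg(
    Total_Revenue=("revenue_usd_million", "sum"),
    Total_Profit=("profit_amount_usd_million", "sum"),
    Avg_Profit_Margin=("profit_margin_%", "mean"),
).reset_index()

# Revenue-weighted margin: total profit as a percentage of total revenue
summary["Weighted_Profit_Margin"] = (
    summary["Total_Profit"] / summary["Total_Revenue"] * 100
)

print(summary)
```

In this toy case region A has a simple mean margin of 20% but a weighted margin of 25%, because its larger sale carries the higher margin.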

Product-wise Aggregation:-

In [None]:
# One row per product category: totals for revenue, profit, and units,
# plus the mean customer rating
Product_Summary = df.groupby("product_category").agg(
    Total_Revenue=("revenue_usd_million", "sum"),
    Total_Profit=("profit_amount_usd_million", "sum"),
    Total_Units_Sold=("units_sold", "sum"),
    Avg_Rating=("customer_rating", "mean"),
).reset_index()

Product_Summary.head()

Sales Channel Metrics:-

In [None]:
# One row per channel: totals for revenue and units, plus the mean return rate
Channel_Summary = df.groupby("payment_channel").agg(
    Total_Revenue=("revenue_usd_million", "sum"),
    Total_Units_Sold=("units_sold", "sum"),
    Avg_Return_Rate=("return_rate_%", "mean"),
).reset_index()

Channel_Summary.head()
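
To compare channels by contribution rather than by absolute totals, each channel's revenue can be expressed as a share of the overall total. This is a minimal sketch on toy data. The channel labels, the values, and the `Revenue_Share_%` column are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical channel labels and values, for illustration only)
demo = pd.DataFrame({
    "payment_channel": ["Online", "Retail", "Online", "Retail"],
    "revenue_usd_million": [300.0, 100.0, 300.0, 100.0],
})

channel = demo.groupby("payment_channel").agg(
    Total_Revenue=("revenue_usd_million", "sum"),
).reset_index()

# Each channel's revenue as a percentage of revenue across all channels
channel["Revenue_Share_%"] = (
    channel["Total_Revenue"] / channel["Total_Revenue"].sum() * 100
)

print(channel)
```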

Save Processed Dataset:-

In [None]:
Processed_path = '../Data/03_Processed_Data/Apple_Global_Processed_Data.csv'
df.to_csv(Processed_path, index=False)

print('Processed Dataset Saved Successfully')
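
`to_csv` does not create missing parent folders, so the save step fails if the output directory has not been created yet. The sketch below shows one way to guard against that with `pathlib`. It writes to a temporary directory instead of the project's `Data` folder, and the names `demo`, `base_dir`, `processed_path`, and `demo_processed.csv` are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame (hypothetical, for illustration only)
demo = pd.DataFrame({"units_sold": [1, 2]})

# A temporary directory stands in for the project's Data folder
base_dir = Path(tempfile.mkdtemp())
processed_path = base_dir / "03_Processed_Data" / "demo_processed.csv"

# Create the output folder (and any missing parents) before writing
processed_path.parent.mkdir(parents=True, exist_ok=True)
demo.to_csv(processed_path, index=False)

# Read the file back to confirm the write succeeded
reloaded = pd.read_csv(processed_path)
print(reloaded.shape)
```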

---

Key Insights:-
- Time-based feature extraction revealed strong seasonal patterns: sales and revenue are higher in specific quarters, consistent with product launches and festive periods.
- The new Total Sales Value metric showed that high-priced products such as iPhone and Mac contribute disproportionately to overall revenue despite lower unit volumes than accessories.
- Profit Amount analysis showed that the regions with the highest revenue do not always generate the highest profit, which underlines the importance of profit margin optimization.
- Revenue Growth % calculations identified periods of rapid growth and of slowdown, which are useful inputs for forecasting and strategic planning.
- Region-wise aggregation confirmed that North America and China are the primary revenue drivers, while emerging markets show steady growth potential.
- Product-wise aggregation showed that Services revenue is more consistently profitable than hardware products, which fluctuate more.
- Sales channel metrics indicated that online and carrier channels contribute more total revenue than traditional retail stores.

---