# Data Loading & Understanding

---

Import Libraries:

In [None]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)

Load Raw Data:

In [None]:
file_path = '../Data/01_Raw_Data/Phoenix_Global_Sales_Raw_Data.csv'

df = pd.read_csv(file_path)
df.head()

Dataset Shape:

In [None]:
df.shape

Column Overview & Data Types:

In [None]:
df.info()

Column Names:

In [None]:
df.columns.tolist()

Missing Values:

In [None]:
df.isnull().sum()
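
Raw counts are hard to judge on a large dataset, so it can help to view missing values as a share of all rows. The following is a minimal sketch of that idea; `missing_summary` is a hypothetical helper and the small `sample` frame holds made-up values, not the real sales data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst column first."""
    counts = frame.isnull().sum()
    percent = counts / len(frame) * 100
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Tiny illustrative frame (made-up values, not the real dataset).
sample = pd.DataFrame({
    "Country": ["US", None, "DE", "FR"],
    "Revenue": [100.0, 250.0, np.nan, np.nan],
})

report = missing_summary(sample)
print(report)
```

In the notebook itself, the same helper could be called as `missing_summary(df)`.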

Duplicate Records:

In [None]:
df.duplicated().sum()
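
If the duplicate count is non-zero, inspecting the affected rows before dropping them is usually worthwhile. The sketch below shows one way to do that on a made-up frame; the `Order_ID` column is a hypothetical name used only for illustration and may not exist in the real data.

```python
import pandas as pd

# Made-up rows; the second and third are exact duplicates of each other.
sample = pd.DataFrame({
    "Order_ID": [1, 2, 2, 3],
    "Revenue": [100.0, 250.0, 250.0, 80.0],
})

# Number of rows that repeat an earlier row (same check as in the cell above).
n_duplicates = int(sample.duplicated().sum())

# keep=False flags every member of a duplicate group, not only the repeats,
# which makes it easier to compare the rows side by side.
duplicate_rows = sample[sample.duplicated(keep=False)]

# drop_duplicates keeps the first occurrence of each group by default.
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(duplicate_rows)
```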

Summary Statistics:

In [None]:
df.describe()

Date Range Validation:

In [None]:
df['Date'] = pd.to_datetime(df['Date'])

df['Date'].min(), df['Date'].max()
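
By default, `pd.to_datetime` raises an error on a value it cannot parse. A possible alternative, sketched below on made-up strings, is to pass `errors="coerce"` so that bad values become `NaT` and can be counted. Whether the real `Date` column contains any such values is not known here.

```python
import pandas as pd

# Made-up values: two valid dates, one unparseable string, one missing value.
sample = pd.DataFrame({"Date": ["2021-01-15", "2022-07-03", "not a date", None]})

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(sample["Date"], errors="coerce")

# Entries that were present in the raw column but failed to parse.
unparseable = int((parsed.isna() & sample["Date"].notna()).sum())

# min() and max() skip NaT, so the range reflects only valid dates.
date_min, date_max = parsed.min(), parsed.max()

print(unparseable, date_min, date_max)
```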

Country & Region Distribution:

In [None]:
# A notebook cell only displays its last expression, so print both results.
print(df['Country'].value_counts())

print(df['Region'].value_counts())

Product & Sales Channel Overview:

In [None]:
# A notebook cell only displays its last expression, so print both results.
print(df['Product_Type'].value_counts())

print(df['Sales_Channel'].value_counts())
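
Counts are easier to compare across categories when expressed as percentages. A minimal sketch using `value_counts(normalize=True)` follows; the channel names in `sample` are invented for illustration.

```python
import pandas as pd

# Made-up channel labels, not taken from the real dataset.
sample = pd.DataFrame({"Sales_Channel": ["Online", "Retail", "Online", "Online"]})

# normalize=True returns fractions that sum to 1; multiply by 100 for percent.
shares = sample["Sales_Channel"].value_counts(normalize=True) * 100

print(shares)
```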

Revenue, Cost, Profit Check:

In [None]:
df[['Revenue', 'Cost', 'Profit']].describe()
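
Beyond summary statistics, the three columns can be checked against each other. The sketch below assumes that `Profit` is defined as `Revenue - Cost`, which the notebook does not confirm, and uses a made-up frame with one deliberately inconsistent row.

```python
import numpy as np
import pandas as pd

# Made-up values; the last row's Profit deliberately disagrees with Revenue - Cost.
sample = pd.DataFrame({
    "Revenue": [100.0, 250.0, 80.0],
    "Cost": [60.0, 300.0, 50.0],
    "Profit": [40.0, -50.0, 35.0],
})

# ASSUMPTION: Profit should equal Revenue - Cost.
expected_profit = sample["Revenue"] - sample["Cost"]

# np.isclose tolerates tiny floating-point differences.
mismatch = ~np.isclose(sample["Profit"], expected_profit)
n_mismatch = int(mismatch.sum())

# Rows with negative profit, i.e. loss-making rows.
n_loss = int((sample["Profit"] < 0).sum())

print(sample[mismatch])
print(n_mismatch, n_loss)
```

If the assumption does not hold for the real data (for example, because profit is net of taxes or discounts), mismatches here would indicate a different definition and not necessarily bad data.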

Memory Usage & Performance Check:

In [None]:
# Total memory footprint in MB (deep=True also counts the contents of string columns)
df.memory_usage(deep=True).sum() / 1024**2
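
If the footprint turns out to be large, one common option is to store low-cardinality text columns as `category`. The sketch below measures the effect on a made-up frame; `memory_mb` is a hypothetical helper, and the actual saving on the real data would depend on how many distinct values each column has.

```python
import pandas as pd


def memory_mb(frame: pd.DataFrame) -> float:
    """Total memory footprint of a frame in MB, including string contents."""
    return frame.memory_usage(deep=True).sum() / 1024**2


# Made-up frame with a repetitive text column.
sample = pd.DataFrame({
    "Country": ["US", "DE", "US", "FR"] * 250,
    "Revenue": [100.0, 250.0, 80.0, 40.0] * 250,
})

before = memory_mb(sample)

# Convert the repetitive text column to category on a copy of the frame.
compact = sample.copy()
for col in ["Country"]:
    compact[col] = compact[col].astype("category")

after = memory_mb(compact)

print(f"{before:.4f} MB -> {after:.4f} MB")
```

In the notebook, columns such as `Country`, `Region`, `Product_Type`, and `Sales_Channel` would be natural candidates, provided each holds a small set of repeated values.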

---

# Observations
1. The dataset contains 500,000 records.
2. Missing values are absent or minimal.
3. Multiple countries and regions are present.
4. Revenue and cost vary significantly across transactions.
5. Profit takes both positive and negative values, so some transactions were loss-making.
6. The date range spans several years.


---