# Data Overview

---

Objectives:-
- Load the raw CSV dataset into Python for analysis.
- Examine the number of rows and columns.
- Review column names and data types to understand variable formats.
- Inspect initial and final records to verify data consistency.
- Generate basic statistical summaries to identify trends, distributions, and potential anomalies.

---

Import Required Libraries:-


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

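# Show every column when a DataFrame is displayed (no truncation with "...").
pd.set_option('display.max_columns', None)
# Display floats with two decimal places.
pd.set_option('display.float_format', '{:.2f}'.format)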

Load Raw Dataset:-

In [None]:
file_path = '../Data/01_Raw_Data/Apple_Global_Raw_Data.csv'
df = pd.read_csv(file_path)

# Preview the first five rows.
df.head()

Random Rows:-

In [None]:
df.sample(5)
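
The objectives also list a check of the final records. A minimal sketch of that check follows, assuming a small stand-in frame named `demo` in place of `df`; its columns and values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df; column names and values are illustrative only.
demo = pd.DataFrame({
    "Region": ["A", "B", "C", "D", "E", "F"],
    "Revenue": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Last rows of the frame. Trailing blank or summary rows in a CSV
# would typically show up here.
last_rows = demo.tail(3)
print(last_rows)
```

On the real data, the equivalent call would be `df.tail()`.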

Dataset Shape:-

In [None]:
n_rows, n_cols = df.shape
print("Number of Rows:", n_rows)
print("Number of Columns:", n_cols)

Column Names:-

In [None]:
print("Column Names:")
for col in df.columns:
    print(col)

Data Types:-

In [None]:
df.info()

Basic Statistical Summary:-

In [None]:
df.describe()

Check for Missing Values:-

In [None]:
df.isnull().sum()
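
Raw counts are easier to judge alongside the share of rows they represent. The sketch below shows one possible way to build such a table, assuming a stand-in frame `demo` with invented values; the names `missing_count` and `missing_pct` are illustrative choices, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; values are illustrative only.
demo = pd.DataFrame({
    "Region": ["A", None, "C", "D"],
    "Revenue": [10.0, np.nan, np.nan, 40.0],
    "Units": [1, 2, 3, 4],
})

# Count of missing values and their percentage of all rows, per column.
missing = pd.DataFrame({
    "missing_count": demo.isnull().sum(),
    "missing_pct": demo.isnull().mean() * 100,
})

# Keep only columns that have gaps, worst first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_pct", ascending=False
)
print(missing)
```

Replacing `demo` with `df` would apply the same summary to the full dataset.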

Quick Overview of Categorical Columns:-



In [None]:
categorical_columns = df.select_dtypes(include=['object']).columns

# value_counts().head() lists the five most frequent values, not every unique value.
for col in categorical_columns:
    print(f"\n{col} - Most Frequent Values:")
    print(df[col].value_counts().head())
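
The top values alone do not show how many distinct categories a column holds. A possible complement, sketched on a stand-in frame `demo` with invented columns and values, is to count the distinct values per column and look at relative frequencies:

```python
import pandas as pd

# Hypothetical stand-in for df; column names and values are illustrative only.
demo = pd.DataFrame({
    "Region": ["A", "B", "A", "C"],
    "Channel": ["Online", "Online", "Store", "Online"],
    "Revenue": [10.0, 20.0, 30.0, 40.0],
})

# Non-numeric columns; in this stand-in frame those are the two text columns.
text_cols = demo.select_dtypes(exclude="number").columns

# Distinct values per column, highest cardinality first.
cardinality = demo[text_cols].nunique().sort_values(ascending=False)
print(cardinality)

# Share of rows per value instead of raw counts.
region_share = demo["Region"].value_counts(normalize=True)
print(region_share)
```

Note that `exclude="number"` would also pick up date or boolean columns if the frame had any, so the selection may need narrowing on the real data.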

---

Key Insights:-
- The dataset contains **200,000 rows and 17 columns**, which makes it a large-scale global business dataset suitable for in-depth analysis.
- The data includes a mix of **numerical, categorical, and date variables** covering sales performance, financial metrics, customer behavior, and stock information.
- Key business dimensions such as **Region, Product_Category, and Payment_Channel** show multiple unique values, enabling comparative analysis across global markets and sales platforms.
- Financial variables such as **Revenue, Units Sold, Profit Margin, and Marketing Spend** exhibit wide ranges, suggesting significant variation across time periods and regions.
- The Date column spans **multiple years**, making the dataset appropriate for trend analysis and future growth prediction.
- The dataset may contain minor data quality issues, such as missing values or inconsistent data types (especially in date fields); these will be addressed in the cleaning phase.
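
One way to test for inconsistent date values before the cleaning phase is sketched below. It assumes a stand-in frame `demo` with invented values and an ISO-style `%Y-%m-%d` format; the actual format of the Date column is not stated here and would need to be confirmed first.

```python
import pandas as pd

# Hypothetical stand-in for the Date column; values and format are assumptions.
demo = pd.DataFrame({"Date": ["2021-01-15", "2022-06-30", "not a date", None]})

# errors="coerce" turns entries that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(demo["Date"], format="%Y-%m-%d", errors="coerce")

# Entries that had a value but failed to parse.
bad = demo.loc[parsed.isna() & demo["Date"].notna(), "Date"]
print(bad.tolist())

# Date range covered by the successfully parsed entries.
print(parsed.min(), parsed.max())
```

Entries collected in `bad` are candidates for manual review, while the minimum and maximum give a quick view of the period the data spans.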

---