# Project 1 — E-commerce Data Exploration

This notebook explores the **ecommerce_dataset.csv** dataset.  
It follows the class requirements: working with a new dataset, computing summary statistics using **pandas** and the **Python standard library**, and creating a simple **text-based visualization**.

---

## Step 1: Load data with pandas and inspect it

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv("ecommerce_dataset.csv")

# Display basic info
print("Number of rows:", len(df))
df.head()

## Step 2: Select a numeric column

In [None]:
# Choose a numeric column to analyze, e.g. total_value
col = "total_value"
values = df[col]
values.describe()

## Step 3: Compute mean, median, and mode using pandas

In [None]:
mean_pandas = values.mean()
median_pandas = values.median()
mode_pandas = values.mode().iloc[0]

print(f"Mean (pandas): {mean_pandas:.2f}")
print(f"Median (pandas): {median_pandas:.2f}")
print(f"Mode (pandas): {mode_pandas:.2f}")

## Step 4: Compute the same values manually (no pandas, no statistics module)

In [None]:
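# Convert the pandas Series to a plain Python list, dropping missing values.
# This is the only pandas call in this cell; everything below uses built-ins.
values_list = list(values.dropna())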

# Mean
mean_manual = sum(values_list) / len(values_list)

# Median
sorted_vals = sorted(values_list)
n = len(sorted_vals)
if n % 2 == 0:
    median_manual = (sorted_vals[n//2 - 1] + sorted_vals[n//2]) / 2
else:
    median_manual = sorted_vals[n//2]

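# Mode using a dictionary of value -> occurrence count
counts = {}
for v in values_list:
    counts[v] = counts.get(v, 0) + 1
# If several values tie for the highest count, take the smallest one,
# which matches values.mode().iloc[0] (pandas returns modes in sorted order)
max_count = max(counts.values())
mode_manual = min(v for v, c in counts.items() if c == max_count)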

print(f"Mean (manual): {mean_manual:.2f}")
print(f"Median (manual): {median_manual:.2f}")
print(f"Mode (manual): {mode_manual:.2f}")
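Step 4 deliberately avoids the `statistics` module, but that module is still a convenient independent check on the hand-written logic. The sketch below wraps the same mean, median, and mode calculations in a hypothetical helper, `summary_stats`, and compares its output with `statistics.mean`, `statistics.median`, and `statistics.multimode`. The sample values are invented for illustration and are not taken from the dataset; unlike the cell above, the helper returns every tied mode rather than a single value.

```python
import statistics
from collections import Counter


def summary_stats(values):
    """Return (mean, median, modes) for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    mid = n // 2
    # Odd length: middle element; even length: average of the two middle elements
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    # Keep every value that reaches the highest count, in ascending order
    counts = Counter(ordered)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)
    return mean, median, modes


# Hypothetical sample values, not taken from ecommerce_dataset.csv
sample = [20.0, 35.5, 20.0, 50.0, 35.5, 12.0]
mean, median, modes = summary_stats(sample)

# Independent check against the statistics module
assert abs(mean - statistics.mean(sample)) < 1e-9
assert median == statistics.median(sample)
assert modes == sorted(statistics.multimode(sample))

print(f"Mean: {mean:.2f}, Median: {median:.2f}, Modes: {modes}")
```

To run the same check on the notebook's data, `summary_stats(values_list)` could be called in place of the sample list.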

## Step 5: Create a text-based visualization (standard library only)

We'll show total sales by product category as a simple bar made of '*' characters. The totals are aggregated with pandas; the bars themselves are built with plain string operations.

In [None]:
# Aggregate totals by category
totals = df.groupby("product_category")["total_value"].sum().sort_values(ascending=False)

# Scale: one star represents 50,000 units of total_value (about $50K)
scale = 50_000

print("E-commerce Sales by Category (each * ≈ $50K)\n")
for category, total in totals.items():
    stars = "*" * int(total / scale)
    print(f"{category:12}: {stars}  (${total:,.0f})")
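The cell above relies on pandas' `groupby` for the aggregation step. If the chart needs to be built with the standard library alone, one possible approach is to read the rows with the `csv` module and accumulate totals in a dictionary. The sketch below assumes the same `product_category` and `total_value` column names used above; the rows in `sample_csv` are hypothetical stand-ins for the real file so the example runs on its own.

```python
import csv
import io

# Hypothetical sample rows standing in for ecommerce_dataset.csv
sample_csv = io.StringIO(
    "product_category,total_value\n"
    "Electronics,120000\n"
    "Clothing,45000\n"
    "Electronics,80000\n"
    "Home,60000\n"
    "Clothing,30000\n"
)

# Aggregate totals per category with a plain dictionary
totals = {}
for row in csv.DictReader(sample_csv):
    category = row["product_category"]
    totals[category] = totals.get(category, 0.0) + float(row["total_value"])

# Sort categories from largest to smallest total
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

scale = 50_000  # one star per $50K
lines = []
for category, total in ranked:
    stars = "*" * int(total / scale)  # truncates, so totals under $50K get no stars
    lines.append(f"{category:12}: {stars}  (${total:,.0f})")

print("\n".join(lines))
```

To use the real dataset, the `io.StringIO` object could be replaced with `open("ecommerce_dataset.csv", newline="")`, assuming the file has no missing or non-numeric entries in `total_value`.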

## Step 6: Reflection

- **Which was easier?** Pandas is faster and simpler: each statistic takes one line.  
- **Which was harder?** The manual method required writing the sorting and counting logic by hand.  
- **What could be improved?** The analysis could be extended to trends by date, customer, or discount level.

---
**Appendix:**  
- Generated: 2025-11-07 02:59  
- Python version: 3.11.8