# Laptop Pricing Analysis

This notebook explores laptop pricing through data cleaning, label mapping, summary statistics, and visualization.

Steps included:
- Load and clean data
- Map categorical codes to meaningful labels
- Basic data exploration and summaries
- Visualization with a correlation heatmap, scatter plots, and boxplots
- Grouped heatmap analysis
- Pearson correlation analysis

## 1. Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

## 2. Load dataset and clean
Load the laptop pricing dataset from the URL and drop the two redundant columns, `Unnamed: 0.1` and `Unnamed: 0`, which appear to be row indices saved into the CSV.

In [None]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod2.csv"
df = pd.read_csv(url)
# Drop the two stray 'Unnamed' columns, which carry no feature information
df = df.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])
df.head()
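
The hard-coded column names above assume the CSV contains exactly those two stray columns. A minimal sketch, run on a toy frame rather than the real dataset, of dropping every column whose name starts with `Unnamed`; the helper name `drop_unnamed_columns` is hypothetical and not part of the notebook:

```python
import pandas as pd


def drop_unnamed_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose name starts with 'Unnamed'."""
    keep = [col for col in frame.columns if not str(col).startswith("Unnamed")]
    return frame[keep].copy()


# Toy data standing in for the downloaded CSV (values are made up)
toy = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "Price": [978, 634],
})
cleaned = drop_unnamed_columns(toy)
print(cleaned.columns.tolist())
```

Matching on the name prefix keeps the cell working if the file is re-exported with a different number of index columns.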

## 3. Map numerical categories to human-readable labels
Replacing the integer codes in `GPU`, `OS`, and `Category` with text labels makes the plots and summaries easier to interpret.

In [None]:
gpu_mapping = {1: "GTX 1050", 2: "RTX 3070", 3: "RTX 4080"}
os_mapping = {1: "Windows", 2: "Linux"}
category_mapping = {1: "Gaming", 2: "Business", 3: "Ultrabook", 4: "Workstation", 5: "Convertible"}

df['GPU'] = df['GPU'].replace(gpu_mapping)
df['OS'] = df['OS'].replace(os_mapping)
df['Category'] = df['Category'].replace(category_mapping)

df[['Category', 'GPU', 'OS']].head()
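
One detail worth checking is what happens to codes that are missing from a mapping. The sketch below uses a toy Series (not the real data) to contrast `replace`, which leaves unmapped codes unchanged, with `map`, which turns them into `NaN`; the value `4` is an assumed unmapped code for illustration:

```python
import pandas as pd

# Hypothetical partial mapping; code 4 is deliberately left out
category_mapping = {1: "Gaming", 2: "Business", 3: "Ultrabook"}
codes = pd.Series([1, 2, 3, 4], name="Category")

replaced = codes.replace(category_mapping)  # unmapped 4 is kept as-is
mapped = codes.map(category_mapping)        # unmapped 4 becomes NaN

# List the codes that have no label so they can be reviewed
unmapped = sorted(int(c) for c in set(codes.unique()) - set(category_mapping))
print(replaced.tolist())
print(mapped.isna().tolist())
print(unmapped)
```

With `replace`, a forgotten code silently survives as a number among the labels, so listing the unmapped codes first is a cheap sanity check.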

## 4. Data Summary
Numeric and categorical summaries provide an overview of dataset characteristics.

In [None]:
print("--- Numeric Summary ---")
print(df.describe().T)

print("\n--- Categorical Summary ---")
print(df.describe(include=['object']).T)

## 5. Correlation Matrix Heatmap
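The heatmap shows pairwise Pearson correlations among the numeric columns. `Category`, `GPU`, and `OS` are excluded because they now hold text labels.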

In [None]:
plt.figure(figsize=(8,6))
corr = df.select_dtypes(include=[np.number]).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f', square=True)
plt.title('Correlation Matrix Heatmap')
plt.tight_layout()
plt.show()
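
A full matrix can be hard to scan when the main question is which features track `Price`. As a rough sketch on synthetic data (the columns and coefficients below are made up for illustration), the `Price` column of the correlation matrix can be pulled out and sorted by absolute value:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the laptop data: Price depends on RAM_GB only
toy = pd.DataFrame({
    "RAM_GB": rng.integers(4, 33, n),
    "Weight_pounds": rng.normal(4, 1, n),
})
toy["Price"] = 50 * toy["RAM_GB"] + rng.normal(0, 100, n)

corr = toy.select_dtypes(include=[np.number]).corr()

# Keep only correlations with Price, strongest (by magnitude) first
price_corr = corr["Price"].drop("Price").sort_values(key=abs, ascending=False)
print(price_corr)
```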

## 6. Scatter Plots: Numeric features vs Price
Each panel plots one numeric feature against `Price` with a fitted regression line, to show how close to linear the relationship is.

In [None]:
numeric_features = ["CPU_frequency", "Screen_Size_inch", "Weight_pounds"]
fig, axes = plt.subplots(1, 3, figsize=(18,5))
for ax, feature in zip(axes, numeric_features):
    sns.regplot(x=feature, y="Price", data=df, ax=ax)
    ax.set_title(f"{feature} vs Price")
    ax.set_ylim(bottom=0)  # start the price axis at zero
plt.tight_layout()
plt.show()
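
`sns.regplot` draws the fitted line but does not report its parameters. If the slope and correlation are needed as numbers, one option is `scipy.stats.linregress`. The sketch below runs on synthetic values, not the real dataset, with an assumed true slope of 500:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic feature and price values (made up for illustration)
cpu_frequency = np.linspace(1.0, 3.0, 50)
price = 500 * cpu_frequency + 200 + rng.normal(0, 10, 50)

# Ordinary least-squares fit of price on cpu_frequency
fit = stats.linregress(cpu_frequency, price)
print(f"slope = {fit.slope:.1f}, intercept = {fit.intercept:.1f}, r = {fit.rvalue:.3f}")
```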

## 7. Boxplots for categorical features
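Price distribution across the levels of `Category`, `GPU`, `RAM_GB`, `Storage_GB_SSD`, `CPU_core`, and `OS`. The last three numeric columns are treated as categories here.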

In [None]:
categorical_features = ["Category", "GPU", "RAM_GB", "Storage_GB_SSD", "CPU_core", "OS"]
fig, axes = plt.subplots(2, 3, figsize=(18,10))
axes = axes.flatten()
for ax, col in zip(axes, categorical_features):
    sns.boxplot(x=col, y="Price", data=df, ax=ax)
    ax.set_title(f"Price Distribution by {col}")
    ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()

## 8. Grouped Heatmap: Average Price by GPU and CPU Core Count

In [None]:
grouped = df.groupby(['GPU', 'CPU_core'])['Price'].mean().unstack()
plt.figure(figsize=(10,6))
sns.heatmap(grouped, annot=True, fmt='.0f', cmap='RdBu_r', center=grouped.mean().mean())
plt.title('Average Price by GPU and CPU Core Count')
plt.ylabel('GPU')
plt.xlabel('CPU Core Count')
plt.tight_layout()
plt.show()
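
A mean in a heatmap cell can rest on very few laptops. As a sketch on a toy frame (values are made up), the same `groupby` can return the group size next to the mean so that sparse cells are easy to spot:

```python
import pandas as pd

# Toy data standing in for the laptop dataset
toy = pd.DataFrame({
    "GPU": ["A", "A", "A", "B", "B"],
    "CPU_core": [3, 5, 5, 5, 7],
    "Price": [500, 900, 1100, 1200, 2000],
})

# Mean price and number of rows per (GPU, CPU_core) pair, pivoted to a grid
summary = toy.groupby(["GPU", "CPU_core"])["Price"].agg(["mean", "count"]).unstack()
mean_price = summary["mean"]   # same layout as the heatmap input
n_laptops = summary["count"]   # NaN where a combination never occurs
print(mean_price)
print(n_laptops)
```

The `n_laptops` grid could be passed to `sns.heatmap` as well, or used to mask cells below a chosen minimum count.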

## 9. Pearson Correlations with Price
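Compute the Pearson correlation coefficient and p-value between `Price` and each feature. Text-valued features are first converted to integer codes with `pd.factorize`; those codes follow order of first appearance rather than any natural ranking, so their coefficients are only a rough indication.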

In [None]:
print("--- Pearson Correlations with Price ---")
for param in numeric_features + categorical_features:
    if df[param].dtype == 'object':
        # Text labels: encode as integer codes (order of first appearance)
        encoded = pd.factorize(df[param])[0]
        coef, p_val = stats.pearsonr(encoded, df['Price'])
    else:
        coef, p_val = stats.pearsonr(df[param], df['Price'])
    print(f"{param}: Correlation = {coef:.3f}, p-value = {p_val:.3g}")
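
For nominal features such as `Category`, where the integer codes carry no order, a one-way ANOVA is a common alternative to Pearson correlation for asking whether mean `Price` differs between groups. This is a swapped-in technique, not what the notebook does, and the sketch runs on synthetic prices with assumed group means:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic prices for three categories with different assumed means
toy = pd.DataFrame({
    "Category": np.repeat(["Gaming", "Business", "Ultrabook"], 30),
    "Price": np.concatenate([
        rng.normal(1800, 150, 30),
        rng.normal(1000, 150, 30),
        rng.normal(1300, 150, 30),
    ]),
})

# One array of prices per category level
groups = [group["Price"].to_numpy() for _, group in toy.groupby("Category")]

# Null hypothesis: all categories share the same mean price
f_stat, p_val = stats.f_oneway(*groups)
print(f"F = {f_stat:.1f}, p-value = {p_val:.3g}")
```

A small p-value suggests that at least one category has a different mean price; it does not say which one or by how much.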