## 3. Category-Based Rating and Price Analysis

### Step 1: Import Libraries and Load Data

In [2]:
import pandas as pd
import plotly.express as px
import numpy as np

# Load the preprocessed Google Play Store dataset (adjust the path to your copy)
df = pd.read_csv('Google-Playstore-Preprocessed.csv')

# Display dataset overview
print("Dataset Overview:")
df.info()  # info() prints directly and returns None, so it needs no print() wrapper
print("\nSample Data:")
print(df.head())


Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1248021 entries, 0 to 1248020
Data columns (total 19 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   Unnamed: 0        1248021 non-null  int64  
 1   App Name          1248021 non-null  object 
 2   App Id            1248021 non-null  object 
 3   Category          1248021 non-null  object 
 4   Rating            1248021 non-null  float64
 5   Rating Count      1248021 non-null  float64
 6   Minimum Installs  1248021 non-null  float64
 7   Maximum Installs  1248021 non-null  int64  
 8   Free              1248021 non-null  bool   
 9   Price             1248021 non-null  float64
 10  Currency          1248021 non-null  object 
 11  Size              1248021 non-null  float64
 12  Minimum Android   1248021 non-null  object 
 13  Released          1248021 non-null  object 
 14  Last Updated      1248021 non-null  object 
 15  Content Rating    1248021 non-n
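
The `Unnamed: 0` column in the overview is what `read_csv` typically produces when a CSV was written with its row index and no index label. Assuming that is what happened here, the following minimal sketch shows how `index_col=0` reads that column back as the index instead of as data. The three-row CSV is a hypothetical stand-in, not the real file.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real file; the leading empty header
# cell is what pandas writes when a DataFrame is saved with its default index.
csv_text = """,App Name,Category,Rating,Price
0,App A,Tools,4.1,0.0
1,App B,Games,3.8,1.99
2,App C,Tools,4.5,0.0
"""

# Default read: the unnamed first column comes back as a data column.
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats that first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

If the column turns out to carry meaningful IDs rather than a saved index, it can simply be left in place.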

### Step 2: Create Treemap Visualization

In [3]:
# Initial Treemap

# Create treemap for category-based analysis
fig = px.treemap(
    df,
    path=['Category'],  # One block per category
    values='Price',     # Block size = sum of Price over the category's apps
    color='Rating',     # Block color = mean Rating, weighted by Price
    title="Category-Based Price and Rating Analysis",
    color_continuous_scale='viridis'
)
fig.update_traces(textinfo='label+value+percent entry')
fig.show()
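
Because `values='Price'` sums prices per category and the color is a price-weighted mean, free apps (price 0) carry no weight in the rating shown. One possible alternative, sketched below, is to aggregate per category first, which also avoids handing the full row-level frame to Plotly. The mini-dataset and the names `total_price`, `mean_rating`, and `app_count` are hypothetical.

```python
import pandas as pd

# Hypothetical mini-dataset with the same column names as the real one.
apps = pd.DataFrame({
    "Category": ["Tools", "Tools", "Games", "Games", "Games"],
    "Price":    [0.0, 2.0, 0.0, 0.0, 4.0],
    "Rating":   [4.0, 3.0, 5.0, 4.0, 3.0],
})

# One row per category: total price, unweighted mean rating, number of apps.
summary = (
    apps.groupby("Category", as_index=False)
        .agg(total_price=("Price", "sum"),
             mean_rating=("Rating", "mean"),
             app_count=("Price", "count"))
)

print(summary)
```

The resulting frame could then be passed to `px.treemap` with `path=['Category']`, `values='total_price'`, and `color='mean_rating'`. Note that `mean_rating` here is an unweighted mean, so the colors would differ from the price-weighted ones in the cell above.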

### Step 3: Feedback Loop for Transformation

In [4]:
# Handle Outliers and Inconsistencies

# Identify potential issues in Price column (e.g., extreme values)
print("\nPrice Summary:")
print(df['Price'].describe())

# Provide user options for transformation
print("\nFeedback Options:")
print("1. Remove outliers in the 'Price' column (above 99th percentile).")
print("2. Normalize the 'Price' column (min-max scaling).")
print("3. Proceed without transformation.")

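# User input (non-numeric input falls through to the "invalid choice" branch)
try:
    choice = int(input("Enter your choice (1, 2, or 3): "))
except ValueError:
    choice = None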

if choice == 1:
    # Drop rows whose Price exceeds the 99th percentile
    price_cap = df['Price'].quantile(0.99)
    df = df[df['Price'] <= price_cap].copy()
    print(f"Outliers removed. Rows with Price above the 99th percentile ({price_cap:.2f}) were dropped.")
elif choice == 2:
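    # Min-max scale Price to [0, 1]; skip if all prices are identical (zero range)
    min_price = df['Price'].min()
    max_price = df['Price'].max()
    if max_price > min_price:
        df['Price'] = (df['Price'] - min_price) / (max_price - min_price)
        print(f"Price column normalized (min: {min_price:.2f}, max: {max_price:.2f}).")
    else:
        print("All prices are identical; normalization skipped.")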
elif choice == 3:
    print("Proceeding without transformation.")
else:
    print("Invalid choice. No transformations applied.")


Price Summary:
count    1.248021e+06
mean     1.062943e-01
std      2.414451e+00
min      0.000000e+00
25%      0.000000e+00
50%      0.000000e+00
75%      0.000000e+00
max      4.000000e+02
Name: Price, dtype: float64

Feedback Options:
1. Remove outliers in the 'Price' column (above 99th percentile).
2. Normalize the 'Price' column (min-max scaling).
3. Proceed without transformation.
Price column normalized (min: 0.00, max: 400.00).
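
The interactive prompt makes the notebook hard to re-run unattended. As a rough sketch, the same three options could be wrapped in a function that takes the choice as an argument and leaves the input frame untouched. The name `transform_price` and the sample data are hypothetical.

```python
import pandas as pd

def transform_price(frame: pd.DataFrame, choice: int) -> pd.DataFrame:
    """Return a copy of `frame` with the chosen Price transformation applied.

    choice 1: drop rows above the 99th percentile of Price
    choice 2: min-max scale Price to [0, 1]
    anything else: return the data unchanged
    """
    out = frame.copy()
    if choice == 1:
        cap = out["Price"].quantile(0.99)
        out = out[out["Price"] <= cap]
    elif choice == 2:
        lo, hi = out["Price"].min(), out["Price"].max()
        if hi > lo:  # avoid dividing by zero when all prices are equal
            out["Price"] = (out["Price"] - lo) / (hi - lo)
    return out

# Hypothetical sample: mostly free apps plus one extreme price.
sample = pd.DataFrame({"Price": [0.0] * 98 + [1.0, 400.0]})
trimmed = transform_price(sample, 1)
scaled = transform_price(sample, 2)
print(len(trimmed), scaled["Price"].max())
```

Returning a new frame rather than overwriting `df` keeps the original prices available for comparison after the transformation.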


### Step 4: Re-Visualize Treemap Post-Transformation

In [5]:
# Create updated treemap
fig = px.treemap(
    df,
    path=['Category'],  # One block per category
    values='Price',     # Block size = sum of (transformed) Price over the category's apps
    color='Rating',     # Block color = mean Rating, weighted by Price
    title="Updated Category-Based Price and Rating Analysis",
    color_continuous_scale='plasma'
)
fig.update_traces(textinfo='label+value+percent entry')
fig.show()
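
In the recorded run, option 2 was chosen with a minimum price of 0.00 and a maximum of 400.00, so the scaling amounts to dividing every price by 400. Each category's share of the total, and therefore the treemap layout, should come out unchanged; only the displayed values shrink. The sketch below checks this on a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical data whose minimum price is 0, as in the real dataset.
apps = pd.DataFrame({
    "Category": ["Tools", "Games", "Games", "Books"],
    "Price":    [0.0, 100.0, 300.0, 400.0],
})

def category_shares(frame: pd.DataFrame) -> pd.Series:
    """Fraction of the total Price contributed by each category."""
    totals = frame.groupby("Category")["Price"].sum()
    return totals / totals.sum()

before = category_shares(apps)

# Min-max scaling, as in option 2 of Step 3.
scaled = apps.copy()
lo, hi = scaled["Price"].min(), scaled["Price"].max()
scaled["Price"] = (scaled["Price"] - lo) / (hi - lo)
after = category_shares(scaled)

print(pd.DataFrame({"before": before, "after": after}))
```

Option 1 (dropping rows above the 99th percentile) does change the shares, so it is the option that would make the updated treemap visibly different.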

### Step 5: Sunburst Visualization (Optional Alternative)

In [6]:
# Create sunburst chart
fig = px.sunburst(
    df,
    path=['Category'],  # One sector per category
    values='Price',     # Sector angle = sum of Price over the category's apps
    color='Rating',     # Sector color = mean Rating, weighted by Price
    title="Category-Based Price and Rating Analysis (Sunburst)",
    color_continuous_scale='magma'
)
fig.update_traces(textinfo='label+percent parent+percent entry')
fig.show()
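
With a single-level `path`, the sunburst has only one ring. As one possible extension, and not something the notebook itself does, the sketch below builds a two-level hierarchy from the `Free` and `Category` columns and sizes sectors by app count instead of price. The `Pricing` label column, the `app_count` and `mean_rating` names, and the sample rows are all hypothetical.

```python
import pandas as pd

# Hypothetical sample using column names from the dataset overview.
apps = pd.DataFrame({
    "Free":     [True, True, False, True, False],
    "Category": ["Tools", "Games", "Games", "Tools", "Books"],
    "Rating":   [4.0, 5.0, 3.0, 3.0, 4.5],
})

# Readable label for the inner ring (hypothetical column name).
apps["Pricing"] = apps["Free"].map({True: "Free", False: "Paid"})

# One row per (Pricing, Category) pair: app count and unweighted mean rating.
hierarchy = (
    apps.groupby(["Pricing", "Category"], as_index=False)
        .agg(app_count=("Rating", "count"),
             mean_rating=("Rating", "mean"))
)

print(hierarchy)
```

The aggregated frame could then be passed to `px.sunburst` with `path=['Pricing', 'Category']`, `values='app_count'`, and `color='mean_rating'`.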
