# 📊 Comprehensive Data Science Analysis

This notebook walks through the NFT Ticketing Platform's Data Science module.
It loads the synthetic feature data, visualizes key features, and lists the trained model artifacts.

## Modules Covered:
1. **Scalping Detection**
2. **Risk Scoring**
3. **Bot Detection**
4. **Market Trend Prediction**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import os
import sys

# Configure plotting
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Paths
DATA_DIR = "../data/processed"
ARTIFACTS_DIR = "../artifacts"

print("Libraries loaded successfully!")

## 1. 🎫 Scalping Detection Analysis
Analyzing ticket resale patterns to detect scalping behavior.

In [None]:
# Load Data
scalping_df = pd.read_csv(os.path.join(DATA_DIR, "scalping_features.csv"))
print(f"Loaded {len(scalping_df)} ticket records")
scalping_df.head()

In [None]:
# Visualize Price Differences
plt.figure(figsize=(10, 6))
sns.histplot(data=scalping_df, x='price_diff', bins=20, kde=True, color='orange')
plt.title('Distribution of Price Markups (Resale Price - Base Price)')
plt.xlabel('Price Difference ($)')
plt.ylabel('Count')
plt.show()
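
The histogram shows the shape of the markup distribution, but not which tickets sit in its right tail. As a minimal sketch, the cell below flags unusually high markups with a Tukey fence (Q3 + 1.5 × IQR). The function name `flag_high_markups`, the `high_markup` column, and the toy values are illustrative assumptions, and this rule is a simple descriptive heuristic rather than the platform's trained scalping model.

```python
import pandas as pd


def flag_high_markups(df: pd.DataFrame, column: str = "price_diff", k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with a boolean 'high_markup' column (hypothetical name).

    A row is flagged when its value exceeds Q3 + k * IQR (Tukey's upper fence).
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    upper_fence = q3 + k * (q3 - q1)
    out = df.copy()
    out["high_markup"] = out[column] > upper_fence
    return out


# Toy data standing in for scalping_features.csv (values are made up)
toy_scalping_df = pd.DataFrame({"price_diff": [0.0, 5.0, 10.0, 12.0, 15.0, 20.0, 250.0]})
flagged = flag_high_markups(toy_scalping_df)
print(flagged)
```

On the real data, the same call would be `flag_high_markups(scalping_df)`, assuming `price_diff` is numeric and free of missing values.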

## 2. 🚨 Risk & Bot Detection Analysis
Analyzing user transaction patterns to identify high-risk users and potential bots.

In [None]:
# Load Data
risk_df = pd.read_csv(os.path.join(DATA_DIR, "user_risk_features.csv"))
print(f"Loaded {len(risk_df)} user profiles")
risk_df.head()

In [None]:
# Visualize User Velocity vs Amount
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=risk_df, x='velocity', y='avg_amount',
    size='total_amount', hue='fraud_rate', sizes=(20, 200),
)
plt.title('User Transaction Velocity vs Average Amount')
plt.xlabel('Velocity (Tx/Hour)')
plt.ylabel('Average Transaction Amount ($)')
plt.show()
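
To put numbers behind the scatter plot, one option is to bucket users by velocity and compare the mean `fraud_rate` per bucket. The sketch below uses the `velocity` and `fraud_rate` columns plotted above. The band edges (5 and 20 Tx/hour), the band labels, and the toy values are assumptions chosen for illustration only.

```python
import pandas as pd

# Toy data standing in for user_risk_features.csv (values are made up)
toy_risk_df = pd.DataFrame({
    "velocity": [0.5, 1.0, 2.0, 8.0, 40.0, 55.0],
    "fraud_rate": [0.0, 0.0, 0.1, 0.2, 0.8, 0.6],
})

# Assumed band edges: (0, 5], (5, 20], (20, inf) transactions per hour
velocity_band = pd.cut(
    toy_risk_df["velocity"],
    bins=[0, 5, 20, float("inf")],
    labels=["low", "medium", "high"],
)

# Count users and average fraud_rate within each velocity band
band_summary = (
    toy_risk_df.groupby(velocity_band, observed=True)["fraud_rate"]
    .agg(["count", "mean"])
)
print(band_summary)
```

A steadily rising mean across bands would suggest that velocity carries signal for risk scoring. A flat profile would suggest it does not, at least not on its own.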

## 3. 📈 Market Trend Analysis
Plotting daily transaction volume over time, the input signal for market trend prediction.

In [None]:
# Load Data
trend_df = pd.read_csv(os.path.join(DATA_DIR, "market_trend_features.csv"))
trend_df['date'] = pd.to_datetime(trend_df['date'])
trend_df = trend_df.sort_values('date')

# Plot Volume
plt.figure(figsize=(12, 6))
plt.plot(trend_df['date'], trend_df['total_volume'], marker='o', linestyle='-', color='purple')
plt.title('Daily Transaction Volume Trend')
plt.xlabel('Date')
plt.ylabel('Total Volume ($)')
plt.grid(True, alpha=0.3)
plt.show()
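
Raw daily volume can be noisy, so a rolling mean and a day-over-day change are common first steps before any forecasting. The sketch below computes both on toy data with the same `date` and `total_volume` columns. The 3-day window and the column names `rolling_mean` and `daily_change` are assumptions for illustration, and this is descriptive smoothing rather than the platform's trend prediction model.

```python
import pandas as pd

# Toy data standing in for market_trend_features.csv (values are made up)
toy_trend_df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "total_volume": [100.0, 200.0, 300.0, 400.0, 500.0],
}).sort_values("date")

# Assumption: a 3-day window, chosen only to fit the toy data.
# min_periods=1 keeps the first rows instead of returning NaN.
toy_trend_df["rolling_mean"] = (
    toy_trend_df["total_volume"].rolling(window=3, min_periods=1).mean()
)

# Fractional change versus the previous day (the first row is NaN)
toy_trend_df["daily_change"] = toy_trend_df["total_volume"].pct_change()
print(toy_trend_df)
```

For real daily data, a 7-day window is a natural starting point because it averages out weekly seasonality, though the right window depends on how the events are scheduled.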

## 4. 🤖 Model Artifacts Inspection
Listing the trained model files (`.joblib`) in the artifacts directory, with their sizes.

In [None]:
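print("Available Model Artifacts:")
if not os.path.isdir(ARTIFACTS_DIR):
    print(f"(directory not found: {ARTIFACTS_DIR})")
else:
    # Sort for a stable, reproducible listing
    for f in sorted(os.listdir(ARTIFACTS_DIR)):
        if f.endswith('.joblib'):
            size_kb = os.path.getsize(os.path.join(ARTIFACTS_DIR, f)) / 1024
            print(f"- {f:<30} ({size_kb:.1f} KB)")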
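
File sizes alone say little about what each artifact contains. As a minimal sketch, the cell below loads every `.joblib` file in a directory and reports the type of the stored object. The helper name `describe_artifacts` and the demo file `demo_model.joblib` are hypothetical. The demo writes a plain dict to a temporary directory so it runs without the real artifacts. Note that `joblib.load` unpickles data, so it should only be used on trusted files.

```python
import tempfile
from pathlib import Path

import joblib


def describe_artifacts(artifacts_dir):
    """Return {filename: type name} for every .joblib file in artifacts_dir.

    Hypothetical helper for illustration, not part of the platform's code.
    """
    summary = {}
    for path in sorted(Path(artifacts_dir).glob("*.joblib")):
        obj = joblib.load(str(path))  # unpickles: only load trusted files
        summary[path.name] = type(obj).__name__
    return summary


# Demo on a throwaway directory; a plain dict stands in for a trained model
with tempfile.TemporaryDirectory() as tmp_dir:
    joblib.dump({"threshold": 0.5}, str(Path(tmp_dir) / "demo_model.joblib"))
    demo_summary = describe_artifacts(tmp_dir)

print(demo_summary)
```

Against the real artifacts, the call would be `describe_artifacts(ARTIFACTS_DIR)`, assuming the libraries used to train the models are installed in the current environment.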