# 🚦 SRIG: Civic Score Dataset Generator
### Smart Road India Grid | Model 3: TabTransformer
**Yash Jani | Red and White Institute, Surat**

---
In this notebook we generate a **synthetic dataset of 10,000 drivers**. The dataset will be used to train SRIG's Civic Score system.

## 📦 Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

print('✅ Libraries imported successfully!')
```

## 🎲 Step 2: Generate Raw Features

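We draw every feature for the 10,000 drivers independently and uniformly at random. The upper bound of `np.random.randint` is exclusive, so the ranges are:
- Helmet compliance days (0–30)
- Signal violations (0–19)
- Speed violations (0–14)
- School zone violations (0–9)
- Drunk driving cases (0–2)
- Accident history (0–4)
- Years of driving (1–39)
- Training completed (0 = no, 1 = yes)
- Complaints filed (0–9)
- City tier (1, 2 or 3)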

```python
np.random.seed(42)  # fixed seed so the dataset is reproducible
NUM_DRIVERS = 10000

# np.random.randint(low, high) excludes `high`: randint(0, 31) gives 0..30
driver_ids             = [f'DL-{str(i).zfill(5)}' for i in range(1, NUM_DRIVERS + 1)]
helmet_compliance      = np.random.randint(0, 31, NUM_DRIVERS)  # compliant days
signal_violations      = np.random.randint(0, 20, NUM_DRIVERS)
speed_violations       = np.random.randint(0, 15, NUM_DRIVERS)
school_zone_violations = np.random.randint(0, 10, NUM_DRIVERS)
drunk_driving_cases    = np.random.randint(0, 3,  NUM_DRIVERS)
accident_history       = np.random.randint(0, 5,  NUM_DRIVERS)
years_driving          = np.random.randint(1, 40, NUM_DRIVERS)
training_completed     = np.random.randint(0, 2,  NUM_DRIVERS)  # 0 = no, 1 = yes
complaints_filed       = np.random.randint(0, 10, NUM_DRIVERS)
city_tier              = np.random.choice([1, 2, 3], NUM_DRIVERS)

print(f'✅ Raw data generated for {NUM_DRIVERS} drivers!')
```
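NumPy's `Generator` API (`np.random.default_rng`) can also draw the same features. NumPy's documentation recommends it over the legacy global-seed functions for new code. The sketch below is an alternative and is not what this notebook runs. `FEATURE_RANGES` is a hypothetical name. The Generator uses a different algorithm, so the drawn values, and therefore the saved CSV, will differ from the `np.random.seed(42)` version.

```python
import numpy as np

NUM_DRIVERS = 10_000
rng = np.random.default_rng(42)  # local Generator instead of the global np.random.seed

# Hypothetical spec: feature -> (low, high), high exclusive like randint in Step 2
FEATURE_RANGES = {
    'helmet_compliance':      (0, 31),
    'signal_violations':      (0, 20),
    'speed_violations':       (0, 15),
    'school_zone_violations': (0, 10),
    'drunk_driving_cases':    (0, 3),
    'accident_history':       (0, 5),
    'years_driving':          (1, 40),
    'training_completed':     (0, 2),
    'complaints_filed':       (0, 10),
}

# Generator.integers also excludes `high` by default (endpoint=False)
features = {name: rng.integers(low, high, size=NUM_DRIVERS)
            for name, (low, high) in FEATURE_RANGES.items()}
features['city_tier'] = rng.choice([1, 2, 3], size=NUM_DRIVERS)

for name, values in features.items():
    print(f'{name:<24} min={int(values.min()):>2}  max={int(values.max()):>2}')
```

Keeping all ranges in one dict also makes it easy to check them against the list above in a single place.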

## 🧮 Step 3: Calculate the Civic Score

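Every driver starts from a base score of 500. The SRIG rules then adjust it as follows:
- ✅ Helmet compliance → **+10 points per compliant day**
- ❌ Signal violation → **-30 points** each
- ❌ Speed violation → **-25 points** each
- ❌ School zone violation → **-40 points** each
- ❌ Drunk driving case → **-100 points** each
- ❌ Accident → **-50 points** each
- ✅ Years of driving → **+5 points per year**
- ✅ Training completed → **+50 points**
- ❌ Complaint filed → **-20 points** each
- ❌ City tier → **-30 points per tier above Tier 1**

The final score is clipped to the range 0–1000.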

```python
score = np.full(NUM_DRIVERS, 500.0)  # Base score 500

score += helmet_compliance      * 10
score -= signal_violations      * 30
score -= speed_violations       * 25
score -= school_zone_violations * 40
score -= drunk_driving_cases    * 100
score -= accident_history       * 50
score += years_driving          * 5
score += training_completed     * 50
score -= complaints_filed       * 20
score -= (city_tier - 1)        * 30   # Tier 1: 0, Tier 2: -30, Tier 3: -60

# Clip to the 0-1000 range
score = np.clip(score, 0, 1000)

print('✅ Civic Score calculated!')
print(f'   Min Score : {score.min():.0f}')
print(f'   Max Score : {score.max():.0f}')
print(f'   Avg Score : {score.mean():.0f}')
```
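The pure-Python sketch below makes the rules concrete. It uses two hypothetical helpers that are not part of the notebook. `score_one` scores one example driver. `expected_unclipped_score` computes the average score before clipping, using the fact that `randint(low, high)` has mean `(low + high - 1) / 2`. With the Step 2 ranges, that average works out to -185, well below the 0 floor. This suggests that a large share of drivers will be clipped to 0 and land in HIGH_RISK. The tier counts in Step 7 are a good place to confirm this.

```python
# Hypothetical helpers: the weights are copied from the Step 3 cell and the
# (low, high) ranges from the Step 2 cell; high is exclusive, as in randint.
BASE_SCORE = 500.0
CITY_TIER_PENALTY = 30  # points per tier above Tier 1

# feature -> (points per unit, randint low, randint high)
RULES = {
    'helmet_compliance':      (10, 0, 31),
    'signal_violations':      (-30, 0, 20),
    'speed_violations':       (-25, 0, 15),
    'school_zone_violations': (-40, 0, 10),
    'drunk_driving_cases':    (-100, 0, 3),
    'accident_history':       (-50, 0, 5),
    'years_driving':          (5, 1, 40),
    'training_completed':     (50, 0, 2),
    'complaints_filed':       (-20, 0, 10),
}


def score_one(driver):
    """Civic score for a single driver dict, using the Step 3 rules."""
    score = BASE_SCORE
    for name, (weight, _, _) in RULES.items():
        score += weight * driver[name]
    score -= (driver['city_tier'] - 1) * CITY_TIER_PENALTY
    return min(max(score, 0.0), 1000.0)  # same effect as np.clip(score, 0, 1000)


def expected_unclipped_score():
    """Mean score before clipping, using E[randint(low, high)] = (low + high - 1) / 2."""
    total = BASE_SCORE
    for weight, low, high in RULES.values():
        total += weight * (low + high - 1) / 2
    # city_tier is uniform over {1, 2, 3}, so E[city_tier - 1] = 1
    total -= 1 * CITY_TIER_PENALTY
    return total


example = {
    'helmet_compliance': 20, 'signal_violations': 2, 'speed_violations': 1,
    'school_zone_violations': 0, 'drunk_driving_cases': 0, 'accident_history': 1,
    'years_driving': 10, 'training_completed': 1, 'complaints_filed': 0,
    'city_tier': 2,
}
print(score_one(example))          # 635.0
print(expected_unclipped_score())  # -185.0
```

For the example driver the arithmetic is 500 + 200 - 60 - 25 - 50 + 50 + 50 - 30 = 635. That falls in the SILVER tier (600–749).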

## 🏅 Step 4: Assign Risk Level / Tier

| Score Range | Tier |
|---|---|
| 900–1000 | 🥇 PLATINUM |
| 750–899 | 🥈 GOLD |
| 600–749 | 🥉 SILVER |
| 400–599 | 🔴 BRONZE |
| 0–399 | ⚠️ HIGH_RISK |

```python
def assign_risk(s):
    """Map a civic score (0-1000) to its tier, following the table above."""
    if s >= 900:   return 'PLATINUM'
    elif s >= 750: return 'GOLD'
    elif s >= 600: return 'SILVER'
    elif s >= 400: return 'BRONZE'
    else:          return 'HIGH_RISK'

risk_labels = [assign_risk(s) for s in score]

print('✅ Risk levels assigned!')
```
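`assign_risk` runs a Python loop over all 10,000 scores. A vectorized alternative uses `np.select` to apply the same thresholds to the whole array at once. `np.select` picks the choice for the first condition that is true, so the conditions are listed from the highest tier down. `assign_risk_vectorized` is a hypothetical name and is not part of the notebook.

```python
import numpy as np


def assign_risk_vectorized(scores):
    """Same thresholds as assign_risk, applied to a whole array of scores."""
    scores = np.asarray(scores)
    # Checked top-down: the first True condition wins for each element
    conditions = [scores >= 900, scores >= 750, scores >= 600, scores >= 400]
    tiers      = ['PLATINUM', 'GOLD', 'SILVER', 'BRONZE']
    return np.select(conditions, tiers, default='HIGH_RISK')


print(assign_risk_vectorized([0, 399, 400, 600, 749, 750, 900, 1000]).tolist())
# ['HIGH_RISK', 'HIGH_RISK', 'BRONZE', 'SILVER', 'SILVER', 'GOLD', 'PLATINUM', 'PLATINUM']
```

In the notebook, this could replace the list comprehension as `risk_labels = assign_risk_vectorized(score)`. The boundary values above match the tier table.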

## 📋 Step 5: Build the DataFrame

```python
df = pd.DataFrame({
    'driver_id':              driver_ids,
    'helmet_compliance_days': helmet_compliance,
    'signal_violations':      signal_violations,
    'speed_violations':       speed_violations,
    'school_zone_violations': school_zone_violations,
    'drunk_driving_cases':    drunk_driving_cases,
    'accident_history':       accident_history,
    'years_driving':          years_driving,
    'training_completed':     training_completed,
    'complaints_filed':       complaints_filed,
    'city_tier':              city_tier,
    'civic_score':            score.round(2),
    'risk_level':             risk_labels
})

print('✅ DataFrame ready!')
df.head(10)
```
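`risk_level` is stored as plain strings, so pandas sorts it alphabetically (BRONZE, GOLD, HIGH_RISK, ...). If the tiers need a meaningful order, or integer class labels for a model, one option is an ordered `pd.Categorical`. This sketch runs on toy labels. The notebook does not show whether `train_model.py` expects strings or integer codes, so `TIER_ORDER` and the lowest-to-highest coding are assumptions.

```python
import pandas as pd

# Assumed ordering, lowest to highest; codes follow it (HIGH_RISK=0 ... PLATINUM=4)
TIER_ORDER = ['HIGH_RISK', 'BRONZE', 'SILVER', 'GOLD', 'PLATINUM']

# Toy labels standing in for df['risk_level']
labels = pd.Series(['GOLD', 'HIGH_RISK', 'PLATINUM', 'BRONZE'])
risk_cat = pd.Categorical(labels, categories=TIER_ORDER, ordered=True)

print(risk_cat.codes.tolist())         # [3, 0, 4, 1]
print(risk_cat.min(), risk_cat.max())  # HIGH_RISK PLATINUM
```

On the real DataFrame, this would be `df['risk_level'] = pd.Categorical(df['risk_level'], categories=TIER_ORDER, ordered=True)`. After that, `sort_values` and comparisons such as `df['risk_level'] >= 'GOLD'` follow the tier order.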

## 📊 Step 6: Visualize the Data

```python
# Make sure the output folder exists before saving the figure
os.makedirs('data', exist_ok=True)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.suptitle('SRIG: Civic Score Dataset Analysis', fontsize=16, fontweight='bold')

# Plot 1: Risk Level Distribution (fixed tier order, one color per tier)
tier_order  = ['PLATINUM', 'GOLD', 'SILVER', 'BRONZE', 'HIGH_RISK']
tier_colors = {'PLATINUM': '#4444FF', 'GOLD': '#FFD700', 'SILVER': '#C0C0C0',
               'BRONZE': '#CD7F32', 'HIGH_RISK': '#FF4444'}
risk_counts = df['risk_level'].value_counts().reindex(tier_order, fill_value=0)
axes[0].bar(risk_counts.index, risk_counts.values,
            color=[tier_colors[t] for t in risk_counts.index])
axes[0].set_title('Risk Level Distribution')
axes[0].set_xlabel('Risk Level')
axes[0].set_ylabel('Number of Drivers')
axes[0].tick_params(axis='x', rotation=15)

# Plot 2: Civic Score Distribution
axes[1].hist(df['civic_score'], bins=50, color='steelblue', edgecolor='white')
axes[1].set_title('Civic Score Distribution')
axes[1].set_xlabel('Civic Score (0-1000)')
axes[1].set_ylabel('Number of Drivers')
mean_score = df['civic_score'].mean()
axes[1].axvline(mean_score, color='red', linestyle='--', label=f'Mean: {mean_score:.0f}')
axes[1].legend()

# Plot 3: Helmet Compliance vs Score
axes[2].scatter(df['helmet_compliance_days'], df['civic_score'], alpha=0.1, color='green', s=5)
axes[2].set_title('Helmet Compliance vs Civic Score')
axes[2].set_xlabel('Helmet Compliance Days')
axes[2].set_ylabel('Civic Score')

plt.tight_layout()
plt.savefig('data/dataset_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
print('✅ Charts saved: data/dataset_analysis.png')
```
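The tier bar chart needs a fixed order because `value_counts()` sorts categories by frequency. Any list of colors applied by position therefore attaches to whichever tier happens to be most common in that run, and tiers with no drivers disappear from the chart. Calling `reindex` with `fill_value=0` restores a fixed order and keeps empty tiers visible. The toy labels below are illustrative, not the real dataset.

```python
import pandas as pd

TIER_ORDER = ['PLATINUM', 'GOLD', 'SILVER', 'BRONZE', 'HIGH_RISK']

# Toy labels: HIGH_RISK is the most common tier and PLATINUM never appears
labels = pd.Series(['HIGH_RISK'] * 5 + ['BRONZE'] * 2 + ['GOLD'])

by_frequency = labels.value_counts()                      # most common first
by_tier = by_frequency.reindex(TIER_ORDER, fill_value=0)  # fixed order, zeros kept

print(by_frequency.index.tolist())  # ['HIGH_RISK', 'BRONZE', 'GOLD']
print(by_tier.tolist())             # [0, 1, 0, 2, 5]
```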

## 📈 Step 7: View Dataset Stats

```python
print('📊 Risk Level Distribution:')
print(df['risk_level'].value_counts())
print('\n📈 Civic Score Statistics:')
print(df['civic_score'].describe().round(2))
print('\n🔢 Feature Statistics:')
df.drop(columns=['driver_id', 'risk_level']).describe().round(2)
```
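`value_counts()` and `describe()` summarize the whole dataset. A per-tier `groupby` adds two checks. First, it confirms that each tier's scores fall inside that tier's range in the table. Second, the share of scores at exactly 0 shows how many drivers sit at the lower clip bound. The sketch uses a small toy DataFrame with made-up values, not results. On the real data, replace `toy` with `df`.

```python
import pandas as pd

# Toy rows standing in for the generated df (made-up values for illustration)
toy = pd.DataFrame({
    'civic_score': [0.0, 0.0, 350.0, 420.0, 610.0, 905.0],
    'risk_level':  ['HIGH_RISK', 'HIGH_RISK', 'HIGH_RISK', 'BRONZE', 'SILVER', 'PLATINUM'],
})

# Per-tier count and score range: each min/max should sit inside the tier's table range
summary = toy.groupby('risk_level')['civic_score'].agg(['count', 'min', 'max'])

# Fraction of drivers sitting at the lower bound of 0 (where np.clip piles them up)
share_at_floor = (toy['civic_score'] == 0).mean()

print(summary)
print(f'Share of drivers at 0: {share_at_floor:.1%}')
```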

## 💾 Step 8: Save the CSV File

```python
os.makedirs('data', exist_ok=True)
df.to_csv('data/civic_score_dataset.csv', index=False)

print('🎉 Dataset saved successfully!')
print('   📁 Location : data/civic_score_dataset.csv')
print(f'   📊 Total Rows : {len(df):,}')
print(f'   📋 Total Columns : {len(df.columns)}')
print('\n✅ Now run train_model.py / train_model.ipynb!')
```
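Before training, it can be worth confirming that the CSV reads back unchanged. This sketch writes a toy DataFrame to a temporary directory, so `data/` is not touched. It then reads the file back with `pd.read_csv` and compares the two with `pd.testing.assert_frame_equal`, which raises if values, column order or dtypes differ. On the real data, replace `toy` with `df` and read `data/civic_score_dataset.csv`. On some platforms (for example Windows with NumPy 1.x), `np.random.randint` returns 32-bit integers while `read_csv` returns 64-bit ones. In that case, pass `check_dtype=False`.

```python
import os
import tempfile

import pandas as pd

# Toy rows with the same kinds of columns as the real dataset
toy = pd.DataFrame({
    'driver_id':         ['DL-00001', 'DL-00002'],
    'signal_violations': [3, 0],
    'civic_score':       [635.0, 0.0],
    'risk_level':        ['SILVER', 'HIGH_RISK'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'civic_score_dataset.csv')
    toy.to_csv(path, index=False)
    loaded = pd.read_csv(path)

# Raises AssertionError if values, column order or dtypes differ
pd.testing.assert_frame_equal(toy, loaded)
print('Round trip OK:', loaded.shape)
```

The `DL-` prefix keeps the zero padding in `driver_id` intact, because pandas does not parse those values as numbers.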