# ⚽ FIFA Player Dashboard with Machine Learning

This project is a web-based dashboard built with Django that uses a machine learning model (XGBoost Regressor) to predict FIFA players’ overall ratings. It visualizes player statistics, allows exploration by position, and evaluates model performance using real data.

---

## 📊 Dataset Overview

- **Source**: [FIFA 21 dataset from Kaggle](https://www.kaggle.com/stefanoleone992/fifa-21-complete-player-dataset)
- **Records**: ~18,000 player entries
- **Features Used**:
  - Age, Height, Weight
  - In-game stats: Pace, Shooting, Passing, Dribbling, Defending, Physic
  - Categorical: Preferred Foot, Work Rate, Player Positions
- **Target**: `overall` (overall rating)

---

## 🧹 Preprocessing Steps

1. **Handling Missing Values**:
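   - Rows missing the target (`overall`) or a categorical field (`preferred_foot`, `work_rate`, `player_positions`) are dropped.
   - Numeric features are then filtered with a 1.5 × IQR rule. Because comparisons against NaN evaluate to false, this step also removes rows with missing numeric values.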

2. **Feature Engineering**:
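   - `preferred_foot`: Label-encoded as binary (Right = 1, Left = 0)
   - `player_positions`: Collapsed into three groups (`DEF`, `MID`, `ATT`), then label-encoded
   - `work_rate`: Label-encoded
   - No feature scaling is applied; tree-based models such as XGBoost are insensitive to the scale of individual features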

3. **Splitting Data**:
   - `train_test_split` (80/20 ratio)
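
The steps above can be sketched on a toy frame. This is a minimal illustration, not the project's pipeline: the rows are made up, and it assumes scikit-learn's `LabelEncoder` and the 1.5 × IQR outlier filter that appear in the training script later in this README.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for the real dataset (all values are made up).
df = pd.DataFrame({
    "preferred_foot": ["Right", "Left", "Right", "Right", "Left", "Right", "Left"],
    "pace": [70.0, 72.0, 68.0, 71.0, 69.0, 20.0, np.nan],  # one outlier, one missing value
    "overall": [80, 75, 78, 74, 77, 60, 79],
})

# LabelEncoder sorts classes alphabetically, so Left -> 0 and Right -> 1.
le_foot = LabelEncoder()
df["preferred_foot_enc"] = le_foot.fit_transform(df["preferred_foot"])
foot_mapping = {str(label): code for code, label in enumerate(le_foot.classes_)}

# 1.5 x IQR filter; quantile() skips NaN, and the NaN row fails both comparisons.
q1, q3 = df["pace"].quantile(0.25), df["pace"].quantile(0.75)
iqr = q3 - q1
mask = (df["pace"] >= q1 - 1.5 * iqr) & (df["pace"] <= q3 + 1.5 * iqr)
filtered = df[mask]

# 80/20 split with a fixed seed so the result is reproducible.
X = filtered[["preferred_foot_enc", "pace"]]
y = filtered["overall"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(foot_mapping)
print(len(df), "rows before filtering,", len(filtered), "after")
print(len(X_train), "train rows,", len(X_test), "test rows")
```

Note that the outlier filter silently drops the row with the missing `pace` value, so the filter doubles as a missing-value step for numeric columns.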

---

## 🤖 Model Architecture

- **Model Used**: `XGBoost Regressor`
- **Reason**: Efficient with tabular data, handles missing values, good out-of-the-box performance
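- **Parameters**: Default `XGBRegressor()` settings for initial development; the training script below sets `n_estimators=200`, `learning_rate=0.1`, `max_depth=6`, and `random_state=42`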

---

## 📈 Training Results

| Metric         | Score |
|----------------|-------|
| Train RMSE     | ~2.34 |
| Test RMSE      | ~2.35 |
| Train R² Score | 0.89  |
| Test R² Score  | 0.88  |
| Bias-Variance Status | Good Fit (Low Bias, Low Variance) |

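> Note: These values are computed at run time and vary with the train-test split and hyperparameters. The training script at the end of this README, run on `players_15.csv` with a fixed `random_state`, reports RMSE 1.06 / R² 0.9768 on the training set and RMSE 1.47 / R² 0.9569 on the test set.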
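
As a rough illustration of how the metrics in the table are computed, here is a minimal sketch using NumPy. The toy arrays and the helper names `rmse`, `r2`, and `fit_status` are illustrative assumptions; the 0.15 gap threshold mirrors the bias-variance check in the training script.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (rating points)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_status(train_r2, test_r2, gap=0.15):
    """Flag a large train/test R² gap, following the script's heuristic."""
    if abs(train_r2 - test_r2) > gap:
        return "high variance" if train_r2 > test_r2 else "high bias"
    return "balanced"


# Made-up ratings and predictions for four players.
y_true = [70, 75, 80, 85]
y_pred = [72, 74, 79, 87]

toy_rmse = rmse(y_true, y_pred)
toy_r2 = r2(y_true, y_pred)
status = fit_status(0.89, 0.88)  # R² values from the table above

print(f"RMSE={toy_rmse:.2f}, R2={toy_r2:.2f}, status={status}")
# prints: RMSE=1.58, R2=0.92, status=balanced
```

An RMSE of about 2.3 therefore means predictions are typically off by roughly two rating points, and a train/test R² gap of 0.01 falls well inside the 0.15 threshold.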

---

## 🔒 Authentication

Authentication was added using Django’s built-in authentication system:

- Users must log in to access the dashboard
- `/login/`, `/logout/`, and `/register/` routes were created
- `@login_required` decorator used to protect dashboard views
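
A minimal sketch of what such views might look like in a project's `views.py`. The view names `register` and `dashboard`, the template names, and the use of Django's `UserCreationForm` are assumptions for illustration; the project's actual views may differ.

```python
# views.py (sketch; view and template names are hypothetical)
from django.contrib.auth.decorators import login_required
from django.contrib.auth.forms import UserCreationForm
from django.shortcuts import redirect, render


def register(request):
    """Create a new user with Django's built-in sign-up form."""
    form = UserCreationForm(request.POST or None)
    if request.method == "POST" and form.is_valid():
        form.save()
        return redirect("/login/")
    return render(request, "register.html", {"form": form})


@login_required(login_url="/login/")  # anonymous users are redirected to /login/
def dashboard(request):
    """Render the dashboard for authenticated users only."""
    return render(request, "dashboard.html")
```

Login and logout can be served by `django.contrib.auth.views.LoginView` and `LogoutView`, wired to `/login/` and `/logout/` in `urls.py`.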

---

## 🔧 Integration Steps

1. **Model Training**:  
   Train `XGBRegressor` on selected features and save using `joblib`:

   ```python
   from xgboost import XGBRegressor
   from joblib import dump

   model = XGBRegressor()
   model.fit(X_train, y_train)
   dump(model, 'xgboost_fifa_model.pkl')
   ```

The full training script used to produce the saved model:

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
import pickle

# Load dataset (adjust the path to where the Kaggle CSV lives locally)
DATA_PATH = "datasets/players_15.csv"
df = pd.read_csv(DATA_PATH)

# Drop rows with missing essential features or target
df = df.dropna(subset=['overall', 'preferred_foot', 'work_rate', 'player_positions'])

# Simplify player positions into three groups.
# Defensive positions are checked first, so a multi-position player
# listed with any defensive role is grouped as 'DEF'.
def simplify_position(pos):
    if any(p in pos for p in ['CB', 'LB', 'RB', 'LWB', 'RWB', 'CDM']):
        return 'DEF'
    elif any(p in pos for p in ['CM', 'CAM', 'RM', 'LM']):
        return 'MID'
    else:
        return 'ATT'

df['player_position_group'] = df['player_positions'].apply(simplify_position)

# Encode categorical features (LabelEncoder assigns codes in alphabetical order)
le_foot = LabelEncoder()
df['preferred_foot_enc'] = le_foot.fit_transform(df['preferred_foot'])

le_work = LabelEncoder()
df['work_rate_enc'] = le_work.fit_transform(df['work_rate'])

le_pos = LabelEncoder()
df['position_group_enc'] = le_pos.fit_transform(df['player_position_group'])

# Define features and target
features = ['age', 'height_cm', 'weight_kg', 'pace', 'shooting', 'passing',
            'dribbling', 'defending', 'physic',
            'preferred_foot_enc', 'work_rate_enc', 'position_group_enc']
target = 'overall'

# Apply IQR filtering for numeric features.
# Rows with NaN in a feature fail both comparisons and are dropped as well.
numeric_features = ['age', 'height_cm', 'weight_kg', 'pace', 'shooting', 'passing',
                    'dribbling', 'defending', 'physic']

for feature in numeric_features:
    Q1 = df[feature].quantile(0.25)
    Q3 = df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df = df[(df[feature] >= lower_bound) & (df[feature] <= upper_bound)]

# Split data
X = df[features]
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train XGBoost model
model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=6, random_state=42)
model.fit(X_train, y_train)

# Evaluate on test set
preds = model.predict(X_test)
rmse = mean_squared_error(y_test, preds) ** 0.5
r2 = r2_score(y_test, preds)

# Evaluate on training set
train_preds = model.predict(X_train)
train_rmse = mean_squared_error(y_train, train_preds) ** 0.5
train_r2 = r2_score(y_train, train_preds)

# Print metrics
print("\n📊 Training Performance:")
print(f"   RMSE: {train_rmse:.2f}")
print(f"   R²:   {train_r2:.4f}")

print("\n📈 Testing Performance:")
print(f"   RMSE: {rmse:.2f}")
print(f"   R²:   {r2:.4f}")

# Bias-variance insight: flag a train/test R² gap larger than 0.15
print("\n📋 Bias-Variance Check:")
if abs(train_r2 - r2) > 0.15:
    if train_r2 > r2:
        print("⚠️ High variance detected (potential overfitting)")
    else:
        print("⚠️ High bias detected (potential underfitting)")
else:
    print("✅ Bias-variance is balanced")

# Save model
with open("xgboost_fifa_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("\n✅ Model saved as 'xgboost_fifa_model.pkl'")
```

Output:

```text
📊 Training Performance:
   RMSE: 1.06
   R²:   0.9768

📈 Testing Performance:
   RMSE: 1.47
   R²:   0.9569

📋 Bias-Variance Check:
✅ Bias-variance is balanced

✅ Model saved as 'xgboost_fifa_model.pkl'
```
