# FIFA 21 Player Performance Prediction

## Dataset Overview
- Source: [FIFA 21 complete player dataset](https://www.kaggle.com/datasets/stefanoleone992/fifa-21-complete-player-dataset?select=players_21.csv)
- Features: player physical, technical, value, and movement stats
- Target: `overall` performance rating

## Preprocessing
- Dropped irrelevant columns (e.g. URLs, names, in-game traits)
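- Removed goalkeepers (rows whose `player_positions` contains `GK`)
- One-hot encoded `preferred_foot` and `work_rate`
- Filled missing numeric values with the column median; any remaining missing or non-numeric values were set to 0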
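
As a minimal sketch of two of these steps, the toy example below applies median imputation and one-hot encoding to a made-up three-row frame. The column names mirror the dataset, but the values are illustrative assumptions, not real player data.

```python
import pandas as pd

# Toy stand-in for a few rows of players_21.csv (values are made up).
toy = pd.DataFrame({
    "pace": [80.0, None, 70.0],
    "preferred_foot": ["Right", "Left", "Right"],
})

# Median imputation: the missing pace becomes the median of 80 and 70.
toy = toy.fillna(toy.median(numeric_only=True))

# One-hot encoding with drop_first=True keeps a single indicator column
# for a two-level category; "Left" is the dropped baseline here.
encoded = pd.get_dummies(toy, columns=["preferred_foot"], drop_first=True)

print(encoded)
```

With `drop_first=True`, a category with k levels yields k-1 indicator columns, which avoids a redundant column that is fully determined by the others.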

## Model
- Algorithm: XGBoost Regressor
- Parameters: `n_estimators=200`, `max_depth=6`, `learning_rate=0.1`
- Train-test split: 80/20 (`random_state=42`)

## Results
- **R² Score**: 0.9973
- **RMSE**: 0.353
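
For readers who want to see what these two metrics measure, here is a small sketch that computes them from their standard definitions in plain Python. The helper names `rmse` and `r2` and the sample ratings are hypothetical; the notebook itself relies on scikit-learn's `mean_squared_error` and `r2_score`.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up ratings for illustration only.
y_true = [70, 80, 90]
y_pred = [71, 80, 89]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"R2: {r2(y_true, y_pred):.4f}")
```

RMSE is expressed in the same units as `overall`, so a value of 0.353 means predictions are typically off by about a third of a rating point.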

## Model Saved
- Format: Pickle (`xgb_fifa21_model.pkl`)
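
A saved model can be restored with the standard `pickle` module. The sketch below is one possible way to do it; `load_model` is a hypothetical helper, not part of the original notebook, and it assumes `xgboost` is installed in the environment that loads the file.

```python
import pickle

def load_model(path):
    """Unpickle and return the object stored at `path`."""
    with open(path, "rb") as file:
        return pickle.load(file)

# Usage sketch (assumes the pickle file exists and xgboost is importable):
# model = load_model(path_to_pkl)
# predictions = model.predict(X_new)  # X_new: same columns, in the same order, as in training
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.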


import pickle
from math import sqrt

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

# Load data
df = pd.read_csv('players_21.csv')

# Drop unnecessary columns
columns_to_drop = [
    'sofifa_id', 'player_url', 'short_name', 'long_name', 'dob',
    'club_name', 'league_name', 'team_position', 'team_jersey_number',
    'loaned_from', 'joined', 'contract_valid_until',
    'nation_position', 'nation_jersey_number',
    'real_face', 'body_type', 'player_tags', 'player_traits',
    'ls', 'st', 'rs', 'lw', 'lf', 'cf', 'rf', 'rw',
    'lam', 'cam', 'ram', 'lm', 'lcm', 'cm', 'rcm', 'rm',
    'lwb', 'ldm', 'cdm', 'rdm', 'rwb', 'lb', 'lcb', 'cb', 'rcb', 'rb'
]
df = df.drop(columns=columns_to_drop)

# Fill missing numerical data with median
df.fillna(df.median(numeric_only=True), inplace=True)

# One-hot encode categorical features
df = pd.get_dummies(df, columns=['preferred_foot', 'work_rate'], drop_first=True)

# Keep only outfield players: drop rows with no listed position or with GK among the positions
df = df[df['player_positions'].notna()]
df = df[~df['player_positions'].str.contains('GK')]

# Drop 'player_positions' after filtering
df.drop('player_positions', axis=1, inplace=True)

# Fill remaining missing values (if any)
df.fillna(0, inplace=True)

# Target variable
target = 'overall'
X = df.drop(columns=[target])
y = df[target]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Coerce every feature to numeric; text columns that were not encoded (e.g. nationality) become NaN
X_train = X_train.apply(pd.to_numeric, errors='coerce')
X_test = X_test.apply(pd.to_numeric, errors='coerce')

# Fill any resulting NaNs
X_train.fillna(0, inplace=True)
X_test.fillna(0, inplace=True)

# Train XGBoost model
model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=6, random_state=42)
model.fit(X_train, y_train)


# Predict
y_pred = model.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
rmse = sqrt(mean_squared_error(y_test, y_pred))

print(f"✅ R² Score: {r2:.4f}")
print(f"📉 RMSE: {rmse:.4f}")

# Save model using pickle
with open('xgb_fifa21_model.pkl', 'wb') as file:
    pickle.dump(model, file)

print("💾 Model saved as 'xgb_fifa21_model.pkl'")


✅ R² Score: 0.9973
📉 RMSE: 0.3530
💾 Model saved as 'xgb_fifa21_model.pkl'
