# Project Overview

This project involves an in-depth analysis of the FIFA 21 dataset, focusing on various aspects of player and club performance, financial metrics, and potential. The dataset includes detailed information about players' attributes, market values, wages, and other relevant data points. By leveraging this data, we aim to uncover insights into player performance, market trends, and financial efficiency within the world of football.

# Objective

The primary objective of this project is to analyze and visualize the FIFA 21 dataset to gain meaningful insights into the following areas:

1. **Player Market Value and Wages**: Understand the distribution of player market values and wages, identify key factors influencing these metrics, and highlight the most valuable and highest-paid players.
2. **Club Financial Efficiency**: Evaluate the cost efficiency of clubs by comparing their market value to the wages they pay, identifying the most and least cost-efficient clubs.
3. **Player Potential and Development**: Identify young players with high potential and analyze which clubs are most effective at developing young talent.
4. **Correlation Analysis**: Explore the relationships between various player attributes (e.g., overall rating, potential, skills) and their market value or wages.
5. **Comparative Analysis**: Compare different clubs and countries based on average player market value, wages, and other financial metrics to understand regional and club-level trends.

By achieving these objectives, we aim to provide valuable insights for football clubs, scouts, analysts, and enthusiasts, helping them make informed decisions based on data-driven analysis.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

# Load the dataset with low_memory set to False
fifa_data = pd.read_csv("fifa21_male2.csv", low_memory=False)

# Check the first few rows of the dataset to confirm it's loaded properly
print(fifa_data.head())

# Check data types and the presence of any missing values
fifa_data.info()  # info() prints its own summary and returns None

# Check for missing values in each column
print(fifa_data.isnull().sum())

# Drop columns with more than 30% missing values before imputing.
# thresh is the minimum number of non-null values a column needs to be kept.
min_non_null = int(np.ceil(0.7 * len(fifa_data)))
fifa_data = fifa_data.dropna(thresh=min_non_null, axis=1)

# Fill missing numeric values with the column mean
fifa_data = fifa_data.fillna(fifa_data.mean(numeric_only=True))

# Fill missing categorical columns with the mode (most frequent value)
categorical_cols = fifa_data.select_dtypes(include=['object']).columns
for col in categorical_cols:
    if fifa_data[col].isnull().any():
        fifa_data[col] = fifa_data[col].fillna(fifa_data[col].mode()[0])

# Final check for missing values after handling
missing_values = fifa_data.isnull().sum()
print("Remaining missing values:\n", missing_values[missing_values > 0])


In [None]:
# Fill missing categorical values with 'Unknown' using direct assignment (recommended method)
fifa_data['Club'] = fifa_data['Club'].fillna('Unknown')
fifa_data['Position'] = fifa_data['Position'].fillna('Unknown')

In [None]:
skill_cols = ['Volleys', 'Curve', 'Agility', 'Balance', 'Jumping', 'Vision', 'Composure']
for col in skill_cols:
    if col in fifa_data.columns:  # Ensure column exists before modifying
        fifa_data[col] = fifa_data[col].fillna(fifa_data[col].median())

In [None]:
def convert_currency(value):
    """
    Convert FIFA currency values (€100M, €500K) into numeric format.
    Example: '€100M' -> 100,000,000 | '€500K' -> 500,000
    """
    if isinstance(value, str):
        value = value.replace('€', '')  # Remove the euro sign
        if 'M' in value:
            return float(value.replace('M', '')) * 1e6  # Convert 'M' to millions
        elif 'K' in value:
            return float(value.replace('K', '')) * 1e3  # Convert 'K' to thousands
    try:
        return float(value)  # Convert numeric strings to float
    except (ValueError, TypeError):
        return np.nan  # Assign NaN if conversion fails (e.g. None or unparseable text)

# Convert financial columns to numeric format using direct assignment
financial_cols = ['Value', 'Wage', 'Release Clause']
for col in financial_cols:
    if col in fifa_data.columns:  # Ensure column exists before modifying
        fifa_data[col] = fifa_data[col].apply(convert_currency)

# Verify the changes
print(fifa_data[financial_cols].head())
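
The conversion above is applied one value at a time. As a rough sketch of an alternative, the same parsing can be done on a whole column at once with pandas string methods. The function name `convert_currency_vectorized` and the sample values are hypothetical, and the pattern assumes values look like `€100M`, `€500K`, or a plain number.

```python
import numpy as np
import pandas as pd

def convert_currency_vectorized(series: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of convert_currency."""
    # Strip the euro sign; non-strings (including NaN) become text that fails the pattern
    text = series.astype(str).str.replace("€", "", regex=False).str.strip()
    # Split each value into a numeric part and an optional M/K suffix
    parts = text.str.extract(r"^(?P<number>\d+(?:\.\d+)?)(?P<suffix>[MK]?)$")
    multiplier = parts["suffix"].map({"M": 1e6, "K": 1e3, "": 1.0})
    # Values that do not match the pattern end up as NaN
    return pd.to_numeric(parts["number"], errors="coerce") * multiplier

# Illustrative values only, not taken from the dataset
sample = pd.Series(["€100M", "€500K", "€750", "unknown", np.nan])
converted = convert_currency_vectorized(sample)
print(converted)
```

Anything that does not match the assumed pattern becomes NaN, which makes unexpected formats easy to spot with `isna()`.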

## Inference from the Distribution of Player Market Value (€ Millions)

The histogram reveals several key insights about the distribution of player market values:

- **Skewed Distribution**: The distribution is highly skewed to the right, with a majority of players having relatively low market values compared to a small number of top-tier players with very high values.
- **Peak around Lower Values**: There is a noticeable concentration of players with market values in the lower ranges, indicating that most players in the dataset have modest market values.
- **Tail of High-Value Players**: The right tail of the histogram suggests a small but significant number of players with exceptionally high market values, likely representing world-class or superstar players.
- **Market Value Range**: The dataset spans a wide range of market values, with a few players being valued in the tens or even hundreds of millions of euros, while the majority are valued much lower.

Overall, the graph highlights the unequal distribution of market values, where a small group of players drive the highest market value figures.
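
The skew described above can also be checked numerically. The sketch below uses a small set of made-up market values (not taken from the dataset) and compares the mean with the median, computes the sample skewness, and measures the share of total value held by the top 10% of players.

```python
import pandas as pd

# Hypothetical market values in € millions (illustrative, not from the dataset)
values = pd.Series([0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 3.0, 90.0])

top_n = max(1, len(values) // 10)  # number of players in the top 10%
summary = {
    "mean": values.mean(),
    "median": values.median(),
    "skewness": values.skew(),  # > 0 indicates a right-skewed distribution
    "top_10pct_share": values.nlargest(top_n).sum() / values.sum(),
}
print(summary)
```

A mean far above the median and a positive skewness are the numeric counterparts of the long right tail seen in the histogram.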


In [None]:
 

# Set a style for better-looking graphs
sns.set_style("darkgrid")

# Convert Market Value to Millions for better readability
fifa_data['Value_Millions'] = fifa_data['Value'] / 1e6  

# Interactive Histogram: Distribution of Player Market Values (in Millions)
fig = px.histogram(fifa_data, x='Value_Millions', nbins=50, title='Distribution of Player Market Value (€ Millions)', color_discrete_sequence=['blue'])

# Update layout for better readability
fig.update_layout(
    xaxis_title='Market Value (€ Millions)',
    yaxis_title='Number of Players',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


## 📉 KDE Plot of Player Market Value (€ Millions)

This kernel density estimate (KDE) plot provides a smoothed visualization of the distribution of player market values in millions of euros. Unlike the histogram, the KDE plot represents the probability density function of the market values, helping to better visualize the underlying distribution without the noise of individual bins.

### Key Features:
- **X-axis**: Represents the market value of players in millions of euros.
- **Y-axis**: Represents the density of players at each market value, showing where most players are concentrated.
- The shaded area highlights the overall distribution of market values, with peaks indicating areas where player values are most common.

## Inferences from the KDE Plot:

- **Skewed Distribution**: The plot clearly shows a right-skewed distribution, with the highest density concentrated on the left (lower market values) and a gradual decline as market values increase. This suggests that most players have lower market values, while only a few have exceptionally high values.
  
- **Main Peak**: The peak towards the lower end of the graph indicates that most players are valued in the lower ranges, likely reflecting the broader pool of players with modest market values.
  
- **Long Tail**: The tail on the right side of the plot suggests the presence of a small but significant number of high-value players. These players, likely representing top-tier talent, stretch the distribution to the higher end, though they are less common.
  
- **Smooth Representation**: The KDE plot provides a smoother, more continuous view of the distribution, allowing us to see trends in the data more clearly than in a histogram. This is especially useful for understanding the density of market values over a continuous range.
  
In summary, the KDE plot reinforces the idea that most players have relatively low market values, while a thin right tail of high-value players forms the outliers in the dataset.
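
To make the smoothing step concrete, here is a minimal sketch of a one-dimensional Gaussian KDE written with NumPy. The function name `gaussian_kde_1d`, the normal-reference bandwidth rule, and the log-normal sample data are all assumptions made for illustration; plotting libraries may use different defaults.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (one common default, assumed here)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # hypothetical right-skewed values
grid = np.linspace(values.min() - 5, values.max() + 5, 2000)
density = gaussian_kde_1d(values, grid)

# Trapezoidal integration: a density should integrate to roughly 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(area, 3))
```

Because each kernel is symmetric, the estimate assigns a little density to negative values even though market values cannot be negative. That is a known side effect of plain Gaussian KDE on skewed, bounded data.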


In [None]:
 

# px.density_contour draws a 2D density and needs both x and y,
# so the one-dimensional KDE is drawn with seaborn instead
plt.figure(figsize=(10, 5))
sns.kdeplot(data=fifa_data, x='Value_Millions', fill=True)
plt.title('KDE Plot of Player Market Value (€ Millions)', fontsize=14)
plt.xlabel('Market Value (€ Millions)', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.show()



The scatter plot shows the relationship between player age and market value. Market values span a wide range at every age, with no clear linear trend. Younger players tend to have higher market values, but valuable players appear in all age groups.

In [None]:
# Create an interactive scatter plot using Plotly
fig = px.scatter(fifa_data, x='Age', y='Value', 
                 title='Market Value vs. Age',
                 labels={'Age': 'Age', 'Value': 'Market Value (€)'},
                 opacity=0.6, color_discrete_sequence=['purple'])

# Show the plot
fig.show()


## 🏆 Top 10 Most Expensive Players in FIFA Dataset

The bar plot visualizes the top 10 most expensive players in the FIFA dataset based on their market value. Key observations include:

- **High Market Value**: The players listed have significantly high market values, indicating their importance and skill level in the game.
- **Age and Club Representation**: The printed table lists each player's age and club alongside market value, showing which clubs hold the most valuable players and how old those players are. The bar chart itself encodes only player name and market value.
- **Nationality**: The nationality column in the table indicates how globally diverse the most expensive players are.

Overall, this visualization provides a clear picture of the elite players in the FIFA dataset, showcasing their market value and associated attributes.


In [None]:
 

# Extract the Top 10 Most Expensive Players
top_10_expensive = fifa_data[['Name', 'Age', 'Club', 'Nationality', 'Value']].nlargest(10, 'Value')

# Display results
print("Top 10 Most Expensive Players in FIFA Dataset")
print(top_10_expensive)

# Interactive Bar Plot for Visualization
fig = px.bar(top_10_expensive, x='Value', y='Name', orientation='h', 
             title='Top 10 Most Expensive Players (€)', 
             labels={'Value': 'Market Value (€)', 'Name': 'Player Name'},
             color='Value', color_continuous_scale='Viridis')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Market Value (€)',
    yaxis_title='Player Name',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


## 🔥 Correlation Between Player Skills & Market Value (€ Millions)

- Overall rating and potential strongly influence market value.
- Pace, shooting, passing, and dribbling show weaker correlations.
- Defending and physical attributes have a moderate impact.
- Market value is mainly driven by overall skill and future potential.
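
Pearson correlation measures linear association, so it can understate a relationship that is monotonic but curved, as value versus rating often is. The sketch below contrasts Pearson with a rank-based (Spearman) correlation on made-up data. The exponential relationship is an assumption chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings and values: value grows exponentially with overall rating
ratings = pd.DataFrame({"OVA": [60, 65, 70, 75, 80, 85, 90]})
ratings["Value_Millions"] = 0.1 * np.exp((ratings["OVA"] - 60) / 5)

pearson = ratings.corr(method="pearson").loc["OVA", "Value_Millions"]
# Spearman correlation is the Pearson correlation of the ranks
spearman = ratings.rank().corr(method="pearson").loc["OVA", "Value_Millions"]
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

When the two coefficients differ noticeably, the relationship is probably non-linear, and a log transform of market value may be worth trying before reading the heatmap.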

In [None]:
 
 

# Ensure Value is converted to millions (if not done already)
fifa_data['Value_Millions'] = fifa_data['Value'] / 1e6  

# Select relevant numeric attributes for correlation analysis
skill_columns = ['OVA', 'POT', 'PAC', 'SHO', 'PAS', 'DRI', 'DEF', 'PHY', 'Value_Millions']

# Compute correlation matrix
correlation_matrix = fifa_data[skill_columns].corr()

# Convert correlation matrix to long format
correlation_long = correlation_matrix.reset_index().melt(id_vars='index')

# Rename columns for better readability
correlation_long.columns = ['Skill1', 'Skill2', 'Correlation']

# Interactive Heatmap: Correlation Between Player Skills & Market Value
fig = px.imshow(correlation_matrix, 
                labels=dict(x="Skills", y="Skills", color="Correlation"),
                x=skill_columns,
                y=skill_columns,
                color_continuous_scale='RdBu',  # Diverging scale, centered on zero by zmin/zmax
                zmin=-1, zmax=1,
                title='🔥 Correlation Between Player Skills & Market Value (€ Millions)')

# Show the plot
fig.show()


## 📈 Market Value vs. Overall Rating (in € Millions)

This scatter plot shows the relationship between player overall ratings (OVA) and their market value in millions of euros.

### Inferences:
- **Positive Correlation**: Higher overall ratings (OVA) tend to correspond with higher market values.
- **Outliers**: Some players have high market values despite lower ratings, likely due to factors like popularity or potential.
- **Concentration**: Most players with lower ratings have lower market values, with fewer players at higher ratings commanding higher values.
  
In short, overall rating is a strong predictor of market value, though other factors also play a role.


In [None]:
 

# Create an interactive scatter plot using Plotly
fig = px.scatter(fifa_data, x='OVA', y='Value_Millions', 
                 title='Market Value vs. Overall Rating (in € Millions)',
                 labels={'OVA': 'Overall Rating', 'Value_Millions': 'Market Value (€ Millions)'},
                 opacity=0.6, color_discrete_sequence=['red'])

# Show the plot
fig.show()


## 🚀 Top 10 Players Based on Potential

This bar plot highlights the top 10 players with the highest potential ratings in the dataset. Potential (POT) is a key factor in assessing a player's future growth and value.

### Inferences:
- **High Potential Players**: The top players by potential show significant future promise, with higher potential ratings indicating players expected to improve substantially.
- **Player Distribution**: Players from various clubs feature in the top 10, showing that high potential is not restricted to players from top clubs alone.
  
This plot emphasizes the players with the greatest expected growth and their potential to rise in market value.
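
Potential ratings are whole numbers, so several players often share the same value at the cutoff. As a small sketch with made-up players, the `keep` argument of `nlargest` controls what happens to such ties.

```python
import pandas as pd

# Hypothetical players; three of them share the third-highest potential
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "POT": [95, 93, 91, 91, 91],
})

# Default keep='first': ties at the cutoff are resolved by row order
top_3_first = players.nlargest(3, "POT")
# keep='all': every player tied with the last selected rating is retained
top_3_all = players.nlargest(3, "POT", keep="all")
print(len(top_3_first), len(top_3_all))
```

With `keep="all"` the result can contain more rows than requested, which avoids silently dropping players who are tied on potential.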


In [None]:
 

# Extract the Top 10 Players by Potential
top_10_potential = fifa_data[['Name', 'Age', 'Club', 'POT', 'Value']].nlargest(10, 'POT')

# Interactive Bar Plot: Top 10 Players by Potential
fig = px.bar(top_10_potential, x='POT', y='Name', orientation='h', 
             title='Top 10 Players Based on Potential', 
             labels={'POT': 'Potential Rating', 'Name': 'Player Name'},
             color='POT', color_continuous_scale='Viridis')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Potential Rating',
    yaxis_title='Player Name',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


## 📊 Market Value Comparison: Young vs. Experienced Players (€ Millions)

This box plot compares the market values of players categorized into three age groups: **Young (U23)**, **Prime (23-29)**, and **Experienced (30+)**.

### Inferences:
- **Young Players (U23)**: Generally have lower market values, with some outliers showing higher values, possibly due to high potential.
- **Prime Players (23-29)**: This group tends to have the highest median market value, reflecting peak performance and market demand.
- **Experienced Players (30+)**: Market values for this group are more varied, with some high-value outliers but a generally lower range than prime-aged players.

The plot highlights how age impacts market value, with prime-aged players typically commanding higher prices.
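
The age groups can also be summarized numerically. The sketch below bins made-up players with `pd.cut` and reports the count, median, and maximum value for each group. The bin edges mirror the U23 / 23-29 / 30+ split described above.

```python
import numpy as np
import pandas as pd

# Hypothetical players (illustrative values only)
players = pd.DataFrame({
    "Age": [19, 22, 23, 27, 29, 30, 34],
    "Value_Millions": [1.0, 12.0, 8.0, 40.0, 25.0, 10.0, 2.0],
})

# Right-closed bins: (0, 22], (22, 29], (29, inf)
players["Player_Category"] = pd.cut(
    players["Age"],
    bins=[0, 22, 29, np.inf],
    labels=["Young (U23)", "Prime (23-29)", "Experienced (30+)"],
)

group_summary = players.groupby("Player_Category", observed=True)["Value_Millions"].agg(
    ["count", "median", "max"]
)
print(group_summary)
```

Comparing medians rather than means keeps a handful of superstar valuations from dominating the group comparison.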


In [None]:
 

# Categorize players into Young (U23), Prime (23-29), & Experienced (30+)
fifa_data['Player_Category'] = fifa_data['Age'].apply(
    lambda x: 'Young (U23)' if x < 23 else ('Experienced (30+)' if x >= 30 else 'Prime (23-29)')
)

# Interactive Box Plot: Market Value of Young vs. Experienced Players
fig = px.box(fifa_data, x='Player_Category', y='Value_Millions', color='Player_Category',
             title='Market Value Comparison: Young vs. Experienced Players (€ Millions)',
             labels={'Player_Category': 'Player Category', 'Value_Millions': 'Market Value (€ Millions)'},
             color_discrete_sequence=px.colors.qualitative.Set2)

# Show the plot
fig.show()


## 💰 Wage Distribution of FIFA Players (€ Millions)

This histogram visualizes the distribution of FIFA players' weekly wages, expressed in millions of euros, to show how salaries are spread across the dataset.

### Inferences:
- **Skewed Distribution**: The wage distribution is right-skewed, with most players earning lower wages and only a small number receiving significantly higher salaries.
- **Peak at Lower Wages**: The majority of players fall within the lower wage brackets, while only a few players command high wages.
- **Outliers**: A few players, likely top stars, have much higher wages compared to the rest, stretching the distribution to the right.

Overall, the plot shows that most players earn modest wages, with a small group of high earners driving the higher end of the distribution.


In [None]:
 

# Convert Wage to Millions for better readability
fifa_data['Wage_Millions'] = fifa_data['Wage'] / 1e6

# Interactive Histogram: Distribution of Player Wages
fig = px.histogram(fifa_data, x='Wage_Millions', nbins=50, title='💰 Wage Distribution of FIFA Players (€ Millions)', color_discrete_sequence=['blue'])

# Update layout for better readability
fig.update_layout(
    xaxis_title='Wage (€ Millions)',
    yaxis_title='Number of Players',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


## 💰 Wage vs. Overall Rating (€ Millions per Week)

This scatter plot shows the relationship between player overall ratings (OVA) and their weekly wages in millions of euros.

### Inferences:
- **Positive Correlation**: There’s a general upward trend, indicating that players with higher overall ratings tend to have higher wages.
- **Outliers**: Some players with lower ratings have higher wages, which could be due to factors like popularity, marketability, or club status.
- **Concentration**: Most players with mid-range ratings earn moderate wages, while the very high-rated players show a clear wage increase.

The plot confirms that overall rating is a key factor in determining player wages, though other external factors can influence salary.


In [None]:
 

# Create an interactive scatter plot using Plotly
fig = px.scatter(fifa_data, x='OVA', y='Wage_Millions', 
                 title='Wage vs. Overall Rating (€ Millions per Week)',
                 labels={'OVA': 'Overall Rating', 'Wage_Millions': 'Wage (€ Millions)'},
                 opacity=0.6, color_discrete_sequence=['green'])

# Show the plot
fig.show()


### Inference from the Graph:

- The bar plot visualizes the top 20 football clubs based on the average market value of their players in millions of euros.
- **Top-performing clubs**: The clubs at the top of the chart are likely the most prestigious and financially powerful in the world, consistently attracting high-value players.
- **Market dominance**: Clubs with larger average market values suggest a stronger financial position, possibly influenced by success in both domestic and international competitions.
- **Potential trends**: Clubs with a higher number of elite, star players (e.g., forwards or playmakers) or recent transfer successes may show higher average player values.
- **Geographical and league impact**: The distribution of clubs might hint at the dominance of certain leagues, such as the English Premier League, La Liga, or the Bundesliga, in terms of market value.

This graph provides a clear comparison of club wealth in the football world as determined by the market value of players, which can be an indicator of future performance, financial strength, and competitive advantage in the global market.


In [None]:
  

# Group data by club and calculate average market value per player
club_market_value = fifa_data.groupby('Club')['Value_Millions'].mean().reset_index()

# Sort clubs by highest average player value
top_20_clubs_value = club_market_value.sort_values(by='Value_Millions', ascending=False).head(20)

# Interactive Bar Plot: Top 20 Clubs by Avg. Market Value per Player
fig = px.bar(top_20_clubs_value, x='Value_Millions', y='Club', orientation='h',
             title='Top 20 Clubs by Average Player Market Value (€ Millions)',
             labels={'Value_Millions': 'Average Market Value (€ Millions)', 'Club': 'Club'},
             color='Value_Millions', color_continuous_scale='Reds')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Average Market Value (€ Millions)',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### **Inference:**
- The graph shows the top 20 football clubs based on the average weekly wage per player.
- Clubs at the top pay significantly higher wages, indicating financial strength and the ability to attract star players.
- A higher average wage reflects a club’s ambition to stay competitive in the transfer market and maintain top-tier talent.


In [None]:
# Group data by club and calculate average wage per player
club_wage = fifa_data.groupby('Club')['Wage_Millions'].mean().reset_index()

# Sort clubs by highest average wage
top_20_clubs_wage = club_wage.sort_values(by='Wage_Millions', ascending=False).head(20)

# Interactive Bar Plot: Top 20 Clubs by Avg. Wage per Player
fig = go.Figure(data=[
    go.Bar(x=top_20_clubs_wage['Wage_Millions'], y=top_20_clubs_wage['Club'], orientation='h', marker=dict(color='blue'))
])

fig.update_layout(
    title='Top 20 Clubs by Average Wage per Player (€ Millions per Week)',
    xaxis_title='Average Wage (€ Millions per Week)',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

fig.show()


### **Inference:**
- The scatter plot shows the relationship between the average player market value and average wage per player for each club.
- Clubs with higher player market values tend to also have higher wages, reflecting their ability to attract and retain expensive talent.
- The correlation between market value and wage suggests that clubs investing in high-value players also prioritize offering competitive wages.


In [None]:
# Merge market value and wage data
club_financials = pd.merge(club_market_value, club_wage, on='Club')

# Interactive Scatter Plot: Wage vs. Market Value by Club
fig = px.scatter(club_financials, x='Value_Millions', y='Wage_Millions', 
                 title='Club Wage vs. Market Value (€ Millions)',
                 labels={'Value_Millions': 'Avg. Player Market Value (€ Millions)', 
                         'Wage_Millions': 'Avg. Player Wage (€ Millions per Week)'},
                 opacity=0.7, color='Club', hover_name='Club')

# Show the plot
fig.show()


### **Inference:**
- The bar plot highlights the top 10 most cost-effective football clubs based on the ratio of market value to wage.
- Clubs with higher cost efficiency achieve greater market value relative to the wage they pay, reflecting smart investment in talent.
- These clubs maximize value for money, likely identifying underpriced players with high potential, making them more competitive without overspending.
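
A ratio such as market value divided by wage breaks down when a club's average wage is zero. The sketch below, using made-up clubs, computes the ratio with that case treated as missing and also records squad size so that very small squads can be spotted. The column names `avg_value`, `avg_wage`, and `squad_size` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical player rows (values and weekly wages in euros)
players = pd.DataFrame({
    "Club": ["Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "Value": [20e6, 10e6, 5e6, 3e6, 1e6],
    "Wage": [100e3, 50e3, 10e3, 6e3, 0.0],
})

clubs = players.groupby("Club").agg(
    avg_value=("Value", "mean"),
    avg_wage=("Wage", "mean"),
    squad_size=("Value", "size"),
)
# Treat a zero average wage as missing so the ratio is NaN rather than infinite
clubs["Cost_Efficiency"] = clubs["avg_value"] / clubs["avg_wage"].replace(0, np.nan)
ranked = clubs.dropna(subset=["Cost_Efficiency"]).sort_values("Cost_Efficiency", ascending=False)
print(ranked)
```

In this toy data, Gamma is excluded because its only player has no wage. Without the guard it would rank first with an infinite score.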


In [None]:
  

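# Create a cost-effectiveness metric (Market Value / Wage)
# A zero average wage is treated as missing so the ratio cannot become infinite
club_financials['Cost_Efficiency'] = (
    club_financials['Value_Millions'] / club_financials['Wage_Millions'].replace(0, np.nan)
)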

# Extract the Top 10 Most Cost-Effective Clubs
top_10_cost_efficient = club_financials.nlargest(10, 'Cost_Efficiency')

# Interactive Bar Plot: Most Cost-Effective Clubs
fig = px.bar(top_10_cost_efficient, x='Cost_Efficiency', y='Club', orientation='h',
             title='Most Cost-Effective Clubs (Market Value per €1M Wage)',
             labels={'Cost_Efficiency': 'Cost Efficiency Score', 'Club': 'Club'},
             color='Cost_Efficiency', color_continuous_scale='Greens')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Cost Efficiency Score',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### **Inference:**
- The code creates a wage efficiency metric (weekly wage divided by overall rating) to flag players who may be overpaid or underpaid for their rating.
- **Overpaid Players**: These players have the highest wage per rating point, indicating a potential mismatch between their pay and performance.
- **Underpaid Players**: These players have the lowest wage per rating point. Because wages vary far more than ratings, this list tends to surface the lowest earners, so their overall ratings should be checked before treating them as good value.
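
One alternative way to judge pay against performance is to compare each player's wage with the typical wage of players at the same rating. This is a different technique from the ratio used in this notebook, sketched here on made-up data. The column names `Peer_Median_Wage` and `Wage_vs_Peers` are hypothetical.

```python
import pandas as pd

# Hypothetical players (weekly wages in euros)
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "OVA": [80, 80, 80, 70, 70, 70],
    "Wage": [50_000, 60_000, 150_000, 5_000, 10_000, 12_000],
})

# Median wage among players with the same overall rating
players["Peer_Median_Wage"] = players.groupby("OVA")["Wage"].transform("median")
# Ratio > 1: paid above peers at the same rating; ratio < 1: paid below them
players["Wage_vs_Peers"] = players["Wage"] / players["Peer_Median_Wage"]

most_overpaid = players.nlargest(1, "Wage_vs_Peers")["Name"].iloc[0]
most_underpaid = players.nsmallest(1, "Wage_vs_Peers")["Name"].iloc[0]
print(most_overpaid, most_underpaid)
```

Comparing within a rating keeps highly rated players from appearing overpaid simply because elite wages are high.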


In [None]:
# Express the weekly wage in millions of euros
fifa_data['Wage_Millions'] = fifa_data['Wage'] / 1e6

# Create Wage Efficiency Metric: (Wage in Millions / Overall Rating)
fifa_data['Wage_Efficiency'] = fifa_data['Wage_Millions'] / fifa_data['OVA']

# Get Overpaid Players (highest wage per rating point)
overpaid_players = fifa_data.nlargest(10, 'Wage_Efficiency')[['Name', 'Club', 'OVA', 'Wage_Millions', 'Wage_Efficiency']]

# Get Underpaid Players (lowest wage per rating point)
underpaid_players = fifa_data.nsmallest(10, 'Wage_Efficiency')[['Name', 'Club', 'OVA', 'Wage_Millions', 'Wage_Efficiency']]

# Display Results
print("Overpaid Players (Highest Wage per Rating Point)")
print(overpaid_players)

print("\nUnderpaid Players (Lowest Wage per Rating Point)")
print(underpaid_players)


### **Inference:**
- The bar plot illustrates the top 10 highest-paying football clubs based on the average weekly wage per player.
- These clubs are paying the highest wages to their players, showcasing their financial strength and ability to attract top-tier talent.
- The graph suggests that financially dominant clubs tend to offer larger contracts to retain competitive players.


In [None]:
# Group by Club and Calculate Average Wage
club_wages = fifa_data.groupby('Club')['Wage_Millions'].mean().reset_index()

# Top 10 Highest-Paying Clubs
top_paying_clubs = club_wages.nlargest(10, 'Wage_Millions')

# Interactive Bar Plot using Plotly
fig = px.bar(top_paying_clubs, x='Wage_Millions', y='Club', orientation='h',
             title='🏆 Top 10 Highest-Paying Clubs (€ Millions per Week)',
             labels={'Wage_Millions': 'Average Wage (€ Millions per Week)', 'Club': 'Club'},
             color='Wage_Millions', color_continuous_scale='Blues')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Average Wage (€ Millions per Week)',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### **Inference:**
- The bar plot displays the 10 least cost-effective football clubs, based on the lowest market value relative to the wage paid.
- These clubs are considered the least cost-efficient, as they are paying higher wages for lower market value, suggesting possible overpayment for players.
- Clubs in this group may need to reconsider their wage allocation strategies to improve financial sustainability and player value.


In [None]:
# 10 Worst Cost-Efficient Clubs
worst_cost_efficient_clubs = club_financials.nsmallest(10, 'Cost_Efficiency')

# Visualize using Plotly
fig = px.bar(worst_cost_efficient_clubs, x='Cost_Efficiency', y='Club', orientation='h',
             title='Most Overpaying Clubs (Lowest Market Value per €1M Wage)',
             labels={'Cost_Efficiency': 'Cost Efficiency Score', 'Club': 'Club'},
             color='Cost_Efficiency', color_continuous_scale='Reds')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Cost Efficiency Score',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### **Inference:**
- The table displays players who have high potential (POT > 85), are underpriced (Value < €10 million), and are young (Age < 23).
- These players are considered undervalued as they possess a lot of potential but are not yet commanding a high market value, making them great opportunities for clubs looking for talent at a lower price.
- The sorted list highlights the most promising players with a potential for growth, likely making them attractive targets for clubs seeking long-term investments.
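
The three conditions above only work if the value threshold uses the same unit as the column it is compared against. The sketch below applies the filter to made-up players, with the threshold written in euros. The constant name `MAX_VALUE_EUR` is hypothetical.

```python
import pandas as pd

# Hypothetical players; Value is in euros, as produced by the currency conversion
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [19, 21, 22, 25],
    "POT": [88, 90, 84, 91],
    "Value": [4.5e6, 25e6, 3e6, 8e6],
})

MAX_VALUE_EUR = 10_000_000  # €10 million threshold, expressed in euros

undervalued = players[
    (players["POT"] > 85) & (players["Value"] < MAX_VALUE_EUR) & (players["Age"] < 23)
].sort_values("POT", ascending=False)
print(undervalued)
```

Only player A passes all three conditions: B is too expensive, C falls short on potential, and D is too old.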


In [None]:
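# Filter players with high potential, low value (under €10 million), and young age
# 'Value' is stored in euros after conversion, so the threshold is 10e6 rather than 10
undervalued_players = fifa_data[(fifa_data['POT'] > 85) &
                                (fifa_data['Value'] < 10e6) &
                                (fifa_data['Age'] < 23)]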

# Sort by highest potential
undervalued_players = undervalued_players.sort_values(by='POT', ascending=False)

# Show top 10 undervalued players
undervalued_players[['Name', 'Age', 'Club', 'Position', 'OVA', 'POT', 'Value', 'Wage']].head(10)


### **Inference:**
- The scatter plot compares all players based on their market value and potential rating, with undervalued players (high potential, low market value) highlighted in red.
- These undervalued players are positioned in the lower market value range but have high potential ratings, making them prime targets for clubs looking for bargains.
- The graph visually identifies players who are underpriced relative to their potential, indicating potential hidden gems for teams willing to invest in future growth.


In [None]:
# Assuming 'Value' is the original column for market value
fifa_data['Value_Millions'] = fifa_data['Value'] / 1e6  # Converting market value to millions

# Define the undervalued players: High potential but low value
undervalued_players = fifa_data[(fifa_data['POT'] > 85) &  # 'POT' corresponds to Potential
                                (fifa_data['Value_Millions'] < 10) & 
                                (fifa_data['Age'] < 23)]

# Interactive Scatter Plot using Plotly
fig = px.scatter(fifa_data, x='Value_Millions', y='POT', 
                 title='Undervalued Players: High Potential but Low Market Value',
                 labels={'Value_Millions': 'Market Value (€ Millions)', 'POT': 'Potential Rating'},
                 opacity=0.3, color_discrete_sequence=['blue'], hover_data=['Name'])

# Add undervalued players in red
fig.add_trace(go.Scatter(x=undervalued_players['Value_Millions'], y=undervalued_players['POT'],
                         mode='markers', name='Undervalued Players', 
                         marker=dict(color='red', size=10),
                         text=undervalued_players['Name']))

# Update layout for better readability
fig.update_layout(
    xaxis_title='Market Value (€ Millions)',
    yaxis_title='Potential Rating',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### **Inference:**
- The table highlights the top 10 players who are either free agents or have very low wages (below €5,000 per week).
- These players are available at a bargain price, making them excellent opportunities for clubs looking to strengthen their squads without a large financial investment.
- The players listed may still possess significant potential (POT) and a solid overall rating (OVA), making them attractive for clubs looking for cost-effective signings.


In [None]:
# Convert Wage to Millions (as done before)
fifa_data['Wage_Millions'] = fifa_data['Wage'] / 1e6  # Converting wage to millions

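# Check for free agents or cheap players (wage below €5,000 per week)
# Missing 'Contract' values were filled during cleaning, so isnull() is only a fallback here
free_or_cheap_players = fifa_data[(fifa_data['Wage_Millions'] < 0.005) | (fifa_data['Contract'].isnull())]

# Show the 10 highest-rated free agents or cheap players
free_or_cheap_players.sort_values(by='OVA', ascending=False)[
    ['Name', 'Age', 'Club', 'Position', 'OVA', 'POT', 'Value', 'Wage_Millions']
].head(10)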


### Inference: Top 10 Countries with Highest Average Wage in FIFA Dataset

The analysis provides insights into the countries with the highest average weekly wages in the FIFA dataset, adjusted to millions for better comparison. The following key observations can be made:

1. **High Average Wages**: Wages are weekly and expressed in millions of euros, so an average of 0.05 corresponds to €50,000 per week. The countries at the top of the list have the highest-paid players on average.
2. **Distinct Regional Patterns**: Countries with strong football leagues or a high concentration of top-tier football talent often feature prominently at the top.
3. **Top 10 Countries**: The bar chart shows which ten countries have the highest average player wages, which makes it easy to compare regions such as Europe and South America.

The visualization highlights the football powerhouses where players are compensated handsomely, demonstrating the financial disparities between countries and regions in the world of football.
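A plain per-country mean can be dominated by nations with only one or two well-paid players in the dataset. The sketch below shows one way to check for this, using made-up toy values and a hypothetical `MIN_PLAYERS` threshold (neither comes from the notebook or the FIFA 21 data): it computes the mean and the player count together and ranks only countries with enough players.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the FIFA 21 dataset)
toy = pd.DataFrame({
    "Nationality": ["A", "A", "A", "B", "C", "C"],
    "Wage": [10_000, 20_000, 30_000, 90_000, 40_000, 60_000],
})

# Mean weekly wage and number of players per country in one pass
country_stats = (
    toy.groupby("Nationality")["Wage"]
       .agg(mean_wage="mean", players="count")
       .reset_index()
)

MIN_PLAYERS = 2  # hypothetical cutoff; tune it to the real dataset

# Keep only countries with enough players, then rank by mean wage
ranked = (
    country_stats[country_stats["players"] >= MIN_PLAYERS]
    .sort_values("mean_wage", ascending=False)
)
print(ranked)
```

In this toy example country B has the highest mean but only one player, so it drops out of the ranking once the threshold is applied.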


In [None]:
# Convert Wage to Millions (since the original 'Wage' column is not in millions)
fifa_data['Wage_Millions'] = fifa_data['Wage'] / 1e6  # Converting wage to millions

# Group data by Nationality (Country) and calculate average wage
country_wages = fifa_data.groupby('Nationality')['Wage_Millions'].mean().reset_index()

# Sort by highest average wage and get top 10 countries
top_10_countries_by_wage = country_wages.sort_values(by='Wage_Millions', ascending=False).head(10)

# Interactive Bar Plot using Plotly
fig = px.bar(top_10_countries_by_wage, x='Wage_Millions', y='Nationality', orientation='h',
             title='Top 10 Countries with Highest Average Wage (€ Millions per Week)',
             labels={'Wage_Millions': 'Average Wage (€ Millions per Week)', 'Nationality': 'Country'},
             color='Wage_Millions', color_continuous_scale='Blues')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Average Wage (€ Millions per Week)',
    yaxis_title='Country',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### Inference: Top 15 Clubs by Wage Disparity in FIFA Dataset

This analysis highlights the clubs with the highest wage disparity, measured by the standard deviation of weekly wages. The following insights can be drawn:

1. **Wage Disparity**: A higher standard deviation of wages within a club indicates wide variation in player salaries. This could reflect star players on high wages alongside younger or less experienced players who earn significantly less.
2. **Top 15 Clubs**: The visualization clearly shows the clubs with the most significant wage disparities, with some clubs showing greater differences between the highest and lowest paid players.
3. **Implications of Wage Disparity**: A high wage disparity could indicate an imbalance in player salary structures, which might reflect the financial strategies of the club, such as paying premium wages to top players while offering lower wages to other squad members.

The bar chart compares wage disparities across clubs, showing which clubs have the widest spread in pay across their squads.
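To make the disparity measure concrete, here is a small sketch on toy data (the clubs and wages are invented for illustration). It relies on two pandas defaults: `std()` computes the sample standard deviation (`ddof=1`), and a club with a single player therefore gets `NaN`, not zero.

```python
import pandas as pd

# Toy data (hypothetical clubs and weekly wages in euros)
toy = pd.DataFrame({
    "Club": ["X", "X", "X", "Y", "Y", "Y", "Z"],
    "Wage": [10_000, 10_000, 100_000, 40_000, 40_000, 40_000, 50_000],
})

# Mean, sample standard deviation and squad size per club
disparity = toy.groupby("Club")["Wage"].agg(["mean", "std", "count"])
print(disparity)

# Club X and club Y have the same mean wage, but X has one star earner,
# so only X shows a large standard deviation.
# Club Z has a single player, so its standard deviation is NaN.
```

Since `sort_values` places `NaN` last by default, single-player clubs would not appear in a top-15 ranking sorted in descending order.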


In [None]:
# Convert Wage to Millions (if necessary)
fifa_data['Wage_Millions'] = fifa_data['Wage'] / 1e6  # Converting wage to millions

# Group data by Club and calculate standard deviation of wage
club_wage_disparity = fifa_data.groupby('Club')['Wage_Millions'].std().reset_index()

# Sort by highest wage disparity (standard deviation)
top_15_clubs_by_wage_disparity = club_wage_disparity.sort_values(by='Wage_Millions', ascending=False).head(15)

# Interactive Bar Plot using Plotly
fig = px.bar(top_15_clubs_by_wage_disparity, x='Wage_Millions', y='Club', orientation='h',
             title='Top 15 Clubs by Wage Disparity (€ Millions per Week)',
             labels={'Wage_Millions': 'Wage Disparity (Standard Deviation) (€ Millions per Week)', 'Club': 'Club'},
             color='Wage_Millions', color_continuous_scale='Reds')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Wage Disparity (Standard Deviation) (€ Millions per Week)',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()


### Inference: Cost Efficiency of Clubs (Market Value per €1M Wage)

The analysis evaluates the cost efficiency of clubs by comparing their market value to the wages they pay. A higher cost efficiency indicates that a club gets more value (market value) per €1 million spent on wages. Key takeaways:

1. **Most Cost-Efficient Clubs**: The clubs at the top of the list (green bars) get the most market value for every million spent on player wages. These clubs are likely utilizing their budget wisely by acquiring players who provide high market value relative to their wage demands.
2. **Least Cost-Efficient Clubs**: The clubs at the bottom (red bars) exhibit lower cost efficiency, meaning they are spending more on wages for players whose market value doesn’t align proportionally. These clubs may be overpaying for certain players or not getting the return they expect from their wage investment.
3. **Implications for Financial Strategy**: Clubs that score high on cost efficiency may have better financial management and a more sustainable approach to player acquisitions. On the other hand, clubs with low cost efficiency might need to reassess their wage structures or transfer strategies to avoid financial strain.

The visualizations provide a clear distinction between clubs that manage their financial resources well and those that may be overspending in relation to the value they get from their players.
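The cell that builds the `Cost_Efficiency` column is not part of this section, so the following is only a sketch of how such a score could be computed, assuming it is defined as a club's total market value divided by its total wages (both in millions of euros). The toy values are invented, and clubs with a zero wage bill are dropped to avoid division by zero.

```python
import pandas as pd

# Toy data (hypothetical values in millions of euros; wages are weekly)
toy = pd.DataFrame({
    "Club": ["X", "X", "Y", "Y", "Z"],
    "Value_Millions": [30.0, 10.0, 20.0, 20.0, 5.0],
    "Wage_Millions": [0.10, 0.10, 0.05, 0.05, 0.0],
})

# Total market value and total wage bill per club
club_totals = toy.groupby("Club")[["Value_Millions", "Wage_Millions"]].sum()

# Assumed definition: market value per €1M of wages.
# Clubs with no recorded wages are excluded before dividing.
efficiency = (
    club_totals[club_totals["Wage_Millions"] > 0]
    .assign(Cost_Efficiency=lambda d: d["Value_Millions"] / d["Wage_Millions"])
    .sort_values("Cost_Efficiency", ascending=False)
    .reset_index()
)
print(efficiency)
```

Because market value is a one-off figure and wages are weekly, the score is best read as a relative ranking between clubs and not as an absolute return on spending.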


In [None]:
# Visualize Most Cost-Efficient Clubs with adjusted settings
fig = px.bar(top_10_cost_efficient, x='Cost_Efficiency', y='Club', orientation='h',
             title='Most Cost-Efficient Clubs (Higher Market Value per €1M Wage)',
             labels={'Cost_Efficiency': 'Cost Efficiency Score', 'Club': 'Club'},
             color='Cost_Efficiency', color_continuous_scale='Greens')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Cost Efficiency Score',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12,
    yaxis=dict(tickangle=0)  # Adjust rotation for better readability
)

# Show the plot
fig.show()

# Visualize Least Cost-Efficient Clubs with adjusted settings
fig = px.bar(worst_cost_efficient_clubs, x='Cost_Efficiency', y='Club', orientation='h',
             title='Least Cost-Efficient Clubs (Lower Market Value per €1M Wage)',
             labels={'Cost_Efficiency': 'Cost Efficiency Score', 'Club': 'Club'},
             color='Cost_Efficiency', color_continuous_scale='Reds')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Cost Efficiency Score',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12,
    yaxis=dict(tickangle=0)  # Adjust rotation for better readability
)

# Show the plot
fig.show()


### Inference: Best Clubs for Developing Young Players (Under 23 with High Potential)

This analysis ranks clubs by the number of young, high-potential players (under 23 years old with potential over 85) in their current squads. Because the dataset is a single snapshot, the count shows where these players are now and not where they were trained, so it is a proxy for youth development. Here are the key takeaways:

1. **Youth Development Focus**: The clubs at the top of the list hold the most young talent with high potential. These clubs likely invest heavily in youth scouting, training, or recruitment, and give young players opportunities to develop.
2. **Top 10 Clubs**: The visualization shows the ten clubs with the most high-potential young players, an indicator of their commitment to youth.
3. **Implications for Future Talent**: Clubs that feature prominently in this analysis may have a competitive advantage in the future, as they are cultivating the next generation of elite players. These clubs are likely to see long-term benefits from their youth systems, both in terms of player performance and financial gains from potential transfers.

The bar chart highlights the clubs with the largest concentration of young, high-potential talent, reflecting their investment in the future of the sport.


In [None]:
# Filter players under 23 years old and with high potential (POT > 85)
young_high_potential_players = fifa_data[(fifa_data['Age'] < 23) & (fifa_data['POT'] > 85)]

# Group by Club and count the number of high-potential young players
club_young_player_development = young_high_potential_players.groupby('Club').size().reset_index(name='Young_Development_Count')

# Sort by the number of young, high-potential players (descending)
top_clubs_for_young_players = club_young_player_development.sort_values(by='Young_Development_Count', ascending=False).head(10)

# Interactive Bar Plot using Plotly
fig = px.bar(top_clubs_for_young_players, x='Young_Development_Count', y='Club', orientation='h',
             title='Best Clubs for Developing Young Players (Under 23 with High Potential)',
             labels={'Young_Development_Count': 'Number of High Potential Young Players', 'Club': 'Club'},
             color='Young_Development_Count', color_continuous_scale='Blues')

# Update layout for better readability
fig.update_layout(
    xaxis_title='Number of High Potential Young Players',
    yaxis_title='Club',
    title_font_size=14,
    xaxis_title_font_size=12,
    yaxis_title_font_size=12
)

# Show the plot
fig.show()
