# Who is the GOAT of football history?

## 1.Introduction

Football is the most popular sport in the world, and there has always been debate over who its GOAT (Greatest Of All Time) is. In 2022, when Lionel Messi led Argentina to the World Cup title, it felt as if the question "Who is the GOAT of football history?" had finally received its best answer. As a devoted fan of both Messi and FC Barcelona, I have every reason to put Messi at the very top in my heart.

However, for many older fans, the legendary status of Pelé and Maradona remains untouchable. And among some die-hard Cristiano Ronaldo supporters, there's still a strong reluctance to admit that Messi has not only won the rivalry between the two but also claimed the crown as the king of football.

Setting aside my identity as a fan, I realized that, as a student of economics and Python, I could approach this question through data analysis. By analyzing the statistics of these legendary players, perhaps I can find a more objective answer to who truly deserves the title of GOAT in football history.

In this case study, we will explore the process of selecting the GOAT of football history using football statistics from the Transfermarkt portal.

Analyzing the data with Python can help reduce subjective bias. For example, older Argentinians may have a special emotional attachment to Maradona because he beat England, Argentina's opponent in the Falklands War, at the 1986 World Cup. Some marvel at Messi's elegant dribbling, while others revel in Ronaldo's thunderous strikes. Similarly, some people idolize past players while dismissing modern ones, or argue that today's tactical evolution renders historical strategies irrelevant. Objective data lets us set these preferences aside; once the numbers are on the table, there is far less room for "stubborn arguments."

In this report, I focus on attacking players, because defensive contributions are difficult to quantify and recognition in football tends to favor offensive players. I evaluate players on several metrics, including goals, assists, scoring efficiency, and career length, and combine them into a detailed final ranking.

## 2.Data Cleaning

First, we import the data, which comes from a Kaggle dataset built on Transfermarkt statistics.

In [3]:
import os

# Folder holding the Kaggle CSV files; adjust this to your own machine.
data_dir = r"C:\Users\yyh\Desktop\quantecon-notebooks-datascience-master\python2\final"
url_players = os.path.join(data_dir, "players.csv")
url_appearance = os.path.join(data_dir, "appearances.csv")
url_games = os.path.join(data_dir, "games.csv")
url_competitions = os.path.join(data_dir, "competitions.csv")
url_player_valuations = os.path.join(data_dir, "player_valuations.csv")

In [4]:
import pandas as pd
players_df=pd.read_csv(url_players)
appearances_df=pd.read_csv(url_appearance)
games_df=pd.read_csv(url_games)
competitions_df=pd.read_csv(url_competitions)
player_valuations_df=pd.read_csv(url_player_valuations)

Several tables share column names (for example, both appearances and games contain competition_id), so we preprocess the data before merging to avoid duplicated, suffixed columns.
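As a minimal illustration on made-up toy tables (not the Kaggle data), this is what happens when two tables share a non-key column: pandas keeps both copies and renames them with _x/_y suffixes. The same mechanism explains the market_value_in_eur_x and market_value_in_eur_y columns in the merged preview further down. Dropping the redundant column from one side first, as the function below does with competition_id, leaves a single clean column.

```python
import pandas as pd

# Hypothetical toy tables: both carry a competition_id column.
toy_appearances = pd.DataFrame({
    "game_id": [1, 2],
    "player_id": [10, 10],
    "competition_id": ["ES1", "CL"],
})
toy_games = pd.DataFrame({
    "game_id": [1, 2],
    "competition_id": ["ES1", "CL"],
    "season": [2015, 2015],
})

# Merging without dropping the shared column yields suffixed duplicates.
naive = toy_appearances.merge(toy_games, on="game_id", how="left")
print(sorted(naive.columns))
# ['competition_id_x', 'competition_id_y', 'game_id', 'player_id', 'season']

# Dropping it from the left table first leaves one competition_id column.
clean = toy_appearances.drop(columns=["competition_id"]).merge(
    toy_games, on="game_id", how="left"
)
print(sorted(clean.columns))
# ['competition_id', 'game_id', 'player_id', 'season']
```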

In [5]:
def preprocess_data(appearances_df, players_df, games_df, competitions_df, player_valuations_df):
    # 1. Drop competition_id from appearances; games_df supplies it in step 2,
    #    so keeping both would create competition_id_x / competition_id_y.
    appearances_df_clean = appearances_df.drop(columns=['competition_id'], errors='ignore')

    # 2. Attach competition_id and season from the match data.
    player_stats = appearances_df_clean.merge(
        games_df[['game_id', 'competition_id', 'season']],
        on='game_id',
        how='left'
    )

    # 3. Attach the competition name, type and country.
    player_stats = player_stats.merge(
        competitions_df[['competition_id', 'name', 'type', 'country_name']],
        on='competition_id',
        how='left',
        suffixes=('', '_comp')
    )

    # 4. Attach basic player information (position, citizenship, etc.).
    player_stats = player_stats.merge(
        players_df,
        on='player_id',
        how='left'
    )

    # 5. Keep only the most recent valuation for each player.
    latest_valuation = (
        player_valuations_df.sort_values('date')
        .drop_duplicates(subset='player_id', keep='last')
        [['player_id', 'market_value_in_eur']]
    )

    # 6. Merge the one-row-per-player valuations. players_df also has a
    #    market_value_in_eur column, so pandas adds _x / _y suffixes.
    player_stats = player_stats.merge(
        latest_valuation,
        on='player_id',
        how='left'
    )

    # 7. Drop rows missing any of the key statistics.
    player_stats = player_stats.dropna(subset=['goals', 'assists', 'minutes_played'])

    # 8. New field: goals per minute played.
    player_stats['goal_per_min'] = player_stats['goals'] / player_stats['minutes_played']

    return player_stats

In [6]:
attacking_players = preprocess_data(
    appearances_df, 
    players_df, 
    games_df, 
    competitions_df, 
    player_valuations_df
)

We now have the cleaned, merged data in attacking_players: one row per player appearance, combining match, competition, player, and valuation information. Despite its name, this table has not yet been filtered by position.
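Because every step in preprocess_data is a left merge, the number of appearance rows should stay the same. If a right-hand key were accidentally duplicated (say, two rows for the same game_id), rows would silently multiply. pandas' merge accepts a validate argument that raises a MergeError in that case. A minimal sketch on hypothetical toy tables:

```python
import pandas as pd

toy_appearances = pd.DataFrame({"game_id": [1, 2, 2], "goals": [1, 0, 2]})
# Hypothetical games table with an accidental duplicate row for game_id 2.
toy_games = pd.DataFrame({"game_id": [1, 2, 2], "season": [2015, 2015, 2015]})

try:
    toy_appearances.merge(toy_games, on="game_id", how="left", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError as err:
    duplicate_detected = True
    print("Merge rejected:", err)

# After deduplicating the right-hand table, the merge passes and keeps
# exactly one row per appearance.
merged = toy_appearances.merge(
    toy_games.drop_duplicates("game_id"), on="game_id", how="left", validate="many_to_one"
)
print(len(merged) == len(toy_appearances))  # True
```

Adding validate="many_to_one" to the merges inside preprocess_data would turn a silent row explosion into an immediate error.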

In [7]:
attacking_players.head()

Unnamed: 0,appearance_id,game_id,player_id,player_club_id,player_current_club_id,date,player_name,yellow_cards,red_cards,goals,...,contract_expiration_date,agent_name,image_url,url,current_club_domestic_competition_id,current_club_name,market_value_in_eur_x,highest_market_value_in_eur,market_value_in_eur_y,goal_per_min
0,2231978_38004,2231978,38004,853,235,2012-07-03,Aurélien Joachim,0,0,2,...,,ER Sport Agency,https://img.a.transfermarkt.technology/portrai...,https://www.transfermarkt.co.uk/aurelien-joach...,NL1,Rooms Katholieke Combinatie Waalwijk,75000.0,600000.0,75000.0,0.022222
1,2233748_79232,2233748,79232,8841,2698,2012-07-05,Ruslan Abyshov,0,0,0,...,,,https://img.a.transfermarkt.technology/portrai...,https://www.transfermarkt.co.uk/ruslan-abyshov...,RU1,FC Rubin Kazan,25000.0,450000.0,25000.0,0.0
2,2234413_42792,2234413,42792,6251,465,2012-07-05,Sander Puri,0,0,0,...,2023-12-31 00:00:00,,https://img.a.transfermarkt.technology/portrai...,https://www.transfermarkt.co.uk/sander-puri/pr...,SC1,Saint Mirren Football Club,100000.0,600000.0,100000.0,0.0
3,2234418_73333,2234418,73333,1274,6646,2012-07-05,Vegar Hedenstad,0,0,0,...,2024-12-31 00:00:00,TPSportManagement,https://img.a.transfermarkt.technology/portrai...,https://www.transfermarkt.co.uk/vegar-hedensta...,TR1,Fatih Karagümrük,350000.0,1500000.0,350000.0,0.0
4,2234421_122011,2234421,122011,195,3008,2012-07-05,Markus Henriksen,0,0,0,...,2024-12-31 00:00:00,,https://img.a.transfermarkt.technology/portrai...,https://www.transfermarkt.co.uk/markus-henriks...,GB1,Hull City,800000.0,5000000.0,800000.0,0.0


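Next, we aggregate the appearance-level rows into per-player totals: goals, assists, minutes, cards, distinct games played, and the first and last season in the data. Note that the appearance records in this dataset start in the 2012 season, so these are totals since 2012 rather than full-career figures; for example, Miroslav Klose's career_start appears as 2012 in the preview below, although he had been playing professionally since the late 1990s.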

In [8]:
player_metrics = attacking_players.groupby(['player_id', 'player_name', 'position', 'country_of_citizenship']).agg({
    'goals': 'sum',
    'assists': 'sum',
    'minutes_played': 'sum',
    'yellow_cards': 'sum',
    'red_cards': 'sum',
    'game_id': 'nunique', 
    'season': ['min', 'max']  
}).reset_index()

In [9]:
player_metrics.columns = [
    'player_id', 'name', 'position', 'country', 
    'total_goals', 'total_assists', 'total_minutes', 
    'total_yellow_cards', 'total_red_cards', 'appearances',
    'career_start', 'career_end'
]
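Renaming the flattened columns by position works, but it silently mislabels columns if the order of the agg dictionary ever changes. pandas' named aggregation produces flat, explicitly named columns in one step. A sketch on a hypothetical three-row toy table, covering a subset of the metrics above:

```python
import pandas as pd

# Hypothetical appearance-level rows for two players.
toy_rows = pd.DataFrame({
    "player_id": [1, 1, 2],
    "player_name": ["A", "A", "B"],
    "game_id": [100, 101, 100],
    "goals": [2, 1, 0],
    "assists": [0, 1, 1],
    "season": [2015, 2016, 2015],
})

# Named aggregation: each keyword is an output column, given as (source column, function).
named_metrics = (
    toy_rows.groupby(["player_id", "player_name"])
    .agg(
        total_goals=("goals", "sum"),
        total_assists=("assists", "sum"),
        appearances=("game_id", "nunique"),
        career_start=("season", "min"),
        career_end=("season", "max"),
    )
    .reset_index()
)
print(named_metrics)
```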

In [10]:
# Calculate derived metrics.
# career_years spans the first to last season in the dataset (inclusive),
# not the player's full professional career.
player_metrics['career_years'] = player_metrics['career_end'] - player_metrics['career_start'] + 1
player_metrics['total_goal_contributions'] = player_metrics['total_goals'] + player_metrics['total_assists']
player_metrics['goals_per_game'] = player_metrics['total_goals'] / player_metrics['appearances']
player_metrics['assists_per_game'] = player_metrics['total_assists'] / player_metrics['appearances']
player_metrics['goal_contributions_per_game'] = player_metrics['total_goal_contributions'] / player_metrics['appearances']
# Adding 0.001 avoids division by zero, but players with no goals
# (e.g. goalkeepers) get a very large value.
player_metrics['minutes_per_goal'] = player_metrics['total_minutes'] / (player_metrics['total_goals'] + 0.001)
player_metrics['goals_per_90min'] = (player_metrics['total_goals'] / player_metrics['total_minutes']) * 90

We also require GOAT candidates to have at least 50 appearances in the dataset. Any genuine GOAT candidate far exceeds this threshold, and the filter removes small-sample players whose per-game rates would otherwise be unreliable.

In [11]:
player_metrics = player_metrics[player_metrics['appearances'] >= 50]
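The introduction says this report focuses on attacking players, but the filter above uses appearances only, so goalkeepers such as Roman Weidenfeller still show up in the preview below. A possible extra filter, assuming the position column uses labels like 'Attack' and 'Goalkeeper' (as the preview suggests), is sketched here on hypothetical toy rows. Whether to keep midfielders is a judgment call, and applying such a mask to player_metrics would also change the candidate count reported below.

```python
import pandas as pd

# Hypothetical per-player rows; 'Attack' and 'Goalkeeper' match labels seen in
# the preview, 'Defender' is assumed for illustration.
toy_metrics = pd.DataFrame({
    "name": ["Striker A", "Keeper B", "Winger C", "Defender D"],
    "position": ["Attack", "Goalkeeper", "Attack", "Defender"],
    "appearances": [120, 150, 40, 200],
})

# Assumed set of attacking positions; add "Midfield" to keep playmakers too.
ATTACKING_POSITIONS = {"Attack"}

candidates = toy_metrics[
    toy_metrics["position"].isin(ATTACKING_POSITIONS)
    & (toy_metrics["appearances"] >= 50)
]
print(candidates["name"].tolist())  # ['Striker A']
```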

Now that we have a preliminary list of GOAT candidates, let's see who they are and how many there are.

In [12]:
player_metrics.head()

Unnamed: 0,player_id,name,position,country,total_goals,total_assists,total_minutes,total_yellow_cards,total_red_cards,appearances,career_start,career_end,career_years,total_goal_contributions,goals_per_game,assists_per_game,goal_contributions_per_game,minutes_per_goal,goals_per_90min
0,10,Miroslav Klose,Attack,Germany,48,25,8808,19,0,136,2012,2015,4,73,0.352941,0.183824,0.536765,183.4962,0.490463
1,26,Roman Weidenfeller,Goalkeeper,Germany,0,0,13508,4,2,152,2012,2017,6,0,0.0,0.0,0.0,13508000.0,0.0
2,65,Dimitar Berbatov,Attack,Bulgaria,38,13,8788,11,1,122,2012,2015,4,51,0.311475,0.106557,0.418033,231.2571,0.389167
7,132,Tomas Rosicky,Midfield,Czech Republic,9,4,3987,13,0,77,2012,2015,4,13,0.116883,0.051948,0.168831,442.9508,0.20316
8,215,Roque Santa Cruz,Attack,Paraguay,26,8,6038,3,0,109,2012,2015,4,34,0.238532,0.073394,0.311927,232.2218,0.387546


In [13]:
print(f"There are a total of {len(player_metrics)} players who meet the criteria.")

There are a total of 10117 players who meet the criteria.


Let's first look at the top 10 players by combined goals and assists. These two counting stats are the most direct measures of a player's attacking output.

In [14]:
import plotly.graph_objects as go

player_data = player_metrics.copy()  
player_data['total_contributions'] = player_data['total_goals'] + player_data['total_assists']

top10 = player_data.sort_values(by='total_contributions', ascending=False).head(10)

fig = go.Figure()

fig.add_trace(go.Bar(
    x=top10['name'],
    y=top10['total_goals'],
    name='Goals',
    marker_color='rgba(55, 83, 109, 0.8)',
))

fig.add_trace(go.Bar(
    x=top10['name'],
    y=top10['total_assists'],
    name='Assists',
    marker_color='rgba(26, 118, 255, 0.8)',
))

fig.update_layout(
    title='Top 10 Players by Total Goal Contributions (Goals + Assists)',
    xaxis_title='Player',
    yaxis_title='Count',
    barmode='stack',
    hovermode='x unified',
    template='plotly_white',
    font=dict(size=14),
    width=1000,
    height=600
)

# Label each stacked bar with its total, slightly above the bar.
for name, total in zip(top10['name'], top10['total_contributions']):
    fig.add_annotation(
        x=name,
        y=total + 10,
        text=str(total),
        showarrow=False,
        font=dict(size=12, color='black')
    )

fig.show()

## 3.Data Analysis

Judging a player solely by total goals and assists is not enough. This approach is unfair to players with short but explosive peaks, as well as to young talents. For example, Kylian Mbappé, who is only 26 this year and rose to fame early, has played far fewer matches than Messi and Ronaldo. And Ronaldinho, whose peak was brief but who won virtually every major honor, is no less impressive than some "evergreens" who get better with age.

So we introduce weights to balance these factors and evaluate each player's overall ability more comprehensively. The weights, which sum to 1, are as follows:

total_goals: 0.25

total_assists: 0.15

goal_contributions_per_game: 0.2

career_years: 0.1

appearances: 0.1

goals_per_90min: 0.2

For each metric, we apply min-max normalization, scaling the values to the [0, 1] range. This prevents metrics with different units and magnitudes, such as "total goals" and "goals per 90 minutes", from disproportionately influencing the final score. Finally, we multiply each normalized metric by its weight and sum the results to obtain the overall score.
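Written out, with $x_{ik}$ the value of metric $k$ for player $i$, the minimum and maximum taken over all candidates $j$, and $w_k$ the six weights listed above, the normalization (what MinMaxScaler computes by default) and the weighted score are:

$$
\tilde{x}_{ik} = \frac{x_{ik} - \min_j x_{jk}}{\max_j x_{jk} - \min_j x_{jk}},
\qquad
\mathrm{score}_i = \sum_{k=1}^{6} w_k \, \tilde{x}_{ik},
\qquad
\sum_{k=1}^{6} w_k = 1.
$$

Since every $\tilde{x}_{ik}$ lies in $[0, 1]$ and the weights sum to 1, every score also lies in $[0, 1]$, and a player at the maximum of all six metrics would score exactly 1. One side effect of min-max scaling is that each metric's scale is set by its single most extreme player.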

Then, we obtained the top 10 players based on their weighted scores.

In [15]:
import plotly.express as px
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

scaler = MinMaxScaler()
metrics = ['total_goals', 'total_assists', 'goal_contributions_per_game', 'career_years', 'appearances', 'goals_per_90min']

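# Scale each metric to [0, 1]. This overwrites the raw values in normalized_df;
# the unscaled numbers remain available in player_metrics.
normalized_df = player_metrics.copy()
normalized_df[metrics] = scaler.fit_transform(normalized_df[metrics])

weights = {
    'total_goals': 0.25,
    'total_assists': 0.15,
    'goal_contributions_per_game': 0.20,
    'career_years': 0.10,
    'appearances': 0.10,
    'goals_per_90min': 0.20
}

# Weighted sum: .dot aligns the metric columns with the weights by name.
normalized_df['score'] = normalized_df[metrics].dot(pd.Series(weights))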

top10_score = normalized_df.sort_values(by='score', ascending=False).head(10)

fig = px.bar(
    top10_score,
    x='name',
    y='score',
    title='Top 10 Players by Weighted Score',
    labels={'score': 'Weighted Score'},
    color='score',
    color_continuous_scale='Viridis',
    hover_data=metrics
)

fig.update_layout(
    xaxis_title='Player',
    yaxis_title='Weighted Score',
    template='plotly_white',
    font=dict(size=14)
)

fig.show()
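To make the mechanics explicit, the same score can be computed with plain pandas instead of MinMaxScaler. The sketch below applies the weights above to a hypothetical three-player toy table (not real data); on the same inputs it should agree with the scikit-learn version up to floating-point error.

```python
import pandas as pd

# Same weights as above, as a Series indexed by metric name.
score_weights = pd.Series({
    "total_goals": 0.25,
    "total_assists": 0.15,
    "goal_contributions_per_game": 0.20,
    "career_years": 0.10,
    "appearances": 0.10,
    "goals_per_90min": 0.20,
})

def weighted_score(df: pd.DataFrame, weights: pd.Series) -> pd.Series:
    """Min-max scale each weighted column to [0, 1], then take the weighted sum."""
    subset = df[list(weights.index)]
    scaled = (subset - subset.min()) / (subset.max() - subset.min())
    return scaled.dot(weights)

# Hypothetical toy metrics for three players.
toy = pd.DataFrame(
    {
        "total_goals": [300, 150, 50],
        "total_assists": [150, 40, 60],
        "goal_contributions_per_game": [1.2, 0.7, 0.5],
        "career_years": [12, 10, 6],
        "appearances": [400, 250, 120],
        "goals_per_90min": [0.9, 0.6, 0.3],
    },
    index=["P1", "P2", "P3"],
)

toy_scores = weighted_score(toy, score_weights)
print(toy_scores.sort_values(ascending=False))
```

One difference worth knowing: MinMaxScaler maps a constant column to 0, whereas this version would divide by zero and return NaN if a metric had the same value for every player.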

We can also combine the goals-plus-assists chart and the weighted-score chart into a single figure, with the score on a secondary y-axis.

In [16]:
import plotly.graph_objects as go

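# top10_score holds min-max scaled metrics, so take the raw goal and assist
# counts from player_metrics (same index) and attach the weighted score.
top10_score_sorted = (
    player_metrics.loc[top10_score.index]
    .assign(score=top10_score['score'])
    .sort_values(by='score', ascending=False)
)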

goals_trace = go.Bar(
    x=top10_score_sorted['name'],
    y=top10_score_sorted['total_goals'],
    name='Goals',
    marker_color='rgba(55, 83, 109, 0.8)',
    hovertemplate='<b>%{x}</b><br>Goals: %{y}<extra></extra>'
)

assists_trace = go.Bar(
    x=top10_score_sorted['name'],
    y=top10_score_sorted['total_assists'],
    name='Assists',
    marker_color='rgba(26, 118, 255, 0.8)',
    hovertemplate='<b>%{x}</b><br>Assists: %{y}<extra></extra>'
)

score_trace = go.Scatter(
    x=top10_score_sorted['name'],
    y=top10_score_sorted['score'],
    name='Score',
    mode='markers+lines',
    yaxis='y2',
    marker=dict(
        color='gold',
        size=12,
        line=dict(color='black', width=1)
    ),
    line=dict(dash='dash', color='gold'),
    hovertemplate='<b>%{x}</b><br>Score: %{y:.3f}<extra></extra>'
)

layout = go.Layout(
    title='Top 10 GOAT Candidates: Goals + Assists + Score',
    barmode='stack',
    xaxis=dict(title='Player'),
    yaxis=dict(title='Goals + Assists'),
    yaxis2=dict(
        title='Weighted Score',
        overlaying='y',
        side='right',
        showgrid=False
    ),
    legend=dict(x=0.8, y=1.15, orientation='h'),
    template='plotly_white',
    plot_bgcolor='rgba(245,245,245,1)',
    paper_bgcolor='rgba(255,255,255,1)',
    font=dict(size=13),
    margin=dict(l=40, r=50, t=80, b=40)
)

fig = go.Figure(data=[goals_trace, assists_trace, score_trace], layout=layout)

fig.show()

It can be seen that for most players, the ranking by total goals + total assists is consistent with their ranking based on the weighted score.
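"For most players" can be made precise with a Spearman rank correlation between the two orderings, where 1 means identical rankings. The sketch below computes it as the Pearson correlation of the ranks, so no SciPy dependency is needed; the toy values are hypothetical. For the notebook, one option would be to pass player_metrics.loc[top10_score.index].assign(score=top10_score['score']) with col_a='total_goal_contributions'; no result for the real data is implied here.

```python
import pandas as pd

def rank_agreement(df: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return df[col_a].rank().corr(df[col_b].rank())

# Hypothetical toy values: the two orderings differ by one swap (2nd and 3rd place).
toy_ranks = pd.DataFrame({
    "goals_plus_assists": [500, 420, 400, 310, 250],
    "score": [0.90, 0.80, 0.82, 0.60, 0.55],
})
print(round(rank_agreement(toy_ranks, "goals_plus_assists", "score"), 3))  # 0.9
```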

We can also plot total goals against total assists for every candidate, using both color and point size to represent the weighted overall score. This gives an intuitive comparison of goal-scoring and assisting ability.

A point's horizontal and vertical coordinates show whether a player is a goal-heavy "striker", an assist-focused "playmaker", or a versatile all-rounder who excels at both.

Color and size make overall performance visible at a glance, so players with high composite scores stand out immediately.

Players with different styles, for example many goals but few assists or the reverse, cluster in different regions of the chart, which is useful for style classification or tactical analysis.

In [17]:
import plotly.express as px

normalized_df['label'] = ''
normalized_df.loc[top10_score.index, 'label'] = normalized_df.loc[top10_score.index, 'name']

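# normalized_df holds min-max scaled metrics, so plot the raw goal and assist
# counts from player_metrics (same index) alongside the score and label.
scatter_df = player_metrics[['name', 'total_goals', 'total_assists']].join(
    normalized_df[['score', 'label']]
)

fig = px.scatter(
    scatter_df,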
    x='total_goals',
    y='total_assists',
    color='score',
    color_continuous_scale='Viridis',
    size='score',
    hover_name='name',
    text='label',
    title='Goals vs Assists Scatter Plot Colored by Weighted Score',
    labels={'total_goals': 'Total Goals', 'total_assists': 'Total Assists', 'score': 'Weighted Score'}
)

fig.update_traces(textposition='top center', textfont=dict(size=10)) 

fig.update_layout(
    height=700,
    xaxis_title='Total Goals',
    yaxis_title='Total Assists',
    template='plotly_white',
    font=dict(size=14)
)

fig.show()

As the plot shows, Messi, Cristiano Ronaldo, and Lewandowski rank highly in both goals and assists, and their markers have the brightest colors, indicating the top weighted scores. By these metrics they combine high output with a balanced attacking profile, which makes them the strongest GOAT candidates from a data-driven perspective.

On this composite score, Lionel Messi appears to stand out as the GOAT of football history. Next, I use more visualizations to look at him and the other leading candidates in detail.

We can plot career length against average goal contributions per game to evaluate each player's consistency and efficiency over time.

This chart helps us visually compare the "efficiency" and "longevity" of top historical players across multiple dimensions:

Players high on the Y-axis (like Messi and Ronaldo) show strong contributions per match, even over long careers — true examples of sustained excellence.

Players further to the right on the X-axis have spent the most seasons in the dataset, which here means the longest and most stable careers since 2012.

Larger bubbles indicate more reliable data, reflecting a greater number of appearances and reducing the impact of "one-season wonders."

For instance, a player with a high Y-axis value but low X-axis position could be seen as a "brilliant but short-lived" talent — a flash of genius with a brief peak.

In [18]:
import plotly.express as px

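# Use raw values from player_metrics (normalized_df holds scaled metrics)
# and attach the weighted score for coloring.
top10_bubble = player_metrics.loc[top10_score.index].copy()
top10_bubble['score'] = top10_score['score']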

top10_bubble['label'] = ''
top5 = top10_bubble.sort_values(by='score', ascending=False).head(5)
top10_bubble.loc[top5.index, 'label'] = top5['name']

fig = px.scatter(
    top10_bubble,
    x='career_years',
    y='goal_contributions_per_game',
    size='appearances',
    color='score',
    text='label',
    color_continuous_scale='Viridis',
    hover_name='name',
    title='Top 10 Players: Career Length vs Avg Goal Contributions<br>(Bubble Size = Appearances)',
    labels={
        'career_years': 'Career Years',
        'goal_contributions_per_game': 'Goals + Assists per Game',
        'score': 'Weighted Score',
        'appearances': 'Appearances'
    },
    size_max=25
)

fig.update_traces(textposition='top center', textfont_size=11)

fig.update_layout(
    font=dict(size=13),
    height=600,
    xaxis_title='Career Years',
    yaxis_title='Goals + Assists per Game',
    template='plotly_white'
)

fig.show()

We find that Messi and Cristiano Ronaldo are “balanced” superstars with both high output and moderately long careers. They avoid extremes of being short-lived or inefficient, maintaining a long professional lifespan while consistently delivering strong offensive contributions. This reflects top-level performance characterized by stability and efficiency.

Lewandowski's strength lies in durability rather than peak explosiveness. He has enjoyed a long career with many appearances and consistent playing time, but his average goal contributions per game do not reach the explosive levels of Messi and Ronaldo. This suggests Lewandowski is more of a durable, long-lasting striker: steady and reliable, with fewer flashy bursts of brilliance.

These differences highlight how diverse player styles and physical attributes shape their career trajectories.

Messi and Ronaldo are versatile, all-around players who excel in both creating and scoring, maintaining high-level contributions per game over their careers.

Lewandowski is more of a traditional “pure number 9” — his precision and positional play drive his steady scoring efficiency, with less emphasis on flamboyant flair but more on lasting effectiveness.

We can also create radar charts for the top 5 players to visually capture the differences across their various performance metrics.

Radar charts display several metrics at once, so each player's strengths and weaknesses are visible immediately. They make it easy to compare attributes such as goal-scoring, assisting, efficiency, and career length side by side. Because the chart uses the min-max normalized values, each axis runs from 0 (the lowest value among all candidates) to 1 (the highest).

By compressing multidimensional information into a single chart, radar charts are often more intuitive and persuasive than several separate bar charts or tables.

In [19]:
import plotly.graph_objects as go

top5 = top10_score.head(5)

# Axis labels, in the same order as the metrics list used for scoring.
categories = ['Goals', 'Assists', 'Goal Contributions/Game', 'Career Years', 'Appearances', 'Goals per 90min']

fig = go.Figure()

for _, row in top5.iterrows():
    # Normalized [0, 1] values; repeat the first point to close the polygon.
    values = row[metrics].tolist()
    values += values[:1]

    fig.add_trace(go.Scatterpolar(
        r=values,
        theta=categories + [categories[0]],
        fill='toself',
        name=row['name']
    ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 1]
        )
    ),
    title='Top 5 Players Radar Chart',
    showlegend=True,
    template='plotly_white',
    font=dict(size=14),
    width=800,
    height=600
)

fig.show()

We can see that Messi is the most well-rounded attacking player, balancing both scoring and playmaking. He excels at both creating chances and finishing them, making his role in the team highly versatile and flexible.

Cristiano Ronaldo is an efficient finisher whose profile leans toward scoring. He takes on the responsibility of the final strike more often but contributes fewer assists and is less involved in organizing attacks.

Lewandowski is a prolific traditional striker, primarily focused on scoring. His specialization makes him stand out in goal numbers, but his overall contribution is less diverse compared to Messi.

Finally, we can create a 3D bubble chart to showcase the overall performance of the top 10 players.

It places each player along three axes (career years, goals per 90 minutes, and goal contributions per game), uses bubble size for appearances, and uses color for the weighted score. Compared with a simple bar chart or scatter plot, it packs in much more information and shows at a glance who has the strongest overall profile.

Differences in bubble size and color make it easy to distinguish "star players" from the rest and to identify top players who not only have high stats but also balanced performance across multiple dimensions.

You can also look for relationships between metrics: do players with longer careers also score more per 90 minutes? Are players with more appearances also more efficient?

In [20]:
import plotly.express as px

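# Raw (unscaled) metrics from player_metrics, plus the weighted score for color.
top10 = player_metrics.loc[top10_score.index].assign(score=top10_score['score'])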

fig = px.scatter_3d(
    top10,
    x='career_years',
    y='goals_per_90min',
    z='goal_contributions_per_game',
    size='appearances',
    color='score',
    hover_name='name',
    color_continuous_scale='Viridis',
    size_max=40,
    title='3D Bubble Chart of Top 10 Players by Weighted Score',
    labels={
        'career_years': 'Career Years',
        'goals_per_90min': 'Goals per 90 min',
        'goal_contributions_per_game': 'Goal Contributions per Game',
        'appearances': 'Appearances',
        'score': 'Weighted Score'
    }
)

# Grey scene background; the title is already set in px.scatter_3d.
fig.update_layout(
    scene=dict(
        xaxis=dict(backgroundcolor="rgb(230, 230, 230)", gridcolor="white", zerolinecolor="white"),
        yaxis=dict(backgroundcolor="rgb(230, 230, 230)", gridcolor="white", zerolinecolor="white"),
        zaxis=dict(backgroundcolor="rgb(230, 230, 230)", gridcolor="white", zerolinecolor="white"),
    ),
    height=700,
    width=900
)

fig.show()

That concludes our data analysis on the GOATs of football history!

By the way, as a Chinese football fan, I find it heartbreaking that the national team has just been eliminated from the World Cup qualifiers, and I know many fans share that feeling. Since we are deep into data analysis anyway, this is a good opportunity to look at the national distribution of the players in our dataset and see which countries consistently produce top players.

Maybe it’ll offer some insight... or at least a bit of motivation for the future. 

In [21]:
import pandas as pd
import plotly.express as px
import pycountry

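def get_country_code(country_name):
    """Return the ISO 3166-1 alpha-3 code for a country name, or None."""
    try:
        return pycountry.countries.lookup(country_name).alpha_3
    except LookupError:
        # Names pycountry does not recognize. The four UK home nations all
        # map to GBR, so they end up sharing a single shape on the map.
        country_mapping = {
            'England': 'GBR',
            'Scotland': 'GBR',
            'Wales': 'GBR',
            'Northern Ireland': 'GBR',
            'South Korea': 'KOR',
            'North Korea': 'PRK',
            'Czech Republic': 'CZE',
            'Bosnia and Herzegovina': 'BIH',
            'Cape Verde': 'CPV'
        }
        return country_mapping.get(country_name, None)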

all_players = normalized_df.copy()

all_players['country_code'] = all_players['country'].apply(get_country_code)

country_list = []
for country in all_players['country'].unique():
    country_players = all_players[all_players['country'] == country]
    top_3_players = country_players.nlargest(3, 'score')
    
    country_list.append({
        'country_code': country_players['country_code'].iloc[0],
        'country': country,
        'player_count': len(country_players),
        'representative_players': ', '.join(top_3_players['name'].tolist())
    })

country_stats = pd.DataFrame(country_list)

fig = px.choropleth(
    country_stats,
    locations='country_code',
    color='player_count',
    hover_name='country',
    hover_data={
        'player_count': True,
        'representative_players': True,
        'country_code': False
    },
    color_continuous_scale='Viridis',
    title='Player nationality distribution map',
    labels={'player_count': 'Number of players'}
)

fig.update_layout(
    title_font_size=20,
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='equirectangular'
    ),
    width=1000,
    height=500
)

fig.show()
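One caveat with get_country_code: England, Scotland, Wales and Northern Ireland all map to GBR, so country_stats contains several rows with the same location code, and they overlap on a single shape instead of being added up. A minimal sketch of one fix, on hypothetical toy rows, is to aggregate by country_code before plotting; rows whose code could not be resolved (None) are dropped, since the map cannot place them anyway.

```python
import pandas as pd

# Hypothetical rows shaped like country_stats (numbers are made up).
toy_country_stats = pd.DataFrame({
    "country_code": ["GBR", "GBR", "FRA", None],
    "country": ["England", "Scotland", "France", "Unmapped country"],
    "player_count": [300, 40, 400, 5],
})

map_stats = (
    toy_country_stats.dropna(subset=["country_code"])
    .groupby("country_code", as_index=False)
    .agg(
        player_count=("player_count", "sum"),
        # Keep the constituent names so the hover text can still show them.
        countries=("country", ", ".join),
    )
)
print(map_stats)
```

The aggregated table could then be passed to px.choropleth with locations='country_code' and color='player_count', as in the cell above.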

Some people claim that the Chinese national team's poor performance is due to racial limitations of Asians. The data, however, suggests that the problem lies more with Chinese football itself than with any supposed racial limitation.

Many neighboring East Asian and Central Asian countries each have more than one player on the list. Meanwhile, despite China's population of 1.4 billion, only Wu Lei reached 50 appearances in the competitions covered by this dataset, an undeniably sobering reality.

Chinese football still has a long way to go...

## 4.Summary

As a Barça💙❤️ fan and a devoted Messi supporter, I proudly declare:
Lionel Messi is the GOAT🐐 of football⚽ history! 


However, there are several limitations in this report:

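(1) Data availability: Because of the dataset's limitations, some legendary players often included in GOAT debates, such as Diego Maradona and Pelé, are not featured. Their careers took place so long ago that many modern databases lack complete match statistics for them. Even for modern players, the appearance records here start in 2012, so earlier seasons, such as Messi's first years at Barcelona or Ronaldinho's peak, are not counted.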

(2) Subjective weighting: I chose the weights for the metrics (goals, assists, efficiency, and so on) subjectively. More rigorous, statistically grounded methods, such as regression analysis or machine learning, might produce better weights.

(3) Focus on attacking players: The metrics used here reward only attacking output. Yet football legends such as Franz Beckenbauer, who played primarily as a defender, are often ranked among the ten greatest players in history. A truly comprehensive GOAT evaluation should also account for defensive players.

(4) Quantity vs. quality: Does quantity really determine quality? A defensively strong Atlético Madrid can win a match 1-0, while this season's offensively gifted Barcelona scored three goals and still lost 3-4 to Inter Milan. In that case, those three goals were arguably worth far less than a single match-winning goal. This raises the question of whether better quantitative methods can capture a player's true impact beyond counting stats.
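Limitation (2) can be partly probed with a sensitivity check: draw many random weight vectors and see how often the same player stays on top. The sketch below draws Dirichlet-distributed weights (positive, summing to 1) with NumPy's Generator.dirichlet; the toy scaled metrics and the helper name top_player_share are hypothetical.

```python
import numpy as np
import pandas as pd

def top_player_share(scaled: pd.DataFrame, n_draws: int = 1000, seed: int = 0) -> pd.Series:
    """Share of random weight vectors under which each player has the highest score."""
    rng = np.random.default_rng(seed)
    # Each row is one weight vector: positive entries that sum to 1.
    weight_draws = rng.dirichlet(np.ones(scaled.shape[1]), size=n_draws)
    # scores[d, i] is player i's weighted score under draw d.
    scores = weight_draws @ scaled.to_numpy().T
    winners = scaled.index[scores.argmax(axis=1)]
    return pd.Series(winners).value_counts(normalize=True)

# Hypothetical min-max-scaled metrics (columns in [0, 1]) for three players.
toy_scaled = pd.DataFrame(
    {"goals": [1.0, 0.6, 0.0], "assists": [0.9, 1.0, 0.0], "per_game": [1.0, 0.5, 0.0]},
    index=["P1", "P2", "P3"],
)
print(top_player_share(toy_scaled))
```

Applied to the notebook's normalized_df indexed by name and restricted to the metrics columns, a share close to 1 for the leading player would suggest the ranking does not hinge on the particular weights chosen.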

In the future, as internet access spreads and data collection improves, player statistics will become increasingly comprehensive. Better methods for quantifying player performance should also emerge, enabling us to identify the true GOAT more accurately.