And here I start

In [2]:
import pandas as pd
import requests
import numpy as np
import io


Here, I build the download links for the data. There is no file for 2011, so that year is skipped. Everything is kept in pandas objects (a pd.Series of links here, DataFrames later), so that we have all pandas functionality right away.

In [3]:
download_link = 'https://rsf.org/sites/default/files/import_classement/2000.csv'
links = []
for i in range(2, 26):
    if i == 11:
        continue
    links.append(download_link.replace('2000', f'{2000 + i}'))
links = pd.Series(links)
links[0]

'https://rsf.org/sites/default/files/import_classement/2002.csv'
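The same list of links can be sketched with a format string and a list comprehension. The helper name `build_links` and its parameters are hypothetical; only the URL pattern and the skipped year come from the cell above.

```python
# URL pattern taken from the cell above; {year} is the only varying part.
BASE_URL = 'https://rsf.org/sites/default/files/import_classement/{year}.csv'


def build_links(first=2002, last=2025, missing=(2011,)):
    """Return one download URL per year, skipping years that have no file."""
    return [
        BASE_URL.format(year=year)
        for year in range(first, last + 1)
        if year not in missing
    ]


links_sketch = build_links()
print(len(links_sketch))
print(links_sketch[0])
```

Keeping the missing years in a parameter makes it a one-line change if another year turns out to have no file.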

Download the files with requests and read them. read_csv expects a path or a file-like object rather than the raw CSV text, so each response is wrapped in io.StringIO. RSF decided to use semicolons as separators.

In [4]:
files = links.apply(lambda x: requests.get(x))
files = files.apply(lambda x : pd.read_csv(io.StringIO(x.text), sep=';'))
print(files[0])

     ï»¿Year (N)  ISO  Rank N Score N  Score N without the exactions  \
0           2002  FIN       1     0,5                            NaN   
1           2002  ISL       2     0,5                            NaN   
2           2002  NOR       3     0,5                            NaN   
3           2002  NLD       4     0,5                            NaN   
4           2002  CAN       5    0,75                            NaN   
..           ...  ...     ...     ...                            ...   
134         2002  BTN     135   90,75                            NaN   
135         2002  TKM     136    91,5                            NaN   
136         2002  MMR     137   96,83                            NaN   
137         2002  CHN     138      97                            NaN   
138         2002  PRK     139    97,5                            NaN   

     Score N with the exactions  Score exactions  Rank N-1  Score N-1  \
0                           NaN              NaN       NaN    
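The odd characters in front of `Year (N)` look like a UTF-8 byte-order mark that was decoded as Latin-1, which is likely what happens when the server does not declare a charset. Below is a minimal sketch of that effect on synthetic bytes (not the real RSF file). With a real response, the raw bytes would be in `x.content`.

```python
import io

import pandas as pd

# Synthetic stand-in for a downloaded file: UTF-8 with a leading byte-order mark.
raw = '\ufeffYear (N);ISO;Score N\n2002;FIN;0,5\n'.encode('utf-8')

# Decoding the bytes as Latin-1 reproduces the garbled first header.
garbled = pd.read_csv(io.StringIO(raw.decode('latin-1')), sep=';')

# Decoding as 'utf-8-sig' strips the mark before pandas sees the header.
clean = pd.read_csv(io.StringIO(raw.decode('utf-8-sig')), sep=';')

print(list(garbled.columns))
print(list(clean.columns))
```

The rest of this notebook works with the garbled names as they are, so switching to this decoding would also mean adjusting the later rename and drop steps.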

Now, joining the yearly tables into one.

In [5]:
data = pd.concat(list(files))
print(data.columns)

Index(['ï»¿Year (N)', 'ISO', 'Rank N', 'Score N',
       'Score N without the exactions', 'Score N with the exactions',
       'Score exactions', 'Rank N-1', 'Score N-1', 'Rank evolution',
       'FR_country', 'EN_country', 'ES_country', 'AR_country', 'FA_country',
       'Zone', 'ï»¿ISO', 'Score', 'Rank', 'Political Context', 'Rank_Pol',
       'Economic Context', 'Rank_Eco', 'Legal Context', 'Rank_Leg',
       'Social Context', 'Rank_Soc', 'Safety', 'Rank_Saf', 'Country_EN',
       'Country_FR', 'Country_ES', 'Country_AR', 'Country_FA', 'Year (N)',
       'Country_PT', 'Score evolution', 'Situation', 'Score 2025'],
      dtype='object')
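The long column list is a consequence of how pd.concat aligns frames: it takes the union of the column names and fills the gaps with NaN. A minimal sketch on two toy frames (the values are made up):

```python
import pandas as pd

# Two toy "years" that name the score column differently.
old = pd.DataFrame({'ISO': ['FIN'], 'Score N': ['0,5']})
new = pd.DataFrame({'ISO': ['NOR'], 'Score': ['92,31']})

combined = pd.concat([old, new])

print(combined)
print(combined.index.tolist())
```

Each frame keeps its own row labels, so the combined index contains repeats. That is why the index has to be reset later on.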


Dropping obviously useless columns 

In [6]:
data = data.rename(columns={'ï»¿Year (N)': 'Year'})
data = data.drop(['Country_FR', 'Country_ES', 'Country_AR', 'Country_FA', 'Country_PT', 
           'FR_country', 'ES_country', 'AR_country', 'FA_country', 'Zone',
           'Rank_Eco', 'Rank_Pol','Rank_Leg', 'Rank_Soc', 
           'Rank_Saf', 'Situation', 'Score evolution', 'Score N with the exactions', 
           'Score exactions', 'Score N without the exactions', 'Score N-1', 'Rank evolution',
           'Rank N-1'], axis=1)
print(data.columns)

Index(['Year', 'ISO', 'Rank N', 'Score N', 'EN_country', 'ï»¿ISO', 'Score',
       'Rank', 'Political Context', 'Economic Context', 'Legal Context',
       'Social Context', 'Safety', 'Country_EN', 'Year (N)', 'Score 2025'],
      dtype='object')


Standardize the column names and unite columns that were named differently across years into one. Drop some more obsolete columns.

The main job of the platform will be to compare rankings in a clear, visual way, using more detailed info as needed (such as the economic context and the other "contexts"). Those detailed parameters will be measured in scores, as it's more relevant to compare the economic context of a country on an absolute scale, independently of other countries.

In [7]:
data.columns = data.columns.str.lower().str.replace(' ', '_')

data['year'] = data['year'].where(data['year'].notna(), data['year_(n)'])
data['rank_n'] = data['rank_n'].where(data['rank_n'].notna(), data['rank'])
data['en_country'] = data['en_country'].where(data['en_country'].notna(), data['country_en'])

data = data.drop(['year_(n)', 'rank', 'country_en'], axis=1)
data = data.rename(columns={'rank_n': 'rank', 'en_country': 'country'})

print(data['year'])



0        2002
1        2002
2        2002
3        2002
4        2002
        ...  
175    2025.0
176    2025.0
177    2025.0
178    2025.0
179    2025.0
Name: year, Length: 4020, dtype: object
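The `where(notna, other)` pattern used above is a column coalesce: keep the first column where it has a value, otherwise take the second. A minimal sketch on a toy frame with a unique index, comparing it with `fillna`, which expresses the same idea more compactly:

```python
import pandas as pd

# Toy frame: older rows fill 'rank_n', newer rows fill 'rank'.
toy = pd.DataFrame({
    'rank_n': [1, 2, None, None],
    'rank': [None, None, 3, 4],
})

# Pattern from the cell above: keep rank_n where present, else take rank.
via_where = toy['rank_n'].where(toy['rank_n'].notna(), toy['rank'])

# Equivalent coalesce on this toy frame.
via_fillna = toy['rank_n'].fillna(toy['rank'])

print(via_fillna.tolist())
```

Both variants give the same result here. This is only an illustration of the pattern, not a claim about which one behaves better on the full table.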


I'll add the overall score to my data, as it's useful to be able to select it optionally. The overall situation can then be tracked by rank and score. I don't see a need for the rank of the separate factors, so for those the score will suffice. I'll also convert the data to lighter dtypes and rename those columns to make them simpler.

In [8]:
# print(data[data['score'].notna()]['year'].unique()) #22, 23, 24
# print(data[data['score_n'].notna()]['year'].unique()) # 2 - 21
# print(data[data['score_2025'].notna()]['year'].unique()) # 25

data['score_n'] = data['score_n'].where(data['score_n'].notna(), data['score'])
data['score_n'] = data['score_n'].where(data['score_n'].notna(), data['score_2025'])
data = data.drop(['score', 'score_2025'], axis=1)
data = data.convert_dtypes()
data = data.rename(columns={'score_n': 'score', 'en_country': 'country'})

data['score'] = data['score'].str.replace(',', '.')
data['score'] = data['score'].astype(float)

data['iso'] = data['iso'].where(data['iso'].notna(), data['ï»¿iso'])
data = data.drop(['ï»¿iso'], axis=1)
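The comma-to-dot replacement is needed because the files use a decimal comma. As an alternative sketch, read_csv has a `decimal` parameter that handles this at load time. The sample text below is synthetic, and both routes are shown side by side:

```python
import io

import pandas as pd

# Synthetic sample in the same shape as the RSF files: ';' separator, ',' decimals.
sample = 'ISO;Score N\nFIN;0,5\nCAN;0,75\nCHN;97\n'

# Option 1: let the parser handle the decimal comma.
parsed = pd.read_csv(io.StringIO(sample), sep=';', decimal=',')

# Option 2: read as text, then replace and cast, as in the cell above.
as_text = pd.read_csv(io.StringIO(sample), sep=';')
manual = as_text['Score N'].str.replace(',', '.').astype(float)

print(parsed['Score N'].tolist())
print(manual.tolist())
```

Parsing at load time would convert every numeric column at once, but it assumes that all files consistently use the comma as the decimal mark.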

Another round of standardizing column names and correcting their types manually.

In [9]:
new_col_names = []
for name in data.columns:
    if name == 'safety':
        new_col_names.append('safety_score')
        continue
    new_col_names.append(name.replace('context', 'score'))
data.columns = new_col_names

data['year'] = data['year'].astype(str).str.findall(r'(^20.{2})').str[0].astype('Int64')
data.loc[data['year'] == 2011, 'year'] = 2012

data[['political_score',
    'economic_score', 
    'legal_score',
    'social_score', 
    'safety_score']] = data[['political_score',
                            'economic_score', 
                            'legal_score',
                            'social_score', 
                            'safety_score']].apply(lambda x : x.str.replace(',', '.').astype(float))
print(data.info())



<class 'pandas.core.frame.DataFrame'>
Index: 4020 entries, 0 to 179
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   year             4020 non-null   Int64  
 1   iso              4020 non-null   string 
 2   rank             4020 non-null   Int64  
 3   score            4020 non-null   float64
 4   country          4020 non-null   string 
 5   political_score  720 non-null    float64
 6   economic_score   720 non-null    float64
 7   legal_score      720 non-null    float64
 8   social_score     720 non-null    float64
 9   safety_score     720 non-null    float64
dtypes: Int64(2), float64(6), string(2)
memory usage: 353.3 KB
None
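The year column mixes integers and floats such as `2025.0`, which is why it needs cleaning before the cast. A small alternative sketch using pd.to_numeric instead of the regex. It assumes every entry is a plain number or a numeric string, which may not hold for the real column:

```python
import pandas as pd

# Toy column with the kinds of values seen above, plus a numeric string.
years = pd.Series([2002, 2025.0, '2013'], dtype=object)

# Parse everything to a number first, then cast to the nullable integer dtype.
clean_years = pd.to_numeric(years).astype('Int64')

print(clean_years.tolist())
print(clean_years.dtype)
```

The regex approach is more forgiving if a year arrives with extra characters around it, while to_numeric fails loudly on anything that is not a number.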


Now, setting those countries right. I'll send the list of unique codes to Gemini Pro 2.5 and get back a mapper of country names that I like. Using an API is overkill here, and the open-source mapper I tried was a total mess: it had multiple names for the same code, while I need simplicity and efficiency here.

I also need to fix the index, since concatenating the yearly tables left repeated row labels.

In [10]:
country_mapper = {
    'AFG': 'Afghanistan',
    'AGO': 'Angola',
    'ALB': 'Albania',
    'AND': 'Andorra',
    'ARE': 'United Arab Emirates',
    'ARG': 'Argentina',
    'ARM': 'Armenia',
    'AUS': 'Australia',
    'AUT': 'Austria',
    'AZE': 'Azerbaijan',
    'BDI': 'Burundi',
    'BEL': 'Belgium',
    'BEN': 'Benin',
    'BFA': 'Burkina Faso',
    'BGD': 'Bangladesh',
    'BGR': 'Bulgaria',
    'BHR': 'Bahrain',
    'BIH': 'Bosnia and Herzegovina',
    'BLR': 'Belarus',
    'BLZ': 'Belize',
    'BOL': 'Bolivia',
    'BRA': 'Brazil',
    'BRN': 'Brunei',
    'BTN': 'Bhutan',
    'BWA': 'Botswana',
    'CAF': 'Central African Republic',
    'CAN': 'Canada',
    'CHE': 'Switzerland',
    'CHL': 'Chile',
    'CHN': 'China',
    'CIV': 'Ivory Coast',
    'CMR': 'Cameroon',
    'COD': 'Congo',
    'COG': 'Congo-Brazzaville',
    'COL': 'Colombia',
    'COM': 'Comoros',
    'CPV': 'Cape Verde',
    'CRI': 'Costa Rica',
    'CSS': 'Czechoslovakia (Former)', # Historical
    'CUB': 'Cuba',
    'CYP': 'Cyprus',
    'CZE': 'Czechia',
    'DEU': 'Germany',
    'DJI': 'Djibouti',
    'DNK': 'Denmark',
    'DOM': 'Dominican Republic',
    'DZA': 'Algeria',
    'ECU': 'Ecuador',
    'EGY': 'Egypt',
    'ERI': 'Eritrea',
    'ESP': 'Spain',
    'EST': 'Estonia',
    'ETH': 'Ethiopia',
    'FIN': 'Finland',
    'FJI': 'Fiji',
    'FRA': 'France',
    'GAB': 'Gabon',
    'GBR': 'United Kingdom',
    'GEO': 'Georgia',
    'GHA': 'Ghana',
    'GIN': 'Guinea',
    'GMB': 'Gambia',
    'GNB': 'Guinea-Bissau',
    'GNQ': 'Equatorial Guinea',
    'GRC': 'Greece',
    'GRD': 'Grenada',
    'GTM': 'Guatemala',
    'GUY': 'Guyana',
    'HKG': 'Hong Kong',
    'HND': 'Honduras',
    'HRV': 'Croatia',
    'HTI': 'Haiti',
    'HUN': 'Hungary',
    'IDN': 'Indonesia',
    'IND': 'India',
    'IRL': 'Ireland',
    'IRN': 'Iran',
    'IRQ': 'Iraq',
    'ISL': 'Iceland',
    'ISR': 'Israel',
    'ITA': 'Italy',
    'JAM': 'Jamaica',
    'JOR': 'Jordan',
    'JPN': 'Japan',
    'KAZ': 'Kazakhstan',
    'KEN': 'Kenya',
    'KGZ': 'Kyrgyzstan',
    'KHM': 'Cambodia',
    'KOR': 'South Korea',
    'KWT': 'Kuwait',
    'LAO': 'Laos',
    'LBN': 'Lebanon',
    'LBR': 'Liberia',
    'LBY': 'Libya',
    'LIE': 'Liechtenstein',
    'LKA': 'Sri Lanka',
    'LSO': 'Lesotho',
    'LTU': 'Lithuania',
    'LUX': 'Luxembourg',
    'LVA': 'Latvia',
    'MAR': 'Morocco',
    'MDA': 'Moldova',
    'MDG': 'Madagascar',
    'MDV': 'Maldives',
    'MEX': 'Mexico',
    'MKD': 'North Macedonia',
    'MLI': 'Mali',
    'MLT': 'Malta',
    'MMR': 'Myanmar',
    'MNE': 'Montenegro',
    'MNG': 'Mongolia',
    'MOZ': 'Mozambique',
    'MRT': 'Mauritania',
    'MUS': 'Mauritius',
    'MWI': 'Malawi',
    'MYS': 'Malaysia',
    'NAM': 'Namibia',
    'NER': 'Niger',
    'NGA': 'Nigeria',
    'NIC': 'Nicaragua',
    'NLD': 'Netherlands',
    'NOR': 'Norway',
    'NPL': 'Nepal',
    'NZL': 'New Zealand',
    'OMN': 'Oman',
    'PAK': 'Pakistan',
    'PAN': 'Panama',
    'PER': 'Peru',
    'PHL': 'Philippines',
    'PNG': 'Papua New Guinea',
    'POL': 'Poland',
    'PRK': 'North Korea',
    'PRT': 'Portugal',
    'PRY': 'Paraguay',
    'PSE': 'Palestine',
    'QAT': 'Qatar',
    'ROU': 'Romania',
    'RUS': 'Russia',
    'RWA': 'Rwanda',
    'SAU': 'Saudi Arabia',
    'SCG': 'Serbia and Montenegro (Former)', # Historical
    'SDN': 'Sudan',
    'SEN': 'Senegal',
    'SGP': 'Singapore',
    'SLE': 'Sierra Leone',
    'SLV': 'El Salvador',
    'SOM': 'Somalia',
    'SRB': 'Serbia',
    'SSD': 'South Sudan',
    'SUR': 'Suriname',
    'SVK': 'Slovakia',
    'SVN': 'Slovenia',
    'SWE': 'Sweden',
    'SWZ': 'Eswatini',
    'SYC': 'Seychelles',
    'SYR': 'Syria',
    'TCD': 'Chad',
    'TGO': 'Togo',
    'THA': 'Thailand',
    'TJK': 'Tajikistan',
    'TKM': 'Turkmenistan',
    'TLS': 'Timor-Leste',
    'TON': 'Tonga',
    'TTO': 'Trinidad and Tobago',
    'TUN': 'Tunisia',
    'TUR': 'Turkey',
    'TWN': 'Taiwan',
    'TZA': 'Tanzania',
    'UGA': 'Uganda',
    'UKR': 'Ukraine',
    'URY': 'Uruguay',
    'USA': 'United States',
    'UZB': 'Uzbekistan',
    'VEN': 'Venezuela',
    'VNM': 'Vietnam',
    'WSM': 'Samoa',
    'XKX': 'Kosovo', # User-assigned code
    'XKO': 'Kosovo',
    'YEM': 'Yemen',
    'YUG': 'Yugoslavia (Former)', # Historical
    'ZAF': 'South Africa',
    'ZMB': 'Zambia',
    'ZWE': 'Zimbabwe',
    # Mapping custom/duplicate codes to a standard name
    'USA1': 'United States',
    'USA2': 'United States',
    'USA_I': 'United States',
    'ISR1': 'Israel',
    'ISR2': 'Israel',
    'ISR3': 'Israel',
    'CTU': 'Northern Cyprus',
    'XCD': 'Caribbean States' # Currency code, not a country code
}
data['country'] = data['iso'].map(country_mapper)

# Replace the repeated per-file row labels with a clean 0..n-1 index
data = data.reset_index(drop=True)
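`Series.map` with a dict returns NaN for every code that is missing from the mapper, so a quick check for unmapped codes can be worth running after this step. A minimal sketch with a tiny stand-in mapper and a made-up code:

```python
import pandas as pd

# Tiny stand-in for country_mapper; 'ZZZ' is a made-up code that it lacks.
mapper = {'FIN': 'Finland', 'NOR': 'Norway'}
codes = pd.Series(['FIN', 'NOR', 'ZZZ', 'FIN'])

names = codes.map(mapper)

# Codes that did not get a name show up as NaN after the mapping.
unmapped = sorted(codes[names.isna()].unique())

print(unmapped)
```

On the real table the same check would be `data.loc[data['country'].isna(), 'iso'].unique()`, and an empty result would mean that the mapper covers every code.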

Ok, looks good to me, time to export. Hopefully I won't have to come back here again, though something tells me that I will. But it's time to proceed.

In [None]:
data.to_csv('../docs/data/data.csv')
data.to_json('../docs/data/data.json', orient='records', index=False)
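One detail about the CSV export: by default `to_csv` also writes the row index as an unnamed first column. Whether that column is wanted depends on the consumer of the file, so this is only a sketch of the difference on a toy frame:

```python
import pandas as pd

frame = pd.DataFrame({'iso': ['FIN'], 'rank': [1]})

# Without a path, to_csv returns the CSV text instead of writing a file.
with_index = frame.to_csv()
without_index = frame.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

The first header line starts with an empty field for the index, the second one does not.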

In [19]:
data['rank'].min()


np.int64(1)