<a href="https://colab.research.google.com/github/wiamelk/DS-2025/blob/main/ccfinance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

First, let's read the content of the markdown file `/content/rapport_analyse_pib.md` to understand what's inside.

In [1]:
with open('/content/rapport_analyse_pib.md', 'r') as f:
    markdown_content = f.read()
print(markdown_content)

# In-Depth Analysis Report on International GDP
## Multi-Country Comparative Analysis (2010-2023)

---

## 1. Introduction and Context

### 1.1 Objective of the analysis

This analysis examines in depth the evolution of the Gross Domestic Product (GDP) of several major economies over the period 2010-2023. The main objective is to compare economic performance, identify growth trends, and understand regional and global economic dynamics.

### 1.2 General methodology

The analysis relies on a quantitative approach combining:
- **Descriptive analysis**: Computation of summary statistics (means, medians, standard deviations)
- **Comparative analysis**: Cross-country comparison and identification of the leading economies
- **Temporal analysis**: Study of economic trends and cycles
- **Data visualization**: Multiple graphical representations to ease interpretation

### 1.3 Selected countries and period of

# Task
Download the Gross Domestic Product (GDP) data for the United States, China, Japan, Germany, France, United Kingdom, India, and Brazil for the period 2010-2023 from the World Bank, using the indicator ID 'NY.GDP.MKTP.CD'.

## Downloading the GDP data

### Subtask:
Download the Gross Domestic Product (GDP) data for the specified countries and period from the World Bank, using the appropriate indicator ID.


**Reasoning**:
To download the GDP data from the World Bank API, I need to import the `wbgapi` library, define the GDP indicator ID, specify the list of countries and the time range, and then use the `wbgapi.data.DataFrame` function to fetch the data.



In [2]:
import wbgapi as wb

# 2. Define the World Bank indicator ID for GDP
gdp_indicator_id = 'NY.GDP.MKTP.CD'

# 3. Create a list of country codes
countries = ['USA', 'CHN', 'JPN', 'DEU', 'FRA', 'GBR', 'IND', 'BRA']

# 4. Specify the years for data download as a range from 2010 to 2023
years = range(2010, 2024) # Range is exclusive at the end, so 2024 to include 2023

# 5. Use the wbgapi.data.DataFrame function to download the GDP data
df_gdp = wb.data.DataFrame(gdp_indicator_id, countries, mrv=years, numericTimeCodes=True)

print("GDP data downloaded successfully. First 5 rows:")
print(df_gdp.head())

ModuleNotFoundError: No module named 'wbgapi'

**Reasoning**:
The previous command failed because the `wbgapi` library was not found. I need to install `wbgapi` before it can be imported and used.
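
A `ModuleNotFoundError` like the one above can be anticipated before the import is attempted. The following is a minimal sketch using only the standard library; the helper name `is_installed` and the placeholder module name are illustrative assumptions, not part of the original notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# `json` ships with the standard library; the second name is a made-up placeholder.
print(is_installed("json"))
print(is_installed("surely_not_a_real_module_xyz"))
```

In a notebook, a check of this kind could decide whether the install cell needs to run at all.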



In [3]:
pip install wbgapi

Collecting wbgapi
  Downloading wbgapi-1.0.12-py3-none-any.whl.metadata (13 kB)
Downloading wbgapi-1.0.12-py3-none-any.whl (36 kB)
Installing collected packages: wbgapi
Successfully installed wbgapi-1.0.12


**Reasoning**:
The `wbgapi` library was successfully installed in the previous step. Now I will re-attempt to download the GDP data using the same code, which should now execute without the `ModuleNotFoundError`.



In [4]:
import wbgapi as wb

# 2. Define the World Bank indicator ID for GDP
gdp_indicator_id = 'NY.GDP.MKTP.CD'

# 3. Create a list of country codes
countries = ['USA', 'CHN', 'JPN', 'DEU', 'FRA', 'GBR', 'IND', 'BRA']

# 4. Specify the years for data download as a range from 2010 to 2023
years = range(2010, 2024) # Range is exclusive at the end, so 2024 to include 2023

# 5. Use the wbgapi.data.DataFrame function to download the GDP data
df_gdp = wb.data.DataFrame(gdp_indicator_id, countries, mrv=years, numericTimeCodes=True)

print("GDP data downloaded successfully. First 5 rows:")
print(df_gdp.head())

KeyError: 'numericTimeCodes is not a concept in database 2'

**Reasoning**:
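The previous attempt to download data failed because `numericTimeCodes` is not a parameter of the `wbgapi.data.DataFrame` function. The `KeyError` message ("not a concept in database 2") suggests that the unrecognized keyword was interpreted as a database dimension rather than rejected as an unknown argument. I will remove this unsupported argument and retry the data download.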
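
One plausible way a mistyped keyword ends up as a `KeyError` instead of a `TypeError` is a catch-all `**kwargs` parameter whose entries are looked up as dimension names. The sketch below is a toy stand-in written for illustration only: `fetch_frame` and `KNOWN_CONCEPTS` are hypothetical and do not reproduce wbgapi's actual implementation.

```python
# Illustrative set of dimension names; an assumption, not taken from wbgapi.
KNOWN_CONCEPTS = {"series", "economy", "time"}


def fetch_frame(series, economy="all", time="all", mrv=None, **dimensions):
    """Toy stand-in: any extra keyword is treated as a dimension filter."""
    for name in dimensions:
        if name not in KNOWN_CONCEPTS:
            raise KeyError(f"{name} is not a concept in this toy database")
    return {"series": series, "economy": economy, "time": time, "mrv": mrv}


try:
    fetch_frame("NY.GDP.MKTP.CD", ["USA"], numericTimeCodes=True)
except KeyError as exc:
    message = exc.args[0]
    print(message)
```

With a signature of this shape, Python never complains about the unexpected keyword itself; the failure only surfaces when the name is validated later.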



In [5]:
import wbgapi as wb

# 2. Define the World Bank indicator ID for GDP
gdp_indicator_id = 'NY.GDP.MKTP.CD'

# 3. Create a list of country codes
countries = ['USA', 'CHN', 'JPN', 'DEU', 'FRA', 'GBR', 'IND', 'BRA']

# 4. Specify the years for data download as a range from 2010 to 2023
years = range(2010, 2024) # Range is exclusive at the end, so 2024 to include 2023

# 5. Use the wbgapi.data.DataFrame function to download the GDP data
df_gdp = wb.data.DataFrame(gdp_indicator_id, countries, mrv=years)

print("GDP data downloaded successfully. First 5 rows:")
print(df_gdp.head())

APIError: APIError: [400] Bad Request (https://api.worldbank.org/v2/en/sources/2/series/NY.GDP.MKTP.CD/country/USA;CHN;JPN;DEU;FRA;GBR;IND;BRA/time/all?per_page=1000&mrv=range%282010%2C+2024%29&page=1&format=json)

**Reasoning**:
The previous command resulted in an APIError: [400] Bad Request because the 'mrv' parameter was given a range of years. 'mrv' stands for "most recent values" and expects a count, so the request URL ended up containing the literal text `mrv=range(2010, 2024)` in URL-encoded form. A specific period belongs in the 'time' parameter instead, so I will convert the 'years' range object to a list and pass it to 'time'.
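
The malformed query string in the error URL can be reproduced with the standard library alone. This is a minimal sketch showing what happens when a `range` object is serialized as a query parameter; it assumes nothing about how wbgapi builds its requests beyond the URL shown in the error.

```python
from urllib.parse import urlencode

years = range(2010, 2024)

# A range object is converted with str() and then percent-encoded.
bad_query = urlencode({"per_page": 1000, "mrv": years})

# An integer count is what a "most recent values" parameter is meant to carry.
good_query = urlencode({"per_page": 1000, "mrv": 5})

print(bad_query)
print(good_query)
```

The first string matches the `mrv=range%282010%2C+2024%29` fragment in the failing request, which is why the server answered with a 400.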



In [6]:
import wbgapi as wb

# 2. Define the World Bank indicator ID for GDP
gdp_indicator_id = 'NY.GDP.MKTP.CD'

# 3. Create a list of country codes
countries = ['USA', 'CHN', 'JPN', 'DEU', 'FRA', 'GBR', 'IND', 'BRA']

# 4. Specify the years for data download as a range from 2010 to 2023
years = range(2010, 2024) # Range is exclusive at the end, so 2024 to include 2023

# 5. Use the wbgapi.data.DataFrame function to download the GDP data
df_gdp = wb.data.DataFrame(gdp_indicator_id, countries, time=list(years))

print("GDP data downloaded successfully. First 5 rows:")
print(df_gdp.head())

GDP data downloaded successfully. First 5 rows:
               YR2010        YR2011        YR2012        YR2013        YR2014  \
economy                                                                         
BRA      2.208838e+12  2.616156e+12  2.465228e+12  2.472820e+12  2.456044e+12   
CHN      6.192565e+12  7.671757e+12  8.673665e+12  9.743124e+12  1.067453e+13   
DEU      3.468154e+12  3.824829e+12  3.597897e+12  3.808086e+12  3.965801e+12   
FRA      2.646230e+12  2.870409e+12  2.683007e+12  2.816078e+12  2.861236e+12   
GBR      2.485483e+12  2.663806e+12  2.707090e+12  2.784854e+12  3.064708e+12   

               YR2015        YR2016        YR2017        YR2018        YR2019  \
economy                                                                         
BRA      1.802212e+12  1.795693e+12  2.063515e+12  1.916934e+12  1.873288e+12   
CHN      1.128081e+13  1.145602e+13  1.253756e+13  1.414777e+13  1.456017e+13   
DEU      3.423568e+12  3.537784e+12  3.763092e+12  4.052008e

## Loading and inspecting the data

### Subtask:
Load the downloaded data into a pandas DataFrame and perform an initial inspection to understand its structure, data types, and any missing values.


**Reasoning**:
To understand the structure, data types, and potential missing values of the downloaded GDP data, I will display the first few rows, get a concise summary, show descriptive statistics, and check for null values.
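
Because the downloaded frame is wide (one row per economy, one column per year), `isnull().sum()` counts gaps per year. A per-economy count needs `axis=1`. The sketch below uses a small made-up frame with placeholder codes `AAA` and `BBB`; the values are not World Bank data.

```python
import numpy as np
import pandas as pd

# Toy wide frame shaped like the downloaded data: economies as rows, years as columns.
toy = pd.DataFrame(
    {"YR2010": [1.0, 2.0], "YR2011": [np.nan, 2.5], "YR2012": [np.nan, 3.0]},
    index=pd.Index(["AAA", "BBB"], name="economy"),
)

missing_per_year = toy.isnull().sum()           # one count per year column
missing_per_economy = toy.isnull().sum(axis=1)  # one count per economy row

print(missing_per_year)
print(missing_per_economy)
```

The row-wise view makes it easier to spot a single economy with an incomplete series.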



In [7]:
print("\n--- First 5 rows of df_gdp ---")
print(df_gdp.head())

print("\n--- DataFrame Info ---")
df_gdp.info()

print("\n--- Descriptive Statistics ---")
print(df_gdp.describe())

print("\n--- Missing Values Count ---")
print(df_gdp.isnull().sum())


--- First 5 rows of df_gdp ---
               YR2010        YR2011        YR2012        YR2013        YR2014  \
economy                                                                         
BRA      2.208838e+12  2.616156e+12  2.465228e+12  2.472820e+12  2.456044e+12   
CHN      6.192565e+12  7.671757e+12  8.673665e+12  9.743124e+12  1.067453e+13   
DEU      3.468154e+12  3.824829e+12  3.597897e+12  3.808086e+12  3.965801e+12   
FRA      2.646230e+12  2.870409e+12  2.683007e+12  2.816078e+12  2.861236e+12   
GBR      2.485483e+12  2.663806e+12  2.707090e+12  2.784854e+12  3.064708e+12   

               YR2015        YR2016        YR2017        YR2018        YR2019  \
economy                                                                         
BRA      1.802212e+12  1.795693e+12  2.063515e+12  1.916934e+12  1.873288e+12   
CHN      1.128081e+13  1.145602e+13  1.253756e+13  1.414777e+13  1.456017e+13   
DEU      3.423568e+12  3.537784e+12  3.763092e+12  4.052008e+12  3.957208e+1

## Preprocessing the data

### Subtask:
Clean and transform the data as needed, for example by handling missing values or reshaping the table to make analysis easier.


**Reasoning**:
To prepare the data for further analysis, I will first reset the index of `df_gdp`, then transform it from wide to long format using `pd.melt()`, rename the new columns, and convert the 'Year' column to an integer. Finally, I will display the head and info of the transformed DataFrame to confirm the changes.



In [8]:
import pandas as pd

# 1. Reset the index so that 'economy' becomes a regular column
df_gdp_reset = df_gdp.reset_index()

# Identify year columns
year_cols = [col for col in df_gdp_reset.columns if col.startswith('YR')]

# 2. Use pd.melt() to transform df_gdp from wide to long format
df_gdp_long = pd.melt(df_gdp_reset,
                      id_vars=['economy'],
                      value_vars=year_cols,
                      var_name='Year',
                      value_name='GDP')

# 3. No separate rename is needed: melt already named the columns
# 'Year' and 'GDP' through var_name and value_name

# 4. Convert the 'Year' column from a string (e.g., 'YR2010') to an integer
df_gdp_long['Year'] = df_gdp_long['Year'].str.replace('YR', '', regex=False).astype(int)

# 5. Display the first 5 rows and the .info() of the transformed DataFrame
print("\n--- First 5 rows of df_gdp_long ---")
print(df_gdp_long.head())

print("\n--- DataFrame Info for df_gdp_long ---")
df_gdp_long.info()


--- First 5 rows of df_gdp_long ---
  economy  Year           GDP
0     BRA  2010  2.208838e+12
1     CHN  2010  6.192565e+12
2     DEU  2010  3.468154e+12
3     FRA  2010  2.646230e+12
4     GBR  2010  2.485483e+12

--- DataFrame Info for df_gdp_long ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112 entries, 0 to 111
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   economy  112 non-null    object 
 1   Year     112 non-null    int64  
 2   GDP      112 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 2.8+ KB
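
The `RangeIndex` of 112 entries reported above equals 8 economies times 14 years, which is what a complete wide-to-long reshape should produce. The same check can be sketched on a toy frame; the codes `AAA` and `BBB` and their values are made up for illustration.

```python
import pandas as pd

# Toy wide frame: 2 economies x 3 year columns.
wide = pd.DataFrame(
    {
        "economy": ["AAA", "BBB"],
        "YR2010": [1.0, 4.0],
        "YR2011": [2.0, 5.0],
        "YR2012": [3.0, 6.0],
    }
)

long_df = wide.melt(id_vars="economy", var_name="Year", value_name="GDP")
long_df["Year"] = long_df["Year"].str.replace("YR", "", regex=False).astype(int)

# A complete reshape yields one row per (economy, year) pair with no duplicates.
expected_rows = wide["economy"].nunique() * (wide.shape[1] - 1)
has_duplicates = bool(long_df.duplicated(["economy", "Year"]).any())

print(len(long_df), expected_rows, has_duplicates)
```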


## Descriptive analysis of GDP

### Subtask:
Compute key descriptive statistics (mean, median, standard deviation) for each country's GDP over the analyzed period.


**Reasoning**:
To calculate the descriptive statistics for each country's GDP over the analyzed period, I will group the `df_gdp_long` DataFrame by the 'economy' column and then apply the `.describe()` method to the 'GDP' column for each group. This will provide the count, mean, standard deviation, minimum, maximum, and quartiles, with the median reported in the `50%` column.
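
When only the three statistics named in the subtask are wanted, `groupby(...).agg(...)` returns them directly under explicit names. This is a minimal sketch on made-up numbers; `toy_long` and the codes `AAA` and `BBB` are placeholders, not the notebook's data.

```python
import pandas as pd

# Toy long-format frame with the same column names as df_gdp_long.
toy_long = pd.DataFrame(
    {
        "economy": ["AAA"] * 3 + ["BBB"] * 3,
        "GDP": [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
    }
)

# pandas computes the sample standard deviation (ddof=1) by default.
stats = toy_long.groupby("economy")["GDP"].agg(["mean", "median", "std"])
print(stats)
```

For the skewed `BBB` series the mean (30) sits well above the median (20), which is the kind of gap the median column helps reveal.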



In [9]:
print("\n--- Descriptive Statistics of GDP by Country ---")
df_gdp_stats = df_gdp_long.groupby('economy')['GDP'].describe()
print(df_gdp_stats)


--- Descriptive Statistics of GDP by Country ---
         count          mean           std           min           25%  \
economy                                                                  
BRA       14.0  2.068610e+12  3.436189e+11  1.476107e+12  1.819981e+12   
CHN       14.0  1.262309e+13  3.960065e+12  6.192565e+12  9.975976e+12   
DEU       14.0  3.884012e+12  3.233821e+11  3.423568e+12  3.639195e+12   
FRA       14.0  2.738970e+12  1.744748e+11  2.442483e+12  2.646654e+12   
GBR       14.0  2.860704e+12  2.397849e+11  2.485483e+12  2.691025e+12   
IND       14.0  2.474090e+12  6.245391e+11  1.675616e+12  1.902323e+12   
JPN       14.0  5.105791e+12  6.270059e+11  4.213167e+12  4.905455e+12   
USA       14.0  1.993306e+13  3.839683e+12  1.504897e+13  1.706255e+13   

                  50%           75%           max  
economy                                            
BRA      2.007719e+12  2.394242e+12  2.616156e+12  
CHN      1.199679e+13  1.488735e+13  1.831677e+13  
D