This notebook runs a preliminary analysis of the data to identify the variables most useful for comparing the price-to-earnings ratios (PER) of companies listed on the S&P 500 with those of companies listed on the Eurostoxx 600.

In [10]:
%pip install pynsee

Note: you may need to restart the kernel to use updated packages.


In [11]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pynsee
import pynsee.download
import seaborn as sns

In [12]:
# Build a DataFrame for the Eurostoxx 600 from the cleaned CSV

chemin_fichier = '../Nettoyage des données/df_Eurostoxx_clean.csv'
df_STX600 = pd.read_csv(chemin_fichier)
df_STX600.head()

Unnamed: 0,Ticker,YahooTicker,Nom,Zone,Sector,Industry,Country,Beta,MarketCapitalizationBN,AnneeFiscale,...,TotalRevenueBN,TotalEquityBN,TotalAssetsBN,TotalDebtBN,Dividendes_Annuels,Annual_Volume_Traded_BN,%MargeNette,%Gearing,%PayOut,Croissance de l'EPS (en %)
0,1COV,1COV.DE,Covestro AG,Eurostoxx,Basic Materials,Specialty Chemicals,Germany,1.043,11.211176,2021,...,15.903,7.696,15.571,2.528,1.3,0.225463,10.161605,32.848233,15.531661,
1,1U1,1U1.DE,1&1 DrillischAktiengesellschaft,Eurostoxx,Communication Services,Telecom Services,Germany,0.41,4.187117,2021,...,3.909659,5.219201,7.06373,0.102285,0.05,0.035485,9.464304,1.959783,2.380952,
2,1U1,1U1.DE,1&1 DrillischAktiengesellschaft,Eurostoxx,Communication Services,Telecom Services,Germany,0.41,4.187117,2022,...,3.963691,5.579841,7.257085,0.102669,0.05,0.016834,9.267322,1.839999,2.403846,-0.952381
3,1U1,1U1.DE,1&1 DrillischAktiengesellschaft,Eurostoxx,Communication Services,Telecom Services,Germany,0.41,4.187117,2023,...,4.096701,5.887074,7.740306,0.188507,0.05,0.024756,7.687893,3.202049,2.793296,-13.942308
4,1U1,1U1.DE,1&1 DrillischAktiengesellschaft,Eurostoxx,Communication Services,Telecom Services,Germany,0.41,4.187117,2024,...,4.064254,6.09397,8.130073,0.412959,0.05,0.013964,5.235007,6.776518,4.132231,-32.402235
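The `Unnamed: 0` column in the output above suggests that the cleaned CSV was written with its row index included. A minimal sketch of one way to avoid carrying that column around, assuming the first CSV column really is just the saved index (this is not confirmed by the notebook). The in-memory CSV below is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the real file (illustrative only).
# The empty first header field mimics a CSV saved with its index.
csv_text = ",Ticker,AnneeFiscale\n0,1COV,2021\n1,1U1,2021\n"

# Default read: the unnamed first column shows up as "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use that first column as the row index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

The same `index_col=0` argument could be passed when reading the real CSV files, provided the assumption about the first column holds.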


In [13]:
# Build a DataFrame for the S&P 500 from the cleaned CSV

chemin_fichier = '../Nettoyage des données/df_SP500_clean.csv'
df_SP500 = pd.read_csv(chemin_fichier)
df_SP500.head()

Unnamed: 0,Ticker,YahooTicker,Nom,Zone,Sector,Industry,Country,Beta,MarketCapitalizationBN,AnneeFiscale,...,TotalRevenueBN,TotalEquityBN,TotalAssetsBN,TotalDebtBN,Dividendes_Annuels,Annual_Volume_Traded_BN,%MargeNette,%Gearing,%PayOut,Croissance de l'EPS (en %)
0,A,A,A,USA,Healthcare,Diagnostics & Research,United States,1.274,39.8715,2021,...,6.319,5.389,10.705,2.729,0.776,0.407757,19.148599,50.640193,19.497487,
1,A,A,A,USA,Healthcare,Diagnostics & Research,United States,1.274,39.8715,2022,...,6.848,5.305,10.532,2.769,1.065,0.427283,18.311916,52.196041,25.417661,5.276382
2,A,A,A,USA,Healthcare,Diagnostics & Research,United States,1.274,39.8715,2023,...,6.833,5.845,10.763,2.735,0.911,0.461257,18.147227,46.79213,21.587678,0.71599
3,A,A,A,USA,Healthcare,Diagnostics & Research,United States,1.274,39.8715,2024,...,6.51,5.898,11.846,3.39,0.956,0.431039,19.800307,57.477111,21.531532,5.21327
4,AAPL,AAPL,AAPL,USA,Technology,Consumer Electronics,United States,1.107,4113.459053,2022,...,394.328,50.672,352.755,132.48,0.91,22.065504,25.309641,261.446164,14.796748,


Let us check that the data we collected are consistent.

In [14]:
# Filter on fiscal year 2024, sort in descending order (ascending=False), and keep the five largest S&P 500 market capitalizations
top_5_equity_SP500 = df_SP500[df_SP500['AnneeFiscale'] == 2024].sort_values(by='MarketCapitalizationBN', ascending=False).head(5)
top_5_equity_SP500.head()

Unnamed: 0,Ticker,YahooTicker,Nom,Zone,Sector,Industry,Country,Beta,MarketCapitalizationBN,AnneeFiscale,...,TotalRevenueBN,TotalEquityBN,TotalAssetsBN,TotalDebtBN,Dividendes_Annuels,Annual_Volume_Traded_BN,%MargeNette,%Gearing,%PayOut,Croissance de l'EPS (en %)
6,AAPL,AAPL,AAPL,USA,Technology,Consumer Electronics,United States,1.107,4113.459053,2024,...,391.035,56.95,364.98,106.629,0.99,14.351428,23.971256,187.23266,16.202946,-0.811688
624,GOOGL,GOOGL,GOOGL,USA,Communication Services,Internet Content & Information,United States,1.07,3840.511312,2024,...,350.018,325.084,450.256,25.461,0.6,6.901337,28.603672,7.83213,7.380074,39.212329
620,GOOG,GOOG,GOOG,USA,Communication Services,Internet Content & Information,United States,1.07,3835.825226,2024,...,350.018,325.084,450.256,25.461,0.6,4.950585,28.603672,7.83213,7.380074,39.212329
991,MSFT,MSFT,MSFT,USA,Technology,Software - Infrastructure,United States,1.07,3657.266364,2024,...,245.122,268.477,512.163,67.127,3.08,5.174582,35.955973,25.002887,25.969646,22.016461
95,AMZN,AMZN,AMZN,USA,Consumer Cyclical,Internet Retail,United States,1.372,2436.513923,2024,...,637.959,285.97,624.894,130.9,0.0,10.300864,9.287117,45.774032,0.0,
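In the output above, GOOGL and GOOG report identical revenue, equity and assets for 2024. That is what one would expect from two share classes of the same issuer, although the notebook does not say so. A minimal sketch of how such rows could be flagged, using three rows copied from the output; the choice of key columns is an assumption.

```python
import pandas as pd

# Three 2024 rows copied from the output above (subset of columns).
sample = pd.DataFrame({
    "Ticker": ["GOOGL", "GOOG", "MSFT"],
    "AnneeFiscale": [2024, 2024, 2024],
    "TotalRevenueBN": [350.018, 350.018, 245.122],
    "TotalEquityBN": [325.084, 325.084, 268.477],
    "TotalAssetsBN": [450.256, 450.256, 512.163],
})

# Rows that agree on the year and on all three balance-sheet figures
# are likely to describe the same underlying company.
keys = ["AnneeFiscale", "TotalRevenueBN", "TotalEquityBN", "TotalAssetsBN"]
same_fundamentals = sample[sample.duplicated(subset=keys, keep=False)]

flagged = sorted(same_fundamentals["Ticker"])
print(flagged)
```

Whether to keep one row per issuer or both share classes depends on how the PER comparison is meant to weight companies. That choice is left open here.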


In [15]:
# Filter on fiscal year 2024, sort in descending order (ascending=False), and keep the five largest STX600 market capitalizations
top_5_equity_STX600 = df_STX600[df_STX600['AnneeFiscale'] == 2024].sort_values(by='MarketCapitalizationBN', ascending=False).head(5)
top_5_equity_STX600.head()

Unnamed: 0,Ticker,YahooTicker,Nom,Zone,Sector,Industry,Country,Beta,MarketCapitalizationBN,AnneeFiscale,...,TotalRevenueBN,TotalEquityBN,TotalAssetsBN,TotalDebtBN,Dividendes_Annuels,Annual_Volume_Traded_BN,%MargeNette,%Gearing,%PayOut,Croissance de l'EPS (en %)
597,INVE B,INVE-B.ST,Investor AB - Class B Shares,Eurostoxx,Financial Services,Asset Management,Sweden,0.787,972.7768,2024,...,168.909,819.364,952.09,98.937,4.8,0.610232,67.100036,12.074853,12.972973,-10.800386
131,ATCO A,ATCO-A.ST,Atlas Copco AB - Class A Shares,Eurostoxx,Industrials,Specialty Industrial Machinery,Sweden,0.937,812.441993,2024,...,176.771,113.7,208.538,34.708,2.8,0.941863,16.847786,30.525945,45.826514,6.076389
1185,VOLV B,VOLV-B.ST,Volvo AB - Class B Shares,Eurostoxx,Industrials,Farm & Heavy Construction Machinery,Sweden,0.88,589.974471,2024,...,526.816,194.048,714.564,258.851,18.0,0.743307,9.56482,133.395345,72.639225,1.142857
126,ASSA B,ASSA-B.ST,Assa Abloy AB - Class B Shares,Eurostoxx,Industrials,Security & Protection Services,Sweden,0.828,392.215134,2024,...,150.162,107.071,223.605,73.501,5.4,0.357249,10.414752,68.646973,38.352273,14.751426
973,SEB A,SEB-A.ST,Skandinaviska Enskilda Banken - Class A Shares,Eurostoxx,Financial Services,Banks - Regional,Sweden,0.308,374.901834,2024,...,81.61,231.148,3759.028,953.911,11.5,0.755741,43.94682,412.684081,65.676756,-3.791209
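All five rows in the output above have `Country` equal to Sweden. One possible explanation, which is an assumption to verify and not something the notebook states, is that the amounts are expressed in each company's local currency rather than in a common one. A quick sketch of the check, using the five rows above plus the 2024 row for 1U1 from the earlier output:

```python
import pandas as pd

# Rows copied from the outputs above (subset of columns).
sample = pd.DataFrame({
    "Ticker": ["INVE B", "ATCO A", "VOLV B", "ASSA B", "SEB A", "1U1"],
    "Country": ["Sweden"] * 5 + ["Germany"],
    "AnneeFiscale": [2024] * 6,
    "MarketCapitalizationBN": [
        972.7768, 812.441993, 589.974471, 392.215134, 374.901834, 4.187117,
    ],
})

# Take the five largest market capitalizations for 2024 ...
top = sample[sample["AnneeFiscale"] == 2024].nlargest(5, "MarketCapitalizationBN")

# ... and count how many come from each country.
country_counts = top["Country"].value_counts()
print(country_counts)
```

Running the same count on the full `df_STX600` with a larger `n` would show whether one country dominates the top of the ranking.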


Next, I check whether any rows share the same net income.

In [20]:
# Flag duplicates on the NetIncomeBN column
# keep=False matters: it marks ALL rows involved (A and B), not just the later copy (B)
masque_doublons = df_SP500.duplicated(subset=['NetIncomeBN'], keep=False)

# Filter the DataFrame and sort by amount so that matching rows appear together
resultat = df_SP500[masque_doublons].sort_values(by='NetIncomeBN')

print(f"Nombre de lignes concernées : {len(resultat)}")
resultat.head()

Nombre de lignes concernées : 139


Unnamed: 0,Ticker,YahooTicker,Nom,Zone,Sector,Industry,Country,Beta,MarketCapitalizationBN,AnneeFiscale,...,TotalRevenueBN,TotalEquityBN,TotalAssetsBN,TotalDebtBN,Dividendes_Annuels,Annual_Volume_Traded_BN,%MargeNette,%Gearing,%PayOut,Croissance de l'EPS (en %)
1388,TTWO,TTWO,TTWO,USA,Communication Services,Electronic Gaming & Multimedia,United States,0.957,45.960036,2022,...,3.5048,3.8097,6.5463,0.2502,0.0,0.595163,11.926501,6.567446,0.0,
841,LH,LH,LH,USA,Healthcare,Diagnostics & Research,United States,0.961,21.444786,2023,...,12.1616,7.875,16.7251,5.9542,2.677114,0.18741,3.437048,75.608889,55.773208,-65.836299
880,LUV,LUV,LUV,USA,Industrials,Airlines,United States,1.167,20.035912,2024,...,27.483,10.35,33.75,8.058,0.72,2.245677,1.691955,77.855072,92.307692,-7.142857
879,LUV,LUV,LUV,USA,Industrials,Airlines,United States,1.167,20.035912,2023,...,26.091,10.515,36.487,9.2,0.9,1.854596,1.782224,87.494056,107.142857,-7.692308
1060,NWSA,NWSA,NWSA,USA,Communication Services,Entertainment,United States,0.973,15.137602,2022,...,10.385,8.222,17.221,4.155,0.2,0.723795,5.999037,50.53515,18.867925,


This needs handling: 139 rows of df_SP500 share their NetIncomeBN value with at least one other row.
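Before deciding how to handle these rows, it may help to separate the cases. The same ticker can repeat a value across years (LUV appears for both 2023 and 2024 in the output above), and several tickers can share a value. Note also that `duplicated` treats missing values as equal to each other, so rows without a net income can inflate the count. The sketch below is one possible way to summarize the groups. The function name and the toy data are hypothetical, and the values are not taken from the real CSV.

```python
import numpy as np
import pandas as pd

# Toy data, for illustration only (not taken from the real CSV).
toy = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "CCC", "DDD", "EEE"],
    "AnneeFiscale": [2023, 2024, 2023, 2023, 2024, 2024],
    "NetIncomeBN": [0.5, 0.5, 1.2, 1.2, np.nan, np.nan],
})


def summarize_net_income_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Group rows sharing a NetIncomeBN value and describe each group."""
    # Drop missing values first: duplicated() treats NaN as equal to NaN,
    # so rows with no net income would otherwise count as duplicates.
    known = df.dropna(subset=["NetIncomeBN"])
    dup = known[known.duplicated(subset=["NetIncomeBN"], keep=False)]

    # One line per shared value: how many rows and how many distinct tickers.
    summary = (
        dup.groupby("NetIncomeBN")
        .agg(n_rows=("Ticker", "size"), n_tickers=("Ticker", "nunique"))
        .reset_index()
    )
    summary["kind"] = np.where(
        summary["n_tickers"] == 1,
        "same ticker, several years",
        "several tickers",
    )
    return summary


summary = summarize_net_income_duplicates(toy)
print(summary)
```

Applied to `df_SP500`, the `kind` column would indicate which groups deserve a closer look. The "several tickers" groups may mix dual share classes with coincidental matches on rounded figures.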