# Aviation Risk Prediction System: Exploratory Data Analysis (EDA)

This notebook analyzes the enriched dataset produced by merging:
- real flight data (OpenSky Network)
- aeronautical weather data (METAR)

Objectives:
- understand the structure of the data
- detect anomalies
- identify flight patterns and weather conditions
- prepare the feature engineering needed for Machine Learning

The analysis runs on the most recent enriched file available in `data/processed/`.
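
The notebook selects that file with `get_latest_file` from `src.utils`, whose implementation is not shown here. As a rough illustration of how such a selection could work, the sketch below defines a hypothetical stand-in, `latest_file_sketch`. It assumes that file names embed a `YYYYMMDD_HHMMSS` timestamp, so that sorting names alphabetically also sorts them chronologically. The real helper may use another criterion, such as modification time.

```python
import glob
import os


def latest_file_sketch(pattern):
    """Return the newest path matching a glob pattern (hypothetical helper).

    Assumes names like flights_enriched_YYYYMMDD_HHMMSS.csv, where the
    lexicographic order of the file names equals their chronological order.
    """
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"No file matches pattern: {pattern}")
    # Compare on the base name only, so directory separators do not
    # influence the ordering.
    return max(matches, key=os.path.basename)
```

Raising `FileNotFoundError` on an empty match fails early with a clear message, rather than passing `None` on to `pd.read_csv`.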

In [None]:
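import os
import sys

# Run from the project root so that relative paths such as data/processed/
# resolve and the local src package can be imported.
# Note: ".." is resolved against the current working directory, so
# re-running this cell moves one level further up each time.
PROJECT_ROOT = os.path.abspath("..")
os.chdir(PROJECT_ROOT)
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)

print("Working directory set to:", os.getcwd())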



Working directory set to: c:\Users\yassi\OneDrive\Documents\Administratif\Data_Science\aviation-risk-project


In [8]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from src.utils import get_latest_file

sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)


In [11]:
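# Glob pattern matching the timestamped enriched exports.
pattern = "data/processed/flights_enriched_*.csv"

# Project helper from src.utils: picks the most recent matching file.
filepath = get_latest_file(pattern)

print("Loaded File:", filepath)

df = pd.read_csv(filepath)

df.head()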

Loaded File: data/processed\flights_enriched_20251210_173141.csv


Unnamed: 0,icao24,callsign,origin_country,time_position,last_contact,longitude,latitude,baro_altitude,on_ground,velocity,...,metar_metarType,metar_rawOb,metar_lat,metar_lon,metar_elev,metar_name,metar_cover,metar_clouds,metar_fltCat,metar_fetch_time_utc
0,80162c,AXB522,India,1765380000.0,1765379869,55.6938,24.0396,10668.0,False,251.49,...,METAR,METAR LFPG 101600Z 27006KT 9999 BKN030 BKN190 ...,49.015,2.534,107,"Paris/De Gaulle Arpt, ID, FR",BKN,"[{'cover': 'BKN', 'base': 3000}, {'cover': 'BK...",MVFR,2025-12-10 16:20:04.386928+00:00
1,801638,AXB1120,India,1765380000.0,1765379869,77.9913,27.8394,8839.2,False,252.36,...,METAR,METAR LFPG 101600Z 27006KT 9999 BKN030 BKN190 ...,49.015,2.534,107,"Paris/De Gaulle Arpt, ID, FR",BKN,"[{'cover': 'BKN', 'base': 3000}, {'cover': 'BK...",MVFR,2025-12-10 16:20:04.386928+00:00
2,408120,VIR47GH,United Kingdom,1765380000.0,1765379869,-3.3146,51.7957,8191.5,False,183.03,...,METAR,METAR LFPG 101600Z 27006KT 9999 BKN030 BKN190 ...,49.015,2.534,107,"Paris/De Gaulle Arpt, ID, FR",BKN,"[{'cover': 'BKN', 'base': 3000}, {'cover': 'BK...",MVFR,2025-12-10 16:20:04.386928+00:00
3,88044a,AIQ3228,Thailand,1765380000.0,1765379870,100.0653,12.0238,11277.6,False,239.34,...,METAR,METAR LFPG 101600Z 27006KT 9999 BKN030 BKN190 ...,49.015,2.534,107,"Paris/De Gaulle Arpt, ID, FR",BKN,"[{'cover': 'BKN', 'base': 3000}, {'cover': 'BK...",MVFR,2025-12-10 16:20:04.386928+00:00
4,a2e5ec,SKW4128,United States,1765380000.0,1765379869,-122.9854,44.2956,3444.24,False,171.99,...,METAR,METAR LFPG 101600Z 27006KT 9999 BKN030 BKN190 ...,49.015,2.534,107,"Paris/De Gaulle Arpt, ID, FR",BKN,"[{'cover': 'BKN', 'base': 3000}, {'cover': 'BK...",MVFR,2025-12-10 16:20:04.386928+00:00
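
In the preview above, `metar_clouds` appears as a stringified list of cloud layers (`[{'cover': 'BKN', 'base': 3000}, ...]`), which `pd.read_csv` loads as plain text. For the feature-engineering objective, one option is to parse that text back into Python objects and keep the lowest cloud base. The sketch below is a minimal illustration, not part of the project code. The function name `lowest_cloud_base` is hypothetical, the `base` values are presumably in feet (matching `BKN030` in the raw METAR), and the example string is modelled on the first row rather than copied from the file.

```python
import ast


def lowest_cloud_base(clouds_repr):
    """Return the lowest 'base' among the cloud layers, or None.

    clouds_repr is expected to be the text form of a list of dicts,
    e.g. "[{'cover': 'BKN', 'base': 3000}]". Missing cells (NaN) and
    unparseable text both yield None.
    """
    if not isinstance(clouds_repr, str):
        return None  # NaN or any other non-text cell
    try:
        # literal_eval only accepts Python literals, so it never runs code.
        layers = ast.literal_eval(clouds_repr)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(layers, list):
        return None
    bases = [
        layer["base"]
        for layer in layers
        if isinstance(layer, dict) and layer.get("base") is not None
    ]
    return min(bases) if bases else None


# Illustrative value modelled on the first row (BKN030 BKN190).
example = "[{'cover': 'BKN', 'base': 3000}, {'cover': 'BKN', 'base': 19000}]"
print(lowest_cloud_base(example))
```

On the loaded frame, this could be applied as `df["metar_clouds"].apply(lowest_cloud_base)`. Similarly, the epoch-second columns `time_position` and `last_contact` can be converted with `pd.to_datetime(df["time_position"], unit="s", utc=True)`.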


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10357 entries, 0 to 10356
Data columns (total 39 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   icao24                10357 non-null  object 
 1   callsign              10175 non-null  object 
 2   origin_country        10357 non-null  object 
 3   time_position         10251 non-null  float64
 4   last_contact          10357 non-null  int64  
 5   longitude             10251 non-null  float64
 6   latitude              10251 non-null  float64
 7   baro_altitude         9356 non-null   float64
 8   on_ground             10357 non-null  bool   
 9   velocity              10356 non-null  float64
 10  true_track            10357 non-null  float64
 11  vertical_rate         9382 non-null   float64
 12  sensors               0 non-null      float64
 13  geo_altitude          9258 non-null   float64
 14  squawk                5973 non-null   float64
 15  spi                