# ✈️ Air Crashes Analysis (1908-2023)

## 📌 Introduction
This notebook analyzes global air crashes from **1908-2023** using Python.
The goal is to uncover trends over time, the most affected airlines, and geographical patterns.

### **Dataset Details**
- **Source**: [Kaggle Air Crashes Full Data (1908-2023)](https://www.kaggle.com/datasets/jogwums/air-crashes-full-data-1908-2023)
- **Total Records**: 5,000+ air crashes
- **Key Features**:
  - Date (Year, Month, Day)
  - Location (Country/Region, Latitude, Longitude)
  - Aircraft Details (Manufacturer, Model, Operator)
  - Severity (Fatalities, Survivors, Total Aboard)


## 🛠️ 1. Load and Clean the Data
- Load the raw dataset
- Apply data cleaning functions
- Add missing latitude & longitude using **Nominatim API**
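
The geocoding step above could look roughly like the sketch below. It is not the code in `src.data_cleaning`; the function name `fill_missing_coordinates` and the column names `Location`, `Latitude`, and `Longitude` are assumptions made for illustration. The lookup is passed in as a callable so that results can be cached per place and the logic can be tried without network access.

```python
import pandas as pd


def fill_missing_coordinates(df, geocode, location_col="Location"):
    """Fill missing Latitude/Longitude values by geocoding the location text.

    `geocode` is any callable that maps a place string to a
    (latitude, longitude) tuple, or returns None when nothing is found.
    Column names are assumed, not taken from the dataset.
    """
    out = df.copy()
    cache = {}  # each distinct place is looked up only once
    missing = out["Latitude"].isna() | out["Longitude"].isna()
    for idx in out.index[missing]:
        place = out.at[idx, location_col]
        if pd.isna(place):
            continue  # no location text to geocode
        if place not in cache:
            cache[place] = geocode(place)
        coords = cache[place]
        if coords is not None:
            out.at[idx, "Latitude"], out.at[idx, "Longitude"] = coords
    return out


# One possible way to wire in Nominatim through geopy (needs network access;
# the user_agent string is a placeholder):
# from geopy.geocoders import Nominatim
# from geopy.extra.rate_limiter import RateLimiter
# geolocator = Nominatim(user_agent="aircrash-analysis")
# lookup = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# def nominatim_geocode(place):
#     hit = lookup(place)
#     return (hit.latitude, hit.longitude) if hit else None
# df = fill_missing_coordinates(df, nominatim_geocode)
```

Caching matters here because many crashes share the same location string, and the public Nominatim service expects clients to keep request rates low.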

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
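from src.data_cleaning import load_data, clean_data  # not called below; the processed CSV is read directly
# Import the plotting helpers by name instead of using a wildcard import
from src.visualization import (
    plot_crashes_over_time,
    plot_crashes_by_location,
    plot_common_aircraft_manufacturers,
    plot_correlation_heatmap,
)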

# Display settings
pd.set_option('display.max_columns', None)

# Load the already-cleaned dataset from the processed data folder
df = pd.read_csv('../data/processed/cleaned_aircrashes.csv')
df.head()

## 📊 2. Exploratory Data Analysis (EDA)
- Check for missing values & data types
- Overview of air crash trends
- Most affected countries & airlines
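
As a minimal sketch of the checks listed above, the helpers below summarize data types and missing values per column and rank the most frequent values of a column. The helper names `summarize_columns` and `top_counts` are hypothetical, and the column names in the usage note are assumptions about the cleaned dataset.

```python
import pandas as pd


def summarize_columns(df):
    """Return dtype, missing count, and missing percentage for every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(1),
    })


def top_counts(df, column, n=10):
    """Count crashes per value of `column`, most frequent first.

    Missing values are ignored, which is the default of value_counts().
    """
    return df[column].value_counts().head(n)
```

Assuming the cleaned data has columns named `Country/Region` and `Operator`, `top_counts(df, "Country/Region")` and `top_counts(df, "Operator")` would list the most affected countries and airlines.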

In [None]:
# Check data types and missing values
df.info()
df.isnull().sum()

## ✈️ 3. Data Visualizations
- **Crashes Over Time** (per year, quarter, month)
- **Geographical Distribution** of crashes
- **Top Aircraft Models & Manufacturers**
- **Correlation Heatmap** (to analyze crash severity)
- **Fatality Trends & Survival Rates**
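
The numbers behind the time-based and survival plots can be sketched with a single aggregation. This is an illustration, not the logic in `src.visualization`; the function name `summarize_by` and the columns `Year`, `Quarter`, `Month`, `Aboard`, and `Survivors` are assumed.

```python
import pandas as pd


def summarize_by(df, period="Year"):
    """Aggregate crash count, people aboard, and survivors per period.

    `period` is the name of a grouping column such as "Year", "Quarter",
    or "Month" (assumed column names).
    """
    summary = df.groupby(period).agg(
        crashes=("Aboard", "size"),       # number of rows, i.e. crashes
        aboard=("Aboard", "sum"),
        survivors=("Survivors", "sum"),
    )
    # Share of people aboard who survived; periods with nobody recorded
    # aboard get NaN instead of a division by zero.
    aboard = summary["aboard"].where(summary["aboard"] > 0)
    summary["survival_rate"] = summary["survivors"] / aboard
    return summary
```

For a quick look without the project helpers, `summarize_by(df)["crashes"].plot()` draws crashes per year, and `summarize_by(df, "Month")` gives the monthly breakdown.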

In [None]:
# Generate visualizations
plot_crashes_over_time(df)
plot_crashes_by_location(df)
plot_common_aircraft_manufacturers(df)
plot_correlation_heatmap(df)

## 🔮 4. Conclusion & Next Steps
- Key takeaways from the analysis
- Future improvements:
  - **Interactive map using Folium** 🌍
  - **Power BI Dashboard** 📊
  - **Advanced Predictive Analysis** 🔍