## 📊 Saudi Arabia Regional Incident Analysis (2024-2025)
This notebook presents an exploratory analysis of regional incident reports across Saudi Arabia. The dataset includes:

- Traffic accidents
- Drowning incidents
- Shortness of breath (SOB) cases

## Objectives:
- Identify the total volume of incidents per region.
- Identify the dominant type of incident in each region.
- Compare the relative proportions (%) of each incident type.
- Highlight outliers and regions with unusually high percentages.

## Notes:
- Data is aggregated at the regional level.
- Percentages are calculated row-wise (each incident type as a share of that region's total reports) to highlight relative dominance.
- This analysis is intended to support resource allocation, public safety planning, and awareness campaigns.
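
As a minimal sketch of the row-wise calculation described in the notes, the snippet below divides each incident count by its region's total. The two rows reuse the Makkah and Madinah counts reported later in this notebook; the frame name `counts` and the `pct_` prefix are illustrative choices, not part of the original cells.

```python
import pandas as pd

# Counts for two regions, copied from the merged table in this notebook.
counts = pd.DataFrame(
    {"accidents": [81480, 9147], "drowning": [426, 63], "sob": [34988, 8930]},
    index=["Makkah", "Madinah"],
)

# Row-wise shares: divide every column by the row total (axis=0 aligns on the index).
row_totals = counts.sum(axis=1)
shares = counts.div(row_totals, axis=0).add_prefix("pct_")

# Each row of shares sums to 1, so the columns are comparable across regions.
print((shares * 100).round(2))
```

Because every row is normalized by its own total, a small region and a large region can be compared on the same scale.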

### Sources:
1. Total road traffic accidents: https://open.data.gov.sa/en/datasets/view/23d1fe79-9d33-4e89-bef4-a4796e4261cb
2. Total cases of shortness of breath: https://open.data.gov.sa/en/datasets/view/a80945a5-014c-4113-9c1e-2bdb9e938ce5
3. Total cases of drowning: https://open.data.gov.sa/en/datasets/view/e59b6c45-4b33-4bb5-adf7-c457c48e1d88

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df_acc = pd.read_csv(r"C:\Users\wasee\Desktop\total road traffic Accidents CSV.csv", skiprows=2, names=['code', 'region', 'accidents'])
df_grk = pd.read_csv(r"C:\Users\wasee\Desktop\total cases of drowning CSV.csv", skiprows=2, names=['code', 'region', 'drowning'])
df_sob = pd.read_csv(r"C:\Users\wasee\Desktop\total cases of shortness of breath CSV.csv", skiprows=2, names=['code', 'region', 'sob'])


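# Drop summary rows whose region label contains 'الإجمالي' ("Total").
# A total row with an empty region label is not matched here; it is removed later with dropna().
df_acc = df_acc[~df_acc['region'].astype(str).str.contains('الإجمالي')]
df_grk = df_grk[~df_grk['region'].astype(str).str.contains('الإجمالي')]
df_sob = df_sob[~df_sob['region'].astype(str).str.contains('الإجمالي')]

# The 'code' column is not needed; reassign instead of using inplace=True on a filtered frame.
df_acc = df_acc.drop(columns='code')
df_grk = df_grk.drop(columns='code')
df_sob = df_sob.drop(columns='code')

# Merge the data by region (inner join: only regions present in all three files are kept)
df_merged = df_acc.merge(df_grk, on='region').merge(df_sob, on='region')

# Show the result
print(df_merged)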


             region accidents  drowning     sob
0       مكة المكرمة     81480       426   34988
1   المدينة المنورة      9147        63    8930
2            القصيم      5080        36    2915
3   المنطقة الشرقية     17830       133   10443
4              عسير     10891        60    6462
5              تبوك      4806        27    2724
6              حائل      3029        13    1288
7   الحدود الشمالية      1050         3     598
8             جازان      4684        53    1667
9             نجران      1720        15    1107
10           الباحة      2041        28    1290
11            الجوف      1917         8     914
12              NaN   202,514       865  73,326
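
The total row above reports 202,514 accidents, while the 12 listed regions sum to 143,675; the drowning and SOB totals (865 and 73,326) do match their columns. One possible explanation is that the accidents file contains a region that did not survive the inner merge. A hedged way to check is an outer merge with `indicator=True`. The tiny frames and the "Region X" label below are placeholders for illustration, not the real files.

```python
import pandas as pd

# Placeholder frames standing in for df_acc and df_grk ("Region X" is hypothetical).
acc = pd.DataFrame({"region": ["A", "B", "Region X"], "accidents": [10, 20, 30]})
drown = pd.DataFrame({"region": ["A", "B"], "drowning": [1, 2]})

# An outer merge keeps every region; the _merge column records where each row came from.
check = acc.merge(drown, on="region", how="outer", indicator=True)

# Rows not marked "both" would be dropped silently by the default inner merge.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["region", "_merge"]])
```

Running the same check on the real frames would list any region labels that differ between files, for example through spelling or whitespace.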


In [3]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     12 non-null     object
 1   accidents  13 non-null     object
 2   drowning   13 non-null     int64 
 3   sob        13 non-null     object
dtypes: int64(1), object(3)
memory usage: 548.0+ bytes


In [4]:
df_merged["accidents"] = df_merged["accidents"].str.replace(",", "").astype(int)

In [5]:
df_merged.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     12 non-null     object
 1   accidents  13 non-null     int32 
 2   drowning   13 non-null     int64 
 3   sob        13 non-null     object
dtypes: int32(1), int64(1), object(2)
memory usage: 496.0+ bytes


In [6]:
df_merged["sob"] = df_merged["sob"].str.replace(",", "").astype(int)

In [7]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     12 non-null     object
 1   accidents  13 non-null     int32 
 2   drowning   13 non-null     int64 
 3   sob        13 non-null     int32 
dtypes: int32(2), int64(1), object(1)
memory usage: 444.0+ bytes
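
Cells 4 and 6 strip the thousands separators one column at a time. A more compact variant, sketched here on an inline sample rather than the real data, loops over the affected columns and converts them with `pd.to_numeric`. The sample values and the column list are illustrative. `pd.read_csv` also accepts a `thousands` parameter, which may make this step unnecessary at load time.

```python
import pandas as pd

# Inline sample mimicking the merged frame before cleaning (numbers stored as text).
df = pd.DataFrame({
    "region": ["A", "B"],
    "accidents": ["81,480", "9147"],
    "sob": ["34,988", "8930"],
})

# Remove the separators as plain text (regex=False), then convert to a numeric dtype.
for col in ["accidents", "sob"]:
    df[col] = pd.to_numeric(df[col].str.replace(",", "", regex=False))

print(df.dtypes)
```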


In [14]:
df_merged = df_merged.dropna(subset=["region"])


In [15]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 0 to 11
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     12 non-null     object
 1   accidents  12 non-null     int32 
 2   drowning   12 non-null     int64 
 3   sob        12 non-null     int32 
dtypes: int32(2), int64(1), object(1)
memory usage: 384.0+ bytes


In [16]:

from IPython.display import display  # explicit import; Jupyter also provides display() as a built-in

def summarize_dataframe(df):
    """Print a comprehensive statistical summary of the data."""
    print("🧾 Rows × columns:", df.shape)
    print("\n🔍 Column information:")
    print(df.dtypes)

    print("\n📉 Missing values per column:")
    print(df.isnull().sum())

    print("\n📊 General statistics:")
    display(df.describe(include='all').transpose())

In [17]:
summarize_dataframe(df_merged)

🧾 Rows × columns: (12, 4)

🔍 Column information:
region       object
accidents     int32
drowning      int64
sob           int32
dtype: object

📉 Missing values per column:
region       0
accidents    0
drowning     0
sob          0
dtype: int64

📊 General statistics:


Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
region,12.0,12.0,مكة المكرمة,1.0,,,,,,,
accidents,12.0,,,,11972.916667,22423.38915,1050.0,2010.0,4745.0,9583.0,81480.0
drowning,12.0,,,,72.083333,116.956447,3.0,14.5,32.0,60.75,426.0
sob,12.0,,,,6110.5,9681.744757,598.0,1242.75,2195.5,7079.0,34988.0


In [18]:
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 0 to 11
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     12 non-null     object
 1   accidents  12 non-null     int32 
 2   drowning   12 non-null     int64 
 3   sob        12 non-null     int32 
dtypes: int32(2), int64(1), object(1)
memory usage: 384.0+ bytes


In [24]:
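# Top 3 regions per category. Jupyter renders only the last expression in a cell,
# so only the SOB ranking appears below; wrap each line in display() to see all three.
df_merged.sort_values("accidents", ascending=False).head(3)
df_merged.sort_values("drowning", ascending=False).head(3)
df_merged.sort_values("sob", ascending=False).head(3)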


Unnamed: 0,region,accidents,drowning,sob
0,مكة المكرمة,81480,426,34988
3,المنطقة الشرقية,17830,133,10443
1,المدينة المنورة,9147,63,8930
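
Only the last of the three rankings is rendered above. The sketch below shows the top three regions for every category in one pass using `DataFrame.nlargest`. It rebuilds a small frame from counts printed earlier in this notebook; the English region names and the `top3` dictionary are illustrative.

```python
import pandas as pd

# Five of the regions from the merged table, with counts copied from the output above.
df = pd.DataFrame({
    "region": ["Makkah", "Madinah", "Eastern Province", "Asir", "Jazan"],
    "accidents": [81480, 9147, 17830, 10891, 4684],
    "drowning": [426, 63, 133, 60, 53],
    "sob": [34988, 8930, 10443, 6462, 1667],
})

# Collect the top 3 regions for each category into a dict keyed by column name.
top3 = {
    col: df.nlargest(3, col)["region"].tolist()
    for col in ["accidents", "drowning", "sob"]
}

for col, regions in top3.items():
    print(f"{col}: {regions}")
```

The rankings are not identical across categories: Asir is third for accidents, while Madinah is third for drowning and SOB.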



## Analytical Report: Traffic Accidents, Drowning, and Breathing Distress Cases by Region
### Overview:
This report analyzes emergency report data from 12 regions in Saudi Arabia, covering three categories:

- Traffic accidents (`accidents`)
- Drowning incidents (`drowning`)
- Shortness of breath cases (`sob`)

### Top Regions by Case Type:

| Category           | Leading Region | Number of Reports |
|--------------------|----------------|-------------------|
| Accidents          | Makkah         | 81,480 🚗         |
| Drowning           | Makkah         | 426 🌊            |
| Breathing Distress | Makkah         | 34,988 🫁         |

Makkah ranks highest across all three categories, likely due to high population density and traffic volume. This suggests a need to strengthen emergency response capabilities in the region.

### Key Observations:
Eastern Province ranks second in all three categories:

- Accidents: 17,830
- Drowning: 133
- Breathing distress: 10,443

Madinah ranks third for drowning and breathing distress, but fourth for accidents, behind Asir (10,891):

- Accidents: 9,147
- Drowning: 63
- Breathing distress: 8,930

## Analytical Recommendations:
- Investigate contributing factors to the high number of drowning cases in inland (non-coastal) regions.
- Conduct temporal analysis (if monthly or seasonal data is available) to identify spikes or trends.
- Correlate with emergency response time and available resources to uncover potential gaps or areas for improvement.

In [28]:
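# Total reports per region, then each incident type as a share of that total (row-wise).
# The count columns are already integers, so no extra cast is needed before summing.
df_merged["total"] = df_merged[["accidents", "drowning", "sob"]].sum(axis=1)
df_merged["pct_accidents"] = df_merged["accidents"] / df_merged["total"]
df_merged["pct_drowning"] = df_merged["drowning"] / df_merged["total"]
df_merged["pct_sob"] = df_merged["sob"] / df_merged["total"]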

In [29]:
df_merged

Unnamed: 0,region,accidents,drowning,sob,total,pct_accidents,pct_drowning,pct_sob
0,مكة المكرمة,81480,426,34988,116894,0.697042,0.003644,0.299314
1,المدينة المنورة,9147,63,8930,18140,0.504245,0.003473,0.492282
2,القصيم,5080,36,2915,8031,0.632549,0.004483,0.362968
3,المنطقة الشرقية,17830,133,10443,28406,0.627684,0.004682,0.367634
4,عسير,10891,60,6462,17413,0.625452,0.003446,0.371102
5,تبوك,4806,27,2724,7557,0.635967,0.003573,0.360461
6,حائل,3029,13,1288,4330,0.699538,0.003002,0.29746
7,الحدود الشمالية,1050,3,598,1651,0.635978,0.001817,0.362205
8,جازان,4684,53,1667,6404,0.731418,0.008276,0.260306
9,نجران,1720,15,1107,2842,0.605208,0.005278,0.389514


### Emergency Incident Breakdown by Region (Normalized)

This table summarizes the distribution of three types of emergency reports across 12 regions:
- Traffic Accidents (`accidents`)
- Drowning Cases (`drowning`)
- Shortness of Breath (`sob`)

Each region’s total was calculated, and the relative proportion of each incident type was derived.

| Region           | Total Reports | % Accidents | % Drowning | % SOB |
|------------------|----------------|-------------|------------|--------|
| Makkah           | 116,894        | 69.7%       | 0.36%      | 29.9%  |
| Madinah          | 18,140         | 50.4%       | 0.35%      | 49.2%  |
| Qassim           | 8,031          | 63.3%       | 0.45%      | 36.3%  |
| Eastern Region   | 28,406         | 62.8%       | 0.47%      | 36.8%  |
| Asir             | 17,413         | 62.5%       | 0.34%      | 37.1%  |
| Tabuk            | 7,557          | 63.6%       | 0.36%      | 36.0%  |
| Hail             | 4,330          | 70.0%       | 0.30%      | 29.7%  |
| Northern Border  | 1,651          | 63.6%       | 0.18%      | 36.2%  |
| Jazan            | 6,404          | 73.1%       | 0.83%      | 26.0%  |
| Najran           | 2,842          | 60.5%       | 0.53%      | 39.0%  |
| Al Bahah         | 3,359          | 60.8%       | 0.83%      | 38.4%  |
| Al Jawf          | 2,839          | 67.5%       | 0.28%      | 32.2%  |

---

##  Key Insights

1. **Jazan** and **Al Bahah** both have the **highest proportion of drowning cases** at **0.83%**.
   - This is notable because their total report volumes are relatively low (6,404 and 3,359 respectively), so modest absolute counts (53 and 28 drowning cases) translate into the highest shares.
   - This may suggest environmental or infrastructure factors that should be investigated.

2. **Madinah** stands out as the **only region where shortness of breath (SOB) cases nearly equal accidents** (49.2% vs. 50.4%).
   - This could indicate higher prevalence of respiratory issues or better detection/reporting.

3. **Makkah** leads significantly in total volume (116,894 reports).
   - It also shows a **high accident proportion (69.7%)**, implying a possible traffic congestion issue or high urban activity.

4. **Najran** and **Al Bahah** also have relatively **high SOB proportions** (roughly 39% and 38%), second only to Madinah.
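
One way to make the "unusually high" judgment in the first insight reproducible is an interquartile-range rule. The sketch below applies Tukey's 1.5 × IQR fences to the drowning share; the choice of this rule and of the 1.5 multiplier is an assumption for illustration, not something the original analysis used. The counts are copied from the merged table, with English region names.

```python
import pandas as pd

# Counts for the 12 regions, copied from the merged table.
df = pd.DataFrame({
    "region": ["Makkah", "Madinah", "Qassim", "Eastern Region", "Asir", "Tabuk",
               "Hail", "Northern Border", "Jazan", "Najran", "Al Bahah", "Al Jawf"],
    "accidents": [81480, 9147, 5080, 17830, 10891, 4806,
                  3029, 1050, 4684, 1720, 2041, 1917],
    "drowning": [426, 63, 36, 133, 60, 27, 13, 3, 53, 15, 28, 8],
    "sob": [34988, 8930, 2915, 10443, 6462, 2724,
            1288, 598, 1667, 1107, 1290, 914],
})
df["pct_drowning"] = df["drowning"] / df[["accidents", "drowning", "sob"]].sum(axis=1)

# Tukey's rule: flag values more than 1.5 * IQR below Q1 or above Q3.
q1, q3 = df["pct_drowning"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["pct_drowning"] < q1 - 1.5 * iqr) | (df["pct_drowning"] > q3 + 1.5 * iqr)

print(df.loc[is_outlier, ["region", "pct_drowning"]])
```

With only 12 regions the quartiles are coarse, so the flags are best read as prompts for investigation rather than as statistical evidence.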

---


In [30]:
df_merged[["accidents", "drowning", "sob"]].describe()


Unnamed: 0,accidents,drowning,sob
count,12.0,12.0,12.0
mean,11972.916667,72.083333,6110.5
std,22423.38915,116.956447,9681.744757
min,1050.0,3.0,598.0
25%,2010.0,14.5,1242.75
50%,4745.0,32.0,2195.5
75%,9583.0,60.75,7079.0
max,81480.0,426.0,34988.0


##  Descriptive Analysis of Incident Reports

This summary provides insight into the distribution of reports for **Traffic Accidents**, **Drowning**, and **Shortness of Breath (SOB)** across 12 regions.

| Metric      | Accidents       | Drowning        | SOB            |
|-------------|------------------|------------------|----------------|
| Count       | 12 regions       | 12 regions       | 12 regions     |
| Mean        | 11,973 reports   | 72 reports       | 6,111 reports  |
| Std Dev     | 22,423.39        | 116.96           | 9,681.74       |
| Min         | 1,050            | 3                | 598            |
| 25% Quartile| 2,010            | 14.5             | 1,242.75       |
| Median      | 4,745            | 32               | 2,195.5        |
| 75% Quartile| 9,583            | 60.75            | 7,079          |
| Max         | 81,480           | 426              | 34,988         |

###  Insights:

1. **Traffic Accidents** show high variance, with one region (Makkah) significantly inflating the mean (mean ≈ 12.0K versus a median of ≈ 4.7K; max ≈ 81.5K).
2. **Drowning** cases are rare in most regions (median 32); the highest counts are in **Makkah** (426) and the Eastern Region (133), while **Jazan** and **Al Bahah** stand out only in proportional terms.
3. **SOB reports** have a large range (from 598 to 34,988), showing that in some areas it’s a major concern (especially **Makkah** and **Madinah**).
4. The **standard deviation** confirms high variability across regions, indicating a need for localized public health strategies.
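
The pull of a single large region on the mean can be checked directly. The sketch below compares the mean, the median, and the mean after dropping the largest value, using the accident counts from the merged table. The variable names are illustrative.

```python
import pandas as pd

# Accident counts for the 12 regions, copied from the merged table.
accidents = pd.Series(
    [81480, 9147, 5080, 17830, 10891, 4806, 3029, 1050, 4684, 1720, 2041, 1917],
    name="accidents",
)

mean_all = accidents.mean()
median_all = accidents.median()

# Drop the single largest value (Makkah) to see how much it pulls the mean.
mean_without_max = accidents.drop(accidents.idxmax()).mean()

print(f"mean: {mean_all:.0f}, median: {median_all:.0f}, "
      f"mean without max: {mean_without_max:.0f}")
```

Removing Makkah roughly halves the mean, which is why the median is the more representative "typical region" figure here.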




In [36]:
df_merged["dominant_type"] = df_merged[["accidents", "drowning", "sob"]].astype(int).idxmax(axis=1)
df_merged

Unnamed: 0,region,accidents,drowning,sob,total,pct_accidents,pct_drowning,pct_sob,dominant_type
0,مكة المكرمة,81480,426,34988,116894,0.697042,0.003644,0.299314,accidents
1,المدينة المنورة,9147,63,8930,18140,0.504245,0.003473,0.492282,accidents
2,القصيم,5080,36,2915,8031,0.632549,0.004483,0.362968,accidents
3,المنطقة الشرقية,17830,133,10443,28406,0.627684,0.004682,0.367634,accidents
4,عسير,10891,60,6462,17413,0.625452,0.003446,0.371102,accidents
5,تبوك,4806,27,2724,7557,0.635967,0.003573,0.360461,accidents
6,حائل,3029,13,1288,4330,0.699538,0.003002,0.29746,accidents
7,الحدود الشمالية,1050,3,598,1651,0.635978,0.001817,0.362205,accidents
8,جازان,4684,53,1667,6404,0.731418,0.008276,0.260306,accidents
9,نجران,1720,15,1107,2842,0.605208,0.005278,0.389514,accidents


##  Dominant Incident Type by Region

Each region was analyzed to identify which incident type (Accidents, Drowning, or Shortness of Breath) is the most frequent.

| Region           | Dominant Incident Type |
|------------------|------------------------|
| Makkah           | Accidents              |
| Madinah          | Accidents              |
| Qassim           | Accidents              |
| Eastern Region   | Accidents              |
| Asir             | Accidents              |
| Tabuk            | Accidents              |
| Hail             | Accidents              |
| Northern Border  | Accidents              |
| Jazan            | Accidents              |
| Najran           | Accidents              |
| Al Bahah         | Accidents              |
| Al Jawf          | Accidents              |

**Insight**: Despite differences in population and total reports, traffic accidents are the most frequent incident type in every region. This suggests national-level traffic safety issues and highlights the need for stronger prevention policies.
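
Since the dominant type is the same everywhere, the margin by which it leads may be more informative than the label itself. The sketch below computes, for three regions taken from the normalized table, the gap between the largest and second-largest share. The `margin` series is an illustrative addition, not part of the original cells.

```python
import numpy as np
import pandas as pd

# Shares for three regions, copied from the normalized table above.
shares = pd.DataFrame(
    {
        "pct_accidents": [0.697042, 0.504245, 0.731418],
        "pct_drowning": [0.003644, 0.003473, 0.008276],
        "pct_sob": [0.299314, 0.492282, 0.260306],
    },
    index=["Makkah", "Madinah", "Jazan"],
)

# Column holding the largest share in each row.
dominant = shares.idxmax(axis=1)

# Sort each row in ascending order; the margin is the largest minus the second largest.
ordered = np.sort(shares.to_numpy(), axis=1)
margin = pd.Series(ordered[:, -1] - ordered[:, -2], index=shares.index)

print(pd.DataFrame({"dominant": dominant, "margin": margin.round(4)}))
```

A small margin, as in Madinah, marks a region where the ranking could flip with modest changes in the counts.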
