### Kaggle API

###### Execution Steps:

1. Install the Kaggle API client (`pip install kaggle`)
2. Set the Kaggle API token
3. Search for a dataset
4. Download the dataset

###### Set Kaggle API Token

* Download the *kaggle.json* file from your Kaggle account settings
* Create a *.kaggle* folder in your home (ROOT) directory
* Paste *kaggle.json* into the *.kaggle* folder
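
The token setup can also be scripted. The following is a minimal sketch, assuming the ROOT directory is the user's home directory and that the Kaggle client reads the token from `.kaggle/kaggle.json` inside it (its default location). The helper name `install_kaggle_token` and its parameters are hypothetical, not part of the Kaggle package.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_file, home=None):
    """Copy a downloaded kaggle.json into <home>/.kaggle/kaggle.json.

    `home` defaults to the current user's home directory (an assumption
    about what "ROOT directory" means in the steps above).
    """
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)

    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_file, target)

    # Restrict the file to owner read/write; the Kaggle client warns when
    # the token is readable by other users. On Windows this call has
    # limited effect.
    os.chmod(target, 0o600)
    return target
```

A call such as `install_kaggle_token("Downloads/kaggle.json")` would then place the token where the `kaggle` command expects it; the source path here is only an illustration.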

### Search Kaggle

In [1]:
!kaggle datasets list -s "Global Disaster Events"

ref                                                             title                                                     size  lastUpdated                 downloadCount  voteCount  usabilityRating  
--------------------------------------------------------------  --------------------------------------------------  ----------  --------------------------  -------------  ---------  ---------------  
zubairdhuddi/global-daset                                       Global Disaster Response Analysis (2018–2024)          1737432  2025-11-11 10:19:57                  1394         29  1.0              
nasa/landslide-events                                           Landslides After Rainfall, 2007-2016                    130677  2017-01-19 15:58:33.987000          14378        105  0.8235294        
elvinrustam/global-disaster-events-20002025                     Global Disaster Events (2000–2025)                     2295712  2025-09-02 11:03:54.873000            334          6  1.0              


### Download the Dataset

In [2]:
!kaggle datasets download -d "elvinrustam/global-disaster-events-20002025"

Dataset URL: https://www.kaggle.com/datasets/elvinrustam/global-disaster-events-20002025
License(s): CC0-1.0
global-disaster-events-20002025.zip: Skipping, found more recently modified local copy (use --force to force download)
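
The output above shows the download being skipped because a local copy already exists, and it names `--force` as the way to override that. As a rough sketch, the same command can be assembled and run from Python with `subprocess`. The function names are hypothetical, and the `-p` (target folder) and `--unzip` options are assumed to be available in the installed version of the Kaggle CLI.

```python
import subprocess
from pathlib import Path


def build_download_command(dataset_ref, dest=".", unzip=False, force=False):
    """Assemble the `kaggle datasets download` argument list."""
    cmd = ["kaggle", "datasets", "download", "-d", dataset_ref, "-p", str(dest)]
    if unzip:
        cmd.append("--unzip")   # assumed flag: extract after downloading
    if force:
        cmd.append("--force")   # flag named in the CLI's "Skipping" message
    return cmd


def download_dataset(dataset_ref, dest=".", **options):
    """Run the CLI; raises subprocess.CalledProcessError if it fails."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(dataset_ref, dest, **options), check=True)
```

For example, `download_dataset("elvinrustam/global-disaster-events-20002025", force=True)` would re-download the archive even when a local copy is present.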


### Unzip and Read the Data

Unzip the `.zip` file using Python's built-in `zipfile` library.

In [3]:
import zipfile
import os

In [4]:
with zipfile.ZipFile("global-disaster-events-20002025.zip", "r") as file:
    file.extractall("Global Disaster Events")

In [5]:

os.listdir("Global Disaster Events")

['Clean', 'Unclean']

In [6]:
os.listdir("Global Disaster Events/Unclean")


['Drought.csv',
 'Earthquake.csv',
 'Eruption.csv',
 'Flood.csv',
 'Forest Fires.csv',
 'Tropical Cyclone.csv']
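
The folder layout can also be checked without extracting anything, since `ZipFile.namelist()` returns the member paths stored in the archive. The sketch below groups CSV members by their top-level folder; `list_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import PurePosixPath


def list_archive(zip_path, suffix=".csv"):
    """Return archive members ending in `suffix`, grouped by top-level folder."""
    groups = {}
    with zipfile.ZipFile(zip_path, "r") as archive:
        for member in archive.namelist():
            # Skip directory entries and files with other extensions.
            if member.endswith("/") or not member.lower().endswith(suffix):
                continue
            parts = PurePosixPath(member).parts
            top = parts[0] if len(parts) > 1 else ""
            groups.setdefault(top, []).append(parts[-1])
    return {folder: sorted(names) for folder, names in groups.items()}
```

Calling `list_archive("global-disaster-events-20002025.zip")` should then return one list of file names per folder in the archive.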

##### Read Data

In [7]:
import pandas as pd

In [8]:
# path = "Global Disaster Events/Unclean"

# files = os.listdir(path)

# dataframes = {}  

# for file in files:
#     file_path = os.path.join(path, file)
#     df_name = file.replace(".csv", "")  
#     df = pd.read_csv(file_path)
#     dataframes[df_name] = df
#     print(f"{df_name} shape: {df.shape}")

In [9]:
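path = "Global Disaster Events/Unclean"

# Dictionary to hold DataFrames, keyed "DataFrame1", "DataFrame2", ...
dfs = {}

# Filter to CSV files first so the numbering stays contiguous even if the
# folder contains other files; sorted() keeps the order deterministic.
csv_files = sorted(f for f in os.listdir(path) if f.endswith(".csv"))
for i, file in enumerate(csv_files, start=1):
    dfs[f"DataFrame{i}"] = pd.read_csv(os.path.join(path, file))

# Files load in alphabetical order
df1 = dfs["DataFrame1"]  # Drought
df2 = dfs["DataFrame2"]  # Earthquake
df3 = dfs["DataFrame3"]  # Eruption
df4 = dfs["DataFrame4"]  # Flood
df5 = dfs["DataFrame5"]  # Forest Fires
df6 = dfs["DataFrame6"]  # Tropical Cyclone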
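
Positional names such as `DataFrame1` depend on the alphabetical order of the files. As an alternative sketch, each DataFrame can be keyed by its file name instead, so that the disaster type is visible at the point of use. `load_csv_folder` is a hypothetical helper, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV in `folder` into a dict keyed by file name without extension."""
    csv_paths = sorted(Path(folder).glob("*.csv"))
    return {p.stem: pd.read_csv(p) for p in csv_paths}
```

With this, `load_csv_folder("Global Disaster Events/Unclean")["Flood"]` would return the flood events directly.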

In [10]:
#df1.columns

df1.isnull().sum()

coordinates          0
name                 0
description          0
htmldescription      0
alertlevel           0
alertscore           0
episodealertlevel    0
episodealertscore    0
country              0
fromdate             0
todate               0
severity             0
severitytext         0
source               0
iso3                 0
eventtype            0
GDACS ID             0
Name                 0
Countries            0
Start Date           0
Duration             0
Impact               0
dtype: int64

In [11]:
df2.columns

Index(['coordinates', 'name', 'description', 'htmldescription', 'alertlevel',
       'alertscore', 'episodealertlevel', 'episodealertscore', 'country',
       'fromdate', 'todate', 'severity', 'severitytext', 'source', 'iso3',
       'eventtype', 'Earthquake Magnitude', 'Depth', 'Exposed Population',
       'Capacity'],
      dtype='object')

In [12]:
df3.columns

Index(['coordinates', 'name', 'description', 'htmldescription', 'alertlevel',
       'alertscore', 'episodealertlevel', 'episodealertscore', 'country',
       'fromdate', 'todate', 'severity', 'severitytext', 'source', 'iso3',
       'eventtype', 'GDACS ID', 'Name', 'Exposed Population 30km',
       'Exposed Population 100km', 'Max Volc. Explosivity Index VEI',
       'Population Exposure Index PEI'],
      dtype='object')

In [13]:
df4.columns

Index(['coordinates', 'name', 'description', 'htmldescription', 'alertlevel',
       'alertscore', 'episodealertlevel', 'episodealertscore', 'country',
       'fromdate', 'todate', 'severity', 'severitytext', 'source', 'iso3',
       'eventtype', 'Death', 'Displaced', 'GDACS_ID'],
      dtype='object')

In [14]:
df5.columns

Index(['coordinates', 'name', 'description', 'htmldescription', 'alertlevel',
       'alertscore', 'episodealertlevel', 'episodealertscore', 'country',
       'fromdate', 'todate', 'severity', 'severitytext', 'source', 'iso3',
       'eventtype', 'GDACS ID', 'People affected', 'Countries',
       'Start Date - Last detection', 'Duration (days)', 'Burned area'],
      dtype='object')

In [15]:
df6.columns

Index(['coordinates', 'name', 'description', 'htmldescription', 'alertlevel',
       'alertscore', 'episodealertlevel', 'episodealertscore', 'country',
       'fromdate', 'todate', 'severity', 'severitytext', 'source', 'iso3',
       'eventtype', 'GDACS ID', 'Name', 'Exposed countries',
       'Exposed population', 'Maximum wind speed', 'Maximum storm surge',
       'Vulnerability'],
      dtype='object')
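
The column listings above share a common block (`coordinates` through `eventtype`) and differ in their type-specific columns. The flood file also spells its identifier `GDACS_ID` where the others use `GDACS ID`, so the two will not line up as one column when the frames are combined. A small sketch for making such differences explicit, using plain Python sets; `compare_columns` is a hypothetical helper.

```python
def compare_columns(columns_by_frame):
    """Split column names into those shared by every frame and per-frame extras."""
    column_sets = {name: set(cols) for name, cols in columns_by_frame.items()}
    if not column_sets:
        return [], {}
    shared = set.intersection(*column_sets.values())
    extras = {name: sorted(cols - shared) for name, cols in column_sets.items()}
    return sorted(shared), extras


# Tiny illustration with made-up subsets of the real column names.
shared, extras = compare_columns({
    "Flood": ["name", "country", "Death", "GDACS_ID"],
    "Drought": ["name", "country", "GDACS ID", "Impact"],
})
print(shared)
print(extras)
```

On the notebook's data the input could be built as `{name: df.columns for name, df in dfs.items()}`.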

###### Background on the Global Disaster Alert and Coordination System (GDACS): https://en.wikipedia.org/wiki/Global_Disaster_Alert_and_Coordination_System

##### *Remove 'NaN' values from dataframes with missing data*

In [16]:
# Drop rows with any NaN for multiple dataframes
for df in [df2, df4, df6]:
    df.dropna(inplace=True)

# For DF3, drop empty columns first, then rows
DF3 = df3.dropna(axis=1, how='all').copy()
DF3.dropna(inplace=True)
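
The order of the two `dropna` calls matters when a column is entirely empty: dropping rows first would discard every row, because each one has a NaN in that column. A toy illustration with made-up data (not taken from the dataset):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "severity": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],  # a column with no values at all
})

# Dropping rows directly removes everything: each row has a NaN in "empty".
rows_only = toy.dropna()

# Dropping all-NaN columns first keeps the rows that are otherwise complete.
cols_then_rows = toy.dropna(axis=1, how="all").dropna()

print(len(rows_only), cols_then_rows.shape)
```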

In [19]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1780 entries, 0 to 1779
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   coordinates        1780 non-null   object 
 1   name               1780 non-null   object 
 2   description        1780 non-null   object 
 3   htmldescription    1780 non-null   object 
 4   alertlevel         1780 non-null   object 
 5   alertscore         1780 non-null   int64  
 6   episodealertlevel  1780 non-null   object 
 7   episodealertscore  1780 non-null   float64
 8   country            1780 non-null   object 
 9   fromdate           1780 non-null   object 
 10  todate             1780 non-null   object 
 11  severity           1780 non-null   float64
 12  severitytext       1780 non-null   object 
 13  source             1780 non-null   object 
 14  iso3               1780 non-null   object 
 15  eventtype          1780 non-null   object 
 16  GDACS ID           1780 

In [20]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3669 entries, 4 to 20295
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   coordinates           3669 non-null   object 
 1   name                  3669 non-null   object 
 2   description           3669 non-null   object 
 3   htmldescription       3669 non-null   object 
 4   alertlevel            3669 non-null   object 
 5   alertscore            3669 non-null   int64  
 6   episodealertlevel     3669 non-null   object 
 7   episodealertscore     3669 non-null   float64
 8   country               3669 non-null   object 
 9   fromdate              3669 non-null   object 
 10  todate                3669 non-null   object 
 11  severity              3669 non-null   float64
 12  severitytext          3669 non-null   object 
 13  source                3669 non-null   object 
 14  iso3                  3669 non-null   object 
 15  eventtype             366

In [21]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 186 entries, 0 to 185
Data columns (total 22 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   coordinates                      186 non-null    object 
 1   name                             186 non-null    object 
 2   description                      186 non-null    object 
 3   htmldescription                  186 non-null    object 
 4   alertlevel                       186 non-null    object 
 5   alertscore                       186 non-null    int64  
 6   episodealertlevel                186 non-null    object 
 7   episodealertscore                186 non-null    float64
 8   country                          186 non-null    object 
 9   fromdate                         186 non-null    object 
 10  todate                           186 non-null    object 
 11  severity                         186 non-null    float64
 12  severitytext          

### Concatenate Dataframes

In [17]:
df = pd.concat([df1, df2, DF3, df4, df5, df6])
print(df.shape)
df.head(2)

(11221, 42)


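|   | coordinates | name | description | htmldescription | alertlevel | alertscore | episodealertlevel | episodealertscore | country | fromdate | ... | GDACS_ID | People affected | Start Date - Last detection | Duration (days) | Burned area | Exposed countries | Exposed population | Maximum wind speed | Maximum storm surge | Vulnerability |
| - | ----------- | ---- | ----------- | --------------- | ---------- | ---------- | ----------------- | ----------------- | ------- | -------- | --- | -------- | --------------- | --------------------------- | --------------- | ----------- | ----------------- | ------------------ | ------------------ | ------------------- | ------------- |
| 0 | [11.087, 53.882] | Drought in Germany, Denmark, France, Latvia, P... | Drought in Germany, Denmark, France, Latvia, P... | Green Drought in Germany, Denmark, France, Lat... | Green | 1 | Green | 0.25 | Germany, Denmark, France, Latvia, Poland, Sweden | 2017-07-21T00:00:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | [11.087, 53.882] | Drought in Germany, Denmark, France, Latvia, P... | Drought in Germany, Denmark, France, Latvia, P... | Green Drought in Germany, Denmark, France, Lat... | Green | 1 | Green | 0.25 | Germany, Denmark, France, Latvia, Poland, Sweden | 2017-07-21T00:00:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |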
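
A plain `pd.concat` keeps each frame's original row labels, so index values repeat across disaster types, and the combined frame no longer records which file a row came from. One possible variation, sketched below, resets the index with `ignore_index=True` and tags each row with its origin. The helper `concat_with_source` and the column name `source_file` are hypothetical; `source_file` is chosen to avoid clashing with the dataset's existing `source` column.

```python
import pandas as pd


def concat_with_source(frames_by_name):
    """Stack DataFrames, adding a `source_file` column and a fresh 0..n-1 index."""
    labelled = [df.assign(source_file=name) for name, df in frames_by_name.items()]
    return pd.concat(labelled, ignore_index=True, sort=False)


# Toy frames standing in for two of the disaster files.
drought = pd.DataFrame({"name": ["x", "y"], "Impact": [1, 2]})
flood = pd.DataFrame({"name": ["z"], "Death": [5]})

combined = concat_with_source({"Drought": drought, "Flood": flood})
print(combined)
```

Columns that exist in only one input, such as `Death` here, are filled with NaN for rows from the other frames, which matches the NaN-filled columns in the output above.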


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 11221 entries, 0 to 404
Data columns (total 42 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   coordinates                      11221 non-null  object 
 1   name                             11221 non-null  object 
 2   description                      11221 non-null  object 
 3   htmldescription                  11221 non-null  object 
 4   alertlevel                       11221 non-null  object 
 5   alertscore                       11221 non-null  int64  
 6   episodealertlevel                11221 non-null  object 
 7   episodealertscore                11221 non-null  float64
 8   country                          11221 non-null  object 
 9   fromdate                         11221 non-null  object 
 10  todate                           11221 non-null  object 
 11  severity                         11221 non-null  float64
 12  severitytext             

## Author

<a href="https://www.linkedin.com/in/andrew-kalumba-harris/">Andrew Kalumba</a><br>


| Date (YYYY-MM-DD) | Prepared By     | 
| ----------------- | --------------  | 
| 2025-12-13        | Author          | 


<h3 align="center"> © Data Science 2025. </h3>