### Kaggle API

###### Execution Steps:

1. Install the Kaggle API (`pip install kaggle`)
2. Set Kaggle API Token
3. Search for Dataset
4. Download Dataset

Set Kaggle API Token
* Download the *kaggle.json* file from your Kaggle account settings
* Create a *.kaggle* folder in your home (user root) directory
* Move *kaggle.json* into the *.kaggle* folder
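
The token steps above can also be scripted. This is a minimal sketch, assuming the token has already been downloaded to a path you pass in. `install_kaggle_token` is a hypothetical helper name, and `~/.kaggle/kaggle.json` with owner-only permissions is the location the Kaggle CLI conventionally reads.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(downloaded_token, home=None):
    """Copy kaggle.json into <home>/.kaggle and restrict its permissions."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)

    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(downloaded_token, target)

    # Owner read/write only; on Windows this call has limited effect
    os.chmod(target, 0o600)
    return target
```

A call such as `install_kaggle_token("Downloads/kaggle.json")` would place the token under the current user's home directory; the source path here is only an example.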

### Search Kaggle

In [66]:
!kaggle datasets list -s "Global Disaster Events" 

ref                                                             title                                                     size  lastUpdated                 downloadCount  voteCount  usabilityRating  
--------------------------------------------------------------  --------------------------------------------------  ----------  --------------------------  -------------  ---------  ---------------  
zubairdhuddi/global-daset                                       Global Disaster Response Analysis (2018–2024)          1737432  2025-11-11 10:19:57                  1452         30  1.0              
elvinrustam/global-disaster-events-20002025                     Global Disaster Events (2000–2025)                     2295712  2025-09-02 11:03:54.873000            435          6  1.0              
nasa/landslide-events                                           Landslides After Rainfall, 2007-2016                    130677  2017-01-19 15:58:33.987000          14383        105  0.8235294        


### Download the Dataset

In [67]:
!kaggle datasets download -d "elvinrustam/global-disaster-events-20002025"

Dataset URL: https://www.kaggle.com/datasets/elvinrustam/global-disaster-events-20002025
License(s): CC0-1.0
global-disaster-events-20002025.zip: Skipping, found more recently modified local copy (use --force to force download)


### Unzip and Read the Data

Unzip the `.zip` file using the `zipfile` library.

In [68]:
import zipfile
import os

In [69]:
with zipfile.ZipFile("global-disaster-events-20002025.zip", "r") as file:
    file.extractall("Global Disaster Events")

In [70]:
os.listdir("Global Disaster Events")

['.ipynb_checkpoints', 'Clean', 'gde-preclean.csv', 'Unclean']
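
Before extracting, it can help to check what the archive contains. The sketch below uses only the standard library; `list_and_extract` is a hypothetical helper and not part of the notebook.

```python
import zipfile


def list_and_extract(zip_path, target_dir):
    """Return the archive's member names, then extract them into target_dir."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        members = archive.namelist()
        archive.extractall(target_dir)
    return members
```

Calling it with the archive name and folder used above would return the member paths, which makes it easier to confirm where files such as `Unclean/Drought.csv` end up.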

In [71]:
# os.listdir("Global Disaster Events/Unclean")

#### `Wrangle` and Read Data

In [72]:
import warnings

import pandas as pd

warnings.simplefilter(action="ignore", category=FutureWarning)

In [73]:
def wrangle(filepath):
    # Read CSV file into dataframe
    df = pd.read_csv(filepath)
    return df

In [74]:
Drought = wrangle("Unclean/Drought.csv")
print("Drought datashape:", Drought.shape)
Drought.head(2)

Drought datashape: (1780, 22)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,...,severitytext,source,iso3,eventtype,GDACS ID,Name,Countries,Start Date,Duration,Impact
0,"[11.087, 53.882]","Drought in Germany, Denmark, France, Latvia, P...","Drought in Germany, Denmark, France, Latvia, P...","Green Drought in Germany, Denmark, France, Lat...",Green,1,Green,0.25,"Germany, Denmark, France, Latvia, Poland, Sweden",2017-07-21T00:00:00,...,Minor impact for agricultural drought in 80936...,GDO,DEU,DR,DR 1012168,Central Northern Europe-2018,"Germany, Denmark, France, Latvia, Poland, Sweden",End of Jul 2017,569 days (at 18 Feb 2019),Minor impact for agricultural drought in 80936...
1,"[11.087, 53.882]","Drought in Germany, Denmark, France, Latvia, P...","Drought in Germany, Denmark, France, Latvia, P...","Green Drought in Germany, Denmark, France, Lat...",Green,1,Green,0.25,"Germany, Denmark, France, Latvia, Poland, Sweden",2017-07-21T00:00:00,...,Minor impact for agricultural drought in 80936...,GDO,DEU,DR,DR 1012168,Central Northern Europe-2018,"Germany, Denmark, France, Latvia, Poland, Sweden",End of Jul 2017,569 days (at 18 Feb 2019),Minor impact for agricultural drought in 80936...


In [75]:
# Drought.info()

## Clean `Drought Data`

In [76]:
# Split coordinates into longitudes and latitudes
Drought[['longitude', 'latitude']] = Drought['coordinates'].str.strip("[]").str.split(",", expand=True)
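
Note that `str.split` returns strings, so `longitude` and `latitude` remain `object` columns after this cell. The sketch below repeats the split on a toy frame and adds a numeric conversion; the conversion step is an addition for illustration, not part of the original cell.

```python
import pandas as pd

# Toy frame using coordinate strings in the same "[lon, lat]" format
toy = pd.DataFrame({"coordinates": ["[11.087, 53.882]", "[70.4, 36.2]"]})

parts = toy["coordinates"].str.strip("[]").str.split(",", expand=True)

# Strip the space left after the comma, then convert each part to float
toy["longitude"] = parts[0].str.strip().astype(float)
toy["latitude"] = parts[1].str.strip().astype(float)

print(toy.dtypes)
```

With numeric columns, later steps such as plotting or range checks on coordinates work without another conversion.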

In [77]:
from country_converter import CountryConverter

cc = CountryConverter()
Drought["Country"] = cc.convert(Drought["iso3"], to = "name_short")

for time in ["fromdate", "todate"]:
    Drought[time] = pd.to_datetime(Drought[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")
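
The loop above treats the timestamps as UTC and shifts them to East Africa Time (UTC+3). A short sketch of that conversion on a single value, assuming the source times really are UTC:

```python
import pandas as pd

raw = pd.Series(["2017-07-21T00:00:00"])

# Parse as naive datetimes, label them UTC, then convert to Nairobi time
local = pd.to_datetime(raw).dt.tz_localize("UTC").dt.tz_convert("Africa/Nairobi")

print(local.iloc[0])  # 2017-07-21 03:00:00+03:00
```

Midnight UTC becomes 03:00 local time, which matches the `fromdate` values in the combined frame later in the notebook.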

In [78]:
# Copy "severity" into a new integer column "severity(km2)" in the Drought dataframe
Drought["severity(km2)"] = Drought["severity"].astype(int)

# Extract the number of days the event lasted
Drought["Duration(days)"] = Drought["Duration"].str.split().str[0].astype(int)

In [79]:
Drought_data = Drought.copy()
DROUGHT = Drought_data[['alertlevel','alertscore', 'fromdate', 'todate', 'severity(km2)', 'iso3', 'eventtype','Impact', 'longitude', 'latitude', 'Country', 'Duration(days)']]

In [80]:
# DROUGHT.info()

## Clean `Earthquake Data`

In [81]:
Earthquake = wrangle("Unclean/Earthquake.csv")
print("Earthquake datashape:", Earthquake.shape)
Earthquake.head(2)

Earthquake datashape: (20296, 20)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,todate,severity,severitytext,source,iso3,eventtype,Earthquake Magnitude,Depth,Exposed Population,Capacity
0,"[70.4, 36.2]",Earthquake in Afghanistan,Earthquake in Afghanistan,Red M 6.4 Earthquake in Afghanistan at: 19 Jan...,Red,3,Red,3.0,Afghanistan,2000-01-19T00:00:00,2000-01-19T00:00:00,6.4,magnitude 6.4M and depth 0km,NEIC,AFG,EQ,6.4M,0 Km,460000 people within 100km,
1,"[58.175, 35.217]",Earthquake in Iran,Earthquake in Iran,Green M 4.9 Earthquake in Iran at: 14 Feb 2000...,Green,1,Green,0.5,Iran,2000-02-14T00:00:00,2000-02-14T00:00:00,4.9,magnitude 4.9M and depth 0km,NEIC,IRN,EQ,4.9M,0 Km,440000 people within 100km,


In [82]:
# Print info
Earthquake.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20296 entries, 0 to 20295
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   coordinates           20296 non-null  object 
 1   name                  20296 non-null  object 
 2   description           20296 non-null  object 
 3   htmldescription       20296 non-null  object 
 4   alertlevel            20296 non-null  object 
 5   alertscore            20296 non-null  int64  
 6   episodealertlevel     20296 non-null  object 
 7   episodealertscore     20296 non-null  float64
 8   country               18490 non-null  object 
 9   fromdate              20296 non-null  object 
 10  todate                20296 non-null  object 
 11  severity              20296 non-null  float64
 12  severitytext          20296 non-null  object 
 13  source                20296 non-null  object 
 14  iso3                  19215 non-null  object 
 15  eventtype          

In [83]:
Earthquake[['longitude', 'latitude']] = Earthquake['coordinates'].str.strip("[]").str.split(",", expand=True)

In [84]:
# Remove "NaN" values 
Earthquake.dropna(inplace= True)
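
`dropna()` with no arguments removes every row that has a missing value in any column, including columns that are not used later. A hedged alternative, shown on a toy frame, is to restrict the check with `subset`. Which columns to require is an assumption here, not something the notebook specifies.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "iso3": ["AFG", None, "IRN"],
        "severity": [6.4, 5.1, 4.9],
        "Capacity": [np.nan, np.nan, np.nan],  # unused column, all missing
    }
)

all_columns = toy.dropna()                 # drops every row because of Capacity
needed_only = toy.dropna(subset=["iso3"])  # keeps rows that have a country code

print(len(all_columns), len(needed_only))  # 0 2
```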

In [85]:
Earthquake["iso3"] = Earthquake["iso3"].replace("ROM", "ROU")

In [86]:

Earthquake["Country"] = cc.convert(Earthquake["iso3"], to = "name_short")

for time in ["fromdate", "todate"]:
    Earthquake[time] = pd.to_datetime(Earthquake[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")

In [87]:
# Remove the 'M' and convert to float
Earthquake["Magnitude(M)"] = Earthquake["Earthquake Magnitude"].str.replace("M", "", regex=False).astype(float)

# Depth of earthquake in km: keep only digits and dots (the result is still a string)
Earthquake["Depth(km)"] = Earthquake["Depth"].str.replace(r'[^\d.]', '', regex=True)

In [88]:
Earthquake_data = Earthquake.copy()
EARTHQUAKE = Earthquake_data[['alertlevel','alertscore', 'fromdate', 'todate', 'Magnitude(M)', 
  'iso3', 'eventtype','Depth(km)', 'longitude', 'latitude', 'Country']]

In [89]:
# EARTHQUAKE.info()

## Clean `Eruption Data`

In [90]:
Eruption = wrangle("Unclean/Eruption.csv")
print("Eruption datashape:", Eruption.shape)
Eruption.head(2)

Eruption datashape: (186, 22)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,...,severitytext,source,iso3,eventtype,GDACS ID,Name,Exposed Population 30km,Exposed Population 100km,Max Volc. Explosivity Index VEI,Population Exposure Index PEI
0,"[-130.55, 56.583333]",Eruption Iskut-Unuk River Cones,Eruption Iskut-Unuk River Cones,Green Eruption Iskut-Unuk River Cones in Canad...,Green,1,Green,0.0,Canada,2011-09-20T18:00:00,...,,MONTREAL,,VO,VO 360,Iskut-Unuk River Cones (),,No people affected,,
1,"[-72.966666, -45.9]","Eruption Hudson, Cerro","Eruption Hudson, Cerro","Green Eruption Hudson, Cerro in Chile-S at: 28...",Green,1,Green,0.0,Chile-S,2011-10-28T09:48:00,...,,BUENOS AIRES,,VO,VO 484,"Hudson, Cerro ()",,No people affected,,


In [91]:
# Eruption.info()

In [92]:
Eruption[['longitude', 'latitude']] = Eruption['coordinates'].str.strip("[]").str.split(",", expand=True)

In [93]:
# Remove "NaN" values row-wise and column-wise
Eruption = Eruption.dropna(axis=1, how='all').dropna(axis=0, how='any').reset_index(drop=True)
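
The chained call first removes columns that are entirely empty, then removes any row that still has a missing value. A toy illustration of the order of operations, with illustrative column names:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "iso3": ["CAN", None, "CHL"],
        "empty": [np.nan, np.nan, np.nan],
    }
)

cleaned = (
    toy.dropna(axis=1, how="all")  # drop the all-NaN "empty" column
    .dropna(axis=0, how="any")     # drop row "B", which lacks iso3
    .reset_index(drop=True)
)

print(cleaned.shape)  # (2, 2)
```

Reversing the two steps would remove every row, because the empty column makes each row incomplete.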

In [94]:
# Eruption.info()

In [95]:
Eruption.rename(columns={"Max Volc. Explosivity Index VEI": "Severity(VEI)"}, inplace=True)

Eruption["iso3"] = Eruption["iso3"].replace("ROM", "ROU")

In [96]:

Eruption["Country"] = cc.convert(Eruption["iso3"], to = "name_short")

for time in ["fromdate", "todate"]:
    Eruption[time] = pd.to_datetime(Eruption[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")

In [97]:
Eruption_data = Eruption.copy()
ERUPTION = Eruption_data[['alertlevel','alertscore', 'fromdate', 'todate', 'Severity(VEI)', 
  'iso3', 'eventtype', 'longitude', 'latitude', 'Country']]

In [98]:
# ERUPTION.info()

## Clean `Flood Data`

In [99]:
Flood = wrangle("Unclean/Flood.csv")
print("Flood datashape:", Flood.shape)
Flood.head(2)

Flood datashape: (2441, 19)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,todate,severity,severitytext,source,iso3,eventtype,Death,Displaced,GDACS_ID
0,"[31.712, -27.822]",Flood in Mozambique,Flood in Mozambique,ORANGE Flood in Mozambique from: 26 Jan 2000 0...,ORANGE,2,ORANGE,2.0,Mozambique,2000-01-26T00:00:00,2000-03-27T23:59:59,7.74,Magnitude 7.74,DFO,MOZ,FL,929.0,733000,FL 1583
1,"[126.1, 7.199]",Flood in Philippines,Flood in Philippines,GREEN Flood in Philippines from: 28 Jan 2000 0...,GREEN,1,GREEN,1.0,Philippines,2000-01-28T00:00:00,2000-02-01T23:59:59,4.92,Magnitude 4.92,DFO,PHL,FL,23.0,20000,FL 1584


In [100]:
Flood.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2441 entries, 0 to 2440
Data columns (total 19 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   coordinates        2441 non-null   object 
 1   name               2441 non-null   object 
 2   description        2441 non-null   object 
 3   htmldescription    2441 non-null   object 
 4   alertlevel         2441 non-null   object 
 5   alertscore         2441 non-null   int64  
 6   episodealertlevel  2441 non-null   object 
 7   episodealertscore  2438 non-null   float64
 8   country            2441 non-null   object 
 9   fromdate           2441 non-null   object 
 10  todate             2441 non-null   object 
 11  severity           2441 non-null   float64
 12  severitytext       2441 non-null   object 
 13  source             2441 non-null   object 
 14  iso3               2330 non-null   object 
 15  eventtype          2441 non-null   object 
 16  Death              2440 

In [101]:
Flood[['longitude', 'latitude']] = Flood['coordinates'].str.strip("[]").str.split(",", expand=True)

In [102]:
# Remove "NaN" values 
Flood.dropna(inplace= True)

In [103]:
Flood["iso3"] = Flood["iso3"].replace("ROM", "ROU")

In [104]:
mapping = {
    "ANG": "AGO",  # Angola
    "RWD": "RWA",  # Rwanda
    "ALG": "DZA",  # Algeria
}

Flood["iso3"] = Flood["iso3"].replace(mapping)

# Convert
Flood["Country"] = cc.convert(Flood["iso3"], to="name_short")

    not found in ISO3
SUA not found in ISO3
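
The warnings with no code in front of them suggest that some `iso3` values are empty or whitespace-only strings, which `dropna` does not remove. One possible way to surface such codes before conversion is sketched below on toy data. `KNOWN_ISO3` is a hypothetical stand-in for a full reference list, and treating `SUA` as unresolved is an assumption.

```python
import pandas as pd

# Hypothetical reference set; a real one would hold all ISO 3166-1 alpha-3 codes
KNOWN_ISO3 = {"AGO", "RWA", "DZA", "MOZ", "PHL", "ROU"}

# Legacy codes taken from the cells above
LEGACY_TO_ISO3 = {"ROM": "ROU", "ANG": "AGO", "RWD": "RWA", "ALG": "DZA"}

codes = pd.Series(["MOZ", " ", "ANG", "SUA", "ROM", ""])

# Trim whitespace and apply the legacy mapping
normalised = codes.str.strip().replace(LEGACY_TO_ISO3)

# Anything still outside the reference set needs manual attention
unresolved = sorted(set(normalised) - KNOWN_ISO3)
print(unresolved)  # ['', 'SUA']
```

Reviewing the unresolved list once is usually easier than reading one warning per row.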

In [105]:
for time in ["fromdate", "todate"]:
    Flood[time] = pd.to_datetime(Flood[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")

In [106]:
# Rename severity of the floods in relation to height
Flood.rename(columns={"severity": "Severity(metres)"}, inplace=True)

# Fill NaN values in Deaths with zero
Flood['Deaths'] = Flood['Death'].fillna(0).astype(int)

In [107]:
Flood_data = Flood.copy()

FLOOD = Flood_data[['alertlevel','alertscore', 'fromdate', 'todate', 'Severity(metres)', 
  'iso3', 'eventtype','Deaths','Displaced', 'longitude', 'latitude', 'Country']]

In [108]:
# FLOOD.info()

## Clean `ForestFires Data`

In [109]:
ForestFires = wrangle("Unclean/Forest Fires.csv")
print("ForestFires datashape:", ForestFires.shape)
ForestFires.head(2)

ForestFires datashape: (3055, 22)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,...,severitytext,source,iso3,eventtype,GDACS ID,People affected,Countries,Start Date - Last detection,Duration (days),Burned area
0,"[105.65993703583405, 17.33516236486571]",Forest fires in Laos,Forest fires in Laos,Green Forest fires in Laos from: 23 Feb 2021 t...,Green,1,GREEN,1.0,Laos,2021-02-23T00:00:00,...,Green impact for forestfire in 5070 ha,GWIS,LAO,WF,WF 1000144,0 in the burned area,Laos,23 Feb 2021 - 23 Mar 2021,28,5070 ha
1,"[85.87540845200718, 22.856787081945416]",Forest fires in India,Forest fires in India,Green Forest fires in India from: 25 Feb 2021 ...,Green,1,GREEN,1.0,India,2021-02-25T00:00:00,...,Green impact for forestfire in 9866 ha,GWIS,IND,WF,WF 1000128,35445 in the burned area,India,25 Feb 2021 - 11 Mar 2021,14,9866 ha


In [110]:
# ForestFires.info()

In [111]:
ForestFires[['longitude', 'latitude']] = ForestFires['coordinates'].str.strip("[]").str.split(",", expand=True)

In [112]:

ForestFires["Country"] = cc.convert(ForestFires["iso3"], to = "name_short")

for time in ["fromdate", "todate"]:
    ForestFires[time] = pd.to_datetime(ForestFires[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")

In [113]:
# Rename severity of the forest fires to show the burned area in hectares
ForestFires.rename(columns={"severity": "Severity(ha)"}, inplace=True)

# Extract number of people affected
ForestFires["People affected"] = ForestFires["People affected"].str.extract(r'(\d+)').astype(int)
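
`str.extract` with a capturing group returns the first run of digits in each string. The sketch below uses values in the same format as the `People affected` column. Because `astype(int)` raises when a value has no digits, the nullable `Int64` dtype used here is a suggested variation rather than the notebook's approach.

```python
import pandas as pd

people = pd.Series(["0 in the burned area", "35445 in the burned area", None])

# expand=False returns a Series instead of a one-column DataFrame
counts = people.str.extract(r"(\d+)", expand=False)

# to_numeric plus Int64 keeps missing values as <NA> instead of raising
counts = pd.to_numeric(counts).astype("Int64")

print(counts.tolist())
```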

In [114]:
ForestFires_data = ForestFires.copy()

FORESTFIRES = ForestFires_data[['alertlevel','alertscore', 'fromdate', 'todate', 'Severity(ha)', 
  'iso3', 'eventtype','People affected', 'Duration (days)','longitude', 'latitude', 'Country']]

In [115]:
# FORESTFIRES.info()

## Clean `TropicalCyclone Data`

In [116]:
TropicalCyclone = wrangle("Unclean/Tropical Cyclone.csv")
print("TropicalCyclone datashape:", TropicalCyclone.shape)
TropicalCyclone.head(2)

TropicalCyclone datashape: (405, 23)


Unnamed: 0,coordinates,name,description,htmldescription,alertlevel,alertscore,episodealertlevel,episodealertscore,country,fromdate,...,source,iso3,eventtype,GDACS ID,Name,Exposed countries,Exposed population,Maximum wind speed,Maximum storm surge,Vulnerability
0,"[97.5, 18.299999237]",Tropical Cyclone NARGIS-08,Tropical Cyclone NARGIS-08,Red Tropical Cyclone NARGIS-08 off-shore from:...,Red,3,Green,1.0,,2008-04-28T06:00:00,...,JTWC,,TC,TC 10238,NARGIS-08,Off-shore,7 million in Category 1 or higher,74 km/h Tropical storm,n.a.,-- ()
1,"[136.600000000219, 34.3999999997814]",Tropical Cyclone SONGDA-11,Tropical Cyclone SONGDA-11,Orange Tropical Cyclone SONGDA-11 in Japan fro...,Orange,2,Green,1.0,Japan,2011-05-22T00:00:00,...,JRC,,TC,TC 24054,SONGDA-11,Japan,No people in Category 1 or higher,64 km/h Tropical storm,n.a.,Low ()


In [117]:
# Print info
# TropicalCyclone.info()

In [118]:
TropicalCyclone[['longitude', 'latitude']] = TropicalCyclone['coordinates'].str.strip("[]").str.split(",", expand=True)

In [119]:
# Remove "NaN" values 
TropicalCyclone.dropna(inplace= True)

In [120]:
TropicalCyclone["iso3"] = TropicalCyclone["iso3"].replace("ROM", "ROU")

In [121]:

TropicalCyclone["Country"] = cc.convert(TropicalCyclone["iso3"], to = "name_short")

for time in ["fromdate", "todate"]:
    TropicalCyclone[time] = pd.to_datetime(TropicalCyclone[time]).dt.tz_localize('UTC').dt.tz_convert("Africa/Nairobi")

    not found in ISO3
    not found in ISO3
    not found in ISO3
    not found in ISO3
    not found in ISO3


In [122]:
TropicalCyclone["Max. Windspeed"] = TropicalCyclone["Maximum wind speed"].str.extract(r'(\d+)').astype(int)

In [123]:
TropicalCyclone_data = TropicalCyclone.copy()

TROPICALCYCLONE = TropicalCyclone_data[['alertlevel','alertscore', 'fromdate', 'todate', 
  'iso3', 'eventtype', 'Max. Windspeed','longitude', 'latitude', 'Country']]

In [124]:
# TROPICALCYCLONE.info()

### Concatenate Dataframes

In [125]:
# Concatenate dataframes
df =  pd.concat([DROUGHT,EARTHQUAKE,ERUPTION,FLOOD,FORESTFIRES,TROPICALCYCLONE])
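
Two details are worth noting here. Each frame keeps its own row index, so the combined index repeats, and Drought uses `Duration(days)` while ForestFires uses `Duration (days)`, which produces two separate columns. A minimal sketch of one way to handle both, on toy frames:

```python
import pandas as pd

drought = pd.DataFrame({"eventtype": ["DR"], "Duration(days)": [569]})
fires = pd.DataFrame({"eventtype": ["WF"], "Duration (days)": [28]})

# Harmonise the column name before combining
fires = fires.rename(columns={"Duration (days)": "Duration(days)"})

# ignore_index=True builds a fresh 0..n-1 index
combined = pd.concat([drought, fires], ignore_index=True)

print(combined)
```

Whether the two duration columns should be merged is a modelling choice; the sketch only shows the mechanics.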

In [126]:
# Print object type, shape, and head
print("df type:", type(df))
print("df shape:", df.shape)
df.head(3)

df type: <class 'pandas.core.frame.DataFrame'>
df shape: (11221, 22)


Unnamed: 0,alertlevel,alertscore,fromdate,todate,severity(km2),iso3,eventtype,Impact,longitude,latitude,...,Magnitude(M),Depth(km),Severity(VEI),Severity(metres),Deaths,Displaced,Severity(ha),People affected,Duration (days),Max. Windspeed
0,Green,1,2017-07-21 03:00:00+03:00,2019-02-09 18:07:00+03:00,80936.0,DEU,DR,Minor impact for agricultural drought in 80936...,11.087,53.882,...,,,,,,,,,,
1,Green,1,2017-07-21 03:00:00+03:00,2019-02-09 18:07:00+03:00,80936.0,DEU,DR,Minor impact for agricultural drought in 80936...,11.087,53.882,...,,,,,,,,,,
2,Green,1,2017-07-21 03:00:00+03:00,2019-02-09 18:07:00+03:00,80936.0,DEU,DR,Minor impact for agricultural drought in 80936...,11.087,53.882,...,,,,,,,,,,


In [127]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 11221 entries, 0 to 404
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype                         
---  ------            --------------  -----                         
 0   alertlevel        11221 non-null  object                        
 1   alertscore        11221 non-null  int64                         
 2   fromdate          11221 non-null  datetime64[ns, Africa/Nairobi]
 3   todate            11221 non-null  datetime64[ns, Africa/Nairobi]
 4   severity(km2)     1780 non-null   float64                       
 5   iso3              11221 non-null  object                        
 6   eventtype         11221 non-null  object                        
 7   Impact            1780 non-null   object                        
 8   longitude         11221 non-null  object                        
 9   latitude          11221 non-null  object                        
 10  Country           11221 non-null  object             

In [128]:
df.columns

Index(['alertlevel', 'alertscore', 'fromdate', 'todate', 'severity(km2)',
       'iso3', 'eventtype', 'Impact', 'longitude', 'latitude', 'Country',
       'Duration(days)', 'Magnitude(M)', 'Depth(km)', 'Severity(VEI)',
       'Severity(metres)', 'Deaths', 'Displaced', 'Severity(ha)',
       'People affected', 'Duration (days)', 'Max. Windspeed'],
      dtype='object')

## Save `df`

In [129]:
df.to_csv("Global Disaster Events/gde-preclean.csv", index=False)
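
CSV files store timestamps as text, so the timezone-aware dtype is not preserved on reload. A sketch of restoring it, using an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

frame = pd.DataFrame(
    {
        "fromdate": pd.to_datetime(["2017-07-21T00:00:00"])
        .tz_localize("UTC")
        .tz_convert("Africa/Nairobi")
    }
)

buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# Re-parse the text column; utc=True handles the "+03:00" offsets
reloaded["fromdate"] = pd.to_datetime(reloaded["fromdate"], utc=True).dt.tz_convert(
    "Africa/Nairobi"
)

print(reloaded["fromdate"].dtype)
```

The same two lines would apply to `todate` when reading `gde-preclean.csv` back in.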

## Author

<a href="https://www.linkedin.com/in/andrew-kalumba-harris/">Andrew Kalumba</a><br>


| Date (YYYY-MM-DD) | Prepared By     | 
| ----------------- | --------------  | 
| 2025-12-06        | Author          | 


## <h3 align="center"> © Data Science 2025. </h3>