# HURDAT

## Data description

Hurricane data from the [National Oceanic and Atmospheric Administration](https://www.aoml.noaa.gov/) (NOAA). The dataset, named __HURDAT2__, is available at the following [link](https://www.aoml.noaa.gov/hrd/hurdat/Data_Storm.html), and its format is documented [here](https://www.aoml.noaa.gov/hrd/hurdat/hurdat2-format.pdf). A copy is already in the repository, in the folder `data/hurdat`, under the name `hurdat2.txt`.

The collected information is:

- Identification of the hurricane (code, name).
- Date and time of each record; records are collected at 6-hour intervals.
- Position of the hurricane (latitude and longitude).
- Measurements of wind and pressure.
- Measurements of the size of the hurricane, given as the extent of winds above a speed threshold (available only from 2004 onward).

For each hurricane, we have a time series of size, position, wind, and pressure measurements, so the data supports a temporal analysis of hurricanes.
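As a rough illustration of the layout described in the format document, the sketch below splits one header line and one data line on commas. Both sample lines are invented for illustration (they follow the comma-separated layout but are not copied from `hurdat2.txt`), and `split_fields` is a hypothetical helper that does not exist in the repository.

```python
# Illustrative lines in the HURDAT2 layout; the values are invented.
header_line = "AL992005,            EXAMPLE,      2,"
data_line = ("20050608, 1800,  , TD, 16.9N,  84.0W,  25, 1004,"
             "   30,   30,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,")


def split_fields(line):
    """Split a comma-separated line and strip whitespace from each field."""
    return [field.strip() for field in line.split(",")]


header = split_fields(header_line)
record = split_fields(data_line)

# The trailing comma produces one empty field at the end of each list.
print(len(header), header[:3])   # code, name, number of data rows
print(len(record), record[:8])   # date, time, record id, status, lat, lon, wind, pressure
```

The difference in field count between the two kinds of line is what lets a parser tell a header from a data record.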

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
import sys
import os
module_path = os.path.abspath(os.path.join('../scripts'))
sys.path.append(module_path)
from coordinates import convert_to_web_mercator

## Preprocessing text file

The original file is not in tabular form: each hurricane has a header line followed by its data lines. The following code transforms it into a `csv` file.

In [6]:
df = {"cod": [], "name":[], "rows": [], 
      "year":[], "month":[], "day":[], "hour":[], "minute":[],
      "record":[], "status":[], "latitude": [], "longitude":[], "wind" :[], "pressure":[],
      "34_rad_ne": [], "34_rad_se":[], "34_rad_sw": [], "34_rad_nw": [],
      "50_rad_ne": [], "50_rad_se":[], "50_rad_sw": [], "50_rad_nw": [],
      "64_rad_ne": [], "64_rad_se":[], "64_rad_sw": [], "64_rad_nw": []}

with open("../data/hurdat/hurdat2.txt", "r") as f:
    txt = f.readlines()
    
cod = ""
name = ""
rows = 0
for line in txt:
    values = line.split(",")
    
    if len(values) <= 6:
        cod = values[0]
        name = values[1].strip()
        rows = values[2]
    elif len(values) == 21:
        df['cod'].append(cod)
        df['name'].append(name)
        df['rows'].append(rows)
        
        year = int(values[0][:4])
        month = int(values[0][4:6])
        day = int(values[0][6:])
        time = values[1].strip()
        hour = int(time[0:2])
        minute = int(time[2:])
        
        df['year'].append(year)
        df['month'].append(month)
        df['day'].append(day)
        df['hour'].append(hour)
        df['minute'].append(minute)
        
        record = values[2]
        status = values[3].strip()
        
        df['record'].append(record)
        df['status'].append(status)
        
        
        if values[4][-1] == 'S':
            latitude = float(values[4][:-1])*-1
        else:
            latitude = float(values[4][:-1])
            
        if values[5][-1] == 'W':
            longitude = float(values[5][:-1])*-1
        else:
            longitude = float(values[5][:-1])
        df['latitude'].append(latitude)
        df['longitude'].append(longitude)
        
        # Store wind and pressure as integers rather than padded strings
        wind = int(values[6])
        pressure = int(values[7])
        df['wind'].append(wind)
        df['pressure'].append(pressure)
        
        col_i = 0
        keys = ["34_rad_ne", "34_rad_se", "34_rad_sw", "34_rad_nw",
                "50_rad_ne", "50_rad_se", "50_rad_sw", "50_rad_nw",
                "64_rad_ne", "64_rad_se","64_rad_sw", "64_rad_nw"]
        while (8 + col_i) < len(values) - 1:
            num_val = float(values[8 + col_i])
            if num_val == -999:
                num_val = np.nan
            df[keys[col_i]].append(num_val)
            col_i +=1
            
df = pd.DataFrame(df)
df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
df.to_csv("../data/hurdat/hurricanes_clean.csv", index = False)
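The two conversions at the heart of the cell above, signed coordinates and missing radii, can be isolated as small functions. This is a minimal sketch: `parse_coordinate` and `parse_radius` are hypothetical names introduced here for illustration, and the `-999` missing-value code is the one the cell above checks for.

```python
import math


def parse_coordinate(text):
    """Convert a value such as '28.0N' or '94.8W' to signed decimal degrees.

    South latitudes and west longitudes become negative.
    """
    text = text.strip()
    value, hemisphere = float(text[:-1]), text[-1].upper()
    return -value if hemisphere in ("S", "W") else value


def parse_radius(text, missing=-999.0):
    """Return a wind radius as a float, mapping the missing-value code to NaN."""
    value = float(text)
    return math.nan if value == missing else value


print(parse_coordinate(" 28.0N"), parse_coordinate(" 94.8W"))
print(parse_radius("  120"), parse_radius(" -999"))
```

Keeping the conversions separate from the file loop makes them easy to check on a single field before running the whole file.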

## Creating objects dataframe

- Each different hurricane will be an object. 
- We remove hurricanes with no area information (prior to 2004).
- We project the positions to the Web Mercator system to get a positioning system that is "close to linear".
- The year is not important: we want to compare hurricanes from different years, so only the month, day, and hour of each date matter.
- We consider the region of the hurricane to be the octagon formed by its extent in each of the eight cardinal and intercardinal directions.
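Two of the steps above, the projection and the year-independent time, can be sketched in plain Python. The projection below is the standard spherical Web Mercator formula (EPSG:3857); it is an assumption that `convert_to_web_mercator` in `scripts/coordinates.py` behaves the same way, since that function is not shown here. Both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def to_web_mercator(longitude, latitude):
    """Project degrees to Web Mercator metres with the spherical formula."""
    x = EARTH_RADIUS_M * math.radians(longitude)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2))
    return x, y


def seconds_into_year(month, day, hour):
    """Approximate time since 1 January, treating every month as 30 days."""
    return (month - 1) * 30 * 24 * 3600 + (day - 1) * 24 * 3600 + hour * 3600


x, y = to_web_mercator(-80.0, 25.0)
print(x / 1000, y / 1000)            # kilometres, as in xcenter and ycenter
print(seconds_into_year(9, 1, 12))   # same value for 1 September of any year
```

Because the time value ignores the year, two hurricanes that occur on the same calendar date in different seasons get the same timestamp, which is what makes them comparable.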

In [7]:
df = pd.read_csv("../data/hurdat/hurricanes_clean.csv")
df['date'] = pd.to_datetime(df.date)

#Removing rows without radius values
df = df.dropna().copy().reset_index(drop = True)

#Creating object column
cods = df.cod.unique()
print(f"Total of unique hurricanes: {len(cods)}")
cods_id = dict([(cods[i], i) for i in range(len(cods))])
df['cod_i'] = df.cod.map(cods_id)
df['object'] = df['cod_i']

#Projecting data to Web Mercator
df = convert_to_web_mercator(df)
df['xcenter'] = df['longitude_merc']/1000
df['ycenter'] = df['latitude_merc']/1000

#Creating time column (in seconds since the start of the year, assuming 30-day months)
df['time'] = (df.month - 1)*30*24*3600 + (df.day-1) * 24*3600 + df.hour * 3600
                                    
#Calculating area
min_val = 1
points = []
points_coords =  []
for i in range(df.shape[0]):
    arrow = lambda x : np.array([np.cos(x*np.pi/4), np.sin(x*np.pi/4)])
    p = df.loc[i, ['xcenter', 'ycenter']].values
    p_coord = df.loc[i, ["longitude", "latitude"]].values
    dir_n = max((df['34_rad_ne'].iloc[i] + df['34_rad_nw'].iloc[i])/2, min_val) * arrow(2)
    dir_ne = max(df['34_rad_ne'].iloc[i], min_val) * arrow(1)
    dir_e = max((df['34_rad_se'].iloc[i] + df['34_rad_ne'].iloc[i])/2, min_val) * arrow(8)
    dir_se = max(df['34_rad_se'].iloc[i], min_val) * arrow(7)
    dir_s = max((df['34_rad_se'].iloc[i] + df['34_rad_sw'].iloc[i])/2, min_val) * arrow(6)
    dir_sw = max(df['34_rad_sw'].iloc[i], min_val) * arrow(5)
    dir_w = max((df['34_rad_sw'].iloc[i] + df['34_rad_nw'].iloc[i])/2, min_val) * arrow(4)
    dir_nw = max(df['34_rad_nw'].iloc[i], min_val) * arrow(3)
    
    points.append([p + dir_n * 1.852,
                   p + dir_ne * 1.852,
                   p + dir_e * 1.852,
                   p + dir_se * 1.852,
                   p + dir_s * 1.852,
                   p + dir_sw * 1.852,
                   p + dir_w * 1.852,
                   p + dir_nw * 1.852])

    points_coords.append([p_coord + dir_n / 60,
                   p_coord + dir_ne / 60,
                   p_coord + dir_e / 60,
                   p_coord + dir_se / 60,
                   p_coord + dir_s / 60,
                   p_coord + dir_sw / 60,
                   p_coord + dir_w / 60,
                   p_coord + dir_nw / 60])
    
df['points'] = points
df["points_coords"] = points_coords
df['points'] = df.points.apply(lambda x : [list(t) for t in x])
df['points_coords'] = df.points_coords.apply(lambda x : [list(t) for t in x])
df['area'] = df.points.apply(lambda x : Polygon(x).convex_hull.area)

# creating column with initial coordinate of each hurricane
objects = df.object.unique()
objects_map = {"latitude": {}, "longitude": {}}
for e in objects:
    longitude_start = df[df.object == e].sort_values('time').longitude.iloc[0]
    latitude_start = df[df.object == e].sort_values('time').latitude.iloc[0]
    objects_map["longitude"][str(e)] = longitude_start
    objects_map["latitude"][str(e)] = latitude_start
    
df['longitude_start'] = df.object.map(lambda x : objects_map["longitude"][str(x)])
df['latitude_start'] = df.object.map(lambda x : objects_map["latitude"][str(x)])

#Dropping objects that ended in January (and started in December)
objects_to_remove = df[df.month == 1].object.unique()
df = df[~df.object.isin(objects_to_remove)]
df = df.reset_index(drop = True)

df.to_csv("../data/processed/hurdat.csv", index = False)


Total of unique hurricanes: 299
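
The octagon construction used for the `area` column can be checked by hand on a single record. The sketch below is not the notebook's code: it uses only the standard library and computes the area with the shoelace formula instead of `Polygon(...).convex_hull.area`, so the two agree only when the octagon is convex. It also omits the minimum-radius clamp (`min_val`). The function names are hypothetical.

```python
import math

NM_TO_KM = 1.852  # one nautical mile in kilometres


def octagon_vertices(center, ne, se, sw, nw):
    """Return eight vertices (km) ordered counter-clockwise, starting at east.

    The quadrant radii are used on the diagonals; each cardinal radius is
    the mean of its two neighbouring quadrants.
    """
    radii = [(ne + se) / 2, ne, (ne + nw) / 2, nw,
             (nw + sw) / 2, sw, (sw + se) / 2, se]
    cx, cy = center
    return [(cx + r * NM_TO_KM * math.cos(k * math.pi / 4),
             cy + r * NM_TO_KM * math.sin(k * math.pi / 4))
            for k, r in enumerate(radii)]


def shoelace_area(vertices):
    """Area of a simple polygon from its ordered vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Equal radii of 100 nautical miles give a regular octagon.
vertices = octagon_vertices((0.0, 0.0), ne=100, se=100, sw=100, nw=100)
print(shoelace_area(vertices))  # area in square kilometres
```

With equal radii the result should match the closed-form area of a regular octagon, 2√2·r², with r expressed in kilometres, which gives a quick sanity check on the vertex ordering.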
