<a href="https://colab.research.google.com/github/olanrewajufarooq/MIROceanographyAnalysis/blob/main/Oceanography_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analysis of the Oceanographic Data**

This notebook analyses the data obtained from the drifters and the CTD casts on Day 2 of the 2022/2023 sea trip. The results are also compared with those from Day 1 and Day 3, and with those from intake 1 (i.e. 2021/2022).

### Importing Necessary Modules for the Notebook

In [1]:
# Python-based Libraries
import os
from datetime import timedelta, time
from math import sin, cos, sqrt, atan2, radians
import datetime as dt

# Data Analysis Libraries
import numpy as np
import pandas as pd

# Graph plotting libraries
import matplotlib.pyplot as plt
import folium

### Connecting to Google Drive

In [2]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Defining the path of the Group folder on Google Drive
path = "./drive/MyDrive/OceanographyAnalysis/"

# Check if the files are accessible
os.listdir(path + "Drifters Data")

['drifter-10_12_22-3368.csv',
 'LCI00273.txt',
 'drifter-10_12_22-7230.csv',
 'drifter-10_12_22-8436.csv',
 'drifter-10_12_22-6439.csv',
 'LCI00274.txt',
 'LCI00277.txt',
 'drifter-10_12_22-0119.csv',
 'drifter-10_12_22-2052.csv']

## Import Drifter Data



In [4]:
# Initializing a Dictionary datatype that stores data for each Drifter
data = {}

# Iterate through all files in the drifter data folder
for file in os.listdir(f"{path}/Drifters Data"):

    # Data from the White Drifters are stored in ".csv" formats using "UTF-16 LE" Encoding
    if file.endswith(".csv"):
        
        data_key = file.split(".")[0][-4:] # Obtaining the name of the Drifter
        data_value = pd.read_csv(f'{path}/Drifters Data/{file}', encoding="UTF-16 LE") #The encoding is very important.
        data[data_key] = data_value # Storing the data in the "data" dictionary
    
    # Data from the Yellow Drifters are stored in ".txt" formats using "UTF-8" Encoding
    elif file.endswith(".txt"):
        
        data_key = file.split(".")[0][-3:] # Obtaining the name of the Drifter
        data_value = pd.read_csv(f'{path}/Drifters Data/{file}', encoding="UTF-8") #The encoding is very important.
        data[data_key] = data_value # Storing the data in the "data" dictionary

# Show the names of all drifters for which the data has been read
print(list(data.keys()))

['3368', '273', '7230', '8436', '6439', '274', '277', '0119', '2052']
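
The two drifter families use different encodings, so reading a file with the wrong one either fails or garbles the column names. As a rough sketch, assuming the files begin with a byte-order mark (the notebook does not confirm this), the encoding could be guessed from the first bytes before calling `pd.read_csv`. `guess_encoding` is a hypothetical helper and is not part of the notebook.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess a text encoding from a leading byte-order mark (hypothetical helper)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No byte-order mark found: fall back to UTF-8 (an assumption, not a guarantee)
    return "utf-8"


# Python's "utf-16" codec writes a byte-order mark, so the helper can detect it
print(guess_encoding("Latitude,Longitude".encode("utf-16")))
print(guess_encoding(b"Latitude,Longitude"))
```

In practice the first few bytes of each file would be read in binary mode and passed to the helper. Files without a byte-order mark still need the encoding to be known in advance.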


### Import the Logsheet

In [5]:
log_df = pd.read_excel(f"{path}/logsheet.xlsx")

# Convert the Deployment Time and the Time of Recovery to datetime.time objects (time of day only)
log_df['Deployment Time'] = pd.to_datetime(log_df['Deployment Time'],format= '%H:%M:%S' ).dt.time
log_df['Time of Recovery'] = pd.to_datetime(log_df['Time of Recovery'],format= '%H:%M:%S' ).dt.time

log_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Station No        14 non-null     int64  
 1   Type              14 non-null     object 
 2   Name              10 non-null     float64
 3   Deployment Time   14 non-null     object 
 4   Long Deg          14 non-null     int64  
 5   Long Min          14 non-null     float64
 6   Long Dir          14 non-null     object 
 7   Lat Deg           14 non-null     int64  
 8   Lat Min           14 non-null     float64
 9   Lat Dir           14 non-null     object 
 10  Time of Recovery  13 non-null     object 
 11  Long Deg.1        12 non-null     float64
 12  Log Min           12 non-null     float64
 13  Long Dir.1        12 non-null     object 
 14  Lat Deg.1         12 non-null     float64
 15  Lat Min.1         12 non-null     float64
 16  Lat Dir.1         12 non-null     object 
dtypes: float64(7), int64(3), object(7)
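
The logsheet stores each position as degrees, minutes and a direction, whereas the drifter files use decimal degrees. A minimal sketch of the conversion, assuming the `Min` columns hold decimal minutes and the `Dir` columns hold hemisphere letters (N, S, E, W); `dm_to_decimal` is a hypothetical helper and the sample values are made up.

```python
def dm_to_decimal(degrees, minutes, direction):
    """Convert degrees and decimal minutes plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60.0
    # South and West are negative by convention
    if str(direction).strip().upper() in ("S", "W"):
        return -value
    return value


# Hypothetical logsheet entry: 5 deg 57.5 min E, 43 deg 5.0 min N
print(dm_to_decimal(5, 57.5, "E"))
print(dm_to_decimal(43, 5.0, "N"))
```

The same helper could be applied row by row to the `Long Deg`, `Long Min` and `Long Dir` columns of `log_df` to obtain positions comparable with the drifter data.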

## **Data Cleaning**



### Converting the time from UTC to Paris Time

In [6]:
# Checking the data from the yellow drifters
data['273'].head(3)

Unnamed: 0,Position time (UTC),Reception time (UTC),Latitude (°),Longitude (°),Speed (m/s),Course (°),Status,Battery (V),Temperature (°C)
0,2022-10-12 07:20:00,2022-10-12 07:20:21,43.10374,5.91212,4.16667,102.2,1,4.108,16.1
1,2022-10-12 07:30:00,2022-10-12 07:30:19,43.08793,5.92784,4.00278,85.9,1,4.107,16.9
2,2022-10-12 07:40:00,2022-10-12 07:40:26,43.08135,5.95918,4.58333,120.2,1,4.105,17.2


In [7]:
# Converting the UTC time to Paris Time [For Yellow Drifters]

def DataClean_YDrifters(data_df):
    data_df = data_df.copy()  # Work on a copy so the raw data is left untouched

    # Paris was on summer time (UTC+2) on 12 October 2022
    position_time = pd.to_datetime(data_df['Position time (UTC)']) + timedelta(hours=2)

    # Drop the columns that are not needed for the analysis
    data_df = data_df.drop(labels=['Position time (UTC)', 'Reception time (UTC)', 'Course (°)',
                                   'Status', 'Battery (V)'], axis=1)

    data_df = data_df.rename(columns={"Latitude (°)": "Latitude", "Longitude (°)": "Longitude",
                                      "Speed (m/s)": "Speed", "Temperature (°C)": "Temperature"})

    # Keep only the time of day, to match the times recorded in the logsheet
    data_df["Position time"] = position_time.dt.time

    return data_df

DataClean_YDrifters(data["273"]).head()

Unnamed: 0,Latitude,Longitude,Speed,Temperature,Position time
0,43.10374,5.91212,4.16667,16.1,09:20:00
1,43.08793,5.92784,4.00278,16.9,09:30:00
2,43.08135,5.95918,4.58333,17.2,09:40:00
3,43.07754,5.97256,0.0,17.6,09:50:00
4,43.07886,5.98378,0.59722,18.0,10:00:00
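
The fixed two-hour offset matches Paris summer time on 12 October 2022, but it would be off by one hour for data recorded in winter. Below is a minimal sketch of a timezone-aware alternative, assuming the timestamps are naive UTC values as the column names suggest. The sample timestamps are made up for illustration.

```python
import pandas as pd

# Two sample UTC timestamps: one during summer time, one during winter time
utc = pd.Series(pd.to_datetime(["2022-10-12 07:20:00", "2022-12-12 07:20:00"]))

# Attach the UTC zone, then convert to Paris local time
paris = utc.dt.tz_localize("UTC").dt.tz_convert("Europe/Paris")

# The October value shifts by two hours, the December value by one
print(paris.dt.time.tolist())
```

This keeps the conversion correct across the daylight-saving change, which matters when comparing days or intakes recorded at different times of the year.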


In [8]:
# Checking the data from the white drifters
data['0119'].head(3)

Unnamed: 0,DeviceName,DeviceDateTime,BatteryStatus,CommId,Latitude,Longitude
0,0-4410119,2022-10-12 11:11:48,GOOD,0-4410119,43.08366,5.95759
1,0-4410119,2022-10-12 11:07:25,GOOD,0-4410119,43.08342,5.95843
2,0-4410119,2022-10-12 11:01:48,GOOD,0-4410119,43.08297,5.95946


In [9]:
# Extracting the time of day [For White Drifters]; no UTC offset is applied to DeviceDateTime

def DataClean_ODrifters(data_df):
    
    data_df["Position time"] = pd.to_datetime(data_df["DeviceDateTime"]).dt.time
    data_df = data_df.drop(labels=['DeviceName', 'BatteryStatus', 'CommId', "DeviceDateTime"], axis=1)
    
    return data_df

DataClean_ODrifters(data['0119']).head()

Unnamed: 0,Latitude,Longitude,Position time
0,43.08366,5.95759,11:11:48
1,43.08342,5.95843,11:07:25
2,43.08297,5.95946,11:01:48
3,43.08281,5.96034,10:56:47
4,43.08254,5.96113,10:51:45


In [10]:
# Clean all data

for key in data.keys():
    if len(key) == 3:
        data[key] = DataClean_YDrifters(data[key])
    elif len(key) == 4:
        data[key] = DataClean_ODrifters(data[key])

In [11]:
data['277'].head()

Unnamed: 0,Latitude,Longitude,Speed,Temperature,Position time
0,43.1038,5.91181,3.96111,15.8,09:20:00
1,43.08799,5.92791,3.63611,16.8,09:30:00
2,43.08133,5.95918,4.63611,16.9,09:40:00
3,43.07751,5.97264,0.0,17.2,09:50:00
4,43.07879,5.98421,0.0,17.7,10:00:00


In [12]:
data["2052"].head()

Unnamed: 0,Latitude,Longitude,Position time
0,43.08428,5.96721,11:58:35
1,43.08621,5.96815,11:53:35
2,43.08921,5.97023,11:48:34
3,43.08892,5.97103,11:43:37
4,43.08864,5.97186,11:38:36


### Extracting Data from Deployment Time to Recovery Time

In [13]:
# Extract Data within the Deployment and Recovery time

def extractData(key, data, log_df):

    # Logsheet row for this drifter ("Name" is stored as a float, e.g. "0119" -> 119.0)
    log_row = log_df[log_df["Name"] == float(key)].iloc[0]
    deploy_time = log_row["Deployment Time"]
    recov_time = log_row["Time of Recovery"]

    # Keep only the fixes recorded strictly between deployment and recovery
    cleaning_bool = data[key]["Position time"].map(lambda t: deploy_time < t < recov_time)

    # reset_index() keeps the old row labels in an "index" column
    data[key] = data[key][cleaning_bool].reset_index()
    
    return data

In [14]:
for key in data.keys():
    data = extractData(key, data, log_df)
    
# All data has been cleaned and extracted at this point

In [15]:
data["0119"].head()

Unnamed: 0,index,Latitude,Longitude,Position time
0,0,43.08366,5.95759,11:11:48
1,1,43.08342,5.95843,11:07:25
2,2,43.08297,5.95946,11:01:48
3,3,43.08281,5.96034,10:56:47
4,4,43.08254,5.96113,10:51:45


In [16]:
data["0119"].tail()

Unnamed: 0,index,Latitude,Longitude,Position time
10,10,43.08068,5.96633,10:21:45
11,11,43.08035,5.96723,10:16:44
12,12,43.08007,5.96811,10:11:47
13,13,43.07985,5.96888,10:07:07
14,14,43.07952,5.96972,10:01:48
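
To make the filtering step concrete, here is a small self-contained sketch on made-up fixes. The times and the deployment window are hypothetical. The comparison works on plain `datetime.time` values, so it assumes deployment and recovery fall on the same day.

```python
from datetime import time

import pandas as pd

# Hypothetical fixes and deployment window, for illustration only
fixes = pd.DataFrame({
    "Latitude": [43.080, 43.081, 43.082, 43.083],
    "Position time": [time(9, 50), time(10, 5), time(10, 40), time(11, 30)],
})
deploy_time, recov_time = time(10, 0), time(11, 15)

# Keep only the fixes strictly inside the window
inside = fixes["Position time"].map(lambda t: deploy_time < t < recov_time)
window = fixes[inside].reset_index(drop=True)

print(window)
```

Only the two middle fixes survive, since the first precedes deployment and the last follows recovery.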


### Exporting the Cleaned Data

In [17]:
for key in data.keys():
    data[key].to_csv(f'{path}/Cleaned Drifters Data/{key}.csv')

# All the cleaned data has been exported to a folder for any necessary external use
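
Two details of the export are easy to miss: `to_csv` raises an error if the target folder does not exist, and by default it writes the row index as an extra unnamed column. The sketch below shows one way to handle both. It uses a temporary folder and a made-up one-row frame rather than the notebook's Drive path.

```python
import os
import tempfile

import pandas as pd

# Made-up cleaned data for a single drifter
cleaned = {"0119": pd.DataFrame({"Latitude": [43.08366], "Longitude": [5.95759]})}

# Create the output folder first; exist_ok avoids an error on re-runs
out_dir = os.path.join(tempfile.mkdtemp(), "Cleaned Drifters Data")
os.makedirs(out_dir, exist_ok=True)

for key, frame in cleaned.items():
    # index=False leaves the row index out of the file
    frame.to_csv(os.path.join(out_dir, f"{key}.csv"), index=False)

restored = pd.read_csv(os.path.join(out_dir, "0119.csv"))
print(restored.columns.tolist())
```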

## **Analysis**

### Computing the Trajectory Velocity

By: Chin
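
The velocity computation in the next cell uses the haversine formula for the great-circle distance between two fixes. As read from the code, with $\varphi$ the latitude and $\lambda$ the longitude (both in radians), $R$ the approximate Earth radius and $\Delta t$ the time elapsed between the fixes:

```latex
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\Delta\lambda}{2}\right),
\qquad
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right),
\qquad
d = R\,c,
\qquad
v = \frac{d}{\Delta t}
```

With $R$ in kilometres and $\Delta t$ in seconds, $d$ must be multiplied by 1000 to obtain $v$ in m/s.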

In [18]:
# Approximate radius of the Earth in km
R = 6373.0


# Input times as "HH:MM:SS" strings; time2 must be later than time1
def calVelocity(lat1, lon1, lat2, lon2, time1, time2):
    """Return the mean speed (m/s) between two fixes, using the haversine distance."""

    start_dt = dt.datetime.strptime(time1, '%H:%M:%S')
    end_dt = dt.datetime.strptime(time2, '%H:%M:%S')
    timediff = (end_dt - start_dt).total_seconds()  # Elapsed time in seconds

    lat1 = radians(lat1)
    lon1 = radians(lon1)
    lat2 = radians(lat2)
    lon2 = radians(lon2)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c  # Great-circle distance in km
    velocity = distance * 1000 / timediff  # Speed in m/s

    return velocity

calVelocity(43.08366, 5.95759, 43.08342, 5.95843, "11:11:00", "11:14:00")


def calVelocity2(data):
    """Return the speed (m/s) between consecutive fixes of one drifter."""
    rad_lat = np.radians(data["Latitude"].to_numpy())
    rad_long = np.radians(data["Longitude"].to_numpy())
    df_time = pd.to_datetime(data["Position time"].astype(str), format='%H:%M:%S')

    # Elapsed seconds between consecutive fixes; abs() handles ascending and descending time order
    d_time = np.abs(df_time.diff().dt.total_seconds().to_numpy()[1:])

    d_lat = rad_lat[1:] - rad_lat[:-1]
    d_long = rad_long[1:] - rad_long[:-1]

    a = np.sin(d_lat / 2.0)**2 + np.cos(rad_lat[1:]) * np.cos(rad_lat[:-1]) * np.sin(d_long / 2.0)**2
    c = 2.0 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

    distance = R * c  # Great-circle distance in km
    # Divide by the elapsed time between fixes (the original divided by the raw timestamps)
    velocity = distance * 1000.0 / d_time  # Speed in m/s

    return velocity


In [19]:
calVelocity2(data["3368"])

In [20]:
list(data.keys())
data["3368"].columns

Index(['index', 'Latitude', 'Longitude', 'Position time'], dtype='object')
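
A quick way to gain confidence in a haversine implementation is to test it on a case with a known answer: one degree of latitude along a meridian spans exactly $R\pi/180$ on a sphere. The sketch below uses only the standard library. `haversine_km` and `R_KM` are hypothetical names, and the radius is the same approximate value used in this notebook.

```python
from math import atan2, cos, pi, radians, sin, sqrt

R_KM = 6373.0  # Same approximate Earth radius as the notebook


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return R_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# One degree of latitude at constant longitude
one_degree = haversine_km(43.0, 5.9, 44.0, 5.9)

# The two values should agree to within floating-point error
print(one_degree, R_KM * pi / 180)
```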

### Visualizing the Trajectories and Velocities on Graph
By: Farooq and Maria

### Analysis of Drifter Types

### Analysis of Circulation

### Analysis of the Daily Variability

## **CTD Data Analysis**

### Import and Clean the CTD Data
By: Haleem and Aduragbemi

### Plot the T-S Profile

### Compare the Outputs with 2021 Data