# 1. Data representation for analysis



## 1.1 Preparing the notebook

Press *play* in the following cell to install some libraries needed to view the maps. After installation, restart the runtime (from the toolbar *Runtime* -> *Restart runtime*) and continue with the next cells.

In [None]:
! apt-get install libgeos-3.5.0
! apt-get install libgeos-dev
! pip install https://github.com/matplotlib/basemap/archive/master.zip

Press *play* in the following cell to import the datasets from the GitHub repository.

In [None]:
! git clone https://github.com/vitoreno/StelleDataset.git
! unzip /content/StelleDataset/data.zip

Press *play* in the following cell to import the libraries needed to run the notebook.

In [None]:
%load_ext google.colab.data_table
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import sys
from datetime import datetime
import scipy.stats

## 1.2 Loading the dataset

Press *play* in the following cells to load the imported datasets and view their summary statistics.

The dataset *mediterranean_surface_temperature_2014_15_16* describes the surface temperature of the Adriatic, Ionian and Tyrrhenian Seas from 2014 to 2016. For each observation we know:
*   *time*: observation date;
*   *sea*: indicates whether the observation took place in the Adriatic, Ionian or Tyrrhenian Sea;
*   *lat, lon*: coordinates (latitude, longitude);
*   *sst*: Sea Surface Temperature.



In [None]:
data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")
data.describe()
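
As a rough illustration of what `describe()` reports, the sketch below builds a tiny table with the same column names as the dataset. The rows and the sea labels are invented for illustration; only the column names come from the description above. By default `describe()` summarizes numeric columns only, so the text columns *time* and *sea* appear only when `include="all"` is passed.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the real dataset.
sample = pd.DataFrame({
    "time": ["2014-01-01 12:00:00", "2014-01-01 12:00:00", "2014-01-02 12:00:00"],
    "sea": ["Adriatic", "Ionian", "Tyrrhenian"],
    "lat": [43.5, 38.0, 40.2],
    "lon": [14.0, 18.5, 12.1],
    "sst": [287.1, 289.4, 288.2],
})

# Default summary: count, mean, std, min, quartiles and max of numeric columns.
numeric_summary = sample.describe()

# Summary of every column, including the text columns.
full_summary = sample.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```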

The dataset *soil_moisture_2016* describes the soil moisture on the coasts of the Adriatic, Ionian, Tyrrhenian, Red and Labrador Seas in the year 2016, and the sea surface temperature recorded at the nearest point. For each observation we know:
*   *time*: observation date;
*   *sea*: indicates whether the observation took place in the Adriatic, Ionian, Tyrrhenian, Red or Labrador Sea;
*   *lat_sm, lon_sm*: coordinates (latitude, longitude) of the point where the soil moisture was sampled;
*   *sm*: Soil Moisture;
*   *lat, lon*: coordinates (latitude, longitude) of the point where the temperature was sampled;
*   *sst*: Sea Surface Temperature.


In [None]:
data = pd.read_csv("/content/soil_moisture_2016.csv")
data.describe()

## 1.3 Map visualization

Running the next cell displays a map of the Mediterranean with the sea surface temperatures recorded on the selected date.

Select a date between 2014-01-01 and 2016-12-31, and press *play*.

In [None]:
date_str = '2014-01-02' #@param {type:"date"}

current_date = datetime.strptime(date_str + " 12:00:00", '%Y-%m-%d %H:%M:%S')

# Stop if the selected date falls outside the period covered by the dataset
if (current_date < datetime.strptime("2014-01-01 12:00:00", '%Y-%m-%d %H:%M:%S')) or (current_date > datetime.strptime("2016-12-31 12:00:00", '%Y-%m-%d %H:%M:%S')):
  sys.exit("Invalid date. Enter a date between 2014-01-01 and 2016-12-31")

data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")
data.time = pd.to_datetime(data.time)

current_data = data.loc[data.time == current_date]
lat = current_data.lat.to_numpy()
lon = current_data.lon.to_numpy()
sst = current_data.sst.to_numpy()

fig = plt.figure(figsize=(10, 8))
m = Basemap(projection='lcc', resolution='c',
            width=1.5E6, height=1.5E6, 
            lat_0=42, lon_0=14)
m.shadedrelief(scale=0.5)
m.scatter(lon, lat, latlon=True, c=sst,
          cmap='Reds', marker ='+', edgecolors='none', alpha=0.7)
plt.colorbar()
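
The cell above selects rows by comparing *time* with the chosen date at 12:00:00, which only matches observations stamped exactly at noon. The sketch below, on invented rows, shows that exact match next to a day-level match that ignores the time of day. The noon timestamps and the sea labels are assumptions made for illustration.

```python
from datetime import datetime

import pandas as pd

# Hypothetical observations; the noon timestamps are an assumption.
obs = pd.DataFrame({
    "time": ["2014-01-01 12:00:00", "2014-01-02 12:00:00", "2014-01-02 12:00:00"],
    "sea": ["Adriatic", "Adriatic", "Ionian"],
    "sst": [287.1, 286.9, 288.4],
})
obs["time"] = pd.to_datetime(obs["time"])

selected = datetime.strptime("2014-01-02", "%Y-%m-%d")

# Exact match, as in the notebook cell: the timestamp must be noon.
exact = obs.loc[obs["time"] == selected.replace(hour=12)]

# Day-level match: drop the time of day before comparing.
same_day = obs.loc[obs["time"].dt.normalize() == pd.Timestamp(selected)]

print(len(exact), len(same_day))
```

If the timestamps in the file were not all at noon, only the day-level match would still return every observation for the selected date.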

Running the next cell displays maps of the Mediterranean, the Red Sea and the Labrador Sea with the sea surface temperatures and soil moisture values recorded on the selected date.
Select a date between 2016-01-01 and 2016-12-31, and press *play* (please note that data for the Labrador Sea is not available for the whole period).

In [None]:
date_str = '2016-08-01' #@param {type:"date"}

current_date = datetime.strptime(date_str + " 12:00:00", '%Y-%m-%d %H:%M:%S')

# Stop if the selected date falls outside the period covered by the dataset
if (current_date < datetime.strptime("2016-01-01 12:00:00", '%Y-%m-%d %H:%M:%S')) or (current_date > datetime.strptime("2016-12-31 12:00:00", '%Y-%m-%d %H:%M:%S')):
  sys.exit("Invalid date. Enter a date between 2016-01-01 and 2016-12-31")

data = pd.read_csv("/content/soil_moisture_2016.csv")
data.time = pd.to_datetime(data.time)

current_data = data.loc[data.time == current_date]
lat_sst = current_data.lat.to_numpy()
lon_sst = current_data.lon.to_numpy()
sst = current_data.sst.to_numpy()
lat_sm = current_data.lat_sm.to_numpy()
lon_sm = current_data.lon_sm.to_numpy()
sm = current_data.sm.to_numpy()

# Mediterranean
fig = plt.figure(figsize=(10, 5))
m = Basemap(projection='lcc', resolution='c',
            width=1.5E6, height=1.5E6, 
            lat_0=42, lon_0=14)
m.shadedrelief(scale=0.5)
m.scatter(lon_sst, lat_sst, latlon=True, c=sst,
          cmap='Reds', marker ='+', edgecolors='none', alpha=0.7)
plt.colorbar()
m.scatter(lon_sm, lat_sm, latlon=True, c=sm,
          cmap='Blues', marker ='x', edgecolors='none', alpha=0.7)
plt.colorbar()
plt.title("Mediterranean Sea")

# Red
fig = plt.figure(figsize=(10, 5))
m = Basemap(projection='lcc', resolution='c',
            width=2E6, height=2E6, 
            lat_0=20, lon_0=38)
m.shadedrelief(scale=0.5)
m.scatter(lon_sst, lat_sst, latlon=True, c=sst,
          cmap='Reds', marker ='+', edgecolors='none', alpha=0.7)
plt.colorbar()
m.scatter(lon_sm, lat_sm, latlon=True, c=sm,
          cmap='Blues', marker ='x', edgecolors='none', alpha=0.7)
plt.colorbar()
plt.title("Red Sea")

# Labrador
fig = plt.figure(figsize=(10, 5))
m = Basemap(projection='lcc', resolution='c',
            width=2E6, height=2E6, 
            lat_0=54, lon_0=-55)
m.shadedrelief(scale=0.5)
m.scatter(lon_sst, lat_sst, latlon=True, c=sst,
          cmap='Reds', marker ='+', edgecolors='none', alpha=0.7)
plt.colorbar()
m.scatter(lon_sm, lat_sm, latlon=True, c=sm,
          cmap='Blues', marker ='x', edgecolors='none', alpha=0.7)
plt.colorbar()
plt.title("Labrador Sea")

## 1.4 Pie chart

Press *play* to display the observations of *mediterranean_surface_temperature_2014_15_16* in a Pie chart.

In [None]:
data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")
data.sea.value_counts().plot.pie(autopct='%1.0f%%', figsize=(5,5))
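
The percentages drawn on the pie chart can also be read as numbers. The sketch below uses an invented *sea* column (the labels and counts are hypothetical) to show that `value_counts()` gives the number of observations per sea and that `normalize=True` turns those counts into shares.

```python
import pandas as pd

# Hypothetical sea labels; the real column has many more rows.
sea = pd.Series(["Adriatic"] * 5 + ["Ionian"] * 3 + ["Tyrrhenian"] * 2, name="sea")

# Number of observations per sea (what the pie slices are built from).
counts = sea.value_counts()

# Share of observations per sea, in percent.
shares = sea.value_counts(normalize=True) * 100

print(counts)
print(shares.round(0))
```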

## 1.5 Histogram

Running the next cell displays a bar chart of the monthly mean or median temperature of the Adriatic, Ionian and Tyrrhenian Seas during the selected year.

Select the statistic (mean or median) and the year of interest, then press *play*.

In [None]:
modality = "median" #@param ["mean", "median"]
year = 2016 #@param [2014, 2015, 2016] {type:"raw"}

data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")
data.time = pd.to_datetime(data.time)

current_data = data.loc[data.time.dt.year == year]

# Group by calendar month and sea, then aggregate only the sst column
# with the selected statistic ("mean" or "median")
sst_monthly = current_data.groupby([pd.DatetimeIndex(current_data.time).month, current_data.sea])[['sst']].agg(modality)
# One group of bars per month, one bar per sea
sst_monthly.unstack().plot.bar(y='sst', xlabel='month', ylabel='surface temperature', ylim=[285,302], figsize=(10,5))
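
The table behind the bar chart can be inspected directly. The following is a minimal sketch on invented rows (values and sea labels are hypothetical) of the same idea: group by calendar month and sea, aggregate *sst*, then `unstack()` so that each sea becomes a column.

```python
import pandas as pd

# Hypothetical observations for two months and two seas.
obs = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-01-05 12:00:00", "2016-01-20 12:00:00", "2016-01-05 12:00:00",
        "2016-02-10 12:00:00", "2016-02-10 12:00:00",
    ]),
    "sea": ["Adriatic", "Adriatic", "Ionian", "Adriatic", "Ionian"],
    "sst": [286.0, 288.0, 289.0, 285.0, 288.5],
})

# Month number as the first grouping key, sea as the second.
month = obs["time"].dt.month.rename("month")
monthly = obs.groupby([month, "sea"])["sst"].agg(["mean", "median"])

# Rows are months, columns are seas: the layout plotted by plot.bar().
table = monthly["mean"].unstack()
print(table)
```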

## 1.6 Boxplot

Press *play* to view the distribution of the latitude and longitude of the observations for the Adriatic, Ionian and Tyrrhenian Seas as box plots.

In [None]:
data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,5))
data.boxplot(column='lat', by='sea', ax=axes[0])
data.boxplot(column='lon', by='sea', ax=axes[1])

Running the next cell displays box plots of the temperature distribution of the observations for the Adriatic, Ionian and Tyrrhenian Seas in the selected year and month.
Select the year and month of interest, and press *play*.

In [None]:
month = 12 #@param [1,2,3,4,5,6,7,8,9,10,11,12] {type: "raw"}
year = 2016 #@param [2014,2015,2016] {type: "raw"}

data = pd.read_csv("/content/mediterranean_surface_temperature_2014_15_16.csv")
data.time = pd.to_datetime(data.time)

current_data = data.loc[((data.time.dt.year == year) & (data.time.dt.month == month))]

current_data.boxplot(column='sst', by='sea', figsize=(5,5))
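
To read a box plot numerically, the sketch below computes the quantities it draws for each sea: the quartiles, the interquartile range (IQR) and the fences at 1.5 times the IQR beyond the box, which is the default whisker rule in matplotlib. The data are invented for illustration; points outside the fences are the ones drawn as individual markers.

```python
import pandas as pd

# Hypothetical temperatures; the Ionian group contains one extreme value.
obs = pd.DataFrame({
    "sea": ["Adriatic"] * 5 + ["Ionian"] * 5,
    "sst": [286.0, 287.0, 288.0, 289.0, 290.0,
            288.0, 288.5, 289.0, 289.5, 295.0],
})

# Quartiles per sea: one row per sea, one column per quantile.
q = obs.groupby("sea")["sst"].quantile([0.25, 0.5, 0.75]).unstack()
q.columns = ["q1", "median", "q3"]

# Interquartile range and the 1.5 * IQR fences that bound the whiskers.
q["iqr"] = q["q3"] - q["q1"]
q["lower_fence"] = q["q1"] - 1.5 * q["iqr"]
q["upper_fence"] = q["q3"] + 1.5 * q["iqr"]

# Observations outside the fences of their own sea.
lower = obs["sea"].map(q["lower_fence"])
upper = obs["sea"].map(q["upper_fence"])
outliers = obs[(obs["sst"] < lower) | (obs["sst"] > upper)]

print(q)
print(outliers)
```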

## 1.7 Correlation

Running the following cell computes the Pearson correlation between the soil moisture values on the coast and the sea surface temperatures recorded at the nearest points on the selected date. The distributions of temperature and soil moisture are also shown as histograms.
Select a date between 2016-01-01 and 2016-12-31, and press *play*.

In [None]:
date_str = '2016-07-23' #@param {type:"date"}

current_date = datetime.strptime(date_str + " 12:00:00", '%Y-%m-%d %H:%M:%S')

# Stop if the selected date falls outside the period covered by the dataset
if (current_date < datetime.strptime("2016-01-01 12:00:00", '%Y-%m-%d %H:%M:%S')) or (current_date > datetime.strptime("2016-12-31 12:00:00", '%Y-%m-%d %H:%M:%S')):
  sys.exit("Invalid date. Enter a date between 2016-01-01 and 2016-12-31")

data = pd.read_csv("/content/soil_moisture_2016.csv")
data.time = pd.to_datetime(data.time)

current_data = data.loc[data.time == current_date]

x = current_data.sst.to_numpy()
y = current_data.sm.to_numpy()

r, p = scipy.stats.pearsonr(x, y)

print("Correlation coefficient: ",r)
print("P-value: ",p)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,5))
axes[0].set_title('Temperature')
current_data.sst.plot.hist(alpha=0.5, ax=axes[0])
axes[1].set_title('Soil moisture')
current_data.sm.plot.hist(alpha=0.5, ax=axes[1])
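
The coefficient returned by `scipy.stats.pearsonr` is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand with NumPy on invented values (the numbers are hypothetical, not taken from the dataset) and cross-checks the result against `np.corrcoef`.

```python
import numpy as np

# Hypothetical paired samples: temperature rises while soil moisture falls.
sst = np.array([295.0, 296.0, 297.0, 298.0, 299.0])
sm = np.array([0.30, 0.27, 0.25, 0.20, 0.18])

# Deviations of each sample from its mean.
dx = sst - sst.mean()
dy = sm - sm.mean()

# Pearson r: sum of products of deviations, normalized by their magnitudes.
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check with NumPy's correlation matrix (off-diagonal entry).
r_check = np.corrcoef(sst, sm)[0, 1]

print(r, r_check)
```

A value close to -1 or 1 indicates a strong linear relationship, while a value close to 0 indicates little or no linear relationship.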