# MiniSOM Tutorial for 2-D Atmospheric Data and Example Using Mean Sea Level Pressure Data 

Background on Self Organizing Maps (SOMs):

Self-organizing maps (SOMs) are a form of unsupervised learning that uses a competitive neural network to cluster similar data. In that sense SOMs resemble K-means clustering, but the clusters (nodes) are also arranged on a grid. SOMs take multidimensional data and reduce it to a two-dimensional array that can be easily visualized. Patterns that share similar characteristics are placed adjacent to one another, whereas patterns that share few similarities are placed on opposing sides of the SOM.
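To make the idea of competitive learning concrete, here is a minimal NumPy sketch of a single SOM training step. It is an illustration only and is not MiniSom's implementation: the grid size, the Gaussian neighborhood, and the names `best_matching_unit`, `update`, `learning_rate`, and `sigma` are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x4 SOM grid; each node holds a weight vector with 5 features.
n_rows, n_cols, n_features = 3, 4, 5
weights = rng.random((n_rows, n_cols, n_features))
sample = rng.random(n_features)  # one input pattern


def best_matching_unit(weights, sample):
    """Return the (row, col) of the node whose weights are closest to the sample."""
    dist = np.linalg.norm(weights - sample, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def update(weights, sample, learning_rate=0.5, sigma=1.0):
    """Pull every node toward the sample, weighted by grid distance to the winner."""
    bmu = best_matching_unit(weights, sample)
    rows, cols = np.indices(weights.shape[:2])
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    influence = np.exp(-grid_dist2 / (2 * sigma ** 2))  # equals 1 at the winner
    return weights + learning_rate * influence[..., None] * (sample - weights)


before = np.linalg.norm(weights - sample, axis=-1)
new_weights = update(weights, sample)
after = np.linalg.norm(new_weights - sample, axis=-1)
```

Nodes near the winning node move the most, which is why similar patterns end up next to each other on the trained map.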

MiniSOM Tutorial: This tutorial is split into two steps. This notebook covers Step #1.
SOM Step #1:

### Python Imports

In [19]:
#Imports
import xarray as xr
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

### Dataset and Export Location:

In [None]:
#Define the path where the data is located. This is also the folder where the data files and plots will be saved. Update it to your own path.
PATH ='/Users/research/thesis_code/'

### Defining the Functions Used Within This Code

In [20]:
#Functions
def select_latlon(ds):
    return ds.sel(lat = slice(81,60), lon = slice(174,-126+360)) #change to your lat/lon
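The `-126+360` in the longitude slice converts 126°W to the 0 to 360 convention, and the latitude slice runs from 81 down to 60 because it assumes latitudes are stored north to south. The sketch below illustrates both points with plain NumPy; the 2.5-degree global grid and the helper `to_0_360` are assumptions for this example, not something read from the data file.

```python
import numpy as np


def to_0_360(lon_deg):
    """Convert a longitude from the -180..180 convention to 0..360."""
    return np.mod(lon_deg, 360)


west_edge = to_0_360(174)    # 174 E is unchanged
east_edge = to_0_360(-126)   # 126 W becomes 234

# Assumed 2.5-degree grid: latitude stored north to south, longitude 0..357.5 E.
lat = np.arange(90, -90.1, -2.5)
lon = np.arange(0, 360, 2.5)

# Same box as select_latlon, written with boolean masks instead of slices.
lat_sel = lat[(lat <= 81) & (lat >= 60)]
lon_sel = lon[(lon >= west_edge) & (lon <= east_edge)]
print(lat_sel.size, lon_sel.size)
```

Under these assumptions the box holds 9 latitudes and 24 longitudes. If your dataset stores latitude south to north, the slice bounds would need to be swapped.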

This tutorial uses NCEP Reanalysis 2 MSLP data, which can be downloaded from: https://psl.noaa.gov/data/gridded/data.ncep.reanalysis2.html.
For this tutorial we will be using the file 'mslp.2015.nc'.

In [21]:
#Define the path where the data is located. This is also the folder where the data files and plots will be saved. Update it to your own path.
folderpath ='/Users/research/thesis_code/'

This data has been subset to the region of study. Generating SOMs is computationally expensive if the dataset and/or the domain of study is large. To keep the computation efficient, restrict the data to the domain of study. NCEP Reanalysis data is 6-hourly. For hourly datasets, it is recommended to resample the data to 6-hour intervals to reduce the computational expense. Furthermore, the months of January, February, October, November, and December 2015 will be used for demonstration.

If you need to resample the data, the following code can be used to resample an xarray DataArray to a different time interval.

In [None]:
#dx = dy.resample(time = '6H').nearest()
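As a rough illustration of what this thinning does, the sketch below keeps every sixth step of a synthetic hourly series using plain NumPy. This is a simplified stand-in, assuming the hourly record starts on a 6-hour boundary and has no gaps; the array names and values are made up for the example.

```python
import numpy as np

# Synthetic hourly record covering two days (48 time steps).
hourly_times = np.arange(
    np.datetime64("2015-12-01T00"),
    np.datetime64("2015-12-03T00"),
    np.timedelta64(1, "h"),
)
hourly_values = np.arange(hourly_times.size, dtype=float)

# Keep every sixth time step: 00, 06, 12, 18 UTC of each day.
step = 6
six_hourly_times = hourly_times[::step]
six_hourly_values = hourly_values[::step]
print(six_hourly_times.size)
```

For real data with irregular or missing time steps, a time-aware resampling such as the commented xarray line is the safer choice.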

In [22]:
dy = xr.open_mfdataset(PATH +'mslp.2015.nc',    #Xarray features will be used throughout this tutorial 
                         preprocess=select_latlon)
#Here we will be grabbing ONLY Jan., Feb., Oct., Nov., and Dec. 
ds = dy.isel(time=dy.time.dt.month.isin([1,2,10,11,12]))
ds

Output (array summary):
Bytes: 509.62 kiB (array), 509.62 kiB (chunk)
Shape: (604, 9, 24) (array), (604, 9, 24) (chunk)
Count: 4 tasks, 1 chunk
Type: float32, numpy.ndarray
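The 604 time steps in the output can be cross-checked by counting the 6-hourly time stamps that fall in the five selected months of 2015. The sketch below does this with NumPy only; the synthetic time axis is an assumption standing in for the file's own `time` coordinate.

```python
import numpy as np

# Synthetic 6-hourly time axis for all of 2015.
times = np.arange(
    np.datetime64("2015-01-01T00"),
    np.datetime64("2016-01-01T00"),
    np.timedelta64(6, "h"),
)

# Month number (1..12) of every time stamp.
months = times.astype("datetime64[M]").astype(int) % 12 + 1

# Boolean mask for Jan., Feb., Oct., Nov., and Dec.
mask = np.isin(months, [1, 2, 10, 11, 12])
selected = times[mask]
print(selected.size)
```

The five months contain 151 days, and four time steps per day gives 604.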


The most essential step in generating a SOM from 2-D atmospheric data is to reshape the data from 3-D (time, lat, lon) to 2-D (time, grid point) without altering the values themselves. Here we will begin generating the arrays and variables that will be used to reduce the 3-D data to 2-D data.

In [23]:
#Creating variables from the subset NCEP Reanalysis 2 data.
time_values = ds['time'].values
mslp_values = ((ds['mslp'])/100).values
mslpraw = (ds['mslp'])/100  #This is the raw (NON-anomaly) data, divided by 100 to convert Pa to hPa.
lon = ds['lon'].values
lat = ds['lat'].values

#generate the empty array that will house the 6-hour interval data.
nhour =int((ds['time'].size))
nlat = int((ds['lat'].size))
nlon = int((ds['lon'].size))
mslparr = np.empty((nhour, nlat*nlon))  #This is the new array that we will place the data into. 

Here we will reduce the 3-D MSLP data to 2-D data by stacking the latitude and longitude fields.
The raw MSLP data will also be saved so that SOMs can be generated from the raw data. Comparing the general patterns in the anomaly SOM with those in the composite mean SOM is a check that the SOM accurately represents the input data. This will be described in Step #3.

In [24]:
#We are now going to place the raw MSLP data into the array (mslparr)
for i in range(nhour):
    mslparr[i,:]= mslpraw[i,:,:].stack(point=["lat", "lon"])
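For readers working with plain NumPy arrays, the same flattening can be sketched with `reshape`, which also makes it easy to restore the (time, lat, lon) shape later for plotting. The small synthetic array and its dimension sizes are assumptions for this example.

```python
import numpy as np

# Small synthetic field with dimensions (time, lat, lon).
ntime, nlat, nlon = 4, 3, 5
field_3d = np.arange(ntime * nlat * nlon, dtype=float).reshape(ntime, nlat, nlon)

# Vectorized flattening: one row per time step, one column per grid point.
field_2d = field_3d.reshape(ntime, nlat * nlon)

# Loop version, mirroring the cell above, for comparison.
looped = np.empty((ntime, nlat * nlon))
for i in range(ntime):
    looped[i, :] = field_3d[i].ravel()

# Reshaping back recovers the original maps unchanged.
restored = field_2d.reshape(ntime, nlat, nlon)
```

Both versions order the grid points with longitude varying fastest within each latitude row, so the values are rearranged but never altered.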

In [25]:
#We now calculate the anomaly at each time step by removing that time step's domain-mean MSLP from every grid point.
for i in range(nhour):
    mslparr[i,:] =mslparr[i,:]-np.mean(mslparr[i,:])
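The loop above can also be written as a single broadcast operation. The sketch below uses a synthetic array in place of `mslparr`; the values and array sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic flattened pressure data: 6 time steps, 12 grid points, in hPa.
flat = 1000.0 + 10.0 * rng.standard_normal((6, 12))

# Subtract each row's own mean; keepdims=True keeps the shape (6, 1)
# so the means broadcast across the grid-point axis.
anom = flat - flat.mean(axis=1, keepdims=True)

print(anom.mean(axis=1))
```

After this step every time step has a domain mean of zero (up to floating-point rounding), so only the spatial pattern remains.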

We now normalize the data by finding its minimum and maximum and generating a factor by which to multiply the data. The factor is based on the overall maximum and minimum of the MSLP anomalies across all time steps. Anomalies are used to prevent bias from the strength/intensity of MSLP highs/lows.

In [26]:
#Initialize the running max/min with sentinel values that any real anomaly will immediately replace.
maxmslp=-9999999
minmslp=999999

for i in range(nhour):
    minmslp=min(minmslp,np.min(mslparr[i,:]))
    maxmslp=max(maxmslp,np.max(mslparr[i,:]))
print(maxmslp, minmslp)

#We are generating the MSLP factor by which the data is multiplied to normalize it
mslp_factor=100./(maxmslp-minmslp)
print(mslp_factor)

#The data is now being normalized.
data_train = mslparr*mslp_factor
data_train.shape

28.759672094274492 -56.773122716833086
1.1691422011971209


(604, 216)
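The scaling can be checked on a tiny example: multiplying by `100 / (max - min)` stretches the anomalies so that their total range becomes 100, without shifting where zero lies. The array below is made up for illustration.

```python
import numpy as np

# Tiny synthetic anomaly array: 2 time steps, 3 grid points (hPa).
anom = np.array([[-20.0, 5.0, 15.0],
                 [-30.0, 10.0, 20.0]])

# Global extremes over all time steps and grid points.
max_anom = anom.max()
min_anom = anom.min()

# Same factor as in the cell above: 100 divided by the full range.
factor = 100.0 / (max_anom - min_anom)
scaled = anom * factor

print(factor, scaled.max() - scaled.min())
```

In the tutorial's own output the range is about 28.76 - (-56.77) = 85.53 hPa, which gives the printed factor of roughly 1.169.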

In [27]:
#Saving the variables that we need for the SOM making and plotting 
mslpraw.to_netcdf(PATH + 'VER2_SOM_MSLPraw_NCEP_data.nc')
np.save(PATH + 'TEST2_som_data_train.npy', data_train)
np.save(PATH + 'TEST2_som_time_data.npy', time_values)
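The saved `.npy` files can be read back with `np.load` in the next notebook. The sketch below shows the round trip on small synthetic arrays written to a temporary folder; the file names and array contents here are placeholders, not the tutorial's actual files.

```python
import os
import tempfile

import numpy as np

# Small stand-ins for the training data and its time stamps.
data_train = np.linspace(-50.0, 50.0, 12).reshape(3, 4)
time_values = np.arange(
    np.datetime64("2015-12-01T00"),
    np.datetime64("2015-12-01T18"),
    np.timedelta64(6, "h"),
)

with tempfile.TemporaryDirectory() as tmp:
    train_file = os.path.join(tmp, "example_data_train.npy")
    time_file = os.path.join(tmp, "example_time_data.npy")

    # Save, then load back, as the next step of the workflow would.
    np.save(train_file, data_train)
    np.save(time_file, time_values)
    loaded_train = np.load(train_file)
    loaded_time = np.load(time_file)

print(loaded_train.shape, loaded_time.dtype)
```

Keeping the time stamps alongside the training array lets each SOM input row be matched back to its date later.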

Summary: This code is Step #1 in the MiniSOM Tutorial for 2-D Atmospheric Data and Example Using Mean Sea Level Pressure Data. This step gets the 3-D data into the 2-D form that is needed for SOM generation and visualization. The following notebook will generate the SOMs themselves and plot the frequencies, Sammon maps, and SOM anomaly plots.