# National Oceanic and Atmospheric Administration (NOAA)
This Jupyter notebook is meant to be used along with the North American Mesoscale Forecast System (NAM) dataset.  
This dataset can be found online under "Data Access > Model > Datasets > NAM" on the NOAA website.  
Once the data has been requested, confirmed, and processed using NOAA's Order Data feature,  
the requester is given 5 days to download the data via an emailed link.  

# NAM 2017
To be more specific, this notebook targets the entire year 2017 (1800 UTC only), which is roughly 100 GB of data.  
The download links in the email from NOAA point to files ending in the extension ".tar".  
These are archive files: tar bundles files together but does not compress them. BEWARE: unpacking all of them at once doubles the disk usage from 100 GB to 200 GB, because the archives and the extracted files sit side by side.  
It is recommended to make a main folder (moddata) with one subfolder per month (01, 02, 03, ..., 12).  
Then put each ".tar" file in the folder for its month. The idea is to unpack an entire month and then delete the ".tar" files for that month. That way you never have to grow 100 GB into 200 GB and then delete 100 GB in one go. Each step is more like unpacking 6 GB into 12 GB and then deleting the ".tar" files to regain the space.
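
The folder layout described above could be set up with a short script rather than by hand. This is only a sketch: the function name `sort_tars_by_month` and the idea of a separate download folder are assumptions for illustration, not part of the original workflow.

```python
import re
import shutil
from pathlib import Path

# Matches names like namanl_218_2017010118.g2.tar and captures the month (mm).
TAR_NAME = re.compile(r"namanl_218_\d{4}(\d{2})\d{2}\d{2}\.g2\.tar$")

def sort_tars_by_month(download_dir, moddata_dir):
    """Move each NAM .tar file into moddata_dir/<mm>/ and return how many were moved."""
    moved = 0
    for tar_path in sorted(Path(download_dir).glob("*.tar")):
        match = TAR_NAME.match(tar_path.name)
        if match is None:
            continue  # skip files that do not follow the expected naming scheme
        month_dir = Path(moddata_dir) / match.group(1)
        month_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(tar_path), str(month_dir / tar_path.name))
        moved += 1
    return moved
```

Calling `sort_tars_by_month("downloads", "moddata")` (both paths hypothetical) would leave every archive in the subfolder for its month.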

# Unpacking ".tar" Files
The ".tar" file names should look like the following:

namanl_218_2017010118.g2.tar

The format is very simple: namanl_218_yyyymmddhh.g2.tar  
Where yyyy = year, mm = month, dd = day, hh = hour (UTC)  
The file above therefore holds the data for January 1, 2017 at 18 UTC.
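
As a small illustration of the naming scheme, the timestamp can be read back out of a file name with the standard library. The helper name `parse_tar_name` is made up for this sketch.

```python
from datetime import datetime, timezone

def parse_tar_name(name):
    """Return the UTC timestamp encoded in a name like namanl_218_2017010118.g2.tar."""
    prefix, suffix = "namanl_218_", ".g2.tar"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError("unexpected file name: " + name)
    stamp = name[len(prefix):-len(suffix)]  # the yyyymmddhh part
    return datetime.strptime(stamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)

print(parse_tar_name("namanl_218_2017010118.g2.tar"))  # 2017-01-01 18:00:00+00:00
```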

Since this data spans the entire North American continent, 18 UTC was chosen. In New York, 18 UTC is 1 P.M. in winter (EST) and 2 P.M. in summer (EDT). This makes any "real images" from the dataset more visually appealing, since that part of the Earth is facing the sun; "real images" of NA taken at night would be much less useful. Sticking to one timestamp also saves a lot of space: keeping 2 timestamps per day instead of 1 would double the size of the data. NOAA offers 4 timestamps per day (0 UTC, 6 UTC, 12 UTC, and 18 UTC).  
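
The local time that corresponds to 18 UTC depends on daylight saving time, which is easy to check with Python's `zoneinfo` module (Python 3.9 or later, and it needs the system time zone database). The helper name `utc_to_new_york` is an assumption for this sketch.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def utc_to_new_york(year, month, day, hour):
    """Convert a UTC date and hour to local New York time."""
    utc_time = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return utc_time.astimezone(ZoneInfo("America/New_York"))

print(utc_to_new_york(2017, 1, 1, 18).strftime("%H:%M %Z"))  # 13:00 EST
print(utc_to_new_york(2017, 7, 1, 18).strftime("%H:%M %Z"))  # 14:00 EDT
```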

Once you're inside the month directory containing all the ".tar" files for that month, use the following command to unpack them:

// assuming /moddata/01 is the path to all of January's ".tar" files  
// and that you are currently inside the /01 directory, type the following into a terminal (a shell, not the notebook kernel):  
for f in *.tar; do tar -xvf "$f"; done

This command should run for about 20 seconds, listing each file as it is unpacked.  
Every ".tar" file should unpack into 5 ".grb2" files.

// now type the following command to delete all of the ".tar" files in that directory:  
rm *.tar

This is done for all 12 months until every subdirectory of /moddata contains only ".grb2" files.  
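The new file names should look like this:

nam_218_20170101_1800_000.grb2

nam_218_yyyymmdd_hhmm_band.grb2  
Where hhmm = hour and minute (UTC), and band is the three-digit suffix (000, 001, 002, 003, or 006) that the code below refers to as the band.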
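
The unpack-then-delete routine for all 12 months can also be scripted with Python's `tarfile` module instead of repeating the shell commands by hand. This is a sketch under the folder layout described earlier (moddata/01 through moddata/12); the function names `unpack_month` and `unpack_all` are hypothetical.

```python
import tarfile
from pathlib import Path

def unpack_month(month_dir):
    """Extract every .tar in month_dir into month_dir, deleting each archive afterwards."""
    month_dir = Path(month_dir)
    for tar_path in sorted(month_dir.glob("*.tar")):
        with tarfile.open(tar_path) as archive:
            # Only extract archives from a source you trust (here: NOAA).
            archive.extractall(month_dir)
        tar_path.unlink()  # reached only if extraction did not raise

def unpack_all(moddata_dir):
    """Run unpack_month on each of the subfolders 01 .. 12 that exists."""
    for month in range(1, 13):
        month_dir = Path(moddata_dir) / "{:02d}".format(month)
        if month_dir.is_dir():
            unpack_month(month_dir)
```

Because each archive is removed right after it is extracted, the extra disk space needed at any moment stays close to the size of a single ".tar" file.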

# Congratulations!
Once all of the ".grb2" files for every month are neatly organized in their own folders, the Exploratory Data Analysis can begin!  
These files can be read using a Python package called "pygrib".

If you have Anaconda installed, simply use:  
conda install -c conda-forge pygrib

That's about it for installations. Now let's dive into the code.

In [None]:
import pygrib # used to view ".grb2" files
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
def print_grbs(grbs):
    """
    grbs: an open pygrib file, e.g. grbs = pygrib.open("filepath.grb2")
    this function prints a one-line summary of every message in the opened file
    """
    grbs.seek(0) # start from the first message, even if the file was read before
    for grb in grbs:
        print(grb)

In [None]:
def get_paths(filepath, year=0, month=0, band=0, everything=0):
    """
    filepath: a string containing the file path to the directory containing all the .grb2 files
    year, month, band: pick one month of 1800 UTC files for a single band (used when everything == 0)
    everything: if nonzero, return the paths for every day and every band of 2017
    returns a list of path strings; the files are not checked for existence
    """
    month_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] # 2017 is not a leap year
    # {:02d} zero-pads the month and day to two digits
    template = "{}/nam_218_{}{:02d}{:02d}_1800_00{}.grb2"
    if everything != 0:
        bands = [0, 1, 2, 3, 6]
        return [template.format(filepath, 2017, m, d, b)
                for m in range(1, 13)
                for d in range(1, month_days[m - 1] + 1)
                for b in bands]
    return [template.format(filepath, year, month, d, band)
            for d in range(1, month_days[month - 1] + 1)]
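
The hard-coded list of month lengths is fine for 2017, but it would miss February 29 in a leap year. If other years are ever needed, a variant based on `calendar.monthrange` could look like the sketch below. The name `month_paths` and the `hour` parameter are assumptions, not part of the notebook.

```python
import calendar

def month_paths(filepath, year, month, band, hour=18):
    """Build the .grb2 paths for one month and band, using the real length of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    template = "{}/nam_218_{}{:02d}{:02d}_{:02d}00_{:03d}.grb2"
    return [template.format(filepath, year, month, day, hour, band)
            for day in range(1, days_in_month + 1)]

print(month_paths("/media/sf_moddata/2017", 2017, 12, 0)[0])
```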

In [None]:
filepath = "/media/sf_moddata/2017"
paths = get_paths(filepath, everything = 1)
print(paths[0]) # 2017 January 1, 18 UTC, band 0
print(paths[-1]) # 2017 December 31, 18 UTC, band 6
print(len(paths)) # 365 days * 5 bands per day = 1825

In [None]:
dec = get_paths(filepath, 2017, 12, 0)
print(dec[0]) # december 1
print(dec[-1]) # december 31
print(len(dec)) # 31 days * 1 band per day = 31
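
Note that `get_paths` only builds file names; it does not check that the files are actually on disk. Before looping over a whole year it may be worth confirming that nothing is missing, for example with a helper like this one (the name `missing_files` is made up for the sketch).

```python
import os

def missing_files(paths):
    """Return the paths from the list that do not exist on disk."""
    return [path for path in paths if not os.path.isfile(path)]
```

An empty result from `missing_files(paths)` means every expected ".grb2" file was found.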

In [None]:
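# let's take a look inside one of these files
grbs = pygrib.open(paths[0]) # assumption: the first file (January 1, band 0); any path from get_paths works
# print_grbs(grbs)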

In [None]:
# by looking at the data above, the unit for temperature is kelvin (K)
# there are multiple ways to extract data from a given "row" (a GRIB message):
# grbs.select(name=...) returns a list of matching messages, and grbs[n] returns a single message by its number
# .values is a pygrib message attribute that returns the data as a numpy array
temp_surf = grbs.select(name="Temperature")[0].values # first "Temperature" message, used here as the surface temperature
temp_2m = grbs.select(name="2 metre temperature")[0].values # temperature at 2m
temp_5000 = grbs[58].values # index 58 is temperature at 5000m
snow_depth = grbs[364].values # index 364 is snow depth
soil_temp = grbs.select(name="Soil Temperature")[0].values
lightning = grbs.select(name="Lightning")[0].values
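
Hard-coded indices such as 58 and 364 are easy to get wrong, so it can help to look up which messages match a name before picking one. The sketch below assumes `grbs` is an open pygrib file and reads the `name`, `typeOfLevel`, `level`, and `units` attributes of each message. The function name `find_messages` is hypothetical.

```python
def find_messages(grbs, keyword):
    """Return (message number, name, level type, level, units) for names containing keyword."""
    grbs.seek(0)  # iterate from the first message regardless of earlier reads
    matches = []
    for number, grb in enumerate(grbs, start=1):
        if keyword.lower() in grb.name.lower():
            matches.append((number, grb.name, grb.typeOfLevel, grb.level, grb.units))
    return matches
```

For example, `find_messages(grbs, "temperature")` lists every temperature-like message with its level, which makes it easier to confirm what a given index actually refers to.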

In [None]:
# let's just see what all these arrays look like
print(temp_surf)
#print(temp_2m)
#print(temp_5000)
#print(snow_depth)
#print(soil_temp)
#print(lightning)

In [None]:
# let's try plotting some of this data
# this should draw a heatmap in the shape of NA. As the altitude increases the temperature decreases
ax = sns.heatmap(temp_surf, cbar=True)
ax.invert_yaxis() # heatmap would draw it upside down if it wasn't for this
print(np.max(temp_surf))
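
Since the temperatures are in kelvin, the printed maximum can be hard to judge at a glance. A small conversion helper (hypothetical name `kelvin_to_celsius`) works both on single numbers and on the numpy arrays used above, because the subtraction is applied element by element.

```python
def kelvin_to_celsius(kelvin):
    """Convert kelvin to degrees Celsius; accepts a float or a numpy array."""
    return kelvin - 273.15

print(kelvin_to_celsius(300.0))  # roughly 26.85
```

With the arrays from this notebook, `np.max(kelvin_to_celsius(temp_surf))` would give the maximum in degrees Celsius.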

In [None]:
ax = sns.heatmap(temp_2m, cbar=True)
ax.invert_yaxis()
print(np.max(temp_2m))

In [None]:
ax = sns.heatmap(temp_5000, cbar=True)
ax.invert_yaxis()
print(np.max(temp_5000))

In [None]:
ax = sns.heatmap(snow_depth, cbar=True)
ax.invert_yaxis()
print(np.max(snow_depth))

In [None]:
ax = sns.heatmap(lightning, cbar=True)
ax.invert_yaxis()
print(np.max(lightning))