### Data File Structure

The hierarchical structure of an h5 file is as follows:
  **file** -> **group** -> **dataset** -> **field**.

It's important to note that all names are case-sensitive.
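
As a quick illustration of the case-sensitivity rule, the following sketch builds a throwaway in-memory file whose group and dataset names mirror the layout described here. The file name `demo_case.h5` and the zero-filled data are assumptions made purely for the demonstration; nothing is read from the real granule.

```python
import h5py
import numpy as np

# Build a throwaway in-memory file (nothing is written to disk).
# The names mirror the AIRS layout, but the data are dummy zeros.
with h5py.File("demo_case.h5", "w", driver="core", backing_store=False) as demo:
    data_fields = demo.create_group("L1C_AIRS_Science/Data Fields")
    data_fields.create_dataset("radiances", data=np.zeros((2, 3), dtype=">f4"))

    # Membership tests accept full paths and respect case.
    exact_match = "L1C_AIRS_Science/Data Fields/radiances" in demo
    wrong_case = "l1c_airs_science/data fields/radiances" in demo
    print(exact_match, wrong_case)

    # Indexing with the wrong case raises KeyError.
    try:
        demo["L1C_AIRS_Science/data fields"]
        lookup_failed = False
    except KeyError:
        lookup_failed = True
    print(lookup_failed)
```

Checking with `in` before indexing is one way to avoid the `KeyError` when a name's capitalization is uncertain.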

In [16]:
import h5py
import pandas as pd

#### Groups
For the AIRS L1C Infrared data, we have five groups (or keys).

In [17]:
f = h5py.File("../data/0131.h5", "r")  # open read-only
print(f)

for k in f.keys():
    print("\t", k)

<HDF5 file "0131.h5" (mode r)>
	 Channel:L1C_AIRS_Science
	 GeoTrack:L1C_AIRS_Science
	 GeoXTrack:L1C_AIRS_Science
	 L1C_AIRS_Science
	 Module:L1C_AIRS_Science


#### Datasets
Within the `L1C_AIRS_Science` group, the data we need are stored under `Geolocation Fields` and `Data Fields`. Note that h5py reports these three members as groups themselves.

In [None]:
datasets = f['L1C_AIRS_Science']
print(datasets)

for d in datasets:
    print("\t", d)

<HDF5 group "/L1C_AIRS_Science" (3 members)>
	 Geolocation Fields
	 Data Fields
	 Swath Attributes


#### Fields
For image generation, we only need `radiances`. Note that a field can itself be a multidimensional array.

In [None]:
fields = datasets['Data Fields']
print(fields)

# Use a separate loop variable so the file handle `f` is not overwritten
for name in fields:
    print("\t", name)

<HDF5 group "/L1C_AIRS_Science/Data Fields" (65 members)>
	 radiances
	 scanang
	 satheight_t
	 satheight
	 satroll_t
	 satroll
	 satpitch_t
	 satpitch
	 satyaw_t
	 satyaw
	 satzen
	 satazi
	 solzen
	 solazi
	 glintlat_t
	 glintlat
	 glintlon_t
	 glintlon
	 sun_glint_distance
	 nadirTAI_t
	 nadirTAI
	 sat_lat_t
	 sat_lat
	 sat_lon_t
	 sat_lon
	 scan_node_type_t
	 scan_node_type
	 topog
	 topog_err
	 landFrac
	 landFrac_err
	 ftptgeoqa
	 zengeoqa
	 demgeoqa
	 satgeoqa_t
	 satgeoqa
	 glintgeoqa_t
	 glintgeoqa
	 moongeoqa_t
	 moongeoqa
	 state
	 Rdiff_swindow
	 Rdiff_lwindow
	 SceneInhomogeneous
	 dust_flag
	 dust_score
	 spectral_clear_indicator
	 BT_diff_SO2
	 nominal_freq_t
	 nominal_freq
	 orbit_phase_deg_t
	 orbit_phase_deg
	 shift_y0
	 Doppler_shift_ppm
	 AB_Weight
	 L1cProc
	 L1cSynthReason
	 NeN
	 ChanID_t
	 ChanID
	 ChanMapL1b_t
	 ChanMapL1b
	 L1cNumSynth_t
	 L1cNumSynth
	 Inhomo850


### Radiances

Each file covers a six-minute observation window, during which the instrument sweeps left to right 135 times, capturing 90 spectra per sweep.

A spectrum has 2645 channels, which are listed in order of increasing wavenumbers. 
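
To make the three axes concrete, here is a minimal sketch that uses a zero-filled NumPy array as a stand-in for the radiances. The variable names and the channel index `1000` are arbitrary choices for illustration, not values taken from the granule.

```python
import numpy as np

# Dummy stand-in for the radiances array: (scan lines, footprints per line, channels).
n_scans, n_footprints, n_channels = 135, 90, 2645
cube = np.zeros((n_scans, n_footprints, n_channels), dtype=np.float32)

# One spectrum: all channels for a single footprint.
spectrum = cube[0, 0, :]

# One single-channel image: every footprint at a fixed channel index.
channel_index = 1000  # arbitrary index chosen for illustration
image = cube[:, :, channel_index]

print(spectrum.shape, image.shape, n_scans * n_footprints)
```

Fixing the last axis yields a 135 by 90 image, while fixing the first two axes yields one spectrum; a granule therefore holds 12150 spectra in total.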

Note that `radiances` has type `>f4`, which means each value is a 4-byte floating-point number stored in big-endian byte order. As most modern computers are little-endian, the values may need to be converted to native byte order before some downstream libraries will accept them.
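
One possible way to do the byte-order conversion is NumPy's `astype`, sketched below on a small hand-made array rather than on the real dataset. The sample values are made up for the demonstration.

```python
import numpy as np

# Dummy big-endian values standing in for a slice of the radiances.
big_endian = np.array([1.5, 2.25, -9999.0], dtype=">f4")

# astype with a native dtype copies the values into the machine's byte order.
native = big_endian.astype(np.float32)

print(big_endian.dtype, "->", native.dtype)
print(native.dtype.isnative, np.array_equal(big_endian, native))
```

The numeric values are unchanged by the conversion; only the in-memory byte layout differs.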

In [None]:
rad = fields['radiances']
print(rad)

<HDF5 dataset "radiances": shape (135, 90, 2645), type ">f4">


### Geolocation and Timestamp

Similarly, we can retrieve the geolocation and time from the corresponding datasets (or fields). Because the instrument captures one spectrum per location, each of these fields has shape 135 by 90.
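
Because the geolocation arrays share their two spatial axes with the radiances, they can be flattened together so that each row refers to the same footprint. The sketch below uses small made-up dimensions and synthetic values (a real granule would use 135, 90, and 2645), so every number in it is an assumption for illustration.

```python
import numpy as np

# Small dummy dimensions for illustration; a real granule would use 135, 90, 2645.
n_scans, n_footprints, n_channels = 4, 3, 5
lat = np.linspace(-10.0, 10.0, n_scans * n_footprints).reshape(n_scans, n_footprints)
lon = np.linspace(100.0, 120.0, n_scans * n_footprints).reshape(n_scans, n_footprints)
rad = np.arange(n_scans * n_footprints * n_channels, dtype=np.float32).reshape(
    n_scans, n_footprints, n_channels
)

# Flatten the two spatial axes so row i of each array refers to the same footprint.
coords = np.column_stack([lat.ravel(), lon.ravel()])  # one (lat, lon) pair per row
spectra = rad.reshape(-1, n_channels)                 # one spectrum per row

# Footprint (scan=2, footprint=1) lands at flat row 2 * n_footprints + 1.
row = 2 * n_footprints + 1
same_spectrum = np.array_equal(spectra[row], rad[2, 1])
same_latitude = coords[row, 0] == lat[2, 1]
print(coords.shape, spectra.shape, same_spectrum, same_latitude)
```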

In [None]:
geo = datasets['Geolocation Fields']
print(geo)
for field in geo:
    print("\t", field)

<HDF5 group "/L1C_AIRS_Science/Geolocation Fields" (3 members)>
	 Latitude
	 Longitude
	 Time


In [None]:
lat = geo['Latitude']
lon = geo['Longitude']
time = geo['Time']

print(lat)
print(lon)
print(time)

<HDF5 dataset "Latitude": shape (135, 90), type ">f8">
<HDF5 dataset "Longitude": shape (135, 90), type ">f8">
<HDF5 dataset "Time": shape (135, 90), type ">f8">


### Granule Metadata

The metadata is stored in `Swath Attributes`. It contains information about the instrument, the data, and the processing. Each file contains a six-minute observation of 135 sweeps, which constitutes a granule, so the metadata is at the granule level. Note, however, that the instrument flies over an area multiple times each day: a granule is not tied to a fixed location but is rather a unit of one fly-over operation.

In [None]:
meta = datasets['Swath Attributes']
print(meta)

<HDF5 group "/L1C_AIRS_Science/Swath Attributes" (212 members)>


Each attribute in `Swath Attributes` appears as a pair of members: a named type (suffix `_t`) and a dataset. The named type is a stored datatype that describes the record layout, and the dataset holds the attribute's value.

In [None]:
for m in meta:
    print("\t", meta[m])

	 <HDF5 named type "_FV_Latitude_t" (dtype |V8)>
	 <HDF5 dataset "_FV_Latitude": shape (1,), type "|V8">
	 <HDF5 named type "_FV_Longitude_t" (dtype |V8)>
	 <HDF5 dataset "_FV_Longitude": shape (1,), type "|V8">
	 <HDF5 named type "_FV_Time_t" (dtype |V8)>
	 <HDF5 dataset "_FV_Time": shape (1,), type "|V8">
	 <HDF5 named type "_FV_radiances_t" (dtype |V4)>
	 <HDF5 dataset "_FV_radiances": shape (1,), type "|V4">
	 <HDF5 named type "_FV_scanang_t" (dtype |V4)>
	 <HDF5 dataset "_FV_scanang": shape (1,), type "|V4">
	 <HDF5 named type "_FV_satheight_t" (dtype |V4)>
	 <HDF5 dataset "_FV_satheight": shape (1,), type "|V4">
	 <HDF5 named type "_FV_satroll_t" (dtype |V4)>
	 <HDF5 dataset "_FV_satroll": shape (1,), type "|V4">
	 <HDF5 named type "_FV_satpitch_t" (dtype |V4)>
	 <HDF5 dataset "_FV_satpitch": shape (1,), type "|V4">
	 <HDF5 named type "_FV_satyaw_t" (dtype |V4)>
	 <HDF5 dataset "_FV_satyaw": shape (1,), type "|V4">
	 <HDF5 named type "_FV_satzen_t" (dtype |V4)>
	 <HDF5 dataset "_

To retrieve the value of a member, we index it with the `AttrValues` field. Since the result is a NumPy array, we use `[0]` to get the scalar value.
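
The need for `[0]` can be seen on a plain NumPy structured array, which loosely mimics how such a member might be laid out. This is a sketch under that assumption, not a description of the file's exact internal format; only the field name `AttrValues` and the sample longitude come from this notebook.

```python
import numpy as np

# A one-record structured array with a single big-endian float field,
# loosely mimicking how a Swath Attributes member is laid out.
record = np.array([(-141.89888920831928,)], dtype=[("AttrValues", ">f8")])

values = record["AttrValues"]  # selecting the field still gives an array of shape (1,)
scalar = values[0]             # indexing with [0] pulls out the scalar

print(record.dtype.itemsize, values.shape, scalar)
```

The 8-byte record size is consistent with the `|V8` type that h5py prints for members holding a single 8-byte value.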

In [None]:
slon = meta['start_Longitude']
print(slon['AttrValues'][0])

-141.89888920831928


In [None]:
for t in meta:
    if isinstance(meta[t], h5py.Dataset):
        print(t, ": ", meta[t]['AttrValues'][0])

_FV_Latitude :  -9999.0
_FV_Longitude :  -9999.0
_FV_Time :  -9999.0
_FV_radiances :  -9999.0
_FV_scanang :  -9999.0
_FV_satheight :  -9999.0
_FV_satroll :  -9999.0
_FV_satpitch :  -9999.0
_FV_satyaw :  -9999.0
_FV_satzen :  -9999.0
_FV_satazi :  -9999.0
_FV_solzen :  -9999.0
_FV_solazi :  -9999.0
_FV_glintlat :  -9999.0
_FV_glintlon :  -9999.0
_FV_sun_glint_distance :  -9999
_FV_nadirTAI :  -9999.0
_FV_sat_lat :  -9999.0
_FV_sat_lon :  -9999.0
_FV_topog :  -9999.0
_FV_topog_err :  -9999.0
_FV_landFrac :  -9999.0
_FV_landFrac_err :  -9999.0
_FV_ftptgeoqa :  4294967295
_FV_zengeoqa :  65534
_FV_demgeoqa :  65534
_FV_satgeoqa :  4294967295
_FV_glintgeoqa :  65534
_FV_moongeoqa :  65534
_FV_state :  -9999
_FV_Rdiff_swindow :  -9999.0
_FV_Rdiff_lwindow :  -9999.0
_FV_SceneInhomogeneous :  255
_FV_dust_flag :  -9999
_FV_dust_score :  -9999
_FV_spectral_clear_indicator :  -9999
_FV_BT_diff_SO2 :  -9999.0
_FV_nominal_freq :  -9999.0
_FV_orbit_phase_deg :  -9999.0
_FV_shift_y0 :  -9999.0
_FV_D
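
The `_FV_*` entries printed above appear to be fill values, that is, placeholders written where no valid measurement exists (this reading of the `_FV_` prefix is an assumption). Under that assumption, a minimal sketch for masking them in a floating-point field could look like this, using a tiny made-up array instead of the real radiances:

```python
import numpy as np

FILL_VALUE = -9999.0  # value reported for _FV_radiances above

# Dummy radiances with two missing samples.
rad = np.array([[10.0, FILL_VALUE], [FILL_VALUE, 30.0]], dtype=np.float32)

# Replace fill values with NaN so NaN-aware reductions skip them.
cleaned = np.where(rad == FILL_VALUE, np.nan, rad)

n_missing = int(np.isnan(cleaned).sum())
mean_valid = float(np.nanmean(cleaned))
print(n_missing, mean_valid)
```

Integer-typed fields use different fill values (for example 65534 or 255 in the listing), so each field would need its own `_FV_*` entry rather than a single constant.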

### Using Nested Path to Access Data
We can also use a path to specify the location of the data. For example, to access the radiances, we can use `L1C_AIRS_Science/Data Fields/radiances`.

In [37]:
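# Callback for visititems (reconstructed; assumed to print each path, as in the output below)
def print_hdf5_item(name, obj):
    print(name)

# Open the HDF5 file
with h5py.File('../data/0131.h5', 'r') as f:
    # Visit all items in the file
    f.visititems(print_hdf5_item)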

Channel:L1C_AIRS_Science
GeoTrack:L1C_AIRS_Science
GeoXTrack:L1C_AIRS_Science
L1C_AIRS_Science
L1C_AIRS_Science/Data Fields
L1C_AIRS_Science/Data Fields/AB_Weight
L1C_AIRS_Science/Data Fields/BT_diff_SO2
L1C_AIRS_Science/Data Fields/ChanID
L1C_AIRS_Science/Data Fields/ChanID_t
L1C_AIRS_Science/Data Fields/ChanMapL1b
L1C_AIRS_Science/Data Fields/ChanMapL1b_t
L1C_AIRS_Science/Data Fields/Doppler_shift_ppm
L1C_AIRS_Science/Data Fields/Inhomo850
L1C_AIRS_Science/Data Fields/L1cNumSynth
L1C_AIRS_Science/Data Fields/L1cNumSynth_t
L1C_AIRS_Science/Data Fields/L1cProc
L1C_AIRS_Science/Data Fields/L1cSynthReason
L1C_AIRS_Science/Data Fields/NeN
L1C_AIRS_Science/Data Fields/Rdiff_lwindow
L1C_AIRS_Science/Data Fields/Rdiff_swindow
L1C_AIRS_Science/Data Fields/SceneInhomogeneous
L1C_AIRS_Science/Data Fields/demgeoqa
L1C_AIRS_Science/Data Fields/dust_flag
L1C_AIRS_Science/Data Fields/dust_score
L1C_AIRS_Science/Data Fields/ftptgeoqa
L1C_AIRS_Science/Data Fields/glintgeoqa
L1C_AIRS_Science/Data Fiel

In [None]:
with h5py.File('../data/0131.h5', 'r') as f:
    # Access the dataset within the group
    dataset = f['L1C_AIRS_Science/Data Fields/radiances']
    
    # Convert the dataset to a numpy array
    data = dataset[:]
    print(data)
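
Path-based access and step-by-step access should resolve to the same object. The sketch below checks this on an in-memory demo file with dummy data; the file name `demo_path.h5` and the values are assumptions for illustration. It also reads a single row instead of the full array, which may be preferable when only part of a large dataset is needed.

```python
import h5py
import numpy as np

# In-memory demo file with the same nesting as the AIRS layout; the data are dummy values.
with h5py.File("demo_path.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset(
        "L1C_AIRS_Science/Data Fields/radiances",
        data=np.arange(6, dtype=">f4").reshape(2, 3),
    )

    by_path = demo["L1C_AIRS_Science/Data Fields/radiances"]
    by_steps = demo["L1C_AIRS_Science"]["Data Fields"]["radiances"]

    same_name = by_path.name == by_steps.name  # both report the same absolute path
    full_name = by_path.name
    first_row = by_path[0, :]                  # reads only the requested slice into NumPy

# first_row is a NumPy array, so it stays usable after the file is closed.
print(same_name, full_name, first_row)
```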