This script opens ICESat-2 ATL08 v6 data and exports canopy height parameters for each land segment

Ideally, file structure should be as follows:
- The current working directory contains this script, and separate subdirectories for raw data and for extracted data
- This script will be pulling data and writing data between these two subdirectories

                                                 Current Working Directory
___________________________________________________________________________________________________________________________
                 ↓                                     ↓                                            ↓
            ./Raw Data/    <------->     height_extraction.ipynb (this script)    <------->  ./Extracted Data/

Setting our current working directory to the proper location, and defining the subdirectories for raw & extracted data

In [26]:
import os

# Base folder for this notebook; adjust this path for your own machine
base_dir = 'C:\\Users\\17865\\OneDrive\\Desktop\\Masters\\Research\\IceSat\\Notebook\\'
os.chdir(base_dir)
print(os.getcwd())

# Subdirectories for the raw granules and the extracted output
raw_data = os.path.join(base_dir, 'Raw Data')
extracted_data = os.path.join(base_dir, 'Extracted Data')
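
The absolute Windows paths above only work on one machine. A minimal sketch of a more portable setup, assuming the two subdirectory names from the diagram; the helper name `project_dirs` is hypothetical and not part of the original notebook:

```python
from pathlib import Path


def project_dirs(base=None):
    """Return (raw_dir, extracted_dir) under `base`, creating them if missing.

    `base` defaults to the current working directory. The function name and
    signature are illustrative only.
    """
    base = Path(base) if base is not None else Path.cwd()
    raw_dir = base / "Raw Data"
    extracted_dir = base / "Extracted Data"
    for folder in (raw_dir, extracted_dir):
        # exist_ok avoids an error when the folder is already there
        folder.mkdir(parents=True, exist_ok=True)
    return raw_dir, extracted_dir
```

Calling `project_dirs()` from the notebook folder would give the same two locations as the hard-coded strings, without tying the notebook to one user account.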

Importing necessary packages: h5py, numpy, etc

In [122]:
import h5py
import numpy as np
from numpy import savetxt
import pandas as pd
from datetime import datetime

Assigning an object to our test H5 file found within the Raw Data subdirectory

In [28]:
input_file = "./Raw Data/ATL08_20230621231925_00272006_006_01.h5"

In [29]:
f = h5py.File(input_file, 'r')
list(f.keys())

['METADATA',
 'ancillary_data',
 'ds_geosegments',
 'ds_metrics',
 'ds_surf_type',
 'gt1l',
 'gt1r',
 'gt2l',
 'gt2r',
 'gt3l',
 'gt3r',
 'orbit_info',
 'quality_assessment']

The file has numerous top-level keys, including one group for each of the six beams (gt1l through gt3r).

Creating an array for all the beams.

In [30]:
beams = np.array(('gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r'))
print(beams[1])

gt1r
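
Typing the beam names by hand works, but they can also be picked out of the key list by name. A small sketch, assuming beam groups always follow the `gt<1-3><l|r>` naming seen in the key listing; `find_beam_groups` is a hypothetical helper:

```python
import re

# Ground-track groups are named gt1l, gt1r, ..., gt3r in the listing above
BEAM_PATTERN = re.compile(r"^gt[1-3][lr]$")


def find_beam_groups(keys):
    """Return the ground-track group names found in `keys`, sorted."""
    return sorted(key for key in keys if BEAM_PATTERN.match(key))


top_level_keys = ['METADATA', 'ancillary_data', 'ds_geosegments', 'ds_metrics',
                  'ds_surf_type', 'gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r',
                  'orbit_info', 'quality_assessment']
print(find_beam_groups(top_level_keys))
# ['gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r']
```

With an open file, `find_beam_groups(f.keys())` would return only the beams actually present, which matters if a granule is missing one.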


Each beam is also its own group, which has keys within it. The 'land_segments' group contains our parameters.

In [31]:
gt1r = f['gt1r']
list(gt1r.keys())

['signal_photons', 'land_segments']

Dr. Thomas's instructions: "Extract relevant data from h5 file (h_canopy, h_canopy_uncertainty, night_flag, lat, long, and file name (because the beam and date is contained in the name) into text or whatever (we already have some code to do this).   You will want a file with the following information from icesat: Shotnumber, date, beam (or ground track), lat, lon, h_canopy and associated metrics, 20m segment heights, night_flag, h_canopy_uncertainty."

** Also want to grab any parameter related to satellite motion (ascending/descending). Thinking 'orbit_info/sc_orient' is the relevant metric; note that 'orbit_info' is a top-level group in the key list above, not a member of each beam group.
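
A hedged sketch for the satellite-motion question. It assumes the ICESat-2 convention that the `sc_orient` flag means 0 = backward, 1 = forward, 2 = transition (worth confirming in the ATL08 data dictionary). That flag describes spacecraft orientation, which may not be the same thing as an ascending or descending pass, so the second helper infers pass direction from latitude instead, assuming land segments are stored in along-track order. Both function names are hypothetical.

```python
# Assumed flag meanings for orbit_info/sc_orient; verify against the data dictionary
SC_ORIENT_LABELS = {0: "backward", 1: "forward", 2: "transition"}


def describe_sc_orient(flag):
    """Map an sc_orient flag value to a readable label."""
    return SC_ORIENT_LABELS.get(int(flag), "unknown")


def track_direction(latitudes):
    """Label a pass by comparing the first and last latitude of a beam.

    Assumes the latitudes are ordered along the track (an assumption,
    not something the notebook has verified).
    """
    first, last = latitudes[0], latitudes[-1]
    if last > first:
        return "ascending"
    if last < first:
        return "descending"
    return "undetermined"


print(describe_sc_orient(1))                # forward
print(track_direction([25.0, 25.1, 25.2]))  # ascending
```

In the notebook, the flag would presumably be read once per file, for example with `f['orbit_info/sc_orient'][0]`, and repeated down the column like the file name.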

Iterating through the beams array to make sure they're read in correctly. Extracting relevant parameters from within the land_segments group.

# High level overview of code: 
# For every file in our 5000 file dataset, and for every beam of the 6 beams within that specific file, we'll assign variables to parameters of interest (lat, long, night_flag) and write those values to csv 

# Outer loop: current file in subdirectory
# Inner loop: beams of current file
# Innermost code: assigning, reshaping, and getting all parameters into a single table that we'll write to csv 


In [127]:
date_time = datetime.strptime(input_file[input_file.index('_') + 1:input_file.index('_') + 15], "%Y%m%d%H%M%S")
print(date_time.strftime('%Y-%m-%d-%H:%M:%S'))


2023-06-21-23:19:25
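
Slicing from the first underscore works for this path, but it would break if a folder name ever contained an underscore. A sketch of a stricter parser, assuming the granule name layout `ATL08_<yyyymmddhhmmss>_<tttt><cc><ss>_<vvv>_<rr>.h5` (reference ground track, cycle, region, version, revision); the function name and dictionary keys are hypothetical:

```python
import re
from datetime import datetime

# Assumed layout: ATL08_<yyyymmddhhmmss>_<tttt><cc><ss>_<vvv>_<rr>.h5
GRANULE_PATTERN = re.compile(
    r"ATL08_(?P<stamp>\d{14})_(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})"
    r"_(?P<version>\d{3})_(?P<revision>\d{2})\.h5$"
)


def parse_granule_name(path):
    """Split an ATL08 file name (with or without a folder prefix) into its parts."""
    match = GRANULE_PATTERN.search(path)
    if match is None:
        raise ValueError(f"Not an ATL08 granule name: {path}")
    parts = match.groupdict()
    return {
        "start_time": datetime.strptime(parts["stamp"], "%Y%m%d%H%M%S"),
        "rgt": int(parts["rgt"]),
        "cycle": int(parts["cycle"]),
        "region": int(parts["region"]),
        "version": parts["version"],
        "revision": parts["revision"],
    }


info = parse_granule_name("./Raw Data/ATL08_20230621231925_00272006_006_01.h5")
print(info["start_time"].strftime("%Y-%m-%d-%H:%M:%S"))  # 2023-06-21-23:19:25
```

The extra fields (ground track, cycle) come for free from the same name and could become additional columns in the output table.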


In [151]:
# Creating an output file and clearing its contents
output_file = "./Extracted Data/test_output.txt"
open(output_file, 'w').close()


for filename in os.listdir(raw_data):
    file = os.path.join(raw_data, filename)
    
    # checking if it is a file, assigning an array to all the file's keys
    if os.path.isfile(file):
        file = h5py.File(file, 'r')
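        keys = np.array(list(file.keys()))
        # Select the beam groups by name rather than by position, in case the key order differs between granules
        beams = [key for key in keys if key.startswith('gt')]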
        
        print('Current file:', filename)
        for beam in beams:
            
            
            
            latitude = file[beam + '/land_segments/latitude']
            latitude = np.array((latitude)).reshape((latitude.shape[0], 1))
            
            longitude = file[beam + '/land_segments/longitude']
            longitude = np.array((longitude)).reshape((longitude.shape[0], 1))
            
            # Column for canopy height             
            h_canopy = file[beam + '/land_segments/canopy/h_canopy']
            h_canopy = np.array(h_canopy).reshape((h_canopy.shape[0], 1))       
            
            # Column for canopy height uncertainty
            h_canopy_uncertainty = file[beam + '/land_segments/canopy/h_canopy_uncertainty']
            h_canopy_uncertainty = np.array(h_canopy_uncertainty).reshape((h_canopy_uncertainty.shape[0], 1))        
            
            
            # This metric is NOT reshaped into a 1 column array, it has dimensions of ( number of transects x 18)
            canopy_h_metrics = file[beam + '/land_segments/canopy/canopy_h_metrics']
            canopy_h_metrics = np.array(canopy_h_metrics)
            
            # This metric is also NOT reshaped into a 1 column array, it has dimensions of ( number of transects x 5)
            h_canopy_20m = file[beam + '/land_segments/canopy/h_canopy_20m']
            h_canopy_20m = np.array(h_canopy_20m)
            
            # Column for the canopy openness
            canopy_openness = file[beam + '/land_segments/canopy/canopy_openness']
            canopy_openness = np.array(canopy_openness).reshape((canopy_openness.shape[0], 1))
            
            ## ph_segment_id lives under signal_photons, so it has one value per photon
            ## rather than one per land segment; it cannot be stacked with the other columns as-is
            # Column for photon segment ID
            # segment_id = file[beam + '/signal_photons/ph_segment_id']
            # segment_id = np.array(segment_id)
            # print(segment_id.shape)
            
            # Column for the segment ID beginning
            segment_id_beg = file[beam + '/land_segments/segment_id_beg']
            segment_id_beg = np.array(segment_id_beg).reshape((segment_id_beg.shape[0], 1))
            
            # Column for the segment ID ending
            segment_id_end = file[beam + '/land_segments/segment_id_end']
            segment_id_end = np.array(segment_id_end).reshape((segment_id_end.shape[0], 1))
            
            # Column for the night flag
            night_flag = file[beam + '/land_segments/night_flag']
            night_flag = np.array(night_flag).reshape((night_flag.shape[0], 1))
            
            # Column for the specific file name, same dimensions as latitude column
            file_name = np.full_like(latitude, filename, dtype=np.dtype('U100'))
                     
            # Creating a column for the granule date.
            # HMS data can be parsed later as there's inconsistencies between file names and the granule START time listed in the Earthdata portal. 
            # Creating a datetime object of the filename string, then a column of just the datetime information
            date_time = datetime.strptime(filename[filename.index('_') + 1:filename.index('_') + 15], "%Y%m%d%H%M%S")
            date = np.full_like(latitude, date_time, dtype=np.dtype('U100'))
            date = np.array(date).reshape((date.shape[0], 1))

    
            # Creating a column for the specific beam name, same dimensions as the latitude column
            beam_name = np.full_like(latitude, beam, dtype=np.dtype('U100'))
            print('beam name:', beam)
            
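            # Only latitude and longitude are written for now; np.savetxt's default
            # format handles numeric columns only, so the text columns are left out
            table = np.hstack((latitude, longitude))
            
            with open(output_file, "ab") as out:
                np.savetxt(out, table)

        # Close the HDF5 file once all of its beams have been processed
        file.close()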
        

Current file: ATL08_20230620104617_00042002_006_01.h5
beam name: gt1l
beam name: gt1r
beam name: gt2l
beam name: gt2r
beam name: gt3l
beam name: gt3r
Current file: ATL08_20230621231925_00272006_006_01.h5
beam name: gt1l
beam name: gt1r
beam name: gt2l
beam name: gt2r
beam name: gt3l
beam name: gt3r
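
`np.savetxt` with its default format cannot write the text columns (file name, date, beam) next to the numeric ones, and the loop above currently stacks only latitude and longitude. A sketch of one way to combine everything into a single table, assuming pandas; the helper `build_beam_table` and its column names are hypothetical, and the arrays here are synthetic stand-ins for the HDF5 datasets:

```python
import numpy as np
import pandas as pd


def build_beam_table(filename, beam, latitude, longitude, h_canopy,
                     canopy_h_metrics, h_canopy_20m):
    """Combine per-segment arrays into one DataFrame, one row per land segment."""
    table = pd.DataFrame({
        "file_name": filename,   # scalar, repeated on every row
        "beam": beam,            # scalar, repeated on every row
        "latitude": np.ravel(latitude),
        "longitude": np.ravel(longitude),
        "h_canopy": np.ravel(h_canopy),
    })
    # Expand each 2-D metric into numbered columns (e.g. h_canopy_20m_0 ... _4)
    for name, values in (("canopy_h_metrics", canopy_h_metrics),
                         ("h_canopy_20m", h_canopy_20m)):
        values = np.asarray(values)
        for col in range(values.shape[1]):
            table[f"{name}_{col}"] = values[:, col]
    return table


n = 3  # pretend this beam has three land segments
demo = build_beam_table(
    "ATL08_20230621231925_00272006_006_01.h5", "gt1r",
    latitude=np.linspace(25.0, 25.1, n).reshape(n, 1),
    longitude=np.linspace(-80.0, -80.1, n).reshape(n, 1),
    h_canopy=np.full((n, 1), 12.5),
    canopy_h_metrics=np.zeros((n, 18)),
    h_canopy_20m=np.zeros((n, 5)),
)
print(demo.shape)  # (3, 28)
```

Each beam's table could then be appended to one CSV with `DataFrame.to_csv(..., mode="a", index=False)`, writing the header only for the first beam.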


Based on our loop through the 2 files, if we append all the latitude information to one table, it should have 136,000 rows.
Let's test.
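
A quick way to check a row count like this, assuming the output file holds one row per line; `count_rows` is a hypothetical helper and the demo writes a throwaway file rather than touching the real output:

```python
import os
import tempfile


def count_rows(path):
    """Count the non-empty lines in a text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


# Small self-contained demo with a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("25.0 -80.0\n25.1 -80.1\n\n")
    demo_path = tmp.name

demo_count = count_rows(demo_path)
os.remove(demo_path)
print(demo_count)  # 2
```

Running `count_rows(output_file)` after the extraction loop would give the number to compare against the expected total.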

In [149]:
f = h5py.File(input_file, 'r')
# list(f['gt1r/signal'].keys())
print(f['gt1r/signal_photons/ph_segment_id'].shape)


(1200864,)
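
The shape is so much larger because `ph_segment_id` has one entry per photon, while the `land_segments` datasets have one entry per land segment. A sketch of how photons could be matched to land segments, assuming each land segment covers the inclusive ID range `segment_id_beg` to `segment_id_end` and that segments are sorted by `segment_id_beg`; the function name is hypothetical and the arrays are made up for illustration:

```python
import numpy as np


def photons_to_land_segments(ph_segment_id, segment_id_beg, segment_id_end):
    """Return, for each photon, the row index of its land segment (-1 if none)."""
    ph = np.asarray(ph_segment_id)
    beg = np.asarray(segment_id_beg)
    end = np.asarray(segment_id_end)
    # Index of the last land segment whose starting ID is <= the photon's ID
    idx = np.searchsorted(beg, ph, side="right") - 1
    valid = idx >= 0
    # A candidate only counts if the photon's ID is also within the segment's end
    valid[valid] = ph[valid] <= end[idx[valid]]
    return np.where(valid, idx, -1)


beg = np.array([100, 105, 110])
end = np.array([104, 109, 114])
photons = np.array([99, 100, 104, 107, 114, 115])
print(photons_to_land_segments(photons, beg, end).tolist())
# [-1, 0, 0, 1, 2, -1]
```

The resulting index could be used to look up a photon's `h_canopy` row, or with `np.bincount` to count photons per land segment.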


In [60]:
# for i in range(1):
    
    
      
#     latitude = f[beams[i] + '/land_segments/latitude']
#     latitude = np.array((latitude)).reshape((latitude.shape[0], 1))
#     # print('latitude dimensions: ', latitude.shape)
    
#     longitude = f[beams[i] + '/land_segments/longitude']
#     longitude = np.array((longitude)).reshape((longitude.shape[0], 1))
#     # print('longitude dimensions: ', longitude.shape)
    
    
#     h_canopy = f[beams[i] + '/land_segments/canopy/h_canopy']
#     h_canopy = np.array(h_canopy).reshape((h_canopy.shape[0], 1))
#     # print('h_canopy dimensions: ', h_canopy.shape)
    
    
#     canopy_h_metrics = f[beams[i] + '/land_segments/canopy/canopy_h_metrics/']
#     print('canopy_h_metrics dimensions: ', canopy_h_metrics.shape)
    
#     # Curious about this parameter, h_canopy_20m was transposed in Dr. Thomas's code
#     h_canopy_20m = f[beams[i] + '/land_segments/canopy/h_canopy_20m/']
#     # print('h_canopy_20m dimensions: ', h_canopy_20m.shape)
    
    
#     canopy_openness = f[beams[i] + '/land_segments/canopy/canopy_openness']
#     canopy_openness = np.array(canopy_openness).reshape((canopy_openness.shape[0], 1))
#     # print('canopy_openness dimensions: ', canopy_openness.shape)
    
#     segment_id_beg = f[beams[i] + '/land_segments/segment_id_beg']
#     segment_id_beg = np.array(segment_id_beg).reshape((segment_id_beg.shape[0], 1))
#     # print('segment_id_beg dimensions: ', segment_id_beg.shape)
    
#     segment_id_end = f[beams[i] + '/land_segments/segment_id_end']
#     segment_id_end = np.array(segment_id_end).reshape((segment_id_end.shape[0], 1))
#     # print('segment_id_end dimensions: ', segment_id_end.shape)
    
    
#     night_flag = f[beams[i] + '/land_segments/night_flag']
#     night_flag = np.array(night_flag).reshape((night_flag.shape[0], 1))
#     # print('night_flag dimensions: ', night_flag.shape)
    
    
#     # Creating column for just the beam name, will have the same dimensions as the latitude column
#     beam_name = beams[i]
#     beam = np.full_like(latitude, beam_name, dtype=np.dtype('U100'))
   
    
#     df = np.hstack((beam, latitude, longitude, h_canopy, canopy_h_metrics, h_canopy_20m ))
#     print(df[0:5, :])