# Downloading and Saving NOAA GOES-16 ABI Files

This notebook lists and downloads NOAA GOES-16 ABI (Advanced Baseline Imager) data files for a specified time period, using the `ABI-L2-FDCF` product as its example. It works in four steps:

1. **Initialize S3 Access**: Uses the `s3fs` library to interact with the AWS S3 bucket where NOAA GOES-16 data is stored.
2. **File Retrieval**: Lists the files stored under each year, day-of-year, and hour directory within the specified start and end ranges.
3. **Data Extraction**: Extracts meaningful information (e.g., product base name, date, and time) from file paths to create unique file names.
4. **Download and Save**: Reads the selected files from the S3 bucket and saves them locally in a specified directory.

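The script catches the `FileNotFoundError` raised when an hour directory does not exist in the bucket, logs the missing path, and continues with the next hour without interrupting the process. The source files are already in NetCDF format and are saved locally with a `.nc` extension for further analysis.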
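
The file listing in step 2 depends on the bucket's directory layout, `noaa-goes16/<product>/<year>/<day of year>/<hour>/`, with the day zero-padded to three digits and the hour to two. A minimal sketch of building one such prefix is shown here; `hour_prefix` is a hypothetical helper name used only for illustration and is not part of `s3fs` or of this notebook.

```python
def hour_prefix(product, year, day_of_year, hour, bucket="noaa-goes16"):
    """Build the S3 prefix that holds one hour of files (hypothetical helper)."""
    # Day of year is padded to 3 digits and hour to 2, matching the bucket layout
    return f"{bucket}/{product}/{year}/{day_of_year:03d}/{hour:02d}/"


prefix = hour_prefix("ABI-L2-FDCF", 2020, 210, 19)
print(prefix)  # noaa-goes16/ABI-L2-FDCF/2020/210/19/
```

Building the prefix once and reusing it keeps the listing call and any log message in agreement about which path was queried.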


In [None]:
# Import necessary libraries
import s3fs  # Library to interact with AWS S3 storage, used for reading and writing data directly from/to S3 buckets.
import numpy as np  # Library for numerical operations, used for working with arrays and performing mathematical computations.

In [None]:
# Function to collect the paths of the files available within the specified period of interest
def get_files(s_year, e_year, s_day, e_day, s_hour, e_hour):
    """
    List the ABI-L2-FDCF files available on S3 for the requested period.

    Relies on the global `fs` (an s3fs.S3FileSystem), which must be created before this function is called.
    The same day and hour window is applied to every year in the range.

    Parameters:
    s_year (int): Start year of the period of interest (inclusive).
    e_year (int): End year of the period of interest (inclusive).
    s_day (int): Start day of the year, 1-366, often called the Julian day (inclusive).
    e_day (int): End day of the year (exclusive, as in `range`).
    s_hour (int): Start hour, UTC (inclusive).
    e_hour (int): End hour, UTC (exclusive, as in `range`).

    Returns:
    list: A list of file paths for the specified period.
    """
    print('Getting files')  # Log progress
    aux = []  # Initialize an empty list to store file paths

    # Loop through each year, day of year, and hour in the specified ranges
    for y in range(s_year, e_year + 1):
        for d in range(s_day, e_day):
            for j in range(s_hour, e_hour):
                # Path structure: 'noaa-goes16/ABI-L2-FDCF/<year>/<day of year>/<hour>/'
                path = f'noaa-goes16/ABI-L2-FDCF/{y}/{d:03d}/{j:02d}/'
                try:
                    # List the hour directory and add its file paths to the list
                    aux.extend(fs.ls(path))
                except FileNotFoundError as e:
                    # Log a warning and skip this hour if the directory does not exist
                    print(f"FileNotFoundError for path '{path}': {e}. Skipping.")
                    continue

    return aux  # Return the list of file paths
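
Because `get_files` takes days as day-of-year numbers, a calendar date has to be converted first. The following is a minimal sketch using only the standard library; `day_of_year` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
from datetime import date


def day_of_year(year, month, day):
    """Return the 1-based day of the year for a calendar date (hypothetical helper)."""
    return date(year, month, day).timetuple().tm_yday


# 28 July 2020 is day 210 of a leap year
print(day_of_year(2020, 7, 28))  # 210
```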

In [None]:
# Connect to the S3 bucket and list the ABI netCDF4 files for the period of interest

# Step 1: Initialize the S3 file system
fs = s3fs.S3FileSystem(anon=True, use_listings_cache=True)
# The `anon=True` parameter indicates anonymous access to the AWS S3 bucket (no credentials required).
# `use_listings_cache=True` improves performance by caching directory listings locally.

# Step 2: Define the period of interest for retrieving files
start_year, end_year = 2020, 2020  # Start and end year
start_day, end_day = 210, 211  # Day-of-year range; the end day is exclusive, so only day 210 is listed
start_hour, end_hour = 19, 20  # UTC hour range (24-hour format); the end hour is exclusive, so only hour 19 is listed

# Step 3: Use the `get_files` function to retrieve file paths from the specified period
data_list = get_files(start_year, end_year, start_day, end_day, start_hour, end_hour)

# The resulting `data_list` will contain the paths to the files from the NOAA GOES-16 ABI dataset
# that match the specified time range.

In [None]:
# Define the output directory for the data
datadir = '/home/jovyan/...'  # Replace this with the desired directory path where the data will be saved.
# The path shown is specific to the original environment (e.g., a JupyterHub instance).
# Update it for your system; the directory must already exist before the files are written.
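
Opening a local file for writing fails if its parent directory is missing, so it can help to create the output directory up front. This is a minimal sketch using `pathlib`; `ensure_dir` is a hypothetical helper name, and the temporary directory stands in for a real output location so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create the directory (and any missing parents) and return it as a Path."""
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)  # No error if it already exists
    return target


# A temporary root is used here purely for demonstration
demo_root = tempfile.mkdtemp()
out = ensure_dir(Path(demo_root) / "goes16" / "fdcf")
print(out.is_dir())  # True
```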

In [None]:
import os  # Used to build the local output path

# Loop through the files selected previously
for path in data_list:
    # The file name is the last component of the S3 path
    # Example: 'OR_ABI-L2-FDCF-M6_G16_s20202101840180_e20202101845163_c20202101845226.nc'
    filename = path.split('/')[-1]

    # Extract the base product name: the first 23 characters, up to and including the 's' marker
    prodbase = filename[:23]  # Example: 'OR_ABI-L2-FDCF-M6_G16_s'

    # Extract the scan start time from the file name
    starttime = filename.split('_')[3]  # Example: 's20202101840180'
    year, julian, hhmm = starttime[1:5], starttime[5:8], starttime[8:12]
    # 'year' = '2020', 'julian' = '210' (day of year), 'hhmm' = '1840' (UTC time)

    # Format the date-time information for readability
    date_info = year + '-' + julian + '-' + hhmm  # Example: '2020-210-1840'

    # Create a short file title using the product base and date-time information
    fileshtitle = prodbase + '-' + date_info  # Example: 'OR_ABI-L2-FDCF-M6_G16_s-2020-210-1840'

    # Open the remote file from S3 and write its content to the local directory
    with fs.open(path, 'rb') as f:
        with open(os.path.join(datadir, fileshtitle + '.nc'), 'wb') as local_f:
            local_f.write(f.read())
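
The start-time field sliced apart in the loop above can also be parsed into a `datetime`, which is convenient for sorting or filtering files. The sketch below assumes the `s<YYYY><DDD><HH><MM><SS><tenth>` layout seen in the example file name and drops the final tenth-of-a-second digit; `parse_start_time` is a hypothetical helper name.

```python
import re
from datetime import datetime


def parse_start_time(path):
    """Return the scan start time (UTC) encoded in a GOES ABI file name."""
    # Capture year, day of year, hour, minute, and second (13 digits after '_s')
    match = re.search(r"_s(\d{13})", path)
    if match is None:
        raise ValueError(f"No start-time field found in {path!r}")
    # %j parses the zero-padded day of year
    return datetime.strptime(match.group(1), "%Y%j%H%M%S")


example = ("noaa-goes16/ABI-L2-FDCF/2020/210/18/"
           "OR_ABI-L2-FDCF-M6_G16_s20202101840180_e20202101845163_c20202101845226.nc")
print(parse_start_time(example))  # 2020-07-28 18:40:18
```

The returned value is a naive `datetime`; it represents UTC only by convention of the file naming.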