# Flux of Emission Estimates from GOES-16 Data: FRP and Burned Area

This script processes fire data retrieved from GOES-16 geostationary satellite observations. It estimates fire-related emissions of TPM (Total Particulate Matter), CO2, CO, and CH4 from the Fire Radiative Power (FRP) and burned area data. The script reads the data, calculates the emission flux for each grid cell using the FEER (Fire Energetics and Emissions Research) coefficients, and saves the results for specific grid regions in a CSV file.

### Key Steps in the Script:

1. **File Collection**: 
   - Collects a list of files within a specified time range (year, day, and hour).
   - Each file contains data on fire radiative power (FRP), burned area, and other related variables.

2. **Indexing**:
   - Retrieves the relevant latitude and longitude coordinates for the region of interest.
   - Extracts the indexes and matrix for the selected grid using geographic bounds (longitude and latitude).

3. **Data Processing**:
   - For each file, reads the FRP and burned area data.
   - Uses the indexes matrix to extract the relevant data points within the defined grid area.
   - Calculates the emission flux for each grid cell based on FRP and burned area, using the FEER coefficients to estimate the Total Particulate Matter (TPM), CO2, CO, and CH4 flux of emissions.
   - The emissions are averaged over the specified time period (e.g., monthly), providing estimates for each grid cell.

4. **Result Compilation**:
   - For each grid cell, the script computes the monthly average emission flux values for TPM, CO2, CO, and CH4.
   - The results, including the calculated emission fluxes for each grid cell and the associated latitudes and longitudes, are compiled into a string format.
   - These results are saved in a CSV file for further analysis.

5. **Output**:
   - A CSV file is generated containing emissions data for each grid cell. This file includes the latitude, longitude, and calculated emissions (TPM, CO2, CO, CH4) for each cell, allowing for spatial and temporal analysis.
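
The per-cell calculation described in the steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function name `cell_emission_flux`, the sample pixel values, and the coefficient `0.02` are hypothetical, and the units of the result follow whatever units the FRP, area, and coefficient inputs carry. Unlike the notebook code, the sketch also skips pixels with a zero area.

```python
import math

def cell_emission_flux(frp, area, ce_tpm, ef_ratio=1.0):
    """Mean FRP-per-area of the fire pixels in one grid cell, scaled by emission coefficients.

    frp, area: per-pixel values (NaN where no fire was detected).
    ce_tpm: TPM emission coefficient for the cell (hypothetical value in the example).
    ef_ratio: EF_species / EF_TPM; the default 1.0 yields the TPM flux itself.
    Returns 0.0 when the cell has no valid fire pixel.
    """
    ratios = [p / a for p, a in zip(frp, area)
              if not (math.isnan(p) or math.isnan(a)) and a > 0]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios) * ce_tpm * ef_ratio

nan = float("nan")
frp = [50.0, nan, 30.0]        # hypothetical FRP values for three pixels
area = [2000.0, nan, 1000.0]   # hypothetical fire areas for the same pixels

tpm_flux = cell_emission_flux(frp, area, ce_tpm=0.02)
co2_flux = cell_emission_flux(frp, area, ce_tpm=0.02, ef_ratio=1620 / 8.7)
```

Returning 0.0 for a cell without fire pixels mirrors the notebook's choice of replacing NaN means with 0, which pulls the time average toward zero for cells that rarely burn.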

In [None]:
#Import libraries
import os
from io import BytesIO
import s3fs
import xarray as xr
import numpy as np
import glob
from pyproj import Proj
import pandas as pd
import warnings
import calendar

In [None]:
# Define input and output directories and file names
datadir = '/...'  # Directory that holds the FEER coefficient CSV and receives the processed CSV file

# Collect the FEER (Fire Energetics and Emissions Research) emission coefficients CSV files
data = sorted(glob.glob(os.path.join(datadir, 'FEER*.csv')))  # Retrieves all CSV files starting with 'FEER' in the data directory
feer_data = pd.read_csv(data[0])  # Reads the first CSV file into a pandas DataFrame containing emission coefficients

# Define the year to process and the output file name
# It is recommended to run this algorithm one year at a time
Year = 2020  # The specific year of data to process
outfile = os.path.join(datadir, 'amazon_'+str(Year)+'_mean_montly_flux.csv')  # Output file path, including the year

In [None]:
# Define constants for emission factors and species emission calculations

# According to Nguyen and Wooster, 2020, the species emission coefficient can be calculated as:
# Ce_species = (EF_species / EF_TPM) * Ce_TPM
# where EF_species is the emission factor for the species and EF_TPM is the emission factor for TPM (Total Particulate Matter)

# Emission factors (EF) for CO2, CO, CH4, and TPM in tropical forests (from Andreae, 2019)
EF_CO2 = 1620  # Emission factor for CO2 in g/kg_burned
s_EF_CO2 = 70  # Standard deviation of the CO2 emission factor

EF_CO = 104  # Emission factor for CO in g/kg_burned
s_EF_CO = 39  # Standard deviation of the CO emission factor

EF_CH4 = 6.5  # Emission factor for CH4 in g/kg_burned
s_EF_CH4 = 1.6  # Standard deviation of the CH4 emission factor

EF_TPM = 8.7  # Emission factor for TPM in g/kg_burned
s_EF_TPM = 3.1  # Standard deviation of the TPM emission factor

# Calculate the species-to-TPM emission factor ratios (dimensionless).
# Multiplying a ratio by the FEER TPM coefficient (Ce_TPM) later gives the species emission coefficient
Ce_CO2 = (EF_CO2 / EF_TPM)  # EF ratio for CO2
Ce_CO = (EF_CO / EF_TPM)    # EF ratio for CO
Ce_CH4 = (EF_CH4 / EF_TPM)  # EF ratio for CH4

# Calculate the uncertainty (sigma) in the emission coefficients using error propagation
sigma_Ce_CO2 = np.sqrt((s_EF_CO2 / EF_TPM) ** 2 + (EF_CO2 * s_EF_TPM / (EF_TPM) ** 2) ** 2)
sigma_Ce_CO = np.sqrt((s_EF_CO / EF_TPM) ** 2 + (EF_CO * s_EF_TPM / (EF_TPM) ** 2) ** 2)
sigma_Ce_CH4 = np.sqrt((s_EF_CH4 / EF_TPM) ** 2 + (EF_CH4 * s_EF_TPM / (EF_TPM) ** 2) ** 2)

# Define the Region of Interest (ROI) in degrees (latitude and longitude)
# The coordinates below correspond to the Amazon region
minlon, maxlon, minlat, maxlat = -72, -48, -11, -3  # Coordinates for Amazon region ROI
# For example, coordinates for the Cerrado region could be:
# minlon, maxlon, minlat, maxlat = -57.5, -56.5, -17.5, -16.5
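
The uncertainty lines above apply first-order error propagation to a ratio R = A / B, which gives sigma_R = sqrt((s_A / B)^2 + (A * s_B / B^2)^2) when the two uncertainties are treated as independent (an assumption of this sketch). The helper name `ratio_with_sigma` is hypothetical; the numbers are the CO2 and TPM emission factors defined in the cell.

```python
import math

def ratio_with_sigma(a, s_a, b, s_b):
    """Return a/b and its first-order propagated standard deviation.

    Assumes the uncertainties of a and b are independent.
    """
    ratio = a / b
    sigma = math.sqrt((s_a / b) ** 2 + (a * s_b / b ** 2) ** 2)
    return ratio, sigma

# CO2 relative to TPM, using the emission factors from the cell above
ratio_co2, sigma_co2 = ratio_with_sigma(1620, 70, 8.7, 3.1)

# Equivalent check: relative uncertainties add in quadrature
relative_sigma = math.sqrt((70 / 1620) ** 2 + (3.1 / 8.7) ** 2)
```

Because the TPM emission factor has a relative uncertainty of about 36%, it dominates the propagated uncertainty of every species ratio.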

In [None]:
# Define the file header for the output CSV
aux1 ='year,month,central_lat,central_lon,mean_flux,TPM_flux,CO2_flux,CO_flux,CH4_flux\n'
# Define the header format for the output CSV file
header = aux1  # The header consists of column names separated by commas
outstring = ''  # Initialize an empty string to accumulate data (if needed)

# Open the output file in write mode
outfn = open(outfile, 'w')  
outfn.writelines(header)  # Write the header to the output file

# Initialize the S3 file system for interacting with AWS S3
fs = s3fs.S3FileSystem(anon=True)  # Allows access to S3 buckets without authentication

In [None]:
# Initialize geometric variables to extract satellite-specific data for the calculation of latitudes and longitudes

# Function to extract latitude and longitude based on satellite coordinates
def get_lat_lon(file_system):
    # List files from the specified directory in the S3 bucket corresponding to a specific day and time
    files = file_system.ls('noaa-goes16/ABI-L2-FDCF/2020/'+str(200).zfill(3)+'/'+str(15).zfill(2)+'/')
    # The list contains 6 files for day 200 of 2020 (for UTC times 15:00, 15:10, 15:20, ..., 15:50)

    # Open the first file in the list to extract the dataset (use the file system passed as an argument)
    with file_system.open(files[0], 'rb') as f:
        ds0 = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')  # Load the dataset into xarray

    # Extract the satellite parameters from the dataset (projection height, origin longitude, and sweep axis)
    sat_h = ds0.goes_imager_projection.perspective_point_height  # Satellite height (distance from Earth surface)
    sat_lon = ds0.goes_imager_projection.longitude_of_projection_origin  # Longitude of the projection origin
    sat_sweep = ds0.goes_imager_projection.sweep_angle_axis  # Sweep axis for satellite projection

    # Use the PyProj library to set up the geostationary projection based on extracted satellite parameters
    p = Proj(proj='geos', h=sat_h, lon_0=sat_lon, sweep=sat_sweep)  # Define the projection for the satellite view

    # Convert the x and y scan angles (radians) to projection coordinates in meters by multiplying by the satellite height
    X = np.array(ds0.x) * sat_h
    Y = np.array(ds0.y) * sat_h

    # Create a meshgrid for the X and Y coordinates to represent a grid of satellite points
    XX, YY = np.meshgrid(X, Y)

    # Transform the satellite coordinates (XX, YY) to latitudes and longitudes using the geostationary projection
    rlon, rlat = p(XX, YY, inverse=True)

    return rlat, rlon  # Return the calculated latitudes and longitudes

In [None]:
# Function to create a grid of 0.5° x 0.5° and assign the corresponding FEER coefficients to each grid element within the Region of Interest (ROI)
def get_indexes_v3(min_lon, max_lon, min_lat, max_lat, rlat, rlon, dados_feer):
    # Calculate the center of latitude and longitude for each element in a 0.5° x 0.5° grid inside the ROI
    centers_lon = np.linspace(min_lon + 0.25, max_lon - 0.25, num=int(max_lon - min_lon) * 2)  # 0.5° grid
    centers_lat = np.linspace(min_lat + 0.25, max_lat - 0.25, num=int(max_lat - min_lat) * 2)
    
    # Calculate the center of latitude and longitude for each element in a 1° x 1° grid (the grid on which the FEER coefficients are looked up below)
    centers_lon2 = np.linspace(min_lon + 0.5, max_lon - 0.5, num=int(max_lon - min_lon))  # 1° grid
    centers_lat2 = np.linspace(min_lat + 0.5, max_lat - 0.5, num=int(max_lat - min_lat))

    aux_list = []
    
    # Calculate the matching FEER coefficient for each element of the 1° x 1° grid
    for i in range(0, len(centers_lat2)):
        # Extract the FEER coefficients for the current latitude and longitude bounds from the 'dados_feer' DataFrame
        df2 = dados_feer.loc[(dados_feer['Latitude'] == centers_lat2[i]) & 
                             (dados_feer['Longitude'] <= max_lon) &
                             (dados_feer['Longitude'] >= min_lon), 'Ce_850'].to_numpy()
        
        # Append the corresponding FEER coefficients to the list
        aux_list = np.append(aux_list, df2)
        
        if i == 0:
            # Create a coordinate pair matrix for the 1° x 1° grid (latitudes and longitudes)
            aux2 = np.repeat(centers_lat2[i], len(centers_lon2))
            lat_lon_feer2 = np.column_stack((aux2, centers_lon2))
        else:
            aux2 = np.repeat(centers_lat2[i], len(centers_lon2))
            aux2 = np.column_stack((aux2, centers_lon2))
            lat_lon_feer2 = np.vstack((lat_lon_feer2, aux2))
    
    # Stack the matching FEER coefficients into the coordinate matrix for the 1° x 1° grid
    lat_lon_feer2 = np.column_stack((lat_lon_feer2, aux_list))

    # Calculate the FEER coefficient corresponding to each element of the 0.5° x 0.5° grid
    for i in range(0, len(centers_lat)):
        if i == 0:
            aux = np.repeat(centers_lat[i], len(centers_lon))
            lat_lon_feer = np.column_stack((aux, centers_lon))
        else:
            aux = np.repeat(centers_lat[i], len(centers_lon))
            aux2 = np.column_stack((aux, centers_lon))
            lat_lon_feer = np.vstack((lat_lon_feer, aux2))

    # Match the corresponding FEER coefficient to each 0.5° x 0.5° grid element based on proximity
    aux_list_2 = []
    for j in range(0, len(lat_lon_feer)):
        for n in range(0, len(lat_lon_feer2)):
            if ((lat_lon_feer[j, 0] == lat_lon_feer2[n, 0] + 0.25) or (lat_lon_feer[j, 0] == lat_lon_feer2[n, 0] - 0.25)):
                if ((lat_lon_feer[j, 1] == lat_lon_feer2[n, 1] + 0.25) or (lat_lon_feer[j, 1] == lat_lon_feer2[n, 1] - 0.25)):
                    # Append the FEER coefficient to the auxiliary list for the matching grid
                    aux_list_2 = np.append(aux_list_2, lat_lon_feer2[n, 2])
    
    # Stack the 0.5° x 0.5° grid coordinates and their corresponding FEER coefficients
    lat_lon_feer = np.column_stack((lat_lon_feer, aux_list_2))
    matrix = lat_lon_feer  # Final matrix of latitudes, longitudes, and FEER coefficients

    # Create a mask for valid values within the entire ROI (used to select corresponding data from Full disk matrix)
    I = np.where((rlat >= min_lat) & (rlat <= max_lat) & (rlon >= min_lon) & (rlon <= max_lon))
    index_list = []
    index_list.insert(0, I)

    # Repeat the process for each element of the 0.5° x 0.5° grid to find corresponding indices in the satellite data
    for k in range(0, len(matrix)):
        aux1 = np.where((rlat >= matrix[k, 0] - 0.25) & (rlat <= matrix[k, 0] + 0.25) & 
                        (rlon >= matrix[k, 1] - 0.25) & (rlon <= matrix[k, 1] + 0.25))
        index_list.insert(k + 1, aux1)

    return index_list, matrix  # Return the list of indices and the final grid matrix
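
The nested loop in `get_indexes_v3` pairs every 0.5° cell with the 1° cell that contains it. A more direct way to express the same relationship is sketched below in plain Python; it is an alternative illustration, not the notebook's method. It assumes the 1° cells are aligned on whole degrees, as the ROI bounds used here imply, and the helper names `cell_centers` and `parent_center` are hypothetical.

```python
import math

def cell_centers(lower, upper, step):
    """Centers of consecutive cells of width `step` covering [lower, upper]."""
    count = round((upper - lower) / step)
    return [lower + step / 2 + i * step for i in range(count)]

def parent_center(value):
    """Center of the whole-degree-aligned 1° cell that contains `value`."""
    return math.floor(value) + 0.5

# Amazon ROI from the notebook: latitude -11..-3, longitude -72..-48
lat_centers = cell_centers(-11, -3, 0.5)
lon_centers = cell_centers(-72, -48, 0.5)

# Each 0.5° cell inherits the coefficient of the 1° cell it falls in
pairs = [(lat, parent_center(lat)) for lat in lat_centers[:4]]
```

With a parent center in hand, the coefficient lookup becomes a dictionary or DataFrame lookup keyed on the 1° cell, which avoids comparing every fine cell against every coarse cell.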

In [None]:
# Function to collect and save file names in a list for the given period of interest with error handling
# This function iterates over the specified year, day, and hour ranges and collects the corresponding file names
# from the NOAA GOES-16 directory structure, with added error handling for missing files.

def get_files(s_year, e_year, s_day, e_day, s_hour, e_hour):
    print('Getting file names')
    aux = []  # List to store the file names
    # Loop over the years in the specified range
    for y in range(s_year, e_year + 1):
        # Loop over the days in the specified range (e_day itself is excluded)
        for d in range(s_day, e_day):
            # 'd' is the day of year and 'j' the UTC hour of the product directory; e_hour is also excluded
            for j in range(s_hour, e_hour):
                try:
                    # List the files for a specific year, day, and hour directory
                    # These directories contain 6 files for each 10-minute interval (e.g., 15:00, 15:10, ..., 15:50 UTC)
                    FD = fs.ls('noaa-goes16/ABI-L2-FDCF/' + str(y) + '/' + str(d).zfill(3) + '/' + str(j).zfill(2) + '/')
                    aux = np.append(aux, FD)  # Append the found files to the list
                except FileNotFoundError as e:
                    # If the directory for this hour does not exist, print an error message and skip it
                    print(f"FileNotFoundError for directory {'noaa-goes16/ABI-L2-FDCF/'+str(y)+'/'+str(d).zfill(3)+'/'+str(j).zfill(2)+'/'}: {e}. Skipping this hour.")
                    continue  # Move on to the next hour
    return aux
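
The directory layout that `get_files` walks is `bucket/product/year/day-of-year/hour/`, with the day padded to three digits and the hour to two. The sketch below builds those prefixes without touching S3, which can help when checking a date range before listing anything. The helper name `hour_prefixes` is hypothetical, and its `last_day` is inclusive, unlike `range(s_day, e_day)` in `get_files`, which stops before `e_day`.

```python
def hour_prefixes(year, first_day, last_day, hours,
                  product="ABI-L2-FDCF", bucket="noaa-goes16"):
    """Build one S3 prefix per (day of year, UTC hour); last_day is inclusive."""
    return [f"{bucket}/{product}/{year}/{day:03d}/{hour:02d}/"
            for day in range(first_day, last_day + 1)
            for hour in hours]

# All of August 2020 (days 214 to 244) at 15 UTC
prefixes = hour_prefixes(2020, 214, 244, hours=[15])
```

Each prefix can then be passed to `fs.ls(...)` inside the same `try`/`except FileNotFoundError` block used above.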

In [None]:
# Main function that processes the input files and computes emission flux estimates for the region of interest (ROI)
# The function calculates the emission flux for FRP and burned area using the FEER coefficients and stores the results
# in an array, to be later used for spatial and temporal analysis.
def process_data_v7(rlat, rlon, files, matrix, indexes):
    # Accumulate one entry per (file, grid element) so that the results of every file are kept
    array_lat, array_lon, array_mean_flux, array_mean_TPM = [], [], [], []
    array_CO2_flux, array_CO_flux, array_CH4_flux = [], [], []

    # Loop through the list of files to process each one
    for i in range(0, len(files)):
        try:
            # Open the current file using xarray and read the data
            with fs.open(files[i], 'rb') as f:
                ds = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')

            # Extract date and time information from the file path for later use in plots or metadata
            prodbase = files[i].split('/')[5][:23]
            starttime = files[i].split(prodbase)[1].split('_')[0]
            year, julian, hhmm = starttime[:4], starttime[4:7], starttime[7:11]
            plottitle = year + ',' + julian + ',' + hhmm
            fpart = starttime + ',' + plottitle
            print(f'Processing year: {year}, day: {julian}, hour: {hhmm}')  # Processing status message

            # Express the observation time as a fractional day of year for easier plotting
            code = int(julian) + (int(hhmm) // 100) / 24 + (int(hhmm) % 100) / 60 / 24

            # Extract the FRP (Fire Radiative Power) and Burned Area data from the file
            P = np.array(ds.Power)  # FRP Matrix data
            A = np.array(ds.Area)   # Burned Area Matrix data

            # indexes[0] covers the whole ROI, so grid element k uses indexes[k + 1]
            for k in range(0, len(matrix)):
                P_box_amazon = P[indexes[k + 1]]
                A_box_amazon = A[indexes[k + 1]]

                # Calculate the flux as FRP divided by the burned area (P / A) and drop NaN values
                P_A_box_amazon = P_box_amazon / A_box_amazon
                array_F_box_amazon = P_A_box_amazon[~np.isnan(P_A_box_amazon)]

                # An empty array yields a NaN mean; silence the warning and handle the NaN below
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore", category=RuntimeWarning)
                    mean_flux = np.mean(array_F_box_amazon)  # Mean flux of FRP
                    RE_flux = np.mean(array_F_box_amazon * matrix[k, 2])  # Mean flux of TPM
                    CO2_flux = np.mean(array_F_box_amazon * Ce_CO2 * matrix[k, 2])  # Mean CO2 flux
                    CO_flux = np.mean(array_F_box_amazon * Ce_CO * matrix[k, 2])  # Mean CO flux
                    CH4_flux = np.mean(array_F_box_amazon * Ce_CH4 * matrix[k, 2])  # Mean CH4 flux

                # If the computed fluxes are NaN (no fire pixels in the cell), set them to 0
                mean_flux = 0 if np.isnan(mean_flux) else mean_flux
                RE_flux = 0 if np.isnan(RE_flux) else RE_flux
                CO2_flux = 0 if np.isnan(CO2_flux) else CO2_flux
                CO_flux = 0 if np.isnan(CO_flux) else CO_flux
                CH4_flux = 0 if np.isnan(CH4_flux) else CH4_flux

                # Append the results for this grid element
                array_lat.append(matrix[k, 0])  # Latitude of the grid element
                array_lon.append(matrix[k, 1])  # Longitude of the grid element
                array_mean_flux.append(mean_flux)
                array_mean_TPM.append(RE_flux)
                array_CO2_flux.append(CO2_flux)
                array_CO_flux.append(CO_flux)
                array_CH4_flux.append(CH4_flux)

        # Report files that cannot be opened or read, then continue with the next file
        except OSError as error:
            print(error)

    # Return the results as arrays containing latitude, longitude, and the corresponding fluxes for each species
    return (np.array(array_lat), np.array(array_lon), np.array(array_mean_flux), np.array(array_mean_TPM),
            np.array(array_CO2_flux), np.array(array_CO_flux), np.array(array_CH4_flux))
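
The date fields that `process_data_v7` slices out of the file name can also be parsed into a `datetime`. The sketch below assumes the usual GOES-R naming, in which the `_s` field holds the scan start as year, day of year, hour, minute, second, and tenths of a second; the helper name `parse_start_time` and the example file name are illustrative, not taken from an actual bucket listing.

```python
import re
from datetime import datetime

def parse_start_time(path):
    """Parse the scan start time from the _sYYYYJJJHHMMSSt_ field of a GOES-R file name."""
    match = re.search(r"_s(\d{13})\d_", path)
    if match is None:
        raise ValueError(f"no start-time field found in {path!r}")
    # %j is the day of year; the trailing tenths-of-second digit is dropped
    return datetime.strptime(match.group(1), "%Y%j%H%M%S")

# Illustrative file name following the naming pattern
example = ("noaa-goes16/ABI-L2-FDCF/2020/200/15/"
           "OR_ABI-L2-FDCF-M6_G16_s20202001500196_e20202001509504_c20202001510013.nc")

start = parse_start_time(example)
# Fraction of the day elapsed at the scan start (hours and minutes only)
day_fraction = (start.hour + start.minute / 60) / 24
```

Working with a `datetime` avoids the fixed slice positions and makes the month of each file available directly through `start.month`.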

In [None]:
###############################################################################
# Starting message
print('Compiling statistics')

# Retrieve latitude and longitude data using the get_lat_lon function
rlat, rlon = get_lat_lon(fs)

# Get the matrix and indexes for the geographic area defined by minlon, maxlon, minlat, maxlat
Indexes, M = get_indexes_v3(minlon, maxlon, minlat, maxlat, rlat, rlon, feer_data)
print('Got indexes and matrix')

# List of month names from the calendar library (excluding the first element, which is empty)
month_names = list(calendar.month_name)[1:]

# Initialize an empty list to store the matrix of days for each month
matrix_days_months = []

# Initialize julian_day to 1 (starting point for day counting)
julian_day = 1

# Loop through each month to populate the matrix with month names and corresponding Julian days
for mes_num in range(1, 13):
    # Get the last day of the current month using the calendar.monthrange function
    last_day_month = calendar.monthrange(Year, mes_num)[1]
    
    # Generate an array of Julian days for the current month
    Julian_days = np.arange(julian_day, julian_day + last_day_month)

    # Append the month name and Julian days to the matrix
    matrix_days_months.append([month_names[mes_num - 1], Julian_days])

    # Update julian_day for the next month
    julian_day += last_day_month

# Convert the matrix of days/months to a numpy array
matrix_days_months = np.array(matrix_days_months, dtype=object)

# Loop through the selected months. matrix_days_months is 0-indexed, so indices 7 and 8
# correspond to August and September
for m in range(7, 9):
    # Define the start and end year, days, and hours for data collection based on the selected month.
    # get_files excludes the end day and end hour, so the end day is the last day of the month plus one
    start_year, end_year, start_day, end_day, start_hour, end_hour = Year, Year, \
        matrix_days_months[m, 1][0], matrix_days_months[m, 1][-1] + 1, 15, 16

    # List the data files corresponding to the start and end definitions
    data_list = get_files(start_year, end_year, start_day, end_day, start_hour, end_hour)
    print('Data listed for ' + str(matrix_days_months[m, 0]))

    # Change the working directory to the output directory
    os.chdir(datadir)

    # Call the process_data_v7 function to retrieve emission flux data for the defined period
    array_lat, array_lon, array_mean_flux, array_mean_TPM, array_CO2_flux, array_CO_flux, array_CH4_flux = process_data_v7(rlat, rlon, data_list, M, Indexes)

    # Get unique latitude and longitude values from the results
    lats = np.unique(array_lat)
    lons = np.unique(array_lon)
    
    # Reverse the latitude array to align with the correct geographic orientation
    lats = lats[::-1]

    # Loop through each grid element (latitude and longitude) to calculate monthly averages
    for k in range(0, len(lats)):
        for n in range(0, len(lons)):
            # Find the index of the current grid element in the data arrays
            index = np.where((array_lat == lats[k]) & (array_lon == lons[n]))

            # Calculate the monthly average for each emission flux type
            mean_flux_monthly = np.mean(array_mean_flux[index])
            mean_TPM_monthly = np.mean(array_mean_TPM[index])
            mean_CO2_monthly = np.mean(array_CO2_flux[index])
            mean_CO_monthly = np.mean(array_CO_flux[index])
            mean_CH4_monthly = np.mean(array_CH4_flux[index])

            # Format the results as a string to be written to the output file
            outstring = str(Year) + ',' + str(matrix_days_months[m, 0]) + ',' + str(lats[k]) + ',' + str(lons[n]) + ',' \
                        + str(mean_flux_monthly) + ',' + str(mean_TPM_monthly) + ',' + str(mean_CO2_monthly) + ',' \
                        + str(mean_CO_monthly) + ',' + str(mean_CH4_monthly) + '\n'
            
            # Write the formatted string to the output file
            outfn.writelines(outstring)

# Close the output file after processing
outfn.close()

# Print the completion message
print('Done')
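
The month table built above maps each month to its days of the year. The same bounds can be obtained for a single month with the standard library, as in the sketch below; the helper name `month_day_of_year_range` is hypothetical, and the month number is 1-based, unlike the 0-based row index of `matrix_days_months`.

```python
import calendar
from datetime import date

def month_day_of_year_range(year, month):
    """Return (first, last) day of year for a 1-based month number, both inclusive."""
    first = date(year, month, 1).timetuple().tm_yday
    last = first + calendar.monthrange(year, month)[1] - 1
    return first, last

august_2020 = month_day_of_year_range(2020, 8)
august_2021 = month_day_of_year_range(2021, 8)
```

The two results differ by one day because 2020 is a leap year, which is why the table has to be rebuilt whenever `Year` changes.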