# Fire Emission Estimates from GOES-16 Data: FRP and Temperature

This script processes fire data retrieved from GOES-16 geostationary satellite observations. It estimates fire-related emissions of TPM (Total Particulate Matter), CO2, CO, and CH4 from the Fire Radiative Power (FRP), and reports the fire temperature alongside them. The script reads the data, computes the emissions using the FEER (Fire Energetics and Emissions Research) coefficients, and saves the results for each grid cell of the region of interest in a CSV file.

### Key Steps in the Script:

1. **File Collection**: 
   - Collects a list of files within a specified time range (year, day, and hour).
   - Each file contains data on fire radiative power (FRP), temperature, and other related variables.

2. **Indexing**:
   - Retrieves the relevant latitude and longitude coordinates for the region of interest.
   - Extracts the indexes and matrix for the selected grid using geographic bounds (longitude and latitude).

3. **Data Processing**:
   - For each file, reads the FRP and temperature data.
   - Uses the indexes matrix to extract the relevant data points within the defined grid area.
   - Calculates emission estimates for Total Particulate Matter (TPM), CO2, CO, and CH4 using the FEER coefficients.
   - Calculates uncertainty values for the CO2, CO, and CH4 emission estimates.

4. **Result Compilation**:
   - Compiles results into a string format that includes the emission estimates and associated uncertainties.
   - Stores the compiled results in a CSV file for further analysis.

5. **Output**:
   - A CSV file is generated with one row per grid cell and scan, containing the FRP, the temperature, and the calculated emissions (TPM, CO2, CO, CH4), with uncertainties for CO2, CO, and CH4.

This script enables the analysis of fire emissions over time and space, contributing to better understanding the impacts of biomass burning on air quality and climate.
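
The arithmetic behind steps 3 and 4 can be sketched on its own. The snippet below is a minimal illustration, not the notebook's implementation: the FRP values and the TPM coefficient `ce_tpm` are made-up numbers, the function name is hypothetical, and the 600 s factor assumes one scan every 10 minutes, as the processing code does.

```python
SCAN_SECONDS = 600  # assumed interval between consecutive scans (10 minutes)

def emissions_from_frp(frp_mw, ce_tpm, ef_species, ef_tpm):
    """Return (FRE in MJ, TPM mass in kg, species mass in kg) for one grid cell.

    frp_mw     -- FRP values in MW for the fire pixels in the cell
    ce_tpm     -- TPM emission coefficient in kg/MJ (hypothetical value here)
    ef_species -- emission factor of the species in g/kg burned
    ef_tpm     -- emission factor of TPM in g/kg burned
    """
    fre = sum(p * SCAN_SECONDS for p in frp_mw)   # MW * s = MJ
    tpm = ce_tpm * fre                            # kg/MJ * MJ = kg
    species = (ef_species / ef_tpm) * tpm         # scale TPM by the EF ratio
    return fre, tpm, species

# Two illustrative fire pixels of 10 MW and 20 MW, with the CO2 and TPM factors
fre, tpm, co2 = emissions_from_frp([10.0, 20.0], ce_tpm=0.02,
                                   ef_species=1620, ef_tpm=8.7)
print(fre, tpm, co2)
```

Temperature does not enter this calculation; the script only averages it per grid cell.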


In [None]:
# Import necessary libraries
import os  # Provides functions to interact with the operating system, such as file management
from io import BytesIO  # Used for working with byte data in memory, often for handling file-like objects
import s3fs  # Allows interaction with AWS S3 file systems
import xarray as xr  # Provides advanced data structures and methods for working with multi-dimensional arrays
import numpy as np  # Used for numerical operations, such as arrays, matrices, and mathematical functions
import glob  # Used to retrieve files and directories matching a specified pattern
from pyproj import Proj  # A Python interface for the PROJ library, used for coordinate transformations
import pandas as pd  # Provides data structures and functions for data manipulation and analysis
import warnings  # Allows for issuing warning messages in the code when necessary

In [None]:
# Define input and output directories and file names
datadir = '/...'  # Directory where the processed CSV file will be saved

# Collect the FEER (Fire Energetics and Emissions Research) emission coefficient CSV files
data = sorted(glob.glob(datadir+'/FEER*.csv'))  # Retrieves all CSV files starting with 'FEER' in the data directory
feer_data = pd.read_csv(data[0])  # Reads the first CSV file into a pandas DataFrame containing emission coefficients

# Define output directory, the year to process the data, and the output file name
# It is recommended to run this algorithm one year at a time
Year = 2020  # The specific year of data to process
outfile = os.path.join(datadir, 'goes_data_emission_rate_test_'+str(Year)+'_150_350.csv')  # Output file path, including year and other details

In [None]:
# Define constants for emission factors and species emission calculations

# According to Nguyen and Wooster, 2020, the species emission coefficient can be calculated as:
# Ce_species = (EF_species / EF_TPM) * Ce_TPM
# where EF_species is the emission factor for the species and EF_TPM is the emission factor for TPM (Total Particulate Matter)

# Emission factors (EF) for CO2, CO, CH4, and TPM in tropical forests (from Andreae, 2019)
EF_CO2 = 1620  # Emission factor for CO2 in g/kg_burned
s_EF_CO2 = 70  # Standard deviation of the CO2 emission factor

EF_CO = 104  # Emission factor for CO in g/kg_burned
s_EF_CO = 39  # Standard deviation of the CO emission factor

EF_CH4 = 6.5  # Emission factor for CH4 in g/kg_burned
s_EF_CH4 = 1.6  # Standard deviation of the CH4 emission factor

EF_TPM = 8.7  # Emission factor for TPM in g/kg_burned
s_EF_TPM = 3.1  # Standard deviation of the TPM emission factor

# Calculate the dimensionless species-to-TPM emission factor ratios for CO2, CO, and CH4
# Multiplying a ratio by the FEER TPM coefficient (kg/MJ) gives the species emission coefficient in kg/MJ
Ce_CO2 = (EF_CO2 / EF_TPM)  # EF ratio for CO2
Ce_CO = (EF_CO / EF_TPM)    # EF ratio for CO
Ce_CH4 = (EF_CH4 / EF_TPM)  # EF ratio for CH4

# Calculate the uncertainty (sigma) in the emission coefficients using error propagation
sigma_Ce_CO2 = np.sqrt((s_EF_CO2 / EF_TPM) ** 2 + (EF_CO2 * s_EF_TPM / (EF_TPM) ** 2) ** 2)
sigma_Ce_CO = np.sqrt((s_EF_CO / EF_TPM) ** 2 + (EF_CO * s_EF_TPM / (EF_TPM) ** 2) ** 2)
sigma_Ce_CH4 = np.sqrt((s_EF_CH4 / EF_TPM) ** 2 + (EF_CH4 * s_EF_TPM / (EF_TPM) ** 2) ** 2)

# Define the Region of Interest (ROI) in degrees (latitude and longitude)
# The coordinates below correspond to the Amazon region
minlon, maxlon, minlat, maxlat = -72, -48, -11, -3  # Coordinates for Amazon region ROI
# For example, coordinates for the Cerrado region could be:
# minlon, maxlon, minlat, maxlat = -57.5, -56.5, -17.5, -16.5
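
The three `sigma_Ce_*` expressions above share one pattern: first-order error propagation for a ratio of two quantities. A minimal sketch of that pattern follows, assuming the two emission factors are uncorrelated; the helper name `ratio_with_sigma` is hypothetical and not part of the script.

```python
import math

def ratio_with_sigma(a, sigma_a, b, sigma_b):
    """Return a/b and its standard deviation, assuming a and b are uncorrelated."""
    ratio = a / b
    # Partial derivatives: d(a/b)/da = 1/b and d(a/b)/db = -a/b**2
    sigma = math.sqrt((sigma_a / b) ** 2 + (a * sigma_b / b ** 2) ** 2)
    return ratio, sigma

# CO2 and TPM emission factors used in the cell above
ratio_co2, sigma_co2 = ratio_with_sigma(1620, 70, 8.7, 3.1)

# Equivalent relative form: (sigma/ratio)^2 = (sigma_a/a)^2 + (sigma_b/b)^2
relative = math.sqrt((70 / 1620) ** 2 + (3.1 / 8.7) ** 2)
print(ratio_co2, sigma_co2, relative)
```

The relative form shows that the uncertainty of the ratio is dominated by the TPM emission factor, whose relative standard deviation is much larger than that of CO2.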

In [None]:
# Define the file header for the output CSV (the first column, 'sat', receives the scan start timestamp parsed from the file name)
aux1 = 'sat,year,julian,hhmm,code,central_lat,central_lon,FRP(MW),N_FRP,FRE(MJ),RE(kg/s),ME(kg),CO_2(kg),sigma_CO_2(kg),CO(kg),sigma_CO(kg),CH4(kg),sigma_CH4(kg),mean_FRP(MW),mean_temp(K)\n'  
# Define the header format for the output CSV file
header = aux1  # The header consists of column names separated by commas
outstring = ''  # Initialize an empty string to accumulate data (if needed)

# Open the output file in write mode
outfn = open(outfile, 'w')  
outfn.writelines(header)  # Write the header to the output file

# Initialize the S3 file system for interacting with AWS S3
fs = s3fs.S3FileSystem(anon=True)  # Allows access to S3 buckets without authentication

In [None]:
# Initialize geometric variables to extract satellite-specific data for the calculation of latitudes and longitudes

# Function to extract latitude and longitude based on satellite coordinates
def get_lat_lon(file_system):
    # List files from the specified directory in the S3 bucket corresponding to a specific day and time
    files = file_system.ls('noaa-goes16/ABI-L2-FDCF/2020/'+str(200).zfill(3)+'/'+str(15).zfill(2)+'/')  
    # The list contains 6 files for day 200 of 2020 (for UTC times 15:00, 15:10, 15:20, ..., 15:50)

    # Open the first file in the list to extract the dataset (using the file system passed as argument)
    with file_system.open(files[0], 'rb') as f:
        ds0 = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')  # Load the dataset into xarray

    # Extract the satellite parameters from the dataset (projection height, origin longitude, and sweep axis)
    sat_h = ds0.goes_imager_projection.perspective_point_height  # Satellite height (distance from Earth surface)
    sat_lon = ds0.goes_imager_projection.longitude_of_projection_origin  # Longitude of the projection origin
    sat_sweep = ds0.goes_imager_projection.sweep_angle_axis  # Sweep axis for satellite projection

    # Use the PyProj library to set up the geostationary projection based on extracted satellite parameters
    p = Proj(proj='geos', h=sat_h, lon_0=sat_lon, sweep=sat_sweep)  # Define the projection for the satellite view

    # Calculate the X and Y coordinates in the satellite’s coordinate system (multiplying by satellite height)
    X = np.array(ds0.x) * sat_h
    Y = np.array(ds0.y) * sat_h

    # Create a meshgrid for the X and Y coordinates to represent a grid of satellite points
    XX, YY = np.meshgrid(X, Y)

    # Transform the satellite coordinates (XX, YY) to latitudes and longitudes using the geostationary projection
    rlon, rlat = p(XX, YY, inverse=True)

    return rlat, rlon  # Return the calculated latitudes and longitudes
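
As a cross-check on the `pyproj` inverse transform above, the same mapping can be written out for a single pixel. This is a sketch adapted from the fixed-grid navigation equations in the GOES-R Product Definition and Users' Guide and should be verified against that document before use. The constants are typical GOES-16 values and are assumptions here; in practice they should be read from `ds0.goes_imager_projection`. The function name is hypothetical.

```python
import math

# Assumed typical GOES-16 projection parameters (read them from the file in practice)
R_EQ = 6378137.0       # semi-major axis of the ellipsoid (m)
R_POL = 6356752.31414  # semi-minor axis of the ellipsoid (m)
SAT_H = 35786023.0     # perspective point height above the ellipsoid (m)
LON_0 = -75.0          # longitude of the projection origin (degrees)

def scan_to_lat_lon(x, y):
    """Convert scan angles x, y (radians) to (lat, lon) in degrees.

    Returns (nan, nan) when the line of sight misses the Earth's disk.
    """
    H = SAT_H + R_EQ  # distance from the Earth's center to the satellite
    a = math.sin(x) ** 2 + math.cos(x) ** 2 * (
        math.cos(y) ** 2 + (R_EQ ** 2 / R_POL ** 2) * math.sin(y) ** 2)
    b = -2.0 * H * math.cos(x) * math.cos(y)
    c = H ** 2 - R_EQ ** 2
    disc = b ** 2 - 4.0 * a * c
    if disc < 0:
        return float('nan'), float('nan')
    r_s = (-b - math.sqrt(disc)) / (2.0 * a)  # distance satellite -> surface point
    s_x = r_s * math.cos(x) * math.cos(y)
    s_y = -r_s * math.sin(x)
    s_z = r_s * math.cos(x) * math.sin(y)
    lat = math.atan((R_EQ ** 2 / R_POL ** 2) * s_z / math.hypot(H - s_x, s_y))
    lon = LON_0 - math.degrees(math.atan(s_y / (H - s_x)))
    return math.degrees(lat), lon

print(scan_to_lat_lon(0.0, 0.0))  # sub-satellite point
```

Here `x` and `y` are the raw scan angles stored in the file, that is, the values of `ds0.x` and `ds0.y` before they are multiplied by the satellite height for `pyproj`.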

In [None]:
# Function to create a grid of 0.5° x 0.5° and assign the corresponding FEER coefficients to each grid element within the Region of Interest (ROI)
def get_indexes_v3(min_lon, max_lon, min_lat, max_lat, rlat, rlon, dados_feer):
    # Calculate the center of latitude and longitude for each element in a 0.5° x 0.5° grid inside the ROI
    centers_lon = np.linspace(min_lon + 0.25, max_lon - 0.25, num=int(max_lon - min_lon) * 2)  # 0.5° grid
    centers_lat = np.linspace(min_lat + 0.25, max_lat - 0.25, num=int(max_lat - min_lat) * 2)
    
    # Calculate the center of latitude and longitude for each element in a 1° x 1° grid (used to look up the FEER coefficients)
    centers_lon2 = np.linspace(min_lon + 0.5, max_lon - 0.5, num=int(max_lon - min_lon))  # 1° grid
    centers_lat2 = np.linspace(min_lat + 0.5, max_lat - 0.5, num=int(max_lat - min_lat))

    aux_list = []
    
    # Calculate the matching FEER coefficient for each element of the 1° x 1° grid
    for i in range(0, len(centers_lat2)):
        # Extract the FEER coefficients for the current latitude and longitude bounds from the 'dados_feer' DataFrame
        df2 = dados_feer.loc[(dados_feer['Latitude'] == centers_lat2[i]) & 
                             (dados_feer['Longitude'] <= max_lon) &
                             (dados_feer['Longitude'] >= min_lon), 'Ce_850'].to_numpy()
        
        # Append the corresponding FEER coefficients to the list
        aux_list = np.append(aux_list, df2)
        
        if i == 0:
            # Create a coordinate pair matrix for the 1° x 1° grid (latitudes and longitudes)
            aux2 = np.repeat(centers_lat2[i], len(centers_lon2))
            lat_lon_feer2 = np.column_stack((aux2, centers_lon2))
        else:
            aux2 = np.repeat(centers_lat2[i], len(centers_lon2))
            aux2 = np.column_stack((aux2, centers_lon2))
            lat_lon_feer2 = np.vstack((lat_lon_feer2, aux2))
    
    # Stack the matching FEER coefficients into the coordinate matrix for the 1° x 1° grid
    lat_lon_feer2 = np.column_stack((lat_lon_feer2, aux_list))

    # Calculate the FEER coefficient corresponding to each element of the 0.5° x 0.5° grid
    for i in range(0, len(centers_lat)):
        if i == 0:
            aux = np.repeat(centers_lat[i], len(centers_lon))
            lat_lon_feer = np.column_stack((aux, centers_lon))
        else:
            aux = np.repeat(centers_lat[i], len(centers_lon))
            aux2 = np.column_stack((aux, centers_lon))
            lat_lon_feer = np.vstack((lat_lon_feer, aux2))

    # Match each 0.5° x 0.5° grid element to the 1° x 1° FEER cell that contains it (their centers differ by exactly ±0.25° in both latitude and longitude)
    aux_list_2 = []
    for j in range(0, len(lat_lon_feer)):
        for n in range(0, len(lat_lon_feer2)):
            if ((lat_lon_feer[j, 0] == lat_lon_feer2[n, 0] + 0.25) or (lat_lon_feer[j, 0] == lat_lon_feer2[n, 0] - 0.25)):
                if ((lat_lon_feer[j, 1] == lat_lon_feer2[n, 1] + 0.25) or (lat_lon_feer[j, 1] == lat_lon_feer2[n, 1] - 0.25)):
                    # Append the FEER coefficient to the auxiliary list for the matching grid
                    aux_list_2 = np.append(aux_list_2, lat_lon_feer2[n, 2])
    
    # Stack the 0.5° x 0.5° grid coordinates and their corresponding FEER coefficients
    lat_lon_feer = np.column_stack((lat_lon_feer, aux_list_2))
    matrix = lat_lon_feer  # Final matrix of latitudes, longitudes, and FEER coefficients

    # Create a mask for valid values within the entire ROI (used to select corresponding data from Full disk matrix)
    I = np.where((rlat >= min_lat) & (rlat <= max_lat) & (rlon >= min_lon) & (rlon <= max_lon))
    index_list = []
    index_list.insert(0, I)

    # Repeat the process for each element of the 0.5° x 0.5° grid to find corresponding indices in the satellite data
    for k in range(0, len(matrix)):
        aux1 = np.where((rlat >= matrix[k, 0] - 0.25) & (rlat <= matrix[k, 0] + 0.25) & 
                        (rlon >= matrix[k, 1] - 0.25) & (rlon <= matrix[k, 1] + 0.25))
        index_list.insert(k + 1, aux1)

    return index_list, matrix  # Return the list of indices and the final grid matrix
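
The cell lookup above compares cell centers with `==` and loops over every pair of cells. An alternative, shown here only as a sketch and not as the notebook's method, is to compute the cell index of a point directly with a floor division. The function names are hypothetical, and the example point is arbitrary.

```python
import math

def cell_index(lat, lon, min_lat, min_lon, size=0.5):
    """Return (row, col) of the size-degree cell that contains (lat, lon)."""
    row = math.floor((lat - min_lat) / size)
    col = math.floor((lon - min_lon) / size)
    return row, col

def parent_cell(row, col, factor=2):
    """Map a fine-grid (row, col) to the coarse cell that contains it."""
    return row // factor, col // factor

# A point inside the Amazon ROI used in this notebook (min_lat=-11, min_lon=-72)
row, col = cell_index(-10.6, -71.3, min_lat=-11, min_lon=-72)
print((row, col), parent_cell(row, col))
```

With this scheme each point belongs to exactly one cell, whereas the inclusive `>=` and `<=` bounds used above place a pixel lying exactly on a shared edge in both neighboring cells.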

In [None]:
# Function to collect, in a list, the files that fall within the requested period of interest
def get_files(s_year, e_year, s_day, e_day, s_hour, e_hour):
    print('Getting file names')
    aux = []  # List to store the file paths
    
    # Iterate over the years from start to end year (inclusive)
    for y in range(s_year, e_year + 1):
        # Iterate over the days of the year
        for d in range(s_day, e_day + 1):
            # Iterate over the hours of the day (e_hour is exclusive, unlike e_year and e_day)
            for j in range(s_hour, e_hour):
                # Build the directory path based on year, day (zero-padded), and hour (zero-padded)
                FD = fs.ls('noaa-goes16/ABI-L2-FDCF/' + str(y) + '/' + str(d).zfill(3) + '/' + str(j).zfill(2) + '/')
                # Append the list of files found in that directory to the aux list
                aux = np.append(aux, FD)

    # Return the collected list of file paths
    return aux
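
The directory names that `get_files` lists can be previewed without touching S3. The helper below is a hypothetical sketch that only builds the prefix strings for one year; it mirrors the loop bounds above, so the end day is inclusive and the end hour is exclusive.

```python
def build_prefixes(year, s_day, e_day, s_hour, e_hour,
                   product='ABI-L2-FDCF', bucket='noaa-goes16'):
    """Return the S3 prefixes (one per day and hour) for the requested period."""
    return [f'{bucket}/{product}/{year}/{day:03d}/{hour:02d}/'
            for day in range(s_day, e_day + 1)   # end day is inclusive
            for hour in range(s_hour, e_hour)]   # end hour is exclusive

prefixes = build_prefixes(2020, 150, 151, 15, 16)
for prefix in prefixes:
    print(prefix)
```

Each prefix could then be passed to `fs.ls(...)`, as the function above does.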

In [None]:
#Main function. It accesses the files and collects the FRP and temperature products,
#calculates the emission estimates using the FEER coefficients and the FRP,
#and saves the spatial and temporal information and the results in a CSV file
def process_data_v6(rlat,rlon,files,matrix,indexes):

    #Change to the directory where the CSV file is saved
    os.chdir(datadir)
    #Loop through the files selected previously
    for i in range(0,len(files)):
      with fs.open(files[i], 'rb') as f:
        #Open the files. The ds variable contain all the information inside the netCDF files
        ds = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')
        try:
            #Gather the date-time information from the file
            prodbase = files[i].split('/')[5][:23]
            starttime=files[i].split(prodbase)[1].split('_')[0]
            year,julian,hhmm=starttime[:4],starttime[4:7],starttime[7:11]
            plottitle=year+','+julian+','+hhmm
            fpart = starttime+','+plottitle
            print('Processing year: {}, day: {}, hour: {}'.format(year,julian,hhmm)) #Processing Message
            #The variable code expresses the date and time as a fractional day of the year to facilitate future plots
            #Integer division extracts the hour; float division here would count the minutes twice
            code = int(julian)+(int(hhmm)//100)/24+(int(hhmm) % 100)/60/24

            #####################################################################
            P = np.array(ds.Power) #FRP Matrix data
            T = np.array(ds.Temp)  #Temperature Matrix data
            #A = np.array(ds.Area) #Burned area Matrix data

            #Use the indexes matrix to collect the data only for the area of each element of the 0.5x0.5 grid
            P_box_amazon = P[indexes[1]]
            T_box_amazon = T[indexes[1]]

            #The elements selected with the index matrix do not need to maintain their matrix structure,
            #so the valid elements are flattened into an array to reduce processing demand
            array_P_box_amazon = P_box_amazon[~np.isnan(P_box_amazon)]
            array_T_box_amazon = T_box_amazon[~np.isnan(T_box_amazon)]


            lat = matrix[0,0] #latitude of the element of the 0.5x0.5 grid
            lon = matrix[0,1] #longitude of the element of the 0.5x0.5 grid
            sum_frp = np.sum(array_P_box_amazon) #Total FRP on the grid element
            N_frp = len(array_P_box_amazon) #Number of FRP data on the grid element
            FRE = np.sum(array_P_box_amazon*600) #Total FRE: FRP (MW) integrated over the 600 s between scans
            RE = np.sum(array_P_box_amazon*matrix[0,2]) #Multiplying the FRP by the FEER emission coefficient gives the RATE of emission of TPM
            ME = np.sum(array_P_box_amazon*matrix[0,2]*600) #Multiplying the FRE by the FEER emission coefficient gives the MASS of emission of TPM
            #Using the FEER coefficients and the tropical forest emission factors,
            #calculate the emission estimates of CO2, CO and CH4
            MCO2 = np.sum(array_P_box_amazon*Ce_CO2*matrix[0,2]*600)
            MCO = np.sum(array_P_box_amazon*Ce_CO*matrix[0,2]*600)
            MCH4 = np.sum(array_P_box_amazon*Ce_CH4*matrix[0,2]*600)

            #Using the uncertainties of the emission factors, calculate the uncertainties of the emission estimates
            s_RCO2 = array_P_box_amazon*sigma_Ce_CO2*matrix[0,2]
            s_RCO = array_P_box_amazon*sigma_Ce_CO*matrix[0,2]
            s_RCH4 = array_P_box_amazon*sigma_Ce_CH4*matrix[0,2]
            s_MCO2 = (s_RCO2*600)**2
            s_MCO = (s_RCO*600)**2
            s_MCH4 = (s_RCH4*600)**2
            s_sum_me_CO2 = np.sqrt(np.sum(s_MCO2))
            s_sum_me_CO = np.sqrt(np.sum(s_MCO))
            s_sum_me_CH4 = np.sqrt(np.sum(s_MCH4))

            #Calculate the means of FRP and temperature, suppressing the warnings raised when the grid element has no valid data
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", category=RuntimeWarning)
                mean_temp = np.nanmean(array_T_box_amazon)
                mean_frp = np.nanmean(array_P_box_amazon)
            #Set invalid results to -9999
            if np.isnan(mean_temp):
                mean_temp = -9999
            if np.isnan(mean_frp):
                mean_frp = -9999


            results = str(code) +','+ str(lat)+','+ str(lon)+','+ str(sum_frp)+','+ str(N_frp)+','\
                +str(FRE) +','+ str(RE)+','+ str(ME)+','+ str(MCO2)+','+ str(s_sum_me_CO2)+','\
                +str(MCO) +','+ str(s_sum_me_CO)+','+ str(MCH4)+','+ str(s_sum_me_CH4)+','+ str(mean_frp)+','+str(mean_temp)
            #Compose results in a string and save them
            outstring = fpart+','+results + '\n'
            outfn.writelines(outstring)

            #Loop through all the 0.5x0.5 grid elements repeating the process
            for k in range(1,len(matrix)):

                P_box_amazon = P[indexes[k+1]]
                array_P_box_amazon = P_box_amazon[~np.isnan(P_box_amazon)]
                T_box_amazon = T[indexes[k+1]]
                array_T_box_amazon = T_box_amazon[~np.isnan(T_box_amazon)]

                lat = matrix[k,0]
                lon = matrix[k,1]

                sum_frp = np.sum(array_P_box_amazon)
                N_frp = len(array_P_box_amazon)

                with warnings.catch_warnings():
                    warnings.simplefilter("ignore", category=RuntimeWarning)
                    mean_temp = np.nanmean(array_T_box_amazon)
                    mean_frp = np.nanmean(array_P_box_amazon)
                if np.isnan(mean_temp):
                    mean_temp = -9999
                if np.isnan(mean_frp):
                    mean_frp = -9999

                FRE = np.sum(array_P_box_amazon*600)
                RE = np.sum(array_P_box_amazon*matrix[k,2])
                ME = np.sum(array_P_box_amazon*matrix[k,2]*600)
                MCO2 = np.sum(array_P_box_amazon*Ce_CO2*matrix[k,2]*600)
                MCO = np.sum(array_P_box_amazon*Ce_CO*matrix[k,2]*600)
                MCH4 = np.sum(array_P_box_amazon*Ce_CH4*matrix[k,2]*600)


                s_RCO2 = array_P_box_amazon*sigma_Ce_CO2*matrix[k,2]
                s_RCO = array_P_box_amazon*sigma_Ce_CO*matrix[k,2]
                s_RCH4 = array_P_box_amazon*sigma_Ce_CH4*matrix[k,2]
                s_MCO2 = (s_RCO2*600)**2
                s_MCO = (s_RCO*600)**2
                s_MCH4 = (s_RCH4*600)**2
                s_sum_me_CO2 = np.sqrt(np.sum(s_MCO2))
                s_sum_me_CO = np.sqrt(np.sum(s_MCO))
                s_sum_me_CH4 = np.sqrt(np.sum(s_MCH4))

                # results = str(code)+','+ str(lat)+','+ str(lon)+','+ str(sum_frp)+','+ str(N_frp)
                results = str(code) +','+ str(lat)+','+ str(lon)+','+ str(sum_frp)+','+ str(N_frp)+','\
                    +str(FRE) +','+ str(RE)+','+ str(ME)+','+ str(MCO2)+','+ str(s_sum_me_CO2)+','\
                    +str(MCO) +','+ str(s_sum_me_CO)+','+ str(MCH4)+','+ str(s_sum_me_CH4)+','+ str(mean_frp)+','+str(mean_temp)
                #Compose results in a string and save them
                outstring = fpart+','+results + '\n'
                outfn.writelines(outstring) #Write the results on the CSV file

        #Catch the exception and print it as a warning, then continue with the next file
        except OSError as error:
             print(error)
    #Close the file
    outfn.close()
    return print('Done')
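
The date and time handling inside `process_data_v6` depends on fixed character positions in the file name. A regular expression is one way to make that step explicit. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and the example path only follows the naming pattern of the files, it is not a specific file known to exist.

```python
import re

# Start-time field of the file name: _sYYYYDDDHHMM followed by seconds and tenths
START_RE = re.compile(r'_s(\d{4})(\d{3})(\d{2})(\d{2})')

def parse_start(path):
    """Return (year, day_of_year, hour, minute, fractional_day) from a file path."""
    match = START_RE.search(path)
    if match is None:
        raise ValueError('no start-time field found in: ' + path)
    year, doy, hour, minute = (int(g) for g in match.groups())
    fractional_day = doy + hour / 24 + minute / 1440
    return year, doy, hour, minute, fractional_day

# Illustrative path built to match the naming pattern (not a verified file)
example = ('noaa-goes16/ABI-L2-FDCF/2020/150/15/'
           'OR_ABI-L2-FDCF-M6_G16_s20201501530208_e20201501539516_c20201501540040.nc')
print(parse_start(example))
```

The last value plays the role of the `code` column: the day of the year plus the fraction of the day elapsed at the scan start.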

In [None]:
###############################################################################
#Starting message
print('Compiling statistics')

rlat,rlon = get_lat_lon(fs)

Indexes,M = get_indexes_v3(minlon,maxlon,minlat,maxlat,rlat,rlon,feer_data)
print('Got indexes and matrix')

start_year,end_year,start_day,end_day,start_hour,end_hour = Year,Year,150,151,15,16
data_list = get_files(start_year,end_year,start_day,end_day,start_hour,end_hour)
print('Data listed')

print('Starting process data')
process_data_v6(rlat,rlon,data_list,M,Indexes)
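
When the resulting CSV is read back, the `-9999` sentinel written for grid elements without valid data has to be treated as missing before averaging. The following is a minimal sketch using only the standard library; the function name is hypothetical and the sample rows are made-up values with a reduced set of columns.

```python
import csv
import io

MISSING = '-9999'  # sentinel written by the script for invalid means

def read_rows(text):
    """Parse CSV text into dicts, replacing the sentinel with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: (None if value == MISSING else value)
             for key, value in row.items()}
            for row in reader]

# Made-up sample with three of the output columns
sample = ('code,mean_FRP(MW),mean_temp(K)\n'
          '150.625,-9999,-9999\n'
          '150.632,35.2,710.4\n')
rows = read_rows(sample)
valid = [float(r['mean_FRP(MW)']) for r in rows if r['mean_FRP(MW)'] is not None]
print(len(rows), valid)
```

With pandas, which the notebook already imports, passing `na_values=[-9999]` to `pd.read_csv` achieves the same result.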