## Comparison of reservoir emissions in Myanmar calculated explicitly with GeoCARET and ReEmission, and the estimates obtained by extrapolating and binning measured emissions using latitude and climate zone parameterizations

### Author:
##### Tomasz Janus, University of Manchester, tomasz.janus@manchester.ac.uk; tomasz.k.janus@gmail.com
##### Date: 30-10-24

### Data Sources:
We use two sets of data representing gridded areal CO$_2$ and CH$_4$ emission estimates:
1) The dataset created by Harrison et al. 2021 [[1](https://doi.org/10.1029/2020GB006888)]
2) The dataset prepared by Soued et al. 2022 [[2](https://doi.org/10.1038/s41561-022-01004-2)]. This dataset relates emission factors to climatic zones. It was shared as a table in the Supplementary Information and extends the emission factors proposed in the 2019 Refinement to the 2006 IPCC Guidelines by including the additional degassing and ebullition CH$_4$ emission pathways.

### Notes on Data Preprocessing and Algorithms

#### Estimation of reservoir emissions from gridded data

We calculated the net anthropogenic contribution of reservoirs to emissions from the gridded estimates for CO$_2$ and CH$_4$ emissions respectively using equation 1 from Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)], presented below:

**Eq. 1** : ${\mathrm{TE}}_{{\mathrm{reservoir}}} = A_{{\mathrm{reservoir}}} \times \left( {{\mathrm{net}}_{{\mathrm{CO}}_{2}} \times F_{{\mathrm{CO}}_{\mathrm{2}}} + {\mathrm{net}}_{{\mathrm{CH}}_{4}} \times F_{{\mathrm{CH}}_{\mathrm{4}}} \times {\mathrm{GWP}}_{{\mathrm{CH}}_{4}}} \right) \times \left( {1 + R_{{\mathrm{downstream}}}} \right)$

where $A_{{\mathrm{reservoir}}}$ is the reservoir area and $F_{{\mathrm{CO}}_{\mathrm{2}}}$ and $F_{{\mathrm{CH}}_{\mathrm{4}}}$ are the areal CO$_2$ and CH$_4$ emissions (fluxes), respectively. ${\mathrm{GWP}}_{{\mathrm{CH}}_{4}}$ is a conversion factor for the global warming potential of CH$_4$ relative to CO$_2$ over the corresponding time horizon (e.g. 20 or 100 years). We have assumed the time horizon of 100 years. The parameters $\mathrm{net}_{\mathrm{CO}_2}$ , $\mathrm{net}_{\mathrm{CH}_4}$ and $R_\mathrm{downstream}$ are introduced by Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)] to account only for the true net (anthropogenic) change in GHG emissions associated with reservoir creation from the gross emissions data that account for both the anthropogenic and non-anthropogenic emissions. These coefficients model the discount factors for CO$_2$ and CH$_4$ emissions that determine the fraction of net anthropogenic emissions in the total emissions of CO$_2$ and CH$_4$, respectively, and the ratio of downstream emissions to reservoir-surface emissions. We adopted the original values presented in Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)], i.e. $\mathrm{net}_{\mathrm{CO}_2} = 0.25$, $\mathrm{net}_{\mathrm{CH}_4} = 0.90$, and $R_\mathrm{downstream} = 0.17$. Additionally, we fitted new values using linear regression to obtain the best fit between emission predictions from G-res (facilitated via the ReEmission software) and the factors per climatic zone of Soued et al. 
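
To make the arithmetic of Eq. 1 concrete, the sketch below evaluates it for a single hypothetical reservoir. The function name, the argument names and the example fluxes are illustrative assumptions and are not taken from Almeida et al. The default coefficients are the values quoted above. The fluxes are assumed to be already expressed in gCO$_2$e/m$^2$/yr, so the GWP factor defaults to 1.

```python
def net_reservoir_emission(
        area_km2: float,
        f_co2: float,
        f_ch4: float,
        net_co2: float = 0.25,
        net_ch4: float = 0.90,
        r_downstream: float = 0.17,
        gwp_ch4: float = 1.0) -> float:
    """Evaluate Eq. 1 for one reservoir.

    Fluxes are in g/m2/yr and the area in km2, so the result is in tonnes/yr
    (1 km2 = 1e6 m2 and 1 tonne = 1e6 g cancel out).
    `gwp_ch4` stays at 1.0 when `f_ch4` is already given in CO2-equivalents.
    """
    # Net (anthropogenic) areal flux at the reservoir surface
    areal_net = net_co2 * f_co2 + net_ch4 * f_ch4 * gwp_ch4
    # Scale by area and add the downstream share
    return area_km2 * areal_net * (1.0 + r_downstream)


# Hypothetical reservoir: 10 km2, 400 gCO2e/m2/yr (CO2), 1000 gCO2e/m2/yr (CH4)
example_te = net_reservoir_emission(area_km2=10.0, f_co2=400.0, f_ch4=1000.0)
print(f"Net emission: {example_te:.0f} tCO2e/yr")
```

With these inputs the net areal flux is 0.25 × 400 + 0.90 × 1000 = 1000 gCO$_2$e/m$^2$/yr, which the downstream term raises by 17% before scaling by area.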

#### IPCC Guidelines

The IPCC provides its main guidance on reservoir GHG emissions in Chapter 7 (Wetlands), Volume 4 of the 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories, which aims at a more complete inventory of greenhouse gas emissions from managed lands ([Lovelock et al. 2019](https://www.ipcc-nggip.iges.or.jp/public/2019rf/pdf/4_Volume4/19R_V4_Ch07_Wetlands.pdf)).

### Purpose of the Notebook
The purpose of this notebook is to calculate and visualise the differences between emission estimates for individual reservoirs, as well as for total country-wide reservoir emissions, obtained with two approaches: (a) direct calculation using the G-res emission model for each individual reservoir, and (b) values interpolated from a limited number of known emissions from reservoirs in different locations across the world.

### References:
* [1] Harrison, J. A., Prairie, Y. T., Mercier-Blais, S., & Soued, C. (2021). Year-2020 global distribution and pathways of reservoir methane and carbon dioxide emissions according to the greenhouse gas from reservoirs (G-res) model. Global Biogeochemical Cycles, 35, e2020GB006888. https://doi.org/10.1029/2020GB006888 
* [2] Soued, C., Harrison, J.A., Mercier-Blais, S. et al. Reservoir CO2 and CH4 emissions and their climate impact over the period 1900–2060. Nat. Geosci. 15, 700–705 (2022). https://doi.org/10.1038/s41561-022-01004-2
* [3] Almeida, R.M., Shi, Q., Gomes-Selman, J.M. et al. Reducing greenhouse gas emissions of Amazon hydropower with strategic dam planning. Nat Commun 10, 4281 (2019). https://doi.org/10.1038/s41467-019-12179-5
* [4] Carlino, A., Schmitt, R., Clark, A. et al. Rethinking energy planning to mitigate the impacts of African hydropower. Nat Sustain 7, 879–890 (2024). https://doi.org/10.1038/s41893-024-01367-x
* [5] Bridget R. Deemer, John A. Harrison, Siyue Li, Jake J. Beaulieu, Tonya DelSontro, Nathan Barros, José F. Bezerra-Neto, Stephen M. Powers, Marco A. dos Santos, J. Arie Vonk, Greenhouse Gas Emissions from Reservoir Water Surfaces: A New Global Synthesis, BioScience, Volume 66, Issue 11, 1 November 2016, Pages 949–964, https://doi.org/10.1093/biosci/biw117
* [6] Catherine Ellen Lovelock, Christopher Evans, Nathan Barros, Yves Prairie, Jukka Alm, David Bastviken, Jake J. Beaulieu, Michelle Garneau, Atle Harby, John Harrison, David Pare, Hanne Lerche Raadal, Bradford Sherman, Chengyi Zhang, Stephen Michael Ogle, 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories, Volume 4: Agriculture, Forestry and Other Land Use, Chapter 7 Wetlands, https://www.ipcc-nggip.iges.or.jp/public/2019rf/pdf/4_Volume4/19R_V4_Ch07_Wetlands.pdf
* [7] Carly Hansen, Rachel Pilla, Paul Matson, Bailey Skinner, Natalie Griffiths, & Henriette Jager (2023). Variability in modelled reservoir greenhouse gas emissions: comparison of select US hydropower reservoirs against global estimates. Environmental Research Communications, 4(12), 121008. https://iopscience.iop.org/article/10.1088/2515-7620/acae24#ercacae24s6
* [8] Tangi, M., Schmitt, R., Almeida, R., Bossi, S., Flecker, A., Sala, F., & Castelletti, A. (2024). Robust hydropower planning balances energy generation, carbon emissions and sediment connectivity in the Mekong River Basin. Earth's Future, 12, e2023EF003647. https://doi.org/10.1029/2023EF003647

### Notes on adoption of GHG fluxes in most recent planning studies

1. Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)] used the gross flux data from one of the earlier global reservoir emission syntheses, by Deemer et al. [[5](https://doi.org/10.1093/biosci/biw117)].
2. Carlino et al. 2024 [[4](https://doi.org/10.1038/s41893-024-01367-x)] analyzed the gridded estimates of Harrison et al. 2021 [[1](https://doi.org/10.1029/2020GB006888)], which are parameterized vs. latitude, and the estimates of Soued et al. 2022 [[2](https://doi.org/10.1038/s41561-022-01004-2)], which are parameterized vs. climatic zone. They processed both datasets using Eq. 1 adopted after Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)]. Nevertheless, since only a few points in the dataset of [[1](https://doi.org/10.1029/2020GB006888)] were available for Africa (the geographical context of their study), and most of the estimates at these latitudes were from Southeast Asia or Latin America, the authors based their results solely on the greenhouse gas emission estimates of Soued et al. (2022) [[2](https://doi.org/10.1038/s41561-022-01004-2)].

### Notes on selected cited studies

#### * Harrison et al.
The estimated CO$_2$ and CH$_4$ emission fluxes per given latitude are shared in this [Zenodo link](https://zenodo.org/records/4632428/files/G-res_Output_x_Deg_Latitude_20210423.xlsx?download=1). The authors also present data globally on a 1$^\circ$ x 1$^\circ$ grid, although it is not clearly explained what each dataset contains. All three datasets, including the 1$^\circ$ x 1$^\circ$ gridded data, are available in `.txt` and `.xlsx` formats in this [Zenodo link](https://zenodo.org/records/4632428).


#### * Soued et al.

#### * Hansen et al.
The authors argue that hydropower reservoirs in the U.S.A. span a wide range of climate regions and have diverse design and operational characteristics compared to those most heavily represented in the modelling literature (i.e., large, tropical reservoirs). It is therefore not clear whether estimates based on measurements and modelling of other subsets of reservoirs describe those diverse types of hydropower reservoirs. The authors compare emission estimates extrapolated from other measured subsets of reservoirs against direct modelling using G-res. They observed that the net GHG reservoir footprint calculated with G-res was less variable and towards the lower end of the range observed from modelling larger global reservoirs, with a range of 138 to 1,052 g CO$_2$ eq m$^{-2}$ y$^{-1}$, while the global study reported a range of 115 to 145,472 g CO$_2$ eq m$^{-2}$ y$^{-1}$. The authors concluded that, due to the high variation in emissions normalized with respect to area and generation, caution is needed when using area or generation to predict or communicate the emission footprints of reservoirs relative to those of other energy sources, e.g. in hydropower planning studies. They compared their estimates against the published data from [Deemer et al. 2016](https://doi.org/10.1093/biosci/biw117) and the 1$^\circ$ x 1$^\circ$ gridded estimates of total GHG emissions across all emission pathways from [Harrison et al. 2021](https://doi.org/10.1029/2020GB006888). The paper closes with a good overview of the limitations of G-res, particularly with respect to dynamically changing inputs, e.g. due to climate change; see the notes in [Hansen_notes_on_limitations.txt](Hansen_notes_on_limitations.txt).

#### * Tangi et al. 

Tangi et al. formulated a robust optimization problem for hydropower planning using GHG emissions as a novel objective. The authors employed two-step optimal planning with three objectives (hydropower generation, sediment transport, and GHG emissions), using a multiobjective evolutionary algorithm (MOEA) to obtain non-dominated (Pareto-optimal) portfolios of dams. The two steps are: (1) an initial optimization using central estimates of all three objectives, producing Pareto-optimal sequences of dams, and (2) a portfolio robustness step that screens the solutions to retain only the robust ones.

#### * Carlino et al.



### Notes on limitations of our approach (for the manuscript)

1. We cannot model cascading systems because our approach does not explicitly account for reductions in, or inputs to, GHG emission processes that might be occurring in upstream reservoirs (Prairie et al. 2017).


### TODOS:
Write down two research questions for **Main**

In [None]:
# Package imports
from typing import Tuple, List, Dict
import pathlib
import os
import numpy as np
import pandas as pd
from geopy.distance import geodesic
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import matplotlib
import seaborn as sns
sns.set(style='ticks', context='talk')

from sklearn.linear_model import LinearRegression

matplotlib.rcParams['pdf.fonttype'] = 3
matplotlib.rcParams['ps.fonttype'] = 42

import warnings
warnings.filterwarnings('ignore', category=UserWarning)

In [None]:
# Dataset imports
harrison_fluxes = pathlib.Path('inputs/gridded_emissions_datasets/harrison2021/G-res_Output_x_Deg_Latitude_20210423.xlsx')
climate_zones_soued = pathlib.Path('inputs/gridded_emissions_datasets/soued2022/1976-2000_ASCII.txt')
soued_emission_factors = pathlib.Path('inputs/gridded_emissions_datasets/soued2022/soued2022.csv')
reemission_file = pathlib.Path('outputs/reemission/outputs_MIN_LOW_PRIM.xlsx')

# Define and create the folder for storing the output figures
fig_folder = pathlib.Path("figures/emission_comparison")
os.makedirs(fig_folder, exist_ok=True)
# Define and create the folder for storing the output files (tabular data)
outfile_folder = pathlib.Path("outputs/emissions_comparison")
os.makedirs(outfile_folder, exist_ok=True)
save_figure_global: bool = True

# Parameters from Eq. 1 - used to disentangle anthropogenic and natural reservoir emissions
R_downstream = 0.17
net_CO2 = 0.25
net_CH4 = 0.90
# Mapping between Koppen-Geiger climate classifications (detailed) and Climatic Zones (coarse)
kopgei_to_climate_cfg = {
    ('Af','Am','As','Aw'): 'Tropical wet moist',
    ('BWh','BSh'): 'Tropical dry montane',
    ('Cfa','Cfb','Cfc','Dfa'): 'Warm temperate moist',
    ('Csa','Csb','Csc','Cwa','Cwb','Cwc'): 'Warm temperate dry',
    ('BWk','BSk','Dfb','Dsa','Dsb','Dwa','Dwb'): 'Cool temperate moist dry',
    ('EF','ET','Dwc','Dwd','Dfc','Dfd','Dsc','Dsd') : 'Polar moist boreal dry moist'
}

In [None]:
# Custom functions

# A function decorator that skips execution of a function. Used to switch execution of functions off without removing code
def skip_execution(reason="Function skipped"):
    def decorator(func):
        def wrapper(*args, **kwargs):
            print(f"{func.__name__} skipped: {reason}")
            # Optionally, you could return None or a custom value if needed
            return None
        return wrapper
    return decorator

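def load_climate_zone_soued(filename: str | pathlib.Path = climate_zones_soued) -> pd.DataFrame:
    """
    Load text data with geodesic coordinates and Koppen-Geiger climate classification codes.
    Note: Borrowed from Carlino et al.
    """
    # Read from the `filename` argument rather than from the global path
    kopgei = pd.read_csv(filename, header=1)
    # The file is whitespace-delimited, so the data arrive in a single column whose
    # name holds all the headers; split it into separate columns
    raw_column = kopgei.columns[0]
    for count, col in enumerate(raw_column.split()):
        kopgei[col] = [x.split()[count] for x in kopgei[raw_column].values]
        # The first two columns (latitude and longitude) are numeric
        if count < 2:
            kopgei[col] = pd.to_numeric(kopgei[col])
    kopgei = kopgei.drop([raw_column], axis=1)
    return kopgei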

def expand_kopgei_climate_mapping(kopgei_climate_map: Dict[Tuple[str, ...], str]) -> Dict[str, str]:
    """Flatten a mapping keyed by tuples of Koppen-Geiger codes into a mapping keyed by single codes"""
    return {item: value for key_tuple, value in kopgei_climate_map.items() for item in key_tuple}

def find_nearest_kopgei_classification(df: pd.DataFrame, lat: float, lon: float, eps: float = 0.25) -> str:
    """Finds the Koppen Geiger classification based on latitude and longitude of a point"""
    # Find rows of data near the location (fast pre-screening)
    df_trimmed = df.loc[
        (abs(df['Lat'].values - lat) <= eps) &
        (abs(df['Lon'].values - lon) <= eps)]
    # Fail with a clear message if no grid point lies within the search window
    if df_trimmed.empty:
        raise ValueError(f"No grid point found within {eps} degrees of ({lat}, {lon})")
    # Calculate geodesic distance to each pre-screened point (slow on large dataframes)
    distances = df_trimmed.apply(lambda row: geodesic((lat, lon), (row['Lat'], row['Lon'])).kilometers, axis=1)
    # Find the index of the minimum distance
    nearest_index = distances.idxmin()
    # Return the classification of the closest point
    return df.loc[nearest_index, 'Cls']

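def add_emission_factors(em_factors: pd.DataFrame, koppen_geiger: str) -> dict:
    """Return the emission factors (as a dict) of the climatic zone matching a Koppen-Geiger code"""
    # NOTE: relies on the global `kopgei_to_czone` mapping created in the preprocessing cell further down
    czone = kopgei_to_czone[koppen_geiger]
    data_dict = em_factors[em_factors['Climate']==czone].iloc[0].to_dict()
    return data_dict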

# Error functions
def _abs_err(x_1, x_2) -> float | np.float64:
    return np.abs(x_1-x_2)

def _rel_err(x_1, x_2) -> float | np.float64:
    return (x_2 - x_1) / x_1

def _rel_abs_err(x_1, x_2) -> float | np.float64:
    return _abs_err(x_1, x_2) / x_1

# Mean Absolute Error (MAE)
def mae(x_1, x_2) -> float | np.float64:
    return _abs_err(x_1, x_2).mean()

# Root Mean Squared Error (RMSE)
def rmse(x_1, x_2) -> float | np.float64:
    return np.sqrt((_abs_err(x_1, x_2)**2).mean())

# Mean Absolute Percentage Error
def mape(x_1, x_2) -> float | np.float64:
    return _rel_abs_err(x_1, x_2).mean() * 100

# Relative Mean Absolute Error
def rmae(x_1, x_2) -> float | np.float64:
    return _rel_abs_err(x_1, x_2).mean()
    
# Relative Root Mean Squared Error (RRMSE)
def rrmse(x_1, x_2) -> float | np.float64:
    return np.sqrt((_rel_abs_err(x_1, x_2) ** 2).mean())

# Symmetric Mean Absolute Percentage Error (SMAPE)
def smape(x_1, x_2) -> float | np.float64:
    return (_abs_err(x_1, x_2) / ((np.abs(x_1) + np.abs(x_2)) / 2)).mean() * 100

def r_squared(x_1, x_2) -> np.float64:
    # Convert inputs to numpy arrays for calculation
    x_1 = np.array(x_1)
    x_2 = np.array(x_2)
    # Calculate mean of full model outputs
    x_1_mean = np.mean(x_1)
    # Calculate SSE (Sum of Squared Errors) and SST (Total Sum of Squares)
    sse = np.sum((x_1 - x_2) ** 2)
    sst = np.sum((x_1 -x_1_mean) ** 2)
    # Calculate R-squared
    r2 = 1 - (sse / sst)
    return r2

def report_errors(x_1, x_2) -> None:
    print(f"Mean Absolute Error (MAE): {mae(x_1, x_2)}")
    print(f"Coefficient of determination (R-squared): {r_squared(x_1, x_2)}")
    print(f"Mean Absolute Percentage Error (MAPE): {mape(x_1, x_2)}")
    print(f"Root Mean Squared Error (RMSE): {rmse(x_1, x_2)}")
    print(f"Relative Mean Absolute Error (RMAE): {rmae(x_1, x_2)}")
    print(f"Relative Root Mean Squared Error (RRMSE): {rrmse(x_1, x_2)}")
    print(f"Symmetric Mean Absolute Percentage Error (SMAPE): {smape(x_1, x_2)}")
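
The relative metrics above divide by `x_1`, so the order of the arguments matters: `x_1` plays the role of the reference (the `r_squared` comments suggest it is the full model output) and `x_2` the estimate being assessed. The short sketch below reproduces three of the metrics in plain Python on made-up numbers to show how MAPE and SMAPE differ on the same data. The variable names and values are illustrative only.

```python
# Hypothetical reference values (e.g. model outputs) and estimates (e.g. emission factors)
reference = [100.0, 200.0]
estimate = [150.0, 100.0]

n = len(reference)
abs_errors = [abs(r - e) for r, e in zip(reference, estimate)]

# Mean Absolute Error
mae_value = sum(abs_errors) / n
# Mean Absolute Percentage Error: errors are scaled by the reference only
mape_value = 100 * sum(a / r for a, r in zip(abs_errors, reference)) / n
# Symmetric MAPE: errors are scaled by the mean magnitude of reference and estimate
smape_value = 100 * sum(
    a / ((abs(r) + abs(e)) / 2)
    for a, r, e in zip(abs_errors, reference, estimate)) / n

print(f"MAE = {mae_value:.2f}, MAPE = {mape_value:.2f}%, SMAPE = {smape_value:.2f}%")
```

Both points have a 50% error relative to the reference, yet their SMAPE contributions differ (40% for the overestimate and about 67% for the underestimate) because the denominator also depends on the estimate.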

## Preprocess the gridded emission datasets

### * Parameterization on the basis of latitude by Harrison et al. (2021)
- **NOTE:** We are not using these estimates because the data are for a single point in time (year 2020), whilst we are looking for average emissions over the lifetime of a reservoir (e.g. 100 years)

In [None]:
# Read GHG fluxes from Harrison et al. (2021)
GHGFluxes = (pd.read_excel(harrison_fluxes).
                assign(
                    CO2_net=lambda x: x['avg.CO2.diff.g.m2.y.CO2.eq.x.lat'] * net_CO2 * (1 + R_downstream),
                    CH4_net=lambda x: x['avg.CH4.tot.g.m2.y.CO2.eq.x.lat'] * net_CH4 * (1 + R_downstream),
                    tot_em_net=lambda x: x['CO2_net'] + x['CH4_net']).
                 drop([
                     'sum.CH4.tot.Mg.CO2.eq.x.lat', 'sum.CH4.deg.Mg.CO2.eq.x.lat', 'sum.CH4.eb.Mg.CO2.eq.x.lat',
                     'sum.CH4.diff.Mg.CO2.eq.x.lat', 'sum.CO2.diff.Mg.CO2.eq.x.lat', 'sum.C.gas.tot.Mg.CO2.eq.x.lat'],
                     axis=1)
            )
GHGFluxes.head()

### * Parameterization on the basis of climatic zones by Soued et al. (2022)
* **NOTE**: Related to the refined emission factors from the 2019 IPCC Guidelines
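
Emission factors are assigned to a location through two lookups: the location gives a Köppen-Geiger code, and the code gives a coarse climatic zone. The sketch below illustrates the second lookup on a two-entry excerpt of the `kopgei_to_climate_cfg` mapping defined earlier in the notebook. The names `kg_groups`, `kg_to_zone` and `zone_for` are illustrative, and returning `None` for unmapped codes is an assumption of this sketch (the notebook code would raise a `KeyError` instead).

```python
from typing import Dict, Optional, Tuple

# Two-entry excerpt of the grouped mapping (tuple of codes -> climatic zone)
kg_groups: Dict[Tuple[str, ...], str] = {
    ('Af', 'Am', 'As', 'Aw'): 'Tropical wet moist',
    ('Cfa', 'Cfb', 'Cfc', 'Dfa'): 'Warm temperate moist',
}

# Flatten to one entry per Koppen-Geiger code for direct lookup
kg_to_zone: Dict[str, str] = {
    code: zone for codes, zone in kg_groups.items() for code in codes}


def zone_for(code: str) -> Optional[str]:
    """Return the climatic zone for a code, or None if the code is not mapped."""
    return kg_to_zone.get(code)


print(zone_for('Am'))   # mapped code
print(zone_for('Cwa'))  # not part of this excerpt
```

Flattening the grouped dictionary once keeps the configuration compact while making each per-reservoir lookup a single dictionary access.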

In [None]:
# Calculate areal emissions from climate zone parameterization included in Soued et al. 2022; 
# The code was adapted from Carlino et al.
# Read Koppen-Geiger climate zone classification codes per location
kopgei = load_climate_zone_soued(filename = climate_zones_soued)
# Get a list of all Koppen-Geiger classifications
kopgeizone = kopgei['Cls'].unique().tolist()
# Load emission factors per climatic zones
soued2022 = pd.read_csv(soued_emission_factors).dropna()
czone = soued2022['Climate'].tolist()
# Create a mapping between Koppen-Geiger codes and the coarser climate zone classifications used for the emission factors
kopgei_to_czone = expand_kopgei_climate_mapping(kopgei_to_climate_cfg)

In [None]:
soued2022.head(10)

## Load and process emission outputs calculated with ReEmission

* **NOTE:** `res_area` and `catch_area` are given in km$^2$
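
Because areas are in km$^2$ and emission intensities in gCO$_2$e/m$^2$/yr, their product is numerically equal to tonnes CO$_2$e/yr: the factor of 10$^6$ m$^2$ per km$^2$ cancels the 10$^6$ g per tonne. The sketch below spells out this conversion. The function and constant names and the example values are illustrative assumptions.

```python
M2_PER_KM2 = 1e6      # square metres in one square kilometre
G_PER_TONNE = 1e6     # grams in one tonne
TONNES_PER_KT = 1e3   # tonnes in one kilotonne


def total_emission_tonnes(flux_g_m2_yr: float, area_km2: float) -> float:
    """Convert an areal flux (g/m2/yr) and an area (km2) to a total in tonnes/yr."""
    grams_per_year = flux_g_m2_yr * area_km2 * M2_PER_KM2
    return grams_per_year / G_PER_TONNE


# Hypothetical reservoir: 500 gCO2e/m2/yr over 20 km2
tonnes = total_emission_tonnes(flux_g_m2_yr=500.0, area_km2=20.0)
kilotonnes = tonnes / TONNES_PER_KT
print(f"{tonnes:.0f} tCO2e/yr = {kilotonnes:.1f} ktCO2e/yr")
```

Multiplying the flux-area product by 1e-3 therefore yields kilotonnes CO$_2$e/yr.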

In [None]:
input_columns_to_keep = [
    'Name', 'coordinates_0', 'coordinates_1', 'id', 'type', 'catch_area', 
    'res_area', 'res_volume']
reemission_inputs = pd.read_excel(reemission_file, sheet_name="inputs", usecols=input_columns_to_keep)
output_columns_to_keep = ['Name', 'co2_diffusion', 'ch4_diffusion', 'ch4_ebullition', 'ch4_degassing', 'co2_net', 'ch4_net']
reemission_outputs = pd.read_excel(reemission_file, sheet_name="outputs", usecols=output_columns_to_keep)
# Join both dataframes on Name
reemission_df = pd.merge(reemission_inputs, reemission_outputs, on='Name', how='inner')
# Double check that we have not lost any rows
assert len(reemission_df) == len(reemission_inputs) == len(reemission_outputs)
# Add climate classification value for each row
reemission_df['Cls'] = reemission_df.apply(
    lambda row: find_nearest_kopgei_classification(df = kopgei, lat=row['coordinates_0'], lon=row['coordinates_1']), axis=1)
# Build a DataFrame of emission factors from the dicts returned by add_emission_factors
soued_em = reemission_df.apply(lambda row: add_emission_factors(soued2022, row['Cls']), axis='columns', result_type='expand')
reemission_df = pd.concat([reemission_df, soued_em], axis=1).\
    rename(columns={
        'CO2 diffusive 20 yrs integrated [gCO2eq/ (m2 yr)]': 'co2_diffusion_soued',
        'CH4 diffusive 20 yrs integrated [gCO2eq/ (m2 yr)]': 'ch4_diffusion_soued',
        'CH4 ebullition [gCO2eq/ (m2 yr)]': 'ch4_ebullition_soued',
        'CH4 degassing [gCO2eq/ (m2 yr)]': 'ch4_degassing_soued'})

In [None]:
reemission_df.head()

In [None]:
reemission_df.describe().drop('count')

In [None]:
# Compare emission intensities
report = (reemission_df.
              describe().
              loc[
                  :, 
                  ['co2_diffusion', 'co2_diffusion_soued', 'ch4_diffusion', 'ch4_diffusion_soued',
                   'ch4_ebullition', 'ch4_ebullition_soued', 'ch4_degassing', 'ch4_degassing_soued']
              ].
              T.drop_duplicates().T.
              drop('count')
         )

In [None]:
report

In [None]:
# Compare total emissions
def total_emission(row, em_column: str, area_column: str, conv_coefficient: float = 1.0) -> pd.Series | float:
    """Calculate total emissions. Given that emission intensities are in gCO2e/m2/year and areas in km2, the
    calculated total emission at `conv_coefficient = 1.0` is in the unit of tonnesCO2e/year."""
    return row[em_column] * row[area_column] * conv_coefficient

In [None]:
# Convert totals from tonnes CO2e/year to kilotonnes CO2e/year
coeff = 1e-3
emissions_total = (reemission_df.
    assign(co2=lambda x: x['co2_diffusion']).
    assign(co2_soued=lambda x: x['co2_diffusion_soued']).
    assign(ch4=lambda x: x['ch4_diffusion'] + x['ch4_ebullition'] + x['ch4_degassing']).
    assign(ch4_soued=lambda x: x['ch4_diffusion_soued'] + x['ch4_ebullition_soued'] + x['ch4_degassing_soued']).                   
    assign(co2_net_total=lambda x: total_emission(x, 'co2_net', 'res_area', coeff)).
    assign(ch4_net_total=lambda x: total_emission(x, 'ch4_net', 'res_area', coeff)).
    assign(co2_diffusion_total=lambda x: total_emission(x, 'co2_diffusion', 'res_area', coeff)).
    assign(co2_diffusion_total_soued=lambda x: total_emission(x, 'co2_diffusion_soued', 'res_area', coeff)).
    assign(ch4_diffusion_total=lambda x: total_emission(x, 'ch4_diffusion', 'res_area', coeff)).
    assign(ch4_diffusion_total_soued=lambda x: total_emission(x, 'ch4_diffusion_soued', 'res_area', coeff)).
    assign(ch4_ebullition_total=lambda x: total_emission(x, 'ch4_ebullition', 'res_area', coeff)).
    assign(ch4_ebullition_total_soued=lambda x: total_emission(x, 'ch4_ebullition_soued', 'res_area', coeff)).
    assign(ch4_degassing_total=lambda x: total_emission(x, 'ch4_degassing', 'res_area', coeff)).
    assign(ch4_degassing_total_soued=lambda x: total_emission(x, 'ch4_degassing_soued', 'res_area', coeff)).
    assign(co2_total = lambda x: x['co2_diffusion_total']).
    assign(co2_total_soued = lambda x: x['co2_diffusion_total_soued']).
    assign(ch4_total = lambda x : x['ch4_diffusion_total'] + x['ch4_ebullition_total'] + x['ch4_degassing_total']).
    assign(ch4_total_soued = lambda x : x['ch4_diffusion_total_soued'] + x['ch4_ebullition_total_soued'] + x['ch4_degassing_total_soued']).
    #T.drop([
    #    'co2_diffusion', 'co2_net', 'ch4_diffusion', 'ch4_net', 'ch4_ebullition', 'ch4_degassing',
    #    'co2_diffusion_soued', 'ch4_diffusion_soued', 'ch4_ebullition_soued', 'ch4_degassing_soued']).T.
    loc[:, [
            'co2_diffusion', 'co2_net', 'ch4_diffusion', 'ch4_net', 'ch4_ebullition', 'ch4_degassing',
            'co2_diffusion_soued', 'ch4_diffusion_soued', 'ch4_ebullition_soued', 'ch4_degassing_soued',
            'co2', 'co2_soued', 'ch4', 'ch4_soued',
            'co2_diffusion_total', 'co2_diffusion_total_soued',
            'ch4_diffusion_total', 'ch4_diffusion_total_soued', 
            'ch4_ebullition_total', 'ch4_ebullition_total_soued',
            'ch4_degassing_total', 'ch4_degassing_total_soued',
            'co2_total', 'co2_total_soued',
            'ch4_total', 'ch4_total_soued', 
            'co2_net_total', 'ch4_net_total',
            'type', 'res_area', 'res_volume']]
)
emissions_total = emissions_total.astype({col: 'float' for col in emissions_total.columns if col != 'type'})
emissions_total['type'] = emissions_total['type'].astype('category')

In [None]:
emissions_total.head()

In [None]:
emissions_total['type'].unique()

In [None]:
emissions_total.describe().drop('count')

In [None]:
emissions_total[[
            'co2_diffusion_total', 'co2_diffusion_total_soued',
            'ch4_diffusion_total', 'ch4_diffusion_total_soued', 
            'ch4_ebullition_total', 'ch4_ebullition_total_soued',
            'ch4_degassing_total', 'ch4_degassing_total_soued',
            'co2_total', 'co2_total_soued',
            'ch4_total', 'ch4_total_soued', 
            'co2_net_total', 'ch4_net_total']].sum()

In [None]:
emissions_total[[
    'co2_diffusion', 'co2_diffusion_soued', 'ch4_diffusion', 'ch4_diffusion_soued', 
    'ch4_ebullition', 'ch4_ebullition_soued', 
    'ch4_degassing', 'ch4_degassing_soued',]].mean()

## Make comparison plots

In [None]:
def make_pair_plots_total_emissions(
        df: pd.DataFrame, dxdy1: float = 0.0, dxdy2: float = 300, 
        save_fig: bool = save_figure_global) -> None:
    # Define the pairs of columns to plot
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.6) 
    label_fontsize = 15
    tick_fontsize = 15
    pairs = [
        ('co2_diffusion_total', 'co2_diffusion_total_soued'), 
        ('ch4_diffusion_total', 'ch4_diffusion_total_soued'), 
        ('co2_total', 'co2_total_soued'), 
        ('ch4_ebullition_total', 'ch4_ebullition_total_soued'), 
        ('ch4_degassing_total', 'ch4_degassing_total_soued'), 
        ('ch4_total', 'ch4_total_soued')
    ]
    axis_labels = {
        'co2_diffusion_total' : 'CO$_2$ diffusion (model), ktCO$_{2e}$/yr',
        'co2_diffusion_total_soued': 'CO$_2$ diffusion (EF), ktCO$_{2e}$/yr',
        'ch4_diffusion_total': 'CH$_4$ diffusion (model), ktCO$_{2e}$/yr',
        'ch4_diffusion_total_soued': 'CH$_4$ diffusion (EF), ktCO$_{2e}$/yr',
        'ch4_ebullition_total': 'CH$_4$ ebullition (model), ktCO$_{2e}$/yr',
        'ch4_ebullition_total_soued': 'CH$_4$ ebullition (EF), ktCO$_{2e}$/yr',
        'ch4_degassing_total': 'CH$_4$ degassing (model), ktCO$_{2e}$/yr',
        'ch4_degassing_total_soued': 'CH$_4$ degassing (EF), ktCO$_{2e}$/yr',
        'co2_total': 'CO$_2$ emission (model), ktCO$_{2e}$/yr',
        'co2_total_soued': 'CO$_2$ emission (EF), ktCO$_{2e}$/yr',
        'ch4_total': 'CH$_4$ emission (model), ktCO$_{2e}$/yr',
        'ch4_total_soued': 'CH$_4$ emission (EF), ktCO$_{2e}$/yr'}
    # Create the grid for subplots
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))
    axes = axes.flatten()
    # Generate scatter plots (with a 1:1 line) for each pair
    for idx, (col_x, col_y) in enumerate(pairs):
        ax = axes[idx]
        sns.scatterplot(
            x=col_x, y=col_y, style = 'type', hue='type', size='res_area', ax=ax, data=df,
            markers=['o', 's', '^'], alpha=0.4, sizes=(50, 500),
            **{'edgecolor':'black'})
        ax.set_xscale('log')
        ax.set_yscale('log')
        max_val = max(df[[col_x, col_y]].max())
        min_val = min(df[[col_x, col_y]].min())
        ax.plot([min_val, max_val], [min_val, max_val], 'k--', lw=3, alpha = 0.6)
        # Customize each subplot
        ax.set_xlabel(axis_labels[col_x])
        ax.set_ylabel(axis_labels[col_y])
        #ax.set_xticklabels(ax.get_xticks(), size = tick_fontsize)
        #ax.set_yticklabels(ax.get_yticks(), size = tick_fontsize)
        ax.set_ylim([min_val-dxdy1, max_val+dxdy2])
        ax.set_xlim([min_val-dxdy1, max_val+dxdy2])

    handles, labels = ax.get_legend_handles_labels()
    for handle in handles:
        if handle.get_label() in ('hydroelectric', 'multipurpose', 'irrigation'):
            handle.set_markersize(handle.get_markersize() * 2)
            
    fig.legend(
        handles, [label.replace('type', 'Reservoir Type').replace('res_area', 'Area, km$^2$') for label in labels],
        loc='right', bbox_to_anchor=(1.085, 0.5), 
        scatterpoints=1, frameon=False)

    # Iterate again and remove legend from each subplot
    for idx, _ in enumerate(pairs):
        ax = axes[idx]
        ax.legend().remove()
    plt.tight_layout(rect=[0, 0, 0.95, 0.95])  # Adjust layout to make space for the legend
    fig.suptitle(
        "Comparison of Emissions: Explicit Calculations vs. Emission Factor-Based Estimates", 
        fontsize=18, fontweight='bold')
    if save_fig:
        fig.savefig(fig_folder / 'emission_pathways_comparison.pdf', bbox_inches='tight')
        fig.savefig(fig_folder / 'emission_pathways_comparison.svg', bbox_inches='tight')
    plt.show()
make_pair_plots_total_emissions(df=emissions_total)

In [None]:
print("Columns in emissions_total dataframe")
print("------------------------------------")
print("\n".join(emissions_total.columns))
print("------------------------------------")

In [None]:
@skip_execution("These plots are not used in the publication.")
def make_pair_plots_emission_fluxes(
        df: pd.DataFrame, dxdy1: float = 0.0, dxdy2: float = 300,
        save_fig: bool = save_figure_global) -> None:
    # Define the pairs of columns to plot
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.6) 
    label_fontsize = 15
    tick_fontsize = 15
    pairs = [
        ('co2_diffusion', 'co2_diffusion_soued'), 
        ('ch4_diffusion', 'ch4_diffusion_soued'), 
        ('co2', 'co2_soued'), 
        
        ('ch4_ebullition', 'ch4_ebullition_soued'), 
        ('ch4_degassing', 'ch4_degassing_soued'), 
        ('ch4', 'ch4_soued')
    ]
    axis_labels = {
        'co2_diffusion' : 'CO$_2$ diffusion flux (model), gCO$_{2e}$/m$^2$/yr',
        'co2_diffusion_soued': 'CO$_2$ diffusion flux (EF), gCO$_{2e}$/m$^2$/yr',
        'ch4_diffusion': 'CH$_4$ diffusion flux (model), gCO$_{2e}$/m$^2$/yr',
        'ch4_diffusion_soued': 'CH$_4$ diffusion flux (EF), gCO$_{2e}$/m$^2$/yr',
        'ch4_ebullition': 'CH$_4$ ebullition flux (model), gCO$_{2e}$/m$^2$/yr',
        'ch4_ebullition_soued': 'CH$_4$ ebullition flux (EF), gCO$_{2e}$/m$^2$/yr',
        'ch4_degassing': 'CH$_4$ degassing flux (model), gCO$_{2e}$/m$^2$/yr',
        'ch4_degassing_soued': 'CH$_4$ degassing flux (EF), gCO$_{2e}$/m$^2$/yr',
        'co2': 'CO$_2$ emission flux (model), gCO$_{2e}$/m$^2$/yr',
        'co2_soued': 'CO$_2$ emission flux (EF), gCO$_{2e}$/m$^2$/yr',
        'ch4': 'CH$_4$ emission flux (model), gCO$_{2e}$/m$^2$/yr',
        'ch4_soued': 'CH$_4$ emission flux (EF), gCO$_{2e}$/m$^2$/yr'}
    # Create the grid for subplots
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))
    axes = axes.flatten()
    # Generate scatter plots (with a 1:1 line) for each pair
    for idx, (col_x, col_y) in enumerate(pairs):
        ax = axes[idx]
        sns.scatterplot(
            x=col_x, y=col_y, style = 'type', hue='type', size='res_area', ax=ax, data=df,
            markers=['o', 's', '^'], alpha=0.5, sizes=(50, 500),
            **{'edgecolor':'black'})
        #ax.set_xscale('log')
        #ax.set_yscale('log')
        max_val = max(df[[col_x, col_y]].max())
        min_val = min(df[[col_x, col_y]].min())
        ax.plot([min_val, max_val], [min_val, max_val], 'k--', lw=3, alpha = 0.6)
        # Customize each subplot
        ax.set_xlabel(axis_labels[col_x])
        ax.set_ylabel(axis_labels[col_y])
        #ax.set_xticklabels(ax.get_xticks(), size = tick_fontsize)
        #ax.set_yticklabels(ax.get_yticks(), size = tick_fontsize)
        ax.set_ylim([min_val-dxdy1, max_val+dxdy2])
        ax.set_xlim([min_val-dxdy1, max_val+dxdy2])

    handles, labels = ax.get_legend_handles_labels()
    for handle in handles:
        if handle.get_label() in ('hydroelectric', 'multipurpose', 'irrigation'):
            handle.set_markersize(handle.get_markersize() * 2)
            
    fig.legend(
        handles, [label.replace('type', 'Type').replace('res_area', 'Area, km$^2$') for label in labels],
        loc='right', bbox_to_anchor=(1.085, 0.48), 
        scatterpoints=1, frameon=False)

    # Iterate again and remove legend from each subplot
    for ax in axes[:len(pairs)]:
        legend = ax.get_legend()
        if legend is not None:
            legend.remove()
    
    plt.tight_layout(rect=[0, 0, 0.95, 0.85])  # Adjust layout to make space for the legend
    if save_fig:
        fig.savefig(fig_folder / 'emission_flux_pathways_comparison.svg', bbox_inches='tight')
        fig.savefig(fig_folder / 'emission_flux_pathways_comparison.pdf', bbox_inches='tight')
    plt.show()
make_pair_plots_emission_fluxes(df=emissions_total)

## Calculate and compare net emissions
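
As a minimal sketch of how Eq. 1 converts gross emission-factor totals into net anthropogenic totals, the snippet below applies the coefficients quoted in the notes above (0.25, 0.90 and 0.17) to a single made-up reservoir. The function name `net_from_gross` and the input values are illustrative assumptions and are not part of the notebook.

```python
# Coefficients from Almeida et al. 2019, as quoted in the notes at the top of this notebook
NET_CO2 = 0.25
NET_CH4 = 0.90
R_DOWNSTREAM = 0.17


def net_from_gross(gross: float, net_factor: float, r_downstream: float = R_DOWNSTREAM) -> float:
    """Scale a gross emission total to its net anthropogenic part, including the downstream share."""
    return gross * net_factor * (1.0 + r_downstream)


# Hypothetical gross totals (ktCO2e/yr) for one reservoir; not taken from the dataset
gross_co2, gross_ch4 = 100.0, 200.0
net_co2 = net_from_gross(gross_co2, NET_CO2)
net_ch4 = net_from_gross(gross_ch4, NET_CH4)
net_total = net_co2 + net_ch4
print(f"net CO2: {net_co2:.2f}, net CH4: {net_ch4:.2f}, net total: {net_total:.2f}")
```

The cell below performs the same scaling column-wise on the `emissions_total` data frame.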

In [None]:
net_emisions = (emissions_total.
     assign(
         em_net_total = lambda x: x['co2_net_total'] + x['ch4_net_total'],
         co2_net_total_soued=lambda x: x['co2_total_soued'] * net_CO2 * (1 + R_downstream),
         ch4_net_total_soued=lambda x: x['ch4_total_soued'] * net_CH4 * (1 + R_downstream),
         em_net_total_soued=lambda x: x['co2_net_total_soued'] + x['ch4_net_total_soued'])
)

In [None]:
def make_pair_plots_total_emissions(
        df: pd.DataFrame, dxdy1: float = 0.0, dxdy2: float = 300,
        save_fig: bool = save_figure_global) -> None:
    # Define the pairs of columns to plot
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.4) 
    label_fontsize = 10
    tick_fontsize = 15
    pairs = [
        ('co2_net_total', 'co2_net_total_soued'), 
        ('ch4_net_total', 'ch4_net_total_soued'), 
        ('em_net_total', 'em_net_total_soued')
    ]
    axis_labels = {
        'co2_net_total' : 'Net total CO$_2$ emission (model), ktCO$_{2e}$/yr',
        'co2_net_total_soued': 'Net total CO$_2$ emission (EF), ktCO$_{2e}$/yr',
        'ch4_net_total': 'Net total CH$_4$ emission (model), ktCO$_{2e}$/yr',
        'ch4_net_total_soued': 'Net total CH$_4$ emission (EF), ktCO$_{2e}$/yr',
        'em_net_total': 'Net total emission (model), ktCO$_{2e}$/yr',
        'em_net_total_soued': 'Net total emission (EF), ktCO$_{2e}$/yr'}
    # Create the grid for subplots
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5.0))
    axes = axes.flatten()
    # Scatter each (model, EF) pair against a dashed 1:1 reference line
    for idx, (col_x, col_y) in enumerate(pairs):
        ax = axes[idx]
        sns.scatterplot(
            x=col_x, y=col_y, style = 'type', hue='type', size='res_area', ax=ax, data=df,
            markers=['o', 's', '^'], alpha=0.5, sizes=(50, 500),
            **{'edgecolor':'black'})
        ax.set_xscale('log')
        ax.set_yscale('log')
        max_val = max(df[[col_x, col_y]].max())
        min_val = min(df[[col_x, col_y]].min())
        ax.plot([min_val, max_val], [min_val, max_val], 'k--', lw=3, alpha = 0.6)
        # Customize each subplot
        ax.set_xlabel(axis_labels[col_x])
        ax.set_ylabel(axis_labels[col_y])
        #ax.set_xticklabels(ax.get_xticks(), size = tick_fontsize)
        #ax.set_yticklabels(ax.get_yticks(), size = tick_fontsize)
        ax.set_ylim([min_val-dxdy1, max_val+dxdy2])
        ax.set_xlim([min_val-dxdy1, max_val+dxdy2])

    handles, labels = ax.get_legend_handles_labels()
    for handle in handles:
        if handle.get_label() in ('hydroelectric', 'multipurpose', 'irrigation'):
            handle.set_markersize(handle.get_markersize() * 2)
            
    fig.legend(
        handles, [label.replace('type', 'Type').replace('res_area', 'Area, km$^2$') for label in labels],
        loc='right', bbox_to_anchor=(1.085, 0.48), 
        scatterpoints=1, frameon=False)

    # Iterate again and remove legend from each subplot
    for ax in axes[:len(pairs)]:
        legend = ax.get_legend()
        if legend is not None:
            legend.remove()
    
    plt.tight_layout(rect=[0, 0, 0.95, 0.90])  # Adjust layout to make space for the legend
    fig.suptitle(
        "Comparison of Net Anthropogenic Emissions: Explicit Calculations vs. Emission Factor Estimates", 
        fontsize=18, fontweight='bold')
    if save_fig:
        fig.savefig(fig_folder / 'total_emissions_comparison.svg', bbox_inches='tight')
        fig.savefig(fig_folder / 'total_emissions_comparison.pdf', bbox_inches='tight')
    plt.show()
make_pair_plots_total_emissions(df=net_emisions)

## Calculate and plot errors between emission pathways and total emissions predicted from the model and using the emission factors (EF)
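
The helper `_rel_err` used in this section is defined earlier in the notebook and is not shown here. Purely as an illustration, the sketch below assumes it returns the absolute difference between the two estimates divided by the absolute model value. The name `relative_error` and the sample fluxes are hypothetical.

```python
import math


def relative_error(model: float, estimate: float) -> float:
    """Absolute relative error of `estimate` with respect to `model` (assumed definition)."""
    if model == 0:
        return math.nan  # undefined when the reference value is zero
    return abs(estimate - model) / abs(model)


# Hypothetical fluxes: model predictions vs. emission-factor (EF) estimates
model_fluxes = [200.0, 50.0, 0.0]
ef_fluxes = [150.0, 80.0, 10.0]
pct_errors = [relative_error(m, e) * 100 for m, e in zip(model_fluxes, ef_fluxes)]

# Keep values at or below an outlier threshold; NaN compares False and is dropped too
kept = [p for p in pct_errors if p <= 4_000]
mean_pct_error = sum(kept) / len(kept)
print(kept, mean_pct_error)
```

Each entry is a per-reservoir percentage error. Averaging the retained entries gives the mean absolute percentage error.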

In [None]:
emissions_total.head()

In [None]:
df = emissions_total
comparison_pairs = [
        ('co2_diffusion', 'co2_diffusion_soued'), 
        ('ch4_diffusion', 'ch4_diffusion_soued'),  
        ('ch4_ebullition', 'ch4_ebullition_soued'), 
        ('ch4_degassing', 'ch4_degassing_soued'), 
        ('co2', 'co2_soued'),
        ('ch4', 'ch4_soued')]
emissions_long = pd.DataFrame({
    'g-res': pd.concat([df[model_value] for model_value, _ in comparison_pairs]),
    'soued': pd.concat([df[ef_value] for _, ef_value in comparison_pairs]),
    'type': pd.concat([df['type'] for _ in comparison_pairs]),
    'category': ['co2_diffusion'] * len(df) + ['ch4_diffusion'] * len(df) + ['ch4_ebullition'] * len(df) + 
                ['ch4_degassing'] * len(df) + ['co2'] * len(df) + ['ch4'] * len(df)
})
# Calculate errors
emissions_long['mape'] = _rel_err(emissions_long['g-res'], emissions_long['soued']) * 100
# Drop outlier rows whose percentage error exceeds 4,000 %
# (rows with a NaN error fail the comparison and are dropped as well)
emissions_long = emissions_long.loc[emissions_long['mape'] <= 4_000]

In [None]:
emissions_long['category'].unique()

In [None]:
emissions_long['mape'].describe()

In [None]:
@skip_execution("This plot is replaced with a multifacet plot for each reservoir type.")
def make_box_plot_mape_all_dams(save_fig: bool = save_figure_global) -> None:
    # Visualise the errors on a box plot
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.6)
    # Create the figure and axes
    f, ax = plt.subplots(figsize=(7, 6))
    sns.boxplot(
        emissions_long, y="mape", x="category", hue="category",
        whis=[0, 100], width=.6, palette="vlag"
    )
    sns.stripplot(emissions_long, y="mape", x="category", size=4, color=".3", alpha=0.6, edgecolor='black')
    # Tweak the visual presentation
    ax.xaxis.grid(False)
    ax.set(ylabel='Mean Absolute Percentage Error (MAPE)', xlabel="")
    ax.spines['left'].set_linewidth(0.7)
    ax.spines['bottom'].set_linewidth(0.7)
    
    # Adjust labels
    new_labels = [
        "CO$_2$ diffusion", "CH$_4$ diffusion", "CH$_4$ ebullition", 
        "CH$_4$ degassing", "CO$_2$ total", "CH$_4$ total"]
    ax.set_xticklabels(new_labels, rotation=45, ha="right")
    sns.despine(trim=True, left=False)
    if save_fig:
        f.savefig(fig_folder / 'mape_all_dams.svg', bbox_inches='tight')
        f.savefig(fig_folder / 'mape_all_dams.pdf', bbox_inches='tight')
    plt.show()
make_box_plot_mape_all_dams()

In [None]:
def make_box_plot_mape_multifaceted(save_fig: bool = save_figure_global) -> None:
    # Visualise the errors on a box plot
    titles = {
        "hydroelectric": "Hydroelectric Dams",
        "irrigation": "Irrigation Dams",
        "multipurpose": "Multipurpose Dams"
    }
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.8)
    # Create one facet (column) per reservoir type
    g = sns.FacetGrid(emissions_long, col="type", height=6, aspect=1)
    g.map_dataframe(
        sns.boxplot, y="mape", x="category", hue="category", whis=[0, 100],
        width=0.6, palette="vlag"
    )
    g.map_dataframe(
        sns.stripplot, y="mape", x="category", size=4, color=".3", alpha=0.6, edgecolor='black'
    )
    # Adjust labels
    new_labels = [
        "CO$_2$ diffusion", "CH$_4$ diffusion", "CH$_4$ ebullition", 
        "CH$_4$ degassing", "CO$_2$ total", "CH$_4$ total"]
    for ax in g.axes.flat:
        ax.xaxis.grid(False)
        ax.set_xlabel("")
        ax.set_ylabel("Mean Absolute Percentage Error (MAPE)")
        ax.spines['left'].set_linewidth(0.7)
        ax.spines['bottom'].set_linewidth(0.7)
        ax.set_xticklabels(new_labels, rotation=45, ha="right")
    for ax, title in zip(g.axes.flat, g.col_names):
        ax.set_title(titles[title], fontsize=14, fontweight="bold", pad=25)  # Use the custom title dictionary
    plt.tight_layout(rect=[0, 0, 1, 0.93])  # Leave space at the top for the figure title
    g.fig.suptitle(
        "Relative Errors in Reservoir Emission Predictions: Explicit Calculations vs. Emission Factor Estimates", 
        fontsize=18, fontweight='bold')
    g.add_legend()
    g.despine(trim=True, left=False)
    if save_fig:
        g.fig.savefig(fig_folder / 'mape_all_dams_faceted.svg', bbox_inches='tight')
        g.fig.savefig(fig_folder / 'mape_all_dams_faceted.pdf', bbox_inches='tight')
    plt.show()
make_box_plot_mape_multifaceted()

## Calculate and plot errors between net emissions from the model and the emission factors (EF)

In [None]:
report_errors(net_emisions['co2_net_total'], net_emisions['co2_net_total_soued'])

In [None]:
report_errors(net_emisions['ch4_net_total'], net_emisions['ch4_net_total_soued'])

In [None]:
report_errors(net_emisions['em_net_total'], net_emisions['em_net_total_soued'])

In [None]:
net_emisions['em_net_mape'] = _rel_err(net_emisions['em_net_total'], net_emisions['em_net_total_soued']) * 100

In [None]:
def make_mape_histogram(save_fig: bool = save_figure_global) -> None:
    """Plot a histogram of net-emission percentage errors, grouped by reservoir type."""
    hydro = net_emisions["em_net_mape"].where(net_emisions['type']=='hydroelectric').dropna()
    irr = net_emisions["em_net_mape"].where(net_emisions['type']=='irrigation').dropna()
    multi = net_emisions["em_net_mape"].where(net_emisions['type']=='multipurpose').dropna()
    label_fontsize = 15
    tick_fontsize = 14
    plt.style.use('seaborn-v0_8-ticks')
    f, ax = plt.subplots(figsize=(7, 5.5))
    ax.tick_params(axis='both', which='major', labelsize=tick_fontsize)
    min_val = net_emisions['em_net_mape'].min()
    max_val = net_emisions['em_net_mape'].max()
    bins = np.linspace(int(min_val), int(max_val), 18)
    plt.xlabel('Mean Absolute Percentage Error (MAPE)', fontsize=label_fontsize)
    plt.ylabel('Frequency', fontsize=label_fontsize)
    plt.hist([hydro, irr, multi], label=['hydro', 'irrigation', 'multipurpose'], histtype='bar', bins=bins, edgecolor='black', alpha=0.7)
    plt.legend(loc='upper right', frameon=False, fontsize=14)
    # Remove ticks from top and right side
    ax.tick_params(top=False, right=False)
    f.tight_layout(rect=[0, 0, 1, 0.90])  # Leave space at the top for the figure title
    f.suptitle(
        "Relative Errors in Net Anthropogenic Emissions:\nExplicit Calculations vs. Emission Factor Estimates",
        fontweight='bold')
    # Decrease the width of the plot border
    for spine in ax.spines.values():
        spine.set_linewidth(0.6)
    if save_fig:
        f.savefig(fig_folder / 'mape_all_dams_histogram.svg', bbox_inches='tight')
        f.savefig(fig_folder / 'mape_all_dams_histogram.pdf', bbox_inches='tight')
    plt.show()
make_mape_histogram()

In [None]:
# Visualise the errors on a box plot
def make_box_plot_mape_net_emissions(save_fig: bool = save_figure_global) -> None:
    """Draw a box plot of net-emission percentage errors for each reservoir type."""
    sns.set_theme(style="ticks")
    sns.set_context("paper", font_scale=1.6)
    # Create the figure and axes
    f, ax = plt.subplots(figsize=(7, 6))
    # Draw one box per reservoir type, with whiskers spanning the full data range
    sns.boxplot(
        net_emisions, y="em_net_mape", x="type", hue="type",
        whis=[0, 100], width=.6, palette="vlag"
    )
    # Add in points to show each observation
    sns.stripplot(net_emisions, y="em_net_mape", x="type", size=4, color=".3")
    # Tweak the visual presentation
    f.tight_layout(rect=[0, 0, 1, 0.90])  # Leave space at the top for the figure title
    f.suptitle(
        "Relative Errors in Net Anthropogenic Emissions:\nExplicit Calculations vs. Emission Factor Estimates",
        fontweight='bold')
    ax.xaxis.grid(False)
    ax.set(ylabel='Mean Absolute Percentage Error (MAPE)', xlabel="")
    ax.spines['left'].set_linewidth(0.7)
    ax.spines['bottom'].set_linewidth(0.7)
    sns.despine(trim=True, left=False)
    if save_fig:
        f.savefig(fig_folder / 'mape_all_dams_histogram_net_emissions.svg', bbox_inches='tight')
        f.savefig(fig_folder / 'mape_all_dams_histogram_net_emissions.pdf', bbox_inches='tight')
    plt.show()
make_box_plot_mape_net_emissions()

### Run Linear Regression to calibrate net$_{CO_2}$ and net$_{CH_4}$ parameters that are used to estimate net anthropogenic emissions from total emissions

**NOTE**: Differences between explicitly calculated emissions and emission factors originate from two sources:

1. Emission factors are binned into categories and are therefore constant over an area of interest, e.g. a latitude band or climatic zone, whereas real emissions vary continuously in response to many continuously changing emission drivers.
2. Disentangling anthropogenic emissions from total (e.g. measured) emissions depends on complex relationships involving pre-impoundment emissions, land use, and nutrient inputs from human activities. When simple emission factors are applied, a single linear regression is instead applied individually to CO$_2$ and CH$_4$ emissions. This simplifies the analysis but introduces a new source of error.

Here, we fit these regressions to our emission estimates and show that the obtained parameters differ substantially from the coefficient values adopted in previous studies, e.g. Almeida et al. 2019 [[3](https://doi.org/10.1038/s41467-019-12179-5)] and Carlino et al. 2024 [[4](https://doi.org/10.1038/s41893-024-01367-x)].
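
Because the regression is fitted without an intercept, its slope has the closed form $m = \sum x y / \sum x^2$, and the net coefficient follows by dividing out the downstream factor: $\mathrm{net} = m / (1 + R_\mathrm{downstream})$. The sketch below illustrates this on synthetic numbers. `fit_net_coefficient` is a hypothetical helper that is not part of the notebook, and the data are invented so that the known coefficient can be recovered.

```python
def fit_net_coefficient(gross_ef, net_model, r_downstream=0.17):
    """Least-squares slope through the origin, converted to a net coefficient."""
    sxy = sum(x * y for x, y in zip(gross_ef, net_model))
    sxx = sum(x * x for x in gross_ef)
    slope = sxy / sxx  # ordinary least squares with the intercept fixed at zero
    return slope / (1.0 + r_downstream)


# Synthetic data generated with a known net coefficient of 0.4 and R_downstream = 0.17
gross = [10.0, 20.0, 40.0]
net = [g * 0.4 * 1.17 for g in gross]
fitted = fit_net_coefficient(gross, net)
print(round(fitted, 6))
```

The cell below obtains the same slope with `LinearRegression(fit_intercept=False)` and additionally reports the coefficient of determination.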

In [None]:
# Fit the model
model_co2 = LinearRegression(fit_intercept=False)
model_ch4 = LinearRegression(fit_intercept=False)
# CO2 net emissions
co2_net_total = net_emisions[['co2_net_total']]
co2_net_total_soued = net_emisions[['co2_total_soued']]
# Perform linear regression
model_co2.fit(co2_net_total_soued, co2_net_total)
# Get the slope (m)
m = model_co2.coef_[0]
# Calculate R^2 score
r2_score = model_co2.score(co2_net_total_soued, co2_net_total)
net_CO2_fitted = m / (1+R_downstream)
# Print the result
print("CO2 net emissions")
print("-----------------")
print(f"Calculated net_CO2 coefficient (-): {net_CO2_fitted[0]}, original value: {net_CO2}")
print(f"R_downstream (-): {R_downstream}")
print(f"R2_fit: {r2_score}")
print(f"R2_original: {r_squared(net_emisions['co2_net_total'], net_emisions['co2_net_total_soued'])}")

# CH4 net emissions
ch4_net_total = net_emisions[['ch4_net_total']]
ch4_net_total_soued = net_emisions[['ch4_total_soued']]
# Perform linear regression
model_ch4.fit(ch4_net_total_soued, ch4_net_total)
# Get the slope (m)
m = model_ch4.coef_[0]
# Calculate R^2 score
r2_score = model_ch4.score(ch4_net_total_soued, ch4_net_total)
net_CH4_fitted = m / (1+R_downstream)
# Print the result
print("")
print("CH4 net emissions")
print("-----------------")
print(f"Calculated net_CH4 coefficient (-): {net_CH4_fitted[0]}, original value: {net_CH4}")
print(f"R_downstream (-): {R_downstream}")
print(f"R2_fit: {r2_score}")
print(f"R2_original: {r_squared(net_emisions['ch4_net_total'], net_emisions['ch4_net_total_soued'])}")

### Save the outputs to files for further processing

In [None]:
output_data = pd.concat([
    reemission_df[["Name", "coordinates_0", "coordinates_1", "id"]], 
    emissions_total, 
    net_emisions[['em_net_total', 'co2_net_total_soued', 'ch4_net_total_soued', 'em_net_total_soued']]
], axis=1)
output_data.to_csv(outfile_folder / "emissions_comparison.csv", index=False)
output_data.to_excel(outfile_folder / "emissions_comparison.xlsx", index=False)