# Setup

This notebook converts NZIP model outputs into the CB7 Sector Databook.
If you are unfamiliar with Jupyter notebooks, take a look at the following links: [1](https://colab.research.google.com/drive/16pBJQePbqkz3QFV54L4NIkOn1kwpuRrj), [2](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html), [3](https://colab.research.google.com/?utm_source=scs-index).

[**Open in colab**](https://colab.research.google.com/github/thecccuk/sector_databook_conversion/blob/main/nzip/nb.ipynb)

If you are running from Colab, you first need to download some files from the [CCC GitHub repo](https://github.com/thecccuk/sector_databook_conversion/tree/main/nzip) and install some packages:

In [None]:
colab = False  # set to True if running on Colab; leave as False on your local machine

# only run on colab!
if colab:
    get_ipython().system('wget -q https://raw.githubusercontent.com/thecccuk/sector_databook_conversion/main/nzip/nzip.py -O nzip.py')
    get_ipython().system('wget -q https://raw.githubusercontent.com/thecccuk/sector_databook_conversion/main/nzip/nzip_model_sector_map.csv -O nzip_model_sector_map.csv')
    get_ipython().system('wget -q https://raw.githubusercontent.com/thecccuk/sector_databook_conversion/main/nzip/requirements.txt -O requirements.txt')
    get_ipython().system('pip install -q -r requirements.txt')
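
The `wget` calls above rely on the Colab shell. As a portable alternative, the sketch below fetches the same three files with Python's standard library. The base URL and file names are copied from the cell above; the helper names `raw_url` and `download_all` are illustrative and not part of the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the wget commands in the cell above.
BASE_URL = "https://raw.githubusercontent.com/thecccuk/sector_databook_conversion/main/nzip"
FILES = ["nzip.py", "nzip_model_sector_map.csv", "requirements.txt"]


def raw_url(filename):
    """Build the raw GitHub URL for one file (hypothetical helper)."""
    return f"{BASE_URL}/{filename}"


def download_all(target_dir="."):
    """Download every file in FILES into target_dir and return the local paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in FILES:
        destination = target / name
        urlretrieve(raw_url(name), str(destination))
        paths.append(destination)
    return paths
```

Calling `download_all()` would stand in for the three `wget` lines; the packages listed in `requirements.txt` still need to be installed with `pip` afterwards.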

Next we need to import the `nzip` module (which we just downloaded if running on Colab) and the pandas module (which is already installed in the Colab runtime).

In [None]:
%load_ext autoreload
%autoreload 2

# imports
import pandas as pd
import nzip

# ignore some junk output
import warnings
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)

### Load data

In the following block we set some parameters for the conversion.

In [None]:
# NZIP sectors are 'Industry' or 'Fuel Supply'
nzip.SECTOR = 'Industry'
nzip.SCENARIO = 'Balanced pathway'

# path to a local NZIP model run. If you are running on colab, this variable will be overwritten when you upload a file in the next code cell
nzip_path = 'N-ZIP-Model_version1_2_AG_updated_19_12_2023.xlsb'

# filename of the csv which maps NZIP sectors to CCC sectors
sector_defs_path = 'nzip_model_sector_map.csv'

# where to save the output file
output_file = f"sd-{nzip.SECTOR.replace(' ', '-')}-test.xlsx"

When running on Google Colab, we need to upload the NZIP model outputs you want to convert.
Running the following cell will produce a button that lets you upload a file from your local machine.

At the moment this is annoyingly slow, so you may want to upload the NZIP model to your Google Drive and load it from there (then it only needs to be uploaded once).

In [None]:
if colab:
    from google.colab import files
    uploaded = files.upload()
    assert len(uploaded) == 1, 'You must upload exactly one file, which should be the NZIP model outputs file'
    nzip_path = list(uploaded.keys())[0]
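
The Google Drive route could look like the sketch below. `drive.mount` is the standard Colab call for attaching a Drive; the helper names, and the assumption that the model file sits at the top level of `MyDrive`, are illustrative only.

```python
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive (Colab only) and return the 'MyDrive' folder."""
    # Imported lazily because google.colab only exists inside a Colab runtime.
    from google.colab import drive

    drive.mount(mount_point)
    return Path(mount_point) / "MyDrive"


def find_model_file(folder, filename):
    """Return the path to filename inside folder, or raise if it is missing."""
    candidate = Path(folder) / filename
    if not candidate.is_file():
        raise FileNotFoundError(f"{filename} not found in {folder}")
    return str(candidate)
```

On Colab, `nzip_path = find_model_file(mount_drive(), 'N-ZIP-Model_version1_2_AG_updated_19_12_2023.xlsb')` would then take the place of the upload step.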

After uploading, we can read the Excel file into a pandas DataFrame.
This is also annoyingly slow (2-3 minutes); a CSV would load much faster.

In [None]:
# load the nzip data and add some columns as intermediate calculations
df = nzip.load_nzip(nzip_path, sector_defs_path, nzip.SECTOR)
df = nzip.add_cols(df.copy())
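
One way to avoid paying the load time on every run is to cache the loaded result on disk. The sketch below is a generic helper built on the standard `pickle` module; `load_with_cache` is a hypothetical name, not something provided by `nzip.py`.

```python
import pickle
from pathlib import Path


def load_with_cache(source_path, loader, cache_suffix=".cache.pkl"):
    """Return loader(source_path), reusing a pickled copy when it is up to date."""
    source = Path(source_path)
    cache = source.with_name(source.name + cache_suffix)
    # Reuse the cache only if it is at least as new as the source file.
    if cache.exists() and cache.stat().st_mtime >= source.stat().st_mtime:
        with cache.open("rb") as fh:
            return pickle.load(fh)
    result = loader(source_path)
    with cache.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

A possible use is `df = load_with_cache(nzip_path, lambda p: nzip.load_nzip(p, sector_defs_path, nzip.SECTOR))`. Note that this cache only tracks the model file: if `sector_defs_path` or `nzip.SECTOR` changes, the cache file should be deleted by hand. Only load pickle files you created yourself.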

### Measure level data

The cell below contains the configuration for processing the different measure-level outputs. Each list element is a dictionary with the following keys:

- `timeseries`: the name of the NZIP columns that contain the relevant data, with the year removed
- `variable_name`: the name of the variable in the CB7 Sector Databook
- `variable_unit`: the unit of the variable in the CB7 Sector Databook
- `weight_col`: if specified, the code will look for a NZIP column with this name and use it to weight the timeseries data
- `scale`: if specified, this will apply a fixed scaling factor to the timeseries data

Each dictionary will be processed in turn, and the resulting tables will be appended together.
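
Because a misspelled key would only surface after the slow processing step, it can help to check the configuration first. The sketch below assumes that `timeseries`, `variable_name` and `variable_unit` are required and that `weight_col` and `scale` are the only optional keys, which is a reading of the list above rather than something enforced by `nzip.py`. The function name `check_measure_config` is hypothetical.

```python
REQUIRED_KEYS = {"timeseries", "variable_name", "variable_unit"}
OPTIONAL_KEYS = {"weight_col", "scale"}


def check_measure_config(configs):
    """Return a list of human-readable problems found in the config dictionaries."""
    problems = []
    for i, config in enumerate(configs):
        keys = set(config)
        missing = REQUIRED_KEYS - keys
        unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
        if unknown:
            problems.append(f"entry {i}: unknown keys {sorted(unknown)}")
        if "scale" in config and not isinstance(config["scale"], (int, float)):
            problems.append(f"entry {i}: scale must be a number")
    return problems


example = [
    {"timeseries": "Change in electricity use (GWh)",
     "variable_name": "Additional demand electricity",
     "variable_unit": "TWh", "scale": 1e-3},
    # Deliberately incomplete entry: the unit is missing.
    {"timeseries": "capex", "variable_name": "Additional capital expenditure"},
]
print(check_measure_config(example))
```

An empty list means no problems were found, so `assert not check_measure_config(measure_level_kwargs)` could be run once the configuration cell has been executed.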

In [48]:
measure_level_kwargs = [
    # Add total direct and indirect emissions
    {
        "timeseries": "Total direct emissions abated (MtCO2e)",
        "variable_name": "Abatement total direct",
        "variable_unit": "MtCO2e",
    },
    {
        "timeseries": "Total indirect emissions abated (MtCO2e)",
        "variable_name": "Abatement total indirect",
        "variable_unit": "MtCO2e",
    },

    # Add emissions by gas
    {
        "timeseries": "Total direct emissions abated (MtCO2e)",
        "variable_name": "Abatement emissions CO2",
        "variable_unit": "MtCO2",
        "weight_col": "% CARBON Emissions",
    },
    {
        "timeseries": "Total direct emissions abated (MtCO2e)",
        "variable_name": "Abatement emissions CH4",
        "variable_unit": "MtCO2e",
        "weight_col": "% CH4 Emissions",
    },
    {
        "timeseries": "Total direct emissions abated (MtCO2e)",
        "variable_name": "Abatement emissions N2O",
        "variable_unit": "MtCO2e",
        "weight_col": "% N2O Emissions",
    },

    # Add demand
    {
        "timeseries": "Change in electricity use (GWh)",
        "variable_name": "Additional demand electricity",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in natural gas use (GWh)",
        "variable_name": "Additional demand gas",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in petroleum use (GWh)",
        "variable_name": "Additional demand petroleum",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in solid fuel use (GWh)",
        "variable_name": "Additional demand solid fuel",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in primary bioenergy use (GWh)",
        "variable_name": "Additional demand final bioenergy",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in hydrogen use (GWh)",
        "variable_name": "Additional demand hydrogen",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Change in non bio waste",
        "variable_name": "Additional demand final non-bio waste",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },

    # Add capex and opex
    {
        "timeseries": "capex",
        "variable_name": "Additional capital expenditure",
        "variable_unit": "£m",
    },
    {
        "timeseries": "AM levelised capex (£m)",
        "variable_name": "Additional capital expenditure annualised",
        "variable_unit": "£m",
    },
    {
        "timeseries": "capex low carbon",
        "variable_name": "Total capital expenditure low carbon",
        "variable_unit": "£m",
    },
    {
        "timeseries": "opex",
        "variable_name": "Additional operating expenditure",
        "variable_unit": "£m",
    },
    {
        "timeseries": "opex low carbon",
        "variable_name": "Total operating expenditure low carbon",
        "variable_unit": "£m",
    },

    # CCS
    {
        "timeseries": "Tonnes of CO2 captured (MtCO2)",
        "variable_name": "Additional CCS",
        "variable_unit": "MtCO2",
    },

    # these are intermediate variables
    {
        "timeseries": "total emissions abated",
        "variable_name": "total emissions abated",
        "variable_unit": "MtCO2e",
    },
    {
        "timeseries": "cost differential",
        "variable_name": "cost differential",
        "variable_unit": "£m",
    },
    {
        "timeseries": "cum total emissions abated",
        "variable_name": "cum total emissions abated",
        "variable_unit": "MtCO2e",
    },
    {
        "timeseries": "cum cost differential",
        "variable_name": "cum cost differential",
        "variable_unit": "£m",
    },
]
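
To make the roles of `scale` and `weight_col` concrete, the sketch below applies one configuration entry to a single row of made-up numbers. It assumes that weighting means multiplying each yearly value by the row's weight, expressed as a fraction; the real logic lives in `nzip.py` and may differ. The function `apply_entry` and all sample values are hypothetical.

```python
def apply_entry(entry, yearly_values, row):
    """Scale (and optionally weight) a {year: value} timeseries for one row."""
    factor = entry.get("scale", 1.0)
    if "weight_col" in entry:
        # Assumption: the weight is a fraction that multiplies every year.
        factor *= row[entry["weight_col"]]
    return {year: value * factor for year, value in yearly_values.items()}


row = {"% CH4 Emissions": 0.02}  # made-up share of emissions that is CH4

electricity = {"timeseries": "Change in electricity use (GWh)",
               "variable_unit": "TWh", "scale": 1e-3}
methane = {"timeseries": "Total direct emissions abated (MtCO2e)",
           "variable_unit": "MtCO2e", "weight_col": "% CH4 Emissions"}

twh = apply_entry(electricity, {2030: 1500.0, 2035: 2400.0}, row)
ch4 = apply_entry(methane, {2030: 5.0, 2035: 8.0}, row)
```

Under these assumptions, 1500 GWh becomes about 1.5 TWh, and 5 MtCO2e of abatement with a 2% CH4 share contributes about 0.1 MtCO2e to the CH4 variable.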

We have to process the REEE measures differently because they do not follow the same output format as the other measures.

For now we are using the abatement EE fraction to compute all the variables.

In [49]:
reee_kwargs = [
    {
        "baseline_col": "Baseline emissions (MtCO2e)",
        "post_reee_col": "Post REEE baseline emissions (MtCO2e)",
        "out_col": "Abatement emissions CO2",
        "variable_unit": "MtCO2",
    },
    {
        "baseline_col": "Baseline electricity use (GWh)",
        "post_reee_col": "Post REEE baseline electricity use (GWh)",
        "out_col": "Additional demand electricity",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "baseline_col": "Baseline in natural gas use (GWh)",
        "post_reee_col": "Post REEE baseline in natural gas use (GWh)",
        "out_col": "Additional demand gas",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "baseline_col": "Baseline in petroleum use (GWh)",
        "post_reee_col": "Post REEE baseline in petroleum use (GWh)",
        "out_col": "Additional demand petroleum",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "baseline_col": "Baseline in solid fuel use (GWh)",
        "post_reee_col": "Post REEE baseline in solid fuel use (GWh)",
        "out_col": "Additional demand solid fuel",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
]

Now that we have defined the configuration, we can process the data and write the outputs to Excel.

In [51]:
# write out the measure level data for this pathway
sd_df = nzip.sd_measure_level(df.copy(), measure_level_kwargs, reee_kwargs, nzip_path=nzip_path, baseline=False)
sd_df.to_excel(output_file, index=False, sheet_name='BP Measure level data')

In [42]:
# write a sheet containing the measure definitions
# collect the sorted unique values of a column; the index is reset because
# pd.DataFrame aligns Series on their index, which would otherwise undo the sort
def sorted_unique(col):
    return pd.Series(sd_df[col].unique()).sort_values().reset_index(drop=True)

category_cols = [f'Category{i+3}: {category}' for i, category in enumerate(nzip.CATEGORIES)]
measure_defs_df = pd.DataFrame({
    col: sorted_unique(col)
    for col in ['Sector', 'Subsector', 'Measure Name', *category_cols]
})
with pd.ExcelWriter(output_file, mode='a', if_sheet_exists='replace') as writer:
    measure_defs_df.to_excel(writer, index=False, sheet_name='Measure definitions')


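Note: if this cell fails with `PermissionError: [Errno 13] Permission denied: 'sd-Industry-test.xlsx'`, the workbook is probably open in another program (for example Excel). Close it and re-run the cell.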
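
A small pre-flight check can catch a locked workbook before any slow processing runs. The sketch below tries to open the file in append mode, which leaves its contents untouched; `is_writable` is a hypothetical helper. On Windows, a workbook that is open in Excel will typically fail this check, while other operating systems may not lock the file at all.

```python
from pathlib import Path


def is_writable(path):
    """Return True if path can be opened for writing (or does not exist yet)."""
    target = Path(path)
    if not target.exists():
        return True
    try:
        # Append mode leaves the existing contents untouched.
        with target.open("ab"):
            return True
    except PermissionError:
        return False
```

For example, `assert is_writable(output_file), 'Close the workbook before re-running'` could be placed at the top of each cell that writes to the output file.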

### Baseline pathway

Now that we have computed the measure-level Sector Databook outputs, we can compute the baseline pathway in a similar fashion.

In [None]:
baseline_kwargs = [
    {
        "timeseries": "Baseline emissions (MtCO2e)",
        "variable_name": "Baseline emissions CO2",
        "variable_unit": "MtCO2",
        "weight_col": "% CARBON Emissions",
    },
    {
        "timeseries": "Baseline emissions (MtCO2e)",
        "variable_name": "Baseline emissions CH4",
        "variable_unit": "MtCO2e",
        "weight_col": "% CH4 Emissions",
    },
    {
        "timeseries": "Baseline emissions (MtCO2e)",
        "variable_name": "Baseline emissions N2O",
        "variable_unit": "MtCO2e",
        "weight_col": "% N2O Emissions",
    },
    {
        "timeseries": "Baseline electricity use (GWh)",
        "variable_name": "Baseline demand electricity",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Baseline in natural gas use (GWh)",
        "variable_name": "Baseline demand gas",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Baseline in petroleum use (GWh)",
        "variable_name": "Baseline demand petroleum",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Baseline in solid fuel use (GWh)",
        "variable_name": "Baseline demand solid fuel",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Baseline in primary bioenergy use (GWh)",
        "variable_name": "Baseline demand final bioenergy",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Baseline in hydrogen use (GWh)",
        "variable_name": "Baseline demand hydrogen",
        "variable_unit": "TWh",
        "scale": 1e-3,
    },
    {
        "timeseries": "Counterfactual capex (£m)",
        "variable_name": "Baseline capital expenditure",
        "variable_unit": "£m",
    },
    {
        "timeseries": "Counterfactual opex (£m)",
        "variable_name": "Baseline operating expenditure",
        "variable_unit": "£m",
    },
]

In [None]:
bl_df = nzip.sd_measure_level(df, baseline_kwargs, baseline=True)
bl_df = nzip.baseline_from_measure_level(bl_df)
with pd.ExcelWriter(output_file, mode='a', if_sheet_exists='replace') as writer:
    bl_df.to_excel(writer, index=False, sheet_name='Baseline data')

### Aggregate results

Finally, we can aggregate the measure level and baseline pathway outputs to produce the final sector databook outputs.

In [None]:
agg_df = nzip.get_aggregate_df(df, measure_level_kwargs, baseline_kwargs, nzip.SECTOR)
with pd.ExcelWriter(output_file, mode='a', if_sheet_exists='replace') as writer:
    agg_df.to_excel(writer, index=False, sheet_name='Aggregate data')

# Finished!

Now you can just run the cell below to download the results!

In [None]:
if colab:
    files.download(output_file) 