# China Steel Case Study: Utilization, Emissions, and Projections

This notebook implements the tasks described in `instructions.md`, alternating between short markdown reminders of the task and Python cells that perform each step.

Data source: `GIST.xlsx` (same folder). We will:
- Load and inspect the data
- Build an operational plants dataset for years 2020-2030
- Compute China utilization rates (production/capacity) and optionally technology-level utilization rates (BF-BOF and EAF)
- Attribute utilization rates to plants and derive plant-level production
- Collect/define emission factors for China by technology and compute plant-level emissions
- Aggregate emissions at the company level
- Project company-level emissions into the near future (with simple scenarios/uncertainty)

Notes:
- Replace placeholder values for China production/capacity (WSA/OECD) and emission factors with authoritative figures.
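- Units: capacities in the sheets are given in ttpa (thousand tonnes per annum), while the production series entered below is in tonnes. The `utilization` values computed below are therefore 1000× the dimensionless rate (for example, 639 corresponds to roughly 64%), and multiplying them by a capacity in ttpa yields production in tonnes.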



In [50]:
# Imports and configuration
from pathlib import Path
import pandas as pd
import numpy as np

# Display
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 160)

# Paths
BASE_DIR = Path('/Users/yassine/Desktop/Signal/ESSEC3A/Research & Emerging Topics in Data Science/Lab 4')
XLSX_PATH = BASE_DIR / 'GIST.xlsx'
OUTPUT_DIR = BASE_DIR / 'outputs'
OUTPUT_DIR.mkdir(exist_ok=True)

XLSX_PATH

PosixPath('/Users/yassine/Desktop/Signal/ESSEC3A/Research & Emerging Topics in Data Science/Lab 4/GIST.xlsx')

## Task: Load Excel data and inspect structure

We will load all relevant sheets from `GIST.xlsx`, show sheet names, row counts, and preview headers. If `openpyxl` is not installed, please install it (`pip install openpyxl`) and re-run.


In [51]:
# Load all sheets
xls = pd.ExcelFile(XLSX_PATH)
print('Sheets:', xls.sheet_names)

sheets = {
    'about': pd.read_excel(xls, sheet_name='About'),
    'metadata': pd.read_excel(xls, sheet_name='Metadata'),
    'plant_data': pd.read_excel(xls, sheet_name='Plant data'),
    'plant_cap_status': pd.read_excel(xls, sheet_name='Plant capacities and status'),
    'plant_production': pd.read_excel(xls, sheet_name='Plant production'),
}

# Quick overview
for name, df in sheets.items():
    print(f"\n=== {name} === rows={len(df)} cols={len(df.columns)}")
    print(list(df.columns)[:20])

# Keep individual variables for convenience
about = sheets['about']
metadata = sheets['metadata']
plant_data = sheets['plant_data']
plant_cap_status = sheets['plant_cap_status']
plant_production = sheets['plant_production']

# Preview core tables
plant_data.head(3), plant_cap_status.head(3), plant_production.head(3)

Sheets: ['About', 'Metadata', 'Plant data', 'Plant capacities and status', 'Plant production']

=== about === rows=66 cols=3
['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2']

=== metadata === rows=77 cols=2
['"Plant data"', 'Unnamed: 1']

=== plant_data === rows=1209 cols=44
['Plant ID', 'Plant name (English)', 'Plant name (other language)', 'Other plant names (English)', 'Other plant names (other language)', 'Owner', 'Owner (other language)', 'Owner GEM ID', 'Owner PermID', 'SOE Status', 'Parent', 'Parent GEM ID', 'Parent PermID', 'Location address', 'Municipality', 'Subnational unit (province/state)', 'Country/Area', 'Region', 'Other language location address', 'Coordinates']

=== plant_cap_status === rows=1744 cols=15
['Plant ID', 'Plant name (English)', 'Plant name (other language)', 'Country/Area', 'Status', 'Start date', 'Nominal crude steel capacity (ttpa)', 'Nominal BOF steel capacity (ttpa)', 'Nominal EAF steel capacity (ttpa)', 'Nominal OHF steel capacity (ttpa)', 'Other/unspecifie

(        Plant ID                     Plant name (English)  Plant name (other language) Other plant names (English) Other plant names (other language)  \
 0  P100000120004  Kurum International Elbasan steel plant  Kurum Kombinati metalurgjik                         NaN                                NaN   
 1  P100000120439        Algerian Qatari Steel Jijel plant      الجزائرية القطرية للصلب                         AQS                                NaN   
 2  P100000120442                 ETRHB Annaba steel plant                          NaN                         NaN                                NaN   
 
                      Owner Owner (other language)   Owner GEM ID Owner PermID SOE Status                                             Parent  \
 0  Kurum International ShA                    NaN  E100000130992   5037939021        NaN                   Kurum International ShA [100.0%]   
 1    Algerian Qatari Steel                    NaN  E100001000957   5076384326    Partial  Ind

## Task: Build operational plants dataset (2020-2030)

Use `Start date` as commissioning year and `Retired date` as end year if present; if end date is missing, assume the plant continues operating. We will:
- Normalize date columns to years
- Join `Retired date` from `Plant data` onto `Plant capacities and status`
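- Produce a long-format table: one row per capacity record per year with an `is_operating` flag (a plant with several rows in `Plant capacities and status` appears once per row).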
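
The year normalization and the operating rule described above can be sketched in plain Python. This is only an illustration, not the notebook's implementation: `parse_year` and `is_operating` are hypothetical helper names, and the sample values are made up.

```python
import re
from typing import Optional

_YEAR_RE = re.compile(r"\d{4}")


def parse_year(value) -> Optional[int]:
    """Return the first 4-digit year found in `value`, or None if there is none."""
    match = _YEAR_RE.search(str(value))
    return int(match.group()) if match else None


def is_operating(year: int, start: Optional[int], end: Optional[int]) -> bool:
    """A plant needs a start year; a missing end year means it keeps operating."""
    if start is None:
        return False
    if end is None:
        return year >= start
    return start <= year <= end


# Made-up sample values, for illustration only
start = parse_year("2015-06")  # year-month string
end = parse_year("unknown")    # no year found, so no retirement is assumed
flags = {y: is_operating(y, start, end) for y in range(2020, 2031)}
print(flags[2020], flags[2030])
```

Extracting the year from the string form sidesteps the ambiguity of handing bare integers or partial dates to a datetime parser.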


In [52]:
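# Helper to coerce dates to years

def to_year(series):
    """Extract a 4-digit year from mixed date values (e.g. 2015, '2015-06', 'unknown')."""
    # pd.to_datetime would read a bare integer such as 2015 as nanoseconds since 1970,
    # so take the year from the string form instead; values without a year become NaN.
    years = series.astype(str).str.extract(r'(\d{4})', expand=False)
    return pd.to_numeric(years, errors='coerce')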

# Prepare key columns
cap = plant_cap_status.copy()
core_cols = [
    'Plant ID', 'Plant name (English)', 'Country/Area', 'Status', 'Start date',
    'Nominal crude steel capacity (ttpa)', 'Nominal BOF steel capacity (ttpa)',
    'Nominal EAF steel capacity (ttpa)', 'Nominal OHF steel capacity (ttpa)'
]
cap = cap[[c for c in core_cols if c in cap.columns]].copy()
cap.rename(columns={'Plant name (English)': 'Plant name'}, inplace=True)

# Bring retired date from plant_data
retire = plant_data[['Plant ID', 'Retired date']].copy()
cap = cap.merge(retire, on='Plant ID', how='left')

cap['start_year'] = to_year(cap['Start date'])
cap['end_year'] = to_year(cap['Retired date'])  # NaN if still operating

# Build operational flags for 2020-2030
YEARS = list(range(2020, 2031))
records = []
for _, row in cap.iterrows():
    start = int(row['start_year']) if not pd.isna(row['start_year']) else None
    end = int(row['end_year']) if not pd.isna(row['end_year']) else None
    for y in YEARS:
        if start is None:
            is_op = False
        elif end is None:
            is_op = y >= start
        else:
            is_op = (y >= start) and (y <= end)
        records.append({
            'Plant ID': row['Plant ID'],
            'Plant name': row.get('Plant name'),
            'Country/Area': row.get('Country/Area'),
            'Status': row.get('Status'),
            'year': y,
            'is_operating': is_op
        })

operational = pd.DataFrame.from_records(records)
operational.head()



Unnamed: 0,Plant ID,Plant name,Country/Area,Status,year,is_operating
0,P100000120004,Kurum International Elbasan steel plant,Albania,mothballed,2020,True
1,P100000120004,Kurum International Elbasan steel plant,Albania,mothballed,2021,True
2,P100000120004,Kurum International Elbasan steel plant,Albania,mothballed,2022,True
3,P100000120004,Kurum International Elbasan steel plant,Albania,mothballed,2023,True
4,P100000120004,Kurum International Elbasan steel plant,Albania,mothballed,2024,True


## Task: Compute China utilization rate (production / capacity)

Enter yearly series for China crude steel production (WSA) and capacity (OECD). Replace placeholders with authoritative numbers.
- Units: tonnes (t). If using Mt, multiply by 1e6.
- We will compute overall utilization and prepare placeholders for technology split (BF-BOF, EAF).

Method:
- Derive operating capacity by year from the plant list (sum of `Nominal crude steel capacity (ttpa)` for plants operating in each year and in China).
- Input/replace production series per year.
- Utilization = production / operating_capacity.
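
As a worked example of the formula above, the sketch below converts capacity to tonnes before dividing, so the result is a dimensionless rate. It assumes `ttpa` means thousand tonnes per annum; `utilization_rate` is a hypothetical helper, and the inputs are the notebook's own 2020 figures (the production value is a placeholder).

```python
TONNES_PER_TTPA = 1_000  # assumption: ttpa = thousand tonnes per annum


def utilization_rate(production_t: float, capacity_ttpa: float) -> float:
    """Dimensionless utilization: tonnes produced / tonnes of operating capacity."""
    if capacity_ttpa <= 0:
        raise ValueError("operating capacity must be positive")
    return production_t / (capacity_ttpa * TONNES_PER_TTPA)


# 2020 placeholder production and summed operating capacity from this notebook
rate_2020 = utilization_rate(production_t=1_050_000_000.0, capacity_ttpa=1_642_550)
print(f"{rate_2020:.1%}")
```

Under that unit assumption the 2020 rate comes out at roughly 64%, which is the same figure as the notebook's 639 divided by 1000.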


In [53]:
# Compute China operating capacity by year and placeholder production
# Merge capacity columns onto operational flags
cap_cols = [
    'Plant ID',
    'Nominal crude steel capacity (ttpa)',
    'Nominal BOF steel capacity (ttpa)',
    'Nominal EAF steel capacity (ttpa)',
    'Nominal OHF steel capacity (ttpa)'
]
cap_for_merge = plant_cap_status[[c for c in cap_cols if c in plant_cap_status.columns]].copy()
cap_for_merge = cap_for_merge.drop_duplicates(subset=['Plant ID'])

oper = operational.merge(cap_for_merge, on='Plant ID', how='left')
# Country/Area already present in operational

china_oper = oper[oper['Country/Area'] == 'China'].copy()

# Capacity per plant per year (assumed constant while the plant operates).
# Note: despite the `_t` suffix, these values remain in ttpa (thousand tonnes per annum).
china_oper['crude_capacity_t'] = china_oper['Nominal crude steel capacity (ttpa)'].fillna(0)

china_capacity_by_year = (
    china_oper[china_oper['is_operating']]
    .groupby('year', as_index=False)['crude_capacity_t']
    .sum()
    .rename(columns={'crude_capacity_t': 'operating_capacity_t'})
)

# Placeholder production series (REPLACE with WSA data, tonnes)
prod_placeholder = {
    2019: np.nan,
    2020: 1050000000.0,  # 1050 Mt
    2021: 1030000000.0,
    2022: 1010000000.0,
    2023: 1012000000.0,
    2024: 1020000000.0,
}
prod_df = pd.DataFrame({'year': list(prod_placeholder.keys()), 'production_t': list(prod_placeholder.values())})

util = china_capacity_by_year.merge(prod_df, on='year', how='left')
util['utilization'] = util['production_t'] / util['operating_capacity_t']

util.sort_values('year').head(10)



Unnamed: 0,year,operating_capacity_t,production_t,utilization
0,2020,1642550,1050000000.0,639.249947
1,2021,1642030,1030000000.0,627.27234
2,2022,1628440,1010000000.0,620.225492
3,2023,1610220,1012000000.0,628.485549
4,2024,1605400,1020000000.0,635.355675
5,2025,1640143,,
6,2026,1648668,,
7,2027,1648668,,
8,2028,1648668,,
9,2029,1648668,,


## Task: Compute technology-level utilization rates (BF-BOF and EAF)

We derive operating capacity by technology for China per year using plant-level capacities.
Default: apply overall utilization to each technology unless custom tech-specific utilization is provided.
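
If a technology-specific rate is supplied for one route, the other has to adjust so that total production still matches. The sketch below solves that balance for BF-BOF. It assumes `ttpa` means thousand tonnes per annum; `implied_bof_utilization` is a hypothetical helper, and the 50% EAF rate is a made-up input.

```python
TONNES_PER_TTPA = 1_000  # assumption: ttpa = thousand tonnes per annum


def implied_bof_utilization(production_t, bof_cap_ttpa, eaf_cap_ttpa, util_eaf):
    """Solve production = u_bof * C_bof + u_eaf * C_eaf for u_bof (capacities in tonnes)."""
    bof_cap_t = bof_cap_ttpa * TONNES_PER_TTPA
    eaf_cap_t = eaf_cap_ttpa * TONNES_PER_TTPA
    if bof_cap_t <= 0:
        raise ValueError("BOF capacity must be positive")
    return (production_t - util_eaf * eaf_cap_t) / bof_cap_t


# 2020 capacities and placeholder production from this notebook;
# the 50% EAF utilization is a made-up input for illustration.
production_t = 1_050_000_000.0
bof_cap_ttpa, eaf_cap_ttpa = 1_441_995.0, 200_555.0
util_eaf = 0.50
util_bof = implied_bof_utilization(production_t, bof_cap_ttpa, eaf_cap_ttpa, util_eaf)
print(round(util_bof, 3), util_eaf)
```

Deriving the second rate from the balance keeps the two technology rates consistent with the national production total, which two independently chosen rates would not guarantee.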


In [54]:
# Compute tech capacities per year for China
china_oper['bof_capacity_t'] = china_oper['Nominal BOF steel capacity (ttpa)'].fillna(0)
china_oper['eaf_capacity_t'] = china_oper['Nominal EAF steel capacity (ttpa)'].fillna(0)

tech_cap_by_year = (
    china_oper[china_oper['is_operating']]
    .groupby('year', as_index=False)
    .agg(bof_capacity_t=('bof_capacity_t', 'sum'), eaf_capacity_t=('eaf_capacity_t', 'sum'))
)

# Default: apply overall utilization to each technology (can be overridden later)
tech_util = tech_cap_by_year.merge(util[['year', 'utilization']], on='year', how='left')
tech_util.rename(columns={'utilization': 'util_overall'}, inplace=True)
tech_util['util_bof'] = tech_util['util_overall']
tech_util['util_eaf'] = tech_util['util_overall']

tech_util.head(10)



Unnamed: 0,year,bof_capacity_t,eaf_capacity_t,util_overall,util_bof,util_eaf
0,2020,1441995.0,200555,639.249947,639.249947,639.249947
1,2021,1440645.0,201385,627.27234,627.27234,627.27234
2,2022,1427415.0,201025,620.225492,620.225492,620.225492
3,2023,1409195.0,201025,628.485549,628.485549,628.485549
4,2024,1404875.0,200525,635.355675,635.355675,635.355675
5,2025,1437618.0,202525,,,
6,2026,1445318.0,203350,,,
7,2027,1445318.0,203350,,,
8,2028,1445318.0,203350,,,
9,2029,1445318.0,203350,,,


## Task: Attribute utilization to plants and compute plant-level production

Approach:
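- Determine a plant's dominant technology from its capacity columns: BF-BOF if BOF capacity is larger, EAF if EAF capacity is larger, otherwise `Unknown` (equal or missing capacities, e.g. OHF-only plants).
- For each operating plant-year, multiply the relevant capacity by the tech-specific utilization for that year to derive production (`Unknown` plants use total crude capacity and the overall utilization).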
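
The two steps above can be sketched for a single plant-year. `dominant_tech` and `plant_production_t` are hypothetical helpers. The capacity and utilization are the notebook's 2020 values for one BF-BOF plant, while the EAF capacity of 0 is an assumption made for the example. Utilization is kept in the notebook's convention (tonnes per ttpa of capacity).

```python
def dominant_tech(bof_cap: float, eaf_cap: float) -> str:
    """Pick the technology with the larger capacity; ties and zeros are 'Unknown'."""
    if bof_cap > eaf_cap:
        return "BF-BOF"
    if eaf_cap > bof_cap:
        return "EAF"
    return "Unknown"


def plant_production_t(capacity_ttpa: float, util_t_per_ttpa: float, operating: bool) -> float:
    """Production in tonnes; plants that are not operating produce nothing."""
    return capacity_ttpa * util_t_per_ttpa if operating else 0.0


# 3600 ttpa BOF capacity and the 2020 utilization from this notebook;
# eaf_cap=0.0 is assumed for illustration.
tech = dominant_tech(bof_cap=3600.0, eaf_cap=0.0)
production = plant_production_t(3600.0, 639.249947, operating=True)
print(tech, round(production))
```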


In [55]:
# -------------------------------
# Determine dominant technology per plant
# -------------------------------
plant_caps = plant_cap_status[[
    'Plant ID', 'Plant name (English)', 'Country/Area',
    'Nominal crude steel capacity (ttpa)',
    'Nominal BOF steel capacity (ttpa)',
    'Nominal EAF steel capacity (ttpa)',
    'Nominal OHF steel capacity (ttpa)'
]].drop_duplicates('Plant ID').copy()

plant_caps.rename(columns={'Plant name (English)': 'Plant name'}, inplace=True)

# Ensure columns exist
for c in [
    'Nominal BOF steel capacity (ttpa)',
    'Nominal EAF steel capacity (ttpa)',
    'Nominal OHF steel capacity (ttpa)',
    'Nominal crude steel capacity (ttpa)'
]:
    if c not in plant_caps.columns:
        plant_caps[c] = 0

def clean_capacity(df, src_col, dst_col):
    """
    Convert capacity column to numeric:
    - Handle strings and commas
    - Coerce errors to NaN, then fill with 0
    """
    s = df[src_col].astype(str).str.replace(',', '', regex=False)
    s = s.replace('nan', '')  # if NaN became 'nan'
    df[dst_col] = pd.to_numeric(s, errors='coerce').fillna(0)
    return df

# Clean and create numeric capacity columns
plant_caps = clean_capacity(plant_caps, 'Nominal BOF steel capacity (ttpa)', 'bof_cap')
plant_caps = clean_capacity(plant_caps, 'Nominal EAF steel capacity (ttpa)', 'eaf_cap')
plant_caps = clean_capacity(plant_caps, 'Nominal OHF steel capacity (ttpa)', 'ohf_cap')
plant_caps = clean_capacity(plant_caps, 'Nominal crude steel capacity (ttpa)', 'crude_cap')

# Dominant tech per plant (now safe: all numeric)
plant_caps['tech'] = np.select(
    condlist=[
        plant_caps['bof_cap'] > plant_caps['eaf_cap'],
        plant_caps['eaf_cap'] > plant_caps['bof_cap']
    ],
    choicelist=['BF-BOF', 'EAF'],
    default='Unknown'
)

# -------------------------------
# Build plant-year production for China only
# -------------------------------
china_py = operational.merge(plant_caps, on='Plant ID', how='left')

# --- Harmonize Country/Area after merge ---
if 'Country/Area' not in china_py.columns:
    if 'Country/Area_x' in china_py.columns:
        china_py['Country/Area'] = china_py['Country/Area_x']
    elif 'Country/Area_y' in china_py.columns:
        china_py['Country/Area'] = china_py['Country/Area_y']

# --- Harmonize Plant name after merge ---
if 'Plant name' not in china_py.columns:
    if 'Plant name_x' in china_py.columns:
        china_py['Plant name'] = china_py['Plant name_x']
    elif 'Plant name_y' in china_py.columns:
        china_py['Plant name'] = china_py['Plant name_y']

# Filter for China
china_py = china_py[china_py['Country/Area'] == 'China'].copy()

# Attach tech utilization per year
china_py = china_py.merge(
    tech_util[['year', 'util_bof', 'util_eaf', 'util_overall']],
    on='year',
    how='left'
)

# Choose capacity according to tech
china_py['cap_for_prod'] = np.where(
    china_py['tech'] == 'BF-BOF', china_py['bof_cap'],
    np.where(
        china_py['tech'] == 'EAF', china_py['eaf_cap'],
        china_py['crude_cap']  # fallback if tech is Unknown
    )
)

# Choose utilization according to tech
china_py['util_for_prod'] = np.where(
    china_py['tech'] == 'BF-BOF', china_py['util_bof'],
    np.where(
        china_py['tech'] == 'EAF', china_py['util_eaf'],
        china_py['util_overall']  # fallback if tech is Unknown
    )
)

# Production only when operating
china_py['production_t'] = np.where(
    china_py['is_operating'],
    china_py['cap_for_prod'] * china_py['util_for_prod'],
    0.0
)

china_py_prod = china_py[[
    'Plant ID', 'Plant name', 'year', 'tech',
    'cap_for_prod', 'util_for_prod', 'is_operating', 'production_t'
]].copy()

china_py_prod.head(10)

Unnamed: 0,Plant ID,Plant name,year,tech,cap_for_prod,util_for_prod,is_operating,production_t
0,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2020,BF-BOF,3600.0,639.249947,True,2301300.0
1,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2021,BF-BOF,3600.0,627.27234,True,2258180.0
2,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2022,BF-BOF,3600.0,620.225492,True,2232812.0
3,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2023,BF-BOF,3600.0,628.485549,True,2262548.0
4,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2024,BF-BOF,3600.0,635.355675,True,2287280.0
5,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2025,BF-BOF,3600.0,,True,
6,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2026,BF-BOF,3600.0,,True,
7,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2027,BF-BOF,3600.0,,True,
8,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2028,BF-BOF,3600.0,,True,
9,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2029,BF-BOF,3600.0,,True,


## Task: Set emission factors by technology and compute plant-level emissions

We define China-specific emission factors by technology (the values below are placeholders; replace them with authoritative figures, e.g. from Hasanbeigi), then compute emissions:
- emissions_tCO2 = production_t × emission_factor_tCO2_per_t.
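
A minimal worked instance of the formula, using the placeholder factors from this notebook (they are not authoritative values). `plant_emissions_tco2` is a hypothetical helper; any technology without its own factor falls back to the `Unknown` factor.

```python
# Placeholder emission factors in tCO2 per tonne of crude steel (not authoritative)
EMISSION_FACTORS = {"BF-BOF": 2.10, "EAF": 0.40, "Unknown": 1.50}


def plant_emissions_tco2(production_t: float, tech: str, factors=EMISSION_FACTORS) -> float:
    """emissions_tCO2 = production_t * emission factor for the plant's technology."""
    ef = factors.get(tech, factors["Unknown"])
    return production_t * ef


# 2020 production of one BF-BOF plant as computed earlier in this notebook
emissions = plant_emissions_tco2(2_301_300.0, "BF-BOF")
print(round(emissions))
```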


In [56]:
# Placeholder emission factors (tCO2 per t crude steel) — replace with authoritative values
emission_factors = {
    'BF-BOF': 2.10,  # placeholder
    'EAF': 0.40,     # placeholder (scrap share/energy mix sensitive)
    'Unknown': 1.50  # fallback
}

china_py_emis = china_py_prod.copy()
china_py_emis['ef_tco2_per_t'] = china_py_emis['tech'].map(emission_factors).fillna(emission_factors['Unknown'])
china_py_emis['emissions_tco2'] = china_py_emis['production_t'] * china_py_emis['ef_tco2_per_t']

china_py_emis.head(10)


Unnamed: 0,Plant ID,Plant name,year,tech,cap_for_prod,util_for_prod,is_operating,production_t,ef_tco2_per_t,emissions_tco2
0,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2020,BF-BOF,3600.0,639.249947,True,2301300.0,2.1,4832730.0
1,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2021,BF-BOF,3600.0,627.27234,True,2258180.0,2.1,4742179.0
2,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2022,BF-BOF,3600.0,620.225492,True,2232812.0,2.1,4688905.0
3,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2023,BF-BOF,3600.0,628.485549,True,2262548.0,2.1,4751351.0
4,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2024,BF-BOF,3600.0,635.355675,True,2287280.0,2.1,4803289.0
5,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2025,BF-BOF,3600.0,,True,,2.1,
6,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2026,BF-BOF,3600.0,,True,,2.1,
7,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2027,BF-BOF,3600.0,,True,,2.1,
8,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2028,BF-BOF,3600.0,,True,,2.1,
9,P100000120163,Angang Group Xinyang Iron and Steel Co Ltd,2029,BF-BOF,3600.0,,True,,2.1,


## Task: Aggregate plant-level emissions at the company level

We’ll map plants to companies using `Parent` if available, else `Owner`, and sum production and emissions per company-year.
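
The `Parent` values carry ownership shares in brackets, and some list several parents separated by semicolons. The notebook groups on the raw string. As an optional refinement, the sketch below splits such a string into names and shares. `parse_parents` is a hypothetical helper, and the two-parent sample string is made up.

```python
import re
from typing import List, Optional, Tuple

# "Some Company Name [98.8%]" -> name and percentage
_SHARE_RE = re.compile(r"^(?P<name>.+?)\s*\[(?P<pct>\d+(?:\.\d+)?)%\]$")


def parse_parents(parent: str) -> List[Tuple[str, Optional[float]]]:
    """Split 'A [60.0%]; B [40.0%]' into (name, share) pairs; share is None if absent."""
    result = []
    for part in parent.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _SHARE_RE.match(part)
        if match:
            result.append((match.group("name"), float(match.group("pct")) / 100))
        else:
            result.append((part, None))
    return result


print(parse_parents("Kurum International ShA [100.0%]"))
print(parse_parents("A Co [60.0%]; B Ltd [40.0%]"))  # made-up example
```

With clean names, plants owned by the same parent group together even when the listed shares differ between plants.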


In [57]:
# Company mapping: prefer Parent, fallback to Owner
company_map = plant_data[['Plant ID', 'Owner', 'Parent']].drop_duplicates('Plant ID').copy()
company_map['company'] = company_map['Parent'].fillna(company_map['Owner'])

china_company = china_py_emis.merge(company_map[['Plant ID', 'company']], on='Plant ID', how='left')

company_year = (
    china_company.groupby(['company', 'year'], as_index=False)
    .agg(
        production_t=('production_t', 'sum'),
        emissions_tco2=('emissions_tco2', 'sum')
    )
)

company_year.sort_values(['year', 'emissions_tco2'], ascending=[True, False]).head(10)


Unnamed: 0,company,year,production_t,emissions_tco2
0,Angang Steel Co Ltd [100.0%],2020,32537820.0,68329430.0
132,Baoshan Iron & Steel Co Ltd [100.0%],2020,30692950.0,64455190.0
1881,Jinghua Rigang Holding Group Co Ltd [98.8%]; u...,2020,28702320.0,60274880.0
2178,Maanshan Iron & Steel Co Ltd [100.0%],2020,28095040.0,58999570.0
1397,Hengde International Co Ltd [28.2%]; Hong Kong...,2020,26603030.0,55866350.0
363,Bengang Steel Plates Co Ltd [100.0%],2020,23780100.0,49938210.0
1628,Inner Mongolia BaoTou Steel Union Co Ltd [100.0%],2020,19957380.0,41910510.0
385,British Asia Pan Ruogen International Co Ltd [...,2020,16869810.0,35426590.0
308,Beijing Shougang Co Ltd [100.0%],2020,16812270.0,35305770.0
627,Citic Pacific Special Steel Group Co Ltd [100.0%],2020,16236950.0,34097590.0


## Task: Project company-level emissions into the near future

We’ll create simple scenarios from the latest historical year to 2030:
- Baseline: flat emissions
- Growth: +1% CAGR
- Decline: −2% CAGR
Optionally add simple uncertainty bands.
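
The compounding behind these scenarios, with one possible form of uncertainty band, can be sketched as follows. `project` and the ±0.5 percentage-point `BAND` are assumptions made for illustration. The base value is one company's 2024 emissions as computed in this notebook.

```python
def project(base: float, rate: float, first_year: int, last_year: int) -> dict:
    """Compound `base` annually: value(y) = base * (1 + rate) ** (y - first_year + 1)."""
    return {
        y: base * (1 + rate) ** (y - first_year + 1)
        for y in range(first_year, last_year + 1)
    }


SCENARIOS = {"baseline": 0.00, "growth": 0.01, "decline": -0.02}
BAND = 0.005  # hypothetical +/- 0.5 percentage-point spread around each rate

base_2024 = 67_913_170.0  # tCO2, taken from this notebook's company-level table
paths = {}
for name, rate in SCENARIOS.items():
    paths[name] = {
        "central": project(base_2024, rate, 2025, 2030),
        "low": project(base_2024, rate - BAND, 2025, 2030),
        "high": project(base_2024, rate + BAND, 2025, 2030),
    }

print(round(paths["decline"]["central"][2030]))
```

Because the band is applied to the growth rate, it widens with the projection horizon, which a fixed percentage band around the central path would not do.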


In [None]:
# Build projections from the last year with actual data.
# company_year also contains 2025-2030 rows whose sums are 0 (utilization is missing there),
# so use the last year with a utilization value rather than company_year['year'].max().
hist_years = util.loc[util['utilization'].notna(), 'year']
last_hist_year = int(hist_years.max()) if len(hist_years) else 2024
proj_years = list(range(last_hist_year + 1, 2031))

scenarios = {
    'baseline': 0.00,
    'growth': 0.01,
    'decline': -0.02,
}

projections = []
for _, row in company_year[company_year['year'] == last_hist_year].iterrows():
    company = row['company']
    base_emis = row['emissions_tco2']
    for scen, rate in scenarios.items():
        val = base_emis
        for y in proj_years:
            val = val * (1 + rate)  # compound year over year
            projections.append({'company': company, 'year': y, 'scenario': scen, 'emissions_tco2': val})

proj_df = pd.DataFrame(projections)

# Combine historical rows (up to last_hist_year only) and projections for quick plotting/tables
hist_df = company_year[company_year['year'] <= last_hist_year].copy()
hist_df['scenario'] = 'historical'
company_emis_all = pd.concat([hist_df, proj_df], ignore_index=True)

company_emis_all.head(5)

Unnamed: 0,company,year,production_t,emissions_tco2,scenario
0,Angang Steel Co Ltd [100.0%],2020,32537820.0,68329430.0,historical
1,Angang Steel Co Ltd [100.0%],2021,31928160.0,67049140.0,historical
2,Angang Steel Co Ltd [100.0%],2022,31569480.0,66295900.0,historical
3,Angang Steel Co Ltd [100.0%],2023,31989910.0,67178820.0,historical
4,Angang Steel Co Ltd [100.0%],2024,32339600.0,67913170.0,historical


## Task: Export outputs

We will export key tables to `outputs/`:
- `operational.csv`
- `utilization_china.csv`
- `plant_production_emissions_china.csv`
- `company_emissions_historical.csv`
- `company_emissions_with_projections.csv`


In [59]:
# Export CSVs
operational.to_csv(OUTPUT_DIR / 'operational.csv', index=False)
util.sort_values('year').to_csv(OUTPUT_DIR / 'utilization_china.csv', index=False)
china_py_emis.to_csv(OUTPUT_DIR / 'plant_production_emissions_china.csv', index=False)
company_year.to_csv(OUTPUT_DIR / 'company_emissions_historical.csv', index=False)
company_emis_all.to_csv(OUTPUT_DIR / 'company_emissions_with_projections.csv', index=False)

OUTPUT_DIR, list(p.name for p in OUTPUT_DIR.iterdir() if p.suffix == '.csv')

(PosixPath('/Users/yassine/Desktop/Signal/ESSEC3A/Research & Emerging Topics in Data Science/Lab 4/outputs'),
 ['utilization_china.csv',
  'company_emissions_historical.csv',
  'company_emissions_with_projections.csv',
  'operational.csv',
  'plant_production_emissions_china.csv'])