# Combining COMTRADE and Flaring data

GGFR has a global flare dataset, derived from Elvidge, at point locations. These point locations are further mapped to operators via oil field mapping. This yields several datasets:

1. Point locations of flaring
2. Mapped oil field locations (from Wood Mackenzie)
3. Tabulated national accounts (of flaring)

Harshit suggested that combining the flaring data with the COMTRADE data could be interesting, as it would allow a comparison of nationally generated flaring with imported flaring.

The Flaring Supply Index (FSI) is a proposed indicator that describes and compares the 'flaring footprint' of different oil supply corridors, from their source to the destination country's border. For an importing country $X$ with supplier countries $Y_1, \dots, Y_n$, the index may be written as follows.

$$\mathrm{FSI}_X = \frac{\sum_{i=1}^{n} \left(\text{Crude imported by } X \text{ from } Y_i\right) \times \left(\text{Flare intensity of } Y_i\right)}{\text{Total crude imported by } X}$$
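
As a minimal worked example of the formula, the sketch below computes the index for one hypothetical importer. The function name `flare_supply_index`, the supplier codes, and all figures are illustrative assumptions, not values from the GGFR or COMTRADE data; it also assumes every supplier has a known intensity.

```python
def flare_supply_index(imports, intensity):
    """Import-weighted mean flare intensity for one importing country.

    imports   -- dict mapping supplier code to crude quantity imported
    intensity -- dict mapping supplier code to flare intensity
    Assumes every supplier in `imports` also appears in `intensity`.
    """
    total = sum(imports.values())
    if total == 0:
        return float("nan")
    weighted = sum(qty * intensity[code] for code, qty in imports.items())
    return weighted / total


# Hypothetical figures for illustration only
imports_x = {"AAA": 100.0, "BBB": 300.0}
intensity_by_supplier = {"AAA": 10.0, "BBB": 2.0}
fsi_x = flare_supply_index(imports_x, intensity_by_supplier)
print(fsi_x)  # (100*10 + 300*2) / 400 = 4.0
```

The result is in the same unit as the flare intensity, since the import quantities cancel out.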


## Definitions
Flare Intensity - ???  
TOE - Tonnes of oil equivalent

## TO DO
1. TOE is not the appropriate unit for comparing against flare intensity; figure out what the correct unit is
2. Right now, the intensity is taken just from 2019; should use yearly data
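
For the second item, one possible approach is to reshape the national table to one row per country and year, then merge on both partner and year. This is only a sketch: the wide column names (`intensity_2018`, `intensity_2019`) and all values are assumptions, since the layout of the national flaring file is not shown in this notebook.

```python
import pandas as pd

# Hypothetical wide table: one intensity column per year (names are assumptions)
national = pd.DataFrame({
    "ISO3": ["IRQ", "VEN"],
    "intensity_2018": [10.0, 25.0],
    "intensity_2019": [10.4, 29.8],
})

# Reshape to one row per country and year
yearly = national.melt(id_vars="ISO3", var_name="Year", value_name="Intensity")
yearly["Year"] = yearly["Year"].str.replace("intensity_", "", regex=False).astype(int)
yearly = yearly.rename(columns={"ISO3": "Partner ISO"})

# Hypothetical trade rows, using the COMTRADE column names from this notebook
trade = pd.DataFrame({
    "Reporter ISO": ["USA", "USA"],
    "Partner ISO": ["IRQ", "IRQ"],
    "Year": [2018, 2019],
    "Qty": [100.0, 200.0],
})

# Match each flow to the intensity of its partner in the same year;
# the Year columns must share a dtype for the keys to match
merged = trade.merge(yearly, on=["Partner ISO", "Year"], how="left")
merged["Qty_I"] = merged["Qty"] * merged["Intensity"]
print(merged[["Partner ISO", "Year", "Intensity", "Qty_I"]])
```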

In [1]:
import sys, os, importlib
import json, geojson, pycountry

import pandas as pd
import geopandas as gpd
import numpy as np

In [29]:
# Define input variables
# COMTRADE data can be downloaded here - https://datacatalog.worldbank.org/dataset/global-comtrade-flows-data
#comtrade_file = '/home/wb411133/data/Projects/INFRA/FLOWS/OIL_CRUDE_ONLY_2021/GEOJSON/country_flows_imports.geojson'
comtrade_file = "/home/wb411133/data/Projects/INFRA/FLOWS/Oil UNComtrade 0115.csv"
flare_data = '/home/public/Data/GLOBAL/INFRA/FLARING/2019_flare_catalog.csv'
flare_national_data = '/home/public/Data/GLOBAL/INFRA/FLARING/National_flaring_2019.csv'
out_folder = "/home/wb411133/data/Projects/INFRA/FLARING/Data"
os.makedirs(out_folder, exist_ok=True)

inCom = gpd.read_file(comtrade_file)
inFlare = pd.read_csv(flare_data)
inF = pd.read_csv(flare_national_data)

The location-specific flare data is not necessary for calculating the FSI, but it could be used in other calculations.

In [51]:
inFlare.head()

Unnamed: 0,id,month,rh,area,areacorr,rhcorr,t_min,t_mean,t_max,ellipticit,...,id2016,id2017,type,lat,lon,iso,rhht,bcm,country,idkey
0,3591,01-Jan-0019,251.197,472.499,67.8311,39.7654,1704,1839.27,2245,1.6015,...,5171,5531,flare,9.652042,-63.623525,VEN,39.7654,1.167234,Venezuela,VEN_UPS_2015_63.6235W_9.6520N_v0.2
1,6116,01-Jan-0019,221.96,673.852,88.3175,37.2024,1446,1651.94,1825,4.78228,...,8469,8855,flare,31.025901,47.283392,IRQ,37.2024,1.092002,Iraq,IRQ_UPS_2015_47.2834E_31.0259N_v0.2
2,6450,01-Jan-0019,155.449,477.148,73.9962,32.2923,1293,1658.76,2039,2.31782,...,8865,9308,flare,28.494039,49.714096,IRN,32.2923,0.947876,Iran,IRN_UPS_2015_49.7141E_28.4940N_v0.2
3,3617,01-Jan-0019,158.343,318.041,52.0764,28.5706,1650,1789.81,2185,1.6015,...,5179,5543,flare,9.648367,-63.563771,VEN,28.5706,0.838633,Venezuela,VEN_UPS_2015_63.5638W_9.6484N_v0.2
4,6253,01-Jan-0019,137.394,379.726,62.2554,26.8446,1481,1660.03,1902,1.84388,...,8603,8999,flare,31.003182,48.13951,IRN,26.8446,0.78797,Iran,IRN_UPS_2015_48.1395E_31.0032N_v0.2


In [30]:
# Limit the flaring summaries to 2019
# Take a copy so that new columns can be added without a SettingWithCopyWarning
flare_2019 = inF.filter(regex="2019").copy()
flare_2019.columns = ["Volume","Intensity"]
flare_2019['country'] = inF['country']
flare_2019.head()


Unnamed: 0,Volume,Intensity,country
0,23212.17006,5.862699884,Russia
1,17914.219,10.37717386,Iraq
2,17293.74252,3.868390139,United States
3,13781.15554,12.59168227,Iran
4,9541.420483,29.81330551,Venezuela


In [31]:
def get_country(x):
    ''' Convert country name to ISO3
    
    :param: x [string] - name of country to convert
    :returns: [string] - ISO3 code of country, or "NA" if no match is found
    '''
    try:
        res = pycountry.countries.search_fuzzy(x)
        return res[0].alpha_3
    except LookupError:
        # search_fuzzy raises LookupError when nothing matches
        return "NA"

#xx = get_country("Canada")
flare_2019['ISO3'] = flare_2019['country'].apply(get_country)
flare_2019.head()


Unnamed: 0,Volume,Intensity,country,ISO3
0,23212.17006,5.862699884,Russia,RUS
1,17914.219,10.37717386,Iraq,IRQ
2,17293.74252,3.868390139,United States,USA
3,13781.15554,12.59168227,Iran,IRN
4,9541.420483,29.81330551,Venezuela,VEN


In [32]:
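# A few country names are not resolved by pycountry; set their ISO3 codes manually
# (row positions depend on the row order of the input file; column 3 is ISO3)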

#flare_2019.loc[flare_2019['ISO3'] == "NA"]
flare_2019.iloc[16, 3] = "COG"
flare_2019.iloc[30, 3] = "ARE"
flare_2019.iloc[44, 3] = "COD"
flare_2019.iloc[60, 3] = "NER"
flare_2019.loc[flare_2019['ISO3'] == "NA"]

Unnamed: 0,Volume,Intensity,country,ISO3
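
Row positions such as 16 or 30 only hold for one particular ordering of the input file. As a sketch of a less fragile alternative, the overrides can be keyed on the country name instead. The spellings in `manual_iso3` and the small `flares` table are assumptions for illustration, because the names that failed to resolve are not shown in this notebook.

```python
import pandas as pd

# Hypothetical spellings; replace with the names that actually failed to resolve
manual_iso3 = {
    "Rep. of Congo": "COG",
    "UAE": "ARE",
}

# Made-up stand-in for flare_2019 after the pycountry lookup
flares = pd.DataFrame({
    "country": ["Iraq", "Rep. of Congo", "UAE"],
    "ISO3": ["IRQ", "NA", "NA"],
})

# Overwrite ISO3 only where the lookup failed and a manual code exists
unresolved = flares["ISO3"] == "NA"
flares.loc[unresolved, "ISO3"] = (
    flares.loc[unresolved, "country"].map(manual_iso3).fillna("NA")
)
print(flares)
```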


In [49]:
# Combine the flare intensity information with the COMTRADE data
in_flaring_combined = pd.merge(inCom, flare_2019, left_on="Partner ISO", right_on="ISO3", how="left")
# Drop unnecessary columns
in_flaring_combined = in_flaring_combined.drop(['geometry', 'country', 'ISO3'], axis=1)

def tryFloat(x):
    ''' Convert x to float, returning 0. if the value cannot be parsed '''
    try:
        return float(x)
    except (TypeError, ValueError):
        return 0.

# Multiply the traded quantity (Qty) from the COMTRADE data by the flare intensity
### TODO: TOE is not the unit appropriate for intensity calculations; however, the relative rank is still useful
in_flaring_combined['Qty'] = in_flaring_combined['Qty'].apply(tryFloat)
in_flaring_combined['Trade Value (US$)'] = in_flaring_combined['Trade Value (US$)'].apply(tryFloat)
in_flaring_combined['Intensity'] = in_flaring_combined['Intensity'].apply(tryFloat)
in_flaring_combined['Qty_I'] = in_flaring_combined['Qty'] * in_flaring_combined['Intensity']
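
Because the merge is a left join, partners without a flaring record keep their trade rows but get a missing `Intensity` and `Qty_I`. A quick way to list them is the `indicator` option of `merge`, sketched below on made-up data; the `trade` and `flare` frames and the `WLD` partner code are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the COMTRADE and national flaring tables
trade = pd.DataFrame({
    "Partner ISO": ["IRQ", "VEN", "WLD"],
    "Qty": [10.0, 20.0, 30.0],
})
flare = pd.DataFrame({"ISO3": ["IRQ", "VEN"], "Intensity": [10.4, 29.8]})

# indicator=True adds a `_merge` column recording where each row was found
checked = trade.merge(
    flare, left_on="Partner ISO", right_on="ISO3", how="left", indicator=True
)
unmatched = checked.loc[checked["_merge"] == "left_only", "Partner ISO"].unique()
print(unmatched)
```

Unmatched partners matter for the index: pandas skips missing values when summing, so their quantity can count toward total imports while contributing nothing to the intensity-weighted sum.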

In [50]:
# Sanity check: total crude oil (SITC code 333) quantity and value per reporter and year
tempD = in_flaring_combined.loc[in_flaring_combined['Reporter ISO'].isin(['AUS','USA','CAN', 'NZL','JPN'])]
tempD = tempD.loc[tempD['Commodity Code'] == '333']
tempD.groupby(['Reporter ISO', 'Year'])[['Qty', 'Trade Value (US$)']].sum()

Reporter ISO,Year,Qty,Trade Value (US$)
AUS,2010,81404880000.0,50283950000.0
AUS,2011,78627850000.0,66652690000.0
AUS,2012,78103120000.0,67747930000.0
AUS,2013,66783020000.0,57031820000.0
AUS,2015,55827560000.0,23601760000.0
AUS,2016,54560440000.0,18862520000.0
AUS,2017,59764360000.0,21397030000.0
CAN,2010,339358300000.0,147165400000.0
CAN,2011,348824000000.0,196603200000.0
CAN,2012,71234990000.0,209185100000.0


In [47]:
in_flaring_combined.head()

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Qty,Netweight (kg),Gross weight (kg),Trade Value (US$),CIF Trade Value (US$),FOB Trade Value (US$),Flag,Volume,Intensity,Qty_I
0,S3,2010,2010,2010,3,0,1,Import,8,Albania,...,21.972122,21972122,,13047443,,,0,,,
1,S3,2010,2010,2010,3,0,2,Export,8,Albania,...,537.315248,537315248,,162736020,,,0,,,
2,S3,2010,2010,2010,3,0,2,Export,8,Albania,...,3e-06,3,,6,,,0,1052.103541,0.653955,1.961866
3,S3,2010,2010,2010,3,0,1,Import,8,Albania,...,8e-06,8,,48,,,0,2024.599591,1.45015,11.6012
4,S3,2010,2010,2010,3,0,2,Export,8,Albania,...,326.910458,326910458,,97263384,,,0,1.050911,0.037056,12114030.0


In [13]:
in_flaring_combined.to_csv(os.path.join(out_folder, "FLARING_TRADE_CRUDEONLY_COMBINED_ALLTRADE_JAN2021.csv"))

In [None]:
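# Aggregate quantity, trade value, and intensity-weighted quantity per reporter and year
# (the combined table stores these as Qty and Qty_I, not TOE and TOE_I)
agg = {'Qty':'sum','Trade Value (US$)':'sum', 'Intensity':'mean','Qty_I':'sum'}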
res = in_flaring_combined.groupby(['Reporter ISO', "Year"]).aggregate(agg)
res = res.reset_index()
res.head(20)
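
The aggregated sums are enough to derive the FSI itself, by dividing the intensity-weighted quantity by the total quantity. The sketch below shows this on made-up rows shaped like `in_flaring_combined`; the values are hypothetical, and filtering on `Trade Flow == "Import"` is an assumption based on the FSI being defined on imports.

```python
import pandas as pd

# Hypothetical rows shaped like in_flaring_combined (values are made up)
flows = pd.DataFrame({
    "Reporter ISO": ["JPN", "JPN", "JPN"],
    "Year": ["2015", "2015", "2015"],
    "Trade Flow": ["Import", "Import", "Export"],
    "Qty": [100.0, 300.0, 50.0],
    "Intensity": [10.0, 2.0, 5.0],
})
flows["Qty_I"] = flows["Qty"] * flows["Intensity"]

# The FSI is defined on imports, so exports are excluded here
imports = flows.loc[flows["Trade Flow"] == "Import"]
fsi = imports.groupby(["Reporter ISO", "Year"])[["Qty", "Qty_I"]].sum().reset_index()

# Import-weighted mean intensity per reporter and year
fsi["FSI"] = fsi["Qty_I"] / fsi["Qty"]
print(fsi)
```

Note that the `'Intensity': 'mean'` entry in the aggregation is an unweighted mean across trade rows, so it will generally differ from this import-weighted value.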