# Introduction

The goal of this initial analysis is to load data on imports, exports, and domestic production of various food types for Switzerland. Please see the `README` for overall project goals and background information.

In [None]:
# import external libraries
%matplotlib inline
import collections
import inspect
import pickle
import re

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import plotly.graph_objects as go #may not need this
import holoviews as hv
from holoviews import opts
import networkx as nx
from networkx.algorithms import bipartite
import warnings
warnings.filterwarnings("ignore", category=UserWarning) # prevents internal problem with networkx from showing error

%load_ext autoreload
%autoreload 2

In [None]:
# import local dependencies
import sys
sys.path.insert(1, "scripts")

from helpers import *
from plots import *
from impex_data_manipulation import *
from fao_data_manipulation import *
from emissions_data_manipulation import *
from data_analysis import *

The first step is to calculate, for each type of food, how much of what the Swiss population consumes is produced within Switzerland versus imported. To do this, we will combine three sets of data: imports, exports, and domestic production. The imports and exports data are sourced from [Swiss Impex](https://www.gate.ezv.admin.ch/swissimpex/index.xhtml), a website hosted by the Swiss Federal Customs Administration that provides data on Switzerland's global trade activity. Domestic production data comes from [FAOStat](http://www.fao.org/faostat/en/#data), the statistics portal of the Food and Agriculture Organization of the United Nations, which offers a variety of agriculture-related data. In theory, the amount of food consumed in Switzerland (including food waste) can be calculated from these datasets:

Food consumed = domestic production + imports - exports

#### TODO: also add the website from FAO that had the fish and seafood data

Note that the Swiss Federal Statistical Office also provides relevant data, namely a dataset on Swiss food consumption by type of food. Unfortunately, these data conflict with the data from FAO and Swiss Impex. Since the Federal Statistical Office data are much less detailed (for instance, they use broader food categories), we decided to focus on Impex and FAO, keeping in mind that the numbers must be taken with a grain of salt since such data are difficult to quantify accurately.

Let's load all the data and then combine the various data sets to get the values of interest. First we'll load imports and exports data from Impex. The data is spread across multiple Excel files and sheets.

***
**Data loading and manipulation**
***

Let's start by loading the Swiss Impex data into the `impex` dataframe:

In [None]:
impex = load_impex()
impex.head()

To ensure that our data is properly processed and handled as it goes through the changes we are about to apply, we will pull out one value of this dataframe (kiwi fruits imported from New Zealand) and check at the end that its value is still matched with the correct country and fruit.

In [None]:
impex.loc["New Zealand"].fruits.kiwi_fruit.imports

We then create a dataframe `impex_total` for storing total quantities for indicator variables:

In [None]:
# 1. Select only first row (total), creates a series
# 2. Unstack first level (indicator) to create a dataframe
impex_total = impex.iloc[0].unstack("indicator")
impex_total.head()

Let's now further manipulate the original `impex` dataframe to include information about the continents and sub-continents in the index:

In [None]:
# First we drop the total row
impex = impex.drop("total")

# Second, we load the country-continent info
continents = load_countries_continents()

# Finally, we compute the new index
countries_mindex = pd.MultiIndex.from_arrays(continents.values.T, names=continents.columns)
impex_mindex = countries_mindex[countries_mindex.get_level_values(2).isin(impex.index)]

# re-sort the original impex dataframe so it matches the incoming impex_mindex
# since impex is sorted alphabetically, and the impex_mindex does NOT match on country, it just matches by row number
impex = impex.reindex(impex_mindex.get_level_values(2))

# now apply the new index to the properly sorted impex df
impex.set_index(impex_mindex, inplace=True)
impex.index.names

In [None]:
# double check that the numbers copied over correctly/stayed with their correct labels
impex.xs("New Zealand", level="country", drop_level=False).fruits.kiwi_fruit.imports

Note that the `impex` frame should only contain information about food destined for humans. We were careful to ensure this was true: every item that can also be imported for use as animal feed was selected through its subcategory designated specifically for human consumption.

Now, let's load the data on Swiss domestic `production` from FAO:

In [None]:
production = load_fao()
production.head()

In this data as well, we were careful that only crops produced for human consumption are measured. This is specified in the FAO metadata.

Next we perform an outer join of `impex_total` and `production` to create a meta-dataframe, `suisse`, with all total macroeconomic indicators for each commodity subtype:

In [None]:
suisse = impex_total.join(production, how="outer")
suisse.head()

We have three possibilities for each commodity:
1. Neither production nor import/export values are given, or they all sum to 0
2. Only production values are given, or only import/export values are given
3. Production, import, and export values are all given (and are non-zero)

Commodities fulfilling the first condition will be removed. Commodities fulfilling the second or third condition will be kept.

<em style="font-size: 12px">Note: Missing or 0-valued data may be due to either the values actually being 0 or the data not being collected on these items. Since there is no way of knowing which is the case, we will assume that the values are indeed truly 0 to enable their utilization in the analysis.</em>
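
To make the filtering rule concrete, here is a minimal sketch on a toy table. The commodity names and quantities are invented for illustration and are not taken from the Impex or FAO data. The final `fillna(0)` is an assumption that mirrors the note above, where missing values are treated as 0.

```python
import pandas as pd

# Hypothetical trade totals in kg (made-up commodities and numbers)
trade = pd.DataFrame(
    {"imports": [120.0, 0.0, 40.0], "exports": [30.0, 0.0, 0.0]},
    index=["apples", "quinces", "bananas"],
)

# Hypothetical domestic production in kg; "pears" has no trade record at all
produced = pd.DataFrame(
    {"production": [200.0, 0.0, 15.0]},
    index=["apples", "quinces", "pears"],
)

# Outer join keeps every commodity that appears in either table
combined = trade.join(produced, how="outer")

# Case 1: nothing is known, or everything sums to 0 (NaN is skipped by sum)
no_info = combined.index[combined.sum(axis=1) == 0]

# Cases 2 and 3 are kept; missing values are assumed to be true zeros
kept = combined.drop(no_info).fillna(0)
print(kept)
```

In this toy table only `quinces` is dropped, while `bananas` (trade data only) and `pears` (production data only) are kept with their missing values set to 0.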

In [None]:
# Remove subtypes for which all given quantities sum up to 0
subtypes_no_info = suisse.index[suisse.sum(axis=1) == 0]
suisse.drop(subtypes_no_info, inplace=True)

Next, we shall add columns for `domestic_consumption` and `imported_consumption`.

Definitions:
* **domestic consumption**: goods and services consumed in the country where they are produced
* **imported consumption**: goods and services consumed in the country to which they are imported

**Note**: in this analysis we will make the following simplifying assumptions:

* food waste is included as being "consumed"
* exported quantity is first satisfied by available produced quantity, and then imported quantity
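
As a quick illustration of the second assumption, the sketch below applies the split to two invented commodities, one where exports stay below production and one where they exceed it. The item names and quantities are hypothetical; only the rule itself comes from the assumptions above.

```python
import numpy as np
import pandas as pd

# Made-up quantities in kg: item_a exports less than it produces,
# item_b exports more than it produces
toy = pd.DataFrame(
    {
        "production": [100.0, 20.0],
        "imports": [50.0, 80.0],
        "exports": [30.0, 35.0],
    },
    index=["item_a", "item_b"],
)

exports_exceed = toy["exports"] > toy["production"]

# Exports are served from domestic production first
toy["domestic_consumption"] = np.where(
    exports_exceed, 0, toy["production"] - toy["exports"]
)

# Only the part of exports that production cannot cover comes out of imports
toy["imported_consumption"] = np.where(
    exports_exceed,
    toy["imports"] - (toy["exports"] - toy["production"]),
    toy["imports"],
)

toy["consumption"] = toy["domestic_consumption"] + toy["imported_consumption"]
print(toy)
```

In both cases the total matches production + imports - exports; the assumption only decides how that total is split between domestic and imported goods.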

In [None]:
# If exports > production, then we say all produced quantity
# is exported and the rest of exported quantity is satisfied by imports:
# 1. Set domestic_consumption to zero
# 2. Set imported_consumption to imports-(exports-production)

# If exports <= production, we say all exports
# are satisfied by domestic production:
# 1. Set domestic_consumption to production-exports
# 2. Set imported_consumption to imports
suisse["domestic_consumption"] = np.where(
    suisse.exports > suisse.production,
    0,
    suisse.production - suisse.exports
)

suisse["imported_consumption"] = np.where(
    suisse.exports > suisse.production,
    suisse.imports - suisse.exports + suisse.production,
    suisse.imports
)

suisse["consumption"] = suisse.domestic_consumption + suisse.imported_consumption

suisse.head()

***
**Data analysis and visualization**
***

Let's plot imports and exports per meta-type:

In [None]:
# 1. Group by meta-type
# 2. Sum the totals
# 3. Unstack the columns to create a Series
impex_total_metatype = impex_total.groupby("type").sum().unstack()

In [None]:
impex_total_metatype

In [None]:
plt.figure(figsize=(12, 4))
ax = sns.barplot(
    x=impex_total_metatype.index.get_level_values("type"),
    y=impex_total_metatype.values / 1000,  # kg --> tonnes
    hue=impex_total_metatype.index.get_level_values("indicator"),
)

ax.set(
    title="Total imports vs. exports", xlabel="commodity", ylabel="quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine()
plt.savefig("../docs/img/total_imports_vs_exports.jpg", dpi=100, bbox_inches='tight')

As you can see, Switzerland imports many more fruits, vegetables and meats than it exports. The `animal_products` category is interesting, because import and export quantities are about equal. The most plausible explanation is that Switzerland imports different animal products than it exports (rather than the same products being both imported and exported). Let's test this theory by looking at imports and exports for the subcategories of `animal_products`:

In [None]:
impex_total_animal_prods = impex_total.loc["animal_products"].groupby("subtype").sum().unstack()

In [None]:
plt.figure(figsize=(12, 4))
ax = sns.barplot(
    x=impex_total_animal_prods.index.get_level_values("subtype"),
    y=impex_total_animal_prods.values / 1000,  # kg --> tonnes
    hue=impex_total_animal_prods.index.get_level_values("indicator"),
)

ax.set(
    title="Imports vs. exports of animal products", xlabel="commodity", ylabel="quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine()
plt.savefig("../docs/img/imports_vs_exports_animal_products.jpg", dpi=100, bbox_inches='tight')

This graph sheds more light on the topic. Cheese comes in many varieties, and consumers like variety, so it is both heavily imported and exported, likely giving consumers access to a wider range of cheese types. Eggs, on the other hand, are nearly exclusively imported, while whey is mostly exported.

The butter category of this graph is a good transition to the rest of the analysis: just because butter is hardly imported or exported does not mean the Swiss do not eat butter! Rather, imports and exports are only part of the broader picture, since domestic production is another important consideration. One possible hypothesis for why butter is neither imported nor exported in large quantities is that domestic production is nearly equal to domestic consumption.

#### Bipartite graph between continents and meta food groups, weighted by amounts of food imported from those continents

In [None]:
# make the nodes: continents on the left, food groups on the right
continents = impex.index.get_level_values(0).unique().array
meta_food_groups = impex.columns.levels[0].array
bi = nx.Graph()
bi.add_nodes_from(continents, bipartite=1)
bi.add_nodes_from(meta_food_groups, bipartite=0)

# make an edge between each continent and each food group
edges = []
for continent in continents:
    for food in meta_food_groups:
        edges.append((continent, food))
        
bi.add_edges_from(edges, weight=3)

In [None]:
# calculate the initial raw weights, which is the total amount of food for each continent-food pair
weights = []
for continent in continents:
    for food in meta_food_groups:
        total_food_amount = impex.xs(continent)[food].xs('imports', level=1, axis=1).to_numpy().sum()
        weights.append(total_food_amount)
        
# incorporate these weights into the data for each edge
for num, name in enumerate(bi.edges(data=True)):
    name[2]['weight'] = weights[num]

In [None]:
#make a list of all the weights so can edit them
all_weights = []
for (node1,node2,data) in bi.edges(data=True):
    all_weights.append(data['weight'])
    #print(data['weight']) # this is totals by continent per meta food group

# normalize the weights so they are appropriate for the graph
for num, wt in enumerate(all_weights):
    all_weights[num] = wt*len(continents)*5/sum(weights)
    
# incorporate the normalized weights into the edge data for use in the graph
for num, name in enumerate(bi.edges(data=True)):
    name[2]['weight'] = all_weights[num]


top = nx.bipartite.sets(bi)[0]
pos = nx.bipartite_layout(bi, top)

# the old, non-interactive graph
#nx.draw_networkx(bi, pos, font_size=12, node_size=900, node_color='red', node_shape='s', width=all_weights)

In [None]:
# make an interactive bipartite graph with Holoviews and Bokeh
bipartite = hv.Graph.from_networkx(bi, pos, width=all_weights)

# create text labels for each of the nodes
labs = ['Africa', 'America', 'Asia', 'Europe','Oceania','animal_products', 'cereals', 'fruits', 'meat', 'seafood', 'vegetables']
labels = hv.Labels({('x', 'y'): bipartite.nodes.array([0,1]), 'text': labs}, ['x', 'y'], 'text')

# plot the graph with labels
hv.extension('bokeh')
# this padding feature was working previously and then broke, so I edited it a bit...
#padding = dict(x=(-1.3, 1.3), y=(-0.9, 1)) # make graph display slightly larger than data
#(bipartite*labels).relabel("Food Type Imports by Continent").redim.range(**padding).opts(
(bipartite*labels).relabel("Food Type Imports by Continent").opts(
    opts.Labels(text_color='text', cmap='Category20', yoffset=0.13, fontsize=14, padding=0.2),
    #opts.Points(color='black', size=25), 
    opts.Graph(node_size=30, inspection_policy='edges', fontsize={'title':20},
               node_hover_fill_color='red', edge_line_width='weight', xaxis=None, yaxis=None, node_fill_color='lightgray',
               edge_hover_fill_color='red', frame_height=400, frame_width=600, padding=((0.1, 0.15), (0.1, 0.2))))

In [None]:
# TODO: change the weights that show when hovering--these are the normalized weights, we want the actual kg of food
# TODO: figure out why the hover edges are green instead of red
# TODO: find a way to color the continent labels one color and the food labels another color
# TODO: remove the "start" and "end" in the hover labels, just say continent and food
# TODO MAYBE: add a widget to toggle between inspection_policy of edges vs nodes

## Emissions


Now that we have calculated Switzerland's production and consumption, we want to see how this translates into CO$_2$-equivalent emissions. We could use Swiss-specific greenhouse gas values for the different food types; however, data from other countries are very sparse, and reliable data for many food types are available only for certain countries. Later in the analysis we want to compare Swiss production emissions with Swiss import emissions, so for consistency we need either data for all (or most) countries for a given product, or global averages. Comprehensive data are available on meat and cereal production emissions worldwide, but fruit and vegetable figures are much harder to obtain, and only a limited number of studies have been carried out. These studies have been collected in a [systematic review](https://www.sciencedirect.com/science/article/pii/S0959652616303584), whose values have been used to calculate averages for a number of different kinds of produce. To keep our results consistent, we will use these global average values to estimate the domestic Swiss emissions for fruit and vegetables.

<!---
Other thought: I've currently created a dictionary to map the different fruit/veg to the impex categories. Was just thinking, this could be a potential spot to use machine learning, although admittedly not a very useful one...
--->

In [None]:
emissions = load_emissions()
emissions.head()

In [None]:
# integrate the median emissions values into this dataframe
suisse = add_emissions_data(suisse, emissions)
suisse.head()

Assuming that domestic transport is negligible, we can estimate the equivalent CO$_2$ emissions for each product type using the global average values.

In [None]:
# calculate the emissions (resulting from production of the products)
suisse = production_emissions(suisse)
suisse.head()

The final column of the above table shows the total CO$_2$ equivalent (kg) that would be produced if everything consumed in Switzerland were domestically produced, i.e. no transport emissions were considered and Swiss-specific CO$_2$ emissions were used for meat and cereal production. Evidently, it is not possible to produce locally everything that consumers currently buy, so in the following analysis we will consider the effect that these imported products and 'food miles' have on the CO$_2$ emissions resulting from Swiss consumption.
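
The arithmetic behind such a column can be sketched as follows. This is only an illustration under stated assumptions: the column names `consumption` and `median_emissions` and all of the numbers are placeholders, and the real `production_emissions` helper may compute its result differently.

```python
import pandas as pd

# Placeholder values: consumption in kg, emission factor in kg CO2e per kg of food.
# Neither the item names nor the numbers come from the actual datasets.
toy = pd.DataFrame(
    {"consumption": [1000.0, 250.0], "median_emissions": [0.5, 20.0]},
    index=["item_a", "item_b"],
)

# Emissions if everything consumed were produced with the average factor,
# ignoring transport entirely
toy["emissions_sans_transport"] = toy["consumption"] * toy["median_emissions"]

total_kg_co2e = toy["emissions_sans_transport"].sum()
print(toy)
print(total_kg_co2e)
```

A small quantity of a high-emission item can outweigh a much larger quantity of a low-emission one, which is why the per-kilogram factor matters as much as the consumed amount.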

## Transport Analysis

In [None]:
countries = country_distances()
countries.head()

In [None]:
transport = load_impex_transport()
transport.tail(10)

Now we'll remove the totals for each country and store them in a separate dataframe:

In [None]:
transport_total = transport.xs("total", level=1)
# drop() returns a new dataframe, so assign the result to actually remove the totals
transport = transport.drop(index="total", level=1)

In [None]:
# define a function which, for each row of 'transport', calculates the fraction of each food group
# transported by each method of transport
food_groups = ['cereals', 'potatoes', 'other_fresh_fruits_vegetables', 'fish', 'meat', 'dairy_products']

def calculate_percent_by_method(row):
    # per-country totals stored in transport_total above
    total = transport_total.loc[row.commercial_partner]
    fractions = [row[group] / total[group] for group in food_groups]
    # 0/0 (no imports of that group from that country) becomes 0
    return pd.Series(np.nan_to_num(fractions))

In [None]:
# run the function on the whole df
np.seterr('ignore') # ignore divisions by 0 (when the total imports is 0)
transport[['cereals', 'potatoes', 'other_fresh_fruits_vegetables', 'fish', 'meat', 'dairy_products']] = transport.apply(calculate_percent_by_method, axis=1)

In [None]:
transport = transport.set_index(['commercial_partner', 'mode_of_transport'])
transport.head(10)

## Further Analysis

The following describes initial observations we have made regarding the transport of food and global carbon emission intensities, and how we plan to use this information to estimate the impact of consuming different foods in Switzerland and how Swiss consumers can minimise the environmental impact of their food choices.

'Food miles', or the distance that food has to travel to arrive on your plate, clearly have an impact on the carbon emissions of the products we consume. Let's look at how much of the food, beverages, and tobacco that Switzerland imports comes from its nearest neighboring countries. The farther away a country is, the more carbon emissions are incurred by importing food from it.

In [None]:
percentage = glimpse()
print(f"{round(percentage)}% of Switzerland's total imports come from countries within a 1000 km radius.")

So we can see that a substantial share of Switzerland's imports comes from nearby countries. We will continue in this vein and look at the origins of each product individually to see the impact that food miles and varying production methods have on its carbon footprint. One thing we will consider is the transport method: the impact of food miles differs greatly depending on whether the food is transported by plane or by ship.

<img width="400" height="400" src="https://icmattermost.epfl.ch/files/5zr1jyriupfsfgmr4dtg155ssw/public?h=_GPk0xYK1I16gWsY3GuIsrFC5bTb3Ioh4_W3h3oYDs8">

In [None]:
# emission factors per mode of transport, in kg CO2e per kg of goods per km
# (the source values have already been divided by 1000 to reach these units)
transportCO2 = {'Air traffic':0.000733, 'Rail traffic':0.000037, 'Road traffic':0.000303, 'Inland waterways':0.000019}

Transport methods for different commodities in the USA are described in detail in the paper *Food-Miles and the Relative Climate Impacts of Food Choices in the United States* (Weber and Matthews, 2008). These values, or similar data for other countries worldwide, could be used to estimate the means of transport for different products and thus the impact this transport has on emissions for each product. Perishable products more frequently have to be transported by air, and therefore have a significantly larger carbon footprint.
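
To give a feel for how much the mode of transport matters, here is a minimal sketch that combines the emission factors above with a shipment weight, a distance, and a split across transport modes. The function name, the 1000 kg shipment, the 500 km distance, and the 80/20 road/rail split are all hypothetical values chosen for illustration.

```python
# Emission factors from the cell above, in kg CO2e per kg of goods per km
transport_co2 = {
    "Air traffic": 0.000733,
    "Rail traffic": 0.000037,
    "Road traffic": 0.000303,
    "Inland waterways": 0.000019,
}


def transport_emissions(kilos, distance_km, mode_shares):
    """Return kg CO2e for moving `kilos` over `distance_km`.

    `mode_shares` maps a mode of transport to the fraction of the
    shipment carried by that mode (fractions should sum to 1).
    """
    return sum(
        kilos * share * distance_km * transport_co2[mode]
        for mode, share in mode_shares.items()
    )


# Hypothetical shipment: 1000 kg over 500 km
mixed = transport_emissions(1000, 500, {"Road traffic": 0.8, "Rail traffic": 0.2})
by_air = transport_emissions(1000, 500, {"Air traffic": 1.0})

print(f"road/rail mix: {mixed:.1f} kg CO2e")
print(f"air only:      {by_air:.1f} kg CO2e")
```

With these made-up inputs the mixed road/rail shipment comes to roughly 125 kg CO2e, while the same shipment by air comes to roughly 367 kg CO2e, about three times as much.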

---

## Calculating the CO2 emissions by food type based on where it is imported from

We will now use the country-specific and food item-specific data to calculate carbon costs. For each country and food item pair, we will first calculate how much of that food item imported from that country is consumed by Swiss consumers. This relies upon assumptions that we made previously (see the simplifying assumptions in the data loading and manipulation section). The two categories this data processing can fall into are:

* If the exported amount of a given food item is less than what is domestically produced, we assume that the exports are drawn entirely from Swiss production, so none of the imports are re-exported. That means some Swiss-produced food is left over for Swiss consumption, in addition to all of the imported food from all other countries.

* If, however, more of a given food item is exported than is domestically produced, all of the Swiss-produced goods are exported and some of the imported goods are exported as well. In this case, we assume that each country's imports are re-exported in the same proportion. For example, if Switzerland imports 100 kg of bananas from Country X and 50 kg from Country Y, and exports exceed production by 10 kg, then 10/150 (about 6.7%) of the bananas from each country are assumed to be exported, and the remainder is consumed in Switzerland. Hence, each country's share of Swiss consumption of that food item is (the total amount of that food item imported from that country) / (the total imports of that food item across all countries). This fraction is multiplied by the total amount of that food item consumed in Switzerland to get results by country.
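
The banana example can be worked through in a few lines. This is a standalone sketch of the proportional rule, not the notebook's own implementation: the function name is hypothetical, and the production and export figures (5 kg and 15 kg) are invented so that the deficit comes out at 10 kg.

```python
def allocate_imports(imports_by_country, production, exports):
    """Split the imports consumed in Switzerland across source countries.

    All quantities are in kg. If exports exceed production, the deficit is
    assumed to be re-exported from every country's imports in equal proportion.
    """
    total_imports = sum(imports_by_country.values())
    if total_imports == 0:
        return {country: 0.0 for country in imports_by_country}

    # Part of the exports that domestic production cannot cover
    deficit = max(exports - production, 0)
    consumed_imports = total_imports - deficit

    return {
        country: kilos / total_imports * consumed_imports
        for country, kilos in imports_by_country.items()
    }


# Hypothetical numbers: 100 kg from Country X, 50 kg from Country Y,
# 5 kg produced domestically, 15 kg exported (deficit of 10 kg)
bananas = allocate_imports({"Country X": 100, "Country Y": 50}, production=5, exports=15)
print(bananas)
```

Of the 150 kg imported, 140 kg stays in Switzerland, split two to one between Country X and Country Y, just like the imports themselves. When exports do not exceed production, the deficit is 0 and every country's imports are counted in full.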

In [None]:
# first manipulate the data to prepare for using it
impex_countries = impex.stack(["type","subtype"]).unstack(['continent','subcontinent','country']).stack(['continent','subcontinent','country']).drop(columns=['exports']).reset_index()
impex_countries['product'] = impex_countries['subtype']
impex_countries = impex_countries.set_index(["type","subtype",'continent','country']).drop(columns = 'subcontinent')

In [None]:
# calculate, for each country and food item pair, how much of that food item imported from that country is consumed
# by Swiss consumers
def find_consump(row):
    # look up the Swiss totals for this food item once (first matching row)
    item = suisse.xs(row['product'], level=1).iloc[0]

    # if the exported amount of this food item is less than what is domestically produced, then
    # all imports go towards Swiss consumption (due to earlier assumptions we made about Swiss production
    # being the source of exports before imports being exported)
    if item['domestic_consumption'] > 0:
        return row.imports
    else:
        # if everything which is produced in Switzerland is also exported (domestic_consumption == 0), then
        # some of the imports may also be exported, so the amount of imports consumed in Switzerland is taken
        # as the fraction of (the amount imported from that country) / (total imported amount from all countries)
        fraction = row.imports / item['imports']
        return fraction * item['consumption']

# apply the function to all rows of the dataframe
impex_countries['swiss_consumption'] = impex_countries.apply(find_consump, axis=1)

In [None]:
impex_countries.head()

In [None]:
# reset the index to enable easier dataframe manipulation
impex_countries.reset_index(inplace=True)

In [None]:
# for some reason, there is no transport data for Libya. As we can see here, however, there are no rows of data from
# Libya which have a non-zero swiss_consumption, meaning there is nothing imported from Libya which is consumed in 
# Switzerland (it is either all exported or there are no imports).
temp = impex_countries[impex_countries['country'] == 'Libya']
temp = temp[temp['swiss_consumption'] != 0]
temp
# TODO: once the seafood data was incorporated, this is no longer true. Come up with a better explanation

In [None]:
# TODO think if there is a better strategy than just dropping all these countries since they're not in the transport data?

# this is also true for Angola, Eritrea, Sudan, St Lucia
# so we will drop these countries from the dataframe
countries_without_transport_data = [
    'Libya', 'Angola', 'Eritrea', 'Sudan', 'St Lucia',
    # countries taken out only for seafood
    'Seychelles', 'Amer. Virgin', 'Curaçao', 'Greenland', 'Guiana, French', 'Faeroe Islands',
]
impex_countries = impex_countries[~impex_countries['country'].isin(countries_without_transport_data)]

The next step is to convert the `impex_countries` table, which holds the amount of each food item from each country that is eaten in Switzerland, into the carbon costs associated with each country/food item pair.

In [None]:
# map the categories used in our dataframe of imports per country and food to the categories used by the transport data
# (named transport_category so that the built-in `dict` is not shadowed)
transport_category = {
    "animal_products": "dairy_products",
    "meat": "meat",
    "fruits": "other_fresh_fruits_vegetables",
    "vegetables": "other_fresh_fruits_vegetables",
    "cereals": "cereals",
    "seafood": "fish",
}

## TODO: 
Modify the `swiss_consumption_transport` function (or do it after the function) to change the potatoes value. We have transport data specifically for potatoes, but all other fruits and vegetables are lumped into one category together. As of now, the potatoes are included with all the other fruits and vegetables.

In [None]:
def swiss_consumption_transport(row):
    # get the fractions for a given country and food product imported by each method of transport
    trans = transport.loc[row['country']][[transport_category[row['type']]]]
    
    # multiply the fractions by the amount consumed in Switzerland from that country to get kilos imported
    # by each method of transport
    trans['kilos_consumed'] = trans.iloc[:,0] * row['swiss_consumption']
    
    # multiply the carbon cost of each method of transport by the kilos imported by that transport method
    trans = trans.reset_index()
    trans['carbon_cost_per_km'] = trans.iloc[:,2] * trans['mode_of_transport'].map(transportCO2)
    
    # multiply the carbon cost by the km between that country and Switzerland to get total kg of CO2e produced
    trans['carbon_cost'] = trans.iloc[:,3] * countries.loc[row['country']][1]
    # IF YOU WANT DETAILS LATER (E.G. CARBON COST BY TRANSPORT METHOD), GRAB THE DATAFRAME HERE BEFORE THE NEXT STEP
    # return trans
    
    # sum the total carbon cost for this food item coming from this country
    return trans.iloc[:,4].sum()


In [None]:
pd.options.mode.chained_assignment = None # prevent the set with copy warning, which is not applicable in this
                                      # case since we are creating a new column (but pandas triggers the warning anyway)
impex_countries['kg_CO2e'] = impex_countries.apply(swiss_consumption_transport, axis=1)

In [None]:
# create the index again to get back to a multiindex dataframe
impex_countries.set_index(['type', 'subtype', 'continent', 'country'], inplace=True)

In [None]:
# show a slice of the dataframe (that has non-zero values) to demonstrate what data it has
# well, it used to be non-zero before seafood was incorporated at least
impex_countries.iloc[1305:1310]

In [None]:
# here is how to get specific carbon values for groups of items
impex_countries.iloc[impex_countries.index.get_level_values(0) == 'meat'].kg_CO2e.sum()

In [None]:
meta_food_CO2 = impex_countries.kg_CO2e.groupby("type").sum()

In [None]:
plt.figure(figsize=(12, 4))
ax = sns.barplot(
    x=meta_food_CO2.index.get_level_values("type"),
    y=meta_food_CO2.values / 1000,  # kg --> tonnes
)

ax.set(
    title="Carbon cost by food group", xlabel="commodity", ylabel="CO2e (tonnes)"
)
plt.xticks(rotation=90)
sns.despine();

Let's now look at a plot which compares the inherent carbon cost of a food item (that is, its global average emission value for production) with its Swiss-specific carbon cost (the inherent cost plus the transport carbon emissions).

In [None]:
# pull out the emissions for Switzerland specifically from the impex_countries df, and the inherent emissions
# value for each food type from the emissions df
food_CO2 = pd.DataFrame(impex_countries.kg_CO2e.groupby("subtype").sum())
emissions_median = pd.DataFrame(emissions['Median'])
lefton = emissions_median.index.to_numpy()

In [None]:
# merge everything of interest for this plot into one df for easy processing
inherent_and_swiss = emissions_median.merge(food_CO2, right_index=True, left_on = lefton)
inherent_and_swiss.drop(labels="key_0", axis=1, inplace=True)
inherent_and_swiss.head(3)

In [None]:
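# pull the columns of interest out as arrays so they exist before they are used below
inherent = np.array(inherent_and_swiss['Median'])
swiss_co2 = np.array(inherent_and_swiss['kg_CO2e'])
food_names = np.array(inherent_and_swiss.index)

# make a dictionary to match the various categories and data subsets
matcher = {
    "Inherent CO2e of food item":inherent,
    "Swiss-specific CO2e of food item":swiss_co2,
    "Food item":food_names
}
co2_scores = pd.DataFrame(matcher).round(4)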

In [None]:
import plotly.express as px
import plotly.offline as pyo

# Set notebook to work in offline mode
pyo.init_notebook_mode()

fig = px.scatter(
    co2_scores,
    x="Swiss-specific CO2e of food item",
    y="Inherent CO2e of food item",
    hover_name="Food item",
)

fig.update_layout(
    title={
        "text": "Carbon emissions with and without transport to Switzerland",
        "y": 0.95,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top",
    },
    #title_font=dict(size=20),
)

fig.show()

In [None]:
# the boring static version
inherent = np.array(inherent_and_swiss['Median'])
swiss_co2 = np.array(inherent_and_swiss['kg_CO2e'])
food_names = np.array(inherent_and_swiss.index)


plt.figure(figsize=(12, 6))
p = sns.scatterplot(x=swiss_co2, y=inherent, s=100, legend=False)

p.set(
    title="Carbon emissions with and without transport to Switzerland",
    xlabel="Swiss CO2 emissions",
    ylabel="Inherent CO2 emissions",
)

# place each label at its point: x is the Swiss value, y is the inherent value
for n in range(len(food_names)):
    p.text(
        swiss_co2[n],
        inherent[n],
        food_names[n],
        horizontalalignment="left",
        size="large",
        color="black",
        weight="semibold",
    )