# Introduction

The goal of this initial analysis is to load data about imports, exports, and domestic production of various food types, relevant to Switzerland. Please see the `README` for overall project goals and background information.

Note that in the current state of the analysis, only fruits and vegetables have been considered. Functions for processing the data on meats, non-meat animal products, and other food groups have already been written, but these data have not yet been incorporated into the notebook, to keep the initial exploration simple.

In [None]:
# import external libraries
%matplotlib inline
import collections
import inspect
import pickle
import re

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

%load_ext autoreload
%autoreload 2

In [None]:
# import local dependencies
import sys
sys.path.insert(1, "scripts")

from helpers import *
from plots import *
from impex_data_manipulation import *
from fao_data_manipulation import *
from emissions_data_manipulation import *
from data_analysis import *

The first step is to calculate, for each type of food, how much of what is consumed by the Swiss population is produced within Switzerland versus imported. To do this, we will combine three sets of data: imports, exports, and domestic production. The imports and exports data are sourced from [Swiss Impex](https://www.gate.ezv.admin.ch/swissimpex/index.xhtml), a website hosted by the Swiss Federal Customs Administration which provides data on Switzerland's global trade activity. Domestic production data come from [FAOSTAT](http://www.fao.org/faostat/en/#data), the statistics database of the Food and Agriculture Organization of the United Nations, which offers a variety of agriculture-related data. In theory, the amount of food consumed in Switzerland (including food waste) can be calculated from these datasets:

Food consumed = domestic production + imports - exports
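
As a quick illustration of this balance, here is a minimal sketch on a toy table. The commodity names and quantities are made up for the example and are not taken from Impex or FAO.

```python
import pandas as pd

# Made-up quantities (tonnes), purely for illustration
toy_balance = pd.DataFrame(
    {
        "production": [100.0, 0.0],
        "imports": [50.0, 80.0],
        "exports": [30.0, 5.0],
    },
    index=["apples", "bananas"],
)

# Food consumed = domestic production + imports - exports
toy_balance["consumed"] = (
    toy_balance["production"] + toy_balance["imports"] - toy_balance["exports"]
)
print(toy_balance)
```

A commodity with no domestic production (here the hypothetical `bananas` row) is then consumed entirely from net imports.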

Note that the Swiss Federal Statistics Office also provides relevant data, namely a dataset on Swiss food consumption by type of food. Unfortunately, these data conflicted with the data from FAO and Swiss Impex. Since the Federal Statistics Office data were much less detailed (for instance, they use broader food categories), we decided to focus on Impex and FAO, bearing in mind that the numbers must be taken with a grain of salt, since such quantities are difficult to measure accurately.

Let's load all the data and then combine the various datasets to get the values of interest. First we'll load the imports and exports data from Impex. The data are spread across multiple Excel files and sheets, and we combine them all into a single dataframe (please refer to the code in `scripts/impex_data_manipulation.py`).

In [None]:
impex = load_impex()
impex.head()

In [None]:
# 1. Select only the first row (total), which creates a Series
# 2. Drop trade values (only consider quantity)
# 3. Drop the now-constant `metric` level
# 4. Unstack the `indicator` level (imports/exports) to create a DataFrame
impex_total = (
    impex.iloc[0]
    .drop("value", level="metric")
    .droplevel("metric")
    .unstack("indicator")
)
impex_total.head()
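
To make the reshaping steps easier to follow, here is a small sketch on a toy frame. It assumes the Impex columns form a four-level index named `type`, `subtype`, `indicator` and `metric`; that layout is inferred from the calls above, and the toy values are made up.

```python
import pandas as pd

# Hypothetical miniature of the Impex table: a single "total" row whose
# columns are indexed by (type, subtype, indicator, metric)
columns = pd.MultiIndex.from_product(
    [["fruits"], ["apples", "pears"], ["imports", "exports"], ["quantity", "value"]],
    names=["type", "subtype", "indicator", "metric"],
)
toy_impex = pd.DataFrame([list(range(1, 9))], index=["total"], columns=columns)

toy_total = (
    toy_impex.iloc[0]               # Series indexed by the four column levels
    .drop("value", level="metric")  # keep quantities only
    .droplevel("metric")            # the level is now constant, so remove it
    .unstack("indicator")           # imports/exports become columns
)
print(toy_total)
```

The result has one row per (`type`, `subtype`) pair and one column per indicator, which is the shape used in the rest of the notebook.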

Let's plot imports and exports per meta-type:

In [None]:
# 1. Group by meta-type
# 2. Sum the totals
# 3. Unstack the columns to create a Series
impex_total_metatype = impex_total.groupby("type").sum().unstack()

In [None]:
impex_total_metatype
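
The group-then-unstack step can be checked on a toy frame. The sketch below assumes a (`type`, `subtype`) row index and an `indicator` column index, as implied by the code above; the numbers are made up.

```python
import pandas as pd

# Toy stand-in for the totals table, with a (type, subtype) row index
idx = pd.MultiIndex.from_tuples(
    [("fruits", "apples"), ("fruits", "pears"), ("vegetables", "onions")],
    names=["type", "subtype"],
)
toy_totals = pd.DataFrame({"exports": [3, 7, 2], "imports": [1, 5, 10]}, index=idx)
toy_totals.columns.name = "indicator"

# Sum within each meta-type, then unstack: because the grouped frame has a
# single-level index, unstack() returns a Series indexed by (indicator, type)
toy_per_type = toy_totals.groupby("type").sum().unstack()
print(toy_per_type)
```

Having both `indicator` and `type` in the Series index is what lets the plotting cell pull the x-axis labels and the hue from `get_level_values`.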

In [None]:
plt.figure(figsize=(12, 4))
ax = sns.barplot(
    x=impex_total_metatype.index.get_level_values("type"),
    y=impex_total_metatype.values / 1000,  # kg --> tonnes
    hue=impex_total_metatype.index.get_level_values("indicator"),
)

ax.set(
    title="Total imports vs. exports", xlabel="commodity", ylabel="quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine();

As you can see, Switzerland imports far more fruits, vegetables and meats than it exports. The `animal_products` category is interesting, because its import and export quantities are about equal. The most plausible explanation is that Switzerland imports different animal products than it exports (rather than the same products being both imported and exported). Let's test this hypothesis by looking at imports and exports for the subcategories of `animal_products`:

In [None]:
impex_total_animal_prods = impex_total.loc["animal_products"].groupby("subtype").sum().unstack()

In [None]:
plt.figure(figsize=(12, 4))
ax = sns.barplot(
    x=impex_total_animal_prods.index.get_level_values("subtype"),
    y=impex_total_animal_prods.values / 1000,  # kg --> tonnes
    hue=impex_total_animal_prods.index.get_level_values("indicator"),
)

ax.set(
    title="Imports vs. exports of animal products", xlabel="commodity", ylabel="quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine();

This graph sheds more light on the topic. Cheese, a category with many different varieties, is both heavily imported and heavily exported, which likely gives consumers access to a wider range of cheese types. Eggs, on the other hand, are almost exclusively imported, while whey is mostly exported.

The butter category in this graph is a good transition to the next part of the analysis: just because butter is hardly imported or exported does not mean the Swiss do not eat butter! Rather, imports and exports are only part of the broader picture, since domestic production is another important consideration. One possible hypothesis for why butter is neither imported nor exported in large quantities is that domestic production is nearly equal to domestic consumption.

Let's now load the data on domestic production from FAO:

In [None]:
production = load_fao()
production

Next we perform an outer join of `impex_total` and `production` to create a meta-dataframe with all total macroeconomic indicators for each commodity subtype:

In [None]:
suisse = impex_total.join(production, how="outer")
suisse.head()
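
A minimal sketch of what the outer join does, using toy frames with a plain `subtype` index (the real tables may carry more index levels) and made-up values:

```python
import pandas as pd

# Toy trade and production tables that only partly overlap
toy_trade = pd.DataFrame(
    {"imports": [50.0, 80.0], "exports": [30.0, 5.0]},
    index=pd.Index(["apples", "bananas"], name="subtype"),
)
toy_production = pd.DataFrame(
    {"production": [100.0, 40.0]},
    index=pd.Index(["apples", "quinces"], name="subtype"),
)

# An outer join keeps every subtype found in either table;
# values missing on one side become NaN
toy_joined = toy_trade.join(toy_production, how="outer")
print(toy_joined)
```

Rows that appear in only one source are kept, with `NaN` in the columns from the other source.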

We have three possibilities for each commodity:

1. Neither production nor import/export values are given, or they all sum up to 0
2. Either only production values are given or only import/export values are given
3. Production, import and export values are all given (and are non-zero)

Commodities fulfilling the first condition will be removed. Commodities fulfilling the second or third condition will be kept.

NB: Missing or 0-valued data may mean either that the values are actually 0 or that data were not collected for these items. Since there is no way of knowing which is the case, we will assume that the values are truly 0 so that they can be used in the analysis.

In [None]:
# Remove subtypes for which all given quantities sum up to 0
subtypes_no_info = suisse.index[suisse.sum(axis=1) == 0]
suisse.drop(subtypes_no_info, inplace=True)
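
The filter relies on pandas skipping `NaN` when summing, so a row made up entirely of missing values also sums to 0. A small sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Toy rows: one with data, one entirely missing, one entirely zero
toy_rows = pd.DataFrame(
    {
        "imports": [50.0, np.nan, 0.0],
        "exports": [30.0, np.nan, 0.0],
        "production": [np.nan, np.nan, 0.0],
    },
    index=["apples", "all_missing", "all_zero"],
)

# DataFrame.sum skips NaN by default, so an all-NaN row sums to 0.0
row_totals = toy_rows.sum(axis=1)
toy_kept = toy_rows.drop(toy_rows.index[row_totals == 0])
print(toy_kept)
```

Only the `apples` row survives, even though its production value is missing.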

Next, we shall add columns for `domestic_consumption` and `imported_consumption`.

Definitions:
* **domestic consumption**: goods and services consumed in the country where they are produced
* **imported consumption**: goods and services consumed in the country to which they are imported

**Note**: in this analysis we will make the following simplified assumptions:

* food waste is included as being "consumed"
* exported quantity is first satisfied by available produced quantity, and then imported quantity

In [None]:
# If exports > production, we say that all produced quantity
# is exported and the rest of the exported quantity is satisfied by imports:
# 1. Set domestic_consumption to zero
# 2. Set imported_consumption to imports - (exports - production)

# If exports <= production, we say that all exports
# are satisfied by domestic production:
# 1. Set domestic_consumption to production - exports
# 2. Set imported_consumption to imports
suisse["domestic_consumption"] = np.where(
    suisse.exports > suisse.production,
    0,
    suisse.production - suisse.exports
)

suisse["imported_consumption"] = np.where(
    suisse.exports > suisse.production,
    suisse.imports - suisse.exports + suisse.production,
    suisse.imports
)

suisse.head()
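
The allocation rule can be checked on a toy table. In this sketch the values are made up, and missing entries are filled with 0 first, following the zero assumption stated in the NB above; whether the real `suisse` table still contains `NaN` at this point depends on the loading functions, so treat the `fillna(0)` step as an assumption.

```python
import numpy as np
import pandas as pd

toy_alloc = pd.DataFrame(
    {
        "imports": [50.0, 80.0, 20.0],
        "exports": [30.0, 60.0, np.nan],
        "production": [100.0, 40.0, 10.0],
    },
    index=["apples", "cheese", "quinces"],
).fillna(0)  # treat missing values as true zeros

# True where exports exceed production, i.e. some imports are re-exported
exports_exceed = toy_alloc["exports"] > toy_alloc["production"]

toy_alloc["domestic_consumption"] = np.where(
    exports_exceed, 0, toy_alloc["production"] - toy_alloc["exports"]
)
toy_alloc["imported_consumption"] = np.where(
    exports_exceed,
    toy_alloc["imports"] - toy_alloc["exports"] + toy_alloc["production"],
    toy_alloc["imports"],
)
print(toy_alloc)
```

In both branches the two new columns add up to production + imports - exports, so the split stays consistent with the consumption balance defined earlier.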

***

Once the steps in the cell above are complete, we can look further into animal feed.

One interesting aspect of meat and non-meat animal product production, which is not relevant to the other food groups, is that animal feed may be sourced from a different location than where the meat or animal product is produced. This makes the sourcing of animal feed a very important factor in the carbon intensity of these foods. For example, if Switzerland produced most of its meat domestically but imported all of its feed, the carbon intensity of its meat would be much higher than if the feed were grown domestically. Thus, both aspects of the final food product must be examined.

The next step will be to examine Swiss production, imports, and exports of *animal feed* (coming up in the next milestone!)

## Emissions


Now that we have calculated Switzerland's production and consumption, we want to look at how this translates into CO$_2$-equivalent emissions. One option would be to use Swiss-specific greenhouse gas values for the different food types. However, comparable data from other countries are very sparse, and reliable data for many food types are available only for certain countries. Later in the analysis, we want to compare Swiss production emissions with Swiss import emissions, so for consistency we need either data for all (or most) countries for a given product, or global averages. Comprehensive data are available on meat and cereal production emissions worldwide, but fruit and vegetable figures are much harder to obtain, and only a limited number of studies have been carried out. These studies have been collected in a [systematic review](https://www.sciencedirect.com/science/article/pii/S0959652616303584), and these values have been used to calculate averages for a number of different kinds of produce. To keep our results consistent, we will use these global average values to estimate the domestic Swiss emissions for fruit and vegetables.

<!---
Other thought: I've currently created a dictionary to map the different fruit/veg to the impex categories. Was just thinking, this could be a potential spot to use machine learning, although admittedly not a very useful one...
--->

In [None]:
emissions = load_emissions()
emissions.head()

The food names in the above dataframe need to be mapped to the format of the production, imports and exports dataframe above. As an example, take a dataframe containing the consumption within Switzerland of Swiss-produced products (Domestic Consumption), the consumption of imported products (Imported Consumption) and the Total Consumption (in thousand tonnes per year). We can use the above values to estimate the emissions resulting from the production of food consumed in Switzerland.

In [None]:
# Made-up example values
pretend = {
    "Product": ["Onion", "Beetroot"],
    "Domestic Consumption": [1000, 2000],
    "Imported Consumption": [2000, 4000],
    "Total Consumption": [3000, 6000],
}
domestic = pd.DataFrame.from_dict(pretend)

Assuming that domestic transport is negligible, we can estimate the equivalent CO$_2$ emissions for each product type, using the global average values.
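
The project's `estimate_emissions` helper lives in the scripts folder and is not shown here. As a rough sketch of what such an estimate could look like, the example below multiplies consumption by a per-product emission factor. The `factor` column name and its values are placeholders invented for illustration; they are not the figures from the systematic review, and the real helper may work differently.

```python
import pandas as pd

# Example consumption table (thousand tonnes per year), made-up values
toy_consumption = pd.DataFrame(
    {"Product": ["Onion", "Beetroot"], "Total Consumption": [3000, 6000]}
)

# Placeholder emission factors (kg CO2e per kg of product); NOT real values
toy_factors = pd.DataFrame(
    {"Product": ["Onion", "Beetroot"], "factor": [0.5, 0.25]}
)

# Attach the factor to each product, then scale consumption by it;
# the result is in thousand tonnes CO2e per year
toy_estimate = toy_consumption.merge(toy_factors, on="Product", how="left")
toy_estimate["Estimated Emissions"] = (
    toy_estimate["Total Consumption"] * toy_estimate["factor"]
)
print(toy_estimate)
```

A left merge keeps every consumed product, so any product without a matching factor shows up as `NaN` rather than silently disappearing.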

In [None]:
# TODO update this code to work with merged dataframe above + include all categories
domestic = estimate_emissions(domestic, emissions)

In [None]:
domestic

The final column of the above table shows the total CO$_2$ equivalent that would be produced if everything consumed in Switzerland were domestically produced, i.e. if no transport emissions were considered and Swiss-specific CO$_2$ emissions were used for meat and cereal production. Evidently, it is not possible to produce locally everything that consumers currently buy, so in the following analysis we will consider the effect that imported products and 'food miles' have on the CO$_2$ emissions resulting from Swiss consumption.

The following describes our initial observations on the transport of food and on global carbon emission intensities, and how we plan to use this information to estimate the impact of consuming different foods in Switzerland and to show how Swiss consumers can minimise their environmental impact through their food choices.

## Further Analysis

'Food miles', the distance that food has to travel to arrive on your plate, clearly affect the carbon emissions of the products we consume. Let's look at how much of the food, beverages, and tobacco that Switzerland imports comes from nearby countries. All else being equal, the farther away the country of origin, the more carbon is emitted in transporting that food.
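
One way to compute such a share is to measure great-circle distances from Switzerland to each country of origin and add up the imports within a chosen radius. The sketch below does this with the haversine formula. The capital coordinates are approximate, the import quantities are made up, and the project's `glimpse` helper may use a different method.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


BERN = (46.95, 7.45)  # approximate coordinates

# country -> (approximate capital coordinates, made-up import quantity)
toy_origins = {
    "France": ((48.86, 2.35), 400.0),
    "Italy": ((41.90, 12.50), 300.0),
    "Brazil": ((-15.79, -47.88), 300.0),
}

nearby_imports = sum(
    quantity
    for coords, quantity in toy_origins.values()
    if haversine_km(*BERN, *coords) <= 1000
)
total_imports = sum(quantity for _, quantity in toy_origins.values())
nearby_share = 100 * nearby_imports / total_imports
print(f"{round(nearby_share)}% of toy imports originate within 1000 km")
```

Measuring from capital to capital is a crude proxy for the real shipping distance, but it is enough to separate neighbouring countries from overseas origins.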

In [None]:
percentage = glimpse()
print(f"{round(percentage)}% of Switzerland's total imports come from countries within a 1000km radius.")

So we can see that a large share of Switzerland's imports comes from nearby countries. We will continue in this vein and look at the origins of each product individually, to see the impact that food miles and varying production methods have on its carbon footprint. One thing we will consider is the transport method: the impact of food miles differs greatly depending on whether the food is transported by plane or by ship.

<img width="400" height="400" src="https://icmattermost.epfl.ch/files/5zr1jyriupfsfgmr4dtg155ssw/public?h=_GPk0xYK1I16gWsY3GuIsrFC5bTb3Ioh4_W3h3oYDs8">

Transport methods for different commodities in the USA are described in detail in the paper *Food-Miles and the Relative Climate Impacts of Food Choices in the United States* (Weber and Matthews, 2008). These values, or similar data for other countries, could be used to estimate the means of transport for different products and thus the impact of transport on each product's emissions. Perishable products more often have to be transported by air, and therefore have a significantly larger carbon footprint.
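
A mode-dependent transport estimate could be structured as in the sketch below: mass times distance times a per-mode emission factor. The factors are placeholders chosen only so that air exceeds road and road exceeds ship; they are not taken from Weber and Matthews or any other source, and would need to be replaced with published values.

```python
# Placeholder factors in kg CO2e per tonne-km; NOT real values
PLACEHOLDER_FACTORS = {"ship": 0.01, "road": 0.1, "air": 1.0}


def transport_emissions_kg(tonnes, distance_km, mode):
    """Estimate transport emissions as mass x distance x per-mode factor."""
    return tonnes * distance_km * PLACEHOLDER_FACTORS[mode]


# The same shipment (2 tonnes over 1000 km) under each transport mode
for mode in PLACEHOLDER_FACTORS:
    print(mode, transport_emissions_kg(2, 1000, mode))
```

With real factors in place, the same function could be applied per product and per origin country, using the distances discussed above.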