# Goals for the upcoming milestone: focus on Switzerland
* Load all relevant FAO and Impex data, and justify why we used these sources (as opposed to the Federal Statistics Office data, which gives related numbers but does not match up with what is provided by FAO and Impex)
* From the following article, copy the world-average emissions intensities by food group into an Excel file, and load that data (we will use the global average for all countries, since per-country data is not available for all types of food): https://www.sciencedirect.com/science/article/pii/S0959652616303584
* How much does Switzerland consume? -- calculate from imports, exports, and domestic production
* How much does Switzerland produce vs. supply its needs with imports?
* Update the README

# Introduction

There will be some text here to introduce the notebook, but I'm not sure what will go in the README vs here, so this is just a placeholder for now :)

**Assumptions of this data analysis:**
* Using the global average of food production emissions for every country, which is not realistic
* Only dealing with major food groups to simplify the analysis (for example, not looking at nuts, oils, etc--just fruits, vegetables, grains, meats, and non-meat animal products)
* etc

In [None]:
# import external libraries
%matplotlib inline
import collections
import inspect
import pickle
import re

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

%load_ext autoreload
%autoreload 2

In [None]:
# import local dependencies
from scripts.helpers import *
from scripts.plots import *
from scripts.impex_data_manipulation import *
from scripts.fao_data_manipulation import *
from scripts.emissions_data_manipulation import *
from scripts.data_analysis import *

In [None]:
impex = load_impex()

In [None]:
impex.head()

In [None]:
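# take the first row of impex (the "Total" row, judging by the column name used below),
# drop the entries labelled "value" in index level 3, then drop index levels 0 and 3
impex_total = pd.DataFrame(impex.iloc[0]).drop("value", level=3).droplevel([0, 3])
# impex_total.index = impex_total.index.set_names(["subtype", "metric"])
# turn the remaining index levels into regular columns
impex_total.reset_index(inplace=True)
impex_total.head()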

In [None]:
data = impex_total.copy()

# rescale the totals by 1/1000 for plotting (kg to tonnes, assuming Impex reports quantities in kg)
data["Total"] = data["Total"] / 1000

plt.figure(figsize=(12, 4))
ax = sns.barplot(x="subtype", y="Total", hue="impex", data=data)

ax.set(
    title="Total imports vs. exports",
    xlabel="Commodity",
    ylabel="Quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine();

***
**Exploratory data analysis**
***

The first step is to calculate, for each type of food, how much of what is consumed by the Swiss population is produced within Switzerland versus imported. To do this, we will combine three sets of data: imports, exports, and domestic production. The imports and exports data are sourced from [Swiss Impex](https://www.gate.ezv.admin.ch/swissimpex/index.xhtml), a website hosted by the Swiss Federal Customs Administration which provides data on Switzerland's global trade activity. Domestic production data comes from [FAOSTAT](http://www.fao.org/faostat/en/#data), the statistics database of the Food and Agriculture Organization of the United Nations (FAO), which offers a variety of agriculture-related data. In theory, the amount of food consumed in Switzerland (including food waste) can be calculated from these datasets:

Food consumed = domestic production + imports - exports
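
As a minimal sketch of this balance, assuming a dataframe with one row per food and hypothetical columns `production`, `imports`, and `exports` (these names and the numbers are placeholders, not the actual FAO or Impex layout):

```python
import pandas as pd

# Hypothetical quantities in tonnes; the real values would come from FAOSTAT and Swiss Impex.
balance = pd.DataFrame(
    {
        "production": [100.0, 0.0],
        "imports": [50.0, 80.0],
        "exports": [20.0, 5.0],
    },
    index=["apples", "bananas"],
)

# Food consumed = domestic production + imports - exports
balance["consumed"] = balance["production"] + balance["imports"] - balance["exports"]
print(balance)
```

The calculation is vectorised over rows, so the same line would apply unchanged once the real datasets share a common index.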

Note that the Swiss Federal Statistics Office also provides relevant data: namely, a dataset on Swiss food consumption by type of food. Unfortunately, this data conflicts with the data from FAO and Swiss Impex. Since the Federal Statistics Office data is much less detailed (for instance, it uses broader food categories), we decided to focus on Impex and FAO, knowing that the numbers must be taken with a grain of salt since it is difficult to quantify such data accurately.

Let's load all the data and then combine the various data sets to get the values of interest.

In [None]:
# only fruits and vegetables for now
fao = load_fao()

In [None]:
fao.head()

In [None]:
# mega = fao.join(impex_total, how="outer").sort_values("subtype")
mega = pd.concat([impex_total, fao]).sort_values("subtype")

In [None]:
# Goal for final milestone: 
# Clean this dataframe and add consumption info:
# 1. domestically produced and consumed
# 2. imported and consumed
mega

# TODO: download and load here domestic production of meats, animal products

# TODO: download and then load here all datasets for other food types (meat, animal products, grains/cereals)

From here onwards, the idea is to merge the various datasets into a single dataframe with one row per type of food (e.g. bananas, potatoes) and two columns: how much is grown domestically and not exported, and how much is imported. We could also visualize the ratio of those two columns to see whether any clusters are identifiable.
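
A rough sketch of that target dataframe, assuming the merged data can be brought into a long format with hypothetical columns `subtype`, `metric`, and `quantity` (the names and numbers below are placeholders, not the real `mega` layout):

```python
import pandas as pd

# Hypothetical long-format records in tonnes.
records = pd.DataFrame(
    {
        "subtype": ["potatoes", "potatoes", "potatoes", "bananas", "bananas"],
        "metric": ["production", "imports", "exports", "imports", "exports"],
        "quantity": [400.0, 50.0, 20.0, 90.0, 1.0],
    }
)

# One row per food, one column per metric; foods without a metric get 0.
wide = records.pivot_table(
    index="subtype", columns="metric", values="quantity", aggfunc="sum", fill_value=0.0
)

# Grown domestically and not exported. Clipped at zero because exports can
# exceed production when imported goods are re-exported (a simplification).
wide["domestic_kept"] = (wide["production"] - wide["exports"]).clip(lower=0)

# Share of the supply that is imported: 0 = fully domestic, 1 = fully imported.
wide["import_ratio"] = wide["imports"] / (wide["imports"] + wide["domestic_kept"])
print(wide)
```

A share between 0 and 1 is used here rather than a raw ratio of the two columns, since it stays finite for foods with no domestic production.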

***

Once the steps listed above are finished, we can look further into animal feed.

One interesting aspect of meat and non-meat animal product production, which is not relevant to the other food groups, is that animal feed may be sourced from a different location than where the meat/animal product is produced. This makes the sourcing of animal feed a very important factor in the carbon intensity of these foods. For example, if Switzerland produced most of its meat domestically but imported all of its feed, the carbon intensity of its meat would be much higher than if the feed were grown domestically. Thus, both aspects of the final food product must be examined.

Let's now examine Swiss production, imports, and exports of *animal feed*. 

In [None]:
%psource load_imported_feed

In [None]:
imported_feed = load_imported_feed()
imported_feed.head()

# TODO: download and load here the datasets for animal feed exported and grown domestically

In [None]:
ratio_nearby_feed = compute_nearby_imports_ratio(imported_feed)
print("{0:.1%} of imported feed comes from nearby countries".format(ratio_nearby_feed))

Nearly three-quarters of imported animal feed comes from nearby countries, which means that when Switzerland must import feed, it mostly limits the associated transport emissions by keeping the travel distance short. More importantly, a report from the State Secretariat for Economic Affairs ("Concentrate Animal Feed as an Input Good in Swiss Agricultural Production - The Effects of Border Protection and Other Support Measures") claims that 90% of animal feed used in Switzerland is domestically produced, which drastically reduces the carbon impact of the country's meat industry. This percentage can be found in [this PDF document](https://www.sbv-usp.ch/fileadmin/sbvuspch/00_Bilder/06_Services/Agristat/Statistiken/Produktionsmittel__Umwelt/SES2018_Kap04_Produktionsmittel-Umwelt.pdf) (see the table on page 8).

We will need to pay attention to where feed is coming from by animal type. For example, 90% of cow and cattle feed is domestically produced, whereas that number is only 52% for pigs and 30% for poultry (figures from the Concentrate Animal Feed report above, which sourced them from the PDF linked above). So even if chickens are less carbon intensive, they require more imported feed, which might increase their carbon footprint!

As we can see from the above table, Switzerland imported 77 million kg of meat and edible offal in 2018 (equivalent to 77 thousand tonnes).

## Emissions


Now that we have calculated Switzerland's production and consumption, we want to see how this translates into CO$_2$-equivalent emissions. One option would be to use Swiss-specific greenhouse gas values for the different food types. However, data from other countries is very sparse, and reliable data for many food types is only available for certain countries. Later in the analysis, we want to compare Swiss production emissions with Swiss import emissions, so for consistency we need either data covering all (or most) countries for a given product, or global averages. Comprehensive data is available on meat and cereal production emissions worldwide, but fruit and vegetable figures are much harder to obtain, and only a limited number of studies have been carried out. These studies have been collected in a [systematic review](https://www.sciencedirect.com/science/article/pii/S0959652616303584), and their values have been used to calculate averages for a number of different kinds of produce. To keep our results consistent, we will use these global average values to estimate the domestic Swiss emissions for fruit and vegetables.

<!---
Other thought: I've currently created a dictionary to map the different fruit/veg to the impex categories. Was just thinking, this could be a potential spot to use machine learning, although admittedly not a very useful one...
--->

In [None]:
emissions = load_emissions()
emissions.head()

The food names in the above dataframe need to be mapped to the format used in the production, imports, and exports dataframe. As an example, take a dataframe containing the consumption within Switzerland of Swiss-produced products (Domestic Consumption), the consumption of imported products (Imported Consumption), and the Total Consumption (all in thousand tonnes per year). With the emissions values above, we can estimate the emissions resulting from the production of food consumed in Switzerland.

In [None]:
pretend = {"Product":["Onion","Beetroot"], "Domestic Consumption":[1000,2000], "Imported Consumption":[2000,4000], "Total Consumption":[3000,6000]}
domestic = pd.DataFrame.from_dict(pretend)

So assuming that domestic transport is negligible, we can estimate the equivalent CO$_2$ emissions for each product type, using the global average values.

In [None]:
# TODO update this code to work with merged dataframe above + include all categories
domestic = estimate_emissions(domestic, emissions)

In [None]:
domestic

The final column of the above table shows the total CO$_2$ equivalent that would be produced if everything consumed in Switzerland were domestically produced, i.e. if no transport emissions were considered and Swiss-specific CO$_2$ emissions were used for meat and cereal production. Evidently, it is not possible to produce locally everything that a current consumer buys, so in the following analysis we will consider the effect that imported products and 'food miles' have on the CO$_2$ emissions resulting from Swiss consumption.
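
To make the calculation concrete, here is a minimal sketch of how such an estimate could work. It is not the actual `estimate_emissions` implementation: the column names, the name mapping, and the intensity values are placeholders chosen for illustration only.

```python
import pandas as pd

# Placeholder intensities (kg CO2-eq per kg of product); NOT values from the review.
intensities = pd.DataFrame(
    {"food": ["onions", "beetroot"], "kg_co2e_per_kg": [0.5, 0.25]}
)

# Consumption in thousand tonnes per year, as in the example dataframe above.
consumption = pd.DataFrame(
    {"Product": ["Onion", "Beetroot"], "Total Consumption": [3000, 6000]}
)

# Hypothetical mapping from consumption names to the names in the emissions data.
name_map = {"Onion": "onions", "Beetroot": "beetroot"}
consumption["food"] = consumption["Product"].map(name_map)

# Attach one intensity per product; validate guards against duplicated foods.
merged = consumption.merge(intensities, on="food", how="left", validate="many_to_one")

# thousand tonnes * (kg CO2-eq / kg) = thousand tonnes CO2-eq
merged["Emissions (kt CO2e)"] = merged["Total Consumption"] * merged["kg_co2e_per_kg"]
print(merged[["Product", "Total Consumption", "Emissions (kt CO2e)"]])
```

Products missing from the mapping end up with a missing intensity after the left merge, which makes gaps in the mapping easy to spot.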

The following section describes our initial observations on food transport and global carbon emissions intensities, and how we plan to use this information to estimate the impact of consuming different foods in Switzerland and to show how Swiss consumers can minimise the environmental impact of their food choices.

## Further Analysis

'Food miles', or the distance that food has to travel to arrive on your plate, clearly have an impact on the carbon emissions of the products we consume. Let's look at how much of the food, beverages, and tobacco that Switzerland imports comes from its nearest neighboring countries. The farther away a country is, the more carbon emissions it costs to import food from it.

In [None]:
percentage = glimpse()
print(str(round(percentage)) + "% of Switzerland's total imports come from countries within a 1000km radius.")
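
One way such a share could be computed is sketched below. This is an illustration under stated assumptions, not the actual `glimpse` implementation: distances are measured between capital cities with the haversine formula, the coordinates are approximate, and the import quantities are placeholders.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


BERN = (46.95, 7.45)  # approximate latitude/longitude of Bern

# country -> (approximate capital coordinates, placeholder import quantity)
origins = {
    "France": ((48.86, 2.35), 40.0),
    "Germany": ((52.52, 13.40), 35.0),
    "Brazil": ((-15.79, -47.88), 25.0),
}

# Sum the quantities from origins within 1000 km and divide by the overall total.
nearby = sum(q for coords, q in origins.values() if haversine_km(*BERN, *coords) <= 1000)
total = sum(q for _, q in origins.values())
share = nearby / total
print("{0:.0%} of imports come from within 1000 km".format(share))
```

Capital-to-capital distance is a coarse proxy for the real transport route, so a share computed this way should be read as an order-of-magnitude indication.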

So we can see that a lot of Switzerland's imports come from nearby countries. We will continue in this vein and look at the origins of each product individually to see the impact that food miles and varying production methods have on its carbon footprint. One thing we will consider is the transport method: clearly, the impact of food miles differs greatly depending on whether the food is transported by plane or by ship.

In [None]:
from IPython.display import Image
Image(url = "https://icmattermost.epfl.ch/files/5zr1jyriupfsfgmr4dtg155ssw/public?h=_GPk0xYK1I16gWsY3GuIsrFC5bTb3Ioh4_W3h3oYDs8", width=400,height=300)

Transport methods for different commodities in the USA are described in detail in the paper *Food-Miles and the Relative Climate Impacts of Food Choices in the United States* (Weber and Matthews, 2008). These values, or similar data for other countries, could be used to estimate the means of transport for different products and thus the impact this transport has on the emissions of each product. Perishable products more frequently have to be transported by air, and therefore have a significantly larger carbon footprint.