# Goals for the upcoming milestone: focus on Switzerland
* Load all relevant FAO and Impex data, and justify why we used these sources (as opposed to the Federal Statistics Office data, which gives related numbers but does not match up with what is provided by FAO and Impex)
* From the following article, copy the world-average emissions intensities by food group into an Excel file, and load that data (we will use the global average for all countries, since country-level data is not available for all types of food): https://www.sciencedirect.com/science/article/pii/S0959652616303584
* How much does Switzerland consume? -- calculate from imports, exports, and domestic production
* How much of its needs does Switzerland cover with domestic production vs. imports?
* Update the README

# Introduction

There will be some text here to introduce the notebook, but I'm not sure what will go in the README vs here, so this is just a placeholder for now :)

**Assumptions of this data analysis:**
* Using global average of food production emissions for every single country, which is not at all realistic
* Only dealing with major food groups to simplify the analysis (for example, not looking at nuts, oils, etc--just fruits, vegetables, grains, meats, and non-meat animal products)
* etc

In [1]:
# import external libraries
%matplotlib inline
import collections
import inspect
import pickle
import re

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

%load_ext autoreload
%autoreload 2

In [2]:
# import local dependencies
from scripts.helpers import *
from scripts.plots import *
from scripts.impex_data_manipulation import *
from scripts.fao_data_manipulation import *
from scripts.emissions_data_manipulation import *
from scripts.data_analysis import *

In [3]:
impex = load_impex()

In [4]:
impex.head()

Unnamed: 0_level_0,fruits,fruits,fruits,fruits,fruits,fruits,fruits,fruits,fruits,fruits,...,animal_products,animal_products,animal_products,animal_products,animal_products,animal_products,animal_products,animal_products,animal_products,animal_products
Unnamed: 0_level_1,plantains_and_others,plantains_and_others,plantains_and_others,plantains_and_others,bananas,bananas,bananas,bananas,dates,dates,...,cheese,cheese,eggs,eggs,eggs,eggs,honey,honey,honey,honey
Unnamed: 0_level_2,imports,imports,exports,exports,imports,imports,exports,exports,imports,imports,...,exports,exports,imports,imports,exports,exports,imports,imports,exports,exports
Unnamed: 0_level_3,quantity,value,quantity,value,quantity,value,quantity,value,quantity,value,...,quantity,value,quantity,value,quantity,value,quantity,value,quantity,value
total,1444222.0,1771349.0,2.0,3.0,92397628.0,105810538.0,36763.0,50822.0,2706334.0,15719362.0,...,67285451.0,601005711.0,36298734.0,75561953.0,442894.0,1462812.0,8191947.0,36137446.0,685704.0,7251815.0
Argentina,396.0,400.0,0.0,0.0,,,,,,,...,,,23700.0,122918.0,0.0,0.0,1407193.0,3722676.0,0.0,0.0
Bangladesh,130.0,349.0,0.0,0.0,,,,,,,...,,,,,,,,,,
Brazil,1229.0,1473.0,0.0,0.0,1301.0,3662.0,0.0,0.0,,,...,17892.0,245202.0,0.0,0.0,0.0,0.0,11907.0,43965.0,0.0,0.0
Cameroon,1529.0,2302.0,0.0,0.0,65786.0,164587.0,0.0,0.0,,,...,,,,,,,,,,


In [6]:
impex_total = pd.DataFrame(impex.iloc[0]).drop("value", level=3).droplevel([0, 3])
# impex_total.index = impex_total.index.set_names(["subtype", "metric"])
impex_total.reset_index(inplace=True)
impex_total.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,total
subtype,metric,Unnamed: 2_level_1
plantains_and_others,imports,1444222.0
plantains_and_others,exports,2.0
bananas,imports,92397628.0
bananas,exports,36763.0
dates,imports,2706334.0
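
The one-line reshaping in the cell above packs several steps together. As a rough sketch of what each step does, here is the same chain applied to a toy frame. The toy data, the level names `type` and `unit`, and all numbers are made up for illustration; only the chain of pandas calls mirrors the cell above.

```python
import pandas as pd

# Toy stand-in for the Impex table: one "total" row and 4-level columns.
# Level names and all numbers are invented for illustration.
columns = pd.MultiIndex.from_product(
    [["fruits"], ["bananas", "dates"], ["imports", "exports"], ["quantity", "value"]],
    names=["type", "subtype", "metric", "unit"],
)
toy = pd.DataFrame(
    [[100.0, 10.0, 5.0, 1.0, 20.0, 2.0, 0.0, 0.0]],
    index=["total"],
    columns=columns,
)

# Same chain as in the cell above, one step at a time.
total_row = pd.DataFrame(toy.iloc[0])          # 4-level row index, single column "total"
quantities = total_row.drop("value", level=3)  # keep only the "quantity" entries
toy_total = quantities.droplevel([0, 3])       # index is now (subtype, metric)
toy_total = toy_total.reset_index()            # columns: subtype, metric, total
print(toy_total)
```

The result is a long-format table with one row per (subtype, metric) pair, which is the shape the plotting and merging steps further down rely on.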


In [None]:
data = impex_total.copy()
data.head()

# convert quantities from kg to tonnes (assuming Impex reports quantities in kg)
data["total"] = data["total"] / 1000

plt.figure(figsize=(12, 4))
ax = sns.barplot(x="subtype", y="total", hue="metric", data=data)

ax.set(
    title="Total imports vs. exports",
    xlabel="Commodity",
    ylabel="Quantity (tonnes)"
)
plt.xticks(rotation=90)
sns.despine();

***
**Exploratory data analysis**
***

The first step is to calculate, for each type of food, how much of what is consumed by the Swiss population is produced within Switzerland versus imported. To do this, we will combine three sets of data: imports, exports, and domestic production. The imports and exports data are sourced from [Swiss Impex](https://www.gate.ezv.admin.ch/swissimpex/index.xhtml), a website hosted by the Swiss Federal Customs Administration which provides data on Switzerland's global trade activity. Domestic production data comes from [FAOSTAT](http://www.fao.org/faostat/en/#data), the statistics database of the Food and Agriculture Organization of the United Nations, which offers a variety of agriculture-related data. In theory, the amount of food consumed in Switzerland (including food waste) can be calculated from these datasets:

Food consumed = domestic production + imports - exports

Note that the Swiss Federal Statistics Office also provides relevant data, namely a dataset on Swiss food consumption by type of food. Unfortunately, these figures conflict with the data from FAO and Swiss Impex. Since the Federal Statistics Office data is also much less detailed (for instance, it uses broader food categories), we decided to focus on Impex and FAO, keeping in mind that the numbers must be taken with a grain of salt, since such quantities are difficult to measure accurately.

Let's load all the data and then combine the various data sets to get the values of interest.

In [7]:
# only fruits and vegetables for now
fao = load_fao()

In [17]:
fao.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,total
subtype,metric,Unnamed: 2_level_1
agave_fibres_nes,production,0.0
almonds_with_shell,production,0.0
anise_badian_fennel_coriander,production,0.0
apples,production,225622000.0
apricots,production,9420000.0


In [22]:
# mega = fao.join(impex_total, how="outer").sort_values("subtype")
mega = pd.concat([impex_total, fao]).sort_values("subtype")

In [23]:
# Goal for final milestone: 
# Clean this dataframe and add consumption info:
# 1. domestically produced and consumed
# 2. imported and consumed
mega

Unnamed: 0_level_0,Unnamed: 1_level_0,total
subtype,metric,Unnamed: 2_level_1
agave_fibres_nes,production,0.0
almonds_with_shell,production,0.0
anise_badian_fennel_coriander,production,0.0
apples,exports,3027636.0
apples,production,225622000.0
apples,imports,12116858.0
apricots,production,9420000.0
apricots,imports,11934233.0
apricots,exports,7492.0
areca_nuts,production,0.0


# TODO: download and load here domestic production of meats, animal products

# TODO: download and then load here all datasets for other food types (meat, animal products, grains/cereals)

The idea from here onwards: merge the various datasets so that we eventually have one dataframe with one row per type of food (e.g. bananas, potatoes, etc.), one column for how much is grown domestically and not exported, and another column for how much is imported. We could also add some visualizations, for example by plotting the ratio of those two columns to see whether there are any identifiable clusters.
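
As a minimal sketch of that merge step, assuming a long-format table shaped like `mega` after `reset_index()` (columns `subtype`, `metric`, `total`): the numbers below are invented, and treating all exports as coming out of domestic production is a simplifying assumption, not something the data confirms.

```python
import pandas as pd

# Hypothetical long-format data shaped like `mega` after reset_index();
# the numbers are invented for illustration only.
long_df = pd.DataFrame({
    "subtype": ["apples", "apples", "apples", "pears", "pears"],
    "metric": ["production", "imports", "exports", "production", "imports"],
    "total": [200.0, 50.0, 30.0, 80.0, 20.0],
})

# One row per food, one column per metric; missing entries are treated as 0.
wide = long_df.pivot_table(
    index="subtype", columns="metric", values="total", aggfunc="sum", fill_value=0
)

# Food consumed = domestic production + imports - exports
wide["consumption"] = wide["production"] + wide["imports"] - wide["exports"]

# Simplifying assumption: all exports come out of domestic production.
wide["domestic_consumed"] = wide["production"] - wide["exports"]

# Share of consumption covered by imports (undefined where consumption is 0).
wide["import_share"] = wide["imports"] / wide["consumption"]
print(wide)
```

A wide table like this has the two columns of interest side by side, so a ratio or scatter plot can be built directly from it.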

***

After finishing what is listed in the cell above, we can go further into animal feed.

One interesting aspect of meat and non-meat animal product production, which is not relevant to the other food groups, is that animal feed may be sourced from a different location than where the meat or animal product is produced. This makes the sourcing of animal feed a very important factor in the carbon intensity of these foods. For example, if Switzerland produced most of its meat domestically but imported all of its feed, the carbon intensity of its meat would be much higher than if the feed were grown domestically. Thus, both aspects of the final food product must be examined.

Let's now examine Swiss production, imports, and exports of *animal feed*. 

In [None]:
%psource load_imported_feed

In [None]:
imported_feed = load_imported_feed()
imported_feed.head()

# TODO: download and load here the datasets for animal feed exported and grown domestically

In [None]:
ratio_nearby_feed = compute_nearby_imports_ratio(imported_feed)
print("{0:.1%} of imported feed comes from nearby countries".format(ratio_nearby_feed))

Nearly three-quarters of imported animal feed comes from nearby countries, which means that when Switzerland does import feed, the short travel distance keeps transport-related carbon emissions comparatively low. More importantly, a report from the State Secretariat for Economic Affairs ("Concentrate Animal Feed as an Input Good in Swiss Agricultural Production - The Effects of Border Protection and Other Support Measures") claims that 90% of animal feed used in Switzerland is domestically produced, which drastically reduces the carbon impact of the country's meat industry. This percentage can be found in [this PDF document](https://www.sbv-usp.ch/fileadmin/sbvuspch/00_Bilder/06_Services/Agristat/Statistiken/Produktionsmittel__Umwelt/SES2018_Kap04_Produktionsmittel-Umwelt.pdf) (see the table on page 8).

We will need to pay attention to where feed comes from for each animal type. For example, 90% of cow and cattle feed is domestically produced, whereas that number is only 52% for pork and 30% for poultry (figures from the Concentrate Animal Feed report above, which in turn takes them from the PDF linked above). So even if chickens are less carbon intensive, they require more imported feed, which might increase their carbon footprint!
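
To make the comparison concrete, the imported share per animal type follows directly from the domestic shares quoted above. This is only a sketch, and it assumes that feed is either domestically produced or imported, with nothing else in between.

```python
# Domestic feed shares (percent) quoted in the text above.
domestic_feed_pct = {"cattle": 90, "pork": 52, "poultry": 30}

# Assumption: whatever is not produced domestically has to be imported.
imported_feed_pct = {
    animal: 100 - pct for animal, pct in domestic_feed_pct.items()
}

# List animal types from most to least import-dependent.
for animal, pct in sorted(imported_feed_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{animal}: {pct}% of feed imported")
```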

As we can see from the above table, Switzerland imported 77 million kg of meat and edible offal in 2018 (equivalent to 77 thousand tonnes).

## Emissions


Now that we have calculated Switzerland's production and consumption, we want to look at how this translates into CO$_2$-equivalent emissions. To do this, we could use Swiss-specific greenhouse gas emission values for the different food types. However, data for other countries is very sparse, and reliable data for many food types is only available for certain countries. Since later in the analysis we want to compare Swiss production emissions with Swiss import emissions, we will use the global average values to estimate domestic Swiss emissions as well, for consistency.

Thought: data for meat and cereals is available for every country through FAOSTAT, so should we use it for these categories, especially as the difference in CO$_2$ emissions between countries is so marked? We could then use the global averages only as a generalisation for fruit/veg, where reliable information is lacking.

Other thought: I've currently created a dictionary to map the different fruit/veg to the impex categories. Was just thinking, this could be a potential spot to use machine learning, although admittedly not a very useful one...

TODO: I have to go, because I'm meant to be working in the lab all day today. In the lunch break, I will create a function in the helper file to open these files + create new category, and then I will start to use this info to calculate Swiss emissions...

In [None]:
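# map each fruit/vegetable name to its Impex category (tab-separated file)
fruit_veg = {}
with open("../data/categories.txt") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        key, val = line.rstrip("\n").split("\t", 1)
        fruit_veg[key] = val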

In [None]:
emissions = pd.read_excel(r'../data/food_emissions.xlsx')
emissions['Category'] = emissions.Name.map(fruit_veg)
emissions.set_index('Name', inplace=True)

In [None]:
emissions

In [None]:
emissions = load_emissions()
emissions.head()

For now, I'm just going to create an imaginary dataframe of Swiss consumption of domestically produced and imported products (thousand tonnes per year). Once we have actual values, I'll adapt my code to match the real dataframes.

In [None]:
pretend = {"Product": ["Onion", "Beetroot"], "Domestic Consumption": [1000, 2000], "Imported Consumption": [2000, 4000], "Total Consumption": [3000, 6000]}
domestic = pd.DataFrame.from_dict(pretend)

So assuming that domestic transport costs are negligible (?), we can estimate the equivalent CO$_2$ emissions for each product type, using the global average values.
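
As a rough illustration of the calculation described above, here is a sketch that multiplies total consumption by an emission intensity per product. The function `estimate_emissions_sketch` is a hypothetical stand-in and not the project's `estimate_emissions`. The intensity values are invented placeholders rather than figures from the article.

```python
import pandas as pd

# Placeholder consumption figures (thousand tonnes per year).
consumption = pd.DataFrame({
    "Product": ["Onion", "Beetroot"],
    "Domestic Consumption": [1000, 2000],
    "Imported Consumption": [2000, 4000],
})

# Hypothetical emission intensities in kg CO2-eq per kg of product.
# These are NOT the values from the article; they only illustrate the arithmetic.
intensity = pd.Series({"Onion": 0.5, "Beetroot": 0.25}, name="intensity")


def estimate_emissions_sketch(df, intensity):
    """Return a copy of df with estimated production emissions per product."""
    out = df.copy()
    per_kg = out["Product"].map(intensity)
    total = out["Domestic Consumption"] + out["Imported Consumption"]
    # thousand tonnes of food * (kg CO2-eq / kg food) = thousand tonnes CO2-eq
    out["Emissions (kt CO2-eq)"] = total * per_kg
    return out


result = estimate_emissions_sketch(consumption, intensity)
print(result)
```

Products missing from the intensity table end up with `NaN` emissions, which makes gaps in the name-to-category mapping easy to spot.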

# TODO update this code to work with merged dataframe, plus make the indices work somehow

In [None]:
domestic = estimate_emissions(domestic, emissions)

The final column of the above table shows the total CO$_2$ equivalent that would be produced if everything that was consumed in Switzerland was domestically produced, i.e. no transport emissions were considered and Swiss-specific CO$_2$ emissions were used for meat and cereal production. Evidently, not everything that consumers currently buy can be produced locally, so in the following analysis we will consider the effect that imported products and 'food miles' have on the CO$_2$ emissions resulting from Swiss consumption.

The following describes our initial observations regarding the transport of food and global carbon emissions intensities, and how we plan to use this information to estimate the impact of consuming different foods in Switzerland and to show how Swiss consumers can minimise the environmental impact of their food choices.

## Further Analysis

'Food miles', or the distance that food has to travel to arrive on your plate, clearly have an impact on the carbon emissions of the products we consume. Let's look at how much of the food, beverages, and tobacco that Switzerland imports comes from its nearest neighboring countries. The farther away a country is, the more carbon emissions importing food from it generates.

In [None]:
percentage = glimpse()
print(str(round(percentage)) + "% of Switzerland's total imports come from countries within a 1000km radius.")

So we can see that a lot of Switzerland's imports come from nearby countries. We will continue in this vein and look at the origins of each product individually to see the impact that food miles and varying production methods have on its carbon footprint. One thing we will consider is the transport method: the impact of food miles clearly differs greatly depending on whether the food is transported by plane or by ship.
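
One way the share of imports within a 1000 km radius could be computed is sketched below, using great-circle (haversine) distances. This is not the implementation of `glimpse()`. The import records are hypothetical, the coordinates are approximate, and using the distance between capitals as a proxy for transport distance is a crude assumption.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


BERN = (46.95, 7.45)  # approximate coordinates of Bern

# Hypothetical import records: (origin, approx. capital lat, lon, quantity in tonnes).
imports = [
    ("Germany", 52.52, 13.40, 500.0),
    ("Italy", 41.90, 12.50, 300.0),
    ("Brazil", -15.79, -47.88, 200.0),
]


def share_within(records, origin, radius_km):
    """Fraction of imported quantity whose origin lies within radius_km."""
    total = sum(qty for *_, qty in records)
    near = sum(
        qty for _, lat, lon, qty in records
        if haversine_km(*origin, lat, lon) <= radius_km
    )
    return near / total


share = share_within(imports, BERN, 1000)
print(f"{share:.0%} of imports come from within 1000 km")
```

The same distance could later be multiplied by a per-mode emission factor (plane, ship, truck) to weigh food miles by transport method, once reliable factors are available.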