# ER131: Carbon Footprint Prediction
Fall 2020

In this cell, give an alphabetical (by last name) list of student group members.  Beside each student's name, provide a description of each student's contribution to the project.

`Henry Hao`  
- asdf

`Michelle Kim`  
- asdf

`Vincent Lao`  
- asdf

`Myoung-Jun Park`  
- asdf

## Basic Project Requirements (delete this markdown cell in your final submission)

**How to use this notebook**:  This notebook is the template for your semester project.  Each markdown cell provides instructions on what to do in order to complete a successful project.  The cell you're reading right now is the only one you can delete from what you eventually hand in.  For the other cells:
1. You may replace the instructions in each cell with your own work but do not edit the cell titles (with the exception of the project title, above).  
2. Follow the instructions in each section carefully.  For some sections you will enter only markdown text in the existing cells. For other sections, you'll accompany the markdown cells with additional code cells, and perhaps more markdown, before moving on to the next section.  

**Grading**.  You'll see point allocations listed in each of the section titles below.  In addition, there are other categories for points: 
1. Visualization (10 points).  Plots should be well organized, legible, labelled, and well-suited for the question they are being used to answer or explore.  
2. Clarity (5 points). Note that clarity also supports points elsewhere, because if we can't understand what you're explaining, we'll assume you didn't understand what you were doing and give points accordingly!  

For each Section or Category, we will give points according to the following percentage scale:
1. More than 90%:  work that is free of anything but superficial mistakes, and demonstrates creativity and / or a very deep understanding of what you are doing.
2. 80-90%: work without fundamental errors and demonstrates a basic understanding of what you're doing.
3. 60-80%: work with fundamental flaws in the analysis and / or conveys that you do not understand the basics of the work you are trying to do.
4. Below 60%: Work that is severely lacking or incomplete.  

Note that we distinguish *mistakes* from *"my idea didn't work"*.  Sometimes you don't know if you can actually do the thing you're trying to do and as you dig in you find that you can't.  That doesn't necessarily mean you made a mistake; it might just mean you needed more information.  We'll still give high marks to ambitious projects that "fail" at their stated objective, as long as that objective was clear and you demonstrate an understanding of what you were doing and why it didn't work.

**Number of prediction questions:**  The number of prediction questions must be greater than or equal to the number of students in the team minus one.  (A 4-person team would need to explore 4-1 = 3 questions.)  Questions should be related, but have distinct work efforts, interpretation and analysis. An example: for land use regression, you could have a core prediction question (what is pollution concentration on a fine spatial scale), a supporting question that explores how the degree of spatial aggregation influences prediction quality, plus a prediction model that explores *temporal* prediction at one point in space.  There is a lot of flexibility here; if you have any doubt about whether your questions are distinct, consult with the instructors.

**Data requirements**:  Projects must use data from a minimum of $1+N_s$ different sources, where $N_s$ is the number of students in the group.  You should merge at least two data sets.

**Advice on Project Topics**:  We want you to do a project that relates to energy and environment topics.  

**Suggested data sets**: If you choose not to work on a client project, here are some ideas for data starting points. You can definitely bring your own data to the table!
1. [Purple Air](https://www.purpleair.com) Instructions on how to download PurpleAir data are [here](https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit).
2. California Enviroscreen database.  Available [here](https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-30).
3. Several data sets available from the UC Irvine Machine Learning Repository:
    1. [Forest Fires](https://archive.ics.uci.edu/ml/datasets/Forest+Fires)
    2. [Climate](https://archive.ics.uci.edu/ml/datasets/Greenhouse+Gas+Observing+Network)
    3. [Ozone](https://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection)
4. California Solar Initiative data (installed rooftop solar systems).  Available [here](https://www.californiasolarstatistics.ca.gov/data_downloads/).
5. World Bank Open Data, available [here](https://data.worldbank.org).
6. California ISO monitored emissions data, [here](http://www.caiso.com/TodaysOutlook/Pages/Emissions.aspx).
7. Energy Information Administration Residential Energy Consumption Survey, [here](https://www.eia.gov/consumption/residential/data/2015/).

## Abstract (5 points)
Although this section comes first, you'll write it last.  It should be a ~250 word summary of your project.  1/3rd of the abstract should provide background, 1/3rd should explain what you did, and 1/3rd should explain what you learned.

## Project Background (5 points)
In this section you will describe relevant background for your project.  It should give enough information that a non-expert can understand in detail the history and / or context of the system or setting you wish to study, the need for quantitative analysis, and, broadly, what impact a quantitative analysis could have on the system.  Shoot for 500 words here.

## Project Objective (5 points)
The objectives of our project are:

- To predict an individual's carbon footprint based on a simplified version (i.e., fewer features) of [CoolClimate's Carbon Footprint calculator](https://coolclimate.org/calculator) for _______ applications -- for example: for which counties might it be beneficial to run recycling campaigns or plant-based diet campaigns. 
- To predict an individual's carbon footprint based on relative diet composition for ____ applications -- for example: for which counties might it be beneficial to run recycling campaigns or plant-based diet campaigns. 
- To predict which counties might have a large carbon footprint for policy applications -- for example: for which counties might it be beneficial to run recycling campaigns or plant-based diet campaigns. 



The goal of the first objective is to explore simpler ways for people to calculate their carbon footprint. CoolClimate's Carbon Footprint calculator is extremely detailed, and because of this, it asks for information that many people may not know offhand. For example, two of its inputs are air-travel miles and public-transit miles, which most people (especially busy college students) cannot look up quickly. 

The point of the second objective is to improve the inputs to the carbon calculator by changing the question the individual answers from "how many servings of vegetables do I eat per day?" to "vegetables make up ____ percent of my diet." The team feels it is easier for someone to think of their diet in terms of relative percentages than servings. 
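
The change of perspective described above amounts to normalizing servings by their total. Below is a minimal sketch of that conversion, not the project's actual preprocessing. The function name `servings_to_shares`, the food groups, and the serving counts are all hypothetical, and the sketch assumes one serving counts equally across food groups.

```python
def servings_to_shares(servings):
    """Convert daily servings per food group into percentage shares of the diet.

    Assumption: every serving counts equally, regardless of food group.
    """
    total = sum(servings.values())
    if total <= 0:
        raise ValueError("total servings must be positive")
    return {group: 100.0 * amount / total for group, amount in servings.items()}


# Hypothetical daily servings, for illustration only (not project data).
example_servings = {"vegetables": 3, "grains": 4, "meat": 2, "dairy": 1}
example_shares = servings_to_shares(example_servings)
print(example_shares)
```

Because the shares always sum to 100, a model trained on them sees only diet composition, not total food intake.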

The goal of the third and last objective is to explore resource allocation: deciding which counties to fund for a recycling education program, or perhaps a public health program that encourages county residents to eat a plant-based diet for their health and the environment.
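
One simple way to turn county-level predictions into a funding shortlist is to rank counties by predicted footprint and keep the top few. The sketch below illustrates that idea only. The function name `top_counties`, the county names, and the footprint values are hypothetical placeholders, and a real allocation would likely also weigh population and program cost.

```python
def top_counties(predicted_footprints, k):
    """Return the k counties with the largest predicted footprint, largest first."""
    ranked = sorted(
        predicted_footprints.items(), key=lambda item: item[1], reverse=True
    )
    return [county for county, _ in ranked[:k]]


# Hypothetical predicted footprints per household; placeholder values only.
example_predictions = {
    "County A": 41.2,
    "County B": 55.7,
    "County C": 38.9,
    "County D": 49.3,
}
example_targets = top_counties(example_predictions, k=2)
print(example_targets)
```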



## Input Data Description, Data Cleaning, Data Summary & EDA

We separated this into distinct notebooks, one for each dataset.

1. [Kammen Dataset](code/eda/cool-climate-data-vl-hh.ipynb) - Henry Hao
2. 
3. [nu3 Food Dataset](code/eda/food-nu3-data-cleaning-eda.ipynb) - Vincent Lao
4. 
5. 

## Forecasting and Prediction Modeling, Interpretation and Conclusions

We also separated these into three notebooks.

1. [County Dataset Models](code/model/county-carbon-footprint-models.ipynb)

2. [Household Dataset Models](code/model/household-carbon-footprint-models.ipynb)

3. [nu3 Food Dataset Models](code/model/food-carbon-footprint-models.ipynb) - Vincent Lao

---

Thank you so much for looking through our project! Thank you for the great semester, Prof. Callaway, Jessica, Sindhu, and anyone else involved in the making of this class. Happy holidays!