# COGS 108 - Project Proposal

## Authors

- Jin Choi
- Sujin Kim
- Rowoon Lee
- Yechan Park
- Idhant Kumar

## Research Question

Is there a statistically significant correlation between the global production rate of plastics and key indicators of global warming, such as atmospheric CO₂ concentration, fossil fuel consumption, and global average temperature anomalies?


## Background and Prior Work

The rapid growth of global plastic production over recent decades has raised increasing concern due to its environmental and climate impacts. Since the 1950s, plastic production has increased from almost zero to hundreds of millions of metric tons per year, largely driven by industrialization and rising consumer demand [1]. Because most plastics are produced from fossil fuels, their manufacturing and disposal require large amounts of energy and result in greenhouse gas emissions. As a result, plastic production is closely connected to broader patterns of fossil fuel use and industrial activity.

Long-term observations show that atmospheric carbon dioxide (CO₂) concentrations have risen steadily since the late 1950s. Measurements collected by the National Oceanic and Atmospheric Administration (NOAA) indicate that current CO₂ levels are substantially higher than pre-industrial values and continue to increase each year [2]. This rise is mainly caused by the widespread burning of fossil fuels and other human activities, and it is strongly linked to global warming. Increasing CO₂ concentrations are associated with other major climate indicators, including rising global temperature anomalies, reflecting an enhanced greenhouse effect in Earth’s atmosphere.

Scientific assessments by the Intergovernmental Panel on Climate Change (IPCC) provide strong evidence that human-driven increases in greenhouse gases have caused widespread warming across the climate system. The IPCC Sixth Assessment Report explains that excess heat trapped by greenhouse gases has been absorbed largely by the oceans since the mid-20th century, leading to accelerating glacier and ice-sheet melt and rising global sea levels [3]. These observed changes closely follow long-term increases in fossil fuel consumption and industrial production, suggesting that other fossil-fuel-intensive activities may exhibit similar relationships with climate indicators.

Global temperature records further support this warming trend. Data from NASA’s Goddard Institute for Space Studies show that recent decades are substantially warmer than the mid-20th century average, indicating a clear and persistent rise in global temperature anomalies [4]. Because plastics are derived from fossil fuels and contribute to greenhouse gas emissions throughout their lifecycle, examining the statistical relationship between global plastic production and key climate indicators, such as atmospheric CO₂ concentrations and global temperature anomalies, can help clarify how industrial production aligns with observed global warming trends.

References
1. Our World in Data. Global Plastics Production. https://ourworldindata.org/grapher/global-plastics-production
2. NOAA Global Monitoring Laboratory. Trends in Atmospheric Carbon Dioxide. https://gml.noaa.gov/ccgg/trends/
3. IPCC. (2021). Climate Change 2021: The Physical Science Basis, Chapter 9. https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-9/
4. NASA Goddard Institute for Space Studies. GISTEMP Surface Temperature Analysis. https://data.giss.nasa.gov/gistemp/


## Hypothesis


We hypothesize that there is a statistically significant positive correlation between global plastic production levels and indicators of global warming, including atmospheric CO₂ concentration, fossil fuel consumption, and global average temperature anomalies. This relationship is expected because plastic production is highly dependent on fossil fuels and contributes to greenhouse gas emissions throughout its lifecycle. As plastic production increases steadily over time, we anticipate upward trends in the climate variables as well.

## Data

1. Global Plastic Production: https://ourworldindata.org/grapher/global-plastics-production 
This is an open dataset provided by Our World in Data. The important variables for our research are Year and Global plastics production (million tonnes); production will serve as our primary independent variable to test against the climate indicators. This dataset is well suited to our question because it covers the historical period from 1950 through 2019, showing the long-term trend.

2. Atmospheric CO₂ Concentration (NOAA): https://gml.noaa.gov/ccgg/trends/data.html 
This is public-domain U.S. government data. The relevant variables are year and mean, the annual average CO₂ concentration in parts per million (ppm), which shows the long-term change in CO₂.

3. Global Temperature Anomalies: https://data.giss.nasa.gov/gistemp/ 
This is open public data provided by NASA's Goddard Institute for Space Studies. The relevant variables are Year and the column labeled J-D, the January-December annual mean temperature anomaly, which represents the deviation of global surface temperature from the 1951-1980 baseline. This lets us observe the rise in global temperature over time.
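The three sources use different CSV layouts, so each needs slightly different parsing before the series can be compared year by year. The sketch below uses only Python's standard library and returns a `{year: value}` mapping per source. The layouts it assumes are not stated in this proposal and should be checked against the actual downloads: an Our World in Data grapher CSV with a `Year` column (and possibly an `Entity` column); NOAA's annual-mean file (for example `co2_annmean_mlo.csv`) with `#` comment lines before a `year,mean,...` header; and a GISTEMP global-means CSV (for example `GLB.Ts+dSST.csv`) with one title line above the header and `***` for incomplete years. All function names, and the `value_column` parameter, are hypothetical.

```python
import csv
from typing import Dict, Iterable, Optional


def load_owid_plastics(lines: Iterable[str], value_column: str,
                       entity: Optional[str] = None) -> Dict[int, float]:
    """{year: production} from an Our World in Data grapher CSV.

    `value_column` is hypothetical: set it to the production column's header
    after inspecting the download. `entity` optionally keeps one "Entity"
    value (an assumed column).
    """
    series: Dict[int, float] = {}
    for row in csv.DictReader(lines):
        if entity is not None and row.get("Entity") != entity:
            continue  # row belongs to a different region
        raw = (row.get(value_column) or "").strip()
        if raw:  # skip years with no recorded value
            series[int(row["Year"])] = float(raw)
    return series


def load_noaa_co2(lines: Iterable[str]) -> Dict[int, float]:
    """{year: annual mean CO2 in ppm}; assumes "#" comments, then year,mean,..."""
    # Drop blank lines and the "#" comment block so the header comes first.
    data = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#"))
    return {int(row["year"]): float(row["mean"]) for row in csv.DictReader(data)}


def load_gistemp_annual(lines: Iterable[str], title_lines: int = 1,
                        column: str = "J-D") -> Dict[int, float]:
    """{year: annual anomaly vs. 1951-1980}; assumes "***" marks missing values."""
    it = iter(lines)
    for _ in range(title_lines):
        next(it, None)  # skip the title line(s) above the header row
    series: Dict[int, float] = {}
    for row in csv.DictReader(it):
        raw = (row.get(column) or "").strip()
        if raw and raw != "***":  # incomplete years have no annual mean
            series[int(row["Year"])] = float(raw)
    return series
```

Each function accepts any iterable of text lines, so a downloaded file could be passed as `open("co2_annmean_mlo.csv", newline="")`; the file names are assumptions based on typical downloads from these pages.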


## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section. You don't have to solve these problems; you just have to acknowledge any potential harm, no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?
 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations 

Team expectation 1: Communication
* Primary communication method: Discord chat and calls
* We usually respond within a day, and everyone should attend the weekly group meeting call since we are all contributing to the proposal
* If a deadline is within 48 hours, we aim to start at least 12 hours before the deadline
* If someone is unavailable for a call or cannot do their work, they should notify the group as soon as possible

Team expectation 2: Weekly Meeting Schedule 
* Meetings are held every week, usually on Wednesday around 3-5 PM, since everyone is available during that time
* At each meeting, we discuss what needs to be done by the deadline and what to expect, and we plan for the next meeting
* We use Google Docs to work on the assignments, and whoever is available submits them

Team expectation 3: Decision-making
* During team meetings, we decide by majority vote
* Whoever is available can create the assignment document or submit the assignment
* If a quick decision needs to be made, whoever answers first makes the call

Team expectation 4: Equal Contribution
* Everyone puts in an equal amount of time and effort to finish the assignments
* We will use our GitHub page and Google Docs for most of our project work
* Everyone must contribute to the weekly meetings and let the group know on the Discord chat if something comes up
* Respect every member and each other's boundaries


## Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/30  |  1 PM | We confirmed group members and set up a group chat. Every member read previous proposals and thought about how to answer the assignment, which asks us to review them. | Discussed who will answer which portions and shared thoughts on two previous proposal examples so that we can do better. |
| 2/4  |  3 PM | Think about which research question we want to work on. | Discuss which portion of the proposal each person wants to work on and what to answer. |
| 2/11  | 3 PM  | Identify the data we need for our research and continue collecting it. | Share the data we collected and the approaches we should take. Every member works on their assigned parts. |
| 2/18  | 3 PM  | Combine the datasets into a usable form for the project and begin exploratory data analysis (EDA). | Review our work and the data collected. Check for any problems with the EDA. |
| 2/25  | 3 PM  | Finalize the general structure of the project and begin the analysis. | Discuss what to include in the analysis and what to highlight. Everyone should participate and share their thoughts and ideas. Complete the project check-in. |
| 3/4  | 3 PM  | Complete the analysis and write the final draft of the project to check for any remaining problems. | Discuss and edit the analysis and the overall content. |
| 3/18  | Before 11:59 PM  | Finish up, fix last-minute errors, and submit the project. | Submit the project and the surveys. |