# ECON 0150 | Spring 2025 | Homework 2.1

### Due: 

Homework is designed to both test your knowledge and challenge you to apply familiar concepts in new applications. Answer clearly and completely. You are welcome and encouraged to work in groups so long as your work is your own. Use the data files to answer the following questions, then submit your figures and answers to Gradescope.

In [ ]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Set style for better-looking plots
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

In [ ]:
# Setup: Data Preparation
# This code prepares the coffee production and agriculture employment datasets

# Load raw data
agric = pd.read_csv('data/world_in_data.csv')
coffee = pd.read_csv('data/coffee_bean_production.csv')

# Clean the coffee data
# Drop rows where Code is empty or NA
coffee = coffee[(coffee['Code'].notna()) & (coffee['Code'] != '')]

# Drop Entity=World
coffee = coffee[coffee['Entity'] != 'World']

# Count number of unique years
total_years = coffee['Year'].nunique()

# Filter to keep only countries that appear in all years
coffee_years = (coffee.groupby('Code')
                .filter(lambda x: len(x) == total_years)
                .copy())

# Save panel data for coffee production
coffee_years.to_csv('data/coffee_prod_in_years.csv', index=False)

# Keep only Year=2023 for merging
coffee_2023 = coffee[coffee['Year'] == 2023].copy()

# Rename columns for merging
coffee_2023 = coffee_2023.rename(columns={
    'Code': 'Country.Code',
    'Entity': 'Country.Name'
})

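# Merge the data using Country.Code, keep only matched rows
# (pandas suffixes overlapping columns with '_x'/'_y' by default; set '.x'/'.y' to match the names used below)
merged_data = pd.merge(agric, coffee_2023, on='Country.Code', how='inner',
                       suffixes=('.x', '.y'))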

# Drop duplicate country name column
merged_data = merged_data.drop('Country.Name.y', axis=1)

# Rename columns for clarity
merged_data = merged_data.rename(columns={
    'Country.Name.x': 'Country_Name',
    'Country.Code': 'Country_Code',
    'Employment.in.agriculture....of.total.employment.': 'Employment_in_agriculture',
    'coffe_prod': 'Coffee_Prod'
})

# Save the merged data
merged_data.to_csv('data/coffee_prod_agr.csv', index=False)

print("Setup complete! Created datasets:")
print(f"- coffee_prod_agr.csv: {len(merged_data)} countries with coffee & agriculture data")
print(f"- coffee_prod_in_years.csv: {len(coffee_years['Code'].unique())} countries across {total_years} years")
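
To illustrate the balanced-panel filter used in the setup cell, here is a minimal sketch on a made-up table. The `toy` data frame, its `tons` column, and the country codes are hypothetical. The sketch counts distinct years with `nunique()` rather than rows with `len()`, a small variation that still behaves sensibly if a country-year row were duplicated.

```python
import pandas as pd

# Toy long-format data (hypothetical values, not taken from the course files)
toy = pd.DataFrame({
    "Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "CCC"],
    "Year": [2021, 2022, 2023, 2021, 2023, 2021, 2022, 2023],
    "tons": [10, 12, 11, 5, 6, 7, 7, 8],
})

# Number of distinct years in the whole table
total_years = toy["Year"].nunique()

# Keep only the codes that are observed in every year
balanced = toy.groupby("Code").filter(
    lambda g: g["Year"].nunique() == total_years
)

kept_codes = sorted(balanced["Code"].unique())
print(kept_codes)
```

In this toy table `BBB` is dropped because it has no row for 2022; the setup cell excludes countries from `coffee_prod_in_years.csv` on the same basis.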

In [ ]:
# Load Data
coffee_prod_agr = pd.read_csv('data/coffee_prod_agr.csv')
coffee_bean_production = pd.read_csv('data/coffee_bean_production.csv')
world_in_data = pd.read_csv('data/world_in_data.csv')


#### Q1. Bike Hires and Weather

In the following questions, we'll analyze a data set that includes the monthly number of bike hires in London as well as monthly weather data: minimum and maximum temperature in degrees Celsius, rain in millimeters, and hours of sunshine.

![](i/HW4_image.png)

a) From the list below, how much did it rain in the month with the largest number of bike hires?

- 7.6 mm
- 27.6 mm
- 137.6 mm
- 157.6 mm

b) When were bikes most popular?

- In very sunny months
- In moderately sunny months
- In cloudy months
- Sunshine and bike hires were not strongly related

c) In months with what maximum temperatures were bikes most popular?

- Between 5 °C and 10 °C
- Between 15 °C and 20 °C
- Between 25 °C and 30 °C
- Maximum temperature and bike hires were not strongly related

#### Q2. A Relationship Between Variables

The dataset `coffee_prod_agr.csv` provides information on coffee production and employment in agriculture across different countries. Refer to the links below to answer the following questions.

- Data source (1): https://ourworldindata.org/grapher/coffee-production-by-region?tab=table
- Data source (2): https://data.worldbank.org/indicator/SL.AGR.EMPL.ZS

a) Who collects the data reported in each source? Briefly describe the role of the organization behind it.

b) Identify one potential limitation in the data.

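c) Create a scatter plot with employment in agriculture on the horizontal axis and coffee production on the vertical axis.

In [ ]:
# Solution for Q2c: Visualization of coffee production vs employment in agriculture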

# Convert coffee production to thousands of tons for better readability
coffee_prod_agr['Coffee_Prod_1000tons'] = coffee_prod_agr['Coffee_Prod'] / 1000

# Create scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(coffee_prod_agr['Employment_in_agriculture'], 
            coffee_prod_agr['Coffee_Prod_1000tons'],
            alpha=0.6, s=50)

plt.xlabel('Employment in Agriculture (% of total employment)', fontsize=12)
plt.ylabel('Coffee Production (thousands of tons)', fontsize=12)
plt.title('Coffee Production vs Employment in Agriculture', fontsize=14)

# Add grid for better readability
plt.grid(True, alpha=0.3)

# Save the figure
plt.savefig('i/HW_2_1_c.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate correlation
correlation = coffee_prod_agr['Employment_in_agriculture'].corr(coffee_prod_agr['Coffee_Prod'])
print(f"Correlation coefficient: {correlation:.3f}")
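
A single Pearson coefficient can be dominated by one or two very large producers, so it may help to compare it with a rank-based measure before describing the relationship. The sketch below uses invented numbers, and the column names `employment_share` and `coffee_tons` are hypothetical. It computes the Spearman correlation as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Invented data: strictly increasing, but with one very large value
toy = pd.DataFrame({
    "employment_share": [5.0, 10.0, 20.0, 30.0, 40.0, 60.0],
    "coffee_tons": [1.0, 2.0, 4.0, 8.0, 16.0, 3000.0],
})

# Pearson correlation measures linear association on the raw values
pearson = toy["employment_share"].corr(toy["coffee_tons"])

# Spearman correlation is the Pearson correlation of the ranks
spearman = toy["employment_share"].rank().corr(toy["coffee_tons"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

When the two coefficients differ noticeably, as they do in this toy example, that is a hint to look at the scatter plot for outliers or curvature rather than rely on one number.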

In [ ]:
# 

d) Describe the relationship: Is it positive, negative, or unclear?

e) How might this relationship relate to the correlation between coffee production and GDP?

f) Why might economists be interested in studying both of these relationships? What kinds of questions could they answer?

g) Go to the data sources and download the most recently published data for the year 2020. Using this updated data, choose two variables and generate a figure. Upload both your figure and the cleaned dataset you used to Gradescope. Briefly describe the steps you followed to retrieve and clean the data.
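
One possible shape for the cleaning steps is sketched below on made-up stand-ins for the two downloads. Every name and value here is an assumption: the real files may use different column names and layouts, so treat `coffee_raw`, `employment_raw`, `coffee_tons`, and `employment_share` as placeholders to adapt after inspecting your own downloads.

```python
import pandas as pd

# Hypothetical long-format coffee table: one row per entity and year
coffee_raw = pd.DataFrame({
    "Entity": ["Aland", "Aland", "Bland", "World"],
    "Code": ["AAA", "AAA", "BBB", None],
    "Year": [2019, 2020, 2020, 2020],
    "coffee_tons": [90.0, 100.0, 50.0, 150.0],
})

# Hypothetical wide-format employment table: one column per year
employment_raw = pd.DataFrame({
    "Country Name": ["Aland", "Bland", "Cland"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "2020": [12.0, 30.0, 4.0],
})

# Step 1: keep 2020 and drop aggregates that have no country code
coffee_2020 = coffee_raw[
    (coffee_raw["Year"] == 2020) & coffee_raw["Code"].notna()
]

# Step 2: rename the key and the 2020 column, keep only what is needed
employment_2020 = employment_raw.rename(
    columns={"Country Code": "Code", "2020": "employment_share"}
)[["Code", "employment_share"]]

# Step 3: inner merge on the country code; validate guards against duplicate keys
merged = coffee_2020.merge(
    employment_2020, on="Code", how="inner", validate="one_to_one"
)

# Step 4: drop rows missing either variable, then save the cleaned file
merged = merged.dropna(subset=["coffee_tons", "employment_share"])
# merged.to_csv("cleaned_2020.csv", index=False)  # hypothetical output name

print(merged)
```

Writing the steps down in this order (filter, rename, merge, drop missing) also gives a ready outline for the short description the question asks for.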

In [ ]:
# 