# Data Visualization for Garbage Bag Charging Scheme in Hong Kong

This Jupyter Notebook guides you through data visualization techniques using Python, focusing on the garbage bag charging scheme dataset in Hong Kong. This aligns with the *Empowering Citizens through Data: Participatory Policy Analysis for Hong Kong* course, supporting SDG 12 (Responsible Consumption and Production). You will learn to:
1. Load and examine a dataset.
2. Understand its structure and variables.
3. Visualize categorical data with frequency tables, bar charts, and pie charts.
4. Analyze continuous data with summary statistics, box-whisker plots, and histograms.
5. Explore relationships using scatter plots.
6. Save visualizations to a directory.

Use GitHub Copilot to assist by writing prompts (e.g., comments like `# Write code to...`) to generate Python code. Libraries required: `pandas`, `matplotlib`, `seaborn`.

## Section 0: Import Relevant Libraries

Before we start, let’s make sure your computer has the tools we need, called libraries (`pandas`, `matplotlib`, and `seaborn`). If you see an error when running the code below, follow these simple steps to add them:

- On **Windows**: Click the Windows icon to open the Start menu, type “Command Prompt,” and click it to open.
- On **macOS**: Click the magnifying glass (top right) to open Spotlight Search, type “Terminal,” and click it to open.

In the window that opens, type this command and press Enter: `pip install pandas matplotlib seaborn`. 

- If you get a permission error, add `--user` at the end, like `pip install pandas matplotlib seaborn --user`. (If this doesn’t work, ask your instructor.)

After installing the libraries, run the next code cell (the one that asks you to write a prompt to import pandas, matplotlib, and seaborn). If it runs without errors, you’re ready to continue. If it shows an error (e.g., ‘ModuleNotFoundError’), the notebook has not picked up the new libraries yet and its kernel needs a restart. Click the kernel name (e.g., ‘base’; yours may be different) in the top-right corner and look for a ‘Restart’ option in the dropdown. If you don’t see it, press Ctrl+Shift+P to open the Command Palette, type ‘Jupyter: Restart Kernel’, and select it. Restarting refreshes the notebook so it can use the new libraries. After restarting, run the next code cell again to confirm it works. If you’re still stuck, ask your instructor for help.
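
If it is unclear which library the kernel cannot find, a small check like the one below may help. It is only a sketch: the helper name `missing_libraries` is made up for illustration, and it relies on the standard-library function `importlib.util.find_spec`, which returns `None` when a top-level package is not visible to the running kernel.

```python
import importlib.util

REQUIRED = ("pandas", "matplotlib", "seaborn")


def missing_libraries(names=REQUIRED):
    """Return the names in `names` that the current kernel cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

If a library is still listed after installation and a kernel restart, the notebook is probably using a different Python environment from the one where `pip` installed the packages.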


**Task**: Use GitHub Copilot to generate the import statements for `pandas`, `matplotlib`, and `seaborn`. Write a prompt as a comment (e.g., `# Write code to import pandas, matplotlib, and seaborn`) and let Copilot suggest the code.

In [1]:
# Write a prompt to import pandas, matplotlib, and seaborn
# Write code to import pandas, matplotlib, and seaborn
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 

## Section 1: Load and Examine the Dataset

**Task**: Use GitHub Copilot to generate code to load `week2.csv` into a pandas DataFrame and display the first five rows. Write a prompt as a comment and paste the generated code below.

In [2]:
# Write code to load week2.csv into a pandas DataFrame and show the first five rows
df = pd.read_csv('week2.csv')
df.head()

| | support_info | support_after_info | fairness | government_consideration | policy_helpfulness | waste_severity | recycling_effort | food_waste_behavior | LocalResidentcode | HongKongDistrict_CW | ... | HongKongDistrict_ YauTsimMong | HongKongDistrict_ShamShuiPo | HongKongDistrict_WanChai | HongKongDistrict_TsuenWan | HongKongDistrict_KwaiTsing | HongKongDistrict_SaiKung | HongKongDistrict_KwunTong | HongKongDistrict_WongTaiSin | HousingType_Publicrentalhousing | Distance_artificial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | never_seen | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 259.83 |
| 1 | 5 | 5 | 5 | 5 | 4 | 2 | 2 | never_seen | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 260.04 |
| 2 | 3 | 5 | 5 | 4 | 2 | 3 | 1 | seen_not_used | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 338.89 |
| 3 | 1 | 2 | 2 | 1 | 2 | 3 | 2 | never_seen | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 192.59 |
| 4 | 1 | 3 | 4 | 3 | 1 | 3 | 2 | seen_not_used | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 232.95 |


## Section 2: Understand the Dataset Structure

**Task**: Use GitHub Copilot to generate code to display dataset information and summary statistics. Write a prompt as a comment and paste the generated code below.

In [4]:

# Write code to display dataset info and summary statistics
print("Displaying dataset information...")
df.info()  # Shows column names, non-null counts, and data types

print("\nDisplaying summary statistics for numerical columns...")
print(df.describe())  # Shows count, mean, std, min, max, and quartiles for numerical columns

print("\nDisplaying summary statistics for categorical columns...")
print(df.describe(include='object'))  # Shows count, unique, top, and freq for categorical (object) columns

Displaying dataset information...
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 28 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   support_info                     97 non-null     int64  
 1   support_after_info               97 non-null     int64  
 2   fairness                         97 non-null     int64  
 3   government_consideration         97 non-null     int64  
 4   policy_helpfulness               97 non-null     int64  
 5   waste_severity                   97 non-null     int64  
 6   recycling_effort                 97 non-null     int64  
 7   food_waste_behavior              97 non-null     object 
 8   LocalResidentcode                97 non-null     int64  
 9   HongKongDistrict_CW              97 non-null     int64  
 10  HongKongDistrict_KowloonCity     97 non-null     int64  
 11  HongKongDistrict_Other           97 non-null     int

## Section 3: Visualize Categorical Data

**Task**: Use GitHub Copilot to generate code for the following views of respondents' support level:
- a frequency table,
- a bar chart, and
- a pie chart.

The variable is called `support_info`. Make sure the figures are easy to read.
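
As a starting point for checking Copilot's suggestion, here is a minimal sketch of a labelled frequency table. The `responses` values are made up for illustration and stand in for `df["support_info"]`; the label text follows the 1-5 scale used in the prompts of this section. Reindexing over the full scale is one way to keep categories that nobody chose, so they appear with a count of 0 instead of disappearing.

```python
import pandas as pd

# Hypothetical responses standing in for df["support_info"]
responses = pd.Series([1, 5, 3, 1, 1, 4, 2, 5, 3, 3], name="support_info")

# Labels for the 1-5 scale used in this section
labels = {1: "Strongly oppose", 2: "Oppose", 3: "Neutral",
          4: "Support", 5: "Strongly support"}

# Count each scale point in scale order; unused points get a count of 0
counts = responses.value_counts().reindex(range(1, 6), fill_value=0)

freq_table = pd.DataFrame(
    {
        "label": [labels[k] for k in counts.index],
        "count": counts.to_numpy(),
        "percent": (counts / counts.sum() * 100).round(1).to_numpy(),
    },
    index=counts.index,
)
freq_table.index.name = "support_info"
print(freq_table)
```

The same `counts` object can be passed to a bar or pie chart, which keeps the table and the figures consistent with each other.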

In [None]:
# Write code to create a frequency table for support_info

In [None]:
# Write code to create a bar chart for support_info, ordered from strongly oppose (1) to strongly support (5), with labels and axis titles
# Write code to label the 1-5 scale as 1=Strongly oppose, 2=Oppose, 3=Neutral, 4=Support, 5=Strongly support

In [None]:
# Write code to create a pie chart for support_info, explicitly labeling the 1-5 scale as 1=Strongly oppose, 2=Oppose, 3=Neutral, 4=Support, 5=Strongly support

In [None]:
# Write code to create bar charts for support_info and support_after_info; put the plots in a 1x2 grid. Modify the output using GitHub Copilot Chat and apply the new code if necessary.
# Get the maximum count for y-axis limit
# Bar chart for support_info
# Bar chart for support_after_info
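
The steps in the cell above could be sketched roughly as follows. The `toy` DataFrame is invented for illustration and stands in for `df`; the figure size and axis wording are assumptions. The key idea is that both panels use the same y-axis limit, taken from the largest count in either column, so bar heights can be compared directly across the two charts.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the two survey columns in df
toy = pd.DataFrame({
    "support_info":       [1, 5, 3, 1, 1, 4, 2, 5],
    "support_after_info": [1, 5, 5, 2, 3, 4, 3, 5],
})
scale = range(1, 6)
columns = ["support_info", "support_after_info"]

# Count each scale point, keeping empty categories as 0
counts = {c: toy[c].value_counts().reindex(scale, fill_value=0) for c in columns}

# Get the maximum count for the shared y-axis limit
y_max = max(s.max() for s in counts.values())

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, col in zip(axes, columns):
    ax.bar(list(scale), counts[col].to_numpy())
    ax.set_title(col)
    ax.set_xlabel("Response (1 = Strongly oppose, 5 = Strongly support)")
    ax.set_xticks(list(scale))
    ax.set_ylim(0, y_max + 1)  # a little headroom above the tallest bar
axes[0].set_ylabel("Number of respondents")
fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, `plt.show()` would be needed at the end.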

In [None]:
# Write code to create bar charts for perceived fairness, government_consideration, policy_helpfulness, and waste_severity with correct Likert scales and labels
# Define variables, titles, and Likert scale options for each
# Calculate max count for y-axis scaling

**More Tasks**:
- Show the distribution of respondents across living districts. The district information is currently coded as 0/1 columns, so you may want to ask GitHub Copilot for the steps.
- Generate a cross table showing the percentage of respondents in each district who report seeing a food waste bin (`food_waste_behavior`).
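
One possible approach to both tasks is to collapse the 0/1 district columns into a single district label per respondent and then tabulate. The sketch below uses an invented three-district `toy` DataFrame, and it assumes each respondent has at most one district column set to 1; the "Unspecified" placeholder for rows with no district flagged is also an assumption, not something defined in the dataset.

```python
import pandas as pd

# Hypothetical mini-dataset using the same 0/1 coding style as week2.csv
toy = pd.DataFrame({
    "HongKongDistrict_CW":         [1, 0, 0, 0, 1, 0],
    "HongKongDistrict_ShamShuiPo": [0, 1, 0, 1, 0, 0],
    "HongKongDistrict_WanChai":    [0, 0, 1, 0, 0, 0],
    "food_waste_behavior": ["never_seen", "seen_not_used", "never_seen",
                            "seen_and_used", "seen_not_used", "never_seen"],
})

prefix = "HongKongDistrict_"
district_cols = [c for c in toy.columns if c.startswith(prefix)]

# For each row, take the name of the column holding the 1 and drop the prefix
district = (
    toy[district_cols]
    .idxmax(axis=1)
    .str.replace(prefix, "", regex=False)
    .str.strip()
    .rename("district")
)
# Rows with no 1 in any district column get an explicit placeholder
district = district.where(toy[district_cols].sum(axis=1) > 0, "Unspecified")

# Task 1: number of respondents per district, most common first
district_counts = district.value_counts()
print(district_counts)

# Task 2: row percentages of food_waste_behavior within each district
cross_pct = (
    pd.crosstab(district, toy["food_waste_behavior"], normalize="index") * 100
).round(1)
print(cross_pct)
```

From here, `district_counts.plot(kind="bar")` and `cross_pct.plot(kind="bar")` would draw the corresponding bar chart and grouped bar chart. With only 97 respondents in the real dataset, percentages for small districts rest on very few people and should be read with care.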

In [None]:
# Write code to show the living district distribution of respondents (the living district information is coded as 0/1 columns)
# Step 1. Count the number of columns which contain the string "HongKongDistrict" in df
# Step 2. Combine all district columns into one Series and count frequencies
    # Extract district name after the underscore
    # Sum the values (number of participants for this district)
# Step 3. Generate a bar chart to visualize the frequency table above, ordered by frequency descending

In [None]:
# Write code to generate a cross table showing the percentage of respondents in each district who report seeing a food waste bin (`food_waste_behavior`). The district information is coded as 0/1 columns, for example HongKongDistrict_CW, HongKongDistrict_KowloonCity, HongKongDistrict_Other, HongKongDistrict_North, HongKongDistrict_Southern. The values of food_waste_behavior are never_seen, seen_not_used, and seen_and_used.
# Step 1. Create a DataFrame to hold district and food waste behavior
    # Filter respondents living in this district
        # Count food waste behavior
# Step 2. Create a pivot table for better visualization
# Step 3. Visualize the pivot table using a grouped bar chart

## Section 4: Analyze Continuous Data (vs. Categorical Data)

A box-whisker plot, or box plot, is a graphical representation of the five-number summary of a dataset: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It is a standardized way of displaying the distribution of data based on these five key statistics.

A histogram is a type of bar chart that shows the distribution of continuous numerical data. It divides the data range into intervals called bins and draws one vertical bar per bin, with the bar height giving the number of data points that fall in that bin.
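
To make the two definitions concrete, the sketch below computes the numbers behind each plot without drawing anything. The `distance` values are illustrative only and stand in for `df["Distance_artificial"]`; the choice of four bins is arbitrary.

```python
import numpy as np
import pandas as pd

# Illustrative distances in metres, standing in for df["Distance_artificial"]
distance = pd.Series([192.59, 232.95, 259.83, 260.04, 338.89, 210.0, 275.5, 300.2])

# Five-number summary: the values a box plot is built from
five_num = distance.quantile([0, 0.25, 0.5, 0.75, 1.0])
five_num.index = ["min", "Q1", "median", "Q3", "max"]
print(five_num)

# Histogram: split the range into equal-width bins and count values per bin
counts, bin_edges = np.histogram(distance, bins=4)
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:7.2f} to {right:7.2f} m: {n}")
```

Note that `matplotlib` and `seaborn` box plots, by default, draw the whiskers to the most extreme points within 1.5 times the interquartile range of the box and show anything beyond that as individual outlier points. The whisker ends therefore match the minimum and maximum only when there are no such outliers.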

**Task**: Use GitHub Copilot to generate code for summary statistics, a box-whisker plot, and a histogram of the distance in metres from each respondent to the nearest recycling facility (`Distance_artificial`). Write prompts as comments and paste the generated code below.

In [None]:
# Write code to display summary statistics for Distance_artificial
# Write code to create a box-whisker plot and histogram for Distance_artificial and put them in a 1x2 grid.
# Box-whisker plot
# Histogram

## Section 5: Explore Relationships Between Variables

**Task**: Use GitHub Copilot to generate code to explore the relationship between the distance from each respondent to the nearest recycling facility (`Distance_artificial`) and the recycling effort (`recycling_effort`). Write a prompt as a comment and paste the generated code below.
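
Because `recycling_effort` takes only a few whole-number values, many points in a plain scatter plot land on the same horizontal line and hide each other. Jitter adds a small random offset so overlapping points become visible. The sketch below shows one way to do this by hand; the `toy` data, the jitter width of 0.15, and the fixed random seed are all assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical values standing in for the two columns of df
toy = pd.DataFrame({
    "Distance_artificial": [259.83, 260.04, 338.89, 192.59, 232.95, 301.10],
    "recycling_effort":    [1, 2, 1, 2, 2, 3],
})

# Jitter: small random vertical offsets so points on the same level do not overlap
rng = np.random.default_rng(seed=0)  # fixed seed so the plot is reproducible
jitter_width = 0.15  # assumed value; small enough that the levels stay distinct
y_jittered = toy["recycling_effort"] + rng.uniform(
    -jitter_width, jitter_width, size=len(toy)
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(toy["Distance_artificial"], y_jittered, alpha=0.7)
ax.set_xlabel("Distance to nearest recycling facility (m)")
ax.set_ylabel("Recycling effort (jittered)")
ax.set_yticks(sorted(toy["recycling_effort"].unique()))
ax.set_title("Distance vs. recycling effort")
fig.tight_layout()
```

The jitter changes only where points are drawn, not the underlying data, so any statistics should still be computed from the original `recycling_effort` column.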

In [None]:
# Write code to create a scatter plot with jitter of `Distance_artificial` vs. `recycling_effort`

## Section 6: Save Visualizations to a Designated Directory

**Task**: Use GitHub Copilot to generate code to check whether the `plots` directory exists and create it if it doesn’t. Then modify the earlier visualization code to save any one plot in PNG format to `plots/`.
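
A minimal sketch of this workflow is shown below. The bar chart is a placeholder and the file name `example_plot.png` is made up; in the notebook, the figure would be one of the charts created in the earlier sections. `Path.mkdir` with `exist_ok=True` creates the directory when it is missing and does nothing when it already exists, so no separate existence check is needed.

```python
from pathlib import Path

import matplotlib.pyplot as plt

plots_dir = Path("plots")
plots_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

# Placeholder figure; in the notebook this would be one of the earlier charts
fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(["A", "B", "C"], [3, 1, 2])
ax.set_title("Example plot")

out_path = plots_dir / "example_plot.png"  # hypothetical file name
fig.savefig(out_path, dpi=150, bbox_inches="tight")

# List the files now in the directory
print(sorted(p.name for p in plots_dir.iterdir()))
```

Calling `fig.savefig` on the figure object, instead of `plt.savefig`, makes it explicit which figure is written when several have been created in the same notebook.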

In [None]:
# Write code to check if plots directory exists, create it if not, and list files  

## Reflection and Policy Insights

**Task**: 
- Summarize key insights from visualizations (100-150 words) (CILO 3).
- Propose one more question that can be answered using visualization. 