# Week 2: Visualising Lifestyle and Mental Health

Welcome to your first coding challenge! In this notebook, you'll explore a dataset about lifestyle factors and depression, and create a multi-panel visualisation using an LLM coding assistant.

**In this lab, you'll complete Part 1 using this notebook, then try Part 2 using Python scripts.** See the [challenge brief (README.md)](README.md) for full details on both workflows.

**Remember the LLM Problem-Solving Loop:**

**Outer loop (your research process):** PLAN → EXECUTE → EVALUATE → DOCUMENT

**Inner loop (working with the AI):** ENGINEER → PROMPT → VERIFY → REFINE

The inner loop will probably run 2–5 times for each piece of code. That's normal — even experienced developers iterate with AI tools.

---

In [None]:
# === IMPORTS ===
# These are the libraries you'll use today.
# pandas: for loading and working with data tables
# numpy: for numerical operations (like calculating trend lines)
# matplotlib: for creating plots and figures
# seaborn: for making statistical plots look great with less code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set some defaults so our plots look nice
sns.set_theme(style="whitegrid", font_scale=1.1)
plt.rcParams["figure.dpi"] = 100

print("Libraries loaded successfully!")

In [None]:
# === LOAD THE DATA ===
# This reads the CSV file into a pandas DataFrame called 'data'.
# A DataFrame is like a spreadsheet — rows are participants, columns are variables.

data = pd.read_csv("data/fake_depression_dataset.csv")

# Show the shape (rows, columns) and first few rows
print(f"Dataset shape: {data.shape[0]} participants, {data.shape[1]} variables")
print()
data.head(10)

## Step 1: Explore the Data

Before creating any visualisations, let's understand what we're working with. Run the cells below to check for missing values and see summary statistics for each variable.
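After you run those cells, it can help to see how strongly each numeric variable moves with Depression before you choose variables for your plan. The sketch below assumes pandas 1.5 or later, which added `numeric_only=True` (this skips text columns such as Gender). `correlations_with` is a hypothetical helper name, not part of pandas. Pearson's r only measures straight-line relationships, so a curved pattern can give a small r even when the scatter plot shows a clear trend.

```python
import pandas as pd


def correlations_with(data: pd.DataFrame, target: str = "Depression") -> pd.Series:
    """Pearson correlation of every other numeric column with `target`, strongest first."""
    # numeric_only=True leaves out non-numeric columns (e.g. Gender) instead of raising an error
    corr = data.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute size so strong negative relationships rank alongside strong positive ones
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```

In the notebook you would run `correlations_with(data).round(2)` and use the top of the list as a starting point when you decide which panels are worth planning.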

In [None]:
# Check for missing values
# (If any column shows a number > 0, that column has missing data)
print("Missing values per column:")
print(data.isnull().sum())
print()

# Summary statistics for all numeric columns
# This shows the count, mean, standard deviation, min, max, and quartiles
data.describe().round(2)

## Step 2: PLAN Your Visualisation

This is the **PLAN** step of the outer loop — and it's the most important one. Before your AI writes any code, ask it to write a plan.

**Why plan first?** A written plan catches misunderstandings before code is written. It's much easier to say "actually, let's swap panel 3 for a violin plot" when you're looking at a plan than after the AI has produced 50 lines of code. Planning saves iteration cycles and produces better results.

### What to do

1. Open your AI assistant (Copilot Chat, ChatGPT, Claude, etc.)
2. Ask it to create an analysis plan — here's a prompt you can adapt:

> "Based on the dataset I've loaded (columns: Age, Gender, Height_cm, Weight_kg, TV_hrs_week, VideoGames_hrs_week, Siblings, FB_Friends, SocialMedia_hrs_week, Sleep_hrs_night, Exercise_hrs_week, Depression), write an analysis plan for a 2×2 multi-panel scatter plot figure exploring lifestyle factors and depression. Write the plan as a numbered list in markdown — what goes in each panel, what relationships we're testing, and what visual features to include. Don't write any code yet."

3. **Paste the AI's plan into the markdown cell below.** (Double-click the cell to edit it.)
4. **Review the plan.** Does the approach make sense? Would the panels tell a coherent story? Are the variable choices interesting?
5. **Revise if needed.** Go back to the AI and ask for changes — "Actually, let's use Age instead of TV hours for the fourth panel" — until you're happy with the plan.

A good plan makes everything that follows faster. You'll write better prompts and spend less time debugging because you know exactly what you're aiming for.

### Your AI-Generated Plan

*Paste your AI's analysis plan here. Double-click this cell to edit it.*

*Once you've reviewed and revised the plan, move on to Step 3.*

## Step 3: ENGINEER Your Prompt

Now that you have a reviewed plan, your prompts will be much more focused. You know exactly what each panel should show, so you can give the AI precise instructions.

This is the most important skill you'll develop in this course: writing prompts that give the AI enough context to produce useful code.

### Weak Prompt vs. Strong Prompt

Here's the same request, written two different ways:

---

**Weak prompt:**
> "Make me a graph of my data"

This will produce *something*, but probably not what you want. The AI has to guess what data you have, what type of graph you want, which variables to use, and which libraries are available. It will very likely guess wrong on at least one of these.

---

**Strong prompt:**
> "I have a pandas DataFrame called `data` with 2000 rows. It has columns including `Sleep_hrs_night` (continuous, 4–9), `Exercise_hrs_week` (continuous, 0–7), `SocialMedia_hrs_week` (continuous, 0–50), `Age` (continuous, 18–65), and `Depression` (continuous, 0–10).
>
> Using matplotlib and seaborn, create a 2×2 figure (figsize 12×10) with:
> - Top-left: scatter plot of Sleep vs Depression with a linear trend line
> - Top-right: scatter plot of Exercise vs Depression with a linear trend line
> - Bottom-left: scatter plot of Social Media vs Depression with a linear trend line
> - Bottom-right: scatter plot of Age vs Depression with a quadratic trend line
>
> Use alpha=0.3 for the scatter points, add descriptive titles to each subplot, and use `sns.despine()` to clean up the borders. Add an overall figure title."

---

The strong prompt tells the AI:
- **What your data looks like** (column names, types, ranges)
- **What libraries to use** (matplotlib and seaborn)
- **Exactly what you want** (4 specific panels, each described)
- **Visual details** (transparency, titles, styling)

A prompt like this is far more likely to produce working, good-looking code on the first try. **The time you invest in writing a good prompt saves you time debugging bad code.**
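To make the comparison concrete, here is one possible shape of the code the strong prompt asks for. It is a minimal sketch and not the only correct answer, but it lets you check whether your assistant's output covers each requirement. It assumes the column names listed in the prompt. `plot_lifestyle_panels` and `PANELS` are hypothetical names. The trend lines come from `np.polyfit`, using degree 1 for the straight lines and degree 2 for the quadratic Age panel.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.figure import Figure

# (x column, panel title, trend-line degree): 1 = straight line, 2 = quadratic
PANELS = [
    ("Sleep_hrs_night", "Sleep vs Depression", 1),
    ("Exercise_hrs_week", "Exercise vs Depression", 1),
    ("SocialMedia_hrs_week", "Social Media vs Depression", 1),
    ("Age", "Age vs Depression", 2),
]


def plot_lifestyle_panels(data: pd.DataFrame, y_col: str = "Depression") -> Figure:
    """Draw a 2x2 grid of scatter plots against y_col, each with a polynomial trend line."""
    # constrained_layout stops titles and axis labels (including the suptitle) from overlapping
    fig, axes = plt.subplots(2, 2, figsize=(12, 10), constrained_layout=True)
    for ax, (x_col, title, degree) in zip(axes.flat, PANELS):
        # Drop rows that are missing either variable, because np.polyfit cannot handle NaN
        pair = data[[x_col, y_col]].dropna()
        x = pair[x_col].to_numpy()
        y = pair[y_col].to_numpy()
        ax.scatter(x, y, alpha=0.3)
        # Least-squares polynomial fit, evaluated on a smooth grid across the x range
        coeffs = np.polyfit(x, y, deg=degree)
        x_grid = np.linspace(x.min(), x.max(), 200)
        ax.plot(x_grid, np.polyval(coeffs, x_grid), color="crimson", linewidth=2)
        ax.set_title(title)
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
    sns.despine(fig=fig)  # remove the top and right borders on every panel
    fig.suptitle("Lifestyle Factors and Depression", fontsize=16, fontweight="bold")
    return fig
```

In the notebook you would call `fig = plot_lifestyle_panels(data)`. The panel specifications live in one list, so swapping a variable (for example, TV hours for Age) only means changing one line.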

Now write your own prompt based on your plan from Step 2. Open your LLM assistant (ChatGPT, Claude, Copilot Chat, etc.) and send it.

In [None]:
# === YOUR VISUALISATION CODE ===
# Paste the code from your LLM assistant below.
# Run the cell (Shift+Enter) to see if it works.
#
# If you get an error:
#   1. Read the error message — it usually tells you what went wrong
#   2. Copy the error message and send it back to your LLM assistant
#   3. Tell the AI what you think went wrong and ask it to fix it
#   4. Paste the corrected code and try again
#
# This is the VERIFY → REFINE part of the loop. It's normal to
# go through this 2-3 times before everything works.




## Step 4: VERIFY and REFINE

Once you have a first version of your figure, work through this checklist:

- [ ] **Does it run without errors?** If not, copy the error to your LLM assistant and ask for a fix.
- [ ] **Are the axis labels correct?** Check that x and y axes show the right variable names.
- [ ] **Do the patterns make sense?** For example, you'd expect more sleep to be associated with lower depression — does your plot show that?
- [ ] **Is the figure readable?** Are the fonts big enough? Are the panels spaced well? Can you tell what each panel shows at a glance?
- [ ] **Does it look professional?** Would you be comfortable putting this on a presentation slide?

### Tips for Refining

If the figure needs work, try prompts like:
- "The axis labels are overlapping. How do I increase the spacing between subplots?"
- "I want to change the colour of the scatter points to a blue-to-red gradient based on the Depression score. How do I do this with matplotlib's colormap?"
- "Add a correlation coefficient (r value) as text in the top-right corner of each panel."
- "Make the overall figure title bold and larger."
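For the r-value request, a common approach is `ax.text` with `transform=ax.transAxes`. This places text in panel-relative coordinates (0 to 1) rather than data units, so the label lands in the same corner of every panel. Here is a minimal sketch. `annotate_r` is a hypothetical helper name.

```python
import pandas as pd
from matplotlib.axes import Axes


def annotate_r(ax: Axes, x: pd.Series, y: pd.Series) -> float:
    """Write Pearson's r for x and y in the top-right corner of ax, and return r."""
    # Series.corr defaults to Pearson's r and skips rows where either value is missing
    r = x.corr(y)
    # transform=ax.transAxes makes (0.95, 0.95) mean "95% across, 95% up" the panel,
    # regardless of each panel's data range
    ax.text(
        0.95, 0.95, f"r = {r:.2f}",
        transform=ax.transAxes, ha="right", va="top",
        bbox=dict(boxstyle="round", facecolor="white", alpha=0.8),
    )
    return r
```

In the loop that draws each panel, you could call `annotate_r(ax, data[x_col], data["Depression"])`, where `x_col` is that panel's column name.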

Each refinement is another pass through the inner loop: **ENGINEER → PROMPT → VERIFY → REFINE**.

In [None]:
# === SAVE YOUR FIGURE ===
# Once you're happy with your visualisation, uncomment the lines below
# to save it as a PNG file. You can then insert it into your HTML slide.
#
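# Tip: this assumes your visualisation code stored the figure in a variable
# called `fig` (e.g. `fig, axes = plt.subplots(2, 2, figsize=(12, 10))`).
# If your AI used a different name, change `fig` below to match. Avoid a bare
# plt.savefig() in a separate cell: Jupyter's inline backend closes the figure
# once the cell that drew it finishes, so that call may save a blank image.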

# fig.savefig("our_visualisation.png", dpi=300, bbox_inches="tight")
# print("Figure saved as 'our_visualisation.png'")

## Step 5: DOCUMENT and Reflect

Take a moment to think about what you've done and learned. Add your answers below (you can edit this markdown cell by double-clicking on it):

**What relationships did we explore, and what did we find?**

*Your answer here...*

**What was the most interesting pattern in the data?**

*Your answer here...*

**What did we learn about prompting an LLM?**
- What worked well?
- What was harder than expected?
- Did we have to refine our prompts? How many times?

*Your answer here...*

**Did having a written plan before coding help? How?**

*Your answer here...*

**What would we do differently next time?**

*Your answer here...*

## Part 2: Try the Script Workflow

You've completed Part 1 using the notebook workflow. For the remaining time, try the **script workflow** — writing Python as `.py` files and running them from the terminal.

The full instructions are in the [challenge brief (README.md)](README.md#part-2-script-workflow-45-min), but here's the short version:

1. **Have your AI create a `plan.md`** for a different visualisation of the same dataset
2. **Review and revise** the plan until you're happy
3. **Have the AI edit `starter.py`** (or create a new `.py` file) with the visualisation code
4. **Run it** from the terminal: `conda activate psyc4411-env` then `python starter.py`
5. **Check the output** and iterate

Your AI assistant can create new files, edit existing ones, and even add new sections to this notebook — use it to build up your work piece by piece. See the [README](README.md#bonus-challenges) for bonus challenges you can try in either workflow.