# DATA101 Live Coding - Notebook -> Chart -> Export

Use this notebook as a **follow-along** during live coding. It mirrors the Python WebViz slide deck (62 slides).

**You should be able to run everything top-to-bottom** (and edit small variables as we go).

## Slide map (for reference)
- Part 1 → Slides 13–14 (Demo A + Code Demo A)
- Part 2 → Slides 11–12 (Concrete D3 examples + distributions)
- Part 3 → Slide 10 (Pandas “Chart Data Engine” → chart-ready tables)
- Part 4 → Slides 15–17 (Matplotlib model + SVG export)
- Part 5 → Slide 11 (Program-level change example)
- Part 6 → Slides 18–21 (Seaborn + Code Demo C)
- Part 7 → Slides 31–32 (Interactive HTML artifact + code run)
- Part 8 → Slide 36 (Animations / Plotly)
- Checklist → Slides 34–35 (Export formats + micro-checklist)

## What you'll practice
- Build a *chart-ready table* (derive measures, aggregate intentionally)
- Make readable Matplotlib charts (titles, labels, domains)
- Export deliverables as **SVG/PNG** (not screenshots)
- (Optional) Generate a **single HTML artifact** with inline SVG + tooltip

## Local setup (optional)

If you're running locally (not on JupyterHub):

```bash
# from the repo root
python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux
# Windows (PowerShell): .venv\Scripts\Activate.ps1

python -m pip install -r requirements.txt
jupyter lab
```

Then open: `notebooks/python-webviz-live-coding.ipynb`.





## Environment setup (if needed)

If imports fail (missing `numpy`, `pandas`, `matplotlib`, or `plotly`), run the install cell once.

This notebook supports the live demos shown in slides 9–36.

Slide numbers refer to the footer of `slides-python-webviz.md` (the Python WebViz deck).



In [None]:
# Install dependencies (run once if needed)
import sys
import subprocess
from importlib.util import find_spec
from pathlib import Path

required = ["numpy", "pandas", "matplotlib", "plotly"]
missing = [p for p in required if find_spec(p) is None]

print("Python:", sys.executable)

if not missing:
    print("All required packages already available; skipping install.")
else:
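    candidates = [Path("notebooks/requirements.txt"), Path("requirements.txt")]
    req = next((p for p in candidates if p.exists()), None)
    print("Missing:", ", ".join(missing))
    if req is not None:
        print("Installing from:", req)
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", str(req)])
    else:
        # No requirements file is visible from this working directory; install by name instead.
        print("Installing by name:", ", ".join(missing))
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])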
    print("If you installed anything new, restart the kernel (Kernel → Restart).")


In [None]:
# Setup
import sys
from pathlib import Path

print("Python:", sys.executable)

try:
    import numpy as np
except ImportError as e:
    raise ImportError(
        "This notebook requires numpy. Run the install cell above (or `python -m pip install -r requirements.txt`), then restart the kernel."
    ) from e

try:
    import pandas as pd
except ImportError as e:
    raise ImportError(
        "This notebook requires pandas. Run the install cell above (or `python -m pip install -r requirements.txt`), then restart the kernel."
    ) from e

try:
    import matplotlib.pyplot as plt
except ImportError as e:
    raise ImportError(
        "This notebook requires matplotlib. Run the install cell above (or `python -m pip install -r requirements.txt`), then restart the kernel."
    ) from e

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 140)

try:
    plt.style.use("seaborn-v0_8-whitegrid")
except Exception:
    # This style name only exists in newer Matplotlib releases; keep the default style otherwise.
    pass

plt.rcParams.update(
    {
        "figure.dpi": 120,
        "axes.titlesize": 12,
        "axes.labelsize": 11,
        "font.size": 11,
    }
)

# Prefer keeping outputs inside `notebooks/outputs/` when possible.
OUT_DIR = Path("notebooks/outputs") if Path("notebooks").is_dir() else Path("outputs")
OUT_DIR.mkdir(parents=True, exist_ok=True)
print("Outputs will be written to:", OUT_DIR.resolve())

# Locate the class dataset regardless of whether you're running from repo root or `notebooks/`.
DATA_CANDIDATES = [
    Path("data-task-abstraction-dataset.csv"),
    Path("notebooks/data-task-abstraction-dataset.csv"),
]
for _p in DATA_CANDIDATES:
    if _p.exists():
        DATA_PATH = _p
        break
else:
    raise FileNotFoundError(
        "Could not find data-task-abstraction-dataset.csv. "
        "Expected it in `notebooks/` (recommended) or the current folder."
    )

print("Using dataset:", DATA_PATH.resolve())


# Part 1 - Chart-ready tables (mini example) — Slides 13–14

Before we touch the big dataset, we practice the idea on a tiny table:
- start with **counts** (`n_pass`, `n_students`)
- derive a **comparable measure** (`pass_rate`)
- compute a **change metric** over time (`delta`)
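
To preview what the next cell should produce, the same arithmetic can be traced by hand in plain Python. This is only a sketch that reuses the counts from the `mini` table; the `counts` dictionary and the loop are illustrative and not part of the notebook's pipeline. Week 1 has no earlier week to compare against, which is why the pandas `diff()` call leaves `delta` empty (NaN) on the first row of each program.

```python
# Counts copied from the `mini` table: program -> [(n_pass, n_students) for week 1, week 2]
counts = {
    "A": [(70, 100), (62, 100)],
    "B": [(90, 120), (88, 120)],
    "C": [(40, 50), (44, 50)],
}

deltas = {}
for program, weeks in counts.items():
    rates = [n_pass / n_students for n_pass, n_students in weeks]
    # Change metric: week 2 rate minus week 1 rate (week 1 itself has no delta).
    deltas[program] = rates[1] - rates[0]
    print(f"{program}: week1={rates[0]:.3f} week2={rates[1]:.3f} delta={deltas[program]:+.3f}")
```

Programs A and B drop while C improves, even though B has the most passing students in both weeks. That is the reason to compare rates rather than raw counts.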



In [None]:
mini = pd.DataFrame(
    {
        "program": ["A", "A", "B", "B", "C", "C"],
        "week": [1, 2, 1, 2, 1, 2],
        "n_pass": [70, 62, 90, 88, 40, 44],
        "n_students": [100, 100, 120, 120, 50, 50],
    }
)

mini["pass_rate"] = mini["n_pass"] / mini["n_students"]

mini = mini.sort_values(["program", "week"]).assign(
    delta=lambda d: d.groupby("program")["pass_rate"].diff()
)

mini


# Part 2 - Load the class dataset — Slides 11–12

We use the standardized CSV so everyone's results match.



In [None]:
raw = pd.read_csv(DATA_PATH)
print("Rows, cols:", raw.shape)
print("Columns:", raw.columns.tolist())
raw.head(5)


In [None]:
print("Terms:", sorted(raw["term"].unique()))
print("Weeks:", int(raw["week"].min()), "..", int(raw["week"].max()))
print("Programs:", sorted(raw["program"].unique()))


# Part 3 - Build a chart-ready table (real data) — Slide 10

Goal: one row per **term × week × program**, with a derived measure you can compare.

We aggregate counts first, then compute `pass_rate`.
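
Why sum counts before dividing? A minimal sketch with made-up numbers, assuming the raw file can hold several rows per group (the two "sections" below are hypothetical and not taken from the class CSV). Averaging per-row rates weights a group of 10 students the same as a group of 190, whereas summing counts first weights every student equally.

```python
# Hypothetical rows that fall into the same group (not from the class CSV).
sections = [
    {"n_pass": 9, "n_students": 10},     # small section, 90% pass
    {"n_pass": 95, "n_students": 190},   # large section, 50% pass
]

# Option 1: average the per-row rates (each row counts equally).
mean_of_rates = sum(s["n_pass"] / s["n_students"] for s in sections) / len(sections)

# Option 2: sum counts first, then divide (each student counts equally).
total_pass = sum(s["n_pass"] for s in sections)
total_students = sum(s["n_students"] for s in sections)
rate_of_sums = total_pass / total_students

print(f"mean of rates: {mean_of_rates:.3f}")
print(f"rate of sums:  {rate_of_sums:.3f}")
```

The two options disagree whenever group sizes differ, so the order of operations is a deliberate modeling choice rather than a detail.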



In [None]:
weekly = (
    raw.groupby(["term", "week", "program"], as_index=False)
    .agg(n_pass=("n_pass", "sum"), n_students=("n_students", "sum"))
    .sort_values(["term", "program", "week"])
)

weekly["pass_rate"] = weekly["n_pass"] / weekly["n_students"]

weekly = weekly.assign(
    delta_week=lambda d: d.groupby(["term", "program"])["pass_rate"].diff()
)

weekly.head(10)


In [None]:
# Quick sanity checks (good habits)
assert weekly["pass_rate"].between(0, 1).all(), "pass_rate should always be in [0, 1]"
print("OK: pass_rate in [0, 1] for all rows")


# Part 4 - Chart 1: Pass rate over weeks (Matplotlib) — Slides 15–17

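Edit `program_to_show` in the first cell below, then re-run that cell and the two cells after it (chart, then export).

Expected output: a line chart (with legend) plus SVG and PNG files in the output folder printed by the setup cell (`notebooks/outputs/` when run from the repo root).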



In [None]:
terms = sorted(weekly["term"].unique())
baseline_term, current_term = terms[0], terms[-1]
print("Baseline term:", baseline_term)
print("Current term:", current_term)

program_to_show = "CS"  # TODO: try DS / IS / IT


In [None]:
dfp = weekly.loc[weekly["program"] == program_to_show].copy()

fig, ax = plt.subplots(figsize=(7.6, 3.4))
for term, d in dfp.groupby("term"):
    d = d.sort_values("week")
    ax.plot(d["week"], d["pass_rate"], marker="o", linewidth=2, label=term)

ax.set_title(f"Pass rate over weeks - {program_to_show}")
ax.set_xlabel("Week")
ax.set_ylabel("Pass rate")
ax.set_ylim(0.5, 1.0)
ax.grid(True, alpha=0.25)
ax.legend(title="Term", frameon=False)
fig.tight_layout()
fig


In [None]:
# Export (deliverable mindset)
svg_path = OUT_DIR / f"pass_rate_{program_to_show}.svg"
png_path = OUT_DIR / f"pass_rate_{program_to_show}.png"

fig.savefig(svg_path, format="svg")
fig.savefig(png_path, dpi=200)

print("Wrote:", svg_path.resolve())
print("Wrote:", png_path.resolve())

# Show that SVG is text
first_line = svg_path.read_text(encoding="utf-8").splitlines()[0]
print("SVG first line:", first_line[:90] + ("..." if len(first_line) > 90 else ""))


# Part 5 - Chart 2: Program-level change between terms — Slide 11

Here we compare **baseline vs current** at the program level.

We aggregate totals per program per term, compute `pass_rate`, then compute:

`delta = pass_rate(current) - pass_rate(baseline)`



In [None]:
term_prog = (
    weekly.groupby(["term", "program"], as_index=False)
    .agg(n_pass=("n_pass", "sum"), n_students=("n_students", "sum"))
)
term_prog["pass_rate"] = term_prog["n_pass"] / term_prog["n_students"]

pivot = term_prog.pivot(index="program", columns="term", values="pass_rate")
change = (pivot[current_term] - pivot[baseline_term]).rename("delta_pass_rate")
change = change.sort_values()

change


In [None]:
colors = ["#ef4444" if v < 0 else "#22c55e" for v in change.values]

fig2, ax2 = plt.subplots(figsize=(7.6, 3.2))
ax2.barh(change.index, change.values, color=colors)
ax2.axvline(0, color="#111827", linewidth=1)

ax2.set_title("Program-level change in pass rate")
ax2.set_xlabel(f"Delta pass_rate ({current_term} - {baseline_term})")
ax2.set_ylabel("Program")
ax2.grid(True, axis="x", alpha=0.25)
fig2.tight_layout()
fig2


In [None]:
svg_path2 = OUT_DIR / f"delta_pass_rate_{current_term}_minus_{baseline_term}.svg"
png_path2 = OUT_DIR / f"delta_pass_rate_{current_term}_minus_{baseline_term}.png"

fig2.savefig(svg_path2, format="svg")
fig2.savefig(png_path2, dpi=200)

print("Wrote:", svg_path2.resolve())
print("Wrote:", png_path2.resolve())


# Part 6 (Optional) - Distribution view (small multiples) — Slides 18–21

Averages can hide risk.

We use `avg_score` to compare distribution shapes across programs (small multiples with shared axes).
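
A small illustration of why the mean alone is not enough. The two score lists and the cutoff of 60 are invented for this sketch and are not from the class dataset. Both lists have the same mean, but only one of them contains a score below the hypothetical cutoff.

```python
from statistics import mean, pstdev

# Two hypothetical sets of scores with identical means (illustrative only).
tight = [74, 75, 75, 76, 75]
spread = [50, 60, 75, 90, 100]

AT_RISK_CUTOFF = 60  # hypothetical threshold, chosen only for this example

for name, scores in [("tight", tight), ("spread", spread)]:
    at_risk = sum(score < AT_RISK_CUTOFF for score in scores)
    print(f"{name}: mean={mean(scores):.1f} sd={pstdev(scores):.1f} below_cutoff={at_risk}")
```

A histogram per program makes this kind of difference visible at a glance, which a single summary bar cannot do.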



In [None]:
term_for_dist = current_term

programs = sorted(raw["program"].unique())
cols = 2
rows = (len(programs) + cols - 1) // cols

fig3, axes = plt.subplots(rows, cols, figsize=(10, 6), sharex=True, sharey=True)
axes = np.array(axes).ravel()

# Fixed bins so panels are comparable.
s_all = raw.loc[raw["term"] == term_for_dist, "avg_score"].dropna()
bins = np.linspace(float(s_all.min()), float(s_all.max()), 20)

for ax, prog in zip(axes, programs):
    s = raw.loc[(raw["term"] == term_for_dist) & (raw["program"] == prog), "avg_score"].dropna()
    ax.hist(s, bins=bins, color="#60a5fa", alpha=0.9)
    ax.axvline(s.mean(), color="#f59e0b", linewidth=2)
    ax.set_title(prog)
    ax.grid(True, alpha=0.2)

for ax in axes[len(programs) :]:
    ax.axis("off")

fig3.suptitle(f"Distribution of avg_score by program ({term_for_dist})\nOrange line = mean", y=1.02)
fig3.tight_layout()
fig3


# Part 7 (Optional) - Single-file HTML artifact (inline SVG + tooltip) — Slides 31–32

This demonstrates the idea of a **shareable artifact**: a single `.html` you can open in a browser.

We'll generate a tiny SVG line chart directly from the data (no chart libraries needed) and add a tooltip.
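
The core of a hand-built SVG chart is a linear scale that maps data values to pixel coordinates. The sketch below shows the idea with a hypothetical helper, `make_scale`, which is not defined anywhere in the notebook; the layout numbers match the constants used in the HTML cell further down. Note the flipped range for y: SVG coordinates grow downward, so the smallest value maps to the bottom of the plot area.

```python
def make_scale(domain, pixel_range):
    """Return a function that maps a value in `domain` linearly onto `pixel_range`."""
    d0, d1 = domain
    p0, p1 = pixel_range

    def scale(value):
        if d1 == d0:  # degenerate domain: avoid division by zero
            return p0
        return p0 + (value - d0) / (d1 - d0) * (p1 - p0)

    return scale


# Layout: 360px-tall chart with 24px top padding and 52px bottom padding.
height, pad_t, pad_b = 360, 24, 52

# Flipped range: the domain minimum lands at the bottom, the maximum at the top.
y = make_scale(domain=(0.5, 1.0), pixel_range=(height - pad_b, pad_t))

print(y(0.5), y(0.75), y(1.0))
```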



In [None]:
term_for_html = current_term
d_html = (
    weekly.loc[(weekly["term"] == term_for_html) & (weekly["program"] == program_to_show), ["week", "pass_rate"]]
    .sort_values("week")
    .reset_index(drop=True)
)

if d_html.empty:
    raise ValueError("No rows found for the selected program/term.")

d_html.head(5)


In [None]:
width, height = 760, 360
pad_l, pad_r, pad_t, pad_b = 62, 18, 24, 52

xmin, xmax = float(d_html["week"].min()), float(d_html["week"].max())
ymin, ymax = 0.5, 1.0


def sx(x: float) -> float:
    if xmax == xmin:
        return pad_l
    return pad_l + (x - xmin) / (xmax - xmin) * (width - pad_l - pad_r)


def sy(y: float) -> float:
    if ymax == ymin:
        return height - pad_b
    return pad_t + (1 - (y - ymin) / (ymax - ymin)) * (height - pad_t - pad_b)


pts = [(sx(float(r.week)), sy(float(r.pass_rate))) for r in d_html.itertuples(index=False)]
path = "M " + " L ".join(f"{x:.1f},{y:.1f}" for x, y in pts)

circles = []
for r in d_html.itertuples(index=False):
    circles.append(
        f"<circle cx='{sx(float(r.week)):.1f}' cy='{sy(float(r.pass_rate)):.1f}' r='6' "
        f"data-week='{int(r.week)}' data-rate='{float(r.pass_rate):.3f}' />"
    )

svg = f'''
<svg viewBox='0 0 {width} {height}' width='{width}' height='{height}' xmlns='http://www.w3.org/2000/svg'>
  <style>
    .axis {{ stroke: #94a3b8; stroke-opacity: 0.55; }}
    .line {{ fill: none; stroke: #60a5fa; stroke-width: 3; }}
    circle {{ fill: #0ea5e9; stroke: #0b1220; stroke-width: 2; cursor: default; }}
    text {{ fill: #e5e7eb; font: 12px -apple-system, BlinkMacSystemFont, 'SF Pro Text', sans-serif; }}
  </style>

  <text x='{pad_l}' y='18' font-weight='700'>Pass rate - {program_to_show} ({term_for_html})</text>
  <line class='axis' x1='{pad_l}' y1='{height - pad_b}' x2='{width - pad_r}' y2='{height - pad_b}' />
  <line class='axis' x1='{pad_l}' y1='{pad_t}' x2='{pad_l}' y2='{height - pad_b}' />
  <path class='line' d='{path}' />
  {''.join(circles)}
</svg>
'''.strip()

html_doc = f'''<!doctype html>
<html>
<head>
<meta charset='utf-8'>
<meta name='viewport' content='width=device-width, initial-scale=1'>
<title>Pass rate - {program_to_show} ({term_for_html})</title>
<style>
  body {{ margin: 0; background: #0b1220; color: #e5e7eb;
         font-family: -apple-system, BlinkMacSystemFont, 'SF Pro Text', sans-serif; }}
  .wrap {{ padding: 16px; }}
  .tip {{ position: fixed; pointer-events: none; background: rgba(17,24,39,0.92);
          border: 1px solid rgba(148,163,184,0.35); border-radius: 10px;
          padding: 8px 10px; font-size: 12px; opacity: 0; }}
</style>
</head>
<body>
<div class='wrap'>
{svg}
</div>
<div class='tip' id='tip'></div>
<script>
  const tip = document.getElementById('tip');
  function showTip(e) {{
    const w = e.target.getAttribute('data-week');
    const r = e.target.getAttribute('data-rate');
    if (!w || !r) return;
    tip.textContent = 'Week ' + w + ' · pass_rate ' + r;
    tip.style.left = (e.clientX + 12) + 'px';
    tip.style.top = (e.clientY + 12) + 'px';
    tip.style.opacity = 1;
  }}
  function hideTip() {{ tip.style.opacity = 0; }}
  document.querySelectorAll('circle').forEach(c => {{
    c.addEventListener('mousemove', showTip);
    c.addEventListener('mouseleave', hideTip);
  }});
</script>
</body>
</html>'''

html_path = OUT_DIR / f"pass_rate_{program_to_show}_{term_for_html}.html"
html_path.write_text(html_doc, encoding="utf-8")

print("Wrote:", html_path.resolve())


# Quick checklist (for submissions / exports) — Slides 34–35

- Titles and axis labels are present (units/meaning)
- Comparisons use fair axes (shared domain when needed)
- If you compute a rate, show denominators somewhere (or justify)
- Export intentionally (`.svg` / `.png` / `.html`), don't screenshot
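
One possible way to show denominators is to put them in the label itself. The helper below, `rate_label`, is a hypothetical name used only for this sketch; the same string could feed a tooltip, a table cell, or an annotation.

```python
def rate_label(n_pass: int, n_students: int) -> str:
    """Format a pass rate together with the counts it was computed from."""
    if n_students == 0:
        return "n/a (0 students)"
    return f"{n_pass / n_students:.1%} ({n_pass}/{n_students})"


print(rate_label(44, 50))
print(rate_label(88, 120))
```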

## Your turn (1-2 min)
- Change `program_to_show`
- Change the y-domain (try `[0.7, 1.0]` and see what story it tells)
- Add one annotation in Matplotlib (baseline line, text note, or highlighted week)



# Part 8 (Optional) - Animations (Plotly) — Slide 36

Animations are just **frames + transitions**. They can be helpful when the task is literally to **watch change over time**,
but they can hurt precise comparison. Keep the axis fixed and also provide a static view when needed.
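
"Keep the axis fixed" means choosing one y-range from all frames together, not letting each frame rescale. A rough sketch with invented per-week values (the `frames` dictionary is hypothetical); a range computed this way could be passed to Plotly Express as `range_y`.

```python
import math

# Hypothetical pass rates per animation frame (week -> values across programs).
frames = {
    1: [0.71, 0.80, 0.76],
    2: [0.69, 0.84, 0.78],
    3: [0.73, 0.88, 0.74],
}

# One global domain for every frame, rounded outward to 0.05 steps.
all_values = [v for values in frames.values() for v in values]
step = 0.05
y_lo = math.floor(min(all_values) / step) * step
y_hi = math.ceil(max(all_values) / step) * step

print(f"fixed y-range: [{y_lo:.2f}, {y_hi:.2f}]")
```

With a shared range, a bar that grows between frames really did grow; with per-frame rescaling, the same motion could come from the axis alone.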

This mirrors the “Bonus: Animations” slide and uses Plotly’s built-in animation controls.

**Docs:** https://plotly.com/python/animations/

If the animation controls don't show in your notebook UI, open the exported HTML file from `notebooks/outputs/`.


In [None]:
# Plotly animation: pass_rate by program over weeks (real data)
try:
    import plotly.express as px
except ImportError as e:
    raise ImportError(
        "This section requires plotly. Run the install cell above (or `python -m pip install -r requirements.txt`), then restart the kernel."
    ) from e

# Reuse the chart-ready table from Part 3 (`weekly`).
term_for_anim = current_term
df_anim = weekly.loc[weekly["term"] == term_for_anim, ["week", "program", "pass_rate"]].copy()

# Keep categories stable across frames.
program_order = sorted(df_anim["program"].unique())
df_anim["program"] = pd.Categorical(df_anim["program"], categories=program_order, ordered=True)
df_anim = df_anim.sort_values(["week", "program"])

# Use separate names (`fig_anim`, `anim_html_path`) so Part 4's Matplotlib `fig`
# and Part 7's `html_path` stay intact if you re-run earlier cells.
fig_anim = px.bar(
    df_anim,
    x="program",
    y="pass_rate",
    color="program",
    animation_frame="week",
    range_y=[0.6, 0.95],
    title=f"Pass rate by program (animated) - {term_for_anim}",
    labels={"pass_rate": "Pass rate"},
)

fig_anim.update_layout(
    template="plotly_dark",
    width=860,
    height=420,
    margin=dict(l=50, r=20, t=70, b=45),
    showlegend=False,
)
fig_anim.update_traces(opacity=0.92)

# Speed up playback a bit (Plotly Express creates play/pause controls by default).
if fig_anim.layout.updatemenus:
    play_btn = fig_anim.layout.updatemenus[0].buttons[0]
    if play_btn.args and len(play_btn.args) > 1 and isinstance(play_btn.args[1], dict):
        play_btn.args[1].setdefault("frame", {}).update({"duration": 450, "redraw": True})
        play_btn.args[1].setdefault("transition", {}).update({"duration": 250})

# Export a single shareable HTML file (works outside the notebook).
anim_html_path = OUT_DIR / f"pass_rate_by_program_{term_for_anim}_animated.html"
fig_anim.write_html(anim_html_path, include_plotlyjs=True, full_html=True)
print("Wrote:", anim_html_path.resolve())

fig_anim
