# DATA101 Take-Home Activity

## Interactive Charts + Data Apps (Plotly + Dash)

Based on the **Interactive Charts and Data Apps** deck (`slides-python-interactive-dash.md`).

**Instructor:** Marc Reyes (marc.reyes@dlsu.edu.ph)

## Group Information (edit this table)

| Group # | Member Name | Role (pick one) |
|---:|---|---|
|  |  | Facilitator |
|  |  | Data + transforms |
|  |  | Interactions + UI |
|  |  | QA + write-up |
|  |  | (optional) |

### Roles (recommended)
- Facilitator: keeps the group moving and merges edits.
- Data + transforms (data engineer): owns transforms + validation.
- Interactions + UI (app engineer): owns interactivity + callbacks.
- QA + write-up (QA editor): checks defaults, fixed scales, and reset behavior, and drafts the write-up.


## Learning Objectives

By the end, your group should be able to:

- Name a **task** that justifies interactivity.
- Build a **chart-ready table** (right grain + derived measures + validation).
- Create a Plotly figure that is **readable by default** and **precise on hover**.
- Implement a **linked-view** interaction (one selection updates another view).
- Explain the Dash mental model: **Inputs -> callback -> Outputs**.

## What To Submit

Submit the completed notebook.

Your notebook should include:

1) Two task statements (monitoring + drill-down), each with a clear, verifiable output
2) A chart-ready table + at least 2 validation checks
3) An interactive Plotly chart (intentional labels, stable scales, formatted tooltip)
4) A linked-view design where one filtered dataset drives multiple outputs
5) A short reflection (8 to 12 sentences) about design decisions and tradeoffs

If your LMS allows multiple files, also submit the exported HTML artifact from Part 6.
If not, include a screenshot of the HTML opened in a browser.

## Rubric (30 points)

| Criteria | Points |
|---|---:|
| Interaction brief (tasks + chosen patterns + reset + stable comparisons) | 6 |
| Chart-ready data (correct grain; derived measures; validation) | 6 |
| Plotly figure quality (default readability; labels; hovertemplate; stable scales) | 8 |
| Linked views (one filter -> many outputs; selection state + clear reset) | 6 |
| Reflection + evidence (screenshots + concrete rules you followed) | 4 |


## Scenario (Domain Question)

An academic support office asks:

> "Where are students struggling this term, and which weeks should we investigate first?"

You will build an interactive view that supports two decisions:

- **Monitoring:** detect when a program's pass rate is worse than usual.
- **Drill-down:** select a week range and inspect what is happening inside that window.
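
The monitoring decision needs a working definition of "worse than usual". The sketch below shows one possible definition, not the required one: it assumes "usual" means the program's own median weekly pass rate and flags weeks that fall more than a hypothetical 10-point margin below it. The toy table, the `flag_weeks` helper, and the `margin` value are all illustrative and are not part of the class dataset.

```python
import pandas as pd

# Toy weekly table (hypothetical values) at a week x program grain.
toy = pd.DataFrame(
    {
        "program": ["CS"] * 5 + ["IT"] * 5,
        "week": [1, 2, 3, 4, 5] * 2,
        "pass_rate": [0.80, 0.82, 0.60, 0.81, 0.79, 0.70, 0.72, 0.71, 0.69, 0.50],
    }
)


def flag_weeks(weekly, margin=0.10):
    # Baseline = each program's median pass rate (an assumed definition of "usual").
    out = weekly.copy()
    out["baseline"] = out.groupby("program")["pass_rate"].transform("median")
    # A week is flagged when it sits more than `margin` below the baseline.
    out["flagged"] = out["pass_rate"] < out["baseline"] - margin
    return out


flagged = flag_weeks(toy).query("flagged")[["program", "week", "pass_rate", "baseline"]]
print(flagged)
```

With these toy values, week 3 for CS and week 5 for IT are flagged. Whatever rule your group picks, state it in the interaction brief so the output of the monitoring task is verifiable.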

### Non-negotiables (professional rules)

- **Default view must stand alone.** If your chart only "works" on hover, it is fragile.
- **Keep comparisons stable.** Do not change scales between states.
- **Make reset obvious.** Assume the viewer will get into a weird state and need to recover.



## Warm-up: Use the Interactive Lab (5 minutes)

Open the lab and try each interaction pattern:

- Interactive lab: https://data101-s15.feb10.dlsu-demos.marcr.io/demo/interactive/

Fill this in as a group:

| Pattern | What you did | What changed | What makes it predictable? |
|---|---|---|---|
| Tooltips |  |  |  |
| Zoom/pan |  |  |  |
| Legend filtering |  |  |  |
| Selection/brush |  |  |  |
| Linked views |  |  |  |

Then answer:
- What is one thing the lab does that makes the default view readable?
- What is one thing the lab does to make reset predictable?



# Part 1 - Interaction Brief (Task -> Interaction -> State)

Your goal is not "add features". Your goal is: **reduce viewer work**.

Write 2 tasks, then choose interaction patterns that support them.

Use these patterns (from the slide deck):
- Tooltips
- Zoom / pan
- Legend filtering
- Selection / brush (week range)
- Linked views

Professional non-negotiables:
- **Default view must stand alone** (readable without interaction)
- **Keep comparisons stable** (fixed scales between states)
- **Clear reset** (no mystery states)
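
A minimal sketch of how "clear reset" and "no mystery states" can be treated as data rather than as UI details. It assumes a hypothetical full week range of 1 to 14 and a hypothetical default program; `ViewState`, `FULL_WEEKS`, `DEFAULT_PROGRAM`, and `reset` are illustrative names, not part of the deck or of Dash.

```python
from dataclasses import dataclass

# Hypothetical defaults; replace with your dataset's real week range and default program.
FULL_WEEKS = (1, 14)
DEFAULT_PROGRAM = "CS"


@dataclass(frozen=True)
class ViewState:
    program: str = DEFAULT_PROGRAM
    week_range: tuple = FULL_WEEKS

    def is_default(self):
        # The default state is simply "a ViewState built with no arguments".
        return self == ViewState()

    def label(self, n_rows):
        # Text shown next to the charts so the current selection is always visible.
        lo, hi = self.week_range
        suffix = "" if self.is_default() else " (filtered; reset to see all weeks)"
        return f"program={self.program} | weeks {lo}-{hi} | rows={n_rows}{suffix}"


def reset():
    # Reset means "return the default state", so it is always predictable.
    return ViewState()


state = ViewState(program="IT", week_range=(5, 13))
print(state.label(n_rows=36))
print(reset().label(n_rows=112))
```

Writing the default state down explicitly makes the "Reset behavior" and "Selection state visibility" answers in your brief easy to check against the finished app.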


## Your Interaction Brief (edit this cell)

**Domain question (1 sentence):**
- ...

**Task 1 (Monitoring):**
- What should the viewer detect or compare?
- What is the output (flagged weeks, ranked list, etc.)?

**Task 2 (Drill-down):**
- What should the viewer investigate after they see a signal?
- What is the output (filtered window + distribution + KPIs, etc.)?

**Chosen interaction patterns (circle or bold):**
- Tooltips / Zoom-pan / Legend filtering / Week-range selection / Linked views

**Default view (no interaction):**
- What does the viewer learn immediately?

**Stable comparisons:**
- Which axes must be fixed across states?

**Reset behavior:**
- How does the viewer return to the default state?

**Selection state visibility:**
- Where will you show the selected weeks + row count?


# Part 2 - Chart-Ready Data Checklist (Grain + Measures)

From the slide deck: **"Your data shape controls your workload."**

Before you chart, specify the grain you will plot and the measures you will compute.

Keep it practical:
- What is the grain of your trend view? (term x week x program)
- What is the grain of your drill-down view? (section-week rows inside a selected window)
- Which measures must be derived? (pass_rate)
- What validations prove your table is sane?


## Your Chart-Ready Data Spec (edit this cell)

**Unit of analysis (one row) in the raw CSV:**
- ...

**Trend table grain (what one row represents):**
- ...

**Derived measures (with formulas):**
- pass_rate = ...

**Required aggregations:**
- ...

**Sorting / ordering choices (what order supports the question?):**
- ...

**Validation checks (at least 2):**
- Example: pass_rate in [0, 1]
- Example: expected week range exists
- Example: no duplicate keys at the chosen grain
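
The example checks above can be sketched as follows. This is an illustration on a toy table with hypothetical values, assuming the trend grain is `term` x `week` x `program`; the `expected_weeks` range is made up for the toy data and should come from your real dataset.

```python
import pandas as pd

# Toy table at the term x week x program grain (hypothetical values).
toy = pd.DataFrame(
    {
        "term": ["T1", "T1", "T1", "T1"],
        "week": [1, 1, 2, 2],
        "program": ["CS", "IT", "CS", "IT"],
        "n_pass": [40, 30, 38, 33],
        "n_students": [50, 40, 50, 40],
    }
)

keys = ["term", "week", "program"]

# Check 1: no duplicate keys at the chosen grain.
n_dupes = int(toy.duplicated(subset=keys).sum())
assert n_dupes == 0, f"{n_dupes} duplicate rows at grain {keys}"

# Check 2: every program has every expected week (no silent gaps).
expected_weeks = set(range(1, 3))  # assumed range for the toy data
weeks_by_program = toy.groupby("program")["week"].apply(set)
missing = {
    program: sorted(expected_weeks - weeks)
    for program, weeks in weeks_by_program.items()
    if expected_weeks - weeks
}
assert not missing, f"Missing weeks: {missing}"

print("OK: grain is unique and week coverage is complete")
```

A duplicate-key check is worth having because a line chart drawn from a table with repeated keys can show zigzags that look like real signal.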


# Part 3 - Environment Setup

If imports fail (missing packages), run the install cell once.

- Required: numpy, pandas, plotly
- Optional (for Part 8): dash



In [None]:
# Install dependencies (run once if needed)
import sys
import subprocess
from importlib.util import find_spec
from pathlib import Path

required = ["numpy", "pandas", "plotly"]
optional = ["dash"]

missing_required = [p for p in required if find_spec(p) is None]
missing_optional = [p for p in optional if find_spec(p) is None]

print("Python:", sys.executable)

# Install required packages from a requirements file.
# Use `notebooks/requirements.txt` when launched from the repo root;
# otherwise fall back to `requirements.txt` (launched from inside `notebooks/`).
if missing_required:
    req = "notebooks/requirements.txt" if Path("notebooks/requirements.txt").exists() else "requirements.txt"

    print("Missing required:", ", ".join(missing_required))
    print("Installing from:", req)
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", req])
    print("If you installed anything new, restart the kernel (Kernel -> Restart).")
else:
    print("All required packages already available; skipping install.")

# Dash is only needed if you run Part 8.
if missing_optional:
    print("Optional (for Dash app) missing:", ", ".join(missing_optional))
    print("To install Dash:")
    print("  python -m pip install dash")


In [None]:
# Setup
from pathlib import Path

try:
    import numpy as np
except ImportError as e:
    raise ImportError("This activity requires numpy. Run the install cell above, then restart the kernel.") from e

try:
    import pandas as pd
except ImportError as e:
    raise ImportError("This activity requires pandas. Run the install cell above, then restart the kernel.") from e

try:
    import plotly.express as px
except ImportError as e:
    raise ImportError("This activity requires plotly. Run the install cell above, then restart the kernel.") from e

pd.set_option("display.max_columns", 60)
pd.set_option("display.width", 140)

# Prefer keeping outputs inside `notebooks/outputs/` when possible.
OUT_DIR = Path("outputs")
if Path("notebooks").is_dir():
    OUT_DIR = Path("notebooks/outputs")
OUT_DIR.mkdir(parents=True, exist_ok=True)
print("Outputs will be written to:", OUT_DIR.resolve())


# Part 4 - Load the Dataset

You will use the standardized **DATA101 class dataset** (CSV): `data-task-abstraction-dataset.csv`.

**Unit of analysis (one row):** one section-week summary for one term.

Key columns:
- `term` (categorical)
- `week` (ordered)
- `program` (categorical)
- `section_id` (ID)
- `n_students` (count)
- `n_pass` (count)
- `avg_score` (0-100)
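
The column descriptions above imply a few row-level rules: counts should be positive, `n_pass` should not exceed `n_students`, and `avg_score` should stay within 0 to 100. A minimal sketch of checking them, on a toy table with hypothetical values where the last row is deliberately broken:

```python
import pandas as pd

# Toy section-week rows (hypothetical values); section S3 has more passes than students.
toy = pd.DataFrame(
    {
        "section_id": ["S1", "S2", "S3"],
        "n_students": [40, 35, 30],
        "n_pass": [32, 35, 31],
        "avg_score": [78.5, 91.0, 66.2],
    }
)

# One boolean column per rule; True marks a row that breaks the rule.
problems = pd.DataFrame(
    {
        "no_students": toy["n_students"] <= 0,
        "more_pass_than_students": toy["n_pass"] > toy["n_students"],
        "score_out_of_range": ~toy["avg_score"].between(0, 100),
    }
)

bad_rows = toy[problems.any(axis=1)]
print(problems.sum())
print(bad_rows)
```

Listing the offending rows, instead of only asserting, makes it easier to decide whether to fix, drop, or report them.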


In [None]:
# Locate the dataset regardless of where the notebook is launched from.
candidate_paths = [
    Path("data-task-abstraction-dataset.csv"),
    Path("notebooks/data-task-abstraction-dataset.csv"),
]

dataset_path = next((p for p in candidate_paths if p.exists()), None)
if dataset_path is None:
    raise FileNotFoundError(
        "Could not find `data-task-abstraction-dataset.csv`. "
        "Expected it next to this notebook or in `notebooks/` in the repo root."
    )

raw = pd.read_csv(dataset_path)
print("Loaded:", dataset_path)
print("Rows, cols:", raw.shape)
raw.head(5)


In [None]:
# Quick dataset checks (good habits)
required_cols = {"term", "week", "program", "section_id", "n_students", "n_pass", "avg_score"}
missing_cols = required_cols - set(raw.columns)
assert not missing_cols, f"Missing columns: {sorted(missing_cols)}"

print("Terms:", sorted(raw["term"].unique()))
print("Weeks:", int(raw["week"].min()), "..", int(raw["week"].max()))
print("Programs:", sorted(raw["program"].unique()))

# Derived measure at the row level
raw = raw.copy()
raw["pass_rate"] = raw["n_pass"] / raw["n_students"]

assert raw["pass_rate"].between(0, 1).all(), "pass_rate should always be in [0, 1]"
print("OK: pass_rate in [0, 1]")


# Part 5 - Build Chart-Ready Tables

For trend charts, we want one row per:

- `term` x `week` x `program`

We will aggregate counts, then compute `pass_rate`.
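
Why aggregate counts first instead of averaging the row-level `pass_rate`? The two give different answers when sections differ in size. A small worked example with two hypothetical sections:

```python
import pandas as pd

# Two hypothetical sections in the same term, week, and program.
toy = pd.DataFrame(
    {
        "section_id": ["A", "B"],
        "n_pass": [9, 50],
        "n_students": [10, 100],
    }
)
toy["pass_rate"] = toy["n_pass"] / toy["n_students"]  # 0.90 and 0.50

# Mean of rates: the 10-student section counts as much as the 100-student one.
mean_of_rates = toy["pass_rate"].mean()

# Rate of sums: every student counts once.
rate_of_sums = toy["n_pass"].sum() / toy["n_students"].sum()

print(f"Mean of section rates: {mean_of_rates:.1%}")
print(f"sum(n_pass) / sum(n_students): {rate_of_sums:.1%}")
```

Here the mean of section rates is 70.0%, while the pooled rate is 53.6% (59 of 110 students). The pooled version answers "what share of students passed", which is the reading most viewers will assume for a program-level pass rate.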



In [None]:
# TODO: Build a weekly program table (one row per term x week x program)

weekly = (
    raw.groupby(["term", "week", "program"], as_index=False)
    .agg(
        n_pass=("n_pass", "sum"),
        n_students=("n_students", "sum"),
    )
    .sort_values(["term", "program", "week"])
)

# Derived measure at the aggregated grain
weekly["pass_rate"] = weekly["n_pass"] / weekly["n_students"]

weekly.head(8)


In [None]:
# Validation checks (add at least 2)

# 1) pass_rate bounds at the aggregated grain
assert weekly["pass_rate"].between(0, 1).all()

# 2) expected week range
assert int(weekly["week"].min()) == int(raw["week"].min())
assert int(weekly["week"].max()) == int(raw["week"].max())

print("OK: weekly table looks sane")


# Part 6 - Interactive Plotly Figure (Tooltips + Zoom/Pan + Legend Filtering)

Goal: a line chart that is readable by default and precise on hover.

Requirements:
- Title and axis labels are complete.
- Y axis uses a stable range (0 to 1) and percent formatting.
- Tooltip is formatted (percent, not raw decimals) and does not show extra noise.

Try these interactions:
- Hover a point for exact values.
- Drag to zoom; double-click to reset.
- Click legend items to isolate programs.
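
The percent formatting in the requirements above relies on d3-format specifiers such as `.0%` (axis ticks) and `.1%` (hover). For these simple cases, Python's own format spec can be used to preview the resulting text; treat the equivalence as an assumption that holds for basic percent specs, not for every d3-format feature.

```python
# Preview percent formatting with Python's format spec.
# Assumption: for simple specs like ".0%" and ".1%", d3-format (used by Plotly)
# produces the same text as Python's format().
rates = [0.847, 0.5, 1.0]

axis_labels = [format(r, ".0%") for r in rates]   # coarse labels for the axis
hover_labels = [format(r, ".1%") for r in rates]  # one decimal for the tooltip

print(axis_labels)
print(hover_labels)
```

Whole percents keep the axis quiet, while one decimal in the tooltip gives the precision a viewer wants when they hover.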



In [None]:
# TODO: Build the figure

# Choose a term to plot (default: the most recent term in the dataset)
terms = sorted(raw["term"].unique())
current_term = terms[-1]
print("Current term:", current_term)

view = weekly.query("term == @current_term")

fig = px.line(
    view,
    x="week",
    y="pass_rate",
    color="program",
    markers=True,
    title=f"Pass rate by week ({current_term})",
)

fig.update_layout(
    template="plotly_white",
    legend_title_text="Program",
    hovermode="x unified",
    margin=dict(l=60, r=20, t=60, b=50),
)

fig.update_yaxes(range=[0, 1], tickformat=".0%", title_text="Pass rate")
fig.update_xaxes(title_text="Week")

fig.update_traces(
    hovertemplate=(
        "<b>%{fullData.name}</b>" +
        "<br>Week %{x}" +
        "<br>Pass rate %{y:.1%}" +
        "<extra></extra>"
    )
)

fig


## Export an Interactive HTML Artifact

Export your Plotly figure as a single HTML file. This is the professional "ship" move.

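- Write it to `OUT_DIR` from the setup cell (`notebooks/outputs/` when launched from the repo root, `outputs/` when launched from inside `notebooks/`)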
- Open it in a browser
- Include a screenshot in the notebook (or submit the HTML file if allowed)



In [None]:
# Export (HTML)
out_html = OUT_DIR / "pass_rate_by_week.html"
fig.write_html(out_html, include_plotlyjs="cdn")
print("Wrote:", out_html.resolve())


## Stretch Goal (Optional) - Animation (Transitions Between States)

Animation is only defensible when frames are comparable.

Rules:
- Fixed axis range (no per-frame rescaling).
- Provide a static fallback chart for precision (your trend chart already does this).

Task:
- Build a Plotly animation where each frame is a week.
- Use the aggregated `weekly` table for the current term.


In [None]:
# Optional: Plotly animation (fixed scale)

terms = sorted(raw["term"].unique())
current_term = terms[-1]

anim = weekly.query("term == @current_term").copy()

program_order = sorted(anim["program"].unique())

fig_anim = px.scatter(
    anim,
    x="pass_rate",
    y="program",
    animation_frame="week",
    animation_group="program",
    range_x=[0, 1],
    category_orders={"program": program_order},
    title=f"Pass rate by program (animated by week, {current_term})",
)

fig_anim.update_layout(template="plotly_white")
fig_anim.update_xaxes(tickformat=".0%", title_text="Pass rate")
fig_anim.update_yaxes(title_text="Program")
fig_anim.update_traces(
    marker=dict(size=12),
    hovertemplate=(
        "Program %{y}<br>Pass rate %{x:.1%}<extra></extra>"
    ),
)

fig_anim


# Part 7 - Linked Views (Overview -> Selection -> Detail)

Goal: one selection updates another view.

Your job:
- Use a **week range** selection (e.g., [5, 13])
- Use a **program** selection (e.g., CS)
- Update BOTH:
  - a trend view (pass rate over week inside the window)
  - a detail view (distribution of avg_score inside the window)

Professional rules:
- Show the selection state (selected weeks + row count).
- Make it easy to clear (full range is a reasonable reset).
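
A selection can also be empty, for example a week window with no rows for the chosen program. In that case a pooled pass rate divides zero by zero. The sketch below shows one way to handle it; `safe_pass_rate` and `format_rate` are hypothetical helpers, and the toy rows are made up.

```python
import pandas as pd


def safe_pass_rate(view):
    # Return the pooled pass rate, or None when the selection has no students.
    total = int(view["n_students"].sum())
    if total == 0:
        return None
    return float(view["n_pass"].sum() / total)


def format_rate(rate):
    # Show a readable placeholder instead of "nan%" for an empty selection.
    return "n/a (empty selection)" if rate is None else f"{rate:.1%}"


# Toy rows (hypothetical values).
toy = pd.DataFrame({"week": [5, 6], "n_pass": [30, 45], "n_students": [40, 50]})

in_window = format_rate(safe_pass_rate(toy))
empty_window = format_rate(safe_pass_rate(toy.query("week > 99")))

print(in_window)
print(empty_window)
```

Showing an explicit "empty selection" message alongside the row count tells the viewer that the state is valid but empty, and that reset is the way out.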



In [None]:
# TODO: Implement the linked-view functions

def filter_rows(df, *, term, program, week_range):
    # Return a filtered table for one term + program + week window.
    lo, hi = week_range
    view = df.query(
        "term == @term and program == @program and @lo <= week <= @hi"
    ).copy()
    return view


def kpis(view):
    # Return simple summary stats for the selection window.
    # TODO: add at least 1 more KPI that supports your task.
    return {
        "rows": int(view.shape[0]),
        "sections": int(view["section_id"].nunique()),
        "students": int(view["n_students"].sum()),
        "pass_rate": float(view["n_pass"].sum() / view["n_students"].sum()),
        "avg_score": float(view["avg_score"].mean()),
    }


def make_trend(view):
    # Trend chart for the selected program + weeks.
    # Aggregate inside the selection window so the grain matches the question.
    trend = (
        view.groupby(["week"], as_index=False)
        .agg(n_pass=("n_pass", "sum"), n_students=("n_students", "sum"))
        .sort_values(["week"])
    )
    trend["pass_rate"] = trend["n_pass"] / trend["n_students"]

    fig = px.line(
        trend,
        x="week",
        y="pass_rate",
        markers=True,
        title="Pass rate inside selected window",
    )
    fig.update_layout(template="plotly_white", hovermode="x unified")
    fig.update_yaxes(range=[0, 1], tickformat=".0%", title_text="Pass rate")
    fig.update_xaxes(title_text="Week")
    fig.update_traces(
        hovertemplate=(
            "Week %{x}<br>Pass rate %{y:.1%}<extra></extra>"
        )
    )
    return fig


def make_distribution(view):
    # Detail view: distribution of avg_score for the selected window.
    fig = px.histogram(
        view,
        x="avg_score",
        nbins=18,
        title="Avg score distribution (selected window)",
    )
    fig.update_layout(template="plotly_white")
    fig.update_xaxes(range=[0, 100], title_text="Average score")
    fig.update_yaxes(title_text="Count (section-weeks)")
    return fig


In [None]:
# Try it: pick a program + week window

terms = sorted(raw["term"].unique())
current_term = terms[-1]

program = "CS"  # TODO: try DS / IS / IT
week_range = [5, 13]

view = filter_rows(raw, term=current_term, program=program, week_range=week_range)
summary = kpis(view)

print(f"Selection: term={current_term}, program={program}, weeks={week_range}")
print("Rows:", summary["rows"], "Sections:", summary["sections"], "Students:", summary["students"])
print("Pass rate:", f"{summary['pass_rate']:.1%}", "Avg score:", f"{summary['avg_score']:.1f}")

fig_trend = make_trend(view)
fig_dist = make_distribution(view)

fig_trend


In [None]:
# Distribution figure
fig_dist


# Part 8 - Dash App (Wrap the Same Logic)

Dash lets you turn the linked-view pattern into a real app.

You will build:
- Inputs: Dropdown (program), RangeSlider (weeks)
- Outputs: Trend graph, distribution graph, a small text KPI panel

Important: callbacks should behave like pure functions of state: the same inputs always produce the same outputs, with no hidden side effects.

If you cannot run a Dash server in your current environment, you can still earn full points by:
- writing the layout and callback code below
- showing that your pure functions in Part 7 work

If you CAN run Dash locally:
- `python -m pip install dash`
- uncomment the last lines and run the app
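
The "pure function" idea can be checked without a Dash server. The sketch below is a plain-Python illustration with hypothetical in-memory rows: `compute_outputs` stands in for the body of a callback, and the check at the end confirms that repeating a call with the same inputs gives the same result.

```python
# Hypothetical in-memory rows standing in for the class dataset.
ROWS = [
    {"program": "CS", "week": 5, "n_pass": 30, "n_students": 40},
    {"program": "CS", "week": 6, "n_pass": 45, "n_students": 50},
    {"program": "IT", "week": 5, "n_pass": 20, "n_students": 40},
]


def compute_outputs(program, week_range):
    # Pure: the result depends only on the arguments and the read-only ROWS.
    lo, hi = week_range
    picked = [r for r in ROWS if r["program"] == program and lo <= r["week"] <= hi]
    students = sum(r["n_students"] for r in picked)
    passed = sum(r["n_pass"] for r in picked)
    return {
        "rows": len(picked),
        "pass_rate": passed / students if students else None,
    }


first = compute_outputs("CS", [5, 13])
second = compute_outputs("CS", [5, 13])
print(first)
print("Same inputs, same outputs:", first == second)
```

Keeping the logic in functions like this, and letting the callback only call them, is what makes it possible to earn credit for Part 8 even when the server cannot run.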



In [None]:
# TODO: Dash skeleton (optional to run)

# NOTE: Depending on your Dash version, running a server from a notebook may block the kernel until you interrupt it.
# If you are using JupyterHub, you may not be able to access localhost ports.

try:
    from dash import Dash, html, dcc, Input, Output, callback
except ImportError as e:
    raise ImportError(
        "Dash is not installed. Install with: python -m pip install dash"
    ) from e

terms = sorted(raw["term"].unique())
current_term = terms[-1]

app = Dash(__name__)

app.layout = html.Div(
    [
        html.H1("DATA101 Mini Dashboard"),
        html.Div(
            [
                html.Label("Program"),
                dcc.Dropdown(
                    options=[{"label": p, "value": p} for p in sorted(raw["program"].unique())],
                    value="CS",
                    id="program",
                    clearable=False,
                ),
            ],
            style={"maxWidth": "320px"},
        ),
        html.Div(
            [
                html.Label("Week range"),
                dcc.RangeSlider(
                    min=int(raw["week"].min()),
                    max=int(raw["week"].max()),
                    step=1,
                    value=[5, 13],
                    marks={w: str(w) for w in range(int(raw["week"].min()), int(raw["week"].max()) + 1, 2)},
                    id="week_range",
                    tooltip={"placement": "bottom", "always_visible": False},
                ),
            ],
            style={"marginTop": "12px"},
        ),
        html.Div(
            id="kpis",
            style={
                "marginTop": "12px",
                "fontFamily": "ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace",
            },
        ),
        dcc.Graph(id="trend", style={"marginTop": "12px"}),
        dcc.Graph(id="dist"),
    ],
    style={"maxWidth": "1100px", "margin": "24px auto", "padding": "0 16px"},
)


@callback(
    Output("trend", "figure"),
    Output("dist", "figure"),
    Output("kpis", "children"),
    Input("program", "value"),
    Input("week_range", "value"),
)
def update(program, week_range):
    view = filter_rows(raw, term=current_term, program=program, week_range=week_range)
    s = kpis(view)

    fig_trend = make_trend(view)
    fig_dist = make_distribution(view)

    kpi_text = (
        f"term={current_term} | program={program} | weeks={week_range} | "
        f"rows={s['rows']} | sections={s['sections']} | students={s['students']} | "
        f"pass_rate={s['pass_rate']:.1%} | avg_score={s['avg_score']:.1f}"
    )

    return fig_trend, fig_dist, kpi_text


# Uncomment to run locally
# if __name__ == "__main__":
#     app.run(debug=True)


# Part 9 - Reflection (write as a group)

Answer in 8 to 12 sentences total:

- What exact question is your app answering?
- Which interaction patterns did you implement (tooltips / zoom/pan / legend filtering / selection / linked views)?
- What did you do to keep the default view readable?
- What did you do to keep comparisons stable?
- What is your reset behavior?
- What is one pitfall you avoided (and how)?

Include at least 2 screenshots:
- one from the interactive lab
- one from your exported HTML or Dash output



## References (optional)

- Plotly Python docs: https://plotly.com/python/
- Dash docs: https://dash.plotly.com/

