## Source Control in VS Code

You don't have to type Git commands in the terminal: VS Code's Source Control view can stage, commit, push, and pull for you. See **Source Control in VS Code**: https://code.visualstudio.com/docs/sourcecontrol/overview

**You will work in pairs**.

## Learning Objectives {.smaller}

By the end of today, you can:

- Explain Git’s mental model: **working tree → staging → commits**
- Use core commands:
  - `status`, `add`, `commit`, `log`, `diff`
- Create a GitHub repo and:
  - add `origin`, `push`, `pull`
- Write a README that lets anyone run your project
- Submit your Week 1 project by **Thu 18 Dec 2025, 11:59pm**
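To make the mental model concrete, here is a minimal sketch that runs the five core commands in a throwaway folder (created with `mktemp -d`), so it cannot touch your project. The file name `notes.txt` and the demo name and email are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo in a temp folder
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"      # local to this demo repo only
git config user.email "demo@example.com"

echo "hello" > notes.txt
git status --short      # "?? notes.txt": in the working tree, not staged

git add notes.txt
git status --short      # "A  notes.txt": staged for the next commit

git commit -q -m "Add notes"
git log --oneline       # one commit in history

echo "world" >> notes.txt
git diff                # working tree vs. staging area: shows "+world"
```

In `git status --short`, the left column describes the staging area and the right column the working tree, which is the same three-stage model in two letters.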


## README: your repo’s “front door”

A good README answers:

- What is this?
- What can it do?
- How do I install dependencies?
- How do I run it?
- What does output look like?

The template below is a good starting point.

**Edit it, then copy the result into your `README.md`.**


````markdown
# CSV Profiler

Generate a profiling report for a CSV file.

## Features

- The CLI writes:
  - `outputs/report.json`
  - `outputs/report.md`
- The Streamlit app can:
  - preview the report
  - download JSON + Markdown

## Setup

```bash
uv sync
```

## Run

If you have a `src/` folder, set `PYTHONPATH` first:

- On Mac/Linux: `export PYTHONPATH=src`
- On Windows (PowerShell): `$env:PYTHONPATH="src"`

Then run the app:

```bash
uv run streamlit run app.py
```

Or run the CLI:

```bash
uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
```
````


Then commit:

```bash
git add README.md
git commit -m "Document setup and usage"
```

---

## Add one screenshot (optional but strong)

- Take a screenshot of your Streamlit app (small)
- Add it to `assets/` or `images/`
- Reference it in README:

```markdown
![Streamlit UI](images/ui.png)
```

---

## What to include / exclude in your repo?

### Include

Things you will include alongside your source code:

- `data/` to prove the code works with sample inputs
- `README.md` to explain how to run the project
- `pyproject.toml` to specify dependencies (such that `uv sync` will install them after a `git clone`)

### Exclude

Beginners often accidentally commit files that clutter the project. To exclude them, create a `.gitignore` file in the repo root that lists patterns for the files and folders Git should ignore:

- `.venv/`: your virtual environment (large, and specific to your machine)
- `__pycache__/`: Python's compiled bytecode cache
- `outputs/`: generated reports, which anyone can recreate by running the CLI
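A minimal sketch, in a throwaway repo, of writing those three patterns and checking their effect. The fake files (`pyvenv.cfg`, `report.json`, `cli.cpython-311.pyc`) only exist so there is something to ignore.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo with fake generated files
cd "$(mktemp -d)"
git init -q
mkdir -p .venv outputs __pycache__
touch .venv/pyvenv.cfg outputs/report.json __pycache__/cli.cpython-311.pyc

# One pattern per line; a trailing "/" matches directories only
cat > .gitignore <<'EOF'
.venv/
__pycache__/
outputs/
EOF

git status --porcelain                    # only ".gitignore" is listed as untracked
git check-ignore -v outputs/report.json   # shows which .gitignore line matched
```

In your real repo, `git check-ignore -v <path>` is a quick way to see which `.gitignore` line (if any) matches a file.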

## Grading rubric (transparent)

| Area | Points | What we look for |
|---|---:|---|
| CLI works | 30 | Reads CSV, writes JSON + MD, helpful errors |
| Streamlit works | 30 | Upload CSV, preview, export JSON + MD |
| Code quality | 15 | Clear functions/modules, reasonable naming |
| Reproducibility | 15 | README + `pyproject.toml`, fresh-clone runnable |
| Git/GitHub hygiene | 10 | commits, `.gitignore`, pushed on time |

**Passing (Week 1):** ≥ 70/100

::: {.notes}
**Say:** Points are easiest to earn with a good README + clean runbook.
:::

## “Fresh clone” runbook (what graders do)

Graders will run roughly these steps from a fresh clone:

1. `git clone ...`
2. `uv sync`
3. `uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs`
4. `uv run streamlit run app.py`

If any step fails or needs guesswork → you lose points.
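Graders only see what you committed and pushed. This throwaway demo (file names are placeholders) clones a local repo the same way `git clone` clones from GitHub, and shows that an uncommitted file is simply missing from the clone:

```bash
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
cd "$work"

# "Your" repo: commit the README, forget to commit the sample CSV
git init -q -b main project
cd project
git config user.name "Demo Student"
git config user.email "demo@example.com"
mkdir -p data
echo "# CSV Profiler" > README.md
echo "name,age" > data/sample.csv
git add README.md
git commit -q -m "Add README"

# What a grader gets: a fresh clone of the committed history only
git clone -q "$work/project" "$work/fresh"
ls "$work/fresh"    # README.md only; data/sample.csv is missing
```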

## Task 0 — Ensure a runnable sample CSV exists (8 minutes)

- Confirm `data/sample.csv` exists
- Keep it small (≤ ~20 rows)
- Include:
  - a numeric column
  - a text column
  - at least one missing value

**Checkpoint:** Your repo can be tested without extra files.



## Solution — Commit your sample CSV

```bash
git add data/sample.csv
git commit -m "Add sample CSV for grading"
```

::: aside
If you can’t share real data, create a synthetic sample.
:::
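If you have no sample yet, one way to create a synthetic one from the repo root is a heredoc. The columns and values below are made up; note that this **overwrites** any existing `data/sample.csv`.

```bash
#!/usr/bin/env bash
set -euo pipefail

mkdir -p data
# Made-up rows: "age" is numeric, "name" and "city" are text,
# and Ben's age is left empty as the missing value
cat > data/sample.csv <<'EOF'
name,age,city
Ana,29,Lisbon
Ben,,Oslo
Chloe,35,Lima
Dev,41,Oslo
EOF
```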





## Task 1 — Final local smoke test (10 minutes) {.smaller}

Run both (from repo root):

**Mac/Linux (bash/zsh)**

```bash
# Only if you have a src/ folder:
export PYTHONPATH=src

uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
uv run streamlit run app.py
```

**Windows PowerShell**

```powershell
# Only if you have a src/ folder:
$env:PYTHONPATH="src"

uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
uv run streamlit run app.py
```

**Checkpoint:** Both run without editing code.






## Solution — Task 1 checklist

- CLI produced:
  - `outputs/report.json`
  - `outputs/report.md`
- Streamlit:
  - uploads `sample.csv`
  - shows a preview
  - download buttons work

If one fails: fix it before touching GitHub.
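The CLI half of this checklist can be scripted. This is a hypothetical helper (`check_outputs` is not part of the project); it assumes `python3` is on your PATH and uses it only to validate the JSON.

```bash
#!/usr/bin/env bash

# Hypothetical helper: verify the CLI wrote both reports
check_outputs() {
  local dir="${1:-outputs}"
  local f
  for f in report.json report.md; do
    # -s: file exists and is non-empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      return 1
    fi
  done
  # json.tool exits non-zero if the file is not valid JSON
  if ! python3 -m json.tool "$dir/report.json" > /dev/null; then
    echo "not valid JSON: $dir/report.json"
    return 1
  fi
  echo "OK: both reports in $dir"
}
```

For example, after running the CLI from the repo root: `check_outputs outputs`. The Streamlit items still need a manual check in the browser.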





## Task 2 — Create `requirements.txt` (10 minutes)

```bash
uv pip freeze > requirements.txt
```

**Checkpoint:** file exists and is not empty.





## Solution — Task 2

Commit:

```bash
git add requirements.txt
git commit -m "Add requirements.txt"
```
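A hypothetical helper (not part of uv) that checks the Task 2 checkpoint and flags one possible portability problem: local `file://` paths, which a freeze may include for packages installed from your own disk and which will not exist on a grader's machine.

```bash
#!/usr/bin/env bash

# Hypothetical helper: sanity-check requirements.txt before committing it
check_requirements() {
  local f="${1:-requirements.txt}"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f"
    return 1
  fi
  # Local paths only exist on your machine
  if grep -q 'file://' "$f"; then
    echo "warning: $f contains local file:// paths"
  fi
  # Count lines that are neither blank nor comments
  echo "$(grep -Ecv '^[[:space:]]*(#|$)' "$f") requirement lines in $f"
}
```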





## Task 3 — Ensure `.gitignore` is correct (10 minutes)

Verify these are NOT tracked:

- `.venv/`
- `outputs/`
- `__pycache__/`

Check:

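```bash
git status
# Tracked files under .venv/, outputs/ or any __pycache__/ (no output = good)
git ls-files | grep -E '(^|/)(\.venv|outputs|__pycache__)/'
```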





## Solution — Task 3 (fix accidental tracking)

If you already **committed** `.venv/` or `outputs/` by mistake:

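```bash
# --ignore-unmatch: don't fail if one of these paths was never tracked
# '*__pycache__*' also matches nested folders like src/csv_profiler/__pycache__/
git rm -r --cached --ignore-unmatch .venv outputs '*__pycache__*'
git commit -m "Stop tracking generated files"
```

`--cached` removes the files from Git's index only; they stay on disk. Then make sure `.gitignore` contains these patterns so they are not added again.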
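A throwaway demo of the whole fix: commit generated files by mistake, find them, untrack them, and confirm they are still on disk. All file names are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo where generated files were committed by mistake
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"
git config user.email "demo@example.com"
mkdir -p outputs src/csv_profiler/__pycache__
echo "print('hi')" > src/csv_profiler/cli.py
echo "{}" > outputs/report.json
touch src/csv_profiler/__pycache__/cli.cpython-311.pyc
git add .
git commit -q -m "Oops: committed generated files"

# 1. Find tracked generated files
git ls-files | grep -E '(^|/)(\.venv|outputs|__pycache__)/'

# 2. Add ignore rules, then untrack (files stay on disk)
printf '.venv/\noutputs/\n__pycache__/\n' > .gitignore
git rm -r -q --cached --ignore-unmatch .venv outputs '*__pycache__*'
git add .gitignore
git commit -q -m "Stop tracking generated files"

git ls-files    # only .gitignore and src/csv_profiler/cli.py remain tracked
```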





## Task 4 — README “fresh clone” instructions (15 minutes)

Your README must include:

- setup steps
- CLI command
- Streamlit command
- expected outputs

**Checkpoint:** Your partner can follow it without help.





## Solution — Task 4 (minimum README) {.smaller}

````markdown
## Setup

```bash
uv sync
# Alternative, using requirements.txt:
#   uv venv -p 3.11
#   uv pip install -r requirements.txt
```

## Run CLI

```bash
# If you have a src/ folder, first run:
#   Mac/Linux: export PYTHONPATH=src
#   Windows:   $env:PYTHONPATH="src"
uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
```

## Run GUI

```bash
# Same PYTHONPATH step as above if you have a src/ folder
uv run streamlit run app.py
```

## Expected outputs

- `outputs/report.json`
- `outputs/report.md`
````

Commit:

```bash
git add README.md
git commit -m "Finalize README runbook"
```






## Task 5 — Create GitHub repo + push (15 minutes)

1. Create an empty repo on GitHub (no README, `.gitignore`, or license, so your first push isn't rejected)
2. Add remote
3. Push

**Checkpoint:** You can open GitHub and see your files.





## Solution — Task 5 (push commands)

```bash
git remote add origin <YOUR_REPO_URL>
git branch -M main
git push -u origin main
```

If you already had a remote but it’s wrong:

```bash
git remote set-url origin <YOUR_REPO_URL>
git push -u origin main
```
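To rehearse the push without touching GitHub, a local bare repository can stand in for it; `git remote add` accepts a filesystem path as well as a URL. This is only a sketch: with GitHub, use `<YOUR_REPO_URL>` instead of the path.

```bash
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
# Stand-in for GitHub: an empty bare repo (no working tree)
git init -q --bare "$work/remote.git"

git init -q "$work/project"
cd "$work/project"
git config user.name "Demo Student"
git config user.email "demo@example.com"
echo "# CSV Profiler" > README.md
git add README.md
git commit -q -m "Initial commit"

# Same three commands as above, with a local path instead of a GitHub URL
git remote add origin "$work/remote.git"
git branch -M main
git push -q -u origin main

git remote -v                # where origin points (fetch and push)
git ls-remote origin main    # the commit hash now on the remote
```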





## Git tags (bookmark a commit)

A **tag** is a human-friendly name for a specific commit.

- It does **not** change your code
- It makes grading / “submission versions” easy to find later
- You can push a tag to GitHub just like a branch

Generic pattern:

```bash
git tag -a <tag-name> -m "message"
git push origin <tag-name>
```





## Task 6 — Tag your Week 1 submission (optional, 8 minutes)

Create a “submission tag” so it’s easy to find:

```bash
git tag -a week1-submission -m "Week 1 submission"
git push origin week1-submission
```

**Checkpoint:** GitHub shows the tag under Releases/Tags.





## Solution — Task 6 (when tags fail)

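Pushing a tag uploads the tagged commit but does not update `main` on GitHub, so push your branch as well, then the tag:

```bash
git push
git push origin week1-submission
```

If the tag push is rejected with `(already exists)`, a tag with that name is already on GitHub; create a tag with a new name instead of overwriting it.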
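A throwaway demo, with a local bare repo standing in for GitHub, showing that pushing a tag uploads the tagged commit but leaves the remote branch untouched until you push it:

```bash
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
git init -q --bare "$work/remote.git"    # stand-in for GitHub
git init -q -b main "$work/project"
cd "$work/project"
git config user.name "Demo Student"
git config user.email "demo@example.com"
echo "v1" > README.md
git add README.md
git commit -q -m "Week 1 work"
git remote add origin "$work/remote.git"

git tag -a week1-submission -m "Week 1 submission"
git push -q origin week1-submission      # pushes the tag (and its commit) only
git ls-remote origin                     # lists the tag, but no refs/heads/main

git push -q -u origin main               # now the branch is on the remote too
git log -1 --oneline week1-submission    # the commit the tag points to
```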



## Task 7 — Final sanity check from GitHub (10 minutes) {.smaller}

On GitHub:

- open README (renders correctly)
- verify file tree:
  - `src/csv_profiler/...`
  - `app.py`
  - `pyproject.toml`
  - `.gitignore`
  - `data/sample.csv`

**Checkpoint:** Repo matches the layout below, with no generated files committed.



```text
csv-profiler/
├── README.md
├── pyproject.toml
├── .gitignore
├── app.py
├── data/
│   └── sample.csv
├── outputs/          (ignored)
└── src/
    └── csv_profiler/
        ├── __init__.py
        ├── cli.py
        ├── io.py
        ├── profiling.py
        └── render.py
```
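Before opening GitHub, you can check locally that each file in the tree above is tracked. `check_tracked` is a hypothetical helper whose file list is copied from the tree, so adjust it if your layout differs. It checks Git's index, so you still need to commit and push afterwards.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run from the repo root
check_tracked() {
  local missing=0 f
  for f in README.md pyproject.toml .gitignore app.py data/sample.csv; do
    # --error-unmatch: non-zero exit if the path is not tracked
    if ! git ls-files --error-unmatch "$f" > /dev/null 2>&1; then
      echo "not tracked: $f"
      missing=1
    fi
  done
  return "$missing"
}
```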



## Task 8 — Submission message (5 minutes)

Send the following to the instructor/portal:

- GitHub repo link
- Commit hash of your final submission
- Any known limitations (1–2 bullets)

**Checkpoint:** Submission sent before **11:59pm**.
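The commit hash is what `git rev-parse HEAD` prints. A self-contained sketch (local bare repo as a stand-in for GitHub) that also confirms the same commit is on the remote before you submit it:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo + local "GitHub" so the demo is self-contained
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q -b main "$work/project"
cd "$work/project"
git config user.name "Demo Student"
git config user.email "demo@example.com"
echo "final" > README.md
git add README.md
git commit -q -m "Final Week 1 submission"
git remote add origin "$work/remote.git"
git push -q -u origin main

# In your real repo, only the lines below matter
git fetch -q origin
local_hash="$(git rev-parse HEAD)"
remote_hash="$(git rev-parse origin/main)"
echo "Commit hash to submit: $local_hash"
if [ "$local_hash" = "$remote_hash" ]; then
  echo "OK: this commit is on GitHub"
else
  echo "Not pushed yet: run git push"
fi
```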

# Week 1 wrap-up

## Day 1: Python & Tooling

**Goal:** Set up your environment, use the shell confidently, and write your first Python scripts that read a CSV and produce a basic profile.

- Navigate and inspect files using basic shell commands
- Create and use a Python environment with `uv`
- Write Python scripts with variables, basic types, and control flow
- Read a CSV and compute a **basic profiling summary**
- Write outputs to **Markdown** and **JSON** files

## Day 2: Functions + Files + Better Profiling

**Goal:** Turn yesterday’s script into **clean functions** and produce a **richer profiling report** for any CSV.

- Write **reusable functions** (args, defaults, `*args`, `**kwargs`, keyword-only)
- Use built-ins like `enumerate`, `zip`, `sorted`, `any`, `all`
- Generate clean **Markdown** with f-strings + `join()`
- Use `pathlib.Path` for safe paths
- Read CSVs with `csv.DictReader` and write JSON with `json.dumps()`
- Upgrade your profiler: **type inference + stats**
- Add **type hints** to clarify inputs/outputs (`name: str`, `-> float | None`)
- Use `lambda` for tiny one-off helper functions (mostly with `key=`)

## Day 3: Modules + OOP + Typer CLI

**Goal:** Turn your profiler into a **clean Python package** and expose it as a **real CLI**.

- Explain the difference between a **module** and a **package**
- Fix imports by understanding **`sys.path`** and `PYTHONPATH`
- Use core modules: `os`, `sys`, `time`, `shutil`
- Write a small class using **properties** to enforce constraints
- Explain (and recognize) **inheritance** and **polymorphism**
- Build a multi-command **Typer** CLI with `--help`
- Ship a CLI that generates **JSON + Markdown** reports

## Day 4: Streamlit GUI + (Optional) httpx

**Goal:** Build a **GUI** for your CSV Profiler that can **load data**, **preview results**, and **export JSON + Markdown**.


- Explain Streamlit’s **rerun model** and why widgets “cause reruns”
- Build a simple Streamlit UI using **sidebar + widgets**
- Load CSV data from:
  - file upload (required)
  - local path (optional)
  - URL via `httpx` (stretch)
- Reuse your package functions:
  - `profile_rows()`
  - `render_markdown()`
- Export profiling results as:
  - downloadable **JSON** + **Markdown**
  - saved to `outputs/` on disk (local run)

## Day 5: Git + GitHub + Ship Week 1

**Goal:** Publish a clean GitHub repo for your **CSV Profiler** (CLI + Streamlit) with a **clear README** and a reproducible setup.

- Explain Git’s mental model: **working tree → staging → commits**
- Use core commands:
  - `status`, `add`, `commit`, `log`, `diff`
- Create a GitHub repo and:
  - add `origin`, `push`, `pull`
- Write a README that lets anyone run your project
- Submit your Week 1 project by **Thu 18 Dec 2025, 11:59pm**

Next week: **Data Work (ETL + EDA)**