Contributor

Copilot AI commented Nov 9, 2025

ExamKit Pull Request

Description

Built a complete offline exam-preparation system that transforms lecture materials (video, transcripts, slides, exams) into cited study PDFs using a local LLM and RAG pipeline.

Core pipeline: Ingestion → NLP (embeddings + FAISS) → LLM synthesis (Ollama) → PDF rendering (Typst/Pandoc)

Implementation

Ingestion (6 modules)

  • Multi-format parsing: VTT/SRT/TXT transcripts, PPTX/PDF slides, PDF exams
  • FFmpeg audio extraction, Tesseract OCR fallback
  • Normalized JSONL output with manifest validation
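For illustration, the VTT-to-JSONL normalization step can be sketched as follows. The function name and segment schema here are hypothetical; the actual transcript_normalizer module may differ:

```python
import json
import re

def normalize_vtt(vtt_text: str) -> list[dict]:
    """Parse WebVTT cues into normalized segment dicts (start, end, text)."""
    cue_re = re.compile(
        r"(\d{2}:\d{2}:\d{2})\.\d{3}\s-->\s(\d{2}:\d{2}:\d{2})\.\d{3}\n(.+?)(?:\n\n|\Z)",
        re.DOTALL,
    )
    segments = []
    for start, end, text in cue_re.findall(vtt_text):
        # Collapse internal whitespace/newlines inside a cue into single spaces
        segments.append({"start": start, "end": end, "text": " ".join(text.split())})
    return segments

vtt = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the lecture.

00:00:04.500 --> 00:00:09.000
Today we cover gradient descent.
"""
segments = normalize_vtt(vtt)
jsonl = "\n".join(json.dumps(s) for s in segments)
```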

NLP (5 modules)

  • Embeddings: sentence-transformers (all-MiniLM-L6-v2, 384-dim)
  • Vector search: FAISS indexing with semantic retrieval
  • Topic mapping with coverage metrics
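The retrieval step can be sketched with a plain NumPy stand-in for FAISS's inner-product index (toy 4-dimensional vectors in place of the 384-dim MiniLM embeddings; function names are illustrative, not the repo's API):

```python
import numpy as np

def build_index(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize rows so dot product equals cosine similarity (mirrors IndexFlatIP usage)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, top_k: int = 8) -> list[int]:
    """Return indices of the top_k most similar chunks for a query vector."""
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q
    return np.argsort(-scores)[:top_k].tolist()

# Toy "embeddings": two similar chunks about one topic, one unrelated chunk
chunks = np.array([[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
index = build_index(chunks)
hits = search(index, np.array([1.0, 0.05, 0.0, 0.0]), top_k=2)
```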

Synthesis (5 modules)

  • Ollama client for local LLM (llama3.2:8b)
  • Jinja2 prompt templates: definition, derivation, mistakes, revision
  • Citation tracking: [vid HH:MM:SS], [slide N], [exam Q2b]
  • Graphviz diagram generation
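The citation formats above can be produced by small helpers along these lines (hypothetical names; the real citations module may be structured differently):

```python
def format_video_citation(seconds: float) -> str:
    """Render a transcript timestamp as a [vid HH:MM:SS] citation."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[vid {h:02d}:{m:02d}:{s:02d}]"

def format_slide_citation(slide: int) -> str:
    """Render a slide number as a [slide N] citation."""
    return f"[slide {slide}]"

def format_exam_citation(question_id: str) -> str:
    """Render an exam question id as an [exam Qxx] citation."""
    return f"[exam {question_id}]"
```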

Rendering (3 modules)

  • Typst compilation (primary), Pandoc fallback
  • Professional templates with ToC, formulas, styling
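A minimal sketch of the primary/fallback selection, assuming Typst and Pandoc are invoked as external commands (names and flags are illustrative, not the repo's actual renderer API):

```python
import shutil
from pathlib import Path

def compile_pdf_command(source: Path, out: Path) -> list[str]:
    """Build the compile command, preferring Typst and falling back to Pandoc.

    The caller would execute it with subprocess.run(cmd, check=True).
    """
    if shutil.which("typst"):
        cmd = ["typst", "compile", str(source), str(out)]
    else:
        cmd = ["pandoc", str(source), "-o", str(out), "--pdf-engine=xelatex"]
    return cmd

cmd = compile_pdf_command(Path("notes.typ"), Path("notes.pdf"))
```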

QA & Reports (4 modules)

  • Formula validation, link checking, keyword coverage
  • Coverage CSV, citations JSON export
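Two of these checks can be sketched as follows (illustrative helpers, not the repo's exact implementations):

```python
import re

def keyword_coverage(text: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the generated notes (case-insensitive)."""
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords) if keywords else 1.0

def find_citations(text: str) -> list[str]:
    """Detect citation markers like [vid 00:01:30], [slide 4], [exam Q2b]."""
    return re.findall(r"\[(?:vid \d{2}:\d{2}:\d{2}|slide \d+|exam Q\w+)\]", text)

notes = "Gradient descent [vid 00:01:30] minimizes loss [slide 4]; see past paper [exam Q2b]."
coverage = keyword_coverage(notes, ["gradient descent", "loss", "momentum"])
cites = find_citations(notes)
```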

CLI Commands

# Process inputs
examkit ingest --manifest manifest.json

# Generate PDF with citations
examkit build --config config.yml --out notes.pdf --offline

# Coverage report
examkit report --session demo --open

Configuration

asr:
  engine: faster-whisper
  model: small
llm:
  engine: ollama
  model: llama3.2:8b
  temperature: 0.2
embedding:
  model: all-MiniLM-L6-v2
  dim: 384
retrieval:
  top_k: 8
pdf:
  engine: typst

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Testing

  • VTT/SRT/TXT transcript normalization (3 formats × edge cases)
  • PPTX/PDF slides parsing with OCR fallback
  • Exam structure extraction with marks detection
  • Topic mapping and coverage calculation
  • Citation formatting (video, slides, exam)
  • Template rendering (Typst, Jinja2)
  • Formula validation and link checking

Additional Context

Project Stats: 38 Python modules, 5 test suites, 9 templates, ~5K LOC

Security: CodeQL scan passed with 0 vulnerabilities. Offline mode enforced: no network calls during processing.

Dependencies: All free/OSS (faster-whisper, FAISS, spaCy, Ollama, Typst). System deps via Homebrew.

Docs: README (installation/usage), ARCHITECTURE (system design), CONTRIBUTING (dev guidelines), IMPLEMENTATION_SUMMARY (requirements checklist).

Original prompt

This section details the original issue you should resolve

<issue_title>Master Build: Production-Grade Python "ExamKit" Project Generator (macOS, Offline, OSS)</issue_title>
<issue_description>Perfect—here’s a ready-to-use, master prompt you can paste into ChatGPT (or your favorite code-gen assistant) to generate the full Python project from the requirements we just defined.


✅ Master Build Prompt — “ExamKit” (Python, Local-Only, macOS)

Role: You are a senior Python engineer. Generate a complete, production-grade Python project called examkit that runs fully offline on macOS using only free/open-source tools. The project ingests lecture video, transcripts, slides, session topics, exam topics, and exam papers and outputs an exam-ready PDF with citations, formulas, diagrams, and a coverage report.

Follow every instruction precisely. Produce all files exactly as specified, with type hints, docstrings, and clear comments.


1) Objectives & Constraints

  • Local-only (offline): No network calls during processing. Everything must run on macOS with Apple Silicon/Intel.
  • Free/Open-source: Use faster-whisper, PyMuPDF, python-pptx, tesseract, ffmpeg, faiss-cpu, sentence-transformers, spaCy, matplotlib, jinja2, Typst (preferred) OR pandoc+wkhtmltopdf fallback. Use Ollama for local LLM (llama3.2:8b default).
  • Reproducible CLI pipeline with config (config/config.yml) and deterministic outputs.
  • Traceability: Every paragraph in PDF must cite sources (video timecodes, slide numbers, exam question ids).
  • Portability: No Docker required. Poetry environment or uv is fine.

2) Deliverables (Create all these files)

Project root

examkit/
  pyproject.toml
  README.md
  LICENSE
  Makefile
  .gitignore
  .env.example
  examkit/                      # Python package
    __init__.py
    cli.py                      # Typer-based CLI (or Click), entrypoint
    config.py                   # Pydantic models for config
    logging_utils.py
    utils/
      __init__.py
      io_utils.py
      text_utils.py
      timecode.py
      math_utils.py
    ingestion/
      __init__.py
      ingest.py                 # Manifest, validation, ffmpeg extract
      transcript_normalizer.py  # VTT/SRT/TXT → jsonl segments
      slides_parser.py          # PPTX→JSONL, images; PDF→JSONL via PyMuPDF+OCR
      exam_parser.py            # Exam paper structure/marks extraction
      ocr.py                    # Tesseract helper
    asr/
      __init__.py
      whisper_runner.py         # faster-whisper wrapper (offline)
    nlp/
      __init__.py
      splitter.py               # sentence/paragraph segmentation
      embeddings.py             # sentence-transformers; FAISS index
      topic_mapping.py          # syllabus mapping, coverage matrix
      retrieval.py              # RAG over FAISS
      spaCy_nlp.py              # NER, cleanup (en_core_web_sm)
    synthesis/
      __init__.py
      prompts.py                # Jinja templates for prompts
      ollama_client.py          # local LLM calls via subprocess/http
      composer.py               # section builders: def/intuit/derivation/examples/common mistakes
      citations.py              # manage refs: [vid hh:mm:ss][slide N][exam Q2b]
      diagrams.py               # Graphviz/Mermaid helpers
    render/
      __init__.py
      templater.py              # Jinja2 → Markdown/Typst
      typst_renderer.py         # Typst compile
      pandoc_renderer.py        # Fallback path
    qa/
      __init__.py
      checks.py                 # formulas compile, link checker, keyword recall
    reports/
      __init__.py
      coverage.py               # topic coverage csv/json
      export.py                 # write citations.json, coverage.csv
  config/
    config.yml
    templates/
      typst/
        main.typ                # Typst main template
        theme.typ               # typography/theme
      markdown/
        section.md.j2           # per-topic section template
        pdf_main.md.j2          # stitched MD template
      prompts/
        definition.j2
        derivation.j2
        mistakes.j2
        compare.j2
        fast_revision.j2
  input/
    sample/
      video/sample.mp4          # (stub, small or placeholder note)
      transcript/sample.vtt
      slides/sample.pptx
      exam/sample_exam.pdf
      topics/session_topics.yml
      topics/exam_topics.yml
  out/                          # build artifacts
  cache/
  logs/
  tests/
    test_ingestion.py
    test_parsers.py
    test_topic_mapping.py
    test_render.py

3) pyproject.toml (Poetry) — Required Dependencies

Include at least:

  • typer[all], rich, pydantic, pyyaml, tqdm
  • faster-whisper, ffmpeg-python
  • pymupdf, pdfminer.six, python-pptx
  • pytesseract, Pillow
  • sentence-transformers, faiss-cpu, spacy (en_core_web_sm in README)
  • `m...


Summary by CodeRabbit

Release Notes

  • New Features

    • CLI tool for ingesting and processing educational content (videos, transcripts, slides, exams)
    • Offline-first PDF generation for exam preparation materials
    • Semantic search and automatic topic mapping with coverage analysis
    • Quality assurance checks for generated content
    • Citation tracking and coverage reporting
  • Documentation

    • Architecture guide, contribution guidelines, and implementation summary
    • Configuration templates and sample workflows
  • Chores

    • Project structure, dependencies, and CI configuration

@coderabbitai

coderabbitai bot commented Nov 9, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces ExamKit, a comprehensive Python application for generating exam preparation materials offline. Creates a complete project structure including CLI, ingestion pipeline for videos/transcripts/slides/exams, NLP processing with embeddings and topic mapping, content synthesis via local Ollama LLM, PDF rendering with Typst, QA checks, and reporting—all with Jinja2 templates, Poetry dependency management, and extensive documentation.

Changes

  • Configuration & Environment (.env.example, config/config.yml, pyproject.toml, Makefile): Environment template with logging/Ollama settings; YAML config defaults for ASR, LLM, embeddings, retrieval, PDF, diagrams, offline mode; Poetry project manifest with dev tools; Makefile with setup, test, lint, format, build-demo, clean targets
  • Jinja2 Templates (config/templates/markdown/pdf_main.md.j2, config/templates/markdown/section.md.j2, config/templates/prompts/*.j2, config/templates/typst/main.typ, config/templates/typst/theme.typ): Markdown document template with metadata, TOC, coverage summary; section template for topics (definition, formulas, derivation, examples, mistakes, revision); 6 prompt templates (definition, derivation, mistakes, compare, revision, examples) for LLM generation; Typst document configuration function and theme system with colors, boxes, and citation styling
  • GitHub & Project Meta (.github/PULL_REQUEST_TEMPLATE.md, .gitignore, LICENSE, README.md): PR template with description/checklist sections; comprehensive ignore patterns for Python/IDE/media artifacts; MIT License; project overview and setup instructions
  • Documentation (ARCHITECTURE.md, CONTRIBUTING.md, IMPLEMENTATION_SUMMARY.md): Architecture diagram and module descriptions with data flow; contribution guidelines with code standards, PR workflow, testing, setup; implementation completion summary with deliverables, dependencies, acceptance criteria
  • Core CLI & Config (examkit/__init__.py, examkit/cli.py, examkit/config.py, examkit/logging_utils.py): Package metadata (version 0.1.0, contributors); Typer CLI with ingest, build, report, cache commands and session management; Pydantic config models for ASR/LLM/embedding/retrieval/PDF with YAML I/O; centralized logging setup with Rich console support
  • Ingestion Pipeline (examkit/ingestion/__init__.py, examkit/ingestion/ingest.py, examkit/ingestion/exam_parser.py, examkit/ingestion/slides_parser.py, examkit/ingestion/transcript_normalizer.py, examkit/ingestion/ocr.py): Manifest validation and orchestration pipeline with ffmpeg audio extraction, transcript normalization (VTT/SRT/TXT); PDF exam parser extracting marks/sections/questions; PPTX/PDF slide parsing with OCR fallback; Tesseract-based OCR with availability guards
  • NLP & Embeddings (examkit/nlp/__init__.py, examkit/nlp/embeddings.py, examkit/nlp/retrieval.py, examkit/nlp/splitter.py, examkit/nlp/spacy_nlp.py, examkit/nlp/topic_mapping.py): Sentence-transformers embeddings with FAISS indexing; RAG retrieval with deduplication, source diversity ranking, confidence filtering; spaCy-based NLP (entities, phrases, lemmatization, language patterns); text chunking/merging; topic loading, chunk-to-topic mapping, coverage calculation, gap identification
  • Synthesis & LLM (examkit/synthesis/__init__.py, examkit/synthesis/composer.py, examkit/synthesis/ollama_client.py, examkit/synthesis/prompts.py, examkit/synthesis/citations.py, examkit/synthesis/diagrams.py): Main orchestration pipeline: load data, embed chunks, map to topics, retrieve context, generate content via LLM, render sections, compile PDF, export reports; Ollama HTTP client with availability checks, completion/chat generation, model pulling; Jinja prompt template rendering; CitationManager for tracking sources (transcript/slides/exam); Graphviz/Mermaid diagram generation with fallback handling
  • Rendering (examkit/render/__init__.py, examkit/render/templater.py, examkit/render/typst_renderer.py, examkit/render/pandoc_renderer.py): Jinja environment setup and Markdown/Typst document rendering; Typst compilation with Markdown wrapping; Pandoc fallback PDF generation with XeLaTeX; template and section rendering helpers
  • QA & Reporting (examkit/qa/__init__.py, examkit/qa/checks.py, examkit/reports/__init__.py, examkit/reports/coverage.py, examkit/reports/export.py): Quality checks: LaTeX formula validation, markdown link verification, keyword recall, citation detection, equation consistency; coverage reporting with CSV export and gap identification; session report generation with coverage/QA/citations aggregation and text/JSON export
  • Utilities (examkit/utils/__init__.py, examkit/utils/io_utils.py, examkit/utils/text_utils.py, examkit/utils/math_utils.py, examkit/utils/timecode.py): File I/O (JSON, JSONL, text, directory management); text processing (cleaning, tokenization, keyword extraction, truncation); LaTeX formula extraction/validation, coverage calculation, score normalization, symbol extraction; video timecode conversion and citation formatting
  • ASR Module (examkit/asr/__init__.py, examkit/asr/whisper_runner.py): Faster-Whisper offline ASR wrapper with 16kHz mono WAV conversion, segment-to-timecode mapping, VTT export
  • Test Suite (tests/__init__.py, tests/test_ingestion.py, tests/test_parsers.py, tests/test_render.py, tests/test_topic_mapping.py): Unit tests for transcript parsing (VTT/SRT/TXT), manifest validation, exam/text/math utilities, rendering, config loading, coverage calculation, topic mapping, segmentation, QA checks
  • Sample Fixtures (input/sample/manifest.json, input/sample/exam/README.md, input/sample/slides/README.md, input/sample/transcript/sample.vtt, input/sample/video/README.md, input/sample/topics/exam_topics.yml, input/sample/topics/session_topics.yml): Manifest metadata for lecture session; sample exam structure with sections and marks; sample WebVTT transcript; topic definitions for session and exam; READMEs explaining sample content and generation

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Ingest as Ingestion
    participant NLP
    participant Synth as Synthesis
    participant Render
    participant Report

    User->>CLI: ingest --manifest
    CLI->>Ingest: validate_manifest
    Ingest->>Ingest: extract_audio_from_video (ffmpeg)
    Ingest->>Ingest: normalize_transcript (VTT/SRT/TXT)
    Ingest->>Ingest: parse_slides (PPTX/PDF + OCR)
    Ingest->>Ingest: parse_exam (marks, questions)
    Ingest-->>CLI: cache → segments.jsonl

    User->>CLI: build --config --session_id
    CLI->>NLP: generate_embeddings
    NLP->>NLP: load FAISS index
    NLP-->>Synth: embeddings, index
    Synth->>NLP: map_chunks_to_topics
    NLP-->>Synth: topic_mapping, coverage
    
    Synth->>Synth: For each topic:<br/>retrieve_context_for_topic
    Synth->>Synth: RAG + Ollama LLM<br/>generate definition/derivation/etc
    Synth->>Synth: CitationManager.add_citation
    Synth-->>Synth: sections with citations

    Synth->>Render: render_markdown_document
    Render->>Render: setup Jinja environment
    Render->>Render: render section templates
    Render-->>Synth: markdown content
    Synth->>Render: compile_typst_to_pdf
    Render->>Render: Typst or Pandoc compile
    Render-->>Synth: out/session.pdf

    Synth->>Report: generate_report
    Report->>Report: run_all_checks
    Report-->>Report: coverage.csv, citations.json, notes.md
    CLI-->>User: ✓ Pipeline complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Scope & heterogeneity: 100+ files spanning configuration, templates, 8 Python packages with distinct responsibilities (ingestion, NLP, synthesis, rendering, QA, reporting, utilities, ASR), external tool integration, and tests. While patterns are consistent, each domain requires separate reasoning.
  • Logic density: Moderate to high in orchestration points (composer.py, ingest.py, embeddings/FAISS integration, Ollama client, rendering pipeline). Straightforward utility functions balance heavier logic.
  • External dependencies: Multiple third-party tools (ffmpeg, Tesseract, Ollama, Typst/Pandoc, FAISS, sentence-transformers) with availability guards and fallback strategies require careful verification.
  • Specific attention areas:
    • examkit/synthesis/composer.py — orchestrates entire pipeline; complex error handling per topic
    • examkit/ingestion/ingest.py — ffmpeg subprocess calls and file I/O error paths
    • examkit/nlp/embeddings.py — FAISS index creation and search logic
    • examkit/synthesis/ollama_client.py — HTTP client with offline validation
    • examkit/render/typst_renderer.py — subprocess management and Markdown-to-Typst conversion
    • Citation tracking and chunk metadata flow across modules
    • Template rendering contexts and Jinja2 safety

Poem

🐰 Behold! A kit for exams so bright,
With whispered words and topics right,
From video streams to PDF dreams,
Citations flow through every seam,
Offline magic, local and true—
ExamKit's here to prep for you! 📚✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The PR title clearly and concisely summarizes the main deliverable: a production-grade, offline exam prep toolkit with RAG pipeline named ExamKit.
  • Linked Issues check: ✅ Passed. The PR comprehensively implements all major requirements from #1: complete offline ingestion pipeline (VTT/SRT/TXT/PPTX/PDF/exam parsing), NLP (embeddings, FAISS, topic mapping), synthesis (Ollama, Jinja templates, citations), rendering (Typst/Pandoc), QA checks, CLI, config, tests, and documentation.
  • Out of Scope Changes check: ✅ Passed. All changes are directly aligned with issue #1 requirements: configuration templates, ingestion modules, NLP components, synthesis pipeline, rendering infrastructure, QA framework, utilities, CLI, tests, documentation, and sample fixtures are all specified deliverables.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI and others added 5 commits November 9, 2025 17:04
…, nlp, synthesis)

Co-authored-by: thecoder8890 <33773206+thecoder8890@users.noreply.github.com>
Copilot AI changed the title [WIP] Add production-grade Python ExamKit project generator Implement production-grade ExamKit: offline exam prep toolkit with RAG pipeline Nov 9, 2025
Copilot finished work on behalf of thecoder8890 November 9, 2025 17:22
Copilot AI requested a review from thecoder8890 November 9, 2025 17:22
@thecoder8890
Owner

@coderabbitai review

@coderabbitai

coderabbitai bot commented Nov 10, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@thecoder8890 thecoder8890 marked this pull request as ready for review November 10, 2025 04:13
@thecoder8890
Owner

@coderabbitai review

@coderabbitai

coderabbitai bot commented Nov 10, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Nov 10, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #3

coderabbitai bot added a commit that referenced this pull request Nov 10, 2025
Docstrings generation was requested by @thecoder8890.

* #2 (comment)

The following files were modified:

* `examkit/asr/whisper_runner.py`
* `examkit/cli.py`
* `examkit/config.py`
* `examkit/ingestion/exam_parser.py`
* `examkit/ingestion/ingest.py`
* `examkit/ingestion/ocr.py`
* `examkit/ingestion/slides_parser.py`
* `examkit/ingestion/transcript_normalizer.py`
* `examkit/logging_utils.py`
* `examkit/nlp/embeddings.py`
* `examkit/nlp/retrieval.py`
* `examkit/nlp/spacy_nlp.py`
* `examkit/nlp/splitter.py`
* `examkit/nlp/topic_mapping.py`
* `examkit/qa/checks.py`
* `examkit/render/pandoc_renderer.py`
* `examkit/render/templater.py`
* `examkit/render/typst_renderer.py`
* `examkit/reports/coverage.py`
* `examkit/reports/export.py`
* `examkit/synthesis/citations.py`
* `examkit/synthesis/composer.py`
* `examkit/synthesis/diagrams.py`
* `examkit/synthesis/ollama_client.py`
* `examkit/synthesis/prompts.py`
* `examkit/utils/io_utils.py`
* `examkit/utils/math_utils.py`
* `examkit/utils/text_utils.py`
* `examkit/utils/timecode.py`
* `tests/test_render.py`

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (43)
pyproject.toml (1)

61-66: Consider stricter MyPy configuration for production-grade code.

The PR objectives emphasize "production-grade" development with type hints, but the current MyPy settings are quite permissive:

  • disallow_untyped_defs = false allows functions without type annotations
  • ignore_missing_imports = true suppresses import resolution errors

For a production-grade codebase, consider gradually tightening these settings as the project matures.

Apply this diff to enable stricter type checking:

 [tool.mypy]
 python_version = "3.11"
 warn_return_any = true
 warn_unused_configs = true
-disallow_untyped_defs = false
-ignore_missing_imports = true
+disallow_untyped_defs = true
+ignore_missing_imports = false
+# Temporarily allow untyped definitions for gradual adoption
+# disallow_untyped_defs = false

Alternatively, keep the current permissive settings but add a comment indicating plans for future strictness, and consider enabling per-module overrides as modules become fully typed.

examkit/ingestion/ocr.py (2)

16-43: Consider using logging.exception for better error diagnostics.

The current error logging provides the error message but not the full traceback. Using logging.exception would include the stack trace, making debugging easier.

Apply this diff:

     except Exception as e:
-        logger.error(f"OCR failed for {image_path}: {e}")
+        logger.exception(f"OCR failed for {image_path}: {e}")
         return ""

46-72: Consider using logging.exception for better error diagnostics.

Similar to extract_text_with_ocr, using logging.exception would provide more debugging context when OCR confidence calculation fails.

Apply this diff:

     except Exception as e:
-        logger.error(f"Failed to get OCR confidence for {image_path}: {e}")
+        logger.exception(f"Failed to get OCR confidence for {image_path}: {e}")
         return 0.0
examkit/utils/math_utils.py (1)

6-6: Remove unused import.

The Optional type is imported but never used in any function signature.

Apply this diff:

-from typing import List, Optional
+from typing import List
examkit/nlp/spacy_nlp.py (1)

75-98: Non-deterministic ordering of key phrases.

Line 98 converts phrases to a set() and then slices, which produces arbitrary ordering since sets are unordered in Python. If consistent ordering is important, consider preserving insertion order.

If deterministic ordering is desired, apply this diff:

-    # Return unique phrases, limited to top_n
-    return list(set(phrases))[:top_n]
+    # Return unique phrases, limited to top_n (preserve order)
+    seen = set()
+    unique_phrases = []
+    for phrase in phrases:
+        if phrase not in seen:
+            seen.add(phrase)
+            unique_phrases.append(phrase)
+    return unique_phrases[:top_n]
README.md (2)

21-69: Consider adding language specifiers for better rendering.

The ASCII architecture diagram renders correctly but adding a language specifier would improve syntax highlighting and rendering consistency.

Apply this change:

-```
+```text
 ┌─────────────────┐
 │  Input Sources  │
 │ Video, Slides,  │

259-284: Consider adding language specifier for project structure.

The file tree would benefit from a language specifier for consistent rendering.

Apply this change:

-```
+```text
 examkit/
 ├── examkit/              # Main package
examkit/synthesis/ollama_client.py (7)

44-52: Use explicit Optional type annotation for logger parameter.

PEP 484 recommends using explicit Optional[T] or T | None rather than implicit Optional.

Apply this diff:

 def generate_completion(
     prompt: str,
     model: str = "llama3.2:8b",
     system_prompt: Optional[str] = None,
     temperature: float = 0.2,
     max_tokens: int = 900,
     offline: bool = True,
-    logger: logging.Logger = None
+    logger: Optional[logging.Logger] = None
 ) -> str:

101-104: Use logging.exception and preserve exception chain.

Replace logging.error with logging.exception to include the stack trace, and use raise ... from e to preserve the exception chain.

Apply this diff:

     except requests.exceptions.RequestException as e:
         if logger:
-            logger.error(f"Ollama request failed: {e}")
-        raise RuntimeError(f"Failed to generate completion: {e}")
+            logger.exception("Ollama request failed")
+        raise RuntimeError(f"Failed to generate completion: {e}") from e

107-113: Consider adding offline parameter for consistency.

generate_completion has an offline parameter to enforce availability checks, but generate_chat_completion always checks availability. Consider adding the parameter for API consistency.

Apply this diff if you want consistency:

 def generate_chat_completion(
     messages: list,
     model: str = "llama3.2:8b",
     temperature: float = 0.2,
     max_tokens: int = 900,
-    logger: logging.Logger = None
+    offline: bool = True,
+    logger: Optional[logging.Logger] = None
 ) -> str:

Then update the availability check to respect the parameter:

-    if not check_ollama_available():
+    if offline and not check_ollama_available():
         raise RuntimeError("Ollama not available")

107-113: Use explicit Optional type annotation for logger parameter.

Same issue as in generate_completion.

Apply this diff:

 def generate_chat_completion(
     messages: list,
     model: str = "llama3.2:8b",
     temperature: float = 0.2,
     max_tokens: int = 900,
-    logger: logging.Logger = None
+    logger: Optional[logging.Logger] = None
 ) -> str:

149-152: Use logging.exception and preserve exception chain.

Same logging issue as in generate_completion.

Apply this diff:

     except requests.exceptions.RequestException as e:
         if logger:
-            logger.error(f"Ollama chat request failed: {e}")
-        raise RuntimeError(f"Failed to generate chat completion: {e}")
+            logger.exception("Ollama chat request failed")
+        raise RuntimeError(f"Failed to generate chat completion: {e}") from e

155-165: Use explicit Optional type annotation for logger parameter.

Same issue in the pull_model function.

Apply this diff:

-def pull_model(model: str, logger: logging.Logger = None) -> bool:
+def pull_model(model: str, logger: Optional[logging.Logger] = None) -> bool:
     """
     Pull a model using Ollama CLI.

177-180: Use logging.exception for better error diagnostics.

Using logging.exception instead of logging.error will automatically include the stack trace.

Apply this diff:

     except Exception as e:
         if logger:
-            logger.error(f"Failed to pull model: {e}")
+            logger.exception("Failed to pull model")
         return False
examkit/synthesis/diagrams.py (2)

151-154: Use explicit Optional type annotation for logger parameter.

PEP 484 recommends explicit Optional[T] or T | None rather than implicit Optional.

Apply this diff:

 def generate_mermaid_diagram(
     mermaid_code: str,
     output_path: Path,
-    logger: logging.Logger = None
+    logger: Optional[logging.Logger] = None
 ) -> bool:

189-192: Use logging.exception for better error diagnostics.

Using logging.exception instead of logging.error automatically includes the stack trace.

This is addressed in the previous comment's diff, but if applied separately:

     except subprocess.CalledProcessError as e:
         if logger:
-            logger.error(f"Failed to generate Mermaid diagram: {e}")
+            logger.exception("Failed to generate Mermaid diagram")
         return False
config/templates/markdown/pdf_main.md.j2 (1)

10-12: Enhance anchor generation to handle special characters.

The anchor generation using {{ section.topic | lower | replace(' ', '-') }} only handles spaces. Topics containing special characters (parentheses, slashes, apostrophes, colons, etc.) could produce invalid or non-unique anchors, resulting in broken TOC links.

Consider creating a custom Jinja2 filter for robust anchor generation that:

  1. Converts to lowercase
  2. Replaces spaces and special characters with hyphens
  3. Removes or escapes problematic characters
  4. Ensures uniqueness (e.g., by appending a counter for duplicates)

Example implementation in the templating module:

import re

def slugify(text: str) -> str:
    """Convert text to a valid anchor slug."""
    # Convert to lowercase and replace spaces/special chars with hyphens
    slug = re.sub(r'[^\w\s-]', '', text.lower())
    slug = re.sub(r'[-\s]+', '-', slug)
    return slug.strip('-')

Then register it in your Jinja2 environment and use: {{ section.topic | slugify }}

tests/test_topic_mapping.py (5)

24-37: Consider adding edge case tests.

While the current test validates basic coverage calculation, consider adding tests for edge cases such as topics with zero chunks or 100% coverage to ensure robustness.


40-47: Tighten the assertion for more precise validation.

The assertion len(chunks) > 2 is quite weak. Given the 155-character second segment with max_chunk_size=50, you should expect at least 4-5 chunks total. Consider asserting a more specific range like assert len(chunks) >= 4.


50-58: Strengthen the assertion to validate exact merge behavior.

The assertion len(merged) < len(segments) is weak. Since the first two short segments should merge into one and the third remains separate, you should expect exactly 2 segments: assert len(merged) == 2.


61-73: LGTM! Consider testing additional citation types.

The test appropriately validates video citation formatting. For more comprehensive coverage, consider adding tests for slide and exam citation types mentioned in the PR objectives.


76-86: LGTM! Consider adding tests for invalid formulas.

The test validates basic QA checks. For better coverage, consider adding tests for invalid LaTeX syntax and multiple citation types to ensure robust error detection.

tests/test_parsers.py (2)

10-15: LGTM! Test covers common mark formats.

The test appropriately validates extraction of marks in different bracket styles and the absence case. For additional robustness, consider edge cases like multiple marks in one string.
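A sketch of such an edge-case test, using a stand-in for the parser's mark extraction (the real test should import it from examkit.ingestion.exam_parser; the regex here is an assumption about the bracket styles):

```python
import re


# Stand-in for the parser's mark extraction; matches "[5 marks]" and
# "(10 marks)" styles. Illustrative only.
def extract_marks(text: str) -> list:
    return [int(m) for m in
            re.findall(r'[\[\(](\d+)\s*marks?[\]\)]', text, re.IGNORECASE)]


def test_multiple_marks_in_one_string():
    assert extract_marks("Q1 (a) [5 marks] (b) (10 marks)") == [5, 10]


def test_no_marks():
    assert extract_marks("Define entropy.") == []
```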


18-35: Strengthen assertions to validate complete exam structure.

The assertion len(questions) >= 1 is weak for an exam with 2 questions and sub-parts. Consider asserting len(questions) >= 2 and validating that sub-parts (a, b) are correctly parsed.

examkit/logging_utils.py (1)

16-73: Validate log_file is not a directory before creating FileHandler.

If log_file points to an existing directory rather than a file path, FileHandler(log_file) at line 64 will fail with a confusing error. Consider adding a check:

if log_file:
    if log_file.exists() and log_file.is_dir():
        raise ValueError(f"log_file must be a file path, not a directory: {log_file}")
    log_file.parent.mkdir(parents=True, exist_ok=True)
    file_handler = logging.FileHandler(log_file)
ARCHITECTURE.md (1)

400-400: Minor: Optional style improvement for Docker section.

The static analysis tool flagged "Could be containerized" as lacking a subject. While acceptable for documentation, you could optionally revise to "The application could be containerized..." for more formal writing.

tests/test_render.py (5)

12-27: LGTM! Consider expanding test coverage for other section types.

The test validates basic markdown rendering. For more comprehensive coverage, consider testing additional section types mentioned in the PR (Derivation, Examples, Common Mistakes, Quick Revision).


30-57: Consider using pytest's tmp_path fixture for cleaner temp file handling.

The manual tempfile creation and cleanup works, but pytest's tmp_path fixture provides automatic cleanup and is more idiomatic:

def test_typst_wrapper_creation(tmp_path):
    """Test Typst wrapper creation."""
    from examkit.render.typst_renderer import create_typst_wrapper_for_markdown
    
    md_content = """# Test Title
...
"""
    temp_path = tmp_path / "test.md"
    temp_path.write_text(md_content)
    
    typst_content = create_typst_wrapper_for_markdown(temp_path)
    assert "= Test Title" in typst_content
    assert "== Section 1" in typst_content
    assert "=== Subsection" in typst_content

60-81: Consider using pytest's tmp_path fixture here as well.

Similar to the previous test, using tmp_path would simplify the temp file handling:

def test_config_loading(tmp_path):
    """Test configuration loading."""
    import yaml
    
    config_data = {
        "asr": {"model": "small"},
        "llm": {"model": "llama3.2:8b"},
        "offline": True
    }
    
    temp_path = tmp_path / "config.yml"
    temp_path.write_text(yaml.dump(config_data))
    
    config = ExamKitConfig.from_yaml(temp_path)
    assert config.asr.model == "small"
    assert config.llm.model == "llama3.2:8b"
    assert config.offline is True

84-89: Strengthen Jinja environment test to verify actual functionality.

The current test only checks that setup returns a non-None value. Consider verifying the environment's template loader path, or better yet, test that it can actually load and render a template:

def test_jinja_template_setup():
    """Test Jinja2 template environment setup."""
    from examkit.render.templater import setup_jinja_environment
    
    env = setup_jinja_environment()
    assert env is not None
    
    # Verify we can list templates
    templates = env.list_templates()
    assert len(templates) > 0
    
    # Or verify loader points to correct directory
    assert env.loader is not None

92-109: Tighten the tolerance for coverage mean calculation.

Line 103 uses pytest.approx(48.33, rel=0.1), which allows 10% relative error. For exact arithmetic (145/3 = 48.333...), a much tighter tolerance such as rel=0.01, with more decimal places in the expected value, is appropriate:

assert stats["mean"] == pytest.approx(48.333, rel=0.01)
examkit/config.py (2)

81-94: Add error handling for file I/O and YAML parsing.

The method lacks error handling for common failure cases. Consider adding:

@classmethod
def from_yaml(cls, path: Path) -> "ExamKitConfig":
    """
    Load configuration from a YAML file.

    Args:
        path: Path to the YAML configuration file.

    Returns:
        ExamKitConfig instance.
        
    Raises:
        FileNotFoundError: If the config file doesn't exist.
        ValueError: If the YAML is invalid or doesn't contain a dict.
    """
    if not path.exists():
        raise FileNotFoundError(f"Config file not found: {path}")
    
    with open(path, "r") as f:
        data = yaml.safe_load(f)
    
    if not isinstance(data, dict):
        raise ValueError(f"Config file must contain a YAML dict, got {type(data)}")
    
    return cls(**data)

96-104: Add directory creation and improve error handling.

The method should ensure the parent directory exists before writing:

def to_yaml(self, path: Path) -> None:
    """
    Save configuration to a YAML file.

    Args:
        path: Path to save the YAML configuration file.
        
    Raises:
        OSError: If writing the file fails.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    
    with open(path, "w") as f:
        yaml.dump(self.model_dump(), f, default_flow_style=False, sort_keys=False)
examkit/render/templater.py (9)

15-15: Use explicit Path | None type annotation.

Per PEP 484, implicit Optional is prohibited. Update the type hint to be explicit.

Apply this diff:

-def setup_jinja_environment(templates_dir: Path = None) -> Environment:
+def setup_jinja_environment(templates_dir: Path | None = None) -> Environment:

28-32: Consider autoescape setting for template security.

While this module generates Markdown/Typst (not HTML), explicitly setting autoescape=True or using select_autoescape() is a security best practice if templates might ever include user-controlled content.

Based on learnings
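A sketch of what that could look like (the templates path is illustrative):

```python
from jinja2 import Environment, FileSystemLoader, select_autoescape

# Escape only templates whose names look like HTML/XML; Markdown and Typst
# templates (e.g. *.md.j2, *.typ) stay unescaped, so current output is
# unaffected while any future HTML template is protected by default.
env = Environment(
    loader=FileSystemLoader("config/templates"),
    autoescape=select_autoescape(
        enabled_extensions=("html", "htm", "xml"),
        default_for_string=False,
        default=False,
    ),
)
```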


37-41: Remove unused config parameter or document its intended purpose.

The config parameter is declared but never used in the function body. Either remove it or add a comment explaining why it's reserved for future use.


54-61: Remove unnecessary f-string prefixes.

Lines 56, 58, 59, and 60 use f-strings without any placeholders. Use regular strings instead for clarity and minor performance improvement.

Apply this diff:

         f"# Exam Preparation Notes - {session_id}",
-        f"",
+        "",
         f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
-        f"",
-        f"---",
-        f""
+        "",
+        "---",
+        ""

77-101: Remove unnecessary f-string prefixes in section rendering.

Lines 78, 84, 88, 92, 96, and 100 use f-strings without placeholders.

Apply this diff to the subsection headers:

         if section.get("definition"):
-            lines.append(f"### Definition\n")
+            lines.append("### Definition\n")
             lines.append(f"{section['definition']}\n")
             if citations:
                 lines.append(f"*Sources: {citations}*\n")
 
         if section.get("key_formulas"):
-            lines.append(f"### Key Formulas\n")
+            lines.append("### Key Formulas\n")
             lines.append(f"{section['key_formulas']}\n")
 
         if section.get("derivation"):
-            lines.append(f"### Derivation\n")
+            lines.append("### Derivation\n")
             lines.append(f"{section['derivation']}\n")
 
         if section.get("examples"):
-            lines.append(f"### Worked Examples\n")
+            lines.append("### Worked Examples\n")
             lines.append(f"{section['examples']}\n")
 
         if section.get("mistakes"):
-            lines.append(f"### Common Mistakes\n")
+            lines.append("### Common Mistakes\n")
             lines.append(f"{section['mistakes']}\n")
 
         if section.get("revision"):
-            lines.append(f"### Quick Revision\n")
+            lines.append("### Quick Revision\n")
             lines.append(f"{section['revision']}\n")

108-112: Remove unused config parameter or document its intended purpose.

Same issue as in render_markdown_document - the config parameter is unused.


125-133: Remove unnecessary f-string prefixes.

Lines 128, 130, and 131 use f-strings without placeholders.

Apply this diff:

         "#import \"theme.typ\": *",
         "",
-        f"#show: doc => conf(",
-        f"  title: \"Exam Notes - {session_id}\",",
-        f"  date: datetime.today().display(),",
-        f"  doc",
+        "#show: doc => conf(",
+        f"  title: \"Exam Notes - {session_id}\",",
+        "  date: datetime.today().display(),",
+        "  doc",

158-158: Use explicit Path | None type annotation.

Same issue as line 15 - use explicit optional type per PEP 484.

Apply this diff:

-def load_template(template_name: str, templates_dir: Path = None) -> Template:
+def load_template(template_name: str, templates_dir: Path | None = None) -> Template:

176-176: Use explicit Path | None type annotation.

Same PEP 484 issue as lines 15 and 158.

Apply this diff:

 def render_section_template(
     template_name: str,
     context: Dict[str, Any],
-    templates_dir: Path = None
+    templates_dir: Path | None = None
 ) -> str:
examkit/qa/checks.py (1)

13-13: Use explicit logging.Logger | None type annotation.

Per PEP 484, implicit Optional is prohibited. This applies to all logger parameters in this file.

Apply this diff pattern to lines 13, 44, 86, 119, 150, 194, 195:

-def check_formula_compilation(content: str, logger: logging.Logger = None) -> Dict[str, Any]:
+def check_formula_compilation(content: str, logger: logging.Logger | None = None) -> Dict[str, Any]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fe2ee4 and 83b53b0.

📒 Files selected for processing (70)
  • .env.example (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • .gitignore (1 hunks)
  • ARCHITECTURE.md (1 hunks)
  • CONTRIBUTING.md (1 hunks)
  • IMPLEMENTATION_SUMMARY.md (1 hunks)
  • LICENSE (1 hunks)
  • Makefile (1 hunks)
  • README.md (1 hunks)
  • config/config.yml (1 hunks)
  • config/templates/markdown/pdf_main.md.j2 (1 hunks)
  • config/templates/markdown/section.md.j2 (1 hunks)
  • config/templates/prompts/compare.j2 (1 hunks)
  • config/templates/prompts/definition.j2 (1 hunks)
  • config/templates/prompts/derivation.j2 (1 hunks)
  • config/templates/prompts/fast_revision.j2 (1 hunks)
  • config/templates/prompts/mistakes.j2 (1 hunks)
  • config/templates/typst/main.typ (1 hunks)
  • config/templates/typst/theme.typ (1 hunks)
  • examkit/__init__.py (1 hunks)
  • examkit/asr/__init__.py (1 hunks)
  • examkit/asr/whisper_runner.py (1 hunks)
  • examkit/cli.py (1 hunks)
  • examkit/config.py (1 hunks)
  • examkit/ingestion/__init__.py (1 hunks)
  • examkit/ingestion/exam_parser.py (1 hunks)
  • examkit/ingestion/ingest.py (1 hunks)
  • examkit/ingestion/ocr.py (1 hunks)
  • examkit/ingestion/slides_parser.py (1 hunks)
  • examkit/ingestion/transcript_normalizer.py (1 hunks)
  • examkit/logging_utils.py (1 hunks)
  • examkit/nlp/__init__.py (1 hunks)
  • examkit/nlp/embeddings.py (1 hunks)
  • examkit/nlp/retrieval.py (1 hunks)
  • examkit/nlp/spacy_nlp.py (1 hunks)
  • examkit/nlp/splitter.py (1 hunks)
  • examkit/nlp/topic_mapping.py (1 hunks)
  • examkit/qa/__init__.py (1 hunks)
  • examkit/qa/checks.py (1 hunks)
  • examkit/render/__init__.py (1 hunks)
  • examkit/render/pandoc_renderer.py (1 hunks)
  • examkit/render/templater.py (1 hunks)
  • examkit/render/typst_renderer.py (1 hunks)
  • examkit/reports/__init__.py (1 hunks)
  • examkit/reports/coverage.py (1 hunks)
  • examkit/reports/export.py (1 hunks)
  • examkit/synthesis/__init__.py (1 hunks)
  • examkit/synthesis/citations.py (1 hunks)
  • examkit/synthesis/composer.py (1 hunks)
  • examkit/synthesis/diagrams.py (1 hunks)
  • examkit/synthesis/ollama_client.py (1 hunks)
  • examkit/synthesis/prompts.py (1 hunks)
  • examkit/utils/__init__.py (1 hunks)
  • examkit/utils/io_utils.py (1 hunks)
  • examkit/utils/math_utils.py (1 hunks)
  • examkit/utils/text_utils.py (1 hunks)
  • examkit/utils/timecode.py (1 hunks)
  • input/sample/exam/README.md (1 hunks)
  • input/sample/manifest.json (1 hunks)
  • input/sample/slides/README.md (1 hunks)
  • input/sample/topics/exam_topics.yml (1 hunks)
  • input/sample/topics/session_topics.yml (1 hunks)
  • input/sample/transcript/sample.vtt (1 hunks)
  • input/sample/video/README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/test_ingestion.py (1 hunks)
  • tests/test_parsers.py (1 hunks)
  • tests/test_render.py (1 hunks)
  • tests/test_topic_mapping.py (1 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 3-3: Target body for "help" exceeds allowed length of 5 (9).

(maxbodylength)


[warning] 54-54: Target body for "clean" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 1-1: Missing required phony target "all"

(minphony)

🪛 dotenv-linter (4.0.0)
.env.example

[warning] 13-13: [UnorderedKey] The MAX_WORKERS key should go before the OFFLINE_MODE key

(UnorderedKey)


[warning] 18-18: [UnorderedKey] The LOGS_DIR key should go before the OUTPUT_DIR key

(UnorderedKey)

🪛 LanguageTool
CONTRIBUTING.md

[grammar] ~98-~98: Use a hyphen to join words.
Context: ...tion signatures - Docstrings: Google style docstrings for all public function...

(QB_NEW_EN_HYPHEN)

.github/PULL_REQUEST_TEMPLATE.md

[style] ~5-~5: Consider using a different verb for a more formal wording.
Context: ...mmary of the changes and which issue is fixed. ## Type of Change - [ ] Bug fix - [ ...

(FIX_RESOLVE)

ARCHITECTURE.md

[style] ~400-~400: To form a complete sentence, be sure to include a subject.
Context: ...opriate resources ### Docker (Future) Could be containerized with all dependencies ...

(MISSING_IT_THERE)

IMPLEMENTATION_SUMMARY.md

[style] ~178-~178: ‘vid’ is informal. Consider replacing it.
Context: ...ples, Mistakes, Revision ✅ - Citations: [vid hh:mm:ss], [slide N], [exam Q2b] ✅...

(VID)

🪛 markdownlint-cli2 (0.18.1)
README.md

3-3: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


259-259: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


326-326: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


336-336: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


349-349: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


355-355: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


361-361: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


407-407: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

IMPLEMENTATION_SUMMARY.md

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.3)
examkit/synthesis/composer.py

97-97: Avoid specifying long messages outside the exception class

(TRY003)


190-190: Do not catch blind exception: Exception

(BLE001)


191-191: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


205-205: Do not catch blind exception: Exception

(BLE001)


206-206: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


220-220: Do not catch blind exception: Exception

(BLE001)


221-221: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


235-235: Do not catch blind exception: Exception

(BLE001)


236-236: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


250-250: Do not catch blind exception: Exception

(BLE001)


251-251: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examkit/nlp/topic_mapping.py

35-35: Unused function argument: chunks

(ARG001)


40-40: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/asr/whisper_runner.py

21-21: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


75-75: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/utils/math_utils.py

133-133: Consider moving this statement to an else block

(TRY300)

examkit/nlp/retrieval.py

17-17: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


45-45: Unused function argument: similarity_threshold

(ARG001)

examkit/synthesis/citations.py

24-24: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/cli.py

29-37: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


38-43: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


70-70: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


75-81: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


82-87: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


127-127: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


173-173: f-string without any placeholders

Remove extraneous f prefix

(F541)


180-180: subprocess call: check for execution of untrusted input

(S603)


180-180: Starting a process with a partial executable path

(S607)


185-185: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

examkit/qa/checks.py

13-13: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


44-44: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


86-86: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


119-119: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


150-150: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


175-175: Loop control variable symbol not used within loop body

(B007)


194-194: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


195-195: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/synthesis/diagrams.py

154-154: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


169-169: Starting a process with a partial executable path

(S607)


182-182: subprocess call: check for execution of untrusted input

(S603)


183-183: Starting a process with a partial executable path

(S607)


188-188: Consider moving this statement to an else block

(TRY300)


191-191: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examkit/ingestion/ingest.py

28-28: Avoid specifying long messages outside the exception class

(TRY003)


32-32: Prefer TypeError exception for invalid type

(TRY004)


32-32: Avoid specifying long messages outside the exception class

(TRY003)


66-66: Consider moving this statement to an else block

(TRY300)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


108-108: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


118-118: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


131-131: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


144-144: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

examkit/ingestion/ocr.py

39-39: Consider moving this statement to an else block

(TRY300)


41-41: Do not catch blind exception: Exception

(BLE001)


42-42: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


68-68: Consider moving this statement to an else block

(TRY300)


70-70: Do not catch blind exception: Exception

(BLE001)


71-71: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examkit/render/pandoc_renderer.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


43-43: subprocess call: check for execution of untrusted input

(S603)


59-59: Do not catch blind exception: Exception

(BLE001)


61-61: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


74-74: Starting a process with a partial executable path

(S607)


78-78: Consider moving this statement to an else block

(TRY300)


79-79: Do not use bare except

(E722)

examkit/reports/coverage.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/render/typst_renderer.py

22-22: Starting a process with a partial executable path

(S607)


27-27: Consider moving this statement to an else block

(TRY300)


89-89: subprocess call: check for execution of untrusted input

(S603)


90-90: Starting a process with a partial executable path

(S607)


104-104: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


106-106: Do not catch blind exception: Exception

(BLE001)


107-107: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


176-176: Starting a process with a partial executable path

(S607)


178-178: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


182-182: subprocess call: check for execution of untrusted input

(S603)


183-192: Starting a process with a partial executable path

(S607)


206-206: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


208-208: Do not catch blind exception: Exception

(BLE001)


209-209: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examkit/ingestion/slides_parser.py

33-33: Local variable images_dir is assigned to but never used

Remove assignment to unused variable images_dir

(F841)


120-120: Do not catch blind exception: Exception

(BLE001)


160-160: Avoid specifying long messages outside the exception class

(TRY003)

examkit/nlp/spacy_nlp.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/synthesis/ollama_client.py

22-22: Consider moving this statement to an else block

(TRY300)


23-23: Do not use bare except

(E722)


39-39: Do not use bare except

(E722)


39-40: try-except-pass detected, consider logging the exception

(S110)


51-51: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Consider moving this statement to an else block

(TRY300)


103-103: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


104-104: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


104-104: Avoid specifying long messages outside the exception class

(TRY003)


112-112: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


128-128: Avoid specifying long messages outside the exception class

(TRY003)


151-151: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


152-152: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


152-152: Avoid specifying long messages outside the exception class

(TRY003)


155-155: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


170-170: subprocess call: check for execution of untrusted input

(S603)


171-171: Starting a process with a partial executable path

(S607)


176-176: Consider moving this statement to an else block

(TRY300)


177-177: Do not catch blind exception: Exception

(BLE001)


179-179: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examkit/nlp/splitter.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


27-27: Avoid specifying long messages outside the exception class

(TRY003)


33-33: Consider moving this statement to an else block

(TRY300)


36-36: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


58-58: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/render/templater.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


28-28: By default, jinja2 sets autoescape to False. Consider using autoescape=True or the select_autoescape function to mitigate XSS vulnerabilities.

(S701)


40-40: Unused function argument: config

(ARG001)


56-56: f-string without any placeholders

Remove extraneous f prefix

(F541)


58-58: f-string without any placeholders

Remove extraneous f prefix

(F541)


59-59: f-string without any placeholders

Remove extraneous f prefix

(F541)


60-60: f-string without any placeholders

Remove extraneous f prefix

(F541)


78-78: f-string without any placeholders

Remove extraneous f prefix

(F541)


84-84: f-string without any placeholders

Remove extraneous f prefix

(F541)


88-88: f-string without any placeholders

Remove extraneous f prefix

(F541)


92-92: f-string without any placeholders

Remove extraneous f prefix

(F541)


96-96: f-string without any placeholders

Remove extraneous f prefix

(F541)


100-100: f-string without any placeholders

Remove extraneous f prefix

(F541)


111-111: Unused function argument: config

(ARG001)


128-128: f-string without any placeholders

Remove extraneous f prefix

(F541)


130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)


131-131: f-string without any placeholders

Remove extraneous f prefix

(F541)


158-158: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


176-176: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

examkit/nlp/embeddings.py

25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


50-50: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


80-80: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


94-94: Avoid specifying long messages outside the exception class

(TRY003)


138-138: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


171-171: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (85)
tests/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear docstring.

examkit/qa/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear docstring for the QA module.

examkit/reports/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear docstring for the reports module.

examkit/synthesis/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear, descriptive docstring for the synthesis module.

examkit/asr/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear docstring for the ASR module.

examkit/render/__init__.py (1)

1-1: LGTM!

Standard package initializer with clear, descriptive docstring for the rendering module.

LICENSE (1)

1-21: LGTM!

Standard MIT License with appropriate copyright notice for the project.

.env.example (1)

1-21: LGTM! Well-organized environment template.

The configuration is logically grouped by category (Logging, Ollama, Processing, Cache), which is more maintainable than alphabetical ordering. The static analysis warnings about key ordering are pedantic; the current structure enhances readability.

examkit/nlp/__init__.py (1)

1-1: LGTM!

Clean package initializer with appropriate docstring.

input/sample/slides/README.md (1)

1-33: LGTM!

Clear documentation for sample slide structure. The content aligns well with the session topics defined in the PR.

input/sample/topics/session_topics.yml (1)

1-55: LGTM!

Well-structured topic data with consistent formatting. The keyword selections and weights are appropriate for the respective topics.

config/templates/prompts/derivation.j2 (1)

1-14: LGTM!

Clean Jinja2 template with clear instructions for derivation generation. The structured requirements ensure comprehensive output with proper citations.

.github/PULL_REQUEST_TEMPLATE.md (1)

1-33: LGTM!

Standard, comprehensive PR template. The static analysis suggestion about "fixed" vs "resolved" is overly pedantic and can be safely ignored.

examkit/__init__.py (1)

1-9: LGTM!

Clean package initialization with appropriate metadata. Version number matches pyproject.toml.

.gitignore (1)

1-68: LGTM!

Comprehensive .gitignore with well-organized sections. The PDF exception pattern (line 68) correctly preserves sample exam PDFs while ignoring generated outputs.

config/templates/prompts/fast_revision.j2 (1)

1-14: LGTM!

The template is well-structured with clear instructions for generating quick revision summaries. The format follows best practices for LLM prompting with structured context and explicit output requirements.

CONTRIBUTING.md (1)

1-177: LGTM!

The contribution guidelines are comprehensive, well-structured, and provide clear instructions for contributors. The document includes practical examples, conventional commit guidelines, and detailed development setup instructions.

Makefile (1)

1-60: LGTM!

The Makefile provides a comprehensive set of targets for development workflow automation. The commands are well-structured, use appropriate tools (Poetry, pytest, ruff, black), and include proper error handling in cleanup operations.

IMPLEMENTATION_SUMMARY.md (1)

1-373: LGTM!

The implementation summary provides a comprehensive and well-organized overview of the project's completion status. The document effectively communicates all delivered features, modules, tests, and acceptance criteria.

examkit/utils/math_utils.py (1)

9-23: Verify that the formula extraction handles edge cases correctly.

The regex patterns may produce unexpected results:

  • The inline pattern $...$ also matches the dollar delimiters inside a display block $$...$$, so display formulas can be double-counted as inline matches.
  • Nested or escaped dollar signs (\$) aren't handled.

Please verify that this basic extraction approach meets the requirements. If more robust LaTeX parsing is needed, consider using a dedicated LaTeX parser library.
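One way to avoid the double-counting, sketched with hypothetical names (the module's actual function signature may differ): extract display math first, blank out those spans, then scan the remainder for inline math.

```python
import re


def extract_formulas(content: str) -> dict:
    """Extract display math first, then remove those spans before the
    inline scan so $$...$$ blocks are not also counted as $...$ matches."""
    display = re.findall(r'\$\$(.+?)\$\$', content, flags=re.DOTALL)
    # Blank out display blocks so their delimiters can't match inline math.
    remainder = re.sub(r'\$\$.+?\$\$', ' ', content, flags=re.DOTALL)
    inline = re.findall(r'\$([^$\n]+?)\$', remainder)
    return {"display": display, "inline": inline}
```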

examkit/nlp/spacy_nlp.py (1)

136-143: Verify imperative detection logic.

Line 140 checks token.i == 0, which tests if the token is at document position 0, not sentence position 0. This means only the very first token in the entire document can trigger has_imperatives, not the first token of each sentence.

If the intent is to detect imperatives at the start of any sentence (not just the document), the logic should check sentence-relative position:

"has_imperatives": any(
    token.tag_ == "VB" and token.i == sent.start 
    for sent in doc.sents 
    for token in sent
)

Please confirm whether the current behavior is intentional or if this is a bug.

examkit/utils/io_utils.py (1)

1-124: LGTM!

The I/O utilities provide clean, well-documented wrappers around standard library operations. The functions appropriately ensure directories exist before write operations and follow consistent patterns.

examkit/ingestion/__init__.py (1)

1-1: LGTM!

Standard package initialization with appropriate docstring.

examkit/utils/__init__.py (1)

1-1: LGTM!

Standard package initialization with appropriate docstring.

input/sample/video/README.md (1)

1-20: LGTM!

Clear documentation for sample video assets with helpful ffmpeg test command. The note about .gitignore exclusion is valuable context.

config/templates/prompts/mistakes.j2 (1)

1-14: LGTM!

Well-structured Jinja2 template with clear instructions for mistake identification. The context iteration and citation requirements align with the RAG pipeline objectives.

config/templates/markdown/section.md.j2 (1)

1-43: LGTM!

Well-structured section template with appropriate conditional rendering. The sources attribution is limited to the definition block, which appears intentional given that citations are likely embedded in the section content itself per the RAG pipeline design.

README.md (1)

1-407: Excellent comprehensive documentation.

The README provides thorough coverage of features, architecture, installation, usage, and troubleshooting. Well-structured and accessible for new users.

examkit/synthesis/diagrams.py (4)

17-57: LGTM!

Clean flowchart generation with appropriate fallback when Graphviz is unavailable. Directory creation and rendering logic are correct.


60-104: LGTM!

Concept map generation follows the same clean pattern as flowchart creation. Space replacement in node IDs ensures valid Graphviz identifiers.


107-148: LGTM!

Hierarchy diagram generation is well-implemented with appropriate styling. Consistent with other diagram functions.


195-221: LGTM!

Simple and effective keyword-based diagram type detection. The heuristics are reasonable for identifying common diagram patterns.

input/sample/topics/exam_topics.yml (2)

1-5: LGTM! Clear exam metadata structure.

The exam metadata is well-structured with all necessary fields for test data.


8-24: Question ID assignments in exam_topics.yml are not processed by the application.

The codebase does not validate or use question ID assignments from the YAML topics file. topic_mapping.py maps text chunks (not questions) to topics using embeddings and calculates coverage based on chunk indices, and coverage.py works with chunk counts, not question assignments. The question IDs (Q1a, Q2a, Q2b, etc.) are metadata never referenced in any Python code. The split questions (Q2a in linear_algebra, Q2b in calculus) therefore have no downstream impact: the system maps text to topics via embeddings rather than assigning questions to topics.

Likely an incorrect or invalid review comment.

input/sample/transcript/sample.vtt (1)

1-41: LGTM! Valid WebVTT format with appropriate test content.

The transcript follows proper WebVTT formatting with sequential timestamps and content that aligns well with the exam topics defined in the project. The gaps between some segments appear intentional (representing pauses in the lecture).

tests/test_ingestion.py (4)

11-27: LGTM! Comprehensive VTT parsing test.

The test properly validates VTT parsing including segment count, text content, and timestamp conversion. The assertions cover the key aspects of the parsed output.


30-44: LGTM! Proper SRT parsing test.

The test validates SRT format parsing with appropriate assertions for segment structure and content.


46-55: LGTM! Clean plain text parsing test.

The test correctly validates paragraph-based text parsing with clear assertions.


58-74: LGTM! Good coverage of manifest validation.

The test validates both positive and negative cases for manifest validation, properly using pytest.raises for error conditions.

input/sample/exam/README.md (1)

1-27: LGTM! Clear documentation for sample exam structure.

The documentation provides a well-structured example of exam format that aligns with the exam topics and testing requirements. The note at the end gives clear guidance for creating test fixtures.

config/templates/prompts/definition.j2 (1)

1-14: LGTM! Well-structured prompt template for definitions.

The Jinja2 template properly iterates over context chunks, provides clear instructions to the LLM, and includes appropriate constraints for generating exam-focused definitions with proper citations.

config/templates/prompts/compare.j2 (1)

1-19: LGTM! Well-designed comparison prompt template.

The template correctly handles two separate topic contexts and provides clear criteria for generating meaningful comparisons. The structure supports proper citation and contextual analysis.

config/templates/markdown/pdf_main.md.j2 (1)

37-43: LGTM! Proper conditional rendering for coverage data.

The conditional check for coverage_data and the table structure are well-designed, ensuring the coverage summary only renders when data is available.

input/sample/manifest.json (1)

11-11: No action needed; the review comment is incorrect.

The script output confirms both exam_topics.yml and session_topics.yml exist in input/sample/topics/. The manifest correctly references session_topics.yml, which exists and is also referenced in examkit/synthesis/composer.py. The IMPLEMENTATION_SUMMARY.md documents both files as intentional parts of the PR. No file-not-found error would occur.

Likely an incorrect or invalid review comment.

config/config.yml (6)

1-5: LGTM! ASR defaults are appropriate for offline transcription.

The faster-whisper engine with the "small" model provides a good balance between speed and accuracy for English transcription, and enabling VAD helps filter silence.


14-17: LGTM! Embedding configuration is well-suited for offline operation.

The all-MiniLM-L6-v2 model is an excellent choice for local embeddings—compact, fast, and provides good semantic similarity for retrieval tasks.


19-21: Ensure retrieval context fits within model's window.

With max_context_tokens at 2000 and max_tokens at 900, verify that the combined ~2900 tokens (plus prompt template overhead) fits within llama3.2:8b's context window. If the model's window is 8K or larger, this should be fine.
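A quick back-of-the-envelope check (the 8192-token window and ~200-token template overhead are assumptions to be verified against the actual model and templates):

```python
# Budgets from config.yml; window and overhead are illustrative guesses.
context_window = 8192          # assumed model context length
max_context_tokens = 2000      # retrieval context budget
max_tokens = 900               # generation budget
template_overhead = 200        # hypothetical prompt-template cost

used = max_context_tokens + max_tokens + template_overhead
headroom = context_window - used
```

Under these assumptions the configuration leaves ample headroom; it only becomes tight if the runtime context window is much smaller (e.g., a 4096-token default).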


23-27: LGTM! PDF configuration is appropriate.

Typst as the primary engine is a modern, performant choice for PDF generation, and the classic theme with 11pt font provides good readability.


29-35: LGTM! Remaining configuration defaults are well-chosen.

Enabling Graphviz for diagrams, enforcing offline mode, and setting INFO-level logging align perfectly with the project's offline-first, production-grade objectives.


7-12: Verify 900-token limit aligns with typical derivation complexity in your use cases.

The max_tokens value of 900 applies per-section (derivation, worked example, etc. are generated independently), not per-topic. For standard derivations with step-by-step explanations, source citations, and logic clarifications, this budget may be tight for complex topics. Test with your target curriculum to confirm whether derivations are consistently completing within this limit, and consider increasing it if advanced topics frequently produce truncated explanations.

tests/test_topic_mapping.py (1)

12-21: LGTM! Test covers both explicit and auto-generated topic IDs.

The test appropriately validates that topics without an explicit id field receive an auto-generated identifier based on the name.

tests/test_parsers.py (2)

38-44: LGTM! Timecode conversion tests are comprehensive.

The test validates both directions of conversion and handles different time formats (with and without hours).


56-65: LGTM! Math utilities tests cover key scenarios.

The test validates extraction of both inline and display formulas, and appropriately tests both valid and invalid LaTeX syntax.

examkit/logging_utils.py (1)

76-86: LGTM! Module logger pattern follows best practices.

The function correctly creates hierarchical loggers that inherit configuration from the root "examkit" logger.

config/templates/typst/main.typ (6)

1-12: LGTM! Template structure and imports are well-organized.

The conf function signature with sensible defaults provides good flexibility for document configuration.


39-50: LGTM! Typography settings are appropriate for academic documents.

New Computer Modern font with justified paragraphs provides a professional, LaTeX-like appearance suitable for exam notes.


52-69: LGTM! Heading hierarchy is well-defined.

The font size progression (18pt → 14pt → 12pt) and weak page breaks for level 1 headings create good visual structure.


71-92: LGTM! Element styling follows academic conventions.

Code block formatting, list indentation, link colors, and equation numbering are all appropriately styled.


94-105: LGTM! Title page design is clean and professional.

Conditional rendering ensures the title page only appears when a title is provided, and the vertical spacing creates good visual balance.


107-118: LGTM! Table of contents and body layout are appropriate.

Depth 2 outline with automatic indentation and a page break before the body creates good document flow.

ARCHITECTURE.md (1)

1-424: LGTM! Architecture documentation is comprehensive and well-structured.

The documentation thoroughly covers all pipeline stages, modules, design decisions, and extension points. The ASCII diagram and detailed module descriptions provide excellent reference material.

examkit/config.py (2)

12-67: LGTM! Config classes are well-structured with appropriate type hints.

The use of Pydantic BaseModel with Literal for constrained choices (PDFConfig.engine) provides good type safety and validation.


69-80: LGTM! Main config class properly composes sub-configs.

Using Field(default_factory=...) for nested Pydantic models is the correct pattern to avoid shared mutable state.

examkit/nlp/topic_mapping.py (3)

12-31: LGTM!

The topic normalization logic is well-designed with appropriate fallbacks for missing fields.


75-108: LGTM!

Coverage calculation logic is correct with appropriate zero-division handling.


111-129: LGTM!

Simple and effective gap identification logic.

examkit/synthesis/composer.py (1)

28-64: LGTM!

Data loading logic is clean and well-structured with appropriate logging.

examkit/reports/export.py (3)

12-63: LGTM!

The report generation logic correctly aggregates coverage, citations, and QA metrics. The conditional pandas import on line 36 is appropriate for optional dependency handling.


66-104: LGTM!

Clean text formatting with appropriate sectioning and formatting.


107-116: LGTM!

Simple and effective JSON export.

examkit/utils/text_utils.py (1)

9-128: LGTM!

The text utility functions are well-implemented with appropriate use of regex for text processing. The comment on line 36 correctly notes that sentence splitting could be enhanced with spaCy for production use, which aligns with the project's NLP dependencies.

examkit/synthesis/citations.py (2)

52-101: LGTM!

Citation formatting logic correctly handles different source types and produces well-formatted citation strings. The deduplication in format_multiple_citations is appropriate for the expected citation counts.


103-151: LGTM!

Export and query methods are straightforward and correct.

examkit/nlp/embeddings.py (3)

124-140: Consider security implications of pickle deserialization.

Line 138 uses pickle.load() which can execute arbitrary code if loading untrusted data. While loading from the app's own cache is likely safe, document this assumption or add validation.

Consider adding a comment documenting the trust assumption:

     with open(metadata_path, 'rb') as f:
+        # NOTE: Loading from app-generated cache; do not use with untrusted files
         metadata = pickle.load(f)

25-103: LGTM!

Embedding and indexing functions are well-implemented with appropriate guards for optional dependencies. The flat L2 index choice on line 100 is suitable for simplicity as noted in the comment.


143-177: LGTM!

Search functionality correctly encodes the query, performs FAISS search, and enriches results with metadata and ranking.

examkit/synthesis/prompts.py (2)

8-119: LGTM!

The prompt templates are well-structured for generating exam preparation content. Each template provides clear instructions and maintains consistent citation format guidance.


122-155: LGTM!

Render functions correctly instantiate Jinja2 templates and pass through parameters. The implementation is clean and straightforward.

examkit/nlp/retrieval.py (3)

11-42: LGTM!

Context retrieval correctly constructs a composite query from topic metadata and delegates to the search function.


78-122: LGTM!

Source diversity ranking correctly interleaves chunks from different sources according to priority. The logic handles both prioritized and non-prioritized sources appropriately.


125-144: LGTM!

Confidence filtering is straightforward and correct.

examkit/qa/checks.py (4)

56-69: LGTM: Internal link checking logic is sound.

The link pattern matching and anchor generation correctly mirror the TOC generation in templater.py (line 67), ensuring consistency between link creation and validation.


99-111: LGTM: Keyword recall logic correctly handles edge cases.

The division-by-zero case is properly guarded with the ternary operator on line 110, defaulting to 100% when no keywords are required.


130-142: LGTM: Citation detection and counting logic is correct.

The regex pattern correctly extracts citation types, and the count() method accurately tallies each type in the results list.


208-228: LGTM: Check orchestration logic correctly aggregates results.

The all() expression with .get("passed", True) properly handles checks that may not have a "passed" field (like check_citation_presence), and the overall status aggregation is sound.

Comment on lines +14 to +37
set page(
paper: "a4",
margin: (x: 1in, y: 1in),
header: [
#set text(size: 9pt)
#grid(
columns: (1fr, 1fr),
align: (left, right),
[#title],
[Exam Preparation Notes]
)
#line(length: 100%, stroke: 0.5pt)
],
footer: [
#line(length: 100%, stroke: 0.5pt)
#set text(size: 9pt)
#grid(
columns: (1fr, 1fr),
align: (left, right),
[#date],
[Page #counter(page).display("1 of 1", both: true)]
)
]
)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify page counter format is correct.

Line 34 uses counter(page).display("1 of 1", both: true). The format string "1 of 1" seems incorrect—it should likely be "1" or a pattern that displays current/total pages correctly. Verify this syntax against Typst documentation.


🌐 Web query:

What is the correct syntax for displaying page numbers with total pages in Typst counter.display function?

💡 Result:

Use the page counter's display with a numbering pattern and set both: true. Example:

#counter(page).display("1 / 1", both: true)

This displays current and total pages (pattern can be e.g. "1/1", "1 of I", etc.). [1][2]


Fix page counter format on line 34.

The format string "1 of 1" is hardcoded and will always display literally as "1 of 1" regardless of the current page or total pages. Use a numbering pattern like "1 / 1" instead, which properly displays current and total pages when both: true.

Change:

[Page #counter(page).display("1 of 1", both: true)]

To:

[Page #counter(page).display("1 / 1", both: true)]
🤖 Prompt for AI Agents
In config/templates/typst/main.typ around lines 14 to 37, the page footer
currently uses a hardcoded format string "1 of 1" which will always render
literally; replace the format passed to counter(page).display with a dynamic
numbering pattern such as "1 / 1" so the current and total page numbers render
correctly when both: true is used — update the Footer grid entry to call
counter(page).display with "1 / 1" instead of "1 of 1".

Comment on lines +68 to +83
question_match = re.match(r'(?:Question\s+)?(\d+)[.)]', line, re.IGNORECASE)
if question_match:
if current_question:
questions.append(current_question)

question_num = question_match.group(1)
current_question = {
"source": "exam",
"section": current_section,
"question_id": f"Q{question_num}",
"question_number": int(question_num),
"text": line,
"parts": [],
"marks": extract_marks(line)
}
current_part = None

⚠️ Potential issue | 🟠 Major

Broaden question detection to cover “Q1” headings.

parse_exam_structure advertises support for “Question 1, Q1, 1.” but the regex demands a trailing . or ); plain “Q1”/“Q12” (very common in exam PDFs) will slip through and their text will be glued onto the prior question instead of starting a new entry. Please relax the pattern so it also matches Q1, Q1:, etc., while keeping the existing cases.

-        question_match = re.match(r'(?:Question\s+)?(\d+)[.)]', line, re.IGNORECASE)
+        question_match = re.match(
+            r'^(?:Question\s+|Q\s*)?(\d+)(?:[:.)\s]|$)',
+            line,
+            re.IGNORECASE,
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
question_match = re.match(r'(?:Question\s+)?(\d+)[.)]', line, re.IGNORECASE)
if question_match:
if current_question:
questions.append(current_question)
question_num = question_match.group(1)
current_question = {
"source": "exam",
"section": current_section,
"question_id": f"Q{question_num}",
"question_number": int(question_num),
"text": line,
"parts": [],
"marks": extract_marks(line)
}
current_part = None
question_match = re.match(
r'^(?:Question\s+|Q\s*)?(\d+)(?:[:.)\s]|$)',
line,
re.IGNORECASE,
)
if question_match:
if current_question:
questions.append(current_question)
question_num = question_match.group(1)
current_question = {
"source": "exam",
"section": current_section,
"question_id": f"Q{question_num}",
"question_number": int(question_num),
"text": line,
"parts": [],
"marks": extract_marks(line)
}
current_part = None

Comment on lines +75 to +103
def preprocess_image_for_ocr(image_path: Path, output_path: Path) -> Path:
"""
Preprocess image to improve OCR accuracy.

Args:
image_path: Path to input image.
output_path: Path for preprocessed image.

Returns:
Path to preprocessed image.
"""
if not TESSERACT_AVAILABLE:
return image_path

from PIL import ImageEnhance, ImageFilter

# Open and convert to grayscale
image = Image.open(image_path).convert('L')

# Enhance contrast
enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(2.0)

# Sharpen
image = image.filter(ImageFilter.SHARPEN)

# Save preprocessed image
image.save(output_path)
return output_path

⚠️ Potential issue | 🟠 Major

Add error handling for image preprocessing operations.

The function lacks error handling for PIL operations, which could fail due to file I/O issues, corrupt images, or unsupported formats. This could cause the function to crash rather than gracefully degrading.

Apply this diff:

     from PIL import ImageEnhance, ImageFilter
 
-    # Open and convert to grayscale
-    image = Image.open(image_path).convert('L')
-
-    # Enhance contrast
-    enhancer = ImageEnhance.Contrast(image)
-    image = enhancer.enhance(2.0)
-
-    # Sharpen
-    image = image.filter(ImageFilter.SHARPEN)
-
-    # Save preprocessed image
-    image.save(output_path)
-    return output_path
+    try:
+        # Open and convert to grayscale
+        image = Image.open(image_path).convert('L')
+
+        # Enhance contrast
+        enhancer = ImageEnhance.Contrast(image)
+        image = enhancer.enhance(2.0)
+
+        # Sharpen
+        image = image.filter(ImageFilter.SHARPEN)
+
+        # Save preprocessed image
+        image.save(output_path)
+        return output_path
+    except Exception as e:
+        # Log error and return original path as fallback
+        import logging
+        logging.getLogger(__name__).error(f"Image preprocessing failed for {image_path}: {e}")
+        return image_path
🤖 Prompt for AI Agents
In examkit/ingestion/ocr.py around lines 75 to 103, the PIL preprocessing steps
(opening, converting, enhancing, filtering, and saving) have no error handling;
wrap the image operations in a try/except block (use with Image.open(...) as
image to ensure closure) and catch OSError and PIL.UnidentifiedImageError (or a
broad Exception as fallback), ensure output_path.parent.exists() or create it
before saving, log the error (or raise a descriptive exception) and on failure
return the original image_path so OCR can continue gracefully; also keep the
early return when TESSERACT_AVAILABLE is False.

Comment on lines +33 to +68
images_dir = ensure_dir(cache_dir / "slide_images")

for slide_num, slide in enumerate(prs.slides, start=1):
slide_info = {
"source": "slides",
"type": "pptx",
"slide_number": slide_num,
"title": "",
"content": [],
"notes": "",
"images": []
}

# Extract title
if slide.shapes.title:
slide_info["title"] = slide.shapes.title.text

# Extract text content
for shape in slide.shapes:
if hasattr(shape, "text") and shape.text:
text = shape.text.strip()
if text and text != slide_info["title"]:
slide_info["content"].append(text)

# Extract notes
if slide.has_notes_slide:
notes_slide = slide.notes_slide
if notes_slide.notes_text_frame:
slide_info["notes"] = notes_slide.notes_text_frame.text

# Extract images (basic - just note their presence)
for shape in slide.shapes:
if shape.shape_type == 13: # Picture type
image_name = f"slide_{slide_num}_img_{len(slide_info['images'])}.png"
slide_info["images"].append(image_name)


⚠️ Potential issue | 🟠 Major

Save actual slide images instead of placeholder names.

Both PPTX and PDF branches currently emit fabricated filenames like slide_1_img_0.png / img_42 without ever writing the underlying image data. Downstream rendering and QA stages will therefore try to include diagrams that were never materialized, causing broken links in the generated study packs. Please persist the extracted blobs into cache_dir/slide_images (and return the real paths) when you encounter picture shapes or embedded PDF images.

@@
-import fitz  # PyMuPDF
-from PIL import Image
-from pptx import Presentation
+import fitz  # PyMuPDF
+from pptx import Presentation
+from pptx.enum.shapes import MSO_SHAPE_TYPE
@@
-    images_dir = ensure_dir(cache_dir / "slide_images")
+    images_dir = ensure_dir(cache_dir / "slide_images")
@@
-        for shape in slide.shapes:
-            if shape.shape_type == 13:  # Picture type
-                image_name = f"slide_{slide_num}_img_{len(slide_info['images'])}.png"
-                slide_info["images"].append(image_name)
+        for shape in slide.shapes:
+            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
+                image = shape.image
+                ext = image.ext or "png"
+                image_path = images_dir / f"slide_{slide_num}_img_{len(slide_info['images'])}.{ext}"
+                with open(image_path, "wb") as outfile:
+                    outfile.write(image.blob)
+                slide_info["images"].append(str(image_path))
@@
-    doc = fitz.open(str(path))
-    slides_data = []
+    doc = fitz.open(str(path))
+    slides_data = []
+    images_dir = ensure_dir(cache_dir / "slide_images")
@@
-        image_list = page.get_images()
-        slide_info["images"] = [f"img_{img[0]}" for img in image_list]
+        image_list = page.get_images(full=True)
+        for img_index, img in enumerate(image_list):
+            xref = img[0]
+            pix = fitz.Pixmap(doc, xref)
+            try:
+                image_path = images_dir / f"slide_{page_num + 1}_img_{img_index}.png"
+                pix.save(str(image_path))
+                slide_info["images"].append(str(image_path))
+            finally:
+                pix = None

Also applies to: 131-133

🧰 Tools
🪛 Ruff (0.14.3)

33-33: Local variable images_dir is assigned to but never used

Remove assignment to unused variable images_dir

(F841)

Comment on lines +45 to +75
def deduplicate_chunks(chunks: List[Dict[str, Any]], similarity_threshold: float = 0.95) -> List[Dict[str, Any]]:
"""
Remove duplicate or highly similar chunks.

Args:
chunks: List of chunks.
similarity_threshold: Threshold for considering chunks as duplicates.

Returns:
Deduplicated chunks.
"""
if not chunks:
return []

unique_chunks = [chunks[0]]

for chunk in chunks[1:]:
text = chunk.get("text", "")
is_duplicate = False

for unique_chunk in unique_chunks:
unique_text = unique_chunk.get("text", "")
# Simple similarity check based on text overlap
if text == unique_text:
is_duplicate = True
break

if not is_duplicate:
unique_chunks.append(chunk)

return unique_chunks

⚠️ Potential issue | 🟡 Minor

Unused similarity_threshold parameter - implement or remove.

The function declares a similarity_threshold parameter (line 45) but only performs exact text matching (line 68), ignoring the threshold. The docstring mentions "highly similar chunks," suggesting semantic similarity was intended.

Either implement similarity-based deduplication or simplify the API:

-def deduplicate_chunks(chunks: List[Dict[str, Any]], similarity_threshold: float = 0.95) -> List[Dict[str, Any]]:
+def deduplicate_chunks(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
     """
-    Remove duplicate or highly similar chunks.
+    Remove duplicate chunks based on exact text matching.

     Args:
         chunks: List of chunks.
-        similarity_threshold: Threshold for considering chunks as duplicates.

     Returns:
         Deduplicated chunks.
🧰 Tools
🪛 Ruff (0.14.3)

45-45: Unused function argument: similarity_threshold

(ARG001)

🤖 Prompt for AI Agents
In examkit/nlp/retrieval.py around lines 45 to 75, the similarity_threshold
parameter is declared but unused and the function only does exact text equality;
either implement a similarity-based check using the threshold (e.g., compute a
normalized string similarity or cosine similarity over embeddings and treat
chunks as duplicates when similarity >= similarity_threshold) and document which
similarity metric is used, or remove the similarity_threshold parameter and
update the docstring and signature to reflect exact-match deduplication; ensure
tests and callers are updated accordingly.
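One way to honor the threshold without pulling in the embedding stack is a cheap string-similarity pass (difflib ratio as a stand-in for cosine similarity over embeddings, sketched here under that assumption):

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List

def deduplicate_chunks(chunks: List[Dict[str, Any]],
                       similarity_threshold: float = 0.95) -> List[Dict[str, Any]]:
    """Drop chunks whose text is near-identical to an already-kept chunk.

    difflib's ratio is used as a cheap proxy for semantic similarity;
    swap in embedding cosine similarity for production use.
    """
    unique: List[Dict[str, Any]] = []
    for chunk in chunks:
        text = chunk.get("text", "")
        is_duplicate = any(
            SequenceMatcher(None, text, kept.get("text", "")).ratio()
            >= similarity_threshold
            for kept in unique
        )
        if not is_duplicate:
            unique.append(chunk)
    return unique
```

Note the quadratic pairwise comparison is fine at expected chunk counts but would need blocking or an index for large corpora.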

Comment on lines +175 to +192
# Write Mermaid code to temp file
temp_file = output_path.with_suffix('.mmd')
with open(temp_file, 'w') as f:
f.write(mermaid_code)

# Generate diagram
try:
subprocess.run(
["mmdc", "-i", str(temp_file), "-o", str(output_path)],
capture_output=True,
check=True
)
temp_file.unlink() # Clean up temp file
return True
except subprocess.CalledProcessError as e:
if logger:
logger.error(f"Failed to generate Mermaid diagram: {e}")
return False

⚠️ Potential issue | 🟠 Major

Ensure temp file cleanup on subprocess failure.

The temp file is only deleted in the success path (line 187). If the subprocess fails, the .mmd file will remain on disk.

Apply this diff to ensure cleanup:

     # Write Mermaid code to temp file
     temp_file = output_path.with_suffix('.mmd')
-    with open(temp_file, 'w') as f:
-        f.write(mermaid_code)
-
-    # Generate diagram
     try:
+        with open(temp_file, 'w') as f:
+            f.write(mermaid_code)
+
+        # Generate diagram
         subprocess.run(
             ["mmdc", "-i", str(temp_file), "-o", str(output_path)],
             capture_output=True,
             check=True
         )
-        temp_file.unlink()  # Clean up temp file
         return True
     except subprocess.CalledProcessError as e:
         if logger:
-            logger.error(f"Failed to generate Mermaid diagram: {e}")
+            logger.exception("Failed to generate Mermaid diagram")
         return False
+    finally:
+        # Always clean up temp file
+        if temp_file.exists():
+            temp_file.unlink()
🧰 Tools
🪛 Ruff (0.14.3)

182-182: subprocess call: check for execution of untrusted input

(S603)


183-183: Starting a process with a partial executable path

(S607)


188-188: Consider moving this statement to an else block

(TRY300)


191-191: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In examkit/synthesis/diagrams.py around lines 175 to 192, the temp .mmd file is
only removed on the success path so it will remain if subprocess.run raises
CalledProcessError; modify the flow to always attempt to unlink the temp file by
moving temp_file.unlink() into a finally block (or call unlink() in both success
and except paths) so the temporary file is removed regardless of subprocess
outcome, and keep logging and the boolean return behavior intact.

Comment on lines +20 to +24
try:
response = requests.get("http://localhost:11434/api/tags", timeout=2)
return response.status_code == 200
except:
return False

⚠️ Potential issue | 🟠 Major

Replace bare except with specific exception handling.

The bare except clause catches all exceptions including system exits and keyboard interrupts, which can mask serious issues and make debugging difficult.

Apply this diff:

     try:
         response = requests.get("http://localhost:11434/api/tags", timeout=2)
         return response.status_code == 200
-    except:
+    except (requests.exceptions.RequestException, OSError):
         return False
📝 Committable suggestion


Suggested change
try:
response = requests.get("http://localhost:11434/api/tags", timeout=2)
return response.status_code == 200
except:
return False
try:
response = requests.get("http://localhost:11434/api/tags", timeout=2)
return response.status_code == 200
except (requests.exceptions.RequestException, OSError):
return False
🧰 Tools
🪛 Ruff (0.14.3)

22-22: Consider moving this statement to an else block

(TRY300)


23-23: Do not use bare except

(E722)

🤖 Prompt for AI Agents
In examkit/synthesis/ollama_client.py around lines 20 to 24, replace the bare
except with targeted handling for HTTP/request-related errors: catch
requests.exceptions.RequestException (or subclass like Timeout, ConnectionError)
instead of a bare except, e.g. use "except
requests.exceptions.RequestException:" (or "except
requests.exceptions.RequestException as e:" if you want to log the error) and
return False; this ensures only network/request errors are swallowed while
allowing system exceptions to propagate.

Comment on lines +34 to +41
try:
response = requests.get("http://localhost:11434/api/tags", timeout=5)
if response.status_code == 200:
data = response.json()
return [model["name"] for model in data.get("models", [])]
except:
pass
return []

⚠️ Potential issue | 🟠 Major

Replace bare except with specific exception handling and add logging.

The bare except silently suppresses all exceptions. Consider catching specific exceptions and logging failures for debugging.

Apply this diff:

     try:
         response = requests.get("http://localhost:11434/api/tags", timeout=5)
         if response.status_code == 200:
             data = response.json()
             return [model["name"] for model in data.get("models", [])]
-    except:
-        pass
+    except requests.exceptions.RequestException:
+        # Ollama not available or connection failed
+        pass
     return []
🧰 Tools
🪛 Ruff (0.14.3)

39-39: Do not use bare except

(E722)


39-40: try-except-pass detected, consider logging the exception

(S110)
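A standalone sketch of the logged variant suggested by S110 (the function name and logger setup are illustrative assumptions, not the examkit implementation):

```python
import logging

import requests

logger = logging.getLogger(__name__)


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return model names from a local Ollama server, or [] on failure."""
    try:
        response = requests.get(f"{base_url}/api/tags", timeout=5)
        if response.status_code == 200:
            data = response.json()
            return [model["name"] for model in data.get("models", [])]
    except requests.exceptions.RequestException as exc:
        # S110: record the failure instead of silently passing
        logger.debug("Ollama not reachable: %s", exc)
    return []
```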

Comment on lines +10 to +33
    [tool.poetry.dependencies]
    python = ">=3.11,<4.0"
    typer = {extras = ["all"], version = "^0.9.0"}
    rich = "^13.7.0"
    pydantic = "^2.5.0"
    pyyaml = "^6.0.1"
    tqdm = "^4.66.1"
    faster-whisper = "^0.10.0"
    ffmpeg-python = "^0.2.0"
    pymupdf = "^1.23.8"
    pdfminer-six = "^20221105"
    python-pptx = "^0.6.23"
    pytesseract = "^0.3.10"
    Pillow = "^10.1.0"
    sentence-transformers = "^2.2.2"
    faiss-cpu = "^1.7.4"
    spacy = "^3.7.2"
    matplotlib = "^3.8.2"
    pandas = "^2.1.4"
    numpy = "^1.26.2"
    scikit-learn = "^1.3.2"
    jinja2 = "^3.1.2"
    requests = "^2.31.0"
    graphviz = "^0.20.1"
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify dependency versions for security and compatibility.

Several dependencies are pinned to specific minor versions from 2023-2024. Ensure these versions are still maintained and free from known vulnerabilities, especially for security-sensitive packages like requests, Pillow, and PyMuPDF.

Run the following script to check for security advisories:


🏁 Script executed:

#!/bin/bash
# Description: Check for security advisories on key dependencies

echo "Checking security advisories for key dependencies..."

# Check requests
gh api graphql -f query='
{
  securityVulnerabilities(first: 3, ecosystem: PIP, package: "requests") {
    nodes {
      advisory { summary severity publishedAt }
      vulnerableVersionRange
      firstPatchedVersion { identifier }
    }
  }
}'

# Check Pillow
gh api graphql -f query='
{
  securityVulnerabilities(first: 3, ecosystem: PIP, package: "Pillow") {
    nodes {
      advisory { summary severity publishedAt }
      vulnerableVersionRange
      firstPatchedVersion { identifier }
    }
  }
}'

# Check PyMuPDF
gh api graphql -f query='
{
  securityVulnerabilities(first: 3, ecosystem: PIP, package: "PyMuPDF") {
    nodes {
      advisory { summary severity publishedAt }
      vulnerableVersionRange
      firstPatchedVersion { identifier }
    }
  }
}'

Length of output: 1556


Update the requests dependency to fix an active security vulnerability.

The project pins requests = "^2.31.0", which is affected by a credential-leak vulnerability published in June 2025 (CVE affecting < 2.32.4). Update to at least "^2.32.4" to include the required security patch.

Additionally, the current range excludes the fix for a session verification bypass (< 2.32.0, May 2024). Updating to "^2.32.4" addresses both vulnerabilities.

Pillow and PyMuPDF have no currently known vulnerabilities at their pinned versions.

🤖 Prompt for AI Agents
In pyproject.toml around lines 10 to 33, the pinned requests dependency
(^2.31.0) is vulnerable and must be updated; change the requests entry to use at
least version 2.32.4 (e.g. "^2.32.4") so the CVE fixes for credential leak and
session verification bypass are included, then run poetry lock/install to update
the lockfile and CI to ensure the new version is used.
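Under that recommendation, the pyproject.toml entry would read (a sketch of the suggested change, not a verified-compatible pin):

```toml
[tool.poetry.dependencies]
# Raised from ^2.31.0 per the security advisory above
requests = "^2.32.4"
```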

Comment on lines +47 to +53
    def test_text_cleaning():
        """Test text utilities."""
        from examkit.utils.text_utils import clean_text, normalize_whitespace

        dirty_text = " Hello world \n\n "
        assert clean_text(dirty_text) == "Hello world"
        assert normalize_whitespace(dirty_text) == "Hello world"
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify functional differences between clean_text and normalize_whitespace.

Both functions appear to produce identical output in this test. If they serve different purposes (e.g., one removes special chars, the other only handles whitespace), consider tests that demonstrate those differences.


🏁 Script executed:

#!/bin/bash
# Description: Check the implementation differences between clean_text and normalize_whitespace

# Find the text_utils module and examine both functions
ast-grep --pattern $'def clean_text($$$) {
  $$$
}'

ast-grep --pattern $'def normalize_whitespace($$$) {
  $$$
}'

# Look for their implementations
rg -A 10 "^def clean_text" examkit/utils/text_utils.py
rg -A 10 "^def normalize_whitespace" examkit/utils/text_utils.py

Length of output: 406



---

🏁 Script executed:

#!/bin/bash
# Get the full implementation of both functions
# First, find the file location
fd --type f text_utils.py

# Then view enough context to see complete implementations
rg -A 20 "^def clean_text"
rg -A 15 "^def normalize_whitespace"

Length of output: 2006


Test does not demonstrate the functional differences between clean_text and normalize_whitespace.

clean_text removes excessive whitespace, control characters ([\x00-\x1f\x7f-\x9f]), and strips edges, while normalize_whitespace only normalizes whitespace via ' '.join(text.split()). The test input contains no control characters, so both functions produce identical output. Consider adding a test case with control characters to verify clean_text properly removes them, or separate the assertions to document their distinct responsibilities.

🤖 Prompt for AI Agents
In tests/test_parsers.py around lines 47 to 53, the test uses input without
control characters so clean_text and normalize_whitespace behave identically;
update the test to include a separate case with control characters (e.g. include
'\x00' or '\x07' in the dirty input) and assert that clean_text removes those
control characters while normalize_whitespace does not (or preserves them),
and/or split the current assertions into two focused cases so each function's
distinct behavior is verified and documented.
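A sketch of such a differentiating test, assuming implementations matching the descriptions above (control-character regex removal in clean_text, whitespace-only ' '.join(text.split()) in normalize_whitespace); the inlined helpers are illustrative stand-ins for examkit.utils.text_utils:

```python
import re


def normalize_whitespace(text: str) -> str:
    # Collapses runs of whitespace to single spaces and strips edges
    return " ".join(text.split())


def clean_text(text: str) -> str:
    # Removes control characters, then normalizes whitespace
    text = re.sub(r"[\x00-\x1f\x7f-\x9f]", "", text)
    return " ".join(text.split())


dirty = "  Hello\x00 world \x07 \n\n"
assert clean_text(dirty) == "Hello world"
# normalize_whitespace leaves the control characters in place
assert normalize_whitespace(dirty) == "Hello\x00 world \x07"
```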

@thecoder8890 thecoder8890 merged commit 8233866 into main Nov 10, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

Master Build: Production-Grade Python "ExamKit" Project Generator (macOS, Offline, OSS)

2 participants