Production-grade exam preparation toolkit for macOS - Offline, Local-Only Processing
ExamKit is a comprehensive Python application that transforms lecture materials (videos, transcripts, slides, exam papers) into exam-ready study notes with citations, formulas, and coverage reports.
- Multi-Source Ingestion: Process videos, transcripts (VTT/SRT), slides (PPTX/PDF), and exam papers
- Offline ASR: Transcribe audio using faster-whisper (no cloud APIs)
- Local LLM: Generate content using Ollama (llama3.2:8b) running locally
- RAG Pipeline: Semantic search with sentence-transformers and FAISS
- Structured Output: Generate PDF study notes with definitions, derivations, examples, and common mistakes
- Citation Tracking: Every paragraph cites sources (video timecodes, slide numbers, exam questions)
- Coverage Analysis: Track which topics are covered by your materials
- Quality Assurance: Automated checks for formulas, links, and citations
- Beautiful PDFs: Typst or Pandoc rendering with customizable themes
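To make the citation-tracking idea concrete, here is a minimal sketch of how a paragraph might be tagged with its sources. The `Citation` class, `format_timecode`, and `cite` are hypothetical names chosen for illustration; they are not taken from ExamKit's code, and the bracketed tag format is an assumption.

```python
from dataclasses import dataclass


def format_timecode(seconds: float) -> str:
    """Render a non-negative offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record (not ExamKit's actual data model)."""
    source: str     # "video", "slide", or "exam"
    locator: float  # seconds for video; slide or question number otherwise

    def render(self) -> str:
        if self.source == "video":
            return f"[video {format_timecode(self.locator)}]"
        if self.source == "slide":
            return f"[slide {int(self.locator)}]"
        if self.source == "exam":
            return f"[exam Q{int(self.locator)}]"
        raise ValueError(f"unknown source type: {self.source}")


def cite(paragraph: str, citations: list[Citation]) -> str:
    """Append rendered citation tags to a paragraph of notes."""
    tags = " ".join(c.render() for c in citations)
    return f"{paragraph} {tags}" if tags else paragraph
```

Calling `cite("Big-O bounds growth.", [Citation("video", 754.4), Citation("slide", 12)])` would yield the paragraph followed by one tag per source.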
┌───────────────────┐
│   Input Sources   │
│   Video, Slides,  │
│   Transcripts,    │
│   Exam Papers     │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Ingestion     │
│ - FFmpeg Audio    │
│ - OCR (Tesseract) │
│ - Text Parsing    │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│   NLP Pipeline    │
│ - Chunking        │
│ - Embeddings      │
│ - FAISS Index     │
│ - Topic Mapping   │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Synthesis     │
│ - RAG Retrieval   │
│ - LLM (Ollama)    │
│ - Citations       │
│ - Diagrams        │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Rendering     │
│ - Markdown        │
│ - Typst/Pandoc    │
│ - PDF Output      │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│      Outputs      │
│  PDF, Citations,  │
│  Coverage Report  │
└───────────────────┘
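The chunking, embedding, and retrieval stages in the diagram can be sketched with the standard library alone. This is a toy stand-in, not ExamKit's implementation: a bag-of-words vector replaces the sentence-transformers embedding, a brute-force cosine scan replaces the FAISS index, and the function names and chunk sizes are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would use a sentence embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 8) -> list[tuple[float, str]]:
    """Brute-force nearest-neighbour search, standing in for a vector index lookup."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The retrieved chunks would then be passed to the LLM as context. The default `top_k=8` mirrors the `retrieval.top_k` value in the sample configuration.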
- macOS (Apple Silicon or Intel)
- Python 3.11+
- Homebrew (for system dependencies)
Install via Homebrew:
# Core tools
brew install ffmpeg tesseract graphviz typst
# Ollama (for local LLM)
brew install ollama
# Start Ollama service
ollama serve &
# Pull the default model
ollama pull llama3.2:8b

git clone https://github.com/thecoder8890/exam-kit.git
cd exam-kit

Using Poetry (recommended):
# Install Poetry if not already installed
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Download spaCy model
poetry run python -m spacy download en_core_web_sm

Using pip:
# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -e .
# Download spaCy model
python -m spacy download en_core_web_sm

# Using Make
make install-system-deps # See installation instructions
make setup # Install Python deps
# Test the CLI
poetry run examkit --help

# 1. Prepare your manifest (see input/sample/manifest.json)
# 2. Ingest and preprocess materials
poetry run examkit ingest --manifest input/sample/manifest.json
# 3. Build exam notes
poetry run examkit build --config config/config.yml --out out/exam_notes.pdf --offline
# 4. View coverage report
poetry run examkit report --session demo --open

Process input files and prepare them for synthesis.
poetry run examkit ingest \
  --manifest path/to/manifest.json \
  --cache cache/ \
  --log-level INFO

Manifest Format:
{
  "session_id": "lec05",
  "course": "Computer Science 101",
  "inputs": {
    "video": "input/lecture05.mp4",
    "transcript": "input/lecture05.vtt",
    "slides": "input/slides05.pptx",
    "exam": "input/exam_2024.pdf",
    "topics": "input/topics.yml"
  }
}

Generate exam-ready PDF from processed inputs.
poetry run examkit build \
  --config config/config.yml \
  --out out/lecture05.pdf \
  --session lec05 \
  --offline

Generate coverage and QA report.
poetry run examkit report \
  --session lec05 \
  --open  # Open coverage CSV after generation

Clear cached files.
poetry run examkit cache clear

Edit config/config.yml to customize behavior:
asr:
  engine: faster-whisper
  model: small  # tiny, base, small, medium, large
  language: en
  vad: true
llm:
  engine: ollama
  model: llama3.2:8b
  temperature: 0.2
  max_tokens: 900
  system_prompt: "You create exam-ready, cited study notes..."
embedding:
  model: all-MiniLM-L6-v2
  dim: 384
  batch_size: 32
retrieval:
  top_k: 8
  max_context_tokens: 2000
pdf:
  engine: typst  # or pandoc
  theme: classic
  font_size: 11
  include_appendix: true
offline: true

examkit/
├── examkit/                 # Main package
│   ├── cli.py               # Typer CLI
│   ├── config.py            # Pydantic config models
│   ├── utils/               # Utilities (I/O, text, math, timecode)
│   ├── ingestion/           # File parsing (video, slides, exam)
│   ├── asr/                 # Audio transcription (faster-whisper)
│   ├── nlp/                 # NLP (embeddings, RAG, topic mapping)
│   ├── synthesis/           # LLM generation (Ollama)
│   ├── render/              # PDF rendering (Typst/Pandoc)
│   ├── qa/                  # Quality checks
│   └── reports/             # Coverage and export
├── config/                  # Configuration and templates
│   ├── config.yml
│   └── templates/
│       ├── typst/           # Typst templates
│       ├── markdown/        # Markdown templates
│       └── prompts/         # LLM prompts
├── input/                   # Input files
│   └── sample/              # Sample data for testing
├── tests/                   # pytest tests
├── pyproject.toml           # Poetry dependencies
├── Makefile                 # Build automation
└── README.md
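The tree lists config.py as holding Pydantic config models. As a rough illustration of what validating values from config/config.yml could involve, the sketch below uses standard-library dataclasses as a stand-in for Pydantic so that it runs without extra dependencies. The class names, the `load_config` helper, and the validation rules are assumptions, not ExamKit's actual code; only the keys and defaults come from the sample configuration.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalConfig:
    top_k: int = 8
    max_context_tokens: int = 2000

    def __post_init__(self) -> None:
        if self.top_k <= 0:
            raise ValueError("retrieval.top_k must be positive")
        if self.max_context_tokens <= 0:
            raise ValueError("retrieval.max_context_tokens must be positive")


@dataclass
class PdfConfig:
    engine: str = "typst"
    theme: str = "classic"
    font_size: int = 11
    include_appendix: bool = True

    def __post_init__(self) -> None:
        # The sample config names typst and pandoc as the two engines.
        if self.engine not in ("typst", "pandoc"):
            raise ValueError(f"pdf.engine must be 'typst' or 'pandoc', got {self.engine!r}")


@dataclass
class Config:
    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)
    pdf: PdfConfig = field(default_factory=PdfConfig)
    offline: bool = True  # assumed here to be a top-level key


def load_config(raw: dict) -> Config:
    """Build a Config from an already-parsed mapping, such as a YAML loader's output."""
    return Config(
        retrieval=RetrievalConfig(**raw.get("retrieval", {})),
        pdf=PdfConfig(**raw.get("pdf", {})),
        offline=bool(raw.get("offline", True)),
    )
```

Missing sections fall back to the defaults, so `load_config({"pdf": {"engine": "pandoc"}})` changes only the rendering engine. An unsupported engine name raises `ValueError` at load time rather than failing later during rendering.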
Run tests with pytest:
# Run all tests
make test
# Or directly with poetry
poetry run pytest -v
# With coverage
poetry run pytest --cov=examkit --cov-report=html

# Format code
make format
# Lint code
make lint
# Type checking (if configured)
poetry run mypy examkit/

make build-demo

1. Typst Not Found
# Install Typst
brew install typst
# Verify installation
typst --version

2. Ollama Not Running
# Start Ollama service
ollama serve &
# Check if model is available
ollama list
# Pull model if missing
ollama pull llama3.2:8b

3. spaCy Model Missing
poetry run python -m spacy download en_core_web_sm

4. OCR Confidence Low
- Increase image resolution in slides parser
- Use --model medium or --model large for faster-whisper
- Preprocess images with higher DPI
5. Memory Issues
- Reduce embedding.batch_size in config
- Use smaller Whisper model (tiny, base)
- Process fewer chunks at a time
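Several of the issues above come down to a missing executable or package, so a small preflight script can catch them before a long run. The sketch below is an assumption-laden illustration, not part of ExamKit: the function names are hypothetical, and it only checks that the tools are on PATH and that the spaCy model package is importable. It does not verify that the Ollama service is running or that a model has been pulled.

```python
import importlib.util
import shutil

# Executables installed by the Homebrew steps in this README.
REQUIRED_TOOLS = ("ffmpeg", "tesseract", "typst", "ollama")
# spaCy models install as regular Python packages under this name.
SPACY_MODEL = "en_core_web_sm"


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def spacy_model_installed(name: str = SPACY_MODEL) -> bool:
    """Return True if the model package can be located by the import system."""
    return importlib.util.find_spec(name) is not None


def preflight() -> list[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = [
        f"{tool} not found on PATH (try: brew install {tool})"
        for tool in missing_tools()
    ]
    if not spacy_model_installed():
        problems.append(
            f"spaCy model {SPACY_MODEL} missing "
            f"(try: python -m spacy download {SPACY_MODEL})"
        )
    return problems
```

Printing each entry of `preflight()` before running `examkit ingest` gives a quick summary of what still needs installing.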
MIT License - see LICENSE file.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run make test and make lint
- Submit a pull request
If you use ExamKit in your research or project, please cite:
@software{examkit2024,
  title = {ExamKit: Production-Grade Exam Preparation Toolkit},
  author = {ExamKit Contributors},
  year = {2024},
  url = {https://github.com/thecoder8890/exam-kit}
}

Built with:
Made with ❤️ for students preparing for exams