
ExamKit

Production-grade exam preparation toolkit for macOS - Offline, Local-Only Processing

ExamKit is a comprehensive Python application that transforms lecture materials (videos, transcripts, slides, exam papers) into exam-ready study notes with citations, formulas, and coverage reports.

✨ Features

  • 🎥 Multi-Source Ingestion: Process videos, transcripts (VTT/SRT), slides (PPTX/PDF), and exam papers
  • 🗣️ Offline ASR: Transcribe audio using faster-whisper (no cloud APIs)
  • 🧠 Local LLM: Generate content using Ollama (llama3.2:8b) running locally
  • 📊 RAG Pipeline: Semantic search with sentence-transformers and FAISS
  • 📖 Structured Output: Generate PDF study notes with definitions, derivations, examples, and common mistakes
  • 🔍 Citation Tracking: Every paragraph cites sources (video timecodes, slide numbers, exam questions)
  • 📈 Coverage Analysis: Track which topics are covered by your materials
  • ✅ Quality Assurance: Automated checks for formulas, links, and citations
  • 🎨 Beautiful PDFs: Typst or Pandoc rendering with customizable themes
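As an illustration of the citation tracking and QA checks, here is a minimal sketch that flags paragraphs with no source citation. It assumes a hypothetical inline marker syntax ([vid HH:MM:SS], [slide N], [exam QN]); ExamKit's actual citation format and QA code may look different.

```python
import re

# Hypothetical citation markers; adapt the pattern to ExamKit's real output format.
CITATION_RE = re.compile(r"\[(?:vid \d{2}:\d{2}:\d{2}|slide \d+|exam Q\d+)\]")


def uncited_paragraphs(markdown: str) -> list[int]:
    """Return 0-based indices of non-empty paragraphs that contain no citation marker."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    return [i for i, p in enumerate(paragraphs) if not CITATION_RE.search(p)]


notes = (
    "Entropy measures uncertainty. [vid 00:12:34]\n\n"
    "The update rule subtracts the scaled gradient.\n\n"
    "See the worked example. [slide 7]"
)
print(uncited_paragraphs(notes))  # [1]
```

A check like this only proves a marker is present; verifying that the cited timecode or slide actually supports the paragraph needs the retrieval context.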

🏗️ Architecture

┌───────────────────┐
│   Input Sources   │
│ Video, Slides,    │
│ Transcripts,      │
│ Exam Papers       │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Ingestion     │
│ - FFmpeg Audio    │
│ - OCR (Tesseract) │
│ - Text Parsing    │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│   NLP Pipeline    │
│ - Chunking        │
│ - Embeddings      │
│ - FAISS Index     │
│ - Topic Mapping   │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Synthesis     │
│ - RAG Retrieval   │
│ - LLM (Ollama)    │
│ - Citations       │
│ - Diagrams        │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│     Rendering     │
│ - Markdown        │
│ - Typst/Pandoc    │
│ - PDF Output      │
└─────────┬─────────┘
          │
          v
┌───────────────────┐
│      Outputs      │
│ PDF, Citations,   │
│ Coverage Report   │
└───────────────────┘
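The Chunking step of an NLP pipeline like this one typically splits long transcripts into overlapping windows before embedding them. The sketch below is a generic word-window chunker, not ExamKit's own implementation; the window and overlap sizes are hypothetical, and words stand in for tokens.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; each window repeats the last
    `overlap` words of the previous one so boundary sentences stay retrievable."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

The overlap trades a little index size for recall: a definition split across two windows still appears whole in at least one of them when the overlap is longer than the definition.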

📋 Prerequisites

System Requirements

  • macOS (Apple Silicon or Intel)
  • Python 3.11+
  • Homebrew (for system dependencies)

System Dependencies

Install via Homebrew:

# Core tools
brew install ffmpeg tesseract graphviz typst

# Ollama (for local LLM)
brew install ollama

# Start Ollama service
ollama serve &

# Pull the default model
ollama pull llama3.2:8b

🚀 Installation

1. Clone the Repository

git clone https://github.com/thecoder8890/exam-kit.git
cd exam-kit

2. Set Up Python Environment

Using Poetry (recommended):

# Install Poetry if not already installed
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Download spaCy model
poetry run python -m spacy download en_core_web_sm

Using pip:

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -e .

# Download spaCy model
python -m spacy download en_core_web_sm

3. Verify Installation

# Using Make
make install-system-deps  # See installation instructions
make setup                # Install Python deps

# Test the CLI
poetry run examkit --help
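If the CLI starts but a later step fails, a missing system tool is a common cause. Below is a minimal stdlib sketch that checks for the Homebrew tools and the spaCy model listed above (Graphviz is detected through its `dot` executable). The function names are hypothetical and not part of ExamKit's CLI.

```python
import importlib.util
import shutil

# Executables installed by the Homebrew commands in Prerequisites (graphviz provides `dot`).
REQUIRED_TOOLS = ("ffmpeg", "tesseract", "dot", "typst", "ollama")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def spacy_model_installed(name: str = "en_core_web_sm") -> bool:
    """spaCy models install as importable packages, so look the package up by name."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("spaCy model installed:", spacy_model_installed())
```

Run it inside the Poetry environment (`poetry run python check_deps.py`, with a file name of your choosing) so the spaCy lookup sees the same packages as ExamKit.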

📖 Usage

Quick Start

# 1. Prepare your manifest (see input/sample/manifest.json)
# 2. Ingest and preprocess materials
poetry run examkit ingest --manifest input/sample/manifest.json

# 3. Build exam notes
poetry run examkit build --config config/config.yml --out out/exam_notes.pdf --offline

# 4. View coverage report
poetry run examkit report --session demo --open

CLI Commands

examkit ingest

Process input files and prepare them for synthesis.

poetry run examkit ingest \
  --manifest path/to/manifest.json \
  --cache cache/ \
  --log-level INFO

Manifest Format:

{
  "session_id": "lec05",
  "course": "Computer Science 101",
  "inputs": {
    "video": "input/lecture05.mp4",
    "transcript": "input/lecture05.vtt",
    "slides": "input/slides05.pptx",
    "exam": "input/exam_2024.pdf",
    "topics": "input/topics.yml"
  }
}
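Before running `ingest`, it can help to validate a manifest against the format above. Here is a minimal sketch. The required keys and input types come from the example; treating every input as optional and resolving paths relative to the working directory are assumptions.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("session_id", "course", "inputs")
KNOWN_INPUTS = {"video", "transcript", "slides", "exam", "topics"}


def validate_manifest(path: str, base_dir: str = ".") -> list[str]:
    """Return a list of problems found in the manifest; an empty list means it looks valid."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    for kind, rel_path in manifest.get("inputs", {}).items():
        if kind not in KNOWN_INPUTS:
            problems.append(f"unknown input type: {kind}")
        elif not (Path(base_dir) / rel_path).is_file():
            problems.append(f"{kind} file not found: {rel_path}")
    return problems
```

For example, `validate_manifest("input/sample/manifest.json")` run from the repository root lists any missing keys, unexpected input types, or input paths that do not exist.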

examkit build

Generate an exam-ready PDF from processed inputs.

poetry run examkit build \
  --config config/config.yml \
  --out out/lecture05.pdf \
  --session lec05 \
  --offline

examkit report

Generate a coverage and QA report.

poetry run examkit report \
  --session lec05 \
  --open  # Open coverage CSV after generation
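Conceptually, the coverage report answers one question: which topics does at least one chunk of the material mention? The sketch below uses plain keyword matching and writes the CSV with the standard library. ExamKit's topic mapping lives in its NLP pipeline and may use embeddings instead, and the topic-to-keywords structure assumed for `topics.yml` is hypothetical.

```python
import csv
import io


def coverage_rows(topics: dict[str, list[str]], chunks: list[str]) -> list[dict]:
    """For each topic, count chunks mentioning any of its keywords (case-insensitive)."""
    lowered = [chunk.lower() for chunk in chunks]
    rows = []
    for topic, keywords in topics.items():
        hits = sum(any(k.lower() in chunk for k in keywords) for chunk in lowered)
        rows.append({"topic": topic, "chunks": hits, "covered": hits > 0})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize coverage rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["topic", "chunks", "covered"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


topics = {"Entropy": ["entropy"], "Graphs": ["graph", "vertex"], "Sorting": ["quicksort"]}
chunks = ["Entropy measures uncertainty.", "A vertex set V and edge set E."]
print(to_csv(coverage_rows(topics, chunks)))
```

Rows with `covered` set to False are the gaps worth closing before an exam, for instance by adding the slides or past papers that cover those topics to the manifest.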

examkit cache clear

Clear cached files.

poetry run examkit cache clear

⚙️ Configuration

Edit config/config.yml to customize behavior:

asr:
  engine: faster-whisper
  model: small  # tiny, base, small, medium, large
  language: en
  vad: true

llm:
  engine: ollama
  model: llama3.2:8b
  temperature: 0.2
  max_tokens: 900
  system_prompt: "You create exam-ready, cited study notes..."

embedding:
  model: all-MiniLM-L6-v2
  dim: 384
  batch_size: 32

retrieval:
  top_k: 8
  max_context_tokens: 2000

pdf:
  engine: typst  # or pandoc
  theme: classic
  font_size: 11
  include_appendix: true

offline: true
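The `retrieval` settings work together in three steps. Chunks are ranked by similarity to the query, at most `top_k` are kept, and chunks stop being added before the context exceeds `max_context_tokens`. This pure-Python sketch uses a brute-force cosine search where ExamKit uses FAISS, and it approximates tokens with whitespace-separated words. `select_context` is a hypothetical name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(query_vec: list[float], chunks: list[tuple[str, list[float]]],
                   top_k: int = 8, max_context_tokens: int = 2000) -> list[str]:
    """Return up to top_k chunk texts, best match first, within the token budget."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    context, used = [], 0
    for text, _vector in ranked[:top_k]:
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost > max_context_tokens:
            break  # the next chunk would overflow the budget
        context.append(text)
        used += cost
    return context


chunks = [("alpha beta", [1.0, 0.0]), ("gamma", [0.0, 1.0]), ("delta epsilon zeta", [0.7, 0.7])]
print(select_context([1.0, 0.0], chunks, top_k=2))  # ['alpha beta', 'delta epsilon zeta']
```

With L2-normalized embeddings, a FAISS inner-product index returns the same ranking as this cosine search, only much faster on large corpora.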

📁 Project Structure

examkit/
├── examkit/              # Main package
│   ├── cli.py           # Typer CLI
│   ├── config.py        # Pydantic config models
│   ├── utils/           # Utilities (I/O, text, math, timecode)
│   ├── ingestion/       # File parsing (video, slides, exam)
│   ├── asr/             # Audio transcription (faster-whisper)
│   ├── nlp/             # NLP (embeddings, RAG, topic mapping)
│   ├── synthesis/       # LLM generation (Ollama)
│   ├── render/          # PDF rendering (Typst/Pandoc)
│   ├── qa/              # Quality checks
│   └── reports/         # Coverage and export
├── config/              # Configuration and templates
│   ├── config.yml
│   └── templates/
│       ├── typst/       # Typst templates
│       ├── markdown/    # Markdown templates
│       └── prompts/     # LLM prompts
├── input/               # Input files
│   └── sample/          # Sample data for testing
├── tests/               # pytest tests
├── pyproject.toml       # Poetry dependencies
├── Makefile             # Build automation
└── README.md

🧪 Testing

Run tests with pytest:

# Run all tests
make test

# Or directly with poetry
poetry run pytest -v

# With coverage
poetry run pytest --cov=examkit --cov-report=html
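pytest collects `test_*` functions from `test_*.py` files under `tests/` and reports failing plain `assert` statements with full context. Below is a hypothetical example of such a file for a timecode helper. The project structure lists timecode utilities under `examkit/utils/`, but the helper's name and behaviour here are assumptions, and it is defined inline so the file runs on its own.

```python
# tests/test_timecode.py (hypothetical). In the real suite the helper would be
# imported from the examkit package rather than defined here.


def seconds_to_timecode(seconds: float) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS, dropping fractions."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def test_formats_hours_minutes_seconds():
    cases = [(0, "00:00:00"), (75.9, "00:01:15"), (3661, "01:01:01")]
    for seconds, expected in cases:
        assert seconds_to_timecode(seconds) == expected


def test_rejects_negative_offsets():
    try:
        seconds_to_timecode(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative offset")
```

Run a single file with `poetry run pytest tests/test_timecode.py -v`.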

🔧 Development

Code Quality

# Format code
make format

# Lint code
make lint

# Type checking (if configured)
poetry run mypy examkit/

Building the Demo

make build-demo

🐛 Troubleshooting

Common Issues

1. Typst Not Found

# Install Typst
brew install typst

# Verify installation
typst --version

2. Ollama Not Running

# Start Ollama service
ollama serve &

# Check if model is available
ollama list

# Pull model if missing
ollama pull llama3.2:8b
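To script the checks above, use Ollama's `GET /api/tags` endpoint, which lists the models available locally. Below is a minimal stdlib sketch. `model_available` is a hypothetical helper. A model pulled without an explicit tag is listed as `name:latest`, so compare against the full tag.

```python
import json
import urllib.error
import urllib.request


def model_available(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama server answers and lists `model` locally."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as response:
            tags = json.loads(response.read())
    except (urllib.error.URLError, OSError, ValueError):
        return False  # server not running, unreachable, or returned non-JSON
    return any(entry.get("name") == model for entry in tags.get("models", []))
```

A False result means either the server is down (run `ollama serve &`) or the model still needs `ollama pull`.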

3. spaCy Model Missing

poetry run python -m spacy download en_core_web_sm

4. Low OCR Confidence or Poor Transcription

  • Increase image resolution in the slides parser
  • For poor transcripts, use a larger faster-whisper model by setting asr.model to medium or large in config/config.yml
  • Preprocess images at a higher DPI
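To find out which slides are affected, you can have Tesseract report per-word confidence. `tesseract slide.png stdout tsv` prints a tab-separated table whose `conf` column holds a 0-100 score per word, with -1 on page, block, and line rows. The sketch below averages that score. The 60-point threshold in the commented example is arbitrary and is not an ExamKit setting.

```python
import csv
import io


def mean_word_confidence(tsv: str) -> float:
    """Average Tesseract confidence over recognized words; 0.0 if there are none."""
    # QUOTE_NONE keeps quote characters in OCR text from confusing the parser.
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    scores = [
        float(row["conf"])
        for row in reader
        if (row.get("text") or "").strip() and float(row["conf"]) >= 0
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Example (requires the tesseract binary and an image file):
#   import subprocess
#   tsv = subprocess.run(["tesseract", "slide.png", "stdout", "tsv"],
#                        capture_output=True, text=True, check=True).stdout
#   if mean_word_confidence(tsv) < 60:  # arbitrary threshold
#       print("low OCR confidence: re-export the slide at a higher DPI")
```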

5. Memory Issues

  • Reduce embedding.batch_size in config
  • Use a smaller Whisper model (tiny or base)
  • Process fewer chunks at a time
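Lowering `embedding.batch_size` caps how many chunks are embedded at once. The same idea applies to processing fewer chunks at a time: stream them in fixed-size batches instead of building one large list. The sketch below is generic; on Python 3.12+, `itertools.batched` does the same but yields tuples. Feeding each batch to an embedding model, for example sentence-transformers' `SentenceTransformer.encode`, is left as a comment because ExamKit's internal wiring is not documented here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 32) -> Iterator[list[T]]:
    """Yield successive lists of at most batch_size items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# for batch in batched(chunk_texts, batch_size=8):
#     vectors = model.encode(batch)  # hypothetical wiring: embed, store, then discard
print(list(batched(["a", "b", "c", "d", "e"], 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```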

📝 License

MIT License - see LICENSE file.

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run make test and make lint
  5. Submit a pull request

📚 Citation

If you use ExamKit in your research or project, please cite:

@software{examkit2024,
  title = {ExamKit: Production-Grade Exam Preparation Toolkit},
  author = {ExamKit Contributors},
  year = {2024},
  url = {https://github.com/thecoder8890/exam-kit}
}

🙏 Acknowledgments

Built with:


Made with ❤️ for students preparing for exams
