tuttinator/pelajari

Pelajari

A content pipeline and multi-client delivery system for audio-narrated micro-courses. It turns authored Markdown/YAML course content into enriched metadata, structured summaries, and TTS audio, then serves it to an Expo mobile app and a SvelteKit web preview through a Cloudflare Worker backed by R2.

Note: This is a prototype extracted for open-source release. The course content under resources/ is gitignored and not included; clone the repo and supply your own content tree under resources/<course>/ to run the pipeline end to end.

Architecture

A monorepo with five cooperating parts plus a content tree:

  • resources/ — authoritative course content. Each course is a directory containing units.yaml and per-unit sub-directories (unit-d/, unit-e/, …). Each unit dir holds intro.md, topics.yaml, numbered lesson files (01.md, 02.md, …), and generated artefacts (##-summary.md, ##.tts_script.txt, ##-summary.mp3, ##.segments.json, intro.enriched.json).
  • *.ipynb + pyproject.toml — the Python pipeline. Notebooks enrich metadata, fetch Wistia transcripts, generate summaries, and produce ElevenLabs (or OpenAI) TTS audio, writing generated files back into resources/:
    • 01_metadata_enrichment.ipynb — enrich unit metadata (topics & learning objectives) via an LLM.
    • 02_transcripts_and_assets.ipynb — fetch transcripts, produce structured summaries, TTS scripts, and audio.
    • 03_script_phrase_analysis.ipynb — analysis over generated scripts.
  • scripts/ — TypeScript tooling. content-loader.ts is the canonical normaliser from resources/ to CourseData (the source of truth for the data shapes); sync-resources-to-r2.ts uploads data/courses.json + audio to R2.
  • cloudflare/ — a Wrangler Worker (pelajari-api) serving GET /courses, GET /courses/:id, and GET /audio/:courseId/:unitId/:lessonId from the ASSETS_BUCKET R2 binding. It rewrites lesson.audioKey into an audioUrl pointing back at /audio/... and caches /courses.
  • mobile/ — an Expo / expo-router app. Streams from the Worker by default (EXPO_PUBLIC_API_BASE_URL, fallback http://127.0.0.1:8787). An optional offline path (npm run generate:data) bundles resources/ into the app.
  • webapp/ — a SvelteKit preview site that reads ../resources directly from disk (no R2 dependency). Dev-oriented; not deployable as-is.
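The Worker's audioKey-to-audioUrl rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the Lesson shape, the id field, the withAudioUrl name, and the choice to keep audioKey alongside audioUrl are all assumptions; only audioKey, audioUrl, and the /audio/ route come from this README.

```typescript
// Hypothetical minimal lesson shape; only audioKey and audioUrl are named in the README.
interface Lesson {
  id: string;
  audioKey?: string;
  audioUrl?: string;
}

// Return a copy of the lesson with audioUrl pointing at the Worker's /audio/ route.
// Lessons without an audioKey are returned unchanged.
function withAudioUrl(lesson: Lesson, origin: string): Lesson {
  if (!lesson.audioKey) return lesson;
  const base = origin.replace(/\/+$/, ""); // drop trailing slashes from the origin
  return { ...lesson, audioUrl: `${base}/audio/${lesson.audioKey}` };
}
```

A client such as the mobile app would then only need the audioUrl field and never has to know the R2 key layout.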

Data-flow contract

The CourseData / UnitData / LessonData shapes are duplicated in four places and must stay aligned: scripts/content-loader.ts, cloudflare/src/types.ts, mobile/data/types.ts, webapp/src/lib/types.ts. scripts/content-loader.ts is the source of truth — mirror any shape change there first.

Audio-key convention: ${courseId}/${unitId}/${lessonId} maps to the R2 object key audio/${courseId}/${unitId}/${lessonId}-summary.mp3. The Worker reconstructs that path from the URL segments; changing the prefix requires updating both AUDIO_PREFIX in cloudflare/wrangler.toml and audioPrefix in the sync script.
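As an illustration of the convention, the mapping between URL segments and the R2 object key might look like the sketch below. The function names, the AudioRef type, and the default prefix value "audio" are assumptions made for this example; the real prefix is whatever AUDIO_PREFIX and audioPrefix are set to.

```typescript
// Assumed default for illustration; the real value is configured via AUDIO_PREFIX / audioPrefix.
const DEFAULT_AUDIO_PREFIX = "audio";

interface AudioRef {
  courseId: string;
  unitId: string;
  lessonId: string;
}

// Build the R2 object key for a lesson's summary audio.
function audioObjectKey(ref: AudioRef, prefix: string = DEFAULT_AUDIO_PREFIX): string {
  return `${prefix}/${ref.courseId}/${ref.unitId}/${ref.lessonId}-summary.mp3`;
}

// Parse "/audio/:courseId/:unitId/:lessonId" into its parts; returns null if the path does not match.
function parseAudioPath(pathname: string): AudioRef | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts.length !== 4 || parts[0] !== "audio") return null;
  const [, courseId, unitId, lessonId] = parts;
  return { courseId, unitId, lessonId };
}
```

Keeping both directions in one place makes it harder for the Worker and the sync script to drift apart when the prefix changes.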

Prerequisites

  • Node 24 and Python 3.12 (both pinned in mise.toml; run mise install to match, or install manually).
  • uv for Python dependency management.
  • An OpenAI API key (required). An ElevenLabs API key is optional — audio synthesis falls back to OpenAI TTS.
  • A Cloudflare account + R2 bucket if you want to run the serving path (sync:r2, Worker).

Setup

mise install              # Node 24 + Python 3.12 (or install manually)
uv sync                   # create the Python venv for the notebooks
cp .env.example .env      # then fill in your keys
# add course content under resources/<course>/  (not shipped with the repo)
npm install               # at the repo root, and in cloudflare/, mobile/, webapp/

Environment variables

Copy .env.example to .env and fill in:

| Variable | Used by | Notes |
| --- | --- | --- |
| OPENAI_API_KEY | notebooks | Required. Metadata enrichment, summarization, TTS scripts. |
| OPENAI_MODEL | notebooks | Default gpt-4o-mini. |
| ELEVENLABS_API_KEY | notebooks | Optional. Falls back to OpenAI TTS. |
| ELEVENLABS_MODEL | notebooks | Default eleven_multilingual_v2. |
| RESOURCES_ROOT | notebooks, sync | Default ./resources. |
| R2_ACCOUNT_ID / R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY / R2_BUCKET_NAME | npm run sync:r2 | Required to push data + audio to R2. |
| CORS_ALLOW_ORIGINS | Worker | Comma-separated allow-list; unset means *. |
| EXPO_PUBLIC_API_BASE_URL | mobile | Worker base URL; fallback http://127.0.0.1:8787. |
| PUBLIC_AUDIO_BASE_URL | webapp | Optional override for the audio host. |
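To make the CORS_ALLOW_ORIGINS semantics concrete, here is one way a comma-separated allow-list with a wildcard default could be interpreted. This is a sketch under assumptions, not the Worker's actual logic: both function names are hypothetical, and treating an empty string the same as unset is a choice made for this example.

```typescript
// Parse the raw variable into an allow-list, or "*" when it is unset (or empty, by assumption).
function parseAllowedOrigins(raw: string | undefined): string[] | "*" {
  if (raw === undefined || raw.trim() === "") return "*";
  return raw
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

// Decide the Access-Control-Allow-Origin value for a request, or null to omit the header.
function corsOriginFor(requestOrigin: string | null, raw: string | undefined): string | null {
  const allowed = parseAllowedOrigins(raw);
  if (allowed === "*") return "*";
  return requestOrigin !== null && allowed.includes(requestOrigin) ? requestOrigin : null;
}
```

For example, with CORS_ALLOW_ORIGINS set to "https://a.example, https://b.example", a request from https://b.example would be echoed back and any other origin would get no CORS header.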

Common commands

Root:

  • uv sync — create the Python venv for running notebooks.
  • npm run sync:r2 — normalise resources/ and upload the manifest + audio to R2.
  • npm run generate:mobile — proxy to the mobile generate:data script (offline-bundle path).

Per workspace (run from that directory):

  • cloudflare/: npm run dev (wrangler dev against the real bucket), npm run deploy, npm run check (tsc --noEmit).
  • mobile/: npm run start / ios / android / web, npm run lint, npm run generate:data.
  • webapp/: npm run dev, npm run build, npm run check (svelte-kit sync + svelte-check).

Authoring course content

Each course is a directory under resources/ with:

  • units.yaml — the list of units.
  • Unit directories (e.g. unit-d, unit-e) containing:
    • intro.md — unit introduction with frontmatter.
    • ##.md — numbered lesson files with kind: video frontmatter.
    • Generated outputs: ##.segments.json, ##-summary.md, ##.tts_script.txt, ##-summary.mp3, intro.enriched.json.

The notebooks mutate resources/ in place. Treat the generated files as outputs — regenerate rather than hand-edit, then re-run npm run sync:r2 to propagate.
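Before syncing, it can help to see which lessons still lack generated outputs. The sketch below works on a plain list of file names from one unit directory; the helper name, the assumption that ## means a run of digits, and the exact suffix list (taken from the generated outputs listed above) are illustrative choices, not part of the repository.

```typescript
// Per-lesson generated artefacts, as listed in this README.
const GENERATED_SUFFIXES = [".segments.json", "-summary.md", ".tts_script.txt", "-summary.mp3"];

// Map each lesson ID (e.g. "01") to the generated files missing from the unit directory.
// Lessons with all outputs present are omitted from the result.
function missingOutputs(fileNames: string[]): Map<string, string[]> {
  const present = new Set(fileNames);
  const result = new Map<string, string[]>();
  for (const name of fileNames) {
    const match = /^(\d+)\.md$/.exec(name); // lesson sources only, e.g. "01.md"
    if (!match) continue;
    const lessonId = match[1];
    const missing = GENERATED_SUFFIXES
      .map((suffix) => `${lessonId}${suffix}`)
      .filter((file) => !present.has(file));
    if (missing.length > 0) result.set(lessonId, missing);
  }
  return result;
}
```

A non-empty result suggests re-running the relevant notebook for those lessons rather than hand-editing the outputs.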

Development notes

  • Linting/formatting: Biome at the repo root (biome.json, tab indent, useConst disabled). The webapp has its own biome.json with .svelte overrides.
  • There is no configured test runner.

License

See LICENSE.
