A content pipeline and multi-client delivery system for audio-narrated micro-courses. It turns authored Markdown/YAML course content into enriched metadata, structured summaries, and TTS audio, then serves it to an Expo mobile app and a SvelteKit web preview through a Cloudflare Worker backed by R2.
Note: This is a prototype extracted for open-source. The course content under `resources/` is not included (it is gitignored); clone the repo and supply your own content tree under `resources/<course>/` to run the pipeline end to end.
A monorepo with four cooperating parts plus a content tree:
- `resources/`: authoritative course content. Each course is a directory containing `units.yaml` and per-unit sub-directories (`unit-d/`, `unit-e/`, …). Each unit dir holds `intro.md`, `topics.yaml`, numbered lesson files (`01.md`, `02.md`, …), and generated artefacts (`##-summary.md`, `##.tts_script.txt`, `##-summary.mp3`, `##.segments.json`, `intro.enriched.json`).
- `*.ipynb` + `pyproject.toml`: the Python pipeline. Notebooks enrich metadata, fetch Wistia transcripts, generate summaries, and produce ElevenLabs (or OpenAI) TTS audio, writing generated files back into `resources/`:
  - `01_metadata_enrichment.ipynb`: enrich unit metadata (topics & learning objectives) via an LLM.
  - `02_transcripts_and_assets.ipynb`: fetch transcripts, produce structured summaries, TTS scripts, and audio.
  - `03_script_phrase_analysis.ipynb`: analysis over generated scripts.
- `scripts/`: TypeScript tooling. `content-loader.ts` is the canonical normaliser from `resources/` → `CourseData` (the source of truth for the data shapes); `sync-resources-to-r2.ts` uploads `data/courses.json` + audio to R2.
- `cloudflare/`: a Wrangler Worker (`pelajari-api`) serving `GET /courses`, `GET /courses/:id`, and `GET /audio/:courseId/:unitId/:lessonId` from the `ASSETS_BUCKET` R2 binding. It rewrites `lesson.audioKey` into an `audioUrl` pointing back at `/audio/...` and caches `/courses`.
- `mobile/`: an Expo / expo-router app. Streams from the Worker by default (`EXPO_PUBLIC_API_BASE_URL`, fallback `http://127.0.0.1:8787`). An optional offline path (`npm run generate:data`) bundles `resources/` into the app.
- `webapp/`: a SvelteKit preview site that reads `../resources` directly from disk (no R2 dependency). Dev-oriented; not deployable as-is.
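The Worker's `audioKey` to `audioUrl` rewrite could look roughly like the sketch below. This is not the repo's actual code: the `LessonWithKey` and `LessonWithUrl` interfaces and the `withAudioUrl` helper are hypothetical names, and the real lesson shape lives in `scripts/content-loader.ts`.

```typescript
// Hypothetical minimal lesson shape, for illustration only.
interface LessonWithKey {
  id: string;
  audioKey?: string; // assumed form: "courseId/unitId/lessonId"
}

interface LessonWithUrl extends LessonWithKey {
  audioUrl?: string;
}

// Build a URL on the Worker's /audio route from the lesson's audioKey.
// Lessons without an audioKey are returned unchanged (as a copy).
function withAudioUrl(lesson: LessonWithKey, workerOrigin: string): LessonWithUrl {
  if (!lesson.audioKey) {
    return { ...lesson };
  }
  // Drop trailing slashes so the joined URL has no "//".
  const origin = workerOrigin.replace(/\/+$/, "");
  // Encode each segment separately so the "/" separators survive.
  const path = lesson.audioKey
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return { ...lesson, audioUrl: `${origin}/audio/${path}` };
}
```

For example, a lesson whose `audioKey` is `course-a/unit-d/01` served from `http://127.0.0.1:8787` would get an `audioUrl` of `http://127.0.0.1:8787/audio/course-a/unit-d/01`.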
The CourseData / UnitData / LessonData shapes are duplicated in four places and must stay aligned: scripts/content-loader.ts, cloudflare/src/types.ts, mobile/data/types.ts, webapp/src/lib/types.ts. scripts/content-loader.ts is the source of truth — mirror any shape change there first.
Audio-key convention: ${courseId}/${unitId}/${lessonId} maps to the R2 object key audio/${courseId}/${unitId}/${lessonId}-summary.mp3. The Worker reconstructs that path from the URL segments; changing the prefix requires updating both AUDIO_PREFIX in cloudflare/wrangler.toml and audioPrefix in the sync script.
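As a minimal sketch of this convention (the function names are hypothetical, and the default prefix value `audio` is assumed from the object key shown above rather than read from `wrangler.toml`):

```typescript
// Map the three identifiers to the R2 object key described above.
// The prefix default is an assumption; the real value is AUDIO_PREFIX / audioPrefix.
function audioObjectKey(
  courseId: string,
  unitId: string,
  lessonId: string,
  prefix: string = "audio",
): string {
  // Tolerate prefixes written as "audio", "audio/" or "/audio".
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, "");
  return `${cleanPrefix}/${courseId}/${unitId}/${lessonId}-summary.mp3`;
}

// Rebuild the object key from a request path such as /audio/course-a/unit-d/01.
// Returns null when the path does not have exactly the expected segments.
function keyFromPathname(pathname: string, prefix: string = "audio"): string | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  if (segments.length !== 4 || segments[0] !== "audio") {
    return null;
  }
  const courseId = decodeURIComponent(segments[1]);
  const unitId = decodeURIComponent(segments[2]);
  const lessonId = decodeURIComponent(segments[3]);
  return audioObjectKey(courseId, unitId, lessonId, prefix);
}
```

Keeping both directions in one helper pair makes it clearer why the Worker and the sync script have to agree on the prefix: one side writes the key, the other reconstructs it.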
- Node 24 and Python 3.12 (both pinned in `mise.toml`; run `mise install` to match, or install manually).
- uv for Python dependency management.
- An OpenAI API key (required). An ElevenLabs API key is optional — audio synthesis falls back to OpenAI TTS.
- A Cloudflare account + R2 bucket if you want to run the serving path (`sync:r2`, Worker).
```bash
mise install            # Node 24 + Python 3.12 (or install manually)
uv sync                 # create the Python venv for the notebooks
cp .env.example .env    # then fill in your keys
# add course content under resources/<course>/ (not shipped with the repo)
npm install             # at the repo root, and in cloudflare/, mobile/, webapp/
```

Copy `.env.example` to `.env` and fill in:
| Variable | Used by | Notes |
|---|---|---|
| `OPENAI_API_KEY` | notebooks | Required. Metadata enrichment, summarization, TTS scripts. |
| `OPENAI_MODEL` | notebooks | Default `gpt-4o-mini`. |
| `ELEVENLABS_API_KEY` | notebooks | Optional. Falls back to OpenAI TTS. |
| `ELEVENLABS_MODEL` | notebooks | Default `eleven_multilingual_v2`. |
| `RESOURCES_ROOT` | notebooks, sync | Default `./resources`. |
| `R2_ACCOUNT_ID` / `R2_ACCESS_KEY_ID` / `R2_SECRET_ACCESS_KEY` / `R2_BUCKET_NAME` | `npm run sync:r2` | Required to push data + audio to R2. |
| `CORS_ALLOW_ORIGINS` | Worker | Comma-separated allow-list; unset means `*`. |
| `EXPO_PUBLIC_API_BASE_URL` | mobile | Worker base URL; fallback `http://127.0.0.1:8787`. |
| `PUBLIC_AUDIO_BASE_URL` | webapp | Optional override for the audio host. |
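To illustrate how a comma-separated `CORS_ALLOW_ORIGINS` value might be interpreted, here is a small sketch. The `resolveAllowOrigin` helper is hypothetical and not taken from the Worker source; treating an empty string the same as an unset variable is also an assumption.

```typescript
// Decide the Access-Control-Allow-Origin value for a request.
// Returns "*" when no allow-list is configured, the request's origin when it
// is listed, and null when the origin is not allowed.
function resolveAllowOrigin(
  allowList: string | undefined,
  requestOrigin: string | null,
): string | null {
  // Unset (or, by assumption, blank) means every origin is allowed.
  if (allowList === undefined || allowList.trim() === "") {
    return "*";
  }
  const allowed = allowList
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  if (allowed.includes("*")) {
    return "*";
  }
  if (requestOrigin !== null && allowed.includes(requestOrigin)) {
    return requestOrigin;
  }
  return null;
}
```

With `CORS_ALLOW_ORIGINS="http://localhost:5173, https://example.com"`, a request from `https://example.com` would be echoed back as the allowed origin, while any other origin would get no CORS header in this sketch.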
Root:
- `uv sync`: create the Python venv for running notebooks.
- `npm run sync:r2`: normalise `resources/` and upload the manifest + audio to R2.
- `npm run generate:mobile`: proxy to the mobile `generate:data` script (offline-bundle path).
Per workspace (run from that directory):
- `cloudflare/`: `npm run dev` (wrangler dev against the real bucket), `npm run deploy`, `npm run check` (`tsc --noEmit`).
- `mobile/`: `npm run start` / `ios` / `android` / `web`, `npm run lint`, `npm run generate:data`.
- `webapp/`: `npm run dev`, `npm run build`, `npm run check` (`svelte-kit sync` + `svelte-check`).
Each course is a directory under resources/ with:
- `units.yaml`: the list of units.
- Unit directories (e.g. `unit-d`, `unit-e`) containing:
  - `intro.md`: unit introduction with frontmatter.
  - `##.md`: numbered lesson files with `kind: video` frontmatter.
  - Generated outputs: `##.segments.json`, `##-summary.md`, `##.tts_script.txt`, `##-summary.mp3`, `intro.enriched.json`.
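Given that layout, a quick check for lessons whose generated outputs are missing might look like the sketch below. The `missingOutputs` helper is hypothetical, and it assumes `##` stands for a two-digit lesson number; it works on a plain list of file names from one unit directory so it stays independent of any file-system API.

```typescript
// Per-lesson generated artefacts, expressed as suffixes after the lesson number.
const GENERATED_SUFFIXES: string[] = [
  ".segments.json",
  "-summary.md",
  ".tts_script.txt",
  "-summary.mp3",
];

// For each numbered lesson file (assumed pattern: two digits + ".md"),
// list the generated files that are absent from the directory listing.
function missingOutputs(fileNames: string[]): Record<string, string[]> {
  const present = new Set(fileNames);
  const missing: Record<string, string[]> = {};
  for (const name of fileNames) {
    const match = /^(\d{2})\.md$/.exec(name);
    if (match === null) {
      continue; // intro.md, topics.yaml and generated files are skipped here
    }
    const lessonNumber = match[1];
    const absent = GENERATED_SUFFIXES
      .map((suffix) => `${lessonNumber}${suffix}`)
      .filter((fileName) => !present.has(fileName));
    if (absent.length > 0) {
      missing[lessonNumber] = absent;
    }
  }
  return missing;
}
```

A non-empty result would suggest re-running the notebooks for that unit before syncing.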
The notebooks mutate resources/ in place. Treat the generated files as outputs — regenerate rather than hand-edit, then re-run npm run sync:r2 to propagate.
- Linting/formatting: Biome at the repo root (`biome.json`, tab indent, `useConst` disabled). The webapp has its own `biome.json` with `.svelte` overrides.
- There is no configured test runner.
See LICENSE.