v9ai/ai-engineer-roadmap

🧠   AI Engineering

From Zero to Production AI Engineer

A hands-on learning platform that takes engineers from transformer internals to shipping production AI systems.

108 deeply researched lessons across 15 categories (RAG, agents, evals, fine-tuning, prompting), wired together with semantic search, AI audio narration, an interactive knowledge graph, a RAG tutor, and per-learner mastery analytics.


Next.js TypeScript LangGraph Postgres Vercel


🚀 Quick Start · ✨ Features · 🧱 Stack · 🏗 Architecture · 🛠 Dev

108 lessons · 15 categories · 6 LangGraph graphs · 22 DB tables · 10 course evaluators


✨ Features

📚 108 lessons, 15 categories: Curriculum from transformer internals → RAG → agents → evals → production, prerequisite-ordered.
🔎 Semantic + full-text search: Cmd+K instant search over Postgres FTS and pgvector cosine similarity.
🎧 Audio narration: TTS audio per lesson (Rust pipeline → Cloudflare R2) with per-user resume positions in D1.
🕸️ Knowledge graph: Concepts linked by prerequisite / builds_on / related edges, rendered as an explorable graph.
🤖 AI tutor chat: RAG chat grounded in lesson content, with intent routing and checkpointed threads.
📈 Mastery analytics: Bayesian Knowledge Tracing (mastery / transit / slip / guess) per learner per lesson.
✍️ Self-authoring: A 5-pass LangGraph writer (research → outline → draft → review → revise) generates new lessons behind a quality gate.
🎓 Course reviewer: 10 expert evaluators score and rank external AI courses concurrently.
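The mastery analytics rely on the Bayesian Knowledge Tracing update, which fits in a few lines. This is a minimal sketch assuming the textbook BKT equations; the BktState shape and the bktUpdate name are hypothetical, not the project's schema or code.

```typescript
// Minimal Bayesian Knowledge Tracing update using the standard BKT equations.
// BktState and bktUpdate are hypothetical names, not the project's schema.
interface BktState {
  mastery: number; // P(L): probability the skill is already mastered
  transit: number; // P(T): probability of learning it at each opportunity
  slip: number;    // P(S): probability of a wrong answer despite mastery
  guess: number;   // P(G): probability of a right answer without mastery
}

function bktUpdate(state: BktState, correct: boolean): BktState {
  const { mastery: p, transit, slip, guess } = state;
  // Bayes' rule: posterior probability of mastery given the observed answer.
  const posterior = correct
    ? (p * (1 - slip)) / (p * (1 - slip) + (1 - p) * guess)
    : (p * slip) / (p * slip + (1 - p) * (1 - guess));
  // Learning step: a learner who has not mastered the skill may acquire it.
  const mastery = posterior + (1 - posterior) * transit;
  return { ...state, mastery };
}
```

With mastery 0.3, transit 0.1, slip 0.1 and guess 0.2, a correct answer moves mastery to roughly 0.69 and an incorrect one to roughly 0.15.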

🚀 Quick Start

Prerequisites: Node 22.x · pnpm · a Neon Postgres database

pnpm install
cp .env.example .env.local   # fill in the keys (see Environment below)

pnpm db:push                 # sync schema to Neon
pnpm seed                    # seed 108 lessons from content/*.md
pnpm dev                     # → http://localhost:3006 🎉

That's the full app: there is no separate backend server to run. Chat calls DeepSeek directly from the Next.js route (set DEEPSEEK_API_KEY); article, prep, flashcard, and course-review generation run as offline Rust bins (see AI generation).
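Calling DeepSeek "directly from the Next route" amounts to one bearer-authenticated POST, because DeepSeek exposes an OpenAI-compatible chat-completions endpoint. The sketch below is illustrative only: askDeepSeek, buildDeepSeekRequest, the injected fetch parameter and the deepseek-chat model choice are assumptions, and the project's real prompt lives in lib/chat-llm.ts.

```typescript
// Sketch of a direct, non-streaming DeepSeek chat call from a Next.js route.
// Names and the model id are assumptions; the real prompt is in lib/chat-llm.ts.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

interface RequestSpec {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Structural subset of fetch so the global fetch or a test double both fit.
type FetchLike = (
  url: string,
  init: RequestSpec,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const DEEPSEEK_URL = "https://api.deepseek.com/chat/completions";

function buildDeepSeekRequest(apiKey: string, messages: ChatMessage[]): RequestSpec {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // "deepseek-chat" is an assumed model id; streaming is off for simplicity.
    body: JSON.stringify({ model: "deepseek-chat", messages, stream: false }),
  };
}

async function askDeepSeek(
  fetchImpl: FetchLike,
  apiKey: string,
  messages: ChatMessage[],
): Promise<string> {
  const res = await fetchImpl(DEEPSEEK_URL, buildDeepSeekRequest(apiKey, messages));
  if (!res.ok) throw new Error(`DeepSeek request failed with HTTP ${res.status}`);
  // OpenAI-compatible response: the reply text is in choices[0].message.content.
  const data = (await res.json()) as { choices?: { message?: { content?: string } }[] };
  return data.choices?.[0]?.message?.content ?? "";
}
```

In a route handler the global fetch can be passed as fetchImpl, with the key read from process.env.DEEPSEEK_API_KEY; injecting fetch keeps the function testable without network access.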

🧱 Stack

Framework: Next.js 15 (App Router, Turbopack)
Database: Neon PostgreSQL + pgvector, Drizzle ORM
UI: Radix UI Themes
AI / LLM: OpenAI · DeepSeek
AI backend: Rust axum LangGraph service (crates/ml/server) with 6 graphs (chat, app_prep, memorize_generate, article_generate, course_review, fetch_courses); chat does SQLite + LanceDB RAG, the rest are stateless LLM orchestration
Storage: SQLite (data/knowledge.db content, data/courses.db courses) + LanceDB (vectors) · Cloudflare R2 (audio) · D1 (per-user playback state)
Deployment: Vercel (frontend) + the Rust knowledge-server binary (backend)

πŸ— Architecture

graph TD
    Browser --> Next["Next.js on Vercel<br/>pages · API routes · server actions"]
    Next --> Adapter["data.ts adapter"]
    Adapter -->|"DATA_SOURCE=db"| DB[("Neon Postgres<br/>+ pgvector + checkpoints")]
    Adapter -->|"DATA_SOURCE=fs"| FS["content/*.md"]
    Next -->|"BACKEND_URL + bearer"| Rust["Rust knowledge-server :7860<br/>6 LangGraph graphs"]
    Rust --> DeepSeek["DeepSeek API"]
    Rust --> SQLite[("SQLite + LanceDB<br/>knowledge.db · courses.db")]
    Next --> R2["Cloudflare R2<br/>audio files"]
    Next --> D1["Cloudflare D1<br/>audio progress"]

Request paths: lesson pages read through data.ts (DB or filesystem) and pull related lessons via pgvector cosine similarity. Chat does FTS + vector retrieval in Next.js, then POSTs snippets + history to the Rust knowledge-server, which merges them with its own SQLite + LanceDB retrieval and calls DeepSeek (stateless: history is supplied by the caller).
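The similarity ranking and the merge of caller snippets with server-side hits can be approximated in memory as below. This sketch assumes scores from both retrievers are already on a comparable scale and keeps the best score per id; the Snippet type and that merge rule are hypothetical. In Postgres, pgvector's <=> operator returns cosine distance (1 minus cosine similarity), so ORDER BY embedding <=> $1 yields the same order as sorting by this similarity in descending order.

```typescript
// In-memory sketch of cosine ranking plus a merge of two retrieval lists.
// Assumes scores from both lists are comparable; Snippet is hypothetical.
interface Snippet {
  id: string;
  text: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated rather than dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank candidate embeddings against a query, most similar first.
function rankBySimilarity(
  query: number[],
  docs: { id: string; text: string; embedding: number[] }[],
): Snippet[] {
  return docs
    .map((d) => ({ id: d.id, text: d.text, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Merge lists (e.g. caller snippets + server-side hits), keeping the best
// score per id, then return the top k overall.
function mergeSnippets(lists: Snippet[][], k: number): Snippet[] {
  const best = new Map<string, Snippet>();
  for (const list of lists) {
    for (const s of list) {
      const prev = best.get(s.id);
      if (!prev || s.score > prev.score) best.set(s.id, s);
    }
  }
  return Array.from(best.values())
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```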

🔀 LangGraph Pipelines

  • Content generation (article_generate): research → outline → draft → review → revise, with a conditional revision loop (max 2) gated on word count, code blocks, cross-refs, ≥5 xyflow diagrams, and mandatory sections. Run via the gen-article Rust bin (pnpm generate:rust) or over /runs/wait.
  • RAG chat (chat): SQLite lexical + LanceDB vector retrieval merged with caller snippets → format context → one DeepSeek call.
  • Course review (course_review): 10 expert evaluators run concurrently, then a weighted aggregator computes score + verdict.
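A quality gate with a bounded revision loop, like the one article_generate describes, might look roughly like this. Only the ≥5 diagram floor and the two-revision cap come from this README; the other thresholds, the xyflow fence tag, the /lessons/ link convention for cross-refs and the section names are hypothetical placeholders, and the synchronous revise callback stands in for what would really be an async LLM call.

```typescript
// Sketch of an article quality gate with a bounded revision loop.
// Only minDiagrams = 5 and MAX_REVISIONS = 2 come from the README; the
// other thresholds, the "xyflow" fence tag, the /lessons/ link convention
// and the section names are hypothetical placeholders.
interface GateConfig {
  minWords: number;
  minCodeBlocks: number;
  minCrossRefs: number;
  minDiagrams: number;
  requiredSections: string[];
}

const DEFAULT_GATE: GateConfig = {
  minWords: 1500,
  minCodeBlocks: 2,
  minCrossRefs: 3,
  minDiagrams: 5,
  requiredSections: ["Overview", "Key Takeaways"],
};

const MAX_REVISIONS = 2;

// Count fenced blocks, classifying each opening fence by its info string.
function countFences(md: string): { code: number; diagrams: number } {
  let inFence = false;
  let code = 0;
  let diagrams = 0;
  for (const line of md.split("\n")) {
    const m = /^`{3}(\S*)/.exec(line.trim());
    if (!m) continue;
    if (!inFence) {
      if (m[1] === "xyflow") diagrams++;
      else code++;
    }
    inFence = !inFence;
  }
  return { code, diagrams };
}

function gateFailures(md: string, cfg: GateConfig = DEFAULT_GATE): string[] {
  const failures: string[] = [];
  const words = md.split(/\s+/).filter(Boolean).length;
  const { code, diagrams } = countFences(md);
  const crossRefs = (md.match(/\]\(\/lessons\/[^)]+\)/g) ?? []).length;
  const headings = md
    .split("\n")
    .filter((l) => /^#{2,3}\s/.test(l))
    .map((l) => l.replace(/^#+\s+/, "").trim());
  if (words < cfg.minWords) failures.push(`words: ${words} < ${cfg.minWords}`);
  if (code < cfg.minCodeBlocks) failures.push(`code blocks: ${code} < ${cfg.minCodeBlocks}`);
  if (crossRefs < cfg.minCrossRefs) failures.push(`cross-refs: ${crossRefs} < ${cfg.minCrossRefs}`);
  if (diagrams < cfg.minDiagrams) failures.push(`diagrams: ${diagrams} < ${cfg.minDiagrams}`);
  for (const s of cfg.requiredSections) {
    if (!headings.includes(s)) failures.push(`missing section: ${s}`);
  }
  return failures;
}

// Re-check after each revision; stop when the gate passes or the cap is hit.
function runRevisionLoop(
  draft: string,
  revise: (md: string, failures: string[]) => string,
  cfg: GateConfig = DEFAULT_GATE,
): { md: string; passed: boolean; revisions: number } {
  let md = draft;
  for (let revisions = 0; ; revisions++) {
    const failures = gateFailures(md, cfg);
    if (failures.length === 0) return { md, passed: true, revisions };
    if (revisions === MAX_REVISIONS) return { md, passed: false, revisions };
    md = revise(md, failures);
  }
}
```

Passing the failure list to the reviser lets each revision target the specific checks that failed instead of rewriting blindly.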

🗂 Project Layout

app/                  Next.js App Router (lessons, AWS hub, applications, coursework, problems, api/*)
components/           React components (search, audio-player, toc, …)
content/              Markdown lesson files
src/db/               Neon client + Drizzle schema (22 tables)
src/lib/              backend-client (typed POST /runs/wait)
lib/                  data.ts adapter, db queries, r2.ts, d1.ts, server actions
crates/ml/            Rust workspace β€” knowledge-server (6 graphs, gen-article,
                      seed-topic-courses), core (seed/export), audio-guide
scripts/              seed, scrape, review-courses, e2e
sql/ · migrations/    Neon setup + D1 migrations
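src/lib/backend-client is described only as a typed POST /runs/wait client. A minimal sketch of such a client follows, assuming a LangGraph-style stateless-run body of { assistant_id, input } and the bearer auth shown in the architecture diagram; runWait, PostFn and the body shape are hypothetical, while the graph names are the six listed under Stack.

```typescript
// Hypothetical typed client for the knowledge-server's POST /runs/wait.
// The { assistant_id, input } body mirrors LangGraph's stateless-run API and
// is an assumption about this server; the graph names come from the README.
type GraphName =
  | "chat"
  | "app_prep"
  | "memorize_generate"
  | "article_generate"
  | "course_review"
  | "fetch_courses";

type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function runWait<TInput, TOutput>(
  post: PostFn,
  backendUrl: string,
  token: string,
  graph: GraphName,
  input: TInput,
): Promise<TOutput> {
  // Tolerate a trailing slash on BACKEND_URL.
  const url = `${backendUrl.replace(/\/+$/, "")}/runs/wait`;
  const res = await post(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ assistant_id: graph, input }),
  });
  if (!res.ok) throw new Error(`${graph} run failed with HTTP ${res.status}`);
  // The cast is unchecked; a real client would validate the payload shape.
  return (await res.json()) as TOutput;
}
```

Restricting graph to a string-literal union catches typos in graph names at compile time, which is most of what "typed" buys here.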

🛠 Dev

pnpm dev                       # start on :3006
pnpm db:push / db:studio       # sync schema / open Drizzle Studio
pnpm seed / seed:courses       # seed lessons / Udemy catalog
pnpm scrape:udemy              # scrape AI/ML Udemy topics → external_courses

pnpm generate <slug>           # generate a lesson via LangGraph
pnpm generate:dry <slug>       # preview without saving
pnpm generate:batch            # generate all missing lessons
pnpm review:courses            # batch-review unreviewed courses

pnpm backend:rust              # run knowledge-server on :7860 (Rust)
pnpm backend:rust:index        # (re)build the LanceDB section index
pnpm generate:rust <args>      # gen-article bin (research→…→finalize)
pnpm seed:courses <args>       # seed-topic-courses bin → data/courses.db
pnpm audio:meta <args>         # markdown → AudioMeta JSON (deterministic)

pnpm backend:test              # cargo test (knowledge-server + audio-guide)
pnpm test:e2e                  # smoke the running server

AI generation (Rust bins)

There is no long-running backend server. Chat is a direct DeepSeek call in app/api/chat/route.ts (via lib/chat-llm.ts, a 1:1 port of the old Rust chat prompt; it needs DEEPSEEK_API_KEY in the Next.js env / on Vercel). All other AI generation runs offline as one-shot aer-ml bins that write to Neon / SQLite:

pnpm prep:loop -- --slug <slug>     # interview prep (deepseek agent → Neon)
pnpm prep:rust  / prep:owner:rust   # prep variants (static artifact / owner)
pnpm prep:memorize -- --slug <slug> # flashcards → concepts + applications row
pnpm review:courses [--dry-run]     # course_review → data/courses.db
pnpm generate:rust -- --slug <s>    # article generation

Env for the bins: DEEPSEEK_API_KEY (monorepo-root .env) and, for the ones that persist, DATABASE_URL (auto-exported by the npm scripts). Course data is scraped/reviewed into data/courses.db and surfaced to the frontend as JSON via pnpm export:content (Rust → data/content/*.json).
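The course_review fan-out and fan-in (10 evaluators run concurrently, then a weighted aggregator) reduces to Promise.all plus a weighted mean. This TypeScript sketch shows the shape only: the real graph is Rust, and the 0 to 10 scale, the weights, the verdict labels and their cut-offs are hypothetical.

```typescript
// Sketch of course_review's shape: run evaluators concurrently, then combine
// their scores with a weighted mean. Scale, weights and cut-offs are hypothetical.
interface Evaluation {
  evaluator: string;
  score: number;  // assumed 0-10 scale
  weight: number; // assumed relative importance of this evaluator
}

type Evaluator = (course: string) => Promise<Evaluation>;
type Verdict = "recommended" | "mixed" | "skip";

function aggregate(evals: Evaluation[]): { score: number; verdict: Verdict } {
  const totalWeight = evals.reduce((sum, e) => sum + e.weight, 0);
  if (totalWeight <= 0) throw new Error("no weighted evaluations to aggregate");
  const score = evals.reduce((sum, e) => sum + e.score * e.weight, 0) / totalWeight;
  const verdict: Verdict = score >= 7 ? "recommended" : score >= 5 ? "mixed" : "skip";
  return { score, verdict };
}

async function reviewCourse(course: string, evaluators: Evaluator[]) {
  // Every evaluator starts immediately; Promise.all rejects if any one fails.
  const evals = await Promise.all(evaluators.map((evaluate) => evaluate(course)));
  return aggregate(evals);
}
```

For example, scores of 8 (weight 2) and 5 (weight 1) average to exactly 7, which this sketch labels "recommended".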

Local prep generation (DB-backed)

The application /prep page reads interviewQuestions from Neon. In prod that DB-backed path is dormant (no Rust backend is deployed, so the public page falls back to the committed data/app-prep/<slug>.json seed). To regenerate real prep and push it into the row, one full-Rust command does generate → validate → persist. Because DATABASE_URL is the shared Neon database, this also updates the live owner view:

pnpm prep:loop                 # ECB SSM Cockpit Developer (default slug)
pnpm prep:loop -- --slug <slug>            # another application
pnpm prep:loop -- --slug <slug> --no-db    # regenerate + validate only

prep:loop runs the gen-app-prep-loop Rust bin (crates/ml). It:
(1) runs a deepseek agent in-process (deepseek::run, with only the builtin Read/Write tools and AcceptEdits) that reads the job description from data/app-prep/<slug>.json and rewrites the artifact's prep fields;
(2) validates the result in Rust: exactly 4 ## sections, techStack a JSON string whose every category ∈ app_prep::CATEGORIES, and a valid relevance;
(3) connects to Neon over Postgres (sqlx), updates the applications row (interviewQuestions/techStack, plus a jobDescription backfill if the row had none), then reads it back to confirm.
It mutates the live row, so it is not a dry run. The agent step and the validation gate must both pass first, so Neon is untouched on failure.
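Step (2)'s validation gate can be re-expressed in TypeScript to make the rules concrete. The real check is Rust code in crates/ml; the CATEGORIES and RELEVANCE values and the techStack item shape below are hypothetical stand-ins for app_prep::CATEGORIES and whatever the artifact actually stores.

```typescript
// TypeScript re-sketch of the prep:loop validation gate (the real one is Rust).
// CATEGORIES, RELEVANCE and the TechItem shape are hypothetical stand-ins.
const CATEGORIES = ["frontend", "backend", "data", "infra", "ai"] as const;
const RELEVANCE = ["primary", "secondary", "nice-to-have"] as const;

interface TechItem {
  name: string;
  category: string;
  relevance: string;
}

function validatePrep(interviewQuestions: string, techStack: string): string[] {
  const errors: string[] = [];
  // Exactly four level-2 headings; "###" lines do not count.
  const sections = interviewQuestions.split("\n").filter((l) => /^##\s/.test(l)).length;
  if (sections !== 4) errors.push(`expected 4 ## sections, found ${sections}`);

  // techStack arrives as a JSON string and must parse.
  let parsed: unknown;
  try {
    parsed = JSON.parse(techStack);
  } catch {
    errors.push("techStack is not valid JSON");
    return errors;
  }
  if (!Array.isArray(parsed)) {
    errors.push("techStack must be a JSON array");
    return errors;
  }
  for (const item of parsed as TechItem[]) {
    if (!(CATEGORIES as readonly string[]).includes(item.category)) {
      errors.push(`unknown category: ${item.category}`);
    }
    if (!(RELEVANCE as readonly string[]).includes(item.relevance)) {
      errors.push(`invalid relevance for ${item.name}: ${item.relevance}`);
    }
  }
  return errors;
}
```

Returning every error rather than stopping at the first one gives the agent a complete list to fix in a single pass.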

The npm script exports DEEPSEEK_API_KEY from the monorepo-root /Users/vadimnicolai/Public/ai-apps/.env and DATABASE_URL from .env.local, because the Rust process does not load .env* files itself. If a slug matches more than one row, pass --user-id <id>. (prep:rust, the gen-app-prep bin that produces the static artifact only, and backend:rust:local remain valid alternatives. pnpm test:app-prep still exists as the seed-loader / public-render contract test.)

Environment

DATABASE_URL=             # Neon connection string
OPENAI_API_KEY=
DEEPSEEK_API_KEY=         # required by the Next runtime for /api/chat + the AI bins
NEXT_PUBLIC_DATA_SOURCE=  # "db" | "fs"
NEXT_PUBLIC_R2_DOMAIN=    # audio CDN domain
R2_ACCOUNT_ID= R2_ACCESS_KEY_ID= R2_SECRET_ACCESS_KEY= R2_BUCKET_NAME=
CLOUDFLARE_ACCOUNT_ID= CLOUDFLARE_AUDIO_D1_ID= CLOUDFLARE_D1=
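NEXT_PUBLIC_DATA_SOURCE selects which backing store the data.ts adapter reads ("db" or "fs"). The sketch below shows one way to parse that variable strictly and pick an implementation; parseDataSource, createAdapter, LessonSource and the choice to reject an unset value are all hypothetical, not the repo's actual data.ts.

```typescript
// Hypothetical sketch of a data.ts-style switch on NEXT_PUBLIC_DATA_SOURCE.
// LessonSource, parseDataSource and createAdapter are illustrative names only.
type DataSource = "db" | "fs";

interface Lesson {
  slug: string;
  title: string;
}

interface LessonSource {
  getLesson(slug: string): Promise<Lesson | null>;
}

// Fail fast on a missing or misspelled value instead of silently guessing.
function parseDataSource(raw: string | undefined): DataSource {
  if (raw === "db" || raw === "fs") return raw;
  throw new Error(`NEXT_PUBLIC_DATA_SOURCE must be "db" or "fs", got ${JSON.stringify(raw)}`);
}

function createAdapter(
  raw: string | undefined,
  sources: Record<DataSource, LessonSource>,
): LessonSource {
  return sources[parseDataSource(raw)];
}
```

Callers would pass process.env.NEXT_PUBLIC_DATA_SOURCE as raw; since Next.js inlines NEXT_PUBLIC_ variables at build time, changing the value generally requires a rebuild.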

Built with Next.js 15 · LangGraph · Neon pgvector · Cloudflare

From zero to production AI engineer, one lesson at a time.
