Hoping to make your 秋招 (qiūzhāo, Chinese AI campus recruiting season) a little easier 🌱
📖 中文版 (Chinese version): README_CN.md
📚 Jump to a topic — 23 first-party cheat sheets across 7 categories + 1 community-contributed category:
🧠 General / Foundations · 🎯 Post-Training & Reasoning · 🏛️ LLM Architecture & Systems · 🌊 Generative Models — Theory & Tokenizers · 🎨 Generation Systems (Image / Video / 3D / Diffusion Post-Training) · 👁️ Multimodal · 🤖 Agents · 🦾 Embodied AI / 具身智能
Or browse the full 📚 Tutorial Index ↓ · jump to 🌐 ARIS-Homepage ↓.
🏆 Built on a battle-tested foundation: the ARIS main repo has ~10k GitHub stars, was HuggingFace Daily Papers #1, won AI Digital Crew Project of the Day, and ships 74+ research skills across 7+ platforms. This isn't a vaporware preview; every cheat sheet here is the production output of the same `/interview-cheatsheet` + `/render-html` workflow used in academic-research production.
A curated, bilingual (Chinese + English) collection of ML / LLM / multimodal / diffusion / agent / generative-model interview cheat sheets, auto-generated by the `/render-html` workflow of ARIS (Auto Research in Sleep).
Each cheat sheet is a long-form Chinese tutorial with: formula derivations · from-scratch PyTorch code · 25 high-frequency interview questions (L1 essentials · L2 advanced · L3 top-tier lab).
🔥 NEW · 🌐 Also in this repo: ARIS-Homepage turns your CV + selected GitHub repos into a fact-checked academic homepage with single-file HTML output. v1.1 adds `--from-repos owner/repo,...`: the `gh` CLI snapshots stars / releases / READMEs and merges repo timelines into your homepage. 🔥 Live demo at wanshuiyin.github.io →
📖 Preview (above): one snapshot per pillar, taken from the Diffusion Foundations cheat sheet — ① Foundations (formula derivations + intuition + TL;DR), ② Interview Q&A (25 high-frequency questions stratified L1/L2/L3), ③ From-Scratch Code (runnable PyTorch, including CFG training + DDIM sampling). Every cheat sheet in this collection follows the same three-pillar structure.
🌐 ARIS-Homepage preview (above): the same `/render-html` workflow turning a CV into a fact-checked academic homepage. Live demo at wanshuiyin.github.io. Details + pipeline diagram in the ARIS-Homepage section ↓.
📝 Long-form blog preview (above): a standalone hand-authored long-form technical survey. A Survey on Continuous DLM (2026 H1, 6 papers) — Chinese-language survey by Ruofeng Yang (SJTU), written end-to-end via cross-model discussion (Claude Opus 4.7 + Codex GPT-5.5 xhigh + Gemini auto-gemini-3). 📖 Read full blog ↗.
Phone on the subway, iPad at a café, laptop in the library. The same HTML link opens equally well on each:
- 🧮 MathJax renders all LaTeX formulas (not screenshots — scalable, copyable, selectable)
- 💻 highlight.js colors all PyTorch code blocks
- 📐 Responsive layout adapts to any window width — no overflow, no blur
- 📑 Sticky TOC for jumping around long documents
- 💾 Single-file HTML — download once, read offline, no backend required
- 2026-06-08: 🔧 `tools/render_html.py` now strips a leading UTF-8 BOM before frontmatter detection (6cc4876).
- 2026-06-05: 🔭 Community Showcase: an online tutorial-collection site by @QiZishi, which merges all 23 Chinese tutorials here with Datawhale Hello-Agents' LLM interview Q&A into one card-index reading site (read online · from #3). The README gains a Community Showcase section listing community derivatives; build something on top of these tutorials and open an issue to get it listed (f8e7d33).
- 2026-06-02: 📝 Two new blogs: Continuous DLM (representation perspective, v2) + Diffusion × Representation × Manifold. The Continuous DLM survey is upgraded to a representation-perspective expansion (replacing the earlier v1), and a companion blog traces the "borrow representation / use the manifold" threads across image & video diffusion (SSL / Consistency / REPA / RAE / JiT / V-JEPA2). Both are hand-authored and cross-model reviewed; the two cross-reference each other. Live: `continuous_dlm_representation_perspective.html` · `diffusion_representation_manifold.html`. See the Blog Index ↓.
- 2026-06-01: 📝 New blog: a deep-dive guide to NVIDIA Cosmos 3 (omnimodal world model · MoT architecture), a long-form Chinese popsci walkthrough of the 138-page Cosmos 3 technical report (15 sections · 12 figures): how understanding and generation are stitched into one Transformer (Mixture-of-Transformers), the training recipe, scaling / serving, and the three model sizes. Reviewed end-to-end across 5 rounds of Codex GPT-5.5 xhigh cross-model audit (numbers + mechanisms checked, overclaims trimmed). Self-contained single-file HTML at `docs/blogs/cosmos3_mot_guide.html`; it is a hand-authored guide (ELF blog format), not part of the audited `/render-html` pipeline. All figures and numbers are from the NVIDIA Cosmos 3 report (github.com/nvidia/cosmos); © the original authors, used here as an attributed popsci guide.
- 2026-05-31: 🔍 Cross-model audit pass: real errors caught & fixed across the whole collection, now CI-enforced. A fresh Codex GPT-5.5 xhigh re-review of every cheat sheet (Chinese + EN) surfaced genuine technical mistakes the first pass had missed, and fixed them: DeepSeek-V3's FP8 GEMM accumulates in FP32, not bf16; Qwen2-VL M-RoPE base is 1e6; Molmo's vision tower is CLIP, not SigLIP; TensoRF complexity is O(N³); a broken StreamingLLM render callout; plus 10 EN translation-fidelity fixes (meaning reversals, a dropped framework list). Every `docs/` artifact now carries a traceable cross-model review, and a new CI gate (`tools/verify_reviews.py --mode strict --reproduce`, wired in `review-audit.yml`) blocks any PR whose HTML isn't a reviewed view that byte-reproduces from its source, so no un-reviewed or hand-edited content can land. Also merged today: a Focus-dim reading mode for `--blog-mode` pages (dim non-current sections · floating 🎯 toggle · ↑/↓ section nav), reflowed from the dllm blog effort into the academic template. It is dormant for the cheat sheets (gated under `.aris-blog`) and activates in blog-mode, ready for the first blogs rendered through it (landing soon). (5aae952 · 498ecf2)
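The BOM fix in the 2026-06-08 entry can be pictured with a minimal sketch. The names `strip_bom` and `has_frontmatter` are hypothetical and not taken from `tools/render_html.py`; the sketch also assumes that frontmatter is a block opened by a `---` line at the very start of the file.

```python
BOM = "\ufeff"  # U+FEFF, what a UTF-8 BOM decodes to


def strip_bom(text: str) -> str:
    """Drop a single leading BOM, if present."""
    return text[1:] if text.startswith(BOM) else text


def has_frontmatter(text: str) -> bool:
    """Assumed convention: frontmatter opens with a '---' line at offset 0."""
    return strip_bom(text).startswith("---\n")


raw = "\ufeff---\ntitle: Attention\n---\n# Body\n"
assert not raw.startswith("---\n")  # the BOM hides the delimiter
assert has_frontmatter(raw)         # detected once the BOM is stripped
```

Without the strip step, a file saved by an editor that writes a BOM would be treated as having no frontmatter at all, which is the failure mode the entry describes.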
📋 Earlier updates
- 2026-05-28: 📝 First blog shipped: A Survey on Continuous DLM (2026 H1, 6 papers), a long-form Chinese technical survey by Ruofeng Yang (SJTU), written end-to-end through cross-model discussion (Claude Opus 4.7 + Codex GPT-5.5 xhigh + Gemini auto-gemini-3). It has since been superseded by the v2 rewrite at the same path (see the 2026-06-02 entry). Compares ELF, ByteDance Cola-DLM, and the Flow-Matching family across discrete-DLM problems, the "known-unknown" continuous space idea, training pipeline, architecture / params / shapes, inference grids + Tab 6/7 numerical results, denoising trajectories, and a Field Landscape against Cola-DLM. Lives at `docs/blogs/continuous_dlm_representation_perspective.html` (1.7 MB self-contained, no build) as a hand-authored long-form HTML, outside the audited `/render-html` pipeline. Preview strip + live link at the top of this README. (8475a2d)
- 2026-05-28: ⚡ `render-html` P0 polish + all 23 tutorials regenerated. The academic template gained 7 interactive features: print degradation fix (PDF no longer loses `<details>` content), TOC sidebar scrollspy (current section auto-highlights as you scroll), figure lightbox (native `<dialog>` with focus trap + Esc), long-code auto-collapse (`<pre>` ≥ 30 lines wrapped in `<details class="code-card">`, per-block override via `` ```python {collapsed} `` / `{open}` fence flags), paper citation popover (new `[[key]]` MD syntax + `--papers <papers.json>` sidecar), eyebrow cleanup (marketing uppercase → body-serif gray), `--blog-mode` infrastructure (opt-in `aris-blog` body class). XSS-hardened script injection via `json_for_script()` (escapes `</script>` break-out). All 23 bilingual tutorial pairs (= 46 HTMLs) regenerated to pick up the new template shell; source MDs untouched. Codex GPT-5.5 xhigh 4-round review (design × 2 → code × 1 → spot-check × 1). Try it: scroll `attention_tutorial.html` and watch the TOC sidebar follow (b79c57d, 8793f40).
- 2026-05-26: 🐍 5 runnable PyTorch tutorial scripts, the first runnable-code contribution, in `docs/tutorials/code/`: `mha.py` (MHA + causal mask) · `axial_attention.py` (H/W axial + complexity table) · `flow_matching.py` (Rectified Flow on 2D moons) · `mmdit_block.py` (double-stream MMDiT block) · `toy_mmdit_t2i_pipeline.py` (end-to-end T2I skeleton). Pure PyTorch, CPU-runnable in seconds; every script ships with built-in `assert` sanity checks (shape parity, numerical agreement with `nn.MultiheadAttention` where applicable). Pairs with `attention_tutorial.md` / `flow_matching_tutorial.md` / `image_generation_systems_tutorial.md` (f63f468).
- 2026-05-24: 🐙 ARIS-Homepage v1.1: `--from-repos` snapshots a user-selected `owner/repo` list via the `gh` CLI; the LLM agent merges repo timelines into homepage News + `featured_projects[].github`. Private repos skipped by default. Closes #2 by @Yafei-Liu99 (cdcf9a2).
- 2026-05-23: 🌐 ARIS-Homepage v1 shipped: CV → fact-checked academic homepage (DBLP / arXiv audit blocks wrong venue / year / author). Single-file HTML; Codex / Gemini reviews optional. Live demo: wanshuiyin.github.io. Skill: `skills/homepage-generator/SKILL.md` (b818c1d).
- 2026-05-22: 🦾 Featured community contribution: 具身智能高频面试题库 (Embodied AI high-frequency interview question bank) by @WinstonJQ, with 413 questions across 8 volumes (VLA / imitation learning / RL / world models / engineering deployment / legged locomotion control / 3D perception / system design). Hosted externally; linked from the new "🦾 Embodied AI" category in the Tutorial Index (b1ebb6f).
- 2026-05: 📚 4 new bilingual cheat sheets: KL Divergence in RLHF (k1/k2/k3 · placement gradient bias), LLM On-Policy Distillation (MiniLLM / GKD / Qwen3 / Tinker), Diffusion Post-Training (DDPO / DPOK / DRaFT / AlignProp / Diffusion-DPO / Flow-GRPO), Diffusion / Flow Distillation (CM / iCT / sCM / CTM / LCM / DMD/DMD2 / ADD/LADD). Total now: 23 first-party cheat sheets.
- 2026-05: 📖 README restructure: preview-strip banner, ARIS credentials at top (badges + 10K-star foundation paragraph), shared WeChat community QR with the main ARIS repo.
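The `</script>` break-out mentioned in the template-polish entry is a standard hazard when JSON is inlined into a `<script>` tag. The sketch below shows one common mitigation, assuming the data is serialized with `json.dumps`. The helper name `json_for_script_sketch` is hypothetical, and this is not the implementation in `tools/render_html.py`.

```python
import json


def json_for_script_sketch(data) -> str:
    """Serialize data so it can sit inside <script>...</script>.

    Escaping '<', '>' and '&' as \\uXXXX keeps the payload valid JSON
    while making a literal '</script>' impossible in the output.
    """
    out = json.dumps(data, ensure_ascii=False)
    return (
        out.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = {"title": "</script><script>alert(1)</script>"}
safe = json_for_script_sketch(payload)
assert "</script>" not in safe       # the closing tag can no longer appear
assert json.loads(safe) == payload   # the data itself is unchanged
```

These three characters never occur outside string literals in JSON, so a plain text replacement on the serialized output is safe under that assumption.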
Long-form technical blogs — hand-authored, cross-model reviewed; outside the audited /render-html pipeline (figures © their original authors, used with attribution).
| Blog | What it covers | |
|---|---|---|
| NVIDIA Cosmos 3 — MoT Architecture Deep-Dive (中文) | Omnimodal world model · Mixture-of-Transformers · a walkthrough of the 138-page Cosmos 3 technical report | 📄 Read |
| A Survey on Continuous DLM — Representation Perspective (中文) | Continuous diffusion language models through a representation lens · ELF / ByteDance Cola-DLM / Flow-Matching family (2026 H1) | 📄 Read |
| Diffusion × Representation × Manifold (中文) | The "borrow representation / use the manifold" threads in image & video diffusion · SSL / Consistency / REPA / RAE / JiT / V-JEPA2 · cross-referenced with the Continuous DLM survey | 📄 Read |
🌐 Bilingual editions: every cheat sheet ships with both a Chinese (default) and an English HTML. Filenames are `*_tutorial.html` (CN) and `*_tutorial_en.html` (EN). HTML columns below link to both.
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| Attention Interview Cheat Sheet | 📄 CN | 📄 EN | MD |
| KL Divergence in RLHF (k1/k2/k3 · placement gradient bias) | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| RLHF / DPO / GRPO / PPO | 📄 CN | 📄 EN | MD |
| Reasoning Models (o1 / R1 / Test-Time Compute / PRM) | 📄 CN | 📄 EN | MD |
| LLM On-Policy Distillation (MiniLLM / GKD / Qwen3 / Tinker) | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| MoE (DeepSeek-V3 / Mixtral / Llama 4) | 📄 CN | 📄 EN | MD |
| Long Context (RoPE / YaRN / NTK / MLA / StreamingLLM) | 📄 CN | 📄 EN | MD |
| KV Cache + Speculative Decoding (Medusa / EAGLE / MLA) | 📄 CN | 📄 EN | MD |
| Quantization (GPTQ / AWQ / FP8 / NVFP4 / SmoothQuant) | 📄 CN | 📄 EN | MD |
| Distributed Training (DDP / FSDP2 / ZeRO / TP / PP / EP / SP) | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| Flow Matching Quick Reference | 📄 CN | 📄 EN | MD |
| Diffusion Foundations (DDPM / Score / DDIM / EDM / CFG) | 📄 CN | 📄 EN | MD |
| VAE / VQ-VAE / VQ-GAN / FSQ | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| Image Gen Systems (LDM / SD / SDXL / SD3 / FLUX / ControlNet) | 📄 CN | 📄 EN | MD |
| Video Gen (Sora / Hunyuan-Video / Kling / Wan / Movie Gen) | 📄 CN | 📄 EN | MD |
| 3D Gen (NeRF / Instant-NGP / 3DGS / SDS / Trellis) | 📄 CN | 📄 EN | MD |
| Diffusion Post-Training (DDPO / DPOK / DRaFT / AlignProp / Diffusion-DPO / Flow-GRPO) | 📄 CN | 📄 EN | MD |
| Diffusion / Flow Distillation (CM / iCT / sCM / CTM / LCM / DMD/DMD2 / ADD/LADD) | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| VLM (CLIP / LLaVA / Qwen-VL / DeepSeek-VL) | 📄 CN | 📄 EN | MD |
| Topic | HTML 中文 | HTML EN | MD |
|---|---|---|---|
| Agent Foundations (ReAct / MCP / A2A / SWE-bench / GAIA / OSWorld) | 📄 CN | 📄 EN | MD |
| Agentic RL (AgentTuning / ToolRL / RAGEN / WebRL / SWE-RL / GRPO for tool use) | 📄 CN | 📄 EN | MD |
| Multi-Agent & Long-Horizon (CAMEL / AutoGen / MetaGPT / MoA / Debate / MemGPT / LATS) | 📄 CN | 📄 EN | MD |
| Self-Evolving Agents (Ctx2Skill / Native Evolution / A²RD / Voyager / Reflexion / STaR) | 📄 CN | 📄 EN | MD |
🎉 23 tutorials live (bilingual, 2026-05); each ships with both Chinese and English HTML. Seven buckets: General · Post-Training · Architecture · Generative Models · Generation Systems · Multimodal · Agents. This round adds 4 new sheets: KL Divergence in RLHF, LLM On-Policy Distillation, Diffusion Post-Training, Diffusion Distillation. More (Flow-OPD / Audio Gen / further SOTA updates) coming; PRs welcome (see CONTRIBUTING).
🌟 Community contribution by @WinstonJQ — hosted externally on a separate repo, generously shared with the community. If it helps your interview prep, please ⭐ the source repo to thank the author 🙏
| Topic | HTML 中文 | Source |
|---|---|---|
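| 具身智能高频面试题库 (Embodied AI interview question bank: VLA / imitation learning / RL / world models / engineering deployment / legged locomotion control / 3D perception / LeetCode · system design; 413 questions, 8 volumes) | 📄 CN (online) | @WinstonJQ/embodied-interview-qa |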
Every tutorial uses ARIS's /interview-cheatsheet skill:
- Plan: 12-14 sections (TL;DR · Intuition · Formulas · Code · Variants · Complexity · 25 Q&A)
- Draft: 600-1000 lines of Chinese tutorial + runnable from-scratch PyTorch
- Cross-model review: fresh-thread Codex GPT-5.5 xhigh audit on 10 properties (formula correctness · code runnability · citation accuracy · table-pipe escapes · callout style · personal-info leak · ...)
- Fix loop: trajectory-based; keep going if the FAIL set is shrinking, stop if the same issue recurs or after ~6 rounds without convergence
- `/render-html`: single-file HTML render + 13-property render audit (information fidelity · TOC · math · code highlight · safety · privacy · ...)
- `.review.json`: full audit trail saved next to each tutorial
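The fix-loop stopping rule can be sketched as a small function. This is an illustrative reading of the rule above, not the skill's actual code: `should_continue`, the representation of each round as a set of failing property names, the hard cap of 6, and the reading of "recurs" as "an issue that was fixed and then came back" are all assumptions.

```python
MAX_ROUNDS = 6  # "~6 rounds" in the text; the exact cap is an assumption


def should_continue(history: list[set[str]]) -> bool:
    """history[i] is the set of FAIL properties after review round i + 1."""
    if not history or not history[-1]:
        return False                  # nothing reviewed yet, or everything passes
    if len(history) >= MAX_ROUNDS:
        return False                  # no convergence within the cap
    if len(history) == 1:
        return True                   # first review: always attempt one fix
    prev, cur = history[-2], history[-1]
    seen_before = set().union(*history[:-2])
    if (cur - prev) & seen_before:
        return False                  # an issue fixed earlier has come back
    return len(cur) < len(prev)       # keep going only while FAILs shrink


assert should_continue([{"math", "code"}, {"code"}])            # shrinking
assert not should_continue([{"math"}, {"math"}])                # stalled
assert not should_continue([{"math", "code"}, {"code"}, {"math"}])  # recurred
```

Tracking the whole trajectory rather than only the latest verdict is what lets the loop tell steady progress apart from oscillation between two failure modes.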
Cross-model adversarial review (executor ≠ reviewer family) is ARIS's core invariant: an LLM auditing its own output is no audit.
The only personal-site generator that fact-checks your CV before publishing.
A new skill in this repo: /homepage-generator turns your CV (.docx / .pdf / .txt) into a polished single-file academic homepage. Cross-model factual audit runs against DBLP / arXiv — wrong venue / year / author / fabricated awards block ship until corrected or explicitly overridden.
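The blocking behaviour can be sketched as a comparison between what the CV claims and what a bibliographic source reports. Everything here is hypothetical: the `Pub` record, `audit`, and the `overrides` set are illustrative names, the example entry is made up, and the reference record is passed in directly rather than fetched from DBLP or arXiv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pub:
    key: str
    venue: str
    year: int
    authors: tuple[str, ...]


def audit(claimed: Pub, reference: Pub,
          overrides: frozenset[str] = frozenset()) -> list[str]:
    """Return blocking mismatches; an empty list means the entry may ship."""
    problems = []
    for field in ("venue", "year", "authors"):
        if getattr(claimed, field) != getattr(reference, field):
            tag = f"{claimed.key}:{field}"
            if tag not in overrides:   # an explicit override unblocks the field
                problems.append(tag)
    return problems


cv = Pub("doe2024example", "Conference A", 2024, ("Doe", "Roe"))
ref = Pub("doe2024example", "Workshop B", 2024, ("Doe", "Roe"))
assert audit(cv, ref) == ["doe2024example:venue"]            # blocks shipping
assert audit(cv, ref, frozenset({"doe2024example:venue"})) == []  # overridden
```

Keeping overrides per field, rather than as one global switch, means an acknowledged discrepancy in one entry does not silence a new one elsewhere.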
Live demo: wanshuiyin.github.io — generated by this skill from a CV + the maintainer's previous manual page as editorial reference. Preview strip is near the top of this README.
ARIS-Homepage Pipeline
📄 CV (.docx/.pdf/.txt) 🌐 Manual Homepage URL 🖼 Assets Dir
factual source editorial (optional) visual (opt.)
│ │ │
▼ │ │
┌──────────┐ │ │
│ init │ │ │
│ extract │ │ │
│ CV→text │ │ │
└─────┬────┘ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ 🤖 Calling LLM agent reads EXTRACTION_HANDOFF.md + │
│ optional manual-homepage URL + assets dir as context │
│ → writes .aris-homepage/extraction.json │
└─────────────────────────┬───────────────────────────────────────┘
▼
┌──────────┐
│ finalize │
└─────┬────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ ✋ Editable source files (truth lives here, edit in IDE): │
│ profile.yml · publications.bib · bio.md · news.md │
│ EXTRACTION_REVIEW.md (review LLM uncertain extractions) │
└─────────────────────────┬───────────────────────────────────────┘
▼
┌────────────────────────┐
│ render │
│ --persona │
│ theory-minimal │
└───────────┬────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ Layer-1 │ │ Layer-2 │ │ Layer-2 │
│ DBLP / │ │ Codex MCP│ │ Gemini │
│ arXiv │ │ adv-rev │ │ visual │
│ fact-chk │ │ (opt.) │ │ critique │
│ (always) │ │ │ │ (opt.) │
└─────┬────┘ └──────────┘ └──────────────┘
│
▼
┌──────────────┐
│ index.html + │
│ audit-report │ ──▶ 🚀 Deploy: GitHub Pages · S3 · email · anywhere
│ .md │
└──────────────┘
Typical flow (7 steps, ~5 minutes):
1. aris-homepage init --from-cv ./cv.pdf --out ./site
2. (calling agent) read .aris-homepage/EXTRACTION_HANDOFF.md
→ fill .aris-homepage/extraction.json
3. aris-homepage finalize
4. $EDITOR profile.yml publications.bib bio.md news.md
5. aris-homepage check --strict # fact-check only
6. aris-homepage render --persona theory-minimal
7. inspect audit-report.md; fix → re-render OR --override-all
Minimum runtime: Python + a calling LLM agent.
Codex MCP optional (cross-model adversarial review).
Gemini optional (multimodal visual critique).
- Skill contract: `skills/homepage-generator/SKILL.md`
- Complete schema: `skills/homepage-generator/PROFILE_SCHEMA.md`
- Implementation: `tools/aris_homepage.py` (pure-stdlib Python; `pip install pyyaml` away from working)
- Template: `tools/templates/homepage-theory-minimal.html`
aris-homepage init --from-cv ./cv.pdf --out ./site
cd ./site
# Calling agent fills .aris-homepage/extraction.json per EXTRACTION_HANDOFF.md
aris-homepage finalize
$EDITOR profile.yml # tweak editorial choices
aris-homepage render --persona theory-minimal

Output: `index.html` + `audit-report.md`. Drop the HTML on GitHub Pages, S3, a university `~user/public_html/`, or attach it to an email; no build server is needed. Minimum runtime is just Python + a calling LLM agent; Codex MCP is optional for adversarial cross-model review, and Gemini multimodal is optional for visual critique.
One person can only cover so much. The hope is that many hands make this collection more complete.
Full contribution guide: CONTRIBUTING.md (English · 中文) — covers ARIS workflow invocation, strict style guide (headings / math / tables / callouts / personal-info banlist), and PR checklist.
TL;DR: use the `/interview-cheatsheet` + `/render-html` workflow to generate, then open a PR. Both skills enforce a cross-model Codex GPT-5.5 xhigh review gate (math / code / citation / render fidelity), so anything merged via PR has a baseline quality floor. Skill source and `tools/render_html.py` are bundled in this repo so you can fork & extend.
Honest disclaimer: across the existing tutorials, the HTML structural foundations (math, code, tables, callouts, TOC, responsive layout) are solid. But the very latest frontier work in any given topic (e.g., methods released in late 2025, niche subfield updates) is likely not fully covered. If you spot something outdated or wrong, PRs and issues are equally welcome. Let's keep this resource alive together.
Shared community with the main ARIS repo — the same WeChat group covers ARIS skill workflows + this tutorial collection. Join to discuss interview prep, request new cheat-sheet topics, or share corrections / contributions:
Community-built projects derived from this collection (MIT license — attribution-preserving reuse welcome):
- 大模型秋招教程 (ARIS-in-AI-Offer & Hello-Agents) by @QiZishi — an online reading index that merges all 23 Chinese tutorials here with Datawhale Hello-Agents' LLM interview Q&A, organized as clickable tutorial cards (repo · from #3).
Built something on top of these tutorials? Open an issue and we'll list it here.
ARIS — Auto Research in Sleep is one of the most-watched AI research agent skill platforms of 2025-2026. The /interview-cheatsheet + /render-html skills that produced this repo are 2 out of ARIS's 74+ skills.
- ⭐ ~10k GitHub stars — top-trending AI agent repo
- 🥇 HuggingFace Daily Papers #1 — top of the day, paper arXiv:2605.03042
- 🏆 AI Digital Crew · Project of the Day (2026.03.14)
- 📰 Featured on PaperWeekly + VoltAgent/awesome-agent-skills
- 🛠️ 74+ research skills — full lifecycle from idea exploration → experiments → papers → rebuttals → talk slides
- 🌐 7+ platforms supported — Claude Code · Codex CLI · Cursor · Trae · Antigravity · GitHub Copilot CLI · OpenClaw
- 🔧 ARIS-Code standalone CLI — multi-provider runtime, no Claude Code dependency required
Core methodology: cross-model adversarial review — executor and reviewer must come from different model families (Claude × GPT-5.5 xhigh × Gemini), so no LLM ever judges its own output. This protocol carries directly into interview cheat sheet generation: every formula, code block, and citation in every tutorial passes an independent audit (see each .review.json audit trail).
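The invariant is simple enough to state as a check. A minimal sketch, assuming each model is tagged with a family label; the `FAMILY` table, its keys, and `assert_cross_model` are illustrative names, not part of ARIS.

```python
FAMILY = {  # hypothetical model-name -> family mapping
    "claude-opus": "anthropic",
    "gpt-5.5": "openai",
    "gemini": "google",
}


def assert_cross_model(executor: str, reviewer: str) -> None:
    """Raise if executor and reviewer belong to the same model family."""
    if FAMILY[executor] == FAMILY[reviewer]:
        raise ValueError(f"{reviewer} may not review output from {executor}")


assert_cross_model("claude-opus", "gpt-5.5")  # passes: different families
```

Comparing families rather than exact model names also rules out a smaller sibling of the executor acting as its reviewer.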
👉 ARIS main repo: https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
If this collection — or any cheat sheet here — helped you in your interview prep / research / paper, please consider citing the underlying ARIS methodology paper:
@article{yang2026aris,
title={ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration},
author={Yang, Ruofeng and Li, Yongcan and Li, Shuai},
journal={arXiv preprint arXiv:2605.03042},
year={2026}
}

Every tutorial in this repo was generated end-to-end by the ARIS `/interview-cheatsheet` + `/render-html` workflow with cross-model adversarial review (Claude × GPT-5.5 xhigh × Gemini). The citation supports the methodology behind the workflow, not just this collection.
MIT — use, modify, share, fork freely. Hope this helps your job search. 💪


