[New blog] Inside vLLM: Anatomy of a High-Throughput LLM Inference System #80
base: main
Conversation
Deploying vllm-blog-source with
- Latest commit: 3476bef
- Status: ✅ Deploy successful!
- Preview URL: https://0b9d3992.vllm-blog-source.pages.dev
- Branch Preview URL: https://gordicaleksa-anatomy-vllm.vllm-blog-source.pages.dev
Ok, I see I haven't done the DCO thing, let me fix that.
Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
Force-pushed from 76ff8b4 to c16a69e
Very nice!! Any way we can get footnotes working properly?
@simon-mo Which footnote? Not sure I understood.
Ah sorry, I meant citations [1] [2]... they don't link to the actual references.
Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
Force-pushed from 293aea1 to 66b6d03
Oh ok, added!
Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
@simon-mo lmk what you think now and happy to merge!
<ol type="a">
<li>policy setting - it can be either <b>FCFS</b> (first come first served) or <b>priority</b> (higher priority requests are served first)</li>
<li><code>waiting</code> and <code>running</code> queues</li>
<li>KV cache manager - the heart of paged attention [[3]](#ref-3)</li>
this reference rendering seems to be broken inside <li></li>
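The two scheduling policies described in the snippet above can be sketched as follows. This is an illustrative toy, not vLLM's actual scheduler API; the `Request` fields and `pick_next` helper are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    req_id: str
    arrival: int        # monotonically increasing arrival counter
    priority: int = 0   # lower value = higher priority (illustrative convention)

def pick_next(waiting: list[Request], policy: str = "fcfs") -> Request:
    """Select and remove the next request from the waiting queue."""
    if policy == "priority":
        # Higher-priority requests first; arrival order breaks ties.
        best = min(waiting, key=lambda r: (r.priority, r.arrival))
    else:
        # FCFS: oldest arrival first.
        best = min(waiting, key=lambda r: r.arrival)
    waiting.remove(best)
    return best

waiting = [Request("a", arrival=1, priority=1), Request("b", arrival=2, priority=0)]
print(pick_next(list(waiting), "fcfs").req_id)      # FCFS picks "a" (arrived first)
print(pick_next(list(waiting), "priority").req_id)  # priority picks "b" (priority 0)
```

In the real system, requests move from the `waiting` queue to the `running` queue once the KV cache manager can allocate blocks for them.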
> [!NOTE]
> Block size for a standard transformer layer (non-MLA [[4]](#ref-4)) is computed as follows:
> 2 * <code>block_size</code> (default=16) * <code>num_kv_heads</code> * <code>head_size</code> * <code>dtype_num_bytes</code> (2 for bf16)
the first 2 is for k and v, right? the "2 for bf16" is for dtype_num_bytes
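To make the note's formula concrete, here is a worked example. The leading 2 is indeed for the K and V tensors, and `dtype_num_bytes` is 2 for bf16; the head counts and sizes below are assumed values for illustration, not tied to any particular model.

```python
def kv_block_bytes(block_size: int = 16,
                   num_kv_heads: int = 8,     # assumed, illustrative
                   head_size: int = 128,      # assumed, illustrative
                   dtype_num_bytes: int = 2   # 2 bytes for bf16
                   ) -> int:
    """Bytes per KV cache block for one standard (non-MLA) transformer layer."""
    # Leading 2 accounts for both the K and the V tensor.
    return 2 * block_size * num_kv_heads * head_size * dtype_num_bytes

print(kv_block_bytes())  # 2 * 16 * 8 * 128 * 2 = 65536 bytes per layer
```

Switching to fp32 (`dtype_num_bytes=4`) doubles the block footprint, which is why the dtype term appears explicitly in the formula.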
My nit comments are not blocking. We can publish first and then fix them.
Porting the original blog: https://www.aleksagordic.com/blog/vllm