
Conversation

gordicaleksa
Collaborator

Porting the original blog: https://www.aleksagordic.com/blog/vllm


cloudflare-workers-and-pages bot commented Sep 5, 2025

Deploying vllm-blog-source with Cloudflare Pages

Latest commit: 3476bef
Status: ✅  Deploy successful!
Preview URL: https://0b9d3992.vllm-blog-source.pages.dev
Branch Preview URL: https://gordicaleksa-anatomy-vllm.vllm-blog-source.pages.dev


@gordicaleksa
Collaborator Author

Ok, I see I haven't done the DCO thing, let me fix that.

Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
@gordicaleksa gordicaleksa force-pushed the gordicaleksa/anatomy-vllm branch from 76ff8b4 to c16a69e Compare September 5, 2025 18:08
@gordicaleksa gordicaleksa changed the title [New blog] Inside vLLM: Anatomy of a High-Throughput LLM Inference System - NEW [New blog] Inside vLLM: Anatomy of a High-Throughput LLM Inference System Sep 5, 2025
@simon-mo
Contributor

simon-mo commented Sep 5, 2025

Very nice!! Any way we can get footnotes working properly?

@gordicaleksa
Collaborator Author

@simon-mo Which footnote? Not sure I understood.

@simon-mo
Contributor

simon-mo commented Sep 6, 2025

Ah sorry, I meant citations [1] [2]... they don't link to the actual references.

Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
@gordicaleksa gordicaleksa force-pushed the gordicaleksa/anatomy-vllm branch from 293aea1 to 66b6d03 Compare September 6, 2025 04:59
@gordicaleksa
Collaborator Author

gordicaleksa commented Sep 6, 2025

oh ok, added!

Signed-off-by: Aleksa Gordic <gordicaleksa@gmail.com>
@gordicaleksa
Collaborator Author

@simon-mo lmk what you think now and happy to merge!

<ol type="a">
<li>policy setting - it can be either <b>FCFS</b> (first come first served) or <b>priority</b> (higher priority requests are served first)</li>
<li><code>waiting</code> and <code>running</code> queues</li>
<li>KV cache manager - the heart of paged attention [[3]](#ref-3)</li>
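For illustration, the policy setting in (a) together with the waiting queue in (b) could be sketched roughly like this. This is a toy sketch, not vLLM's actual implementation; the class and method names (`WaitingQueue`, `add`, `pop`) are invented for the example:

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

@dataclass(order=True)
class _Entry:
    sort_key: tuple
    request: object = field(compare=False)  # payload never compared

class WaitingQueue:
    """Toy waiting queue supporting two scheduling policies:
    'fcfs' pops requests in arrival order; 'priority' pops the
    highest-priority request first (lower number = higher priority),
    falling back to arrival order to break ties."""

    def __init__(self, policy: str = "fcfs"):
        assert policy in ("fcfs", "priority")
        self.policy = policy
        self._heap: list[_Entry] = []
        self._arrival = count()  # monotonically increasing arrival stamp

    def add(self, request, priority: int = 0) -> None:
        stamp = next(self._arrival)
        key = (stamp,) if self.policy == "fcfs" else (priority, stamp)
        heapq.heappush(self._heap, _Entry(key, request))

    def pop(self):
        return heapq.heappop(self._heap).request
```

Under "fcfs" the priority argument is ignored, so requests come out in the order they were added; under "priority" a later but higher-priority request jumps the queue.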
Member


this reference rendering seems to be broken inside <li></li>


> [!NOTE]
> Block size for a standard transformer layer (non-MLA [[4]](#ref-4)) is computed as follows:
> 2 * <code>block_size</code> (default=16) * <code>num_kv_heads</code> * <code>head_size</code> * <code>dtype_num_bytes</code> (2 for bf16)
Member


The first 2 is for K and V, right? The "2 for bf16" is dtype_num_bytes.
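As a sanity check of the formula in the note (with the leading 2 accounting for the K and V tensors, and dtype_num_bytes being 2 for bf16), the per-block, per-layer KV cache size could be computed as follows. The default head counts here are illustrative only, not tied to any particular model:

```python
def kv_block_bytes(block_size: int = 16,
                   num_kv_heads: int = 8,
                   head_size: int = 128,
                   dtype_num_bytes: int = 2) -> int:
    """Bytes of KV cache one block occupies for one transformer layer.

    The leading 2 is for the K and V tensors (one of each per token);
    dtype_num_bytes is 2 for bf16/fp16, 4 for fp32.
    """
    return 2 * block_size * num_kv_heads * head_size * dtype_num_bytes

# With the illustrative defaults: 2 * 16 * 8 * 128 * 2 = 65536 bytes
# (64 KiB) per block per layer.
```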

Member

@youkaichao youkaichao left a comment


My nit comments are not blocking; we can publish first and fix them afterwards.
