How to know how much real GPU memory is used? #3056

Answered by chenxu2048
ikalista asked this question in Q&A

vLLM records KV cache usage, logs it, and exposes it via Prometheus. You can also recalculate GPU memory usage from the number of GPU blocks and the fraction of them in use. However, vLLM does not currently report the GPU memory used by the KV cache directly.

INFO 03-14 11:50:43 llm_engine.py:338] # GPU blocks: 2236, # CPU blocks: 655
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:     Started server process [112564]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0…
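
To make the recalculation concrete, here is a rough sketch of the arithmetic. It assumes the usual paged KV cache layout, where each block holds keys and values for `block_size` tokens across every layer. The model shape used below (32 layers, 32 KV heads, head size 128, fp16, block size 16) is a hypothetical example and is not taken from the log above; substitute the values from your own model config. The function names are made up for illustration and are not vLLM APIs.

```python
def kv_cache_block_bytes(block_size, num_layers, num_kv_heads, head_size, dtype_bytes):
    """Approximate bytes of GPU memory held by one KV cache block.

    Assumes one key tensor and one value tensor per layer, each storing
    block_size * num_kv_heads * head_size elements.
    """
    per_layer_elements = block_size * num_kv_heads * head_size
    return 2 * num_layers * per_layer_elements * dtype_bytes  # 2 = key + value


def kv_cache_memory(num_gpu_blocks, usage_fraction, block_bytes):
    """Return (reserved_bytes, used_bytes) for the whole GPU KV cache."""
    reserved = num_gpu_blocks * block_bytes
    used = round(reserved * usage_fraction)
    return reserved, used


# Hypothetical model shape (assumption, not from the log): Llama-7B-like, fp16.
block_bytes = kv_cache_block_bytes(
    block_size=16, num_layers=32, num_kv_heads=32, head_size=128, dtype_bytes=2
)

# 2236 is the "# GPU blocks" value from the log; 0.5 is an example usage fraction.
reserved, used = kv_cache_memory(2236, 0.5, block_bytes)

print(f"per block: {block_bytes / 2**20:.1f} MiB")
print(f"reserved:  {reserved / 2**30:.2f} GiB")
print(f"in use:    {used / 2**30:.2f} GiB")
```

Note that the reserved figure is what the KV cache occupies on the GPU regardless of load, since the blocks are allocated up front; the in-use figure only says how much of that reservation currently holds live tokens. Model weights and activation memory are not included in either number.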

Answer selected by ikalista