[Question] What does the service parameter max_tokens_in_paged_kv_cache mean? #67

@wjj19950828

I have run the full llama2 workflow end to end and now want to stress-test the service and collect benchmark metrics.

I am not sure I understand the max_tokens_in_paged_kv_cache parameter.

Is it similar to the max_num_batched_tokens parameter in vLLM?

Thanks~

Labels: triaged (issue has been triaged by maintainers)
