Hi guys,
Thank you for making this super library!
I have a question about the output of vLLM.
I'm using an RTX A6000 GPU (48 GB) with CUDA 12 and the Vicuna-13B-v1.5 (4k context) model from lmsys.
vLLM is served with gpu_memory_utilization=0.8.
The parameters I change for each request are:
max_tokens: 4096
temperature: 0
I build a custom prompt with context taken from a text/document.
Why is the output sometimes not complete?
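For reference, here is a minimal sketch of the setup described above, assuming the OpenAI-compatible server is used (the prompt string and host/port are placeholders). One thing worth checking: Vicuna-13B-v1.5-4k has a 4,096-token context window that the prompt and the completion share, so a long document prompt plus max_tokens=4096 cannot both fit, and generation may stop early.

```python
# Server (shell), matching the config above:
#   python -m vllm.entrypoints.openai.api_server \
#       --model lmsys/vicuna-13b-v1.5 \
#       --gpu-memory-utilization 0.8

import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default vLLM OpenAI-compatible endpoint
    json={
        "model": "lmsys/vicuna-13b-v1.5",
        # Placeholder: custom prompt built from document context.
        "prompt": "<document context>\n\nQuestion: ...",
        # Note: prompt tokens + max_tokens must fit in the 4k context window.
        "max_tokens": 4096,
        "temperature": 0,
    },
)
data = resp.json()
print(data["choices"][0]["text"])
# If finish_reason is "length", the output was cut off at the token limit
# rather than ending naturally with a stop token.
print(data["choices"][0]["finish_reason"])
```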