vllm-users-t-nitinkedia-sarathi-v2 sarathi implementation error #3422
bbietzsche started this conversation in General
from vllm import LLM, SamplingParams

# example_prompts is a list of plain prompt strings (its contents are not shown in this report)
model_name = 'mistralai/Mistral-7B-v0.1'
model = LLM(model=model_name, enforce_eager=True)
params = SamplingParams(max_tokens=256)
response = model.generate(example_prompts, params)
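For context, a minimal self-contained version of the snippet above. The prompt strings and the output loop here are illustrative placeholders, not taken from the original report:

from vllm import LLM, SamplingParams

# Placeholder prompts; the report does not show what example_prompts contained.
example_prompts = [
    "Hello, my name is",
    "The capital of France is",
]

model = LLM(model='mistralai/Mistral-7B-v0.1', enforce_eager=True)
params = SamplingParams(max_tokens=256)

# generate() returns one RequestOutput per prompt; each holds the generated completions.
outputs = model.generate(example_prompts, params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)

Running essentially the same call on the vllm-users-t-nitinkedia-sarathi-v2 branch fails inside the sampler with the traceback below.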
/content/vllm-users-t-nitinkedia-sarathi-v2/vllm/model_executor/layers/sampler.py in _get_logits(self, hidden_states, embedding, embedding_bias)
40 # Get the logits for the next tokens.
41 print(hidden_states.shape, embedding.t().shape)
---> 42 print(hidden_states.detach().cpu().max(), hidden_states.detach().cpu().min())
43 print(embedding.t().detach().cpu().max(), embedding.t().detach().cpu().min())
44 logits = torch.matmul(hidden_states, embedding.t())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Is there any usage example for the vllm-users-t-nitinkedia-sarathi-v2 branch?
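As the error message suggests, one way to get a trustworthy stack trace is to make kernel launches synchronous with CUDA_LAUNCH_BLOCKING=1. A minimal sketch of doing this from inside the script (the prompt here is again a placeholder); equivalently, the variable can be exported in the shell before launching the script:

import os

# Must be set before CUDA is initialized (i.e., before importing torch/vllm),
# so the failing kernel is reported at the correct point in the stack trace.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM, SamplingParams

model = LLM(model='mistralai/Mistral-7B-v0.1', enforce_eager=True)
outputs = model.generate(["Hello, my name is"], SamplingParams(max_tokens=256))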