vllm-users-t-nitinkedia-sarathi-v2 sarathi implementation error #3422
bbietzsche started this conversation in General
from vllm import LLM, SamplingParams

# example_prompts is a list of plain prompt strings (its contents are not shown in this report)
model_name = 'mistralai/Mistral-7B-v0.1'
model = LLM(model=model_name, enforce_eager=True)
params = SamplingParams(max_tokens=256)
response = model.generate(example_prompts, params)
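For context, a minimal self-contained version of the snippet above. The prompt strings and the output loop here are illustrative placeholders, not taken from the original report:

from vllm import LLM, SamplingParams

# Placeholder prompts; the report does not show what example_prompts contained.
example_prompts = [
    "Hello, my name is",
    "The capital of France is",
]

model = LLM(model='mistralai/Mistral-7B-v0.1', enforce_eager=True)
params = SamplingParams(max_tokens=256)

# generate() returns one RequestOutput per prompt; each holds the generated completions.
outputs = model.generate(example_prompts, params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)

Running essentially the same call on the vllm-users-t-nitinkedia-sarathi-v2 branch fails inside the sampler with the traceback below.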
/content/vllm-users-t-nitinkedia-sarathi-v2/vllm/model_executor/layers/sampler.py in _get_logits(self, hidden_states, embedding, embedding_bias)
40 # Get the logits for the next tokens.
41 print(hidden_states.shape, embedding.t().shape)
---> 42 print(hidden_states.detach().cpu().max(), hidden_states.detach().cpu().min())
43 print(embedding.t().detach().cpu().max(), embedding.t().detach().cpu().min())
44 logits = torch.matmul(hidden_states, embedding.t())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Is there any usage example for the vllm-users-t-nitinkedia-sarathi-v2 branch?
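As the error message suggests, one way to get a trustworthy stack trace is to make kernel launches synchronous with CUDA_LAUNCH_BLOCKING=1. A minimal sketch of doing this from inside the script (the prompt here is again a placeholder); equivalently, the variable can be exported in the shell before launching the script:

import os

# Must be set before CUDA is initialized (i.e., before importing torch/vllm),
# so the failing kernel is reported at the correct point in the stack trace.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM, SamplingParams

model = LLM(model='mistralai/Mistral-7B-v0.1', enforce_eager=True)
outputs = model.generate(["Hello, my name is"], SamplingParams(max_tokens=256))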