Closed as not planned
Description
Your current environment
N/A
🐛 Describe the bug
The SSM states are stored at a different precision in vLLM than in the original Mamba implementation:
- The authors' implementation (state-spaces/mamba) uses `weight_type`, which is usually fp32: https://github.com/state-spaces/mamba/blob/v2.2.4/csrc/selective_scan/selective_scan.cpp#L313
- The vLLM implementation uses `input_t`, which can be a 16-bit type: https://github.com/vllm-project/vllm/blob/v0.7.2/csrc/mamba/mamba_ssm/selective_scan_fwd.cu#L131
This difference appears to lower the quality of the generated text.
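To illustrate why the storage dtype of the state can matter, here is a minimal Python sketch (standard library only). It runs a scalar linear recurrence h_t = a·h_{t-1} + x_t as a hypothetical stand-in for the selective-scan state update. As a simplifying assumption, the state is rounded to the storage format after every step. This roughly mimics token-by-token decoding when the state is kept in a 16-bit `input_t` buffer between steps.

The decay `0.999`, the constant input `0.01`, and the helper names `round_to` and `scan` are illustrative choices, not taken from either code base. fp16 is emulated with `struct`'s `'e'` format. bf16, which keeps even fewer mantissa bits than fp16, is not modeled here.

```python
import struct
from typing import List, Optional


def round_to(x: float, fmt: str) -> float:
    """Round a Python float (fp64) to a narrower IEEE format and back.

    fmt is a struct code: 'e' = fp16 (half), 'f' = fp32 (single).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def scan(decay: float, inputs: List[float], fmt: Optional[str]) -> float:
    """Scalar recurrence h_t = decay * h_{t-1} + x_t.

    Hypothetical stand-in for the SSM state update. If fmt is given, the
    state is rounded to that format after every step, as if it were written
    to and read back from a buffer of that dtype. fmt=None keeps everything
    in fp64 as a reference.
    """
    h = 0.0
    for x in inputs:
        h = decay * h + x  # the update itself is computed in fp64
        if fmt is not None:
            h = round_to(h, fmt)  # store the state at reduced precision
    return h


# Hypothetical values: decay close to 1 (slowly forgetting state), small input.
inputs = [0.01] * 10_000
ref = scan(0.999, inputs, None)
fp32 = scan(0.999, inputs, "f")
fp16 = scan(0.999, inputs, "e")

print(f"fp64 reference: {ref:.6f}")
print(f"fp32 state:     {fp32:.6f}  (abs error {abs(fp32 - ref):.2e})")
print(f"fp16 state:     {fp16:.6f}  (abs error {abs(fp16 - ref):.2e})")
```

With these values the fp32 state tracks the fp64 reference closely. The fp16 state stops growing once a single step's increment drops below half an fp16 ulp, which leaves it well short of the reference. Whether this effect explains the quality drop would still need to be confirmed against the real kernels.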