[Bug]: Mamba should return states in fp32

### Your current environment

N/A

### 🐛 Describe the bug

vLLM keeps the SSM states at a different precision than the reference Mamba implementation does.

- The authors' implementation uses `weight_type`, which is usually fp32: https://github.com/state-spaces/mamba/blob/v2.2.4/csrc/selective_scan/selective_scan.cpp#L313
- The vLLM implementation uses `input_t`, which can be 16-bit: https://github.com/vllm-project/vllm/blob/v0.7.2/csrc/mamba/mamba_ssm/selective_scan_fwd.cu#L131

This difference seems to lower the quality of the generated text.
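
To illustrate why the dtype of the carried state can matter, here is a minimal NumPy sketch. It is a toy scalar recurrence, not the vLLM or Mamba kernel: the function `scan`, the decay value, and the sequence length are all hypothetical choices for illustration, and fp16 stands in for whichever 16-bit type `input_t` resolves to. The arithmetic is done in fp32 in both runs; only the dtype in which the state is stored between steps changes.

```python
import numpy as np

def scan(x, decay, state_dtype, compute_dtype=np.float32):
    """Toy recurrence h_t = decay * h_{t-1} + x_t.

    The update is computed in compute_dtype, then the carried state is
    rounded to state_dtype before the next step (hypothetical stand-in
    for storing SSM states in a given precision).
    """
    h = state_dtype(0.0)
    out = []
    for x_t in x:
        update = compute_dtype(decay) * compute_dtype(h) + compute_dtype(x_t)
        h = state_dtype(update)  # rounding happens here when state_dtype is narrow
        out.append(float(h))
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float16)  # 16-bit inputs
decay = 1.0 - 2.0 ** -10  # exactly representable in fp16, fp32 and fp64

# Reference trajectory with both compute and state in fp64.
ref = scan(x, decay, np.float64, compute_dtype=np.float64)

# Worst-case deviation from the reference over the whole sequence.
err16 = np.max(np.abs(scan(x, decay, np.float16) - ref))
err32 = np.max(np.abs(scan(x, decay, np.float32) - ref))

print(f"max |error| with fp16 state: {err16:.3e}")
print(f"max |error| with fp32 state: {err32:.3e}")
```

In this sketch the state is re-rounded at every step, so the 16-bit run accumulates rounding error across the sequence while the fp32 run stays close to the reference. Whether the same effect fully explains the quality drop in vLLM is not established here.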

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: Mamba should return states in fp32 #13466

Your current environment

🐛 Describe the bug

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Mamba should return states in fp32 #13466

Description

Your current environment

🐛 Describe the bug

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions