Discussion about the difference with reference implementation #2

This is really great work for understanding the computation pattern of the Mamba2 model. However, I noticed that you mention:

> The model's output logits follow the same distribution as the reference implementation but are not numerically equivalent.

Could you please give some hints on why the outputs are not numerically equivalent? Is there any modification to the model architecture or computation pattern relative to the reference implementation? If not, what causes the different output? Since [John Ma](https://github.com/johnma2006)'s mamba-minimal repo states that its output logits are equivalent to those of the original Mamba, I wonder why this implementation leads to non-equivalent output logits. Looking forward to your answer.
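
To make the question concrete, here is a rough sketch of the comparison I have in mind. It separates "numerically equivalent" (element-wise closeness of the logits) from "same distribution" (same top-1 token, near-zero KL divergence after softmax). The function name `compare_logits`, the tolerances, and the synthetic arrays are my own assumptions for illustration; they are not taken from this repo or from the reference implementation, and the constant offset is only a toy case, not a guess at the actual cause.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def compare_logits(a, b, atol=1e-5, rtol=1e-4):
    """Compare two logit arrays of shape (batch, seq_len, vocab).

    Hypothetical helper: tolerances are arbitrary choices.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    la, lb = log_softmax(a), log_softmax(b)
    # KL(softmax(a) || softmax(b)) per position.
    kl = (np.exp(la) * (la - lb)).sum(axis=-1)
    return {
        "max_abs_diff": float(np.abs(a - b).max()),
        "allclose": bool(np.allclose(a, b, atol=atol, rtol=rtol)),
        "argmax_agreement": float((a.argmax(-1) == b.argmax(-1)).mean()),
        "mean_kl": float(kl.mean()),
    }

# Synthetic stand-ins for the two models' outputs (not real model logits).
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 16, 100))
other = ref + 1e-3  # toy case: constant offset leaves softmax unchanged

report = compare_logits(ref, other)
print(report)
```

In this toy case the logits fail the element-wise check while the softmax distributions coincide, which is one way the quoted statement could hold. Running the same report on the real outputs of both implementations would show whether the gap looks like floating-point noise or something structural.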
