Releases · microsoft/MInference

What's Changed

Feature

[PreRelease]: SCBench by @iofu728 in #96
Feature(MInference): support transformers>=4.46.0, add chunk mlp by @iofu728 in #133, #113
Feature(MInference): update release cuda version by @iofu728 in #134
Fix(MInference): fix the search pattern feature by @iofu728 in #156

Ops

Feature(FlexPrefill): add flex-prefill by @liyucheng09 in #100
Feature(MInference): add xAttention by @iofu728 in #149
Feature(MInference): support SGLang and vLLM vertical_and_slash flash attention and index kernels by @iofu728 in #153

Model support

Feature(MInference): support LLaMA-3-70B-1M and multi-gpu PP by @iofu728 in #59
Feature(MInference): add Qwen-Turbo-1M by @iofu728 in #115, #117, #135
Feature(MInference): add kv_type unittest by @iofu728 in #132

Bug Fix

Fix(MInference): fix e2e benchmark guideline & fix A-shape multi gpu by @iofu728 in #66
Fix(MInference): fix the vs pattern loss / sqrt(dk) by @iofu728 in #70
Fix(SCBench): fix the pipeline and load dataset by @iofu728 in #101
Fix(SCBench): fix default seq length by @iofu728 in #104
Feature(SCBench): update scbench scripts by @iofu728 in #105
Fix(SCBench): bug fix when use_cache == False by @liyucheng09 in #106
Feature(MInference): support 192 dim for streaming_kernel by @iofu728 in #125
Fix the Qwen config path in model2path.py by @GuoYiFantastic in #130
Fix(MInference): fix get config by @iofu728 in #121
Fix(MInference): fix the residual by @iofu728 in #136
Fix(MInference): fix multi worker by @iofu728 in #152
fix typo by @DefTruth in #141

New Contributors

@GuoYiFantastic made their first contribution in #130
@DefTruth made their first contribution in #141

Full Changelog: v0.1.5.post1...v0.1.6

Releases: microsoft/MInference

V0.1.6: Add SCBench

What's Changed

Feature

Ops

Model support

Bug Fix

New Contributors

Contributors

Uh oh!

V0.1.5.post1: Support LLaMA-3-70B, Multi-gpu, fix kernel / sqrt(dk)

What's Changed

Contributors

Uh oh!

v0.1.5

What's Changed #27

New Contributors

Contributors

Uh oh!

V0.1.4.post4: Hotfix vllm >= 0.4.1

What's Changed

Contributors

Uh oh!

V0.1.4.post3: remove flash_attn dependency

What's Changed

Contributors

Uh oh!

V0.1.4.post2: support multi-gpu, remove pycuda

What's Changed

Contributors

Uh oh!

V0.1.4.post1: support other vllm versions

What's Changed

New Contributors

Contributors

Uh oh!

V0.1.4: Hotfix config in pip

What's Changed

Contributors

Uh oh!

V0.1.3: fix the pip setup and add bdist cache

What's Changed

Contributors

Uh oh!

v0.1.2

What's Changed

Contributors

Uh oh!