
vLLM v0.1.3

@WoosukKwon WoosukKwon released this 02 Aug 23:56
· 1287 commits to main since this release
aa84c92

What's Changed

Major changes

  • More model support: LLaMA 2, Falcon, GPT-J, Baichuan, etc.
  • Efficient support for multi-query attention (MQA) and grouped-query attention (GQA).
  • Changes in the scheduling algorithm: vLLM now uses TGI-style continuous batching.
  • Many bug fixes.
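To make the MQA/GQA bullet above concrete, here is a minimal sketch (not vLLM's actual implementation) of how grouped-query attention maps query heads onto a smaller set of shared key/value heads; the function name and structure are illustrative assumptions:

```python
def gqa_head_mapping(num_query_heads, num_kv_heads):
    """Map each query head to the KV head whose cache it reads in
    grouped-query attention (GQA).

    MQA is the special case num_kv_heads == 1 (all query heads share
    one KV head); standard multi-head attention is the case
    num_kv_heads == num_query_heads (no sharing).
    """
    assert num_query_heads % num_kv_heads == 0
    group_size = num_query_heads // num_kv_heads
    # Query heads q in the same group of size `group_size` read the
    # same KV head, shrinking the KV cache by a factor of group_size.
    return [q // group_size for q in range(num_query_heads)]
```

For example, `gqa_head_mapping(8, 2)` yields `[0, 0, 0, 0, 1, 1, 1, 1]`: eight query heads sharing two KV heads, so the KV cache is a quarter the size of full multi-head attention.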
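The scheduling change above can be illustrated with a toy simulation (an assumption for exposition, not vLLM's scheduler): in continuous batching, finished sequences leave the batch after every decoding iteration and waiting requests are admitted immediately, rather than waiting for the whole batch to drain:

```python
from collections import deque

def continuous_batching(requests, max_batch_size):
    """Simulate iteration-level (continuous) batching.

    `requests` is a list of (request_id, tokens_to_generate) pairs.
    Returns the batch composition at each decoding step.
    """
    waiting = deque(requests)
    running = {}   # request_id -> tokens still to generate
    trace = []     # which requests ran at each step
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch_size:
            rid, n = waiting.popleft()
            running[rid] = n
        trace.append(sorted(running))
        # One decoding iteration: every running sequence emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # frees a slot for the next request
    return trace
```

With requests `[("a", 3), ("b", 1), ("c", 2)]` and a batch size of 2, "c" is admitted the moment "b" finishes, so everything completes in 3 steps; static batching would run the first batch for 3 steps and then "c" alone for 2 more.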

All changes

New Contributors

Full Changelog: v0.1.2...v0.1.3