Insights: InternLM/lmdeploy
Overview
18 Pull requests merged by 9 people
- add deepep (#3313, merged Mar 28, 2025)
- refactor dlinfer rope (#3326, merged Mar 27, 2025)
- Fix the finish_reason (#3350, merged Mar 27, 2025)
- optimize quant-fp8 kernel (#3345, merged Mar 27, 2025)
- merge dev into main (#3348, merged Mar 26, 2025)
- support dp decoding with cudagraph (#3311, merged Mar 26, 2025)
- remove think_end_token_id in streaming content (#3327, merged Mar 26, 2025)
- [ascend] support deepseekv2 (#3206, merged Mar 26, 2025)
- Fix finish reasons (#3338, merged Mar 26, 2025)
- optimize mla, remove load v (#3334, merged Mar 26, 2025)
- Fix Qwen3MoE config parsing (#3336, merged Mar 25, 2025)
- add v check (#3307, merged Mar 25, 2025)
- [Feature] support qwen3 and qwen3-moe for pytorch engine (#3315, merged Mar 25, 2025)
- [ascend] support multi nodes on ascend device (#3260, merged Mar 25, 2025)
- Add Qwen3 and Qwen3MoE (#3305, merged Mar 25, 2025)
- Verbose log (#3329, merged Mar 24, 2025)
- Add mixed DP + TP (#3229, merged Mar 24, 2025)
- [ci] add think function testcase (#3299, merged Mar 24, 2025)
11 Pull requests opened by 11 people
- LMDeploy Distserve (#3304, opened Mar 22, 2025)
- Optimize internvit (#3316, opened Mar 24, 2025)
- Improve turbomind's prefix cache (#3332, opened Mar 25, 2025)
- Create SECURITY.md (#3333, opened Mar 25, 2025)
- [maca] support multinode for maca (#3340, opened Mar 26, 2025)
- Update block_trie.py (#3353, opened Mar 27, 2025)
- Add AIOHTTP_TIMEOUT env var for proxy server (#3355, opened Mar 27, 2025)
- optimize moe get sorted idx (#3356, opened Mar 27, 2025)
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek (#3358, opened Mar 27, 2025)
- [WIP] Add Gloo communication to turbomind (#3362, opened Mar 28, 2025)
- Optimize ascend moe (#3364, opened Mar 28, 2025)
18 Issues closed by 9 people
- [Bug] Qwen2.5-VL-7B-AWQ reports an error (#3337, closed Mar 28, 2025)
- increase the GPU utilization rate, even when the parameter "--cache-max-entry-count" is set to 0.99 (#3359, closed Mar 28, 2025)
- [Docs] how to add max_pixels? (#3352, closed Mar 28, 2025)
- [Bug] Failed to deploy Qwen/Qwen2.5-Omni-7B (#3357, closed Mar 27, 2025)
- [Bug] [v0.7.2.post1] After enabling tool-call support, finish_reason is always tool_calls (#3349, closed Mar 27, 2025)
- [Bug] Qwen2.5-VL-7B-Instruct error: ModuleNotFoundError: No module named 'partial_json_parser' (#3351, closed Mar 26, 2025)
- DeepSeek-R1-671B Floating point exception (#3297, closed Mar 26, 2025)
- [Bug] Floating point exception (core dumped) (#3342, closed Mar 26, 2025)
- [Bug] (RayWorkerWrapper pid=23770) Fatal Python error: Floating point exception (#3343, closed Mar 26, 2025)
- [Bug] (#3328, closed Mar 26, 2025)
- [Bug] When calling DeepSeek R1 with streaming, the content field starts with an extra </think>; non-streaming calls do not have this problem (#3321, closed Mar 25, 2025)
- [Bug] QwQ and DeepSeek-R1-Distill-Qwen are missing <think> (#3303, closed Mar 25, 2025)
- [Bug] RuntimeError: Triton Error [CUDA]: invalid argument (#3322, closed Mar 25, 2025)
- How to set detailed configuration when using the lmdeploy API? (#3310, closed Mar 24, 2025)
- [Bug] function_call failed after updating to v0.7.2 with --tool-call-parser qwen (#3317, closed Mar 24, 2025)
- How is the int4 kv cache dequantized when performing mma? (#3320, closed Mar 24, 2025)
- [Bug] qwen-2.5-vl-7B-AWQ: transformers & LMDeploy version conflict (#3250, closed Mar 24, 2025)
- [Feature] How to adapt a custom multimodal model (not on HF) for accelerated inference (#2599, closed Mar 23, 2025)
20 Issues opened by 18 people
- [Bug] On Win10, running it produces no response at all (#3365, opened Mar 29, 2025)
- [Feature] Can the lite quantization command support multiple GPUs on a single node? (#3361, opened Mar 28, 2025)
- [Bug] asyncio.exceptions.InvalidStateError: invalid state (#3360, opened Mar 28, 2025)
- [Bug] Ascend v0.7.2.post1: serving API speed test occasionally hangs (#3354, opened Mar 27, 2025)
- [Bug] AttributeError: '_turbomind.AbstractTransformerModel' object has no attribute 'create_nccl_params' (#3347, opened Mar 26, 2025)
- Inference quality degrades under high concurrency (#3344, opened Mar 26, 2025)
- Ascend v0.7.2.post1: serving API speed test, throughput in the first run is 40%+ lower (#3341, opened Mar 26, 2025)
- [Bug] QwQ-32B quantized with the lite auto_awq command fails at inference with tp=2 on a single-node multi-GPU machine; the same error occurs in both ascend and cuda environments (#3339, opened Mar 26, 2025)
- [Bug] Long-context benchmark of a 32B AWQ model with profile_restful_api.py on an A100 40GB hits a bug (#3331, opened Mar 25, 2025)
- [Docs] Does the Ascend (Atlas 800T A2) platform support multi-lora? (#3325, opened Mar 24, 2025)
- [Bug] lmdeploy 0.7.2-cu12 inference error while serving MiniCPM-V-2.6 (#3324, opened Mar 24, 2025)
- lmdeploy - ERROR - base.py:53 - ModuleNotFoundError: No module named 'dlinfer' (#3319, opened Mar 24, 2025)
- Using Smooth Quant to perform FP8 quantization corrupts the model (#3318, opened Mar 24, 2025)
- InternVL2.5-4B: inference after lmdeploy quantization is slower than floating point (#3314, opened Mar 24, 2025)
- Does lmdeploy limit the length of base64-encoded images, and how can this be changed? (#3312, opened Mar 24, 2025)
- The difference in inference speed between different input methods (#3309, opened Mar 23, 2025)
- [Bug] With prefix cache enabled, inference results for the same prompt sometimes differ (#3308, opened Mar 23, 2025)
- [Feature] Support Aya Vision (#3306, opened Mar 22, 2025)
16 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- support ascend w8a8 graph_mode (#3267, commented on Mar 28, 2025 • 1 new comment)
- [Bug] When the response needs to generate "'s" with a space, the text is output incorrectly (#2190, commented on Mar 24, 2025 • 0 new comments)
- [Feature] Will the turbomind backend support guided_decoding? (#2771, commented on Mar 25, 2025 • 0 new comments)
- [Bug] After deploying qwen2.5 vl 7b, it errors out after running for a while and API calls fail (#3244, commented on Mar 25, 2025 • 0 new comments)
- [Bug] not all VLM use `pad_token_id` as image's pad (#3285, commented on Mar 26, 2025 • 0 new comments)
- [Bug] lmdeploy model deployment error: 2025-03-19 04:28:52,017 - lmdeploy - ERROR - async_engine.py:777 - session 85 finished, reason "error" (#3280, commented on Mar 27, 2025 • 0 new comments)
- [Bug] RuntimeError: CUDA error: an illegal memory access was encountered (#3258, commented on Mar 28, 2025 • 0 new comments)
- [QA] Does lmdeploy support concurrency when serving internvl_38? (#3296, commented on Mar 29, 2025 • 0 new comments)
- [Bug] After deploying qwen2.5-vl-7b, does the deployment command need any changes to pass in video? (#3249, commented on Mar 29, 2025 • 0 new comments)
- Multi-turn dialogue with batch inference (#3238, commented on Mar 29, 2025 • 0 new comments)
- [Bug] During batch inference, the eos stop_words_list length exceeds the limit and it finally crashes (core dumped) (#3233, commented on Mar 29, 2025 • 0 new comments)
- [HELP] Looking for help quantizing `DeepSeek-R1-Distill-Qwen-32B` to `int8` (#3200, commented on Mar 29, 2025 • 0 new comments)
- Add metrics endpoint (#1423, commented on Mar 26, 2025 • 0 new comments)
- support ascend 310P (#3085, commented on Mar 27, 2025 • 0 new comments)
- use half/bf16 lm_head output (#3213, commented on Mar 28, 2025 • 0 new comments)
- add deepseekv3 doc (#3265, commented on Mar 26, 2025 • 0 new comments)