Insights: InternLM/lmdeploy
Overview
18 Pull requests merged by 9 people
- add deepep (#3313, merged Mar 28, 2025)
- refactor dlinfer rope (#3326, merged Mar 27, 2025)
- Fix the finish_reason (#3350, merged Mar 27, 2025)
- optimize quant-fp8 kernel (#3345, merged Mar 27, 2025)
- merge dev into main (#3348, merged Mar 26, 2025)
- support dp decoding with cudagraph (#3311, merged Mar 26, 2025)
- remove think_end_token_id in streaming content (#3327, merged Mar 26, 2025)
- [ascend] support deepseekv2 (#3206, merged Mar 26, 2025)
- Fix finish reasons (#3338, merged Mar 26, 2025)
- optimize mla, remove load v (#3334, merged Mar 26, 2025)
- Fix Qwen3MoE config parsing (#3336, merged Mar 25, 2025)
- add v check (#3307, merged Mar 25, 2025)
- [Feature] support qwen3 and qwen3-moe for pytorch engine (#3315, merged Mar 25, 2025)
- [ascend] support multi nodes on ascend device (#3260, merged Mar 25, 2025)
- Add Qwen3 and Qwen3MoE (#3305, merged Mar 25, 2025)
- Verbose log (#3329, merged Mar 24, 2025)
- Add mixed DP + TP (#3229, merged Mar 24, 2025)
- [ci] add think function testcase (#3299, merged Mar 24, 2025)
11 Pull requests opened by 11 people
- LMDeploy Distserve (#3304, opened Mar 22, 2025)
- Optimize internvit (#3316, opened Mar 24, 2025)
- Improve turbomind's prefix cache (#3332, opened Mar 25, 2025)
- Create SECURITY.md (#3333, opened Mar 25, 2025)
- [maca] support multinode for maca (#3340, opened Mar 26, 2025)
- Update block_trie.py (#3353, opened Mar 27, 2025)
- Add AIOHTTP_TIMEOUT env var for proxy server (#3355, opened Mar 27, 2025)
- optimize moe get sorted idx (#3356, opened Mar 27, 2025)
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek (#3358, opened Mar 27, 2025)
- [WIP] Add Gloo communication to turbomind (#3362, opened Mar 28, 2025)
- Optimize ascend moe (#3364, opened Mar 28, 2025)
18 Issues closed by 9 people
- [Bug] Qwen2.5-VL-7B-AWQ reports an error (#3337, closed Mar 28, 2025)
- increase the GPU utilization rate, even when the parameter "--cache-max-entry-count" is set to 0.99 (#3359, closed Mar 28, 2025)
- [Docs] how to add max_pixels? (#3352, closed Mar 28, 2025)
- [Bug] Failed to deploy Qwen/Qwen2.5-Omni-7B (#3357, closed Mar 27, 2025)
- [Bug] [v0.7.2.post1] After enabling tool-call support, finish_reason is always tool_calls (#3349, closed Mar 27, 2025)
- [Bug] Qwen2.5-VL-7B-Instruct error: ModuleNotFoundError: No module named 'partial_json_parser' (#3351, closed Mar 26, 2025)
- DeepSeek-R1-671B Floating point exception (#3297, closed Mar 26, 2025)
- [Bug] Floating point exception (core dumped) (#3342, closed Mar 26, 2025)
- [Bug] (RayWorkerWrapper pid=23770) Fatal Python error: Floating point exception (#3343, closed Mar 26, 2025)
- [Bug] (#3328, closed Mar 26, 2025)
- [Bug] When calling DeepSeek R1 with streaming, the content field starts with an extra </think>; non-streaming calls do not have this problem (#3321, closed Mar 25, 2025)
- [Bug] QwQ and DeepSeek-R1-Distill-Qwen are missing <think> (#3303, closed Mar 25, 2025)
- [Bug] RuntimeError: Triton Error [CUDA]: invalid argument (#3322, closed Mar 25, 2025)
- How to set detailed configuration when using the lmdeploy API? (#3310, closed Mar 24, 2025)
- [Bug] function_call failed after updating to v0.7.2 with --tool-call-parser qwen (#3317, closed Mar 24, 2025)
- How is the int4 kv cache dequantized when performing mma? (#3320, closed Mar 24, 2025)
- [Bug] qwen-2.5-vl-7B-AWQ: transformers & LMDeploy version conflict (#3250, closed Mar 24, 2025)
- [Feature] How to adapt a custom multimodal model (not on HF) for accelerated inference (#2599, closed Mar 23, 2025)
20 Issues opened by 18 people
- [Bug] On Win10, running it produces no response at all (#3365, opened Mar 29, 2025)
- [Feature] Can the lite quantization command support multiple GPUs on a single node? (#3361, opened Mar 28, 2025)
- [Bug] asyncio.exceptions.InvalidStateError: invalid state (#3360, opened Mar 28, 2025)
- [Bug] Ascend v0.7.2.post1: serving API speed test occasionally hangs (#3354, opened Mar 27, 2025)
- [Bug] AttributeError: '_turbomind.AbstractTransformerModel' object has no attribute 'create_nccl_params' (#3347, opened Mar 26, 2025)
- Inference quality degrades under high concurrency (#3344, opened Mar 26, 2025)
- Ascend v0.7.2.post1: serving API speed test, throughput in the first run is 40%+ lower (#3341, opened Mar 26, 2025)
- [Bug] QwQ-32B quantized with the lite auto_awq command fails at inference with tp=2 on a single-node multi-GPU machine; the same error occurs in both ascend and cuda environments (#3339, opened Mar 26, 2025)
- [Bug] Long-context benchmark of a 32B AWQ model with profile_restful_api.py on an A100 40GB hits a bug (#3331, opened Mar 25, 2025)
- [Docs] Does the Ascend (Atlas 800T A2) platform support multi-lora? (#3325, opened Mar 24, 2025)
- [Bug] lmdeploy 0.7.2-cu12 inference error while serving MiniCPM-V-2.6 (#3324, opened Mar 24, 2025)
- lmdeploy - ERROR - base.py:53 - ModuleNotFoundError: No module named 'dlinfer' (#3319, opened Mar 24, 2025)
- Using Smooth Quant to perform FP8 quantization corrupts the model (#3318, opened Mar 24, 2025)
- InternVL2.5-4B: inference after lmdeploy quantization is slower than floating point (#3314, opened Mar 24, 2025)
- Does lmdeploy limit the length of base64-encoded images, and how can this be changed? (#3312, opened Mar 24, 2025)
- The difference in inference speed between different input methods (#3309, opened Mar 23, 2025)
- [Bug] With prefix cache enabled, inference results for the same prompt sometimes differ (#3308, opened Mar 23, 2025)
- [Feature] Support Aya Vision (#3306, opened Mar 22, 2025)
16 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- support ascend w8a8 graph_mode (#3267, commented on Mar 28, 2025 • 1 new comment)
- [Bug] When the response needs to generate "'s" with a space, the text is output incorrectly (#2190, commented on Mar 24, 2025 • 0 new comments)
- [Feature] Will the turbomind backend support guided_decoding? (#2771, commented on Mar 25, 2025 • 0 new comments)
- [Bug] After deploying qwen2.5 vl 7b, it errors out after running for a while and API calls fail (#3244, commented on Mar 25, 2025 • 0 new comments)
- [Bug] not all VLM use `pad_token_id` as image's pad (#3285, commented on Mar 26, 2025 • 0 new comments)
- [Bug] lmdeploy model deployment error: 2025-03-19 04:28:52,017 - lmdeploy - ERROR - async_engine.py:777 - session 85 finished, reason "error" (#3280, commented on Mar 27, 2025 • 0 new comments)
- [Bug] RuntimeError: CUDA error: an illegal memory access was encountered (#3258, commented on Mar 28, 2025 • 0 new comments)
- [QA] Does lmdeploy support concurrency when serving internvl_38? (#3296, commented on Mar 29, 2025 • 0 new comments)
- [Bug] After deploying qwen2.5-vl-7b, does the deployment command need any changes to pass in video? (#3249, commented on Mar 29, 2025 • 0 new comments)
- Multi-turn dialogue with batch inference (#3238, commented on Mar 29, 2025 • 0 new comments)
- [Bug] During batch inference, the eos stop_words_list length exceeds the limit and it finally crashes (core dumped) (#3233, commented on Mar 29, 2025 • 0 new comments)
- [HELP] Looking for help quantizing `DeepSeek-R1-Distill-Qwen-32B` to `int8` (#3200, commented on Mar 29, 2025 • 0 new comments)
- Add metrics endpoint (#1423, commented on Mar 26, 2025 • 0 new comments)
- support ascend 310P (#3085, commented on Mar 27, 2025 • 0 new comments)
- use half/bf16 lm_head output (#3213, commented on Mar 28, 2025 • 0 new comments)
- add deepseekv3 doc (#3265, commented on Mar 26, 2025 • 0 new comments)