
Conversation

Contributor

@Anionex Anionex commented Sep 21, 2025

Purpose

This PR fixes a regression that broke inference support for Hunyuan v1 dense models, such as Hunyuan-1.8B-Instruct.

After recent updates to the model support files, the logic no longer correctly handles the architecture of the dense variant, causing the inference process to fail. This functionality was working correctly in previous versions.

This PR restores the necessary logic so that Hunyuan v1 dense models can be loaded and used for inference successfully.

Test Plan

export VLLM_USE_MODELSCOPE=True
python3 -m vllm.entrypoints.openai.api_server --model Tencent-Hunyuan/Hunyuan-1.8B-Instruct --trust_remote_code 
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "model": "Tencent-Hunyuan/Hunyuan-1.8B-Instruct",
  "messages": [
    {"role": "user", "content": "please introduce yourself"}
  ],
  "max_tokens": 100,
  "temperature": 0.7
}'
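For scripted checks, the same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl command above; the endpoint URL and model name are taken from the test plan, and the `chat` helper is illustrative, not part of vLLM.

```python
import json
import urllib.request

# Same request body as the curl command in the test plan.
payload = {
    "model": "Tencent-Hunyuan/Hunyuan-1.8B-Instruct",
    "messages": [{"role": "user", "content": "please introduce yourself"}],
    "max_tokens": 100,
    "temperature": 0.7,
}

def chat(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the OpenAI-compatible endpoint and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running, the assistant message is at:
# chat()["choices"][0]["message"]["content"]
```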

Test Result

  • Before fix:
...
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=16004)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=16004)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/utils/__init__.py", line 3069, in run_method
(EngineCore_DP0 pid=16004)     return func(*args, **kwargs)
(EngineCore_DP0 pid=16004)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(EngineCore_DP0 pid=16004)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/v1/worker/gpu_model_runner.py", line 2563, in load_model
(EngineCore_DP0 pid=16004)     self.model = model_loader.load_model(
(EngineCore_DP0 pid=16004)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=16004)     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=16004)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=16004)     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=16004)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/models/hunyuan_v1.py", line 950, in __init__
(EngineCore_DP0 pid=16004)     raise RuntimeError("No HunYuanMoE layer found in model.layers.")
(EngineCore_DP0 pid=16004) RuntimeError: No HunYuanMoE layer found in model.layers.
  • After fix:
{"id":"chatcmpl-4d13a0bc43544e2fa6aca38e0d9733c2","object":"chat.completion","created":1758472457,"model":"Tencent-Hunyuan/Hunyuan-1.8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\nOkay, the user asked me to introduce myself. Let me start by thinking about what they might need. They could be a friend looking for an introduction, or maybe a new person they're trying to meet. Since I don't have personal experiences, I should keep it friendly and general.\n\nFirst, I'll mention my name and where I'm from to give context. Then, highlight key aspects like personality traits—curious, approachable, maybe a bit adventurous. Mentioning hobbies","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":6,"total_tokens":106,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: anion <1005128408@qq.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a regression that prevented dense Hunyuan v1 models from running. The fix involves refactoring the model classes to better separate the logic for dense and Mixture-of-Experts (MoE) variants, introducing a common base class HunyuanV1ModelBase. While the fix is correct, there is significant code duplication in the __init__ methods of HunYuanDenseV1Base and HunYuanMoEV1Base. I've provided suggestions to move the common initialization logic to the base class to improve maintainability and reduce redundancy.

@Anionex
Contributor Author

Anionex commented Sep 22, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a regression that prevented inference on Hunyuan v1 dense models. The refactoring of the class hierarchy, by introducing HunyuanV1ModelBase for common logic and creating separate HunYuanMoEV1Base and HunYuanDenseV1Base for MoE and dense variants respectively, is a clean and correct approach. This properly isolates the MoE-specific logic that was causing runtime errors for dense models. The changes are well-structured and directly fix the reported issue. I have no further recommendations.
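The hierarchy described above can be sketched roughly as follows. Only the three class names (`HunyuanV1ModelBase`, `HunYuanMoEV1Base`, `HunYuanDenseV1Base`) and the error message come from this PR; everything else is simplified stand-in code, not vLLM's actual implementation.

```python
# Illustrative sketch of the refactored class hierarchy. The point is that
# the MoE-layer guard, which used to run for every variant and raised
# "No HunYuanMoE layer found in model.layers." on dense checkpoints, now
# lives only in the MoE branch.

class HunyuanV1ModelBase:
    """Logic shared by dense and MoE variants (stand-in for the real base)."""

    def __init__(self, layers):
        self.layers = layers


class HunYuanDenseV1Base(HunyuanV1ModelBase):
    """Dense variant: no expert-related validation or EPLB bookkeeping."""


class HunYuanMoEV1Base(HunyuanV1ModelBase):
    """MoE variant: keeps the expert check that broke dense models."""

    def __init__(self, layers):
        super().__init__(layers)
        # MoE-only guard; before the fix, dense models hit this path too.
        if not any(getattr(layer, "is_moe", False) for layer in layers):
            raise RuntimeError("No HunYuanMoE layer found in model.layers.")


# Hypothetical stand-ins for decoder layers, used only to exercise the guard.
class FakeMoELayer:
    is_moe = True


class FakeDenseLayer:
    is_moe = False
```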

@Anionex
Contributor Author

Anionex commented Sep 22, 2025

Hi @abmfy,
This PR fixes a regression with the Hunyuan v1 dense models: a previous change that added EPLB support for the MoE models unintentionally broke inference for the dense variants.

I noticed that you have reviewed related changes and are familiar with this area. Would you mind taking a look? Let me know if you have any further suggestions for this PR. Thanks :)

Member

@abmfy abmfy left a comment


LGTM, thanks for fixing this issue! @simon-mo

@Anionex
Contributor Author

Anionex commented Sep 23, 2025

Hi @abmfy
Thank you again for your review and approval. It seems this PR now needs another review before it can be merged. Do you know someone else who could help review the code?
Thx :)

Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
@tourcmd

tourcmd commented Sep 30, 2025

@jeejeelee @Isotr0py It seems the other reviewers are quite busy. Could you please help merge this PR? Thank you!

@Isotr0py Isotr0py enabled auto-merge (squash) September 30, 2025 12:54
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 30, 2025
@Isotr0py Isotr0py merged commit f4db5e6 into vllm-project:main Sep 30, 2025
50 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…25354)

Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>