
Conversation

Contributor

@Anionex Anionex commented Sep 21, 2025

Purpose

This PR fixes a regression that broke inference support for Hunyuan v1 dense models, such as Hunyuan-1.8B-Instruct.

After recent updates to the model support files, the logic no longer correctly handles the architecture of the dense variant, causing the inference process to fail. This functionality was working correctly in previous versions.

This PR restores the necessary logic so that Hunyuan v1 dense models can be loaded and used for inference successfully.

Test Plan

export VLLM_USE_MODELSCOPE=True
python3 -m vllm.entrypoints.openai.api_server --model Tencent-Hunyuan/Hunyuan-1.8B-Instruct --trust_remote_code 
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "model": "Tencent-Hunyuan/Hunyuan-1.8B-Instruct",
  "messages": [
    {"role": "user", "content": "please introduce yourself"}
  ],
  "max_tokens": 100,
  "temperature": 0.7
}'
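For scripted checks, the same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl command above; the endpoint URL and model name are taken from the test plan, and the `chat` helper is illustrative, not part of vLLM.

```python
import json
import urllib.request

# Same request body as the curl command in the test plan.
payload = {
    "model": "Tencent-Hunyuan/Hunyuan-1.8B-Instruct",
    "messages": [{"role": "user", "content": "please introduce yourself"}],
    "max_tokens": 100,
    "temperature": 0.7,
}

def chat(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the OpenAI-compatible endpoint and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running, the assistant message is at:
# chat()["choices"][0]["message"]["content"]
```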

Test Result

  • Before fix:
...
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=16004)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=16004)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/utils/__init__.py", line 3069, in run_method
(EngineCore_DP0 pid=16004)     return func(*args, **kwargs)
(EngineCore_DP0 pid=16004)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(EngineCore_DP0 pid=16004)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/v1/worker/gpu_model_runner.py", line 2563, in load_model
(EngineCore_DP0 pid=16004)     self.model = model_loader.load_model(
(EngineCore_DP0 pid=16004)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=16004)     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=16004)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=16004)     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=16004)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=16004)   File "/root/vllm/vllm/model_executor/models/hunyuan_v1.py", line 950, in __init__
(EngineCore_DP0 pid=16004)     raise RuntimeError("No HunYuanMoE layer found in model.layers.")
(EngineCore_DP0 pid=16004) RuntimeError: No HunYuanMoE layer found in model.layers.
  • After fix:
{"id":"chatcmpl-4d13a0bc43544e2fa6aca38e0d9733c2","object":"chat.completion","created":1758472457,"model":"Tencent-Hunyuan/Hunyuan-1.8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\nOkay, the user asked me to introduce myself. Let me start by thinking about what they might need. They could be a friend looking for an introduction, or maybe a new person they're trying to meet. Since I don't have personal experiences, I should keep it friendly and general.\n\nFirst, I'll mention my name and where I'm from to give context. Then, highlight key aspects like personality traits—curious, approachable, maybe a bit adventurous. Mentioning hobbies","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":6,"total_tokens":106,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: anion <1005128408@qq.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a regression that prevented dense Hunyuan v1 models from running. The fix involves refactoring the model classes to better separate the logic for dense and Mixture-of-Experts (MoE) variants, introducing a common base class HunyuanV1ModelBase. While the fix is correct, there is significant code duplication in the __init__ methods of HunYuanDenseV1Base and HunYuanMoEV1Base. I've provided suggestions to move the common initialization logic to the base class to improve maintainability and reduce redundancy.

@Anionex
Contributor Author

Anionex commented Sep 22, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a regression that prevented inference on Hunyuan v1 dense models. The refactoring of the class hierarchy, by introducing HunyuanV1ModelBase for common logic and creating separate HunYuanMoEV1Base and HunYuanDenseV1Base for MoE and dense variants respectively, is a clean and correct approach. This properly isolates the MoE-specific logic that was causing runtime errors for dense models. The changes are well-structured and directly fix the reported issue. I have no further recommendations.
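The hierarchy described above can be sketched roughly as follows. Only the three class names (`HunyuanV1ModelBase`, `HunYuanMoEV1Base`, `HunYuanDenseV1Base`) and the error message come from this PR; everything else is simplified stand-in code, not vLLM's actual implementation.

```python
# Illustrative sketch of the refactored class hierarchy. The point is that
# the MoE-layer guard, which used to run for every variant and raised
# "No HunYuanMoE layer found in model.layers." on dense checkpoints, now
# lives only in the MoE branch.

class HunyuanV1ModelBase:
    """Logic shared by dense and MoE variants (stand-in for the real base)."""

    def __init__(self, layers):
        self.layers = layers


class HunYuanDenseV1Base(HunyuanV1ModelBase):
    """Dense variant: no expert-related validation or EPLB bookkeeping."""


class HunYuanMoEV1Base(HunyuanV1ModelBase):
    """MoE variant: keeps the expert check that broke dense models."""

    def __init__(self, layers):
        super().__init__(layers)
        # MoE-only guard; before the fix, dense models hit this path too.
        if not any(getattr(layer, "is_moe", False) for layer in layers):
            raise RuntimeError("No HunYuanMoE layer found in model.layers.")


# Hypothetical stand-ins for decoder layers, used only to exercise the guard.
class FakeMoELayer:
    is_moe = True


class FakeDenseLayer:
    is_moe = False
```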

@Anionex
Contributor Author

Anionex commented Sep 22, 2025

Hi @abmfy,
This PR fixes a regression with the Hunyuan v1 dense models: a previous change that added EPLB support for the MoE models unintentionally broke inference for the dense variants.

I noticed that you have reviewed related changes and are familiar with this area. Would you mind taking a look? Let me know if you have any further suggestions for this PR. Thanks :)

Member

@abmfy abmfy left a comment


LGTM, thanks for fixing this issue! @simon-mo

@Anionex
Contributor Author

Anionex commented Sep 23, 2025

Hi @abmfy
Thank you again for your review and approval. It seems this PR now needs another review before it can be merged. Do you know someone else who could help review the code?
Thx :)

Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
@tourcmd

tourcmd commented Sep 30, 2025

@jeejeelee @Isotr0py It seems the other reviewers are quite busy. Could you please help merge this PR? Thank you!

@Isotr0py Isotr0py enabled auto-merge (squash) September 30, 2025 12:54
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 30, 2025
@Isotr0py Isotr0py merged commit f4db5e6 into vllm-project:main Sep 30, 2025
50 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…25354)

Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>