[Fix] unwanted bias in InternLM Model #740

wangruohui · 2023-08-11T11:35:54Z

This PR fixes a possible mistake.

The original InternLM implementation does not add a bias to the MLP layers, as shown at https://huggingface.co/internlm/internlm-chat-7b-8k/blob/main/modeling_internlm.py#L152
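To illustrate why a stray bias matters, here is a minimal pure-Python sketch. It assumes a LLaMA-style gated MLP (`down(silu(gate(x)) * up(x))`), which is my reading of the linked reference code rather than something stated in this PR. The helper names (`linear`, `gated_mlp`) and the toy weights are hypothetical and are not taken from vLLM or InternLM.

```python
import math


def linear(x, weight, bias=None):
    """Compute y = W x, optionally adding a bias vector."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gated_mlp(x, gate_w, up_w, down_w, gate_b=None, up_b=None, down_b=None):
    """Gated MLP (assumed form): down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in linear(x, gate_w, gate_b)]
    up = linear(x, up_w, up_b)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(hidden, down_w, down_b)


# Hypothetical toy weights: hidden size 2, intermediate size 3.
gate_w = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
up_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
down_w = [[1.0, -1.0, 0.5], [0.25, 0.5, -1.0]]

zero_input = [0.0, 0.0]

# Without bias, a zero input maps to a zero output.
no_bias_out = gated_mlp(zero_input, gate_w, up_w, down_w)

# With (unwanted) bias terms, the same weights give a different output.
with_bias_out = gated_mlp(
    zero_input, gate_w, up_w, down_w,
    gate_b=[0.1] * 3, up_b=[0.1] * 3, down_b=[0.1, 0.1],
)

print(no_bias_out)
print(with_bias_out)
```

If the checkpoint carries no MLP bias tensors, any bias parameters created by the serving code would presumably be left at whatever value they were initialized with. Creating the layers without bias avoids that mismatch.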

Moreover, imports are sorted using isort.

zhuohan123

LGTM! Thank you for your contribution!

…ct#740) This is a hotfix for recompilation issues caused by upstream change 0f8cafe. The unified_attention uses layer_name to retrieve the layer from the ForwardContext, which causes dynamo to recompile each Attention layer. HOTFIX vllm-project#12410 Co-authored-by: Bartosz Kowalski <bkowalski@habana.ai>

### What this PR does / why we need it?

MoE support for llama4 and mllama4 in vllm-ascend.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Start server:

    python -m vllm.entrypoints.openai.api_server --model /data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct \
      --max-num-seqs=256 \
      --max-model-len=8192 \
      --tensor-parallel-size=8 \
      --block-size=128 \
      --dtype bfloat16 \
      --host=0.0.0.0 \
      --port=8000 \
      --gpu-memory-utilization=0.9 \
      --trust-remote-code

Client:

    python online_server.py --model-path /data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct --image-path /data/nfs/w60040464/cherry_blossom.jpg --docker-ip 7.242.108.253 --served-port 8000 --text "what is the content of this image?"

Result:

    {'id': 'chatcmpl-2b709a5d2e1a4017991ec4ba8248686a', 'object': 'chat.completion', 'created': 1747056823, 'model': '/data/nfs/benchmark/tokenizer/Llama-4-Scout-17B-16E-Instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'reasoning_content': None, 'content': 'The image depicts a tower, likely Tokyo Skytree, framed by branches of a cherry blossom tree. The tower is white and has a distinctive shape, with a large sphere at the top and a long, thin spire extending from it. The branches of the cherry blossom tree are in the foreground, with pink flowers blooming on them. The background is a clear blue sky.\n\n**Key Features:**\n\n* **Tower:** White, spherical shape at the top, long thin spire\n', 'tool_calls': []}, 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}], 'usage': {'prompt_tokens': 2340, 'total_tokens': 2440, 'completion_tokens': 100, 'prompt_tokens_details': None}, 'prompt_logprobs': None}

Signed-off-by: chenxu <chenxu68@huawei.com>
Co-authored-by: chenxu <chenxu68@huawei.com>
Co-authored-by: evian <eviantai@u.nus.edu>

wangruohui added 8 commits August 11, 2023 18:18

add output to throughput benchmark

ed6236e

add support for internlm

377783b

recover benchmark

290b29e

remove dynamic imports

0b28ba4

recover imports

b46c724

remove a comment

c7bf18c

Update model_loader.py

28a838e

yapf format

73b1f35

zhuohan123 approved these changes Aug 11, 2023

zhuohan123 merged commit 462ae52 into vllm-project:main Aug 11, 2023

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

[Fix] unwanted bias in InternLM Model (vllm-project#740)

b797646

yma11 pushed a commit to yma11/vllm that referenced this pull request Jan 31, 2025

Revert "Hotfix recompilations caused by unified attention for hpu (vl…

a7fcd43

…lm-project#740)" This reverts commit 5424a93.
