Add AWQ support for all models #1714

WoosukKwon · 2023-11-18T20:24:01Z

This PR adds AWQ support for all models, by adding ScaledActivation.
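For readers unfamiliar with the idea, here is a minimal, framework-free sketch of what a scaled activation does. It assumes, as in the AWQ reference code, that the activation output is divided element-wise by per-channel scales stored in the quantized checkpoint. The class name `ScaledActivationSketch`, the `gelu` helper, and the scale values are hypothetical illustrations, not the code added in this PR (which operates on tensors inside the model).

```python
import math
from typing import Callable, List, Sequence


def gelu(x: float) -> float:
    """Exact GELU, used here only as a stand-in activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class ScaledActivationSketch:
    """Hypothetical stand-in for a scaled activation.

    Assumption: the activation output is divided element-wise by
    per-channel scales loaded from the AWQ checkpoint.
    """

    def __init__(self, act: Callable[[float], float], scales: Sequence[float]):
        self.act = act
        self.scales = list(scales)

    def __call__(self, x: Sequence[float]) -> List[float]:
        if len(x) != len(self.scales):
            raise ValueError("input width must match the number of scales")
        # Apply the activation first, then undo the AWQ scaling per channel.
        return [self.act(v) / s for v, s in zip(x, self.scales)]


# Made-up scales for a 3-channel example.
scaled_gelu = ScaledActivationSketch(gelu, scales=[1.0, 2.0, 4.0])
print(scaled_gelu([0.0, 1.0, -1.0]))
```

With all scales equal to 1.0 the wrapper reduces to the plain activation, which is why unquantized models do not need it.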

Tested:

I didn't test AWQ for the GPT2, GPT-NeoX, and Phi models, since I couldn't find quantized weights for them on the HF model hub.

casper-hansen · 2023-11-18T21:44:47Z

FYI, Phi is not yet supported in AutoAWQ as there are some blockades associated with their architecture that have yet to be resolved. GPT-2 is also not supported yet, although I can add it if the community wants it.

WoosukKwon · 2023-11-18T23:34:40Z

Hi @casper-hansen, thanks for letting us know! Could you elaborate more on this?

> Phi is not yet supported in AutoAWQ as there are some blockades associated with their architecture

I thought Phi's architecture was not very different from other GPT models.

zhuohan123

LGTM! Thanks for the quick fix!

vllm/model_executor/models/opt.py

lonngxiang · 2023-11-19T08:36:54Z

Can an AWQ model be served with vllm.entrypoints.openai.api_server?

In my test the server starts, but requests do not succeed.

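import openai

# NOTE: this uses the pre-1.0 `openai` client interface (openai.ChatCompletion).
# Point the OpenAI client at vLLM's API server instead of api.openai.com.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:10860/v1"

# Assumed to match the model name the vLLM server was launched with.
model = "AquilaChat2-7B"


def test_chat_completion():
    # Prompt (Chinese): "Suggest a one-day tour of Guangzhou?"
    messages = [{"role": "user", "content": "介绍下广州一日游?"}]
    res = openai.ChatCompletion.create(model=model, messages=messages)

    # Extract the assistant's reply from the first choice.
    content = res["choices"][0]["message"]["content"]
    print(content)


if __name__ == "__main__":
    test_chat_completion()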

error:

WoosukKwon added 9 commits November 18, 2023 19:51

cd18b3f Add get_scaled_act_names
a1c7dad Add ScaledActivation
ca84f29 Fix MPT
d27bd19 Fix BLOOM
9ca8eda Fix Falcon
71b4114 Fix GPTBigCode
6429a5d Fix GPT-J
1eeef8b Fix OPT
e45a93e Add GPT2, GPT-NeoX, Phi
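The commit titles suggest a split between a quantization config that names the activations needing scales (`get_scaled_act_names`) and per-model code that picks the activation accordingly. A rough sketch of that kind of dispatch follows, under stated assumptions: the `AWQConfigSketch` class, the `ACTIVATIONS` table, the `get_act_fn` signature, and the choice of which activations carry scales are all hypothetical and written in plain Python rather than against vLLM's actual classes.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def relu(x: float) -> float:
    return max(0.0, x)


def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical registry of element-wise activations.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {"relu": relu, "gelu": gelu}


class AWQConfigSketch:
    """Hypothetical quantization config."""

    def get_scaled_act_names(self) -> List[str]:
        # Assumption: only GELU-style activations come with AWQ scales.
        return ["gelu"]


def get_act_fn(
    name: str,
    quant_config: Optional[AWQConfigSketch] = None,
    scales: Optional[Sequence[float]] = None,
) -> Callable[[Sequence[float]], List[float]]:
    """Return a plain activation, or a scaled one if the config asks for it."""
    act = ACTIVATIONS[name]
    needs_scaling = (
        quant_config is not None and name in quant_config.get_scaled_act_names()
    )
    if not needs_scaling:
        return lambda xs: [act(v) for v in xs]
    if scales is None:
        raise ValueError(f"activation {name!r} needs per-channel scales")
    return lambda xs: [act(v) / s for v, s in zip(xs, scales)]


cfg = AWQConfigSketch()
print(get_act_fn("relu", cfg)([-1.0, 2.0]))          # unscaled path
print(get_act_fn("gelu", cfg, [2.0, 4.0])([1.0, 1.0]))  # scaled path
```

Keeping the list of scaled activation names in the config means each model file only has to ask for its activation by name, rather than hard-coding quantization details.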

WoosukKwon requested a review from zhuohan123 November 18, 2023 20:24

WoosukKwon mentioned this pull request Nov 18, 2023

Quantization is not supported for <class 'vllm.model_executor.models.bloom.BloomForCausalLM'> #1711

Closed

This was linked to issues Nov 18, 2023

Quantization is not supported for <class 'vllm.model_executor.models.bloom.BloomForCausalLM'> #1711

Closed

bug of opt awq model #1703

Closed

zhuohan123 approved these changes Nov 19, 2023

zhuohan123 reviewed Nov 19, 2023

WoosukKwon merged commit 8d17774 into main Nov 19, 2023

WoosukKwon deleted the awq-rescale branch November 19, 2023 01:56

lonngxiang mentioned this pull request Nov 20, 2023

ValueError: Quantization is not supported for <class 'vllm.model_executor.models.aquila.AquilaForCausalLM'>. #1538

Closed

simon-mo mentioned this pull request Dec 2, 2023

[Docs] Update the AWQ documentation to highlight performance issue #1883

Merged

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add AWQ support for all models (vllm-project#1714)

94531dc
