@WoosukKwon WoosukKwon commented Nov 18, 2023

Fixes #1682

This PR adds AWQ support for all models, by adding ScaledActivation.

Tested:

  • MPT
  • Falcon
  • BLOOM
  • GPTBigCode
  • GPT-J
  • OPT (125M)

I didn't test AWQ for the GPT-2, GPT-NeoX, and Phi models, since I couldn't find quantized weights for them on the HF model hub.
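
For readers unfamiliar with the idea behind ScaledActivation, here is a minimal sketch of the rescaling it relies on. It is not the code from this PR: the function names, the toy values, and the use of plain Python lists instead of tensors are all assumptions made for illustration. The sketch assumes that AWQ multiplies the input channels of the linear layer following an activation by per-channel scales, so the activation output has to be divided by the same scales to keep the result unchanged.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU, used here only as an example activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def scaled_activation(xs, scales, act=gelu):
    """Apply `act` element-wise, then divide channel i by scales[i]."""
    return [act(x) / s for x, s in zip(xs, scales)]


def linear(xs, weight):
    """Compute y[j] = sum_i xs[i] * weight[i][j] for a weight of shape [in][out]."""
    n_out = len(weight[0])
    return [sum(x * row[j] for x, row in zip(xs, weight)) for j in range(n_out)]


# Hypothetical toy values: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
scales = [2.0, 0.5, 4.0]
weight = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]]

# Reference path: plain activation followed by the original weights.
reference = linear([gelu(v) for v in x], weight)

# Rescaled path: the weights absorb the scales, the activation divides them out.
scaled_weight = [[s * w for w in row] for s, row in zip(scales, weight)]
rescaled = linear(scaled_activation(x, scales), scaled_weight)

# Both paths agree up to floating-point rounding.
assert all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(reference, rescaled)
)
```

In a real model the scales would come from the quantized checkpoint and the weights would be stored in quantized form; the sketch only shows why the division by the scales is needed.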

@casper-hansen
Contributor

FYI, Phi is not yet supported in AutoAWQ as there are some blockades associated with their architecture that have yet to be resolved. GPT-2 is also not supported yet, although I can add it if the community wants it.

@WoosukKwon
Collaborator Author

WoosukKwon commented Nov 18, 2023

Hi @casper-hansen, thanks for letting us know! Could you elaborate more on this?

> Phi is not yet supported in AutoAWQ as there are some blockades associated with their architecture

I thought Phi's architecture was not very different from that of other GPT models.

Member

@zhuohan123 zhuohan123 left a comment

LGTM! Thanks for the quick fix!

@WoosukKwon WoosukKwon merged commit 8d17774 into main Nov 19, 2023
@WoosukKwon WoosukKwon deleted the awq-rescale branch November 19, 2023 01:56
@lonngxiang

Can an AWQ model be served with vllm.entrypoints.openai.api_server?

In my test the server starts, but requests do not succeed.
[image]

import openai

# Point the pre-1.0 openai client (openai.ChatCompletion) at vLLM's API server.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:10860/v1"

model = "AquilaChat2-7B"


def test_chat_completion():
    # Prompt (Chinese): "Suggest a one-day tour of Guangzhou?"
    messages = [{"role": "user", "content": "介绍下广州一日游?"}]
    res = openai.ChatCompletion.create(model=model, messages=messages)
    # print(type(res), res)

    content = res["choices"][0]["message"]["content"]
    print(content)


test_chat_completion()

Error:
[image]
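
The error itself is only visible in the image, so its cause cannot be determined from this thread. As a possible first check, the sketch below uses only the Python standard library to list the model names the server reports and to send the same chat request without the openai client. The helper names are hypothetical, and the `/models` and `/chat/completions` routes are assumed to follow the usual OpenAI-compatible layout under the base URL used above.

```python
import json
import urllib.request


def models_url(api_base: str) -> str:
    """URL of the model listing route under an OpenAI-style base URL."""
    return api_base.rstrip("/") + "/models"


def build_chat_request(api_base: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        api_base.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_served_models(api_base: str, timeout: float = 10.0) -> list:
    """Return the model ids reported by the server (needs a running server)."""
    with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
        body = json.load(resp)
    return [entry["id"] for entry in body.get("data", [])]


def chat(api_base: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one chat request and return the reply text (needs a running server)."""
    request = build_chat_request(api_base, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example usage against the server from the snippet above:
#   base = "http://localhost:10860/v1"
#   print(list_served_models(base))
#   print(chat(base, "AquilaChat2-7B", "介绍下广州一日游?"))
```

If the name passed as `model` does not appear in the list returned by `list_served_models`, that mismatch would be worth ruling out before looking at AWQ itself.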
