
Incomplete Response from 4bit Version of PhoGPT #27

Closed
DavidMediaX opened this issue Apr 2, 2024 · 1 comment

@DavidMediaX

Hello, I did some testing on the 4bit and 8bit versions of PhoGPT. I ran into an issue with the 4bit version; details are below:

Environment:
PhoGPT Version: 4bit
Execution Environment: Google Colab with T4 GPU

Issue Description:
When using the 4bit version of PhoGPT with the provided initialization code from the documentation, the model returns an incomplete response: it emits only a newline character \n, whereas the 8bit version functions correctly and returns a complete, detailed output.

Steps to Reproduce:
Initialize the 4bit PhoGPT model using the sample code from the official documentation (a minimal sketch of this setup is shown after these steps).
Use instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông" ("Write a social argumentative essay on traffic safety").
Observe that the response is only a newline character, indicating an incomplete or failed generation.
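
For reference, here is a minimal sketch of the kind of initialization code involved. It assumes the vinai/PhoGPT-4B-Chat checkpoint and the prompt template from the PhoGPT README; the generation parameters are illustrative, not the exact values from the documentation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "vinai/PhoGPT-4B-Chat"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Older-style 4-bit loading via the load_in_4bit flag, as many sample snippets use
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt template as given in the PhoGPT README
PROMPT_TEMPLATE = "### Câu hỏi: {instruction}\n### Trả lời:"
instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông"

inputs = tokenizer(PROMPT_TEMPLATE.format(instruction=instruction), return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```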

Expected Behavior:
The 4bit version of PhoGPT should return a complete and coherent response similar to the 8bit version, which returns detailed and lengthy outputs.

Actual Behavior:
The 4bit version outputs only a newline character \n, indicating an error or issue in processing the input prompt.

8bit output: [screenshot, 2024-04-02 22:08:56, showing a full generated response]
4bit output: [screenshot, 2024-04-02 22:07:12, showing only a newline returned]

@datquocnguyen
Member

It might be due to a change in a recent version of the Transformers library.
Can you try the example from https://huggingface.co/docs/transformers/main/en/quantization#4-bit with PhoGPT? (A sketch adapted to PhoGPT is below.)
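
A minimal sketch of that documented approach, adapted to PhoGPT. The checkpoint id is an assumption; the key difference from older snippets is passing an explicit BitsAndBytesConfig instead of the bare load_in_4bit flag:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit config, per the linked Transformers quantization docs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # the Colab T4 has no bfloat16 support
)

model = AutoModelForCausalLM.from_pretrained(
    "vinai/PhoGPT-4B-Chat",  # assumed checkpoint id
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```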

We recently released 4- and 8-bit variants of PhoGPT with llama.cpp. You might want to try that too.
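
A hedged sketch of the llama.cpp route using llama-cpp-python; the GGUF filename here is hypothetical, so substitute the actual 4-bit file from the release:

```python
from llama_cpp import Llama

# Hypothetical GGUF filename; replace with the actual 4-bit file from the PhoGPT release
llm = Llama(model_path="PhoGPT-4B-Chat-q4_0.gguf", n_ctx=2048, n_gpu_layers=-1)

prompt = "### Câu hỏi: Viết bài văn nghị luận xã hội về an toàn giao thông\n### Trả lời:"
out = llm(prompt, max_tokens=1024, temperature=0.7, top_p=0.9)
print(out["choices"][0]["text"])
```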
