
Incomplete Response from 4bit Version of PhoGPT #27

Closed
DavidMediaX opened this issue Apr 2, 2024 · 1 comment

@DavidMediaX

Hello, I did some testing on the 4bit and 8bit versions of PhoGPT. I ran into an issue with the 4bit version; details are below:

Environment:
PhoGPT Version: 4bit
Execution Environment: Google Colab with T4 GPU

Issue Description:
When using the 4bit version of PhoGPT with the provided initialization code from the documentation, the model returns an incomplete response: it emits only a newline character \n, whereas the 8bit version functions correctly and returns a complete, detailed output.

Steps to Reproduce:
Initialize the 4bit PhoGPT model using the sample code from the official documentation (a minimal sketch of this setup is shown after these steps).
Use instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông" ("Write a social argumentative essay on traffic safety").
Observe that the response is only a newline character, indicating an incomplete or failed generation.
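
For reference, here is a minimal sketch of the kind of initialization code involved. It assumes the vinai/PhoGPT-4B-Chat checkpoint and the prompt template from the PhoGPT README; the generation parameters are illustrative, not the exact values from the documentation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "vinai/PhoGPT-4B-Chat"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Older-style 4-bit loading via the load_in_4bit flag, as many sample snippets use
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt template as given in the PhoGPT README
PROMPT_TEMPLATE = "### Câu hỏi: {instruction}\n### Trả lời:"
instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông"

inputs = tokenizer(PROMPT_TEMPLATE.format(instruction=instruction), return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```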

Expected Behavior:
The 4bit version of PhoGPT should return a complete and coherent response similar to the 8bit version, which returns detailed and lengthy outputs.

Actual Behavior:
The 4bit version outputs only a newline character \n, indicating an error or issue in processing the input prompt.

8bit output: [screenshot, 2024-04-02 22:08:56, showing a full generated response]
4bit output: [screenshot, 2024-04-02 22:07:12, showing only a newline returned]

@datquocnguyen
Member

It might be due to a change in a recent version of the Transformers library.
Can you try the example from https://huggingface.co/docs/transformers/main/en/quantization#4-bit with PhoGPT? (A sketch adapted to PhoGPT is below.)
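
A minimal sketch of that documented approach, adapted to PhoGPT. The checkpoint id is an assumption; the key difference from older snippets is passing an explicit BitsAndBytesConfig instead of the bare load_in_4bit flag:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit config, per the linked Transformers quantization docs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # the Colab T4 has no bfloat16 support
)

model = AutoModelForCausalLM.from_pretrained(
    "vinai/PhoGPT-4B-Chat",  # assumed checkpoint id
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```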

We recently released 4- and 8-bit variants of PhoGPT with llama.cpp. You might want to try that too.
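
A hedged sketch of the llama.cpp route using llama-cpp-python; the GGUF filename here is hypothetical, so substitute the actual 4-bit file from the release:

```python
from llama_cpp import Llama

# Hypothetical GGUF filename; replace with the actual 4-bit file from the PhoGPT release
llm = Llama(model_path="PhoGPT-4B-Chat-q4_0.gguf", n_ctx=2048, n_gpu_layers=-1)

prompt = "### Câu hỏi: Viết bài văn nghị luận xã hội về an toàn giao thông\n### Trả lời:"
out = llm(prompt, max_tokens=1024, temperature=0.7, top_p=0.9)
print(out["choices"][0]["text"])
```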
