Conversation

@HermitSun (Contributor) commented on Aug 4, 2023

As mentioned in issue #675, GPT-BigCode-based models produce gibberish outputs.

This is caused by a minor mistake when calculating the number of attention heads. If the model is not a new-decoder-architecture Falcon model and the multi_query option is enabled, we should return 1 rather than the default total_num_attention_heads // parallel_config.tensor_parallel_size.
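
For reference, here is a minimal sketch of the corrected logic. The attribute names (model_type, new_decoder_architecture, multi_query, num_attention_heads) follow the Hugging Face config conventions and are used here for illustration; this is not the exact patched function in vLLM.

```python
from transformers import PretrainedConfig


def get_num_kv_heads(hf_config: PretrainedConfig, tensor_parallel_size: int) -> int:
    """Sketch of the fixed head-count logic (attribute names are assumed)."""
    # New-decoder-architecture Falcon keeps multiple KV heads even when
    # multi_query is set, so it is excluded from the special case below.
    new_decoder_arch_falcon = (
        getattr(hf_config, "model_type", None) == "falcon"
        and getattr(hf_config, "new_decoder_architecture", False))
    if not new_decoder_arch_falcon and getattr(hf_config, "multi_query", False):
        # Multi-query attention (e.g. GPT-BigCode / StarCoder) shares a single
        # key/value head across all query heads, so return 1 instead of the
        # per-rank split.
        return 1
    # Default: split the attention heads evenly across tensor-parallel ranks.
    return hf_config.num_attention_heads // tensor_parallel_size
```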

After applying this patch, I believe at least the following models will work normally (tested on A100 with CUDA 11.8):

  • WizardLM/WizardCoder-15B-V1.0
  • openchat/opencoderplus
  • bigcode/starcoder

@zhuohan123 (Member) left a comment

Thanks for catching this bug! LGTM!
