Failed to quantize qwen1.5-110b-chat #430

Answered by turboderp
Yiximail asked this question in Q&A

Yes, I've been working on this myself. It turns out there were a couple of int overflow bugs during quantization that had to be addressed (see the dev branch).

There's still one I haven't sorted out yet. During quantizing it does a sanity check by multiplying the quantized matrix with an identity matrix using the custom kernels, and for the MLPs in this model one of those identity matrices has a shape of 48k x 48k, which is more than 2^31 elements and guess what? :)
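To see why that identity matrix is a problem, here is a quick sketch. The 48k dimension comes from the message above; the rest is illustrative arithmetic:

```python
# A 48k x 48k identity matrix has more elements than a signed 32-bit
# integer can represent, so any kernel that indexes it with a plain
# `int` offset overflows.
n = 48 * 1024                  # 49152, the MLP dimension mentioned above
elements = n * n               # total elements in the identity matrix
int32_max = 2**31 - 1          # largest signed 32-bit index

print(elements)                # 2415919104
print(elements > int32_max)    # True: an int32 offset wraps around
```

This is why the overflow only shows up on very large models: smaller hidden dimensions keep the element count safely under 2^31.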

Using the dev branch, bypassing that sanity check (e.g. set diff2 = 0 on line 116 of conversion/quantize.py), and running on a GPU with at least 48 GB of VRAM to accommodate the enormous matrices in this model, you should be able to quantize it.
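The check being bypassed amounts to: multiply the quantized matrix by an identity matrix (through the custom kernels) to recover the weights, then compare against the reference. A small-scale sketch of that idea, with hypothetical names (the real code lives in conversion/quantize.py and uses the CUDA kernels); only the variable name diff2 is taken from the message above:

```python
import numpy as np

def sanity_check(w_ref, quant_matmul, tol=1.0):
    """Recover quantized weights via an identity matmul and compare
    against the reference. Hypothetical sketch, not the actual code."""
    n = w_ref.shape[0]
    eye = np.eye(n, dtype=w_ref.dtype)      # n x n identity matrix
    w_recon = quant_matmul(eye)             # kernel applied to identity
    diff2 = np.sum((w_recon - w_ref) ** 2)  # squared reconstruction error
    return diff2 <= tol                     # setting diff2 = 0 skips this

# Toy "quantized" weights: round to one decimal place, then multiply.
rng = np.random.default_rng(0)
w = rng.random((8, 8)).astype(np.float32)
w_q = np.round(w, 1)
ok = sanity_check(w, lambda x: x @ w_q)
print(ok)                                   # True for this tolerance
```

For the 110B model's MLPs, n would be 49152, which is exactly where the identity matrix blows past 2^31 elements; setting diff2 = 0 sidesteps the check without affecting the quantized output itself.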
