Question: How to choose a dataset for quantizing with AQLM a model like Mistral 7b-Instruct v0.2 #60
Comments
Hello! It is recommended to calibrate on the same data that the model was trained or fine-tuned on. In the case of Mistral Instruct v2, if I'm not mistaken, that information has not been released. Because of that, for Instruct-type models we used https://huggingface.co/datasets/mosaicml/dolly_hhrlhf, but you may find something better.
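For context only, here is a minimal sketch of how one might pull calibration samples from dolly_hhrlhf. This is not the repo's own data pipeline (main.py handles its own loading), and the dataset column names are assumptions:

```python
# Hypothetical sketch: gathering calibration text from dolly_hhrlhf.
# The AQLM repo's main.py has its own data loading; this only illustrates the idea.
from datasets import load_dataset
from transformers import AutoTokenizer

# Column names ("prompt", "response") are assumptions about the dataset schema.
data = load_dataset("mosaicml/dolly_hhrlhf", split="train")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Take roughly as many samples as --nsamples in the calibration command below.
samples = data.shuffle(seed=0).select(range(1024))
texts = [row["prompt"] + row["response"] for row in samples]

# Tokenize up to the sequence length used for calibration (--model_seqlen=8192).
encodings = tokenizer(texts, truncation=True, max_length=8192)
```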
As for succeeding in creating a good quantized model with AQLM, here are some recommendations. For example, for the Mistral-v1 model we used this set of parameters to calibrate the model:

```bash
python main.py \
  $MODEL_PATH \
  $DATASET_PATH \
  --nsamples=1024 \
  --val_size=128 \
  --model_seqlen=8192 \
  --num_codebooks=1 \
  --nbits_per_codebook=16 \
  --in_group_size=8 \
  --out_group_size=1 \
  --relative_mse_tolerance=0.01 \
  --finetune_lr=1e-4 \
  --finetune_adam_beta1=0.90 \
  --finetune_adam_beta2=0.999 \
  --finetune_keep_best \
  --finetune_batch_size=8 \
  --finetune_max_epochs=10 \
  --finetune_early_stop=3 \
  --local_batch_size=1 \
  --offload_activations \
  --save $DATA_PATH \
  --wandb
```

This gave around 5.78 ppl on WikiText2. We then perform global fine-tuning on the quantized model with the script below:

```bash
python finetune.py \
  --base_model $MODEL_PATH \
  --quant_model $INPUT_PATH \
  --dataset $DATASET_PATH \
  --model_seqlen=8192 \
  --eval_datasets wikitext2 \
  --nsamples=1024 \
  --val_size=128 \
  --lr=1e-5 \
  --adam_beta1=0.90 \
  --adam_beta2=0.999 \
  --epochs=1 \
  --early_stop=3 \
  --batch_size=8 \
  --microbatch_size=1 \
  --temperature=1.0 \
  --save $DATA_PATH \
  --gradient_checkpointing \
  --amp \
  --wandb
```

After one epoch of global fine-tuning we got 5.40 ppl on WikiText2. Hope this helps; if you have further questions, please don't hesitate to ask.
This is very helpful, thank you!
No more questions for now. (I think the budget and how many GPUs / how much time are needed can be derived from other discussions in the repo :))
@remiconnesson I just wanted to mention that, it appears, somebody has already done a quantization of this model.
Also, it is not clear whether they perform global fine-tuning after quantization at the end or not.
Thanks! How could I evaluate the quality of the quantization myself? Should I use the same dataset that you used in the AQLM paper?
Yes. For PPL, it is recommended to use slices of the WikiText2 and C4 datasets. Please see this link for the code to load the data and calculate PPL after quantization. For zero-shot evaluations, we used LM Eval Harness; specifically, we used a spring 2023 commit (to pin the version), available at this location. Instructions on how to use it can be found here. There, you should provide the path to the quantized model (before conversion to the HF format).
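As a rough illustration only (not the repo's evaluation code linked above, and with a placeholder model path), a standard WikiText2 perplexity loop with Hugging Face Transformers might look like this:

```python
# Rough WikiText2 perplexity sketch; use the repo's eval code for numbers comparable to the paper.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/quantized/model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 8192  # sequence length used elsewhere in this thread
input_ids = encodings.input_ids
nlls = []
for start in range(0, input_ids.size(1) - seqlen, seqlen):
    chunk = input_ids[:, start : start + seqlen].to(model.device)
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token NLL over the chunk.
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText2 perplexity: {ppl.item():.2f}")
```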
Thank you, this is very useful! I'm going to try this out :)
I'm curious about quantizing a 7B model like Mistral Instruct v2. From what I understand, an important point would be the choice of dataset. What would be a good dataset for quantizing with AQLM?
Is there any other important point to succeed in creating a good-quality quantization of a model with AQLM?