Hello. I am having a compilation error when trying AQLM models. Inference (and therefore training) of quantized models fails. This is observed with Llama 2 7B, Llama 3 8B, and Mistral-7B.
import torch  # needed for torch.tensor() below; not imported in the original snippet
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer
from peft import LoraConfig

device = "cuda"  # `device` was not defined in the original snippet; assuming a single CUDA GPU

model_id = "ISTA-DASLab/Mistral-7B-v0.1-AQLM-PV-2Bit-1x16-hf"
quantized_model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map=device,
)

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
    init_lora_weights=False,
)
quantized_model.add_adapter(lora_config, adapter_name="laura")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
inputs = tokenizer(["There is a problem somewhere"])

# The forward pass is where the reported failure appears (compilation of the AQLM CUDA extension).
quantized_model(
    input_ids=torch.tensor(inputs["input_ids"]).to(device),
    attention_mask=torch.tensor(inputs["attention_mask"]).to(device),
)
The compilation of the CUDA extension fails.

aqlm version: 1.1.6
torch: 2.3.1+cu121
nvcc version: 12.1
gcc version: 10.5.0
GPU: RTX 3070

I had already seen that one, which is why I was already using gcc-10. But apparently you need to tell nvcc which compiler to use independently of the system settings. The issue was only fixed by adding this to the .bashrc:
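The exact .bashrc line is not preserved above. As a hedged sketch only (assuming g++-10 is installed at /usr/bin/g++-10 and a CUDA toolkit recent enough to honor NVCC_PREPEND_FLAGS, i.e. 11.5+), forcing nvcc onto a specific host compiler can look like this:

# Hypothetical .bashrc snippet; the reporter's actual line is not shown in the thread.
# nvcc's -ccbin flag selects the host C++ compiler explicitly, independently of the
# system default; NVCC_PREPEND_FLAGS makes nvcc prepend these flags to every invocation.
export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-10'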