Loss not matching #344

Open
ghost opened this issue Apr 17, 2024 · 2 comments

ghost commented Apr 17, 2024

Hi team,
I tried QLoRA fine-tuning of a 30B Llama model with Unsloth and found that there is not much improvement in speed or memory usage. The details are as follows:
seq_length=8192
batch size=1
use flash attn=true
gradient_checkpointing=true

With unsloth:

0%|          | 0/52 [00:00<?, ?it/s]
  2%|▏         | 1/52 [00:38<32:45, 38.54s/it]
  4%|▍         | 2/52 [01:15<31:25, 37.71s/it]
  6%|▌         | 3/52 [01:52<30:37, 37.50s/it]
  8%|▊         | 4/52 [02:30<29:54, 37.39s/it]
 10%|▉         | 5/52 [03:07<29:13, 37.31s/it]
{'loss': **4.7581**, 'grad_norm': 3.063769578933716, 'learning_rate': 9.911436253643445e-05, 'epoch': 0.1, 'num_input_tokens_seen': 162198}

Without unsloth:

0%|          | 0/52 [00:00<?, ?it/s]
  2%|▏         | 1/52 [00:41<35:08, 41.35s/it]
  4%|▍         | 2/52 [01:21<33:59, 40.79s/it]
  6%|▌         | 3/52 [02:02<33:13, 40.69s/it]
  8%|▊         | 4/52 [02:42<32:30, 40.63s/it]
 10%|▉         | 5/52 [03:23<31:48, 40.60s/it]
{'loss': **0.8759**, 'grad_norm': 0.32929742336273193, 'learning_rate': 9.911436253643445e-05, 'epoch': 0.1, 'num_input_tokens_seen': 162198}

1. The speed improved by only about 3 s/it (≈37.3 s/it vs ≈40.6 s/it), which is far from the acceleration ratio mentioned in the documentation.
2. nvidia-smi shows 35 GB with unsloth vs 39 GB without, only about 10% less memory.
3. The loss value with unsloth is abnormal (4.7581 vs 0.8759 at the same step).

Here is the code:

```python
model, _ = FastLanguageModel.from_pretrained(
    model_name=model_kwargs['model_id_or_path'],
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,
    low_cpu_mem_usage=True,
    device_map='auto',
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
model = FastLanguageModel.get_peft_model(
    model,
    lora_alpha=model_args.lora_alpha,
    lora_dropout=model_args.lora_dropout,
    r=model_args.lora_r,
    target_modules=model_args.lora_target_modules.split(","),
    use_gradient_checkpointing=True,
    random_state=training_args.seed,
    max_seq_length=8192,
)
trainer = SFTTrainer(
    model=model,
    ...
)
```
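
For comparison, the non-Unsloth baseline follows the usual transformers + peft + bitsandbytes QLoRA pattern; a rough sketch is below (illustrative only, it may not match the exact baseline script, which is not shown here):

```python
# Rough sketch of a plain-HF QLoRA baseline for comparison purposes only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_kwargs['model_id_or_path'],
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
# Enable gradient checkpointing and prepare the 4-bit model for training.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
lora_config = LoraConfig(
    r=model_args.lora_r,
    lora_alpha=model_args.lora_alpha,
    lora_dropout=model_args.lora_dropout,
    target_modules=model_args.lora_target_modules.split(","),
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```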

Is there some setting I'm missing? Looking forward to your reply.

@shimmyshimmer
Collaborator

Hey @mxjyst. Do you have a reproducible example for the non-Unsloth run? Have you tried our Colab notebooks to confirm?

Also, did you benchmark with Unsloth first and then HF in the same script? Unsloth patches the model code when it loads, so that ordering can affect the HF run.
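
Concretely, one way to keep the two measurements independent is to run each configuration in its own process, e.g. (a sketch; the script names here are hypothetical placeholders):

```python
# Run each benchmark in a separate Python process so that Unsloth's
# in-process patching cannot influence the plain-HF baseline run.
# The script names below are hypothetical placeholders.
import subprocess

for script in ["train_unsloth.py", "train_hf_baseline.py"]:
    subprocess.run(
        ["python", script,
         "--max_seq_length", "8192",
         "--per_device_train_batch_size", "1"],
        check=True,
    )
```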

All our benchmarking code is public for everyone to verify; see our HF blog post https://huggingface.co/blog/unsloth-trl, in which HF did third-party benchmarking. Likewise, LLaMA Factory and many others have confirmed our benchmarks.

See LLaMA Factory's research paper: https://twitter.com/danielhanchen/status/1770870732475469926, which shows the OSS package is the world's fastest by a large margin.

In terms of the loss diverging, that's very abnormal. Can you reproduce this via a Colab notebook?

danielhanchen changed the title from "No performance improvement with unsloth" to "Loss not matching" on Apr 17, 2024
@danielhanchen
Contributor

@mxjyst Interesting on the loss not matching - would you be able to provide a reproducible example via Colab?
