This notebook shows how to quantize models with GPTQ using Transformers and how to fine-tune the resulting GPTQ models with TRL. It compares fine-tuning of 4-bit, 3-bit, and 2-bit GPTQ models against a bitsandbytes NF4 model.
More details in this article: [Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL](https://kaitchup.substack.com/p/quantize-and-fine-tune-llms-with)


The quantization cells only run on an A100 GPU (which requires Google Colab Pro) since they consume more than 30 GB of VRAM. The fine-tuning cells can run on the free instance of Google Colab with a T4 GPU.

First, you will need to install these libraries:

In [None]:
!pip install transformers optimum accelerate peft trl auto-gptq bitsandbytes

Collecting transformers
  Downloading transformers-4.33.0-py3-none-any.whl (7.6 MB)
Collecting optimum
  Downloading optimum-1.12.0-py3-none-any.whl (380 kB)
Collecting accelerate
  Downloading accelerate-0.22.0-py3-none-any.whl (251 kB)
Collecting peft
  Downloading peft-0.5.0-py3-none-any.whl (85 kB)
Collecting trl
  Downloading trl-0.7.1-py3-none-any.whl (117 kB)

This time, I use `notebook_login()` instead of passing the access token directly to get the Llama 2 model and tokenizer. I find this less convenient, but since passing the token through the `use_auth_token` argument is deprecated and will be removed in Transformers v5, I think it's time to stop using it.

When you run the next cell, `notebook_login()` prompts you for your access token. You will have to enter it again every time you restart this notebook.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

**Note: Restart your runtime after running each of the following cells. If you don't, you will almost certainly get out-of-memory errors.**

This code quantizes Llama 2 7B to 4-bit precision using GPTQ:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"

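tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# GPTQ needs calibration data: here, samples from the C4 dataset, tokenized with the model's tokenizer
quantization_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Passing the GPTQConfig makes from_pretrained quantize the model while loading it
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)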
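
To build intuition for what the `bits` argument controls, here is a minimal sketch of plain round-to-nearest uniform quantization in pure Python. This is not the GPTQ algorithm, which additionally relies on calibration data to compensate for rounding errors. The helper names and the toy Gaussian weights are illustrative assumptions.

```python
import random


def quantize_rtn(weights, bits):
    """Round-to-nearest asymmetric uniform quantization (NOT GPTQ)."""
    max_code = 2 ** bits - 1  # e.g. 15 for 4-bit: integer codes are 0..15
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / max_code if w_max > w_min else 1.0
    # Map each weight to the nearest integer code, then back to a float
    codes = [round((w - w_min) / scale) for w in weights]
    dequantized = [w_min + c * scale for c in codes]
    return codes, dequantized, scale


def mean_abs_error(original, approx):
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


random.seed(0)
toy_weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # toy data, not real model weights

errors = {}
for bits in (4, 3, 2):
    codes, dequantized, scale = quantize_rtn(toy_weights, bits)
    errors[bits] = mean_abs_error(toy_weights, dequantized)
    print(f"{bits}-bit: {2 ** bits} levels, mean abs error = {errors[bits]:.5f}")
```

Each bit removed halves the number of available levels, so the rounding error grows quickly as the precision drops.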


This code quantizes Llama 2 7B to 3-bit precision using GPTQ:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
quantization_config = GPTQConfig(bits=3, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)

This code quantizes Llama 2 7B to 2-bit precision using GPTQ:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
quantization_config = GPTQConfig(bits=2, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
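
The bit width also determines how much storage the quantized weights need. The following back-of-the-envelope estimate is a sketch under several assumptions that are not stated in this notebook: a GPTQ group size of 128, one fp16 scale and one `bits`-wide zero point per group, and embeddings, `lm_head`, and norm weights kept in fp16. The layer shapes are those of Llama 2 7B (hidden size 4096, MLP size 11008, 32 layers, vocabulary of 32000 tokens).

```python
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 32000

# Linear layers of the decoder blocks (assumed to be the only quantized weights)
attn_params = 4 * HIDDEN * HIDDEN        # q, k, v, o projections
mlp_params = 3 * HIDDEN * INTERMEDIATE   # gate, up, down projections
quantized_params = LAYERS * (attn_params + mlp_params)

# Embeddings, lm_head, and RMSNorm weights (assumed to stay in fp16)
fp16_params = 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN


def estimate_size_gb(bits, group_size=128):
    """Rough checkpoint size in GB (10^9 bytes) for a given bit width."""
    # Each weight costs `bits`, plus its share of the per-group scale and zero point
    bits_per_weight = bits + (16 + bits) / group_size
    total_bits = quantized_params * bits_per_weight + fp16_params * 16
    return total_bits / 8 / 1e9


for bits in (4, 3, 2):
    print(f"{bits}-bit: ~{estimate_size_gb(bits):.2f} GB")
```

Under these assumptions, the 3-bit and 2-bit estimates land close to the `pytorch_model.bin` download sizes that appear in the fine-tuning logs of this notebook (3.08G and 2.27G).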

This code fine-tunes Llama 2 7B quantized to 4-bit precision with GPTQ.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
from peft import prepare_model_for_kbit_training, LoraConfig
from datasets import load_dataset

model_id = "kaitchup/Llama-2-7b-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

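# Disable the ExLlama kernels, as recommended when fine-tuning a GPTQ model with PEFT
quantization_config_loading = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config_loading, device_map="auto")
# Freeze the base weights and prepare the model for adapter training
# (gradient checkpointing is enabled by default)
model = prepare_model_for_kbit_training(model)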

peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=16,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules= ["gate_proj", "down_proj", "up_proj"]
)

data = load_dataset("timdettmers/openassistant-guanaco")

training_arguments = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="steps",
        do_eval=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=4,
        log_level="debug",
        optim="adamw_hf",
        save_steps=20,
        logging_steps=20,
        learning_rate=1e-4,
        eval_steps=20,
        fp16=True,
        max_grad_norm=0.3,
        max_steps=100,
        warmup_ratio=0.03,
        lr_scheduler_type="constant",
)
trainer = SFTTrainer(
        model=model,
        train_dataset=data['train'],
        eval_dataset=data['test'],
        peft_config=peft_config,
        dataset_text_field="text",
        max_seq_length=256,
        tokenizer=tokenizer,
        args=training_arguments,
)

model.config.use_cache = False
trainer.train()

You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
Repo card metadata block was not found. Setting CardData to empty.


The model is quantized. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
max_steps is given, it will override any value given in num_train_epochs
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 9,846
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 100
  Number of trainable parameters = 23,199,744
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 1.601 | 1.475716 |
| 40 | 1.3859 | 1.419421 |
| 60 | 1.2771 | 1.404179 |
| 80 | 1.3274 | 1.39274 |
| 100 | 1.2522 | 1.390208 |


***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-20
tokenizer config file saved in ./results/checkpoint-20/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-20/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-40
tokenizer config file saved in ./results/checkpoint-40/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-40/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-60
tokenizer config file saved in ./results/checkpoint-60/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-60/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-80
tokenizer config file saved in ./results/checkpoint-80/tok

TrainOutput(global_step=100, training_loss=1.3687346076965332, metrics={'train_runtime': 1376.4918, 'train_samples_per_second': 0.291, 'train_steps_per_second': 0.073, 'total_flos': 94948137369600.0, 'train_loss': 1.3687346076965332, 'epoch': 0.04})
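
The `Number of trainable parameters = 23,199,744` line in the log above can be reproduced by hand. LoRA adds two low-rank matrices to each targeted linear layer, of shapes `(r, d_in)` and `(d_out, r)`, so each targeted layer contributes `r * (d_in + d_out)` trainable parameters. The sketch below assumes the Llama 2 7B dimensions (hidden size 4096, MLP size 11008, 32 decoder layers), which are not given in this notebook; the function name is my own.

```python
def lora_trainable_params(r, module_shapes, num_layers):
    """Count LoRA parameters: r * (d_in + d_out) per targeted linear layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * num_layers


HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 32  # assumed Llama 2 7B dimensions

# (d_in, d_out) of the modules listed in target_modules
module_shapes = {
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

total = lora_trainable_params(r=16, module_shapes=module_shapes, num_layers=LAYERS)
print(f"{total:,}")  # 23,199,744
```

Because the count depends only on `r` and the targeted modules, it is identical for the GPTQ and bitsandbytes runs.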

This code fine-tunes Llama 2 7B quantized with bitsandbytes NF4.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
from peft import prepare_model_for_kbit_training, LoraConfig
from datasets import load_dataset
import torch

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = 'right'

compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",  # 4-bit NormalFloat data type
        bnb_4bit_compute_dtype=compute_dtype,  # dtype used for the computations
        bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)
# device_map={"": 0} places the whole model on GPU 0
model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map={"": 0}
)
model = prepare_model_for_kbit_training(model)


peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=16,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules= ["gate_proj", "down_proj", "up_proj"]
)

data = load_dataset("timdettmers/openassistant-guanaco")

training_arguments = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="steps",
        do_eval=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=4,
        log_level="debug",
        optim="adamw_hf",
        save_steps=20,
        logging_steps=20,
        learning_rate=1e-4,
        eval_steps=20,
        fp16=True,
        max_grad_norm=0.3,
        max_steps=100,
        warmup_ratio=0.03,
        lr_scheduler_type="constant",
)
trainer = SFTTrainer(
        model=model,
        train_dataset=data['train'],
        eval_dataset=data['test'],
        peft_config=peft_config,
        dataset_text_field="text",
        max_seq_length=256,
        tokenizer=tokenizer,
        args=training_arguments,
)

model.config.use_cache = False
trainer.train()

Repo card metadata block was not found. Setting CardData to empty.


The model is quantized. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
max_steps is given, it will override any value given in num_train_epochs
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 9,846
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 100
  Number of trainable parameters = 23,199,744
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 1.619 | 1.46335 |
| 40 | 1.3775 | 1.416547 |
| 60 | 1.2799 | 1.397291 |
| 80 | 1.3192 | 1.385957 |
| 100 | 1.249 | 1.380801 |


***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-20
tokenizer config file saved in ./results/checkpoint-20/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-20/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-40
tokenizer config file saved in ./results/checkpoint-40/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-40/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-60
tokenizer config file saved in ./results/checkpoint-60/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-60/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-80
tokenizer config file saved in ./results/checkpoint-80/tok

TrainOutput(global_step=100, training_loss=1.3689344787597657, metrics={'train_runtime': 1298.8265, 'train_samples_per_second': 0.308, 'train_steps_per_second': 0.077, 'total_flos': 2084376988876800.0, 'train_loss': 1.3689344787597657, 'epoch': 0.04})
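
The `metrics` reported in the two `TrainOutput` results above can be cross-checked against the training arguments. This small sketch recomputes the throughput and the epoch fraction from `max_steps`, the batch size, and the reported runtimes. The function name is my own, and the rounding simply mirrors the precision shown in the logs.

```python
def training_metrics(runtime_s, max_steps=100, batch_size=4, grad_accum=1, num_examples=9846):
    """Recompute throughput and epoch fraction from the training arguments."""
    samples_seen = max_steps * batch_size * grad_accum
    return {
        "train_samples_per_second": round(samples_seen / runtime_s, 3),
        "train_steps_per_second": round(max_steps / runtime_s, 3),
        "epoch": round(samples_seen / num_examples, 2),
    }


# train_runtime values copied from the TrainOutput lines above
runtimes = {"GPTQ 4-bit": 1376.4918, "bitsandbytes NF4": 1298.8265}
results = {name: training_metrics(t) for name, t in runtimes.items()}

for name, metrics in results.items():
    print(name, metrics)
```

With 100 steps of 4 examples each, both runs see only 400 of the 9,846 training examples, which is why the logs report `epoch: 0.04`.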

This code fine-tunes Llama 2 7B quantized to 3-bit precision with GPTQ.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
from peft import prepare_model_for_kbit_training, LoraConfig
from datasets import load_dataset

model_id = "kaitchup/Llama-2-7b-gptq-3bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# Disable the ExLlama kernels, as recommended when fine-tuning a GPTQ model with PEFT
quantization_config_loading = GPTQConfig(bits=3, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config_loading, device_map="auto")
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=16,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules= ["gate_proj", "down_proj", "up_proj"]
)

data = load_dataset("timdettmers/openassistant-guanaco")

training_arguments = TrainingArguments(
        output_dir="./results3",
        evaluation_strategy="steps",
        do_eval=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=4,
        log_level="debug",
        optim="adamw_hf",
        save_steps=20,
        logging_steps=20,
        learning_rate=1e-4,
        eval_steps=20,
        fp16=True,
        max_grad_norm=0.3,
        max_steps=100,
        warmup_ratio=0.03,
        lr_scheduler_type="constant",
)
trainer = SFTTrainer(
        model=model,
        train_dataset=data['train'],
        eval_dataset=data['test'],
        peft_config=peft_config,
        dataset_text_field="text",
        max_seq_length=256,
        tokenizer=tokenizer,
        args=training_arguments,
)

model.config.use_cache = False
trainer.train()

You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.


Downloading pytorch_model.bin:   0%|          | 0.00/3.08G [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


The model is quantized. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
max_steps is given, it will override any value given in num_train_epochs
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 9,846
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 100
  Number of trainable parameters = 23,199,744
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 1.8439 | 1.637223 |
| 40 | 1.5408 | 1.556992 |
| 60 | 1.4126 | 1.534225 |
| 80 | 1.4523 | 1.514838 |
| 100 | 1.3683 | 1.507567 |


***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results3/checkpoint-20
tokenizer config file saved in ./results3/checkpoint-20/tokenizer_config.json
Special tokens file saved in ./results3/checkpoint-20/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results3/checkpoint-40
tokenizer config file saved in ./results3/checkpoint-40/tokenizer_config.json
Special tokens file saved in ./results3/checkpoint-40/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results3/checkpoint-60
tokenizer config file saved in ./results3/checkpoint-60/tokenizer_config.json
Special tokens file saved in ./results3/checkpoint-60/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results3/checkpoint-80
tokenizer config file saved in ./results3/checkp

TrainOutput(global_step=100, training_loss=1.5235796928405763, metrics={'train_runtime': 1891.959, 'train_samples_per_second': 0.211, 'train_steps_per_second': 0.053, 'total_flos': 94948137369600.0, 'train_loss': 1.5235796928405763, 'epoch': 0.04})
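
All the fine-tuning cells in this notebook combine `warmup_ratio=0.03` with `lr_scheduler_type="constant"`. As far as I understand the Transformers schedulers, the plain `"constant"` schedule has no warmup phase, so the warmup ratio would only take effect with `"constant_with_warmup"`. The pure-Python sketch below illustrates the difference between the two shapes. It is not the library's code, and the 3 warmup steps are an assumption derived from 3% of the 100 training steps.

```python
def constant_lr(step, base_lr=1e-4):
    """Constant schedule: the same learning rate at every step."""
    return base_lr


def constant_with_warmup_lr(step, base_lr=1e-4, warmup_steps=3):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


for step in range(5):
    print(step, constant_lr(step), constant_with_warmup_lr(step))
```

With only 3 warmup steps out of 100, the two schedules differ on the first few updates only, so the choice is unlikely to change the comparison between the runs.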

This code fine-tunes Llama 2 7B quantized to 2-bit precision with GPTQ.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
from peft import prepare_model_for_kbit_training, LoraConfig
from datasets import load_dataset

model_id = "kaitchup/Llama-2-7b-gptq-2bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# Disable the ExLlama kernels, as recommended when fine-tuning a GPTQ model with PEFT
quantization_config_loading = GPTQConfig(bits=2, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config_loading, device_map="auto")
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=16,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules= ["gate_proj", "down_proj", "up_proj"]
)

data = load_dataset("timdettmers/openassistant-guanaco")

training_arguments = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="steps",
        do_eval=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        per_device_eval_batch_size=4,
        log_level="debug",
        optim="adamw_hf",
        save_steps=20,
        logging_steps=20,
        learning_rate=1e-4,
        eval_steps=20,
        fp16=True,
        max_grad_norm=0.3,
        max_steps=100,
        warmup_ratio=0.03,
        lr_scheduler_type="constant",
)
trainer = SFTTrainer(
        model=model,
        train_dataset=data['train'],
        eval_dataset=data['test'],
        peft_config=peft_config,
        dataset_text_field="text",
        max_seq_length=256,
        tokenizer=tokenizer,
        args=training_arguments,
)

model.config.use_cache = False
trainer.train()

You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.


Downloading pytorch_model.bin:   0%|          | 0.00/2.27G [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


The model is quantized. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
max_steps is given, it will override any value given in num_train_epochs
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 9,846
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 100
  Number of trainable parameters = 23,199,744
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20 | 6.7699 | 4.813233 |
| 40 | 4.2288 | 3.437706 |
| 60 | 2.9923 | 2.919594 |
| 80 | 2.6777 | 2.682141 |
| 100 | 2.4719 | 2.581752 |


***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-20
tokenizer config file saved in ./results/checkpoint-20/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-20/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-40
tokenizer config file saved in ./results/checkpoint-40/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-40/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-60
tokenizer config file saved in ./results/checkpoint-60/tokenizer_config.json
Special tokens file saved in ./results/checkpoint-60/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 518
  Batch size = 4
Saving model checkpoint to ./results/checkpoint-80
tokenizer config file saved in ./results/checkpoint-80/tok

TrainOutput(global_step=100, training_loss=3.8281287002563475, metrics={'train_runtime': 1306.6371, 'train_samples_per_second': 0.306, 'train_steps_per_second': 0.077, 'total_flos': 94948137369600.0, 'train_loss': 3.8281287002563475, 'epoch': 0.04})
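
To compare the four runs at a glance, the final validation losses (step 100) and runtimes reported above can be gathered in one place. The sketch below also converts each loss to a perplexity with `exp(loss)`, assuming the validation loss is the mean token-level cross-entropy in nats. The dictionary layout is my own.

```python
import math

# (final validation loss at step 100, train_runtime in seconds), copied from the logs above
runs = {
    "GPTQ 4-bit": (1.390208, 1376.4918),
    "bitsandbytes NF4": (1.380801, 1298.8265),
    "GPTQ 3-bit": (1.507567, 1891.959),
    "GPTQ 2-bit": (2.581752, 1306.6371),
}

perplexity = {name: math.exp(loss) for name, (loss, _) in runs.items()}
best = min(runs, key=lambda name: runs[name][0])

# Print the runs from lowest to highest validation loss
for name in sorted(runs, key=lambda name: runs[name][0]):
    loss, runtime_s = runs[name]
    print(f"{name:18s} val loss {loss:.4f}  perplexity {perplexity[name]:6.2f}  runtime {runtime_s / 60:5.1f} min")

print("Lowest validation loss:", best)
```

In these runs, the NF4 and 4-bit GPTQ models end up very close to each other, the 3-bit model is somewhat worse and slower to train, and the 2-bit model remains far behind after 100 steps.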