Training loss not showing with trainer #36102

Closed
4 tasks
mwnthainarzary opened this issue Feb 8, 2025 · 2 comments

@mwnthainarzary

System Info

Python 3.11.11
transformers 4.48.2

Who can help?

@muellerzr
@SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import Trainer, TrainingArguments

# Define the TrainingArguments for fine-tuning
training_args = TrainingArguments(
    output_dir='/content/drive/MyDrive/Legal_Dataset/BartlargeFineTuned/',
    num_train_epochs=10,
    per_device_train_batch_size=10,
    gradient_accumulation_steps=8,
    evaluation_strategy="epoch",
    save_total_limit=1,
    save_steps=1000,
    learning_rate=1e-3,
    do_train=True,
    do_eval=True,
    remove_unused_columns=False,
    push_to_hub=False,
    report_to='tensorboard',
    load_best_model_at_end=False,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir='/content/drive/MyDrive/Legal_Dataset/BartlargeFineTuned/',
    logging_steps=200,
)

# Create a data collator for sequence-to-sequence tasks
# (MyDataCollatorForSeq2Seq is a custom collator defined elsewhere in my code)
data_collator = MyDataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    padding=False,
    max_length=80,
    label_pad_token_id=tokenizer.pad_token_id,
)

# Create the Trainer (custom_optimizer is defined elsewhere in my code)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    optimizers=(custom_optimizer, None),
)

trainer.train()

Expected behavior

I trained the model for 10 epochs, but in every epoch I saw only the validation loss, not the training loss. Please help.

[Image: per-epoch training output showing only the validation loss]

@neonwatty

neonwatty commented Feb 12, 2025

In your TrainingArguments, perhaps try:

  • examining the output in your logging_dir to confirm what is actually being logged
  • switching evaluation_strategy to eval_strategy (it looks like evaluation_strategy is deprecated)
  • lowering your logging_steps - at its current value of 200 it may be too large given the size of your dataset (you might try evaluating every X steps instead of every epoch to triangulate the right value); see the sketch after this list
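
A minimal sketch of those adjustments, assuming the rest of your setup stays the same (the logging_steps value of 10 below is purely illustrative; pick something smaller than the number of optimizer steps in your run):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='/content/drive/MyDrive/Legal_Dataset/BartlargeFineTuned/',
    num_train_epochs=10,
    per_device_train_batch_size=10,
    gradient_accumulation_steps=8,
    eval_strategy="epoch",   # renamed from the deprecated evaluation_strategy
    logging_steps=10,        # illustrative: small enough to be reached during training
    logging_dir='/content/drive/MyDrive/Legal_Dataset/BartlargeFineTuned/',
    report_to='tensorboard',
    # ... remaining arguments unchanged ...
)

With per_device_train_batch_size=10 and gradient_accumulation_steps=8, each optimizer step consumes 80 samples (on a single device), so with logging_steps=200 the first training-loss log would only appear after roughly 16,000 samples; if the whole run is shorter than that, no training loss is ever reported.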


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
