
Error in run_vits_finetuning when starting to train #22

Closed
khof312 opened this issue Mar 22, 2024 · 2 comments
khof312 commented Mar 22, 2024

I am encountering an error when trying to run accelerate launch run_vits_finetuning.py. I get past the Weights & Biases authentication, but training fails as soon as it starts:

03/22/2024 19:57:22 - INFO - __main__ - ***** Running training *****
03/22/2024 19:57:22 - INFO - __main__ -   Num examples = 110
03/22/2024 19:57:22 - INFO - __main__ -   Num Epochs = 200
03/22/2024 19:57:22 - INFO - __main__ -   Instantaneous batch size per device = 16
03/22/2024 19:57:22 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
03/22/2024 19:57:22 - INFO - __main__ -   Gradient Accumulation steps = 1
03/22/2024 19:57:22 - INFO - __main__ -   Total optimization steps = 1400
Steps:   0%|                                                                                | 0/1400 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 155, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
TypeError: BatchEncoding.to() got an unexpected keyword argument 'non_blocking'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/finetune-hf-vits/run_vits_finetuning.py", line 1494, in <module>
    main()
  File "/content/finetune-hf-vits/run_vits_finetuning.py", line 1090, in main
    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 461, in __iter__
    current_batch = send_to_device(current_batch, self.device)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 157, in send_to_device
    return tensor.to(device)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 789, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 789, in <dictcomp>
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
AttributeError: 'NoneType' object has no attribute 'to'
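
For what it's worth, the two stacked exceptions can be reproduced in isolation if the collated batch contains a None value, which is what the last frame suggests. A minimal sketch, assuming a transformers version whose BatchEncoding.to() only accepts a device argument (as in the traceback above); the batch contents here are made up for illustration:

import torch
from transformers import BatchEncoding

# Hypothetical batch with a None entry, mimicking what the collator seems to yield here.
batch = BatchEncoding({"input_ids": torch.tensor([[1, 2, 3]]), "labels": None})

# First exception: accelerate's send_to_device passes non_blocking, which this
# BatchEncoding.to() signature does not accept.
try:
    batch.to("cpu", non_blocking=True)
except TypeError as e:
    print(e)

# Fallback path: accelerate retries without non_blocking, and the None value
# then triggers AttributeError: 'NoneType' object has no attribute 'to'.
try:
    batch.to("cpu")
except AttributeError as e:
    print(e)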

I am relatively new to all of this, so any ideas on where the problem might be would be really helpful. It looks to me like the training data is not loading successfully, but I'm trying to figure out whether there is an issue in my config. For context:

  • I am trying to run everything locally, so I don't push to the hub.
  • I don't need the wandb functionality, but the problem persists whether or not I visualize with it.
  • The problem appears both on Colab and locally under WSL, with the same error.

I have put a reproducible example in this Colab notebook, which also includes the configs I'm using. To get started I was just trying to reproduce the Gujarati training example. Any pointers to where I'm going wrong would be greatly appreciated!


oza75 commented Mar 30, 2024

I was able to fix this by using the exact same versions of transformers, datasets, and accelerate as mentioned in the requirements.txt file.

pip uninstall transformers datasets accelerate  # remove the versions that were installed when you ran pip install -r requirements.txt

pip install transformers==4.35.1 datasets[audio]==2.14.7 accelerate==0.24.1
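
To double-check that the pinned versions are the ones actually being imported (just a quick sanity check, not part of the original fix):

import transformers, datasets, accelerate
# Expected after the reinstall above: 4.35.1 2.14.7 0.24.1
print(transformers.__version__, datasets.__version__, accelerate.__version__)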


khof312 commented Mar 31, 2024

Thank you SO much, that worked for me as well! I will close this issue, but perhaps, if you don't mind @ylacombe, I will open a pull request to change requirements.txt to hard-code the versions. I can verify that at least the transformers==4.38.2 datasets==2.18.0 accelerate==0.28.0 and transformers==4.37.2 datasets==2.18.0 accelerate==0.28.0 combinations were not working for me.
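
For reference, pinning the working combination in requirements.txt would look roughly like the lines below (only the three packages discussed in this thread; any other entries in the repository's requirements.txt would stay as they are):

transformers==4.35.1
datasets[audio]==2.14.7
accelerate==0.24.1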
