
Fine-tuning fails as soon as WhisperProcessor.from_pretrained is called #42

Closed

lichq5 opened this issue Dec 7, 2023 · 5 comments

@lichq5 commented Dec 7, 2023

I'm training on a single GPU, and it errors out as soon as it starts:
```
Traceback (most recent call last):
  File "/workspace/Whisper-Finetune-master/finetune.py", line 47, in <module>
    processor = WhisperProcessor.from_pretrained(args.base_model,
  File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 228, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py", line 272, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    init_kwargs[key] = added_tokens_map.get(init_kwargs[key], init_kwargs[key])
TypeError: unhashable type: 'dict'
```
What's going on here? Did I do something wrong?
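[Editor's note] The `unhashable type: 'dict'` at the final traceback frame means one of the special-token entries read from `tokenizer_config.json` is a dict (the serialized-AddedToken form that newer transformers versions write out) rather than a plain string. A plausible but unconfirmed cause is a version mismatch between the transformers that produced the file and the one loading it. A small diagnostic sketch (the helper name and key list are illustrative; the file name follows the standard Hub layout):

```python
import json

# Report which special-token entries in a downloaded tokenizer_config.json
# are stored in the dict form, which is what triggers
# "TypeError: unhashable type: 'dict'" in some transformers versions.
def dict_valued_special_tokens(config_path):
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    keys = ("bos_token", "eos_token", "unk_token", "pad_token")
    return [k for k in keys if isinstance(cfg.get(k), dict)]
```

If this returns any keys, re-downloading the tokenizer files with the same transformers version used for training, or upgrading transformers, would be the things to try first.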

@yeyupiaoling
Owner

The model files you downloaded may be incomplete, or may be the wrong ones.

@lichq5
Author

lichq5 commented Dec 14, 2023

I downloaded all four weight files from openai/whisper-small/ (flax_model.msgpack, model.safetensors, pytorch_model.bin, and tf_model.h5), and none of them works. Why is that? There's no md5 to verify them against, but the downloads all completed without errors.

@yeyupiaoling
Owner

@lichq5 It's not just those few files; there are many more.
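[Editor's note] WhisperProcessor loads the tokenizer and feature extractor in addition to the model weights, so a weights-only download is not enough. A sanity-check sketch (the file list is an assumption based on the usual openai/whisper-small repo layout on the Hugging Face Hub, not taken from this thread):

```python
import os

# Processor-side files a local openai/whisper-small copy is expected to have,
# beyond one set of model weights (list assumed from the standard Hub layout).
PROCESSOR_FILES = [
    "config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "normalizer.json",
]

def missing_files(model_dir):
    """List the processor-side files absent from a local model directory."""
    return [name for name in PROCESSOR_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]
```

Fetching the whole repo in one go, for example with `huggingface_hub.snapshot_download("openai/whisper-small")`, sidesteps the piecemeal-download problem entirely.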

@lichq5
Author

lichq5 commented Dec 26, 2023

Now I get this error during training:
```python
raise ValueError(
    "Asking to pad but the tokenizer does not have a padding token. "
    "Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) "
    "or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})."
)
```
If I manually patch the source to add self.pad_token = "[PAD]", will that hurt training quality?
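[Editor's note] As far as I recall, a complete openai/whisper-small tokenizer already defines a pad token, so a correct download should never reach this error. On the question itself, a pure-Python sketch of the trade-off (ToyTokenizer is a made-up stand-in, not the real WhisperTokenizer): reusing an existing token as pad_token leaves the vocabulary unchanged, while adding a brand-new "[PAD]" token grows it, so the model's embedding matrix would need resizing and the new row would start untrained.

```python
# Toy stand-in for a tokenizer, illustrating the two fixes the error
# message suggests.
class ToyTokenizer:
    def __init__(self):
        self.vocab = {"<|endoftext|>": 0, "hello": 1, "world": 2}
        self.eos_token = "<|endoftext|>"
        self.pad_token = None

    def add_special_tokens(self, mapping):
        # A genuinely new token gets a fresh id at the end of the vocab,
        # which in a real model would require resizing the embeddings.
        token = mapping["pad_token"]
        if token not in self.vocab:
            self.vocab[token] = len(self.vocab)
        self.pad_token = token

tok = ToyTokenizer()
before = len(tok.vocab)

tok.pad_token = tok.eos_token                   # option A: reuse eos, vocab unchanged
assert len(tok.vocab) == before

tok.add_special_tokens({"pad_token": "[PAD]"})  # option B: vocab grows by one
assert len(tok.vocab) == before + 1
```

So option A is generally the safer manual workaround, but neither replaces fixing the incomplete download.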

@yeyupiaoling
Owner

That probably won't work. You still need to download the complete files so it can read the tokens from them.
