
Error when tokenizer is set to string: AttributeError: 'str' object has no attribute 'pad_token_id' #36731

Open
TrevinAvery opened this issue Mar 14, 2025 · 2 comments

Comments

@TrevinAvery

I'm using the SageMaker HuggingFace inference toolkit with the image-text-to-text task. I get the following error from this line in transformers/pipelines/base.py:

AttributeError: 'str' object has no attribute 'pad_token_id'

This toolkit passes a string for the tokenizer parameter to the pipeline function. The passed string matches model_dir (as seen here).

I expect the string to either 1) be used to load a valid tokenizer object, 2) be dropped because it is unused, or 3) raise an error because it is invalid. Instead, it is passed straight through to pipeline_class (in this case ImageTextToText), which does not accept a string for the tokenizer parameter.
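A minimal, library-free sketch of the failure mode, assuming (as the traceback suggests) that the pipeline class reads `tokenizer.pad_token_id` during initialization. `FakePipeline` is hypothetical and only stands in for the real pipeline class; `/opt/ml/model` is just an example model_dir.

```python
class FakePipeline:
    """Hypothetical stand-in for a pipeline class that expects a tokenizer object."""

    def __init__(self, model, tokenizer=None):
        self.model = model
        self.tokenizer = tokenizer
        # Mirrors the kind of attribute access that fails in the report:
        # a plain str has no pad_token_id attribute.
        if self.tokenizer is not None:
            self.pad_token_id = self.tokenizer.pad_token_id


def reproduce():
    """Pass a path string where a tokenizer object is expected."""
    try:
        FakePipeline(model=object(), tokenizer="/opt/ml/model")
    except AttributeError as err:
        return str(err)
    return None


print(reproduce())
```

Because the string is forwarded unchanged, the first attribute lookup on it is what surfaces the error, rather than an earlier, clearer check on the tokenizer argument.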

Suggested fix:

When load_tokenizer evaluates to False and tokenizer is still a string, pipeline should either set tokenizer to None or raise an exception.
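The suggested fix can be sketched as a small guard applied before the tokenizer reaches the pipeline class. This is a minimal sketch, not the transformers implementation: `resolve_tokenizer`, its `loader` callback and its `strict` flag are hypothetical names. In the real library the loader role would presumably be played by something like `AutoTokenizer.from_pretrained`.

```python
from typing import Any, Callable, Optional


def resolve_tokenizer(
    tokenizer: Any,
    load_tokenizer: bool,
    loader: Callable[[str], Any],
    strict: bool = False,
) -> Optional[Any]:
    """Hypothetical guard run before the tokenizer is handed to the pipeline class."""
    if isinstance(tokenizer, str):
        if load_tokenizer:
            # Option 1: turn the identifier/path into a real tokenizer object.
            return loader(tokenizer)
        if strict:
            # Option 3: refuse the unusable string outright.
            raise ValueError(
                f"tokenizer={tokenizer!r} was given as a string, "
                "but this task does not load a tokenizer"
            )
        # Option 2: drop the unused string instead of forwarding it.
        return None
    # Already a tokenizer object (or None): pass it through unchanged.
    return tokenizer


# Usage with a dummy loader standing in for a real tokenizer factory.
loaded = resolve_tokenizer("/opt/ml/model", True, loader=lambda path: ("tokenizer", path))
dropped = resolve_tokenizer("/opt/ml/model", False, loader=lambda path: ("tokenizer", path))
```

Dropping the string (option 2) keeps existing callers such as the toolkit working, while `strict=True` (option 3) surfaces the misuse early; which default is preferable is a maintainer decision.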

@Rocketknight1 (Member)

Is this happening with Gemma3?

@zucchini-nlp (Member)

This should have been fixed by a recent patch on the release branch; we had a bug where the tokenizer wasn't added to the model mapping.
