Newest Unsloth version silently FORCES Qwen2VL tokenizer padding side to right in inference, while training is left #2138

I just migrated my code to my new dev server and noticed badly degraded OCR inference results with Qwen2VL.
I first suspected a mismatch between my older adapter and the newly uploaded Qwen2 4-bit quant (which was silently replaced 11 days ago; I address that in a separate issue. This needs AT LEAST a warning in the future!).
But the degraded output persisted even after I retrained the model on the same dataset with the new version of Unsloth.

After further digging I now know the issue is the tokenizer padding side.

For some very weird reason, the tokenizer pads on the left during training, BUT the padding side is forced to the right during inference.


Here are the re-decoded input_ids that I get from unsloth/zoo 2025.2.15/2025.2.7:

'<|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|>japanese OCR:\n<|im_end|>\n<|im_start|>assistant\n


And here is the same output from the newest version:
<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|>japanese OCR:\n<|im_end|>\n<|im_start|>assistant\n<|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|><|vision_pad|>
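For anyone who wants to check their own batches without decoding them by eye, here is a minimal sketch of a padding-side check. It assumes the pad token is whatever `tokenizer.pad_token_id` reports (in the dumps above it looks like `<|vision_pad|>`); the helper name `detect_padding_side` and the toy ids are made up for illustration and are not part of Unsloth or transformers.

```python
def detect_padding_side(input_ids, pad_token_id):
    """Return 'left', 'right', 'both', or 'none' for one sequence of token ids."""
    ids = list(input_ids)
    if not ids:
        return "none"
    starts_with_pad = ids[0] == pad_token_id
    ends_with_pad = ids[-1] == pad_token_id
    if starts_with_pad and ends_with_pad:
        return "both"
    if starts_with_pad:
        return "left"
    if ends_with_pad:
        return "right"
    return "none"


# Toy ids: 0 stands in for the pad token, other numbers for prompt tokens.
PAD = 0
old_version_row = [PAD, PAD, PAD, 11, 12, 13]   # shaped like the first dump
new_version_row = [11, 12, 13, PAD, PAD, PAD]   # shaped like the second dump

print(detect_padding_side(old_version_row, PAD))
print(detect_padding_side(new_version_row, PAD))
```

With real inputs you would pass one row of the batch as a plain list (for a tensor, something like `inputs["input_ids"][0].tolist()`) together with the real pad token id.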



The most annoying part is that Unsloth apparently forces the padding side internally now. I've set the padding side multiple times in my inference code with tokenizer.padding_side = "left", and right up until the model generates outputs the Python debugger reports a padding side of "left". But after the model.generate call, the tokenizer's padding side is back to "right".
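To make this kind of silent change visible, a small guard like the one below could be wrapped around the generate call. This is only a sketch: `guard_padding_side` is a hypothetical helper, and the tokenizer and generate function here are stand-ins so the snippet runs without a model. It only detects and undoes a change after the fact; if the library flips the side internally before the inputs are padded, this would not fix the padded batch itself.

```python
import contextlib
import warnings
from types import SimpleNamespace


@contextlib.contextmanager
def guard_padding_side(tokenizer, expected="left"):
    """Set padding_side, then warn and restore it if the wrapped code changed it."""
    tokenizer.padding_side = expected
    try:
        yield tokenizer
    finally:
        if tokenizer.padding_side != expected:
            warnings.warn(
                f"padding_side was changed to {tokenizer.padding_side!r}; "
                f"restoring {expected!r}"
            )
            tokenizer.padding_side = expected


# Stand-ins for illustration only: a fake tokenizer and a fake generate call
# that overwrites the padding side, mimicking the behaviour described above.
fake_tokenizer = SimpleNamespace(padding_side="right")


def fake_generate(tok):
    tok.padding_side = "right"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with guard_padding_side(fake_tokenizer, "left"):
        fake_generate(fake_tokenizer)

print(fake_tokenizer.padding_side, len(caught))
```

In real code the body of the `with` block would be the actual `model.generate(...)` call, and the warning tells you exactly which call touched the setting.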



So yeah. We need 1) a consistent tokenizer padding side and 2) no overwriting of user-specified values.

I advocate a consistent padding side of "left", as that keeps the token distance between the user input and the generated output always the same, while padding on the "right" creates variable spacing between input and output.
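The spacing argument can be shown with a toy batch. The sketch below uses plain lists and made-up token ids (0 as the pad id); `pad_batch` and `gap_before_generation` are illustrative helpers, not library functions. It counts how many pad tokens sit between the last real prompt token and the position where generation would start.

```python
def pad_batch(sequences, pad_id, side):
    """Pad every sequence to the longest length on the given side."""
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_id] * (width - len(seq))
        padded.append(filler + list(seq) if side == "left" else list(seq) + filler)
    return padded


def gap_before_generation(row, pad_id):
    """Count pad tokens between the last real token and the end of the row."""
    gap = 0
    for token in reversed(row):
        if token != pad_id:
            break
        gap += 1
    return gap


PAD = 0
prompts = [[5, 6, 7, 8, 9], [5, 6]]  # one long and one short prompt

left_gaps = [gap_before_generation(r, PAD) for r in pad_batch(prompts, PAD, "left")]
right_gaps = [gap_before_generation(r, PAD) for r in pad_batch(prompts, PAD, "right")]

print(left_gaps)   # [0, 0]
print(right_gaps)  # [0, 3]
```

With left padding every prompt ends directly where generation begins, whatever its length. With right padding the shorter prompt is separated from its output by a gap that depends on the longest prompt in the batch.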


Sorry that I am not going further and creating a PR. My git isn't quite set up for that yet.
