-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gemma3 (and Paligemma) position_ids 1-indexed? #36856
Comments
Hmm... cc @molbap , I remember you told that was needed to match the original implementation. Can you take a look? |
Hey, sure - IIRC it was to make our implementation match jax at the time in PaliGemma because the bos token was added after/had an unusual positioning, and it needed to be added afterwards. Not sure why in Gemma3, taking a look |
This is because, in the |
@molbap I'm curious for paligemma also, so far as I could tell, all the Jax implementations of paligemma I've seen also do 0-indexing, such as the implementation in https://github.com/google-research/big_vision/blob/main/big_vision/models/proj/paligemma/paligemma.py, which does 0-indexing. Would you happen to know which Jax implementation was the reference here which had 1-indexing? |
There is none: it's not per se related to the original implementation as it was to a tokenizer issue IIRC, and this was a quickfix at the time which ended up staying there. We had 100% logit matching with the original implementation though. You're right to bring this up, I'll reopen the issue in order to remember to investigate. |
In the official google implementation of gemma3, all the position_id preparation indicates that position_ids are 0-indexed, the same is true of paligemma in big vision. https://github.com/google-deepmind/gemma/blob/91ee586fbb2f3b8bfeb07b99967008348a229689/gemma/transformer.py#L791.
However, in transformers
transformers/src/transformers/models/gemma3/modeling_gemma3.py
Line 1430 in cf8091c
The text was updated successfully, but these errors were encountered: