Gemma3 (and Paligemma) position_ids 1-indexed? #36856

oceanusxiv · 2025-03-20T13:54:30Z

In the official Google implementation of Gemma3, all the position_id preparation indicates that position_ids are 0-indexed: https://github.com/google-deepmind/gemma/blob/91ee586fbb2f3b8bfeb07b99967008348a229689/gemma/transformer.py#L791. The same is true of PaliGemma in big_vision.

However, in transformers, at

transformers/src/transformers/models/gemma3/modeling_gemma3.py

Line 1430 in cf8091c

# position_ids in Gemma3 are 1-indexed

it's stated that position_ids are 1-indexed. This seems like a weird discrepancy between model implementations. Is this intended?
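To make the two conventions concrete, here is a minimal sketch of how position ids could be derived from an attention mask under each indexing. The helper `position_ids` and its `pad_value` parameter are hypothetical names for illustration only; this is not the code from transformers or from the Google repositories.

```python
from itertools import accumulate


def position_ids(attention_mask, one_indexed=False, pad_value=0):
    """Derive position ids from a 0/1 attention mask (hypothetical helper).

    The running sum of the mask counts real tokens seen so far, so it is
    naturally 1-indexed; subtracting 1 gives the 0-indexed convention.
    Padding slots receive `pad_value` (an arbitrary filler, an assumption).
    """
    counts = list(accumulate(attention_mask))
    offset = 0 if one_indexed else 1
    return [c - offset if m else pad_value for c, m in zip(counts, attention_mask)]


mask = [0, 0, 1, 1, 1]  # left-padded sequence with three real tokens
zero_based = position_ids(mask)                   # real tokens get 0, 1, 2
one_based = position_ids(mask, one_indexed=True)  # real tokens get 1, 2, 3
```

The only difference between the two results is a constant offset of 1 on the real tokens.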

zucchini-nlp · 2025-03-20T14:20:12Z

Hmm...

cc @molbap, I remember you said that was needed to match the original implementation. Can you take a look?

molbap · 2025-03-20T14:59:58Z

Hey, sure. IIRC it was to make our PaliGemma implementation match the JAX one at the time: the bos token had an unusual positioning and needed to be added afterwards. Not sure why it's in Gemma3, taking a look.

molbap · 2025-03-20T15:10:46Z

This is because, in the modular file, Gemma3ForConditionalGeneration inherits from PaliGemmaForConditionalGeneration where this specific fix happens. But it is specific to PaliGemma input ordering and should not have been propagated to Gemma3. I'm opening a PR to change it, although it's a global shift of position ids so it should not change much for RoPE and subsequent logits. cc @gante as we discussed this last time
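As a rough illustration of why a global shift should matter little for RoPE: each pair of dimensions is rotated by an angle proportional to the position, so the query-key dot product depends only on the difference between the two positions. The sketch below uses plain Python and an interleaved-pair rotation with a base of 10000; both are assumptions for illustration and do not reproduce the layout used in the transformers or JAX code.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and a rotated key."""
    rq, rk = rope(q, q_pos), rope(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

zero_indexed = score(q, k, q_pos=5, k_pos=2)  # positions counted from 0
one_indexed = score(q, k, q_pos=6, k_pos=3)   # same tokens, shifted by 1
print(math.isclose(zero_indexed, one_indexed, abs_tol=1e-9))
```

The two scores agree up to floating-point error because the relative offset (3) is unchanged. In a real model, reduced-precision arithmetic or any component that uses absolute positions could still introduce small differences.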

oceanusxiv · 2025-03-20T19:27:42Z

@molbap I'm curious about PaliGemma as well. As far as I can tell, all the JAX implementations of PaliGemma I've seen also use 0-indexing, such as the one in https://github.com/google-research/big_vision/blob/main/big_vision/models/proj/paligemma/paligemma.py. Would you happen to know which JAX implementation with 1-indexing was the reference here?

molbap · 2025-03-24T08:29:27Z

There is none: IIRC it was related less to the original implementation than to a tokenizer issue, and this was a quick fix at the time which ended up staying there. We had 100% logit matching with the original implementation though. You're right to bring this up, I'll reopen the issue in order to remember to investigate.

molbap mentioned this issue Mar 20, 2025

🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing #36859

Merged

molbap closed this as completed in #36859 Mar 21, 2025

molbap reopened this Mar 24, 2025
