Using distributed or parallel set-up in script?: no
Using GPU in script?: yes
GPU type: NVIDIA L40S
Who can help?
Hi @ArthurZucker @itazap (tagging you per instructions), when I use the Llama3 tokenizers to encode the string ' ...' and then decode the resulting token, I get the string '...' back instead of the string ' ...' (the leading space is missing).
I believe that decode should be the inverse of encode in this case, and it's unclear to me why it isn't.
Sorry if I'm misunderstanding something! Thanks for your time :)
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Naqu6 changed the title from "Llama3 tokenizer decode is incorrect for '...' with leading space" to "Llama3 tokenizer decode is incorrect for ' ...' with leading space" on Mar 9, 2025
System Info
transformers version: 4.49.0
Reproduction
This outputs '...' (no leading space).
Expected behavior
I believe that decode should be the inverse of encode. E.g., the same encode-then-decode round trip on " Hello world" outputs " Hello world", as expected.
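A minimal sketch of the round-trip check described above, in case it helps with reproducing. The helper names (`round_trip`, `is_lossless`, `check_llama3`, `toy_encode`, `toy_decode`) are hypothetical and only for illustration. The checkpoint id `meta-llama/Meta-Llama-3-8B` is an assumption, since any Llama 3 tokenizer should do. The sketch also prints the result with `clean_up_tokenization_spaces=False` passed to `decode` for comparison. Whether that option is related to the missing space is a guess and has not been verified here.

```python
from typing import Callable, List


def round_trip(encode: Callable[[str], List[int]],
               decode: Callable[[List[int]], str],
               text: str) -> str:
    """Encode text to token ids, then decode the ids back to a string."""
    return decode(encode(text))


def is_lossless(encode: Callable[[str], List[int]],
                decode: Callable[[List[int]], str],
                text: str) -> bool:
    """Return True when decode(encode(text)) reproduces text exactly."""
    return round_trip(encode, decode, text) == text


def check_llama3(model_id: str = "meta-llama/Meta-Llama-3-8B") -> None:
    """Print round-trip results for the two strings from this report.

    ASSUMPTION: model_id is a guess at a suitable checkpoint. Running this
    needs the transformers package and access to that checkpoint.
    """
    from transformers import AutoTokenizer  # imported lazily on purpose

    tok = AutoTokenizer.from_pretrained(model_id)

    def encode(s: str) -> List[int]:
        # Skip BOS/EOS so only the text itself is round-tripped.
        return tok.encode(s, add_special_tokens=False)

    def decode_no_cleanup(ids: List[int]) -> str:
        # ASSUMPTION: comparing against cleanup-off output is a debugging
        # aid only; it is not confirmed to be the cause.
        return tok.decode(ids, clean_up_tokenization_spaces=False)

    for text in [" ...", " Hello world"]:
        default = round_trip(encode, tok.decode, text)
        no_cleanup = round_trip(encode, decode_no_cleanup, text)
        print(repr(text), "->", repr(default), "| cleanup off:", repr(no_cleanup))


# Stand-in codec (one id per character) so the helpers can be exercised
# without downloading a model. It is NOT the Llama 3 tokenizer.
def toy_encode(s: str) -> List[int]:
    return [ord(c) for c in s]


def toy_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)
```

Calling `check_llama3()` runs the check against the real tokenizer. The toy codec is only there to show the property I expect to hold, namely that `is_lossless(...)` is true for both strings.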