[Phi-3-mini-128k-instruct] Difference of encodings for Slow and Fast Tokenizer #35973

### System Info

- `transformers` version: 4.46.2
- Platform: Linux-5.10.223-212.873.amzn2.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: 1.1.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.4.0a0+3bcc3cddb5.nv24.07 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: No
- GPU type: NVIDIA L40S


### Who can help?

@ArthurZucker 

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```python
from transformers import AutoTokenizer

text = '<|user|>\n I am good <|end|>\n<|endoftext|>'

# Fast (Rust-backed) tokenizer
tokenizer = AutoTokenizer.from_pretrained('microsoft/Phi-3-mini-128k-instruct', use_fast=True)
ids = tokenizer(text).input_ids
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
print(tokenizer.encode(text))

# Slow (Python) tokenizer
tokenizer = AutoTokenizer.from_pretrained('microsoft/Phi-3-mini-128k-instruct', use_fast=False)
ids = tokenizer(text).input_ids
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
print(tokenizer.encode(text))
```

### Expected behavior

The slow and fast tokenizers should return the same input IDs for the same text.
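
To make the mismatch easier to pin down, here is a minimal sketch that reports the first position where the two encodings diverge. The helper names `first_mismatch` and `compare_tokenizers` are hypothetical and introduced only for illustration; the `transformers` calls are the same ones used in the reproduction above.

```python
from itertools import zip_longest


def first_mismatch(fast_ids, slow_ids):
    """Return the first index where the two ID lists differ, or None if they are equal."""
    # zip_longest pads the shorter list with None, so a length difference
    # is reported as a mismatch at the end of the shorter list.
    for i, (a, b) in enumerate(zip_longest(fast_ids, slow_ids)):
        if a != b:
            return i
    return None


def compare_tokenizers(model_name, text, window=5):
    """Encode `text` with the fast and slow tokenizers and print where they diverge."""
    # Imported lazily so first_mismatch can be used without transformers installed.
    from transformers import AutoTokenizer

    fast = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    slow = AutoTokenizer.from_pretrained(model_name, use_fast=False)

    fast_ids = fast(text).input_ids
    slow_ids = slow(text).input_ids

    idx = first_mismatch(fast_ids, slow_ids)
    if idx is None:
        print("encodings match")
    else:
        # Show a few tokens from the point of divergence on each side.
        print(f"first mismatch at index {idx}")
        print("fast:", fast.convert_ids_to_tokens(fast_ids)[idx:idx + window])
        print("slow:", slow.convert_ids_to_tokens(slow_ids)[idx:idx + window])
    return idx
```

Calling `compare_tokenizers('microsoft/Phi-3-mini-128k-instruct', '<|user|>\n I am good <|end|>\n<|endoftext|>')` should then show which tokens around the special tokens are handled differently.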
