piece id is out of range #314

Open
chethanwiz opened this issue Apr 9, 2024 · 3 comments

Comments

@chethanwiz

Can someone help me with this error, please?

Traceback (most recent call last):
  File "C:\Users\cheth\Music\new chaya\OneReality\OneRealityMemory.py", line 68, in <module>
    ExLlamatokenizer = ExLlamaV2Tokenizer(config)
  File "C:\Python310\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 192, in __init__
    self.eos_token = (self.tokenizer_model.eos_token() or self.extended_id_to_piece.get(self.eos_token_id, None)) or self.tokenizer_model.id_to_piece(self.eos_token_id)
  File "C:\Python310\lib\site-packages\exllamav2\tokenizer\spm.py", line 43, in id_to_piece
    return self.spm.id_to_piece(idx)
  File "C:\Python310\lib\site-packages\sentencepiece\__init__.py", line 1179, in _batched_func
    return func(self, arg)
  File "C:\Python310\lib\site-packages\sentencepiece\__init__.py", line 1172, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

@turboderp
Owner

This is usually caused by conflicting vocabularies in merged models. It would help to know which model this is.
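
One rough way to check for this kind of mismatch is to compare the special token ids in the model's config.json against the size of the base SentencePiece vocabulary. The sketch below is only an illustration: the helper name `find_undefined_special_ids` is hypothetical, it assumes the model directory contains a config.json (and optionally an added_tokens.json), and it expects the base vocabulary size to be supplied by the caller, for example from `SentencePieceProcessor.get_piece_size()` on the model's tokenizer file.

```python
import json
from pathlib import Path


def find_undefined_special_ids(model_dir, base_vocab_size):
    """Return special token ids from config.json that no vocabulary defines.

    Hypothetical helper, not part of exllamav2. `base_vocab_size` is the
    number of pieces in the SentencePiece model; ids at or above it must be
    declared in added_tokens.json, otherwise id_to_piece() has nothing to
    return for them.
    """
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))

    # Collect ids declared as extra tokens, if the file exists at all.
    added_ids = set()
    added_path = model_dir / "added_tokens.json"
    if added_path.exists():
        added = json.loads(added_path.read_text(encoding="utf-8"))
        added_ids = set(added.values())

    undefined = {}
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        token_id = config.get(key)
        # Skip missing entries and non-integer values (some configs use lists).
        if not isinstance(token_id, int):
            continue
        if token_id >= base_vocab_size and token_id not in added_ids:
            undefined[key] = token_id
    return undefined
```

A non-empty result would point at ids like the one that triggers the IndexError in the traceback above.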

@chethanwiz
Author

dolphin-2.1-mistral-7B-GPTQ

@turboderp
Owner

The model seems to be using the same tokenizer as Mistral, which doesn't define the two ChatML tokens that Dolphin needs. You can try adding an added_tokens.json file to the model directory with this content:

{
  "<|im_end|>": 32000,
  "<|im_start|>": 32001
}
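
If editing the file by hand is inconvenient, the same content can be written with a few lines of Python. This is a minimal sketch, not something exllamav2 provides: `write_added_tokens` is a hypothetical helper, the model directory path is whatever you pass in, and the token ids are the two given above. It merges with an existing added_tokens.json rather than overwriting it, which is an assumption about what is usually wanted.

```python
import json
from pathlib import Path

# The two ChatML tokens and ids suggested in the comment above.
CHATML_TOKENS = {"<|im_end|>": 32000, "<|im_start|>": 32001}


def write_added_tokens(model_dir, tokens=CHATML_TOKENS):
    """Write (or extend) added_tokens.json in the model directory.

    Hypothetical helper for illustration only.
    """
    path = Path(model_dir) / "added_tokens.json"

    merged = {}
    if path.exists():
        # Keep any tokens that are already declared.
        merged = json.loads(path.read_text(encoding="utf-8"))
    merged.update(tokens)

    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return path
```

After running it against the model directory, reload the tokenizer so the new file is picked up.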
