Hello, I would like to ask: are the tokenizers used for the Qwen and Llama models those of Qwen2.5-7B and Llama-3.1-8B, respectively?
Why not directly use the vocabularies of DeepSeek-R1-Distill-Qwen and DeepSeek-R1-Distill-Llama? Is there a design reason behind this choice?
Thank you.
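
For reference, here is a minimal sketch of how one could check whether two tokenizers actually share a vocabulary. It assumes the Hugging Face `transformers` library and the public model IDs `Qwen/Qwen2.5-7B` and `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` (adjust these to whichever checkpoints apply); it is an illustration, not a statement about what the project does.

```python
# Sketch: compare the vocabularies of a base model and its distilled variant.
# Model IDs are assumptions; swap in the checkpoints you actually care about.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
distill = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

v_base, v_distill = base.get_vocab(), distill.get_vocab()
print("vocab sizes:", len(v_base), len(v_distill))
print("identical token->id maps:", v_base == v_distill)

# Tokens present in one vocabulary but not the other (first few, for inspection):
only_in_distill = sorted(set(v_distill) - set(v_base))
print("tokens only in the distill vocab:", only_in_distill[:20])
```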