In [14]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Render a completed user/assistant exchange with QwQ's chat template (as text, not token IDs).
tokenizer_qwq = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
prompt_qwq = tokenizer_qwq.apply_chat_template(
    [
        {"role": "user", "content": "What is 1 + 2?"},
        {"role": "assistant", "content": "The answer is 3."}
    ],
    tokenize=False
)

print('-- QwQ prompt: --')
print(prompt_qwq)

-- QwQ prompt: --
<|im_start|>user
What is 1 + 2?<|im_end|>
<|im_start|>assistant
The answer is 3.<|im_end|>



In [11]:
tokenizer_qwen3 = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
prompt_qwen3 = tokenizer_qwen3.apply_chat_template(
    [
        {"role": "user", "content": "What is 1 + 2?"},
        {"role": "assistant", "content": "The answer is 3."}
    ],
    tokenize=False
)

print('-- Qwen3 full prompt: --')
print(prompt_qwen3)

-- Qwen3 full prompt: --
<|im_start|>user
What is 1 + 2?<|im_end|>
<|im_start|>assistant
<think>

</think>

The answer is 3.<|im_end|>



In [12]:
prompt_qwen3_only_user = tokenizer_qwen3.apply_chat_template(
    [
        {"role": "user", "content": "What is 1 + 2?"},
    ],
    tokenize=False,
    add_generation_prompt=True
)

print('-- Qwen3 only user prompt: --')
print(prompt_qwen3_only_user)


-- Qwen3 only user prompt: --
<|im_start|>user
What is 1 + 2?<|im_end|>
<|im_start|>assistant



In [None]:
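# Generate with the small Qwen3-0.6B checkpoint. Qwen3 checkpoints share one tokenizer,
# so tokenizer_qwen3 (loaded from Qwen3-32B) can encode the prompt for this model.
model_qwen3 = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", device_map="cuda")
input_ids = tokenizer_qwen3.encode(prompt_qwen3_only_user, return_tensors="pt").to("cuda")
output = model_qwen3.generate(input_ids, max_new_tokens=512)
print('-- Qwen3 output: --')
# Keep special tokens such as <|im_start|> and <|im_end|> in the decoded text.
print(tokenizer_qwen3.decode(output[0], skip_special_tokens=False))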

-- Qwen3 output: --
<|im_start|>user
What is 1 + 2?<|im_end|>
<|im_start|>assistant
<think>
Okay, so the question is asking, what is 1 plus 2? Let me think. Well, I know that in mathematics, addition is the operation that combines two numbers. So, 1 plus 2 would be combining the two numbers. Let me visualize this. If I have a number line, starting at 1, and then adding 2, that would take me to 3. So, 1 plus 2 equals 3. 

Wait, but maybe I should check if there's any trick here. Sometimes problems can be tricky, like if they're in a different context or if there's a special rule. But in basic arithmetic, addition is straightforward. So, 1 plus 2 is definitely 3. 

I don't think there's anything else to it. Maybe if there's a different interpretation, like if they're asking about something else, but the question is very simple. 1 plus 2 is a basic addition problem. So, the answer should be 3. 

I should also consider if there's any possible misunderstanding, like if they're asking for so

So the conclusions are:
1) When rendering a completed conversation, QwQ's chat template does NOT wrap the assistant turn in a <think></think> structure for reasoning, but Qwen3's template does.
2) When the conversation includes an assistant response, Qwen3's apply_chat_template adds the <think></think> tags automatically, but leaves them empty.
3) When given only the user's message (with add_generation_prompt=True), Qwen3's apply_chat_template does NOT add a <think> tag. Instead, the model emits the opening <think> tag itself; after the reasoning chain it closes the tag and continues with its direct response to the user.
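Conclusion 3 means a decoded generation mixes the reasoning chain and the final answer in a single string. Below is a minimal sketch for separating them, assuming the <think>...</think> layout seen above and using plain string matching on the tags rather than token IDs; `split_thinking` is a hypothetical helper, not part of transformers. It expects only the newly generated text, so the prompt tokens should be sliced off before decoding (e.g. `output[0][prompt_len:]`, where `prompt_len` is a hypothetical name for the number of prompt tokens); otherwise a tag-free completion would return the prompt as part of the answer.

```python
OPEN_TAG, CLOSE_TAG, END_OF_TURN = "<think>", "</think>", "<|im_end|>"


def split_thinking(completion: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Assumes reasoning is wrapped as <think>...</think> and the answer
    follows the last closing tag. Hypothetical helper for illustration.
    """
    # Drop the end-of-turn marker that appears when special tokens are kept.
    text = completion.replace(END_OF_TURN, "")
    end = text.rfind(CLOSE_TAG)
    if end == -1:
        start = text.find(OPEN_TAG)
        if start == -1:
            # No reasoning block at all: the whole text is the answer.
            return "", text.strip()
        # Reasoning was opened but never closed (e.g. cut off by max_new_tokens).
        return text[start + len(OPEN_TAG):].strip(), ""
    head, answer = text[:end], text[end + len(CLOSE_TAG):]
    start = head.find(OPEN_TAG)
    # If the opening tag is missing, treat everything before </think> as reasoning.
    reasoning = head[start + len(OPEN_TAG):] if start != -1 else head
    return reasoning.strip(), answer.strip()


reasoning, answer = split_thinking(
    "<think>\n1 plus 2 is 3.\n</think>\n\nThe answer is 3.<|im_end|>"
)
print(repr(reasoning))  # '1 plus 2 is 3.'
print(repr(answer))     # 'The answer is 3.'
```

The unclosed-tag branch matters in practice: a generation stopped by `max_new_tokens` in the middle of the reasoning has no `</think>`, so the sketch returns the partial reasoning and an empty answer instead of guessing.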


In [24]:
string = "Wait, but maybe I miscalculated. Let me verify each step again."
# print(AutoTokenizer.from_pretrained('Qwen/Qwen3-32B').tokenize(string))
# List each subword token with its 1-based position in the sequence.
print(list(enumerate(AutoTokenizer.from_pretrained('Qwen/Qwen3-0.6B').tokenize(string), start=1)))

[(1, 'Wait'), (2, ','), (3, 'Ġbut'), (4, 'Ġmaybe'), (5, 'ĠI'), (6, 'Ġm'), (7, 'iscal'), (8, 'culated'), (9, '.'), (10, 'ĠLet'), (11, 'Ġme'), (12, 'Ġverify'), (13, 'Ġeach'), (14, 'Ġstep'), (15, 'Ġagain'), (16, '.')]
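In the output above, tokens that start with a space carry a leading `Ġ`, and "miscalculated" is split into three pieces. The `Ġ` comes from byte-level BPE: as far as these markers indicate, Qwen's tokenizer uses the same byte-to-character table as GPT-2, where every byte is mapped to a printable Unicode character before merging and the space byte 0x20 becomes `Ġ` (U+0120). A minimal from-scratch sketch of that table and its inverse, rebuilding the sentence from the printed token strings; `bytes_to_unicode` follows the GPT-2 scheme, while `decode_tokens` is our own illustrative helper rather than a tokenizer method.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte 0-255 to a printable Unicode character (GPT-2 byte-level BPE scheme)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes (including space 0x20) are shifted to 256 + offset.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def decode_tokens(tokens: list[str]) -> str:
    """Join byte-level BPE token strings and turn them back into UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in "".join(tokens))
    return raw.decode("utf-8")


# Token strings as printed by the Qwen3-0.6B tokenizer above.
tokens = ["Wait", ",", "Ġbut", "Ġmaybe", "ĠI", "Ġm", "iscal", "culated", ".",
          "ĠLet", "Ġme", "Ġverify", "Ġeach", "Ġstep", "Ġagain", "."]

print(repr(BYTE_ENCODER[ord(" ")]))  # 'Ġ'
print(decode_tokens(tokens))  # Wait, but maybe I miscalculated. Let me verify each step again.
```

Decoding through the full byte table, rather than just replacing `Ġ` with a space, also handles other remapped bytes such as newlines (`Ċ`) and multi-byte UTF-8 characters.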
