In [2]:
import torch

def main():
    print("Hello from torch-test!")
    if torch.backends.mps.is_available():
        print("Excellent! MPS backend is available.")
    else:
        print("MPS backend is not available. Are you running this on a Mac with an Apple Silicon chip?")

if __name__ == "__main__":
    main()

Hello from torch-test!
Excellent! MPS backend is available.
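
The check above only reports whether MPS is present. A minimal sketch of turning that kind of check into a device choice with a CPU fallback follows. `pick_device` is a hypothetical helper, not part of PyTorch, and it takes plain booleans so the sketch runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a PyTorch device string, preferring CUDA, then MPS, then CPU.

    Hypothetical helper for illustration; the preference order is an assumption.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Plain booleans stand in for the real availability checks.
print(pick_device(cuda_available=False, mps_available=True))
print(pick_device(cuda_available=False, mps_available=False))
```

In the notebook the two arguments would presumably come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. The later cells do not need this, since `device_map="auto"` picks the device when the model is loaded.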


In [3]:
import transformers
print(transformers.__version__)
from transformers import AutoModelForCausalLM, AutoTokenizer


4.55.4


In [4]:
model_name = "Qwen/Qwen3-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
print(tokenizer)

Qwen2TokenizerFast(name_or_path='Qwen/Qwen3-0.6B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized

In [5]:
# load the model (the tokenizer was already loaded in the previous cell)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

In [7]:
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

In [8]:
print(prompt)
print(text)
print(model_inputs)


Give me a short introduction to large language model.
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant

{'input_ids': tensor([[151644,    872,    198,  35127,    752,    264,   2805,  16800,    311,
           3460,   4128,   1614,     13, 151645,    198, 151644,  77091,    198]],
       device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='mps:0')}
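
The printed `text` shows the layout the chat template produces: each turn is wrapped in `<|im_start|>` and `<|im_end|>`, and an open assistant turn is appended as the generation prompt. The sketch below reproduces only that printed layout for plain turns. `render_chat` is a hypothetical stand-in, not the tokenizer's actual template, which also handles cases this sketch ignores (system prompts, tools, thinking mode).

```python
def render_chat(messages, add_generation_prompt=True):
    """Hypothetical stand-in for the chat template, covering plain turns only."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


rendered = render_chat(
    [{"role": "user", "content": "Give me a short introduction to large language model."}]
)
print(rendered)
```

This also explains the `input_ids` above: according to the tokenizer printout, 151644 is `<|im_start|>` and 151645 is `<|im_end|>`, so the 18 ids begin with the start marker and end with the open assistant turn.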


In [9]:
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
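# keep only the newly generated tokens by dropping the prompt tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the output into thinking content and the final answer
try:
    # position just past the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token was generated: treat everything as the final answer
    index = 0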

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

thinking content: <think>
Okay, the user wants a short introduction to a large language model. Let me start by recalling what I know about LLMs. They're big language models, right? So I need to mention that they're trained on massive amounts of text. But how to make it short and engaging?

First, maybe start with their purpose. Like, "A large language model is a type of AI that can understand and generate human-like text." Then add something about training data. "They are trained on vast amounts of text to understand and create complex languages." 

Wait, should I mention their capabilities? Like, they can answer questions, write essays, or even code? Maybe include that. Also, emphasize that they are not humans but are built on large datasets. That's important. 

I should keep it concise. Let me check the example response. It's friendly and informative. Maybe add something about their efficiency and how they help in various tasks. Oh, and mention that they are trained on massive data. 
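
The splitting step in the cell above can be tried in isolation on a plain list of ids, without loading the model. The sketch below wraps the same reverse-search logic in a function. `split_thinking` and `END_THINK_ID` are hypothetical names, and the id 151668 is taken from the comment in the cell above.

```python
END_THINK_ID = 151668  # id of </think>, as stated in the cell above


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated ids into (thinking_ids, answer_ids) at the last </think>."""
    try:
        # Search the reversed list to find the last occurrence, then convert
        # back to a forward index that points just past that token.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        # No </think> token present: everything is treated as the answer.
        index = 0
    return output_ids[:index], output_ids[index:]


thinking, answer = split_thinking([11, 22, END_THINK_ID, 33, 44])
print(thinking)  # [11, 22, 151668]
print(answer)    # [33, 44]
print(split_thinking([33, 44]))  # ([], [33, 44])
```

One consequence of the fallback: if generation stops before `</think>` is emitted (for example, because `max_new_tokens` is reached), the unfinished reasoning ends up in the answer part rather than the thinking part.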

In [10]:
# conduct text completion with streaming output
from transformers import TextStreamer

# Create a streamer for real-time output
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Generate with streaming
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1000,
    streamer=streamer,  # Enable streaming
    do_sample=True,
    temperature=0.7
)

<think>
Okay, the user is asking for a short introduction to a large language model. Let me start by recalling what a large language model is. They're AI models that can understand and generate text. I should mention their key features like understanding language, generating text, and being capable of tasks like writing articles or answering questions.

Wait, I need to make sure it's concise. Maybe start with "Large language models (LLMs)" and explain their purpose. Also, include examples like GPT or BERT. Should I add something about their applications? Yes, like helping with writing, research, or customer service. Keep it simple and highlight their importance in modern AI.

Check for any technical terms that might confuse someone. Avoid jargon unless it's necessary. Make sure the tone is friendly and informative. Let me put that together now.
</think>

A large language model (LLM) is an AI model designed to understand and generate human language. These models can comprehend text, gen