In [12]:
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Choose a model name, e.g. bert-base-uncased, distilbert-base-uncased, roberta-base
model_name = "bert-base-uncased"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Use a pipeline to simplify inference (here: the fill-mask task)
nlp = pipeline("fill-mask", model=model, tokenizer=tokenizer)

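# Test sentence: the model predicts the token at the mask position.
# tokenizer.mask_token is "[MASK]" for BERT but "<mask>" for RoBERTa, so don't hardcode it.
result = nlp(f"The capital of France is {tokenizer.mask_token}.")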

# Print the top predictions and their scores
for r in result:
    print(f"> {r['sequence']} (score: {r['score']:.4f})")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


> the capital of france is paris. (score: 0.4168)
> the capital of france is lille. (score: 0.0714)
> the capital of france is lyon. (score: 0.0634)
> the capital of france is marseille. (score: 0.0444)
> the capital of france is tours. (score: 0.0303)
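Each `score` above is a probability: the fill-mask pipeline takes the model's logits at the mask position and applies a softmax over the whole vocabulary, then keeps the top k. That is why the five scores sum to well under 1. The sketch below reproduces that ranking step in plain Python; the tiny vocabulary and logits are hypothetical toy values, not real BERT outputs.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_fill(vocab, mask_logits, k=5):
    """Rank candidate tokens for the mask slot by softmax probability."""
    probs = softmax(mask_logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical toy vocabulary and logits for the mask position (not real model values)
vocab = ["paris", "lille", "lyon", "marseille", "tours", "banana"]
mask_logits = [4.0, 2.2, 2.1, 1.8, 1.4, -3.0]

for token, score in top_k_fill(vocab, mask_logits, k=3):
    print(f"> the capital of france is {token}. (score: {score:.4f})")
```

With a real model the vocabulary has roughly 30k entries, so probability mass is spread much more thinly than in this toy example.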


In [13]:
%pip install transformers accelerate torch sentencepiece

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.
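The `huggingface/tokenizers` warning above asks for `TOKENIZERS_PARALLELISM` to be set explicitly. A minimal sketch, assuming it is placed in the very first cell of the notebook, before any tokenizer is created:

```python
import os

# Set this before `tokenizers` is used in the process; setting it after a
# tokenizer has already run in parallel will not prevent the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```

Choosing `"false"` silences the warning and avoids fork deadlocks at the cost of single-threaded tokenization, which is usually negligible for interactive use.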


In [None]:
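import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Gated models need a login first (never paste a real token into a notebook):
# huggingface-cli login --token <YOUR_HF_TOKEN>

# meta-llama/Meta-Llama-3-8B-Instruct
# Example: Meta's official model (requires logging in to Hugging Face and accepting the license)
# model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
model_id = "gpt2"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    offload_folder="./offload",  # offload parameters that don't fit in GPU/CPU memory to this disk directory
    offload_state_dict=True,     # temporarily offload the state dict to disk while loading to reduce peak CPU RAM
    device_map="auto",           # let accelerate place layers across GPU, CPU and disk
)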

# Build a text-generation pipeline
llama_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "Explain the concept of black holes in simple terms:"
outputs = llama_pipeline(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)

print('output=', outputs[0]["generated_text"])


Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'attn_factor'}


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the cpu.
Device set to use cuda:0


output= Explain the concept of black holes in simple terms: a) What are they? b) How do they form? c) What are the key features? d) What is the event horizon? e) What is Hawking radiation? f) How do we know they exist? g) What is the singularity? h) What is the accretion disk? i) What is the Schwarzschild radius? j) What are the different types of black holes?
I need to explain black holes in simple terms, addressing multiple specific points.


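With `do_sample=True`, the `temperature` argument divides the next-token logits before the softmax: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal sketch of that scaling in plain Python, using hypothetical logits and ignoring any top-k/top-p filtering that `generate` may also apply:

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use greedy decoding instead)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits for a 3-token vocabulary
for t in (0.5, 0.7, 1.0, 1.5):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])

# Draw one token id at temperature 0.7 (seeded for reproducibility)
rng = random.Random(0)
token_id = rng.choices(range(len(logits)), weights=temperature_probs(logits, 0.7), k=1)[0]
print("sampled token id:", token_id)
```

This is why the RAG cell below, at `temperature=0.5`, tends to produce more conservative continuations than the same prompt sampled at a higher temperature.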

In [None]:
prompt = "Explain the concept of RAG"
outputs = llama_pipeline(prompt, max_new_tokens=50, do_sample=True, temperature=0.5)
print('output=', outputs)


output= [{'generated_text': 'Explain the concept of RAG (Retrieval-Augmented Generation) and its role in improving long-form content generation, and then provide an example of how you would implement RAG for generating a blog post about the future of AI in healthcare. First, explain RAG concis'}]


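The model expands RAG as Retrieval-Augmented Generation: retrieve passages relevant to the question, then put them into the prompt before generating. A toy sketch of the "retrieve and augment" half, assuming a hypothetical in-memory document list and simple word-overlap scoring in place of a real embedding-based retriever:

```python
import re

def tokenize(text):
    """Lowercase word set; a stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query (toy retriever)."""
    q = tokenize(query)
    # sorted() is stable, so ties keep their original document order
    return sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_rag_prompt(query, documents, k=2):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini knowledge base
docs = [
    "RAG stands for Retrieval-Augmented Generation.",
    "A retriever fetches passages relevant to the question.",
    "Black holes form when massive stars collapse.",
]
prompt = build_rag_prompt("What does RAG retrieval do?", docs)
print(prompt)
```

The resulting `prompt` string could then be passed to `llama_pipeline(prompt, max_new_tokens=...)` like the prompts above; the generation half is unchanged.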

In [2]:
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))


True
1
NVIDIA GeForce RTX 4060 Laptop GPU
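The check above confirms one CUDA device is visible. A small helper that falls back to the CPU when no GPU (or no `torch`) is available can keep the same notebook runnable on other machines; this is a sketch, and `pick_device` is a hypothetical name, not a library function:

```python
def pick_device() -> str:
    """Return "cuda:0" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # torch not installed: CPU only
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"

device = pick_device()
print("using", device)
# e.g. pipe = pipeline("text-generation", model="gpt2", device=device)
```

Pass either `device=` to `pipeline` or `device_map=` to `from_pretrained`, not both, since they control placement in different ways.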


In [13]:

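import os
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "gpt2"
# Read the access token from the environment instead of hardcoding it
# (gpt2 is public, so None also works here)
token = os.environ.get("HF_TOKEN")

# `token=` replaces the deprecated `use_auth_token=` argument
tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, token=token)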

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = pipe("Hello world!", max_new_tokens=50)
print(result[0]['generated_text'])
print(result)

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Hello world!

We are always looking for new clients. We want all our clients to be able to use our service in production and provide a great experience. We are looking for new clients in the following categories:

1.1 - We are looking
[{'generated_text': 'Hello world!\n\nWe are always looking for new clients. We want all our clients to be able to use our service in production and provide a great experience. We are looking for new clients in the following categories:\n\n1.1 - We are looking'}]


In [None]:

result = pipe("Explain the concept of RAG", max_new_tokens=200)
print(result[0]['generated_text'])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Explain the concept of RAG, the concept of non-linear equations.

The key to understanding RAG is to understand that RAG works in two different ways. First, there is the equation (e.g., RAG=a) that is used to explain the RAG. This equation provides the basic definitions of the RAG. Second, we can use the equation (e.g., RAG =a). This equation is used to explain the RAG in two different ways. First, we can use the equation (e.g., RAG =b) that is used to explain the RAG in two different ways. Second, we can use the equation (e.g., RAG =c) that is used to explain the RAG in two different ways.

In this lecture, we will show how to use the RAG as an intuitive way to demonstrate the concept of RAG. We will also show how to use the RAG to explain a problem or a solution
