In [1]:
from datasets import load_dataset

dataset = load_dataset("./data/", split="train")

In [2]:
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="lora_model", # the fine-tuned (LoRA) model saved after training
    load_in_4bit=True,  # set to False to load for 16-bit LoRA instead
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.7: Fast Qwen2_5_Vl patching. Transformers: 4.51.3.
   \\   /|    NVIDIA GeForce RTX 4090. Num GPUs = 1. Max memory: 23.546 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


In [3]:
FastVisionModel.for_inference(model)

image = dataset[0]["image"]
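# Prompt (kept as in the original): "You are a professional radiologist. Please accurately describe what you see in the image."
instruction = "你是一名专业的放射科医生。请准确描述你在图片中看到的内容。"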

messages = [
    {
        "role": "user", "content":[
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]
    }
]

# add_generation_prompt=True appends the assistant turn header, so the model continues with its reply
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False, # apply_chat_template() above has already inserted the special tokens
    return_tensors = 'pt',
).to("cuda")

from transformers import TextStreamer
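text_streamer = TextStreamer(tokenizer, skip_prompt=True) # skip_prompt=True: stream only the generated text, not the input prompt
# min_p sets a minimum probability threshold: only tokens that reach it are considered.
# The threshold scales with the probability (confidence) of the most likely token.
# With min_p=0.1, a token is kept only if its probability is at least 1/10 of the top token's; with min_p=0.05, at least 1/20.
# temperature controls the creativity/diversity of the generated text.
# At each step the candidate tokens form a probability distribution; a higher temperature flattens it (more diverse, more "creative" output), a lower one sharpens it toward the most likely tokens.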
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)

The panoramic radiograph revealed diffuse periapical radiolucencies with mixed calcified contents involving multiple teeth (arrows). The periapical lesions extended to the root apices of 45 and 46 (top arrow), and of the left maxillary canine (left middle arrow), right mandibular first molar (middle arrow) and lateral incisor (right middle arrow). Additionally, it exhibited a periodontal radiolucency around the tooth 28.<|im_end|>
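The `temperature` and `min_p` comments in cell In [3] describe two sampling steps. Below is a minimal sketch of how they combine, in plain Python on made-up logits rather than the model's real output. Temperature divides the logits before the softmax, then min_p zeroes every token whose probability is below `min_p` times the top token's probability and renormalizes what is left. The helper names (`apply_temperature`, `min_p_filter`, `sample`) are hypothetical, and applying temperature before the min_p filter is an assumption about ordering. This is an illustration of the idea, not the `transformers` implementation that `model.generate` actually uses.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Divide logits by the temperature and return softmax probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample(logits, temperature=1.5, min_p=0.1, rng=None):
    """Draw one token index: temperature scaling first (assumed order), then min_p."""
    rng = rng or random.Random()
    probs = min_p_filter(apply_temperature(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -3.0]  # made-up scores for a 4-token vocabulary
    probs = apply_temperature(logits, 1.5)
    filtered = min_p_filter(probs, 0.1)
    print([round(p, 3) for p in probs])
    print([round(p, 3) for p in filtered])  # token 3 is below 0.1 * max, so it is zeroed
    print(sample(logits, rng=random.Random(0)))
```

With the values passed to `model.generate` (`temperature=1.5`, `min_p=0.1`), the high temperature flattens the distribution, while min_p, because its cutoff is relative to the top token, still prunes the long tail of very unlikely tokens. That complementary behavior is presumably why the two settings are used together.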
