In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Load tokenizer and model
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    "medalpaca/medalpaca-7b",
    use_fast=False,  # use the slow (SentencePiece-based) LLaMA tokenizer
)

print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    "medalpaca/medalpaca-7b", 
    device_map="auto",
    torch_dtype=torch.float16
)

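# Create pipeline
print("Creating pipeline...")
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,  # hard cap on generated tokens; long answers are cut off here
    do_sample=True,      # temperature only takes effect when sampling is enabled
    temperature=0.1      # low temperature keeps sampling close to greedy decoding
)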

# Test simple medical query
print("Testing model...")
prompt = "What are the symptoms of diabetes?"
result = generator(prompt)

print("\nResult:")
print(result[0]['generated_text'])
print("\nTest completed successfully!")

Loading tokenizer...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message


Loading model...
Device set to use cuda:0


Creating pipeline...
Testing model...

Result:
What are the symptoms of diabetes?
Diabetes is a chronic disease that affects the way your body processes food. It is caused by high levels of glucose (sugar) in the blood.
The symptoms of diabetes are due to high blood sugar levels.
The symptoms of diabetes are due to high blood sugar levels. Symptoms of diabetes include:
Increased thirst and urination.
Fatigue and weakness.
Blur

Test completed successfully!
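
The result above echoes the prompt on its first line and ends mid-word ("Blur") because generation stops at max_new_tokens=100. A minimal post-processing sketch for both issues follows. The helper names strip_prompt and trim_to_last_sentence are hypothetical, not part of transformers, and the sample string is abridged from the output above. It assumes the prompt is echoed verbatim at the start of generated_text.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the continuation when the text begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


def trim_to_last_sentence(text: str) -> str:
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    cut = max(text.rfind(ch) for ch in ".!?")
    # If no sentence terminator is found, leave the text unchanged.
    return text[: cut + 1] if cut != -1 else text


prompt = "What are the symptoms of diabetes?"
# Abridged copy of the generated text shown above (hypothetical stand-in
# for result[0]['generated_text']).
raw = (
    "What are the symptoms of diabetes?\n"
    "Increased thirst and urination.\n"
    "Fatigue and weakness.\n"
    "Blur"
)

answer = trim_to_last_sentence(strip_prompt(prompt, raw))
print(answer)
```

Passing return_full_text=False to the text-generation pipeline call should also suppress the echoed prompt. Raising max_new_tokens reduces truncation at the cost of longer generation time.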
