Excerpted from _[Pipelines for inference](https://huggingface.co/docs/transformers/pipeline_tutorial)_ in the [🤗 Transformers documentation](https://huggingface.co/docs/transformers).

## Text pipeline

In [5]:
from transformers import pipeline

# `facebook/bart-large-mnli` is a zero-shot classification model: it scores the
# text against whatever candidate labels you supply, with no label-specific training.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)


{'sequence': 'I have a problem with my iphone that needs to be resolved asap!!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.5036353468894958,
  0.478799968957901,
  0.012600121088325977,
  0.0026557904202491045,
  0.002308748895302415]}
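
The result is a plain dictionary whose `labels` and `scores` lists are aligned and sorted by descending score. In the default single-label mode the scores behave like a probability distribution over the candidate labels, which is why `urgent` and `phone` split most of the mass here. As a minimal sketch of post-processing, the snippet below keeps every label above a cutoff. The `top_labels` helper and the `0.1` threshold are illustrative assumptions, not part of the library.

```python
# Result copied from the pipeline output shown above.
result = {
    "sequence": "I have a problem with my iphone that needs to be resolved asap!!",
    "labels": ["urgent", "phone", "computer", "not urgent", "tablet"],
    "scores": [
        0.5036353468894958,
        0.478799968957901,
        0.012600121088325977,
        0.0026557904202491045,
        0.002308748895302415,
    ],
}


def top_labels(result, threshold=0.1):
    """Hypothetical helper: return (label, score) pairs scoring at least `threshold`."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# The scores add up to roughly 1 across the candidate labels.
total = sum(result["scores"])

print([label for label, _ in top_labels(result)])  # ['urgent', 'phone']
```

A threshold suits cases where several labels can apply at once, as with urgency and device type here; if only the single best label matters, `result["labels"][0]` is enough.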

## Generation with LLMs

In [6]:
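from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit loading needs the `bitsandbytes` package and a CUDA GPU. Recent releases expect
# the option inside a `BitsAndBytesConfig` rather than as a bare `load_in_4bit=True`.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)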


Preprocess the text input with the model's tokenizer, then move the resulting tensors to the GPU:

In [7]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")

In [8]:
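# No length limit is passed, so generation stops at the library's short default
# budget, which is why the output below ends mid-list.
generated_ids = model.generate(**model_inputs)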
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'A list of colors: red, blue, green, yellow, orange, purple, pink,'

In [9]:
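# Batched prompts of different lengths need padding. Most LLMs don't have a pad
# token by default, so reuse the EOS token. Left padding (set when the tokenizer
# was created) keeps each prompt directly next to its continuation.
tokenizer.pad_token = tokenizer.eos_token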
model_inputs = tokenizer(
    ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


['A list of colors: red, blue, green, yellow, orange, purple, pink,',
 'Portugal is a country in southwestern Europe, on the Iber']
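
The batched call relies on `padding_side="left"`, and the reason is easier to see with plain lists. A causal model continues from the last position of each row, so padding must go in front of the shorter prompt rather than after it. The snippet below is a simplified sketch of that idea, not the tokenizer's actual implementation. The `pad_batch` helper and the token ids are made up for illustration, and `pad_id=2` simply mirrors the `eos_token_id` reported in the warning above.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Hypothetical helper: pad token-id lists to equal length.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "left":
            input_ids.append([pad_id] * n_pad + list(seq))
            attention_mask.append([0] * n_pad + [1] * len(seq))
        else:
            input_ids.append(list(seq) + [pad_id] * n_pad)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token ids standing in for a long and a short prompt.
batch = [[5, 6, 7, 8], [9, 10]]

left_ids, left_mask = pad_batch(batch, pad_id=2, side="left")
right_ids, right_mask = pad_batch(batch, pad_id=2, side="right")

print(left_ids)   # [[5, 6, 7, 8], [2, 2, 9, 10]]
print(left_mask)  # [[1, 1, 1, 1], [0, 0, 1, 1]]
print(right_ids)  # [[5, 6, 7, 8], [9, 10, 2, 2]]
```

With left padding, the final position of every row is a real prompt token, so newly generated tokens follow the prompt directly. With right padding, the short prompt would end in pad tokens and the model would be asked to continue from those instead.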