<a href="https://colab.research.google.com/github/vkjadon/llm/blob/main/pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Under the hood, `pipeline` does the following:

- Load model + tokenizer from the hub
- Convert input into tokens
- Run through the model
- Convert output to human-readable text
- Return structured output (JSON-like)
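
The steps above can be sketched without any library. This is a toy illustration only: the vocabulary, the scoring rule in `toy_model`, and every function name are hypothetical stand-ins for the real tokenizer and model that `pipeline` loads from the hub (step 1 is represented here by the two constants).

```python
import math

# Step 1 stand-in: a real pipeline downloads these from the hub.
VOCAB = {"[UNK]": 0, "i": 1, "love": 2, "hate": 3, "this": 4}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def tokenize(text):
    """Step 2: convert text into token ids (lowercase, whitespace split)."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def toy_model(token_ids):
    """Step 3: return one raw score (logit) per label.

    Made-up rule: 'love' pushes towards POSITIVE, 'hate' towards NEGATIVE.
    """
    positive = float(token_ids.count(VOCAB["love"]))
    negative = float(token_ids.count(VOCAB["hate"]))
    return [negative, positive]


def postprocess(logits):
    """Steps 4 and 5: softmax, then a JSON-like dict with label and score."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(toy_model(tokenize(text)))


result = toy_pipeline("I love this")
print(result)
```

The real `pipeline` object follows the same preprocess, forward, postprocess order, only with a trained tokenizer and model in place of the toy functions.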

pipeline is like a fully automatic washing machine.
You don't need to know drum rotation speed (model internals).
You choose a program (task), give input (clothes), and get the result (clean output).

In [15]:
import warnings
warnings.filterwarnings("ignore")

In [16]:
from transformers import pipeline

In [17]:
classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

In [18]:
print(classifier.model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [19]:
print(classifier.tokenizer)

DistilBertTokenizerFast(name_or_path='distilbert/distilbert-base-uncased-finetuned-sst-2-english', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True, added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
)


In [None]:
sentiment = pipeline("sentiment-analysis")
sentiment([
    "This class is fantastic!",
    "I hate slow WiFi."
])

In [None]:
print(sentiment.model)

`pipeline(
    task,
    model=None,
    tokenizer=None,
    framework=None,
    device=-1,
    **kwargs
)`

In [None]:
generator = pipeline("text-generation", model="gpt2")

In [None]:
generator("AI in education will", max_new_tokens=20)

In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

translator("Students love learning new technologies.")


In [None]:
summarizer = pipeline("summarization")

text = """LLMs are transforming education by enabling intelligent tutoring,
automated grading, content creation, and personalized learning."""
summarizer(text)


In [None]:
summarizer(text, max_length=10)

In [None]:
from PIL import Image
import requests

image = Image.open(requests.get("https://huggingface.co/datasets/mishig/sample_images/resolve/main/panda.jpg", stream=True).raw)
classifier = pipeline("image-classification")

classifier(image)


In [None]:
sentiment = pipeline(
    task="sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)
sentiment("Great teaching!")


In [None]:
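from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer and model explicitly, then hand both to pipeline().
# Note: bert-base-uncased is not fine-tuned for classification, so its
# classification head is randomly initialized and the labels it returns
# are not meaningful until the model is fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
nlp("Transformers are amazing.")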


In [None]:
sentiment.save_pretrained("my_pipeline")

from transformers import pipeline
loaded = pipeline("sentiment-analysis", model="my_pipeline")


The first step of the pipeline is tokenization: converting the text inputs into numbers that the model can make sense of. To do this, we download the tokenizer that matches the model checkpoint, using the AutoTokenizer class and its from_pretrained() method.

In [None]:
from transformers import AutoTokenizer

In [None]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Next, we can directly pass our sentences to it to get a dictionary that is ready to feed to our model.

In [None]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

In [None]:
print(inputs)

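The output is a dictionary containing two keys, input_ids and attention_mask. input_ids contains two rows of integers (one for each sentence) that are the unique identifiers of the tokens in each sentence. attention_mask has the same shape and marks which positions are real tokens (1) and which are padding (0).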
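
To make the two keys concrete, here is a minimal sketch of what `padding=True` does, written in plain Python. The token ids are made-up placeholders rather than the ids the DistilBERT tokenizer would produce; only 0, 101 and 102 match the `[PAD]`, `[CLS]` and `[SEP]` entries in the tokenizer printout above. `pad_batch` is a hypothetical helper, not a Transformers function.

```python
PAD_ID = 0  # id of [PAD] in the tokenizer printed earlier


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# 101 and 102 are [CLS] and [SEP]; 7, 8, 9 are placeholder word ids.
batch = pad_batch([[101, 7, 8, 9, 102], [101, 7, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```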

In [None]:
from transformers import AutoModel

In [None]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

In [None]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

**The shape is reported as a torch.Size object because we asked the tokenizer for PyTorch tensors (return_tensors="pt") and the model itself is a PyTorch module, even though we never imported torch directly.**

outputs.last_hidden_state is a PyTorch tensor of shape (batch size, sequence length, hidden size).

In [None]:
outputs[0]

In [None]:
outputs["last_hidden_state"].shape[0]

AutoModel = Base Transformer (No Task Head)
This gives you only the core Transformer: the encoder/decoder blocks.

It returns only hidden states, which are useful for embeddings, representation learning, similarity, clustering, etc.

Task-Specific Models = AutoModel + Task Head

Each class like AutoModelForCausalLM, AutoModelForSequenceClassification, etc. adds a head on top of the base model.

That means:

AutoModel = backbone
AutoModelForXxx = backbone + task layer
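
The backbone-plus-head split can be sketched in a few lines of plain Python. Everything here is hypothetical: `toy_backbone` returns made-up hidden vectors and `classification_head` uses hand-picked weights, whereas a real head is a trained layer inside the model.

```python
def toy_backbone(token_ids):
    """Stand-in for AutoModel: one hidden vector (size 3) per token."""
    return [[float(t), float(t) / 2, 1.0] for t in token_ids]


def classification_head(hidden_states, weights, bias):
    """Stand-in for the task layer: map the first token's vector to logits."""
    first = hidden_states[0]  # BERT-style classifiers read the first ([CLS]) position
    return [
        sum(w * h for w, h in zip(row, first)) + b
        for row, b in zip(weights, bias)
    ]


hidden = toy_backbone([2, 5, 7])  # 3 tokens x 3 features
logits = classification_head(
    hidden,
    weights=[[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]],  # 2 labels x 3 features
    bias=[0.0, -2.0],
)
print(len(hidden), len(hidden[0]))  # backbone output: one vector per token
print(logits)                       # head output: one score per label
```

The backbone output grows with the number of tokens, while the head collapses it to a fixed number of scores, which is the difference between AutoModel and AutoModelForXxx.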



AutoModel is the bare Transformer (for example, a BERT encoder or a GPT decoder).

Use when: you want token embeddings, you want sentence embeddings, you want to do cosine similarity or clustering, or you are building your own model head manually.
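
One common way to get a sentence embedding from the hidden states is mean pooling over the non-padding tokens, after which two sentences can be compared with cosine similarity. The sketch below uses plain Python lists and made-up two-dimensional vectors; `mean_pool` and `cosine_similarity` are hypothetical helpers, and other pooling choices are possible.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up hidden states; the third token of the first sentence is padding.
sent_a = mean_pool([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]], [1, 1, 0])
sent_b = mean_pool([[0.0, 2.0], [0.0, 4.0]], [1, 1])
print(sent_a, sent_b, cosine_similarity(sent_a, sent_b))
```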

AutoModelForCausalLM

➡️ For text generation tasks (GPT-style models)

What it adds: A causal language modeling head that predicts the next token using left-to-right attention.

Use when: Chatbots, Story generation, Code generation, Autocomplete
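
Left-to-right attention means that the token at position i may only look at positions 0 to i. A minimal sketch of such a mask in plain Python (`causal_mask` is a hypothetical helper; real models build the equivalent tensor internally):

```python
def causal_mask(seq_len):
    """Row i lists which positions token i may attend to (1) or not (0)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
for row in mask:
    print(row)
```

The lower-triangular pattern is what lets the model be trained to predict each next token without seeing it.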

AutoModelForMaskedLM

➡️ For fill-in-the-blanks tasks (BERT-style models)

What it adds:

A masked language modeling head that predicts masked tokens ([MASK]).

Use when:

Fill in missing words

Pretraining-like tasks

Masked token prediction

AutoModelForSequenceClassification

➡️ For sentence-level classification

What it adds:

A classification head on top of the [CLS] token.

Use when:

Sentiment analysis

Spam detection

Intent classification

Fake news detection

In [None]:
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

In [None]:
print(outputs.logits.shape)


Now if we look at the shape of our outputs, the dimensionality will be much lower: the model head takes as input the high-dimensional vectors we saw before, and outputs vectors containing two values (one per label). With two input sentences, the logits therefore have shape (2, 2).

The values we get as output from our model don't necessarily make sense by themselves. Let's take a look:



In [None]:
print(outputs.logits)

Our model predicted [-1.5607, 1.6123] for the first sentence and [ 4.1692, -3.3464] for the second one. Those are not probabilities but logits, the raw, unnormalized scores output by the last layer of the model. To be converted to probabilities, they need to go through a softmax layer (all 🤗 Transformers models output the logits, as the loss function for training will generally fuse the last activation function, such as softmax, with the actual loss function, such as cross entropy):

In [None]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

Now we can see that the model predicted [0.0402, 0.9598] for the first sentence and [0.9995, 0.0005] for the second one. These are recognizable probability scores.
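
The same numbers can be checked by hand. The sketch below applies the softmax formula in plain Python to the logits quoted above; the `softmax` helper is written here for illustration and is not the torch function.

```python
import math


def softmax(logits):
    """Exponentiate and normalise so the values sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the text above.
for row in ([-1.5607, 1.6123], [4.1692, -3.3464]):
    print([round(p, 4) for p in softmax(row)])
```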

To get the labels corresponding to each position, we can inspect the id2label attribute of the model config (more on this in the next section):

In [None]:
model.config.id2label
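
Combining the probabilities with the label mapping gives the same structure that `pipeline` returned at the start. This sketch hardcodes the mapping `{0: "NEGATIVE", 1: "POSITIVE"}` as an assumption about what the cell above prints for this checkpoint, and uses the rounded probabilities quoted in the text.

```python
# Assumed to match model.config.id2label for this checkpoint.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Rounded probabilities quoted in the text above.
predictions = [[0.0402, 0.9598], [0.9995, 0.0005]]

results = []
for probs in predictions:
    best = max(range(len(probs)), key=lambda i: probs[i])  # index of the top score
    results.append({"label": id2label[best], "score": probs[best]})

print(results)
```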

We have successfully reproduced the three steps of the pipeline: preprocessing with tokenizers, passing the inputs through the model, and postprocessing! Now let's take some time to dive deeper into each of those steps.

AutoModelForTokenClassification

➡️ For tagging each token

What it adds:

A head that produces a label per token.

Use when:

Named Entity Recognition (NER)

Parts-of-speech tagging (POS)

Chunking

AutoModelForQuestionAnswering

➡️ For extractive QA (SQuAD-style)

What it adds:

Two heads:

Start position of answer

End position of answer

Use when:

Reading comprehension

Extracting answers from passages
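
One common way to turn the scores from the two heads into an answer is to pick the start/end pair with the highest combined score, subject to start <= end. The tokens and logits below are invented for illustration, and `best_span` is a hypothetical helper, not a Transformers function.

```python
def best_span(start_logits, end_logits, max_answer_len=5):
    """Return (start, end) maximising start_logits[s] + end_logits[e], with s <= e."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


tokens = ["the", "capital", "of", "france", "is", "paris", "."]
start_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.5, 0.2, 3.5, 0.4]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```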

AutoModelForMultipleChoice

➡️ For MCQs (multiple-choice questions)

What it adds:

A head that compares several options.

Example:

For 4 options, the head produces one score per option, and the highest-scoring option is taken as the answer.

Use when:

AI answering exam-type MCQs

Sentence reasoning choices

AutoModelForSeq2SeqLM

➡️ For encoder-decoder models (translation, summarization)

Use when:

Translation

Summarization

Paraphrasing