<a href="https://colab.research.google.com/github/vkjadon/llm/blob/main/pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Under the hood, pipeline() does the following:

- Load model + tokenizer from the hub
- Convert input into tokens
- Run through the model
- Convert output to human-readable text
- Return structured output (JSON-like)

In [None]:
from transformers import pipeline

In [None]:
classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

In [None]:
print(classifier.model)

In [None]:
sentiment = pipeline("sentiment-analysis")
sentiment([
    "This class is fantastic!",
    "I hate slow WiFi."
])

In [None]:
print(sentiment.model)

`pipeline(
    task,
    model=None,
    tokenizer=None,
    framework=None,
    device=-1,
    **kwargs
)`

In [None]:
generator = pipeline("text-generation", model="gpt2")

In [7]:
generator("AI in education will", max_new_tokens=20)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'AI in education will be able to offer a much more robust, integrated, and efficient approach to the development of teaching,'}]

In [8]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

translator("Students love learning new technologies.")


Device set to use cpu


[{'translation_text': 'विद्यार्थियों को नयी तकनीक सीखने में बेहद खुशी होती है ।'}]

In [9]:
summarizer = pipeline("summarization")

text = """LLMs are transforming education by enabling intelligent tutoring,
automated grading, content creation, and personalized learning."""
summarizer(text)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
Your max_length is set to 142, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)


[{'summary_text': ' LLMs are transforming education by enabling intelligent tutoring, automated grading, content creation, and personalized learning . LLMs enable automated grading and content creation for students to learn with personalized learning. LLMs can also be used to help students understand and understand content .'}]

In [10]:
summarizer(text, max_length=10)

Your min_length=56 must be inferior than your max_length=10.


[{'summary_text': ' LLMs are transforming education by enabling'}]

The first step of the pipeline is tokenization: converting the text inputs into numbers that the model can make sense of. Each checkpoint comes with its own tokenizer, so we download it from the Hub using the AutoTokenizer class and its from_pretrained() method.

In [None]:
from transformers import AutoTokenizer

In [None]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Next, we can directly pass our sentences to it to get a dictionary that is ready to feed to our model.

In [None]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

In [None]:
print(inputs)

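The output is a dictionary containing two keys, input_ids and attention_mask. input_ids contains two rows of integers (one for each sentence) that are the unique identifiers of the tokens in each sentence. attention_mask has the same shape and holds 1 for real tokens and 0 for the padding added to the shorter sentence.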
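
To make the relationship between input_ids and attention_mask concrete, here is a minimal sketch of right-padding written in plain Python. The token ids and the padding id (0) are made up for illustration, and the helper `pad_batch` is hypothetical; the real tokenizer does this for you when padding=True is set.

```python
# Toy illustration only: these ids are invented, not real vocabulary ids.
PAD_ID = 0  # assumed padding id for this sketch


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 7, 8, 9, 102], [101, 5, 102]])
print(batch["input_ids"])       # [[101, 7, 8, 9, 102], [101, 5, 102, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Both rows end up the same length, which is what allows them to be stacked into a single tensor.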

In [None]:
from transformers import AutoModel

In [None]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

In [None]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

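**The shape is printed as a torch.Size object because we asked the tokenizer for PyTorch tensors (return_tensors="pt") and the model itself runs on PyTorch, even though we never imported torch ourselves.**

outputs.last_hidden_state is a PyTorch tensor with three dimensions: batch size, sequence length, and hidden size.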

In [None]:
outputs[0]

In [None]:
outputs["last_hidden_state"].shape[0]

AutoModel = Base Transformer (No Task Head)

This gives you only the core Transformer: the encoder/decoder blocks.

It returns only hidden states, which are useful for embeddings, representation learning, similarity, clustering, etc.

Task-Specific Models = AutoModel + Task Head

Each class like AutoModelForCausalLM, AutoModelForSequenceClassification, etc. adds a head on top of the base model.

That means:

AutoModel = backbone
AutoModelForXxx = backbone + task layer



AutoModel

➡️ Bare Transformer (BERT encoder or GPT decoder)

Use when:

You want token embeddings

You want sentence embeddings

You want to do cosine similarity or clustering

You are building your own model head manually
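
As a rough sketch of how hidden states become sentence embeddings, the example below averages token vectors while skipping padded positions, then compares two sentences with cosine similarity. Mean pooling is only one common choice, assumed here for illustration. The hidden states are made-up numbers with hidden size 2, and `mean_pool` and `cosine_similarity` are hypothetical helpers, not library functions.

```python
import math


def mean_pool(hidden_states, attention_mask):
    """Average the token vectors of one sentence, skipping padded positions."""
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up hidden states: 3 token positions, hidden size 2.
# The last position of the first sentence is padding (mask value 0).
sentence_a = mean_pool([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]], [1, 1, 0])
sentence_b = mean_pool([[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]], [1, 1, 1])

print(sentence_a)                                 # [2.0, 0.0]
print(sentence_b)                                 # [0.0, 4.0]
print(cosine_similarity(sentence_a, sentence_b))  # 0.0
```

With a real model, the inner lists would be rows of last_hidden_state and the mask would come from the tokenizer output.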

AutoModelForCausalLM

➡️ For text generation tasks (GPT-style models)

What it adds:

A causal language modeling head that predicts the next token using left-to-right attention.

Use when:

Chatbots

Story generation

Code generation

Autocomplete
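
Left-to-right attention means each position may look only at itself and at earlier positions. A minimal sketch of such a causal mask in plain Python is shown below; `causal_mask` is a hypothetical helper for illustration, and real models build an equivalent mask internally.

```python
def causal_mask(seq_len):
    """Row i holds 1 for positions 0..i (visible) and 0 for later positions."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

The lower-triangular pattern is what prevents the model from seeing the token it is asked to predict.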

AutoModelForMaskedLM

➡️ For fill-in-the-blanks tasks (BERT-style models)

What it adds:

A masked language modeling head that predicts masked tokens ([MASK]).

Use when:

Fill in missing words

Pretraining-like tasks

Masked token prediction

AutoModelForSequenceClassification

➡️ For sentence-level classification

What it adds:

A classification head on top of the [CLS] token.

Use when:

Sentiment analysis

Spam detection

Intent classification

Fake news detection

In [None]:
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

In [None]:
print(outputs.logits.shape)


The shape of our outputs is now much lower-dimensional: the model head takes as input the high-dimensional vectors we saw before and outputs vectors containing two values (one per label). With two sentences and two labels, the logits have shape (2, 2).

The values we get as output from our model don’t necessarily make sense by themselves. Let’s take a look:



In [None]:
print(outputs.logits)

Our model predicted [-1.5607, 1.6123] for the first sentence and [4.1692, -3.3464] for the second one. Those are not probabilities but logits, the raw, unnormalized scores output by the last layer of the model. To be converted to probabilities, they need to go through a SoftMax layer. All 🤗 Transformers models output logits, because the loss function used for training generally fuses the last activation function, such as SoftMax, with the actual loss function, such as cross entropy:

In [None]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

Now we can see that the model predicted [0.0402, 0.9598] for the first sentence and [0.9995, 0.0005] for the second one. These are recognizable probability scores.
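
For reference, the SoftMax step can also be written out by hand. The sketch below uses only the Python standard library and the logits quoted above. The `softmax` helper is written here for illustration and is not the library implementation.

```python
import math


def softmax(logits):
    """Exponentiate and normalize so the scores sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    shifted = [x - max(logits) for x in logits]
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
probs = [softmax(row) for row in logits]
for row in probs:
    print([round(p, 4) for p in row])
# [0.0402, 0.9598]
# [0.9995, 0.0005]
```

Each row sums to 1, which is what makes the values readable as probabilities.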

To get the labels corresponding to each position, we can inspect the id2label attribute of the model config (more on this in the next section):


In [None]:
model.config.id2label
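
The postprocessing step can be sketched as picking the highest probability in each row and looking up its label. The mapping below, {0: "NEGATIVE", 1: "POSITIVE"}, is an assumption for this checkpoint and should be confirmed against model.config.id2label. The probabilities are the ones quoted above, and `to_labels` is a hypothetical helper.

```python
# Assumed mapping; confirm it with model.config.id2label.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Probabilities quoted in the text, one row per sentence.
probabilities = [[0.0402, 0.9598], [0.9995, 0.0005]]


def to_labels(probabilities, id2label):
    """Return the most likely label and its score for each row."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        results.append({"label": id2label[best], "score": row[best]})
    return results


results = to_labels(probabilities, id2label)
print(results)
# [{'label': 'POSITIVE', 'score': 0.9598}, {'label': 'NEGATIVE', 'score': 0.9995}]
```

The result has the same list-of-dictionaries shape that the sentiment pipeline returns.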

We have successfully reproduced the three steps of the pipeline: preprocessing with tokenizers, passing the inputs through the model, and postprocessing! Now let’s take some time to dive deeper into each of those steps.

AutoModelForTokenClassification

➡️ For tagging each token

What it adds:

A head that produces a label per token.

Use when:

Named Entity Recognition (NER)

Parts-of-speech tagging (POS)

Chunking
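
A label per token means the head produces one row of scores for every token, and each row is decoded independently. The sketch below uses a hypothetical three-tag set and made-up logits to show the idea. Real NER models typically use a larger tag set and also need to merge sub-word pieces.

```python
# Hypothetical tag set and invented logits: one row of scores per token.
id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}
tokens = ["Asha", "visited", "Delhi"]
token_logits = [
    [0.1, 2.3, 0.2],
    [3.0, 0.1, 0.4],
    [0.2, 0.3, 2.9],
]


def tag_tokens(tokens, token_logits, id2label):
    """Pair every token with its highest-scoring tag."""
    tagged = []
    for token, scores in zip(tokens, token_logits):
        best = max(range(len(scores)), key=scores.__getitem__)
        tagged.append((token, id2label[best]))
    return tagged


tagged = tag_tokens(tokens, token_logits, id2label)
print(tagged)
# [('Asha', 'B-PER'), ('visited', 'O'), ('Delhi', 'B-LOC')]
```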

AutoModelForQuestionAnswering

➡️ For extractive QA (SQuAD-style)

What it adds:

Two heads:

Start position of answer

End position of answer

Use when:

Reading comprehension

Extracting answers from passages
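
To illustrate how the start and end scores turn into an answer, here is a minimal sketch that searches for the span with the highest combined score. The tokens and logits are made up, and `best_span` and its `max_answer_len` limit are hypothetical. Production code usually adds further checks, such as restricting the span to the context passage.

```python
def best_span(start_logits, end_logits, max_answer_len=10):
    """Return (start, end) maximizing start score + end score, with end >= start."""
    best_score, best = float("-inf"), (0, 0)
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


# Invented example: one score per token for the start and the end of the answer.
tokens = ["the", "course", "was", "created", "by", "hugging", "face"]
start_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0, 0.5]
end_logits = [0.0, 0.1, 0.2, 0.1, 0.3, 0.6, 3.5]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)  # 5 6 hugging face
```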

AutoModelForMultipleChoice

➡️ For MCQs (multiple-choice questions)

What it adds:

A head that compares several options.

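Example:

For 4 options, the model scores each (question, option) pair and the head outputs one logit per option, 4 in total.

Use when: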

AI answering exam-type MCQs

Sentence reasoning choices
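
The comparison between options can be sketched as follows: the question is paired with every option, the model assigns one score per pair, and the option with the highest score wins. The question, options, and scores below are invented for illustration and do not come from a real model.

```python
question = "Which city is the capital of India?"
options = ["Paris", "Rome", "New Delhi", "Cairo"]

# Each option is encoded together with the question, giving 4 inputs.
pairs = [(question, option) for option in options]

# Made-up scores, one per (question, option) pair.
choice_logits = [0.3, 0.1, 2.4, -0.5]

best_index = max(range(len(choice_logits)), key=choice_logits.__getitem__)
print(len(pairs), best_index, options[best_index])  # 4 2 New Delhi
```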

AutoModelForSeq2SeqLM

➡️ For encoder–decoder models (translation, summarization)

Use when:

Translation

Summarization

Paraphrasing