Understanding Transformers

A transformer is a deep learning architecture introduced in 2017 by Vaswani et al. in the paper "Attention Is All You Need". It processes sequences (such as text, audio, or time-series data) using self-attention, a mechanism that relates every element of the sequence to every other element, regardless of how far apart they are.

Unlike older recurrent models such as RNNs and LSTMs, transformers do not process the input one step at a time. They look at the entire sequence at once and learn which parts are important for each prediction.

In simple terms:
It’s like reading a whole paragraph and instantly noticing which words are most connected, instead of reading word-by-word and slowly remembering the context.

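Self-attention derives three vectors for each token, each a learned linear projection of that token's embedding:
Query (Q): “What am I looking for?”
Key (K): “What do I have to offer?”
Value (V): “What’s my actual information?” (the content that gets passed along once the attention weights are known)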

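The dot product of each query with every key, scaled by the square root of the key dimension and passed through a softmax, gives the attention weights: a probability distribution over how much each token matters to the current one. The output for a token is the sum of all value vectors, weighted by those probabilities.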
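To make the Q/K/V arithmetic concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q·Kᵀ / √d_k)·V, as defined in "Attention Is All You Need". It is written in plain Python on nested lists so it has no dependencies; the function names and the toy vectors are made up for illustration, and real libraries do this with batched tensor operations.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Return (outputs, weights) for one attention head.

    Q, K, V are lists of vectors, one vector per token.
    """
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one probability per token
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example: two tokens with 2-dimensional vectors (illustrative values only).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, and each row of `outputs` is a blend of the value vectors, pulled toward the tokens whose keys best match the query.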

The pipeline() function is not part of the original Transformer architecture. It is a convenience wrapper in Hugging Face's Transformers library that makes pre-trained models quick to use: it tokenizes the input, runs the model, and decodes the output, so you do not have to do those steps by hand.
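For comparison, here is a rough sketch of the three steps that pipeline() bundles together for sentiment analysis: tokenize, run the model, decode the output. The checkpoint name is an assumption (these notes do not pin a model), the helper name `classify` is hypothetical, and the code needs `transformers` and `torch` installed plus network access to download the weights.

```python
def classify(text, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    """Manual version of a sentiment pipeline (illustrative sketch)."""
    # Imported inside the function so this file loads even without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    # 1. Tokenize: text -> token IDs as PyTorch tensors.
    inputs = tokenizer(text, return_tensors="pt")

    # 2. Run the model: token IDs -> raw scores (logits), no gradients needed.
    with torch.no_grad():
        logits = model(**inputs).logits

    # 3. Decode: logits -> probabilities -> human-readable label.
    probs = torch.softmax(logits, dim=-1)[0]
    best = int(probs.argmax())
    return {"label": model.config.id2label[best], "score": float(probs[best])}
```

Calling `classify("I've been waiting for a Hugging Face course my whole life.")` should return a dictionary with a label and a score, similar in shape to what the pipeline returns for one input.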

Encoder-only Transformer Model: BERT (Bidirectional Encoder Representations from Transformers), Google, 2018
Useful for understanding text: classification, named-entity recognition, extractive question answering.
Pre-trained with masked token prediction and next sentence prediction.

Text → Tokenizer → IDs → BERT Encoder → Output embeddings → Task-specific head → Prediction

Decoder-only Transformer Model: GPT (Generative Pre-trained Transformer), OpenAI, 2018

Useful for text generation, completion, and autoregressive prediction.
Trained with causal masking to predict the next token in a sequence.

Text → Tokenizer → IDs → Decoder Transformer (causal mask) → Output logits → Softmax → Next token prediction
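The causal mask is what stops a decoder from looking at future tokens: position i may only attend to positions 0..i. Below is a minimal sketch of that idea in plain Python, where masked positions are set to negative infinity before the softmax so they get zero weight. The function names are made up for this example.

```python
import math

def causal_mask(n):
    # mask[i][j] is True when token i is allowed to attend to token j (j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    # Blocked positions become -inf, so exp(-inf) = 0 and they get no weight.
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask_row)]
    m = max(masked)  # finite, because every token can attend to itself
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(3)
# Attention scores of token 1 against tokens 0, 1, 2 (illustrative values only).
weights = masked_softmax([1.0, 2.0, 3.0], mask[1])
```

Here `weights` has a zero in the last slot: token 1 cannot see token 2, even though token 2 had the highest raw score.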

Encoder–Decoder (Seq2Seq) Transformer Models: T5 (Text-to-Text Transfer Transformer), Google, 2020; BART, Facebook, 2019

Useful for translation, summarization, and question answering: tasks where the output is a different sequence from the input.
The encoder builds a representation of the input; the decoder generates the output one token at a time, conditioned on that representation.

Input text → Tokenizer → IDs → Encoder → Context embeddings
Context embeddings + Start token → Decoder → Output logits → Softmax → Next token prediction
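The second line of the flow above is a loop: the decoder is called again and again, each time with the tokens generated so far, until it emits an end-of-sequence token. Here is a minimal sketch of greedy decoding under that reading. `greedy_decode`, `toy_step`, and the token IDs are all hypothetical; `toy_step` is a stand-in for a real decoder and simply copies the context.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(step_fn, context, start_id, eos_id, max_len=10):
    """Generate tokens one at a time, always taking the most probable one."""
    tokens = [start_id]
    for _ in range(max_len):
        logits = step_fn(context, tokens)  # decoder sees context + tokens so far
        probs = softmax(logits)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def toy_step(context, tokens, vocab_size=6, eos_id=1):
    # Stand-in decoder: favors the next context ID, then EOS once the context is used up.
    pos = len(tokens) - 1
    target = context[pos] if pos < len(context) else eos_id
    return [5.0 if i == target else 0.0 for i in range(vocab_size)]

# The "context" would be encoder embeddings in a real model; here it is a list of IDs.
generated = greedy_decode(toy_step, context=[3, 4, 2], start_id=0, eos_id=1)
```

Greedy selection is only one choice; real generation often uses sampling or beam search instead of always taking the top token.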


In [None]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a Hugging Face course my whole life.")



In [18]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445985913276672, 0.11197440326213837, 0.04342705383896828]}