<a href="https://colab.research.google.com/github/yaswanthd333/Huggingface_LLM/blob/main/Transformers_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers pipeline

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [1]:
!uv pip install datasets evaluate transformers[sentencepiece]

Using Python 3.12.12 environment at: /usr
Audited 3 packages in 435ms


In [2]:
# Sentiment analysis of given text
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("Langchain looks like an interesting subject")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9986820816993713}]
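The log above warns against relying on the default model in production. A minimal sketch of pinning the checkpoint is shown here, assuming the model name and revision printed in that warning are the ones you want to keep. The helper name `build_pinned_classifier` and the two constants are hypothetical, introduced only for illustration.

```python
# Model name and revision copied from the warning printed above.
PINNED_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
PINNED_REVISION = "714eb0f"


def build_pinned_classifier():
    """Create a sentiment pipeline with an explicit model and revision.

    The import is lazy, so defining this helper does not load
    Transformers or download any weights until it is called.
    """
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=PINNED_MODEL,
        revision=PINNED_REVISION,
    )
```

Calling `build_pinned_classifier()` should return an object that is used exactly like `classifier` above, without the "No model was supplied" message.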

In [3]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this OpenAI API pricing!"]
)

[{'label': 'POSITIVE', 'score': 0.9598050713539124},
 {'label': 'NEGATIVE', 'score': 0.9996367692947388}]
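When a list is passed, the pipeline returns one prediction per input, in the same order. The sketch below pairs each text with its prediction and keeps only the confident ones. The helper `confident_predictions` and the `min_score` threshold of 0.98 are assumptions made for this example, and the predictions are copied from the output above rather than recomputed.

```python
def confident_predictions(texts, predictions, min_score=0.98):
    """Pair each input text with its prediction and keep the confident ones."""
    return [
        (text, pred["label"], pred["score"])
        for text, pred in zip(texts, predictions)
        if pred["score"] >= min_score
    ]


texts = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this OpenAI API pricing!",
]
# Predictions copied from the cell output above.
predictions = [
    {"label": "POSITIVE", "score": 0.9598050713539124},
    {"label": "NEGATIVE", "score": 0.9996367692947388},
]

kept = confident_predictions(texts, predictions)
print(kept)
```

With this threshold only the second sentence is kept, because the first prediction scores about 0.96.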

In [4]:
# Zero-shot classification: score a sentence against candidate labels
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Formula 1 Technical Regulations changed drastically from 2026 season.",
    candidate_labels=["Technology","Entertainment","Sports"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'Formula 1 Technical Regulations changed drastically from 2026 season.',
 'labels': ['Technology', 'Sports', 'Entertainment'],
 'scores': [0.9453946352005005, 0.0380333736538887, 0.01657198742032051]}
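The zero-shot output lists labels sorted by descending score, not in the order they were passed in. A small sketch of turning the result into a label-to-score mapping follows. The variable names are illustrative, and the values are copied from the output above.

```python
# Result copied from the cell output above.
result = {
    "sequence": "Formula 1 Technical Regulations changed drastically from 2026 season.",
    "labels": ["Technology", "Sports", "Entertainment"],
    "scores": [0.9453946352005005, 0.0380333736538887, 0.01657198742032051],
}

# "labels" and "scores" are parallel lists, so zip pairs them up.
label_scores = dict(zip(result["labels"], result["scores"]))
top_label = max(label_scores, key=label_scores.get)

# In the default single-label mode the scores form a distribution over labels.
total = sum(result["scores"])

print(top_label, round(label_scores[top_label], 3))
print(round(total, 6))
```

Note that the model picks "Technology" over "Sports" here, so the top label is worth checking before it is used downstream.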

In [5]:
# Text generation from the given text
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this Langchain course, we will teach you how to do Langchain concepts")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this Langchain course, we will teach you how to do Langchain concepts. In this course, I would suggest you take several key concepts from the Langchain course, and then introduce them to the students. By the time you get to the actual course, you will have learned about the concepts, but you will not need to understand the concepts in order to understand the concepts.\n\nWe will also use the Langchain concepts to apply to the Langchain class. This course is an introduction to the Langchain concepts in order to gain a better understanding and feel for them.\n\nYou will also learn how to use the Langchain concept to apply for a Bachelor of Science degree in Computer Science.\n\nLearning to speak and write in a Chinese language\n\nWhat are the basic concepts that you need to learn in order to become proficient in a Chinese language?\n\nYou can learn this with basic knowledge in a language that is not easy to learn: Mandarin.\n\nIn this course, we will teach you the

In [6]:
# Text generation using distilgpt2 model
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to do Langchain concepts",
    max_length=50,  # overridden here by the default max_new_tokens=256 (see the warning below)
    num_return_sequences=3,
)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'In this course, we will teach you how to do Langchain concepts and how to make your own Langchain framework.\n\n\n\nIn the beginning, we’ll start with our basic Langchain framework, which is a simple one to follow. We do not have a specific basic Langchain framework, and we will start with a simple tutorial. This tutorial will show you how to use all the Langchain libraries in your project.\nOnce you have your Langchain framework installed, you will learn how to use each Langchain framework to develop your own Langchain framework.\nOnce you have your Langchain framework installed, you will learn how to use each Langchain framework to develop your own Langchain framework.\nWe will introduce you to Langchain Framework concepts and how to use the Langchain framework to develop your own Langchain framework.\nWe will introduce you to Langchain Framework concepts and how to use the Langchain framework to develop your own Langchain framework.\nThe Langchain framework us
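Each `generated_text` starts with the prompt itself. A rough sketch of separating the continuation from the prompt is given below. The helper `continuation_only` is hypothetical, and the sample string is only the beginning of the first sequence shown above.

```python
def continuation_only(prompt, generated_text):
    """Return the part of generated_text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt was altered (e.g. truncated).
    return generated_text


prompt = "In this course, we will teach you how to do Langchain concepts"
# Beginning of the first generated sequence shown above.
generated = (
    "In this course, we will teach you how to do Langchain concepts"
    " and how to make your own Langchain framework."
)

print(continuation_only(prompt, generated))
```

The text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.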

In [7]:
# Fill-mask: predict the most likely tokens for <mask>
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=3)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.19619743525981903,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052726551890373,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'},
 {'score': 0.033017825335264206,
  'token': 27930,
  'token_str': ' predictive',
  'sequence': 'This course will teach you all about predictive models.'}]
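The `sequence` field is the input with the mask replaced by `token_str`, minus the leading space that the tokenizer attaches to the token. The sketch below reproduces that substitution by hand. The helper `fill` is hypothetical, and it assumes the mask token is `<mask>`, as it is for this RoBERTa-based default. BERT-style models use `[MASK]` instead.

```python
template = "This course will teach you all about <mask> models."

# Scores and tokens copied from the cell output above.
candidates = [
    {"score": 0.19619743525981903, "token_str": " mathematical"},
    {"score": 0.04052726551890373, "token_str": " computational"},
    {"score": 0.033017825335264206, "token_str": " predictive"},
]


def fill(template, token_str, mask_token="<mask>"):
    """Substitute the first mask token with the predicted token."""
    return template.replace(mask_token, token_str.strip(), 1)


sequences = [fill(template, c["token_str"]) for c in candidates]
for cand, seq in zip(candidates, sequences):
    print(f"{cand['score']:.3f}  {seq}")
```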

In [8]:
# Named Entity Recognition
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("My name is Yaswanth and I work at TCS in Kolkata.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.9903352),
  'word': 'Yaswanth',
  'start': 11,
  'end': 19},
 {'entity_group': 'ORG',
  'score': np.float32(0.99776757),
  'word': 'TCS',
  'start': 34,
  'end': 37},
 {'entity_group': 'LOC',
  'score': np.float32(0.9962852),
  'word': 'Kolkata',
  'start': 41,
  'end': 48}]
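The `start` and `end` values are character offsets into the input string, with `end` exclusive. A short sketch of recovering each entity by slicing and grouping the spans by type follows. The entity list is copied from the output above with the scores left out, and `by_type` is an illustrative name.

```python
text = "My name is Yaswanth and I work at TCS in Kolkata."

# Entities copied from the cell output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Yaswanth", "start": 11, "end": 19},
    {"entity_group": "ORG", "word": "TCS", "start": 34, "end": 37},
    {"entity_group": "LOC", "word": "Kolkata", "start": 41, "end": 48},
]

by_type = {}
for ent in entities:
    # Slice the original text rather than trusting "word", which can
    # differ from the source text after sub-word tokens are merged.
    span = text[ent["start"]:ent["end"]]
    by_type.setdefault(ent["entity_group"], []).append(span)

print(by_type)
```

The scores in the real output are `np.float32` values, so wrapping them in `float()` is one way to make the result JSON-serializable.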

In [9]:
# Question answering using context
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Yaswanth and I work at TCS in Kolkata.",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.5612614750862122,
 'start': 34,
 'end': 48,
 'answer': 'TCS in Kolkata'}
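This pipeline is extractive: the answer is a span of the context, located by the `start` and `end` character offsets. The sketch below checks that and adds a simple score cutoff. The helper `answer_or_none` and its `min_score` default of 0.5 are assumptions made for illustration.

```python
context = "My name is Yaswanth and I work at TCS in Kolkata."

# Result copied from the cell output above.
result = {
    "score": 0.5612614750862122,
    "start": 34,
    "end": 48,
    "answer": "TCS in Kolkata",
}

# The answer text is exactly the slice of the context between the offsets.
span = context[result["start"]:result["end"]]
print(span == result["answer"])


def answer_or_none(result, min_score=0.5):
    """Return the answer only if the model is reasonably confident."""
    return result["answer"] if result["score"] >= min_score else None


print(answer_or_none(result))
print(answer_or_none(result, min_score=0.9))
```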

In [10]:
# Summarization
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' The number of engineering graduates in the United States has declined in recent years . China and India graduate six and eight times as many traditional engineers as the U.S. does . Rapidly developing economies such as China continue to encourage and advance the teaching of engineering . There are declining offerings in engineering subjects dealing with infrastructure, infrastructure, the environment, and related issues .'}]
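The summary comes back with a leading space and a space before each full stop. A minimal cleanup sketch is shown below. The helper `tidy_summary` is hypothetical, and the sample is the first two sentences of the output above.

```python
import re


def tidy_summary(text):
    """Remove spaces before punctuation and collapse repeated whitespace."""
    text = re.sub(r"\s+([.,!?])", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


# First two sentences of the summary shown above.
raw = (
    " The number of engineering graduates in the United States has declined"
    " in recent years . China and India graduate six and eight times as many"
    " traditional engineers as the U.S. does ."
)

print(tidy_summary(raw))
```

This only fixes spacing. The repeated word in the last sentence of the real summary ("infrastructure, infrastructure") comes from the model and is left alone.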

In [11]:
# Translation (French to English)
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par TCS.")

Device set to use cpu


[{'translation_text': 'This course is produced by TCS.'}]

In [12]:
# Image classification
from transformers import pipeline

image_classifier = pipeline(
    task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
    "https://cdn-uploads.huggingface.co/production/uploads/6402366d06c715b9340068ae/4kzu2tiVTJwuy0q_ZjuRN.png"
)
print(result)

Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cpu


[{'label': 'CD player', 'score': 0.4938413202762604}, {'label': 'tape player', 'score': 0.11106492578983307}, {'label': 'radio, wireless', 'score': 0.10971023142337799}, {'label': 'oscilloscope, scope, cathode-ray oscilloscope, CRO', 'score': 0.03644983097910881}, {'label': 'cassette player', 'score': 0.03403060883283615}]
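Some labels in this output are comma-separated lists of synonyms, such as `'radio, wireless'`. A small sketch of reducing each label to its first name and picking the top prediction follows. The helper `primary_label` is hypothetical, and the predictions are copied from the output above.

```python
# Predictions copied from the cell output above.
predictions = [
    {"label": "CD player", "score": 0.4938413202762604},
    {"label": "tape player", "score": 0.11106492578983307},
    {"label": "radio, wireless", "score": 0.10971023142337799},
    {"label": "oscilloscope, scope, cathode-ray oscilloscope, CRO", "score": 0.03644983097910881},
    {"label": "cassette player", "score": 0.03403060883283615},
]


def primary_label(label):
    """Keep only the first synonym of a comma-separated label."""
    return label.split(",")[0].strip()


names = [primary_label(p["label"]) for p in predictions]
top = max(predictions, key=lambda p: p["score"])

print(names)
print(primary_label(top["label"]), round(top["score"], 3))
```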


In [13]:
# Automatic Speech Recognition

from transformers import pipeline

transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-large-v3"
)
result = transcriber(
    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
)
print(result)

Device set to use cpu
`return_token_timestamps` is deprecated for WhisperFeatureExtractor and will be removed in Transformers v5. Use `return_attention_mask` instead, as the number of frames can be inferred from it.
Using custom `forced_decoder_ids` from the (generation) config. This is deprecated in favor of the `task` and `language` flags/config options.
Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.


{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
