# Chapter 1

The most basic object in the 🤗 Transformers library is the `pipeline()` function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:

In [3]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')

classifier(["Shadrack is an amazing developer", "Shadrack is not funny"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9998669624328613},
 {'label': 'NEGATIVE', 'score': 0.9998083710670471}]

By default, this pipeline selects a particular pretrained model (here, `distilbert/distilbert-base-uncased-finetuned-sst-2-english`) that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model is reused and nothing is downloaded again.

However, you are not limited to the default: you can pass any compatible model from the Hugging Face Hub through the `model` argument.
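The log message above recommends naming the model and revision explicitly. A minimal sketch of how that might look, reusing the model id and revision printed in that log; the helper name `build_classifier` is hypothetical and only there for illustration:

```python
# Model id and revision copied from the log message printed by the default pipeline.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_classifier(model_id=MODEL_ID, revision=REVISION):
    """Create a sentiment-analysis pipeline pinned to one model and revision.

    `build_classifier` is a hypothetical helper, not part of the library.
    """
    # Imported lazily so that defining the helper does not load the library.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_id, revision=revision)


# Usage (downloads the model on first call, then reuses the cache):
# classifier = build_classifier()
# classifier("Shadrack is an amazing developer")
```

Pinning both values should keep the results stable even if the library later changes its default model for the task.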

The pipeline also handles all the steps in between, for example converting human-readable text into numbers that the model can process and then turning the model's output back into a result that users can understand.

There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
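To make the three steps concrete, here is a toy sketch in plain Python. It is not how the library is implemented: the vocabulary, the weights, and the function names are all made up for illustration, whereas a real pipeline uses a trained tokenizer and a neural network. Only the overall shape (text to ids, ids to raw scores, raw scores to a label with a probability) mirrors the list above.

```python
import math

# Hypothetical toy vocabulary, labels, and weights (illustration only).
VOCAB = {"amazing": 0, "not": 1, "funny": 2, "developer": 3}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
# One row per vocabulary id: [negative_weight, positive_weight]
WEIGHTS = [[-1.0, 2.0], [2.0, -1.0], [0.0, 0.5], [0.0, 0.1]]


def preprocess(text):
    """Step 1: turn raw text into a list of integer token ids."""
    tokens = text.lower().split()
    return [VOCAB[token] for token in tokens if token in VOCAB]


def model(token_ids):
    """Step 2: map token ids to one raw score (logit) per label."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        for label_idx, weight in enumerate(WEIGHTS[token_id]):
            logits[label_idx] += weight
    return logits


def postprocess(logits):
    """Step 3: apply a softmax and report the most likely label."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as a pipeline does behind the scenes."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("Shadrack is an amazing developer"))
print(toy_pipeline("Shadrack is not funny"))
```

The returned dictionaries have the same `label` and `score` keys as the real classifier output shown earlier, which is the part of the design worth noticing: the caller never sees token ids or logits.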

### Some of the currently available pipelines are:

1. feature-extraction (get the vector representation of a text)
2. fill-mask
3. ner (named entity recognition)
4. question-answering
5. sentiment-analysis
6. summarization
7. text-generation
8. translation
9. zero-shot-classification

We'll dive into a few of these, namely:
1. zero-shot-classification
2. text-generation
3. ner (named entity recognition)
4. fill-mask (Mask filling)
5. question-answering
6. summarization
7. translation

## Zero Shot Classification

This is useful for classifying texts that have not yet been labelled. This is quite common in the real world, since it's hard to assume that the LLM has been trained on every type of data, all successfully labelled and annotated by domain experts. After all, new things come up every day 😃.

For zero-shot classification, you define the labels that are used during classification.

In [5]:
zero_shot_classifier = pipeline("zero-shot-classification")

zero_shot_classifier("Today is a day where the American president has won the election and there are many people who are not happy about it", candidate_labels=["education", "politics", "sports", "business", "film"])

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


{'sequence': 'Today is a day where the American president has won the election and there are many people who are not happy about it',
 'labels': ['politics', 'business', 'film', 'education', 'sports'],
 'scores': [0.9157540798187256,
  0.02831583097577095,
  0.025808537378907204,
  0.015261565335094929,
  0.014859919436275959]}
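The result is a plain dictionary in which `labels` and `scores` are parallel lists, sorted from most to least likely. In the default single-label mode the scores are normalised across the candidate labels, so they add up to roughly 1. The sketch below reuses the output above as sample data; `ranked_labels` is a hypothetical helper name, not a library function.

```python
# Result dictionary copied from the zero-shot output above.
result = {
    "sequence": "Today is a day where the American president has won the election "
                "and there are many people who are not happy about it",
    "labels": ["politics", "business", "film", "education", "sports"],
    "scores": [0.9157540798187256,
               0.02831583097577095,
               0.025808537378907204,
               0.015261565335094929,
               0.014859919436275959],
}


def ranked_labels(result, threshold=0.0):
    """Pair each label with its score, keeping pairs at or above `threshold`."""
    pairs = zip(result["labels"], result["scores"])
    return [(label, score) for label, score in pairs if score >= threshold]


# The first pair is the model's best guess.
top_label, top_score = ranked_labels(result)[0]
print(top_label, top_score)

# Keep only labels the model gives at least a 2% share.
print(ranked_labels(result, threshold=0.02))

# Sanity check: the scores form a probability distribution.
print(sum(result["scores"]))
```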

## Text generation

In [7]:
generator = pipeline("text-generation")

print(generator("Tomorrow I will"))

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Tomorrow I will go out of the city, I will go to the monastery.\n\nI had come for that reason, I had come to the monastery because I knew that God told me to, for I was going to receive what He told me'}]


You can control how many different sequences are generated with the argument `num_return_sequences`, and the total length of the output text (measured in tokens and including the prompt) with the argument `max_length`.

In [8]:
generator("Tomorrow I will", max_length=100, num_return_sequences=6)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Tomorrow I will be at that."\n\nBoris added: "I just hope that we get a little bit of a breakthrough at the top for a while, so I don\'t think that it\'s going to drag us down in terms of being better, but you know I don\'t think we\'ll get too far ahead of ourselves. I think we\'ll have to find a way to put a little bit forward the more we can get some rest."\n\nArsenal have played less than 7,'},
 {'generated_text': 'Tomorrow I will tell you, you are probably just looking at the news too much.\n\nWhen it comes to getting out of prison, you are more than likely on some level a prison inmate. This includes the average citizen, and the prison population has grown over the past few decades. It is the reason in many cases that you are no longer the inmate at all.\n\nA person who has spent 20 or more years in an institution has a large portion of his life devoted to helping others'},
 {'generated_text': 'Tomorrow I will be here. I want to do this."'},
 {'generated_text
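Each returned item is a dictionary whose `generated_text` begins with the prompt itself, followed by the model's continuation. If only the new text is needed, it can be separated afterwards. A small sketch, assuming the default behaviour seen above and reusing one of the generated sequences as sample data; `strip_prompt` is a hypothetical helper:

```python
def strip_prompt(outputs, prompt):
    """Return only the newly generated part of each sequence."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # In the outputs above, the text starts with the prompt; drop it if present.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# One of the sequences from the output above, reused as sample data.
sample = [{"generated_text": 'Tomorrow I will be here. I want to do this."'}]
print(strip_prompt(sample, "Tomorrow I will"))
```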

In [None]:
# Use a pipeline as a high-level helper
swahili_pipeline = pipeline("text-generation", model="Jacaranda/UlizaLlama3")

swahili_pipeline("Kesho nitaenda shule halafu")

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]