# Pipeline: the simplest object in Transformers

In Hugging Face Transformers, the `pipeline` function returns an end-to-end object for performing an NLP task: raw text goes in and predictions come out.

## 1. Sentiment analysis

In [None]:
# !pip install transformers

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9516069889068604}]
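
The warning above recommends naming the model and revision explicitly rather than relying on the task default. A minimal sketch of that, assuming the model name and revision printed in the warning are the ones you want to pin. `sentiment_pipeline_kwargs` and `build_sentiment_classifier` are hypothetical helper names, and the import is done lazily only so that the snippet can be loaded without downloading anything:

```python
# Model name and revision copied from the warning printed by the default pipeline.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def sentiment_pipeline_kwargs():
    """Return the keyword arguments that pin the sentiment model (hypothetical helper)."""
    return {"task": "sentiment-analysis", "model": MODEL_NAME, "revision": MODEL_REVISION}


def build_sentiment_classifier():
    """Create the pipeline with an explicit model and revision instead of the default."""
    # Imported here so that defining the function does not require transformers.
    from transformers import pipeline

    return pipeline(**sentiment_pipeline_kwargs())
```

Calling `build_sentiment_classifier()` downloads the pinned weights on first use, and the returned object is called the same way as `classifier` above.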

In [None]:
classifier("I've been waiting for a HuggingFcae course my whole life")

[{'label': 'NEGATIVE', 'score': 0.8496447205543518}]

I deliberately introduced a typo (Face -> Fcae), and the prediction flipped from POSITIVE to NEGATIVE.

In [None]:
# Several texts can also be passed at once as a list.
classifier(["I've been waiting for a HuggingFace course my whole life",
           "I've been waiting for a HuggingFcae course my whole life"])

[{'label': 'POSITIVE', 'score': 0.9516069889068604},
 {'label': 'NEGATIVE', 'score': 0.8496447205543518}]
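
When a list is passed, the output above contains one prediction per input, in the same order. A small sketch of pairing each text with its prediction. The `results` list is copied from the output above rather than recomputed, and `pair_predictions` is a hypothetical helper name:

```python
texts = [
    "I've been waiting for a HuggingFace course my whole life",
    "I've been waiting for a HuggingFcae course my whole life",
]
# Copied from the pipeline output shown above.
results = [
    {"label": "POSITIVE", "score": 0.9516069889068604},
    {"label": "NEGATIVE", "score": 0.8496447205543518},
]


def pair_predictions(texts, results):
    """Attach each input text to its prediction, relying on matching order."""
    if len(texts) != len(results):
        raise ValueError("expected one prediction per input text")
    return [
        {"text": text, "label": result["label"], "score": result["score"]}
        for text, result in zip(texts, results)
    ]


for row in pair_predictions(texts, results):
    print(f"{row['label']:<8} {row['score']:.3f}  {row['text']}")
```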

## 2. Zero-shot classification pipeline

This pipeline takes a list of labels of our choosing and tells us which label the sentence is classified under, with a score for each label.

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the transformer library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the transformer library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8367913961410522, 0.11911763995885849, 0.044090963900089264]}
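
In the output above, `labels` and `scores` are parallel lists ordered from highest to lowest score, so the predicted label is the first entry. A minimal sketch of reading the result. The dictionary is copied from the output above, and `top_label` is a hypothetical helper:

```python
# Copied from the zero-shot output shown above.
result = {
    "sequence": "This is a course about the transformer library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8367913961410522, 0.11911763995885849, 0.044090963900089264],
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    # max() is used instead of indexing [0] so the helper does not depend on ordering.
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


label, score = top_label(result)
print(label, round(score, 3))

# Map each candidate label to its score for easy lookup.
score_by_label = dict(zip(result["labels"], result["scores"]))
```

In this output the three scores add up to approximately 1, so they can be read as a distribution over the candidate labels.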

## 3. Text generation

Given a prompt, the model continues the text. Generation involves some randomness, so the result can differ on every run.

In [None]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use the Windows 8 SDK to deploy Windows apps. This course is designed to be fun and effective for both developers and the general public, so please feel free to contact us at support@sc.com'}]
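
The run-to-run variation comes from sampling: at each step the next token is drawn from a probability distribution rather than chosen deterministically. The toy sketch below illustrates only that idea and is not GPT-2's decoding code. The candidate words and weights are invented for illustration, and `sample_next_word` is a hypothetical helper. Fixing the seed makes the draw repeatable, which is the role `transformers.set_seed` plays when called before a generation pipeline.

```python
import random

# Invented next-word candidates and weights, for illustration only.
CANDIDATES = ["use", "build", "train", "deploy"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def sample_next_word(seed=None):
    """Draw one candidate word at random, weighted by WEIGHTS."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]


# Without a seed, repeated calls may return different words.
print([sample_next_word() for _ in range(5)])

# With the same seed, the two draws are compared for equality.
print(sample_next_word(seed=42) == sample_next_word(seed=42))
```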

**The models used so far were the default model for each task. Any pretrained or fine-tuned model that is available for a task can be used in its place.**

1. Go to huggingface.co/models and filter the available models by task.
2. For text generation, the default model we used was gpt2. Many other models can be used as well, including models for languages other than English.

Let's choose a different model for text generation.
- Here we use distilgpt2, a lighter version of gpt2 made by the Hugging Face team.

In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator("In this course, we will teach you how to", max_length=30, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create great products in this course, where all of the methods for building your product are well established.'},
 {'generated_text': 'In this course, we will teach you how to set up a self-help platform. You can find a detailed overview on how to use the app'}]
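
Each entry in the output above repeats the prompt followed by the generated continuation, with one entry per requested sequence. A short sketch of separating the continuation from the prompt. The `outputs` list is copied from the run above (a new run will produce different text), and `continuations` is a hypothetical helper:

```python
prompt = "In this course, we will teach you how to"
# Copied from the distilgpt2 output shown above.
outputs = [
    {"generated_text": "In this course, we will teach you how to create great products in this course, where all of the methods for building your product are well established."},
    {"generated_text": "In this course, we will teach you how to set up a self-help platform. You can find a detailed overview on how to use the app"},
]


def continuations(prompt, outputs):
    """Return only the newly generated part of each sequence."""
    texts = []
    for item in outputs:
        text = item["generated_text"]
        # Drop the prompt when the sequence starts with it; otherwise keep the text whole.
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


for i, text in enumerate(continuations(prompt, outputs), start=1):
    print(f"{i}. {text}")
```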