<a href="https://colab.research.google.com/github/shhuangmust/AI/blob/112-2/Pipeline_%E6%A1%83%E5%9C%92%E7%A8%8B%E5%BC%8F%E8%AA%B2%E7%A8%8B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Natural Language Processing with the Hugging Face Pipeline

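Some of the pipelines currently available are:
- Summarization
- Text generation
- Translation
- Zero-shot classification
- Fill-mask
- NER (named entity recognition)
- Question answering
- Sentiment analysis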
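
Each task in the list above is selected by passing a task identifier string as the first argument to `pipeline(...)`. The lookup table below is a small sketch of that mapping. The names `TASK_IDS` and `task_id` are illustrative helpers, not part of transformers; only the identifier strings come from the library.

```python
# Illustrative lookup table: readable task name -> pipeline task identifier.
TASK_IDS = {
    "summarization": "summarization",
    "text generation": "text-generation",
    "translation": "translation",
    "zero-shot classification": "zero-shot-classification",
    "fill-mask": "fill-mask",
    "named entity recognition": "ner",
    "question answering": "question-answering",
    "sentiment analysis": "sentiment-analysis",
}


def task_id(name: str) -> str:
    """Return the task identifier for a task name (case-insensitive)."""
    key = name.strip().lower()
    if key not in TASK_IDS:
        raise KeyError(f"Unknown task: {name!r}")
    return TASK_IDS[key]


print(task_id("Sentiment analysis"))
```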


In [1]:
# Install (or upgrade) the Hugging Face transformers package
!pip install transformers -U



## Import the pipeline function

In [23]:
# Import the pipeline factory function from transformers
from transformers import pipeline

## Sentiment analysis
- Determine whether a piece of text is positive or negative

In [24]:
# "sentiment-analysis" selects the sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")
# The input means roughly "Just you dare go out and play"
classifier("你敢出去玩試看看")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.8599118590354919}]
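
The warning above recommends naming the model and revision explicitly. The sketch below pins both, using the model id and revision copied from that warning message. The helper name `build_sentiment_classifier` is hypothetical. Note that this default checkpoint is an English SST-2 model, so scores for Chinese input should be treated with caution.

```python
# Values copied from the warning printed by the notebook.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_sentiment_classifier():
    """Create a sentiment pipeline with the model and revision pinned."""
    # Imported inside the function so that defining the helper
    # does not load transformers or download anything.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)


# Usage (downloads the model on the first call):
# classifier = build_sentiment_classifier()
# classifier("I love this cake")
```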

In [25]:
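# Classify several sentences at once
# (the first sentence means roughly "I don't think the food tastes good")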
classifier(
    ["我覺得東西不好吃", "I love this cake"]
)

[{'label': 'NEGATIVE', 'score': 0.8526802659034729},
 {'label': 'POSITIVE', 'score': 0.9998714923858643}]
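
When a list is passed in, the pipeline returns one result per input, in the same order. A minimal sketch of pairing inputs with their predictions follows, using the output shown above as sample data. The helper `label_texts` and the `min_score` threshold of 0.9 are assumptions made for illustration.

```python
def label_texts(texts, results, min_score=0.9):
    """Pair each text with its label and flag whether the score is high."""
    rows = []
    for text, res in zip(texts, results):
        confident = res["score"] >= min_score
        rows.append((text, res["label"], confident))
    return rows


# Sample data copied from the notebook output above.
texts = ["我覺得東西不好吃", "I love this cake"]
results = [
    {"label": "NEGATIVE", "score": 0.8526802659034729},
    {"label": "POSITIVE", "score": 0.9998714923858643},
]

for row in label_texts(texts, results):
    print(row)
```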

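## Zero-shot classification
- Classify text into categories the model was never explicitly trained on
- The candidate categories can be specified freely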

In [26]:
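# "zero-shot-classification" selects the zero-shot classification pipeline
# Candidate labels below mean roughly: "go out", "stay home", "ask again"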
classifier = pipeline("zero-shot-classification")
classifier(
    "你敢出去玩試看看",
    candidate_labels=["出門", "留在家", "再問一次"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': '你敢出去玩試看看',
 'labels': ['出門', '再問一次', '留在家'],
 'scores': [0.35582491755485535, 0.34951213002204895, 0.2946630120277405]}
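
The result is a dictionary with parallel `labels` and `scores` lists. The sketch below picks the best label and reports how far ahead it is of the runner-up, using the output above as sample data. The helpers `top_label` and `margin` are illustrative names. Here the three scores are close together, so the top label is only a weak preference.

```python
def top_label(result):
    """Return (label, score) for the highest-scoring candidate."""
    labels, scores = result["labels"], result["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


def margin(result):
    """Return the gap between the best and second-best scores."""
    ordered = sorted(result["scores"], reverse=True)
    return ordered[0] - ordered[1] if len(ordered) > 1 else ordered[0]


# Sample data copied from the notebook output above.
result = {
    "sequence": "你敢出去玩試看看",
    "labels": ["出門", "再問一次", "留在家"],
    "scores": [0.35582491755485535, 0.34951213002204895, 0.2946630120277405],
}

print(top_label(result), margin(result))
```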

## Text generation
- Provide a prompt and let the model continue the text

In [27]:
# "text-generation" selects the text generation pipeline
generator = pipeline("text-generation")
# The prompt means roughly "In a moment I am going to"
generator("我等一下要去", num_return_sequences=1, max_length=60)

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': '我等一下要去景銀,才能灯對平语翻就译可以说明你和还种众�'}]
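
By default the `generated_text` field contains the prompt followed by the continuation. The trailing `�` is most likely a multi-byte character cut off when generation stopped, since GPT-2 works on byte-level tokens. A minimal sketch for separating the continuation and dropping that replacement character follows. The helper `continuation` is a hypothetical name, and the sample string is the notebook output above.

```python
REPLACEMENT_CHAR = "\ufffd"  # shown as '�' when bytes cannot be decoded


def continuation(prompt, generated_text):
    """Strip the prompt prefix and any undecodable replacement characters."""
    text = generated_text
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.replace(REPLACEMENT_CHAR, "")


prompt = "我等一下要去"
# Sample output copied from the notebook; the final character is U+FFFD.
generated = "我等一下要去景銀,才能灯對平语翻就译可以说明你和还种众\ufffd"

print(continuation(prompt, generated))
```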

## Question answering
- Provide a context passage, then ask a question and let the model answer from it

In [28]:
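# "question-answering" selects the question answering pipeline
# Context means roughly "Huang Xiaoming takes part in the Dajia Jenn Lann Temple pilgrimage"
# Question means roughly "What event does Huang Xiaoming take part in?"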
question_answerer = pipeline("question-answering", model="ckiplab/bert-base-chinese-qa")
question_answerer(
    context="黃小明參加大甲鎮瀾宮繞境",
    question="黃小明參加甚麼活動?",
)

{'score': 0.0001248956541530788, 'start': 5, 'end': 10, 'answer': '大甲鎮瀾宮'}
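
The `start` and `end` fields are character offsets into the context, so the answer can be recovered by slicing. The sketch below checks this against the output above and adds a simple confidence check. The helpers `answer_span` and `is_confident`, and the threshold of 0.5, are assumptions for illustration. The score in this run is very low, which suggests the answer should not be trusted blindly.

```python
def answer_span(context, result):
    """Slice the answer out of the context using the returned offsets."""
    return context[result["start"]:result["end"]]


def is_confident(result, min_score=0.5):
    """Return True when the model's score reaches the chosen threshold."""
    return result["score"] >= min_score


# Sample data copied from the notebook cell and its output.
context = "黃小明參加大甲鎮瀾宮繞境"
result = {
    "score": 0.0001248956541530788,
    "start": 5,
    "end": 10,
    "answer": "大甲鎮瀾宮",
}

print(answer_span(context, result), is_confident(result))
```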

## Summarization

In [29]:
# "summarization" selects the summarization pipeline
summarizer = pipeline("summarization", max_length=56)
summarizer(
    """
    Moby Dick is an 1851 novel by American writer Herman Melville.
    The book is the sailor Ishmael's narrative of the maniacal quest of Ahab,
    captain of the whaling ship Pequod, for vengeance against Moby Dick,
    the giant white sperm whale that bit off his leg on the ship's previous voyage.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': " Herman Melville's 1851 novel is the tale of the maniacal quest of Ahab, the captain of the whaling ship Pequod, for vengeance against Moby Dick . The book is the sailor Ishmael's narrative of the crazed quest of"}]
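
The summary above stops in the middle of a sentence, presumably because generation hit the `max_length=56` limit. One simple workaround, sketched below, is to cut the text back to the last sentence-ending punctuation mark. The helper `trim_to_last_sentence` is a hypothetical name and the approach is a plain string heuristic, not a feature of the pipeline.

```python
def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last '.', '!' or '?'."""
    text = text.strip()
    cut = max(text.rfind(ch) for ch in ".!?")
    # If no sentence-ending mark is found, return the text unchanged.
    return text[: cut + 1] if cut != -1 else text


# Sample output copied from the notebook.
summary = (
    " Herman Melville's 1851 novel is the tale of the maniacal quest of Ahab, "
    "the captain of the whaling ship Pequod, for vengeance against Moby Dick . "
    "The book is the sailor Ishmael's narrative of the crazed quest of"
)

print(trim_to_last_sentence(summary))
```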

## Translation
- Provides translation between languages, for example Chinese to English

In [30]:
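# "translation" selects the translation pipeline
# The input means roughly "I am taking the programming training course at Minghsin University of Science and Technology"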
translator = pipeline("translation", model="DunnBC22/opus-mt-zh-en-Chinese_to_English")
translator("我參加明新科技大學的程式訓練課程")



[{'translation_text': '"I\'m in a program training course at the University of New Technologies.", \'zh\': \'\'}'}]
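
The `translation_text` returned by this model includes stray characters after the sentence (`", 'zh': ''}`). The sketch below is a heuristic specific to this output: it keeps only the text inside the leading pair of double quotes and returns the input unchanged when no such pair is found. The helper `leading_quoted_text` is a hypothetical name, and other models or inputs may need different cleanup.

```python
import re


def leading_quoted_text(text):
    """Return the content of a leading "..." pair, or the text unchanged."""
    match = re.match(r'\s*"(.*?)"', text)
    return match.group(1) if match else text


# Sample output copied from the notebook.
raw = '"I\'m in a program training course at the University of New Technologies.", \'zh\': \'\'}'

print(leading_quoted_text(raw))
```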