# 使用管道工具（pipeline）
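HuggingFace hosts a model hub with a large number of pretrained models that can be used as-is. Many of them give good predictions without any further training on your own data, similar in spirit to zero-shot learning.  
With a pipeline, you only tell it which task to perform. It automatically picks a suitable default model and returns the predictions directly.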

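Some NLP tasks that pipelines support. These are the names of the corresponding pipeline classes (e.g. `FillMask` is `FillMaskPipeline`). When calling `pipeline()` you pass a task string such as `"fill-mask"` instead:

- Conversational
- FillMask
- Ner (an alias of TokenClassification)
- QuestionAnswering
- Summarization
- TableQuestionAnswering
- TextClassification
- TextGeneration
- Text2TextGeneration
- TokenClassification
- Translation
- ZeroShotClassification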
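When no `model` is passed, `pipeline()` falls back to a default checkpoint for the task. This is what the "No model was supplied, defaulted to ..." messages in the cells below report. The lookup can be pictured as a plain dictionary. The sketch below is a hypothetical, simplified stand-in and not the table `transformers` actually uses. Its checkpoint names are copied from this notebook's logs, which were produced with transformers 4.31.0.

```python
# Hypothetical sketch of the task -> default-checkpoint lookup. The entries are
# the defaults reported in this notebook's logs (transformers 4.31.0); the real
# table inside transformers is larger and can change between versions.
DEFAULT_CHECKPOINTS = {
    "sentiment-analysis": "distilbert-base-uncased-finetuned-sst-2-english",
    "question-answering": "distilbert-base-cased-distilled-squad",
    "fill-mask": "distilroberta-base",
    "text-generation": "gpt2",
    "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "summarization": "sshleifer/distilbart-cnn-12-6",
    "translation_en_to_fr": "t5-base",
}


def default_checkpoint(task: str) -> str:
    """Return the default checkpoint for `task`, or raise if none is known."""
    if task not in DEFAULT_CHECKPOINTS:
        raise ValueError(f"no default model for task {task!r}; pass model=... explicitly")
    return DEFAULT_CHECKPOINTS[task]


print(default_checkpoint("fill-mask"))  # distilroberta-base
```

Asking for `translation_en_to_zh` raises here. This mirrors the note in section 7 that English-to-Chinese needs an explicitly chosen model.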


In [None]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## 1. Text classification (sentiment analysis)

In [None]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I hate you")
print(result)
result = classifier("wonderful")
print(result)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
[{'label': 'POSITIVE', 'score': 0.9998772144317627}]
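The `label`/`score` pair is derived from the model's raw outputs (logits). For a single-label classifier like this one, the pipeline applies a softmax over the logits and reports the most probable class, named through the checkpoint's `id2label` mapping (`0 → NEGATIVE`, `1 → POSITIVE` for this SST-2 model). The following is a pure-Python sketch of that post-processing step. The logits in it are made-up numbers, not values produced by the model.

```python
import math

# id2label of the SST-2 checkpoint used above
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def postprocess(logits, id2label=ID2LABEL):
    """Turn raw logits into a pipeline-style [{'label', 'score'}] result."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Report the most probable class together with its probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical logits for a strongly negative sentence.
print(postprocess([3.5, -3.5]))
```

Because the softmax scores sum to 1, a score close to 1 (as in the outputs above) means the model strongly prefers one class over the other.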


## 2. Reading comprehension (extractive question answering)

In [None]:
from transformers import pipeline
question_answer = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. A example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/PyTorch/question-answering/run_squad.py script.
"""
result = question_answer(
    question = "what is extractive question answering?",
    context = context,
)
print(result)
result = question_answer(
    question = "What is a good example of a question answering dataset?",
    context = context,
)
print(result)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.6204072833061218, 'start': 34, 'end': 95, 'answer': 'the task of extracting an answer from a text given a question'}
{'score': 0.5356913805007935, 'start': 146, 'end': 159, 'answer': 'SQuAD dataset'}
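`start` and `end` are character offsets into `context` (end exclusive), so the answer can be recovered, or highlighted in the source text, by slicing the original string. A small check using the same context and the two results printed above:

```python
# Same context string as in the cell above; the leading newline counts
# towards the offsets.
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. A example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/PyTorch/question-answering/run_squad.py script.
"""

# Results copied from the output above.
results = [
    {"score": 0.6204072833061218, "start": 34, "end": 95,
     "answer": "the task of extracting an answer from a text given a question"},
    {"score": 0.5356913805007935, "start": 146, "end": 159,
     "answer": "SQuAD dataset"},
]

for r in results:
    span = context[r["start"]:r["end"]]
    print(repr(span), span == r["answer"])
```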


## 3. Cloze / fill in the blank (fill-mask)

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
sentence = "HuggingFace is creating a <mask> that the community uses to solve NLP task."
unmasker(sentence)  # the notebook displays the returned list of candidates

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.22808292508125305,
  'token': 3944,
  'token_str': ' tool',
  'sequence': 'HuggingFace is creating a tool that the community uses to solve NLP task.'},
 {'score': 0.09079990535974503,
  'token': 7208,
  'token_str': ' framework',
  'sequence': 'HuggingFace is creating a framework that the community uses to solve NLP task.'},
 {'score': 0.0406670980155468,
  'token': 17715,
  'token_str': ' prototype',
  'sequence': 'HuggingFace is creating a prototype that the community uses to solve NLP task.'},
 {'score': 0.03157924488186836,
  'token': 5560,
  'token_str': ' library',
  'sequence': 'HuggingFace is creating a library that the community uses to solve NLP task.'},
 {'score': 0.02486600913107395,
  'token': 27663,
  'token_str': ' template',
  'sequence': 'HuggingFace is creating a template that the community uses to solve NLP task.'}]
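In this output each candidate's `sequence` is the input with the mask filled in. Two details are visible. First, this RoBERTa-based model (`distilroberta-base`) uses `<mask>` as its mask token, whereas BERT-style models use `[MASK]`. Second, its byte-level BPE tokens carry their own leading space (`' tool'`). The pure-Python sketch below rebuilds the sequences from the recorded predictions. In real code the mask token can be read from `unmasker.tokenizer.mask_token` rather than hard-coded.

```python
SENTENCE = "HuggingFace is creating a <mask> that the community uses to solve NLP task."
MASK = "<mask>"  # RoBERTa's mask token; BERT-style tokenizers use "[MASK]"

# (token_str, score) pairs copied from the output above, best first.
predictions = [
    (" tool", 0.22808292508125305),
    (" framework", 0.09079990535974503),
    (" prototype", 0.0406670980155468),
    (" library", 0.03157924488186836),
    (" template", 0.02486600913107395),
]


def fill(sentence: str, mask: str, token_str: str) -> str:
    # token_str already starts with a space, so the space in front of the
    # mask is replaced together with the mask itself.
    return sentence.replace(" " + mask, token_str, 1)


for token_str, score in predictions:
    print(f"{score:.3f}  {fill(SENTENCE, MASK, token_str)}")
```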

## 4. Text generation

In [None]:
from transformers import pipeline
text_generator = pipeline("text-generation")
text_generator(
    "As far as I am concerned, I will",
    max_length=100,
    do_sample=False,
)

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a'}]
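With `do_sample=False` the pipeline decodes greedily, and GPT-2 gets stuck in the loop visible above ("I think that the idea of a free market is a bit of a stretch." repeated). One common remedy is to forbid repeated n-grams. Generation keyword arguments such as `no_repeat_ngram_size=3` can be passed straight to the `text_generator(...)` call, and switching to sampling (`do_sample=True`) is another option. The pure-Python sketch below only illustrates the idea behind n-gram blocking. It works on whitespace-separated words instead of token IDs and is not the library's implementation.

```python
def banned_next_words(words, n):
    """Words that, if generated next, would repeat an n-gram already in `words`."""
    if n < 1 or len(words) < n - 1:
        return set()
    # The last n-1 words are the prefix of the n-gram about to be completed.
    prefix = tuple(words[len(words) - (n - 1):])
    banned = set()
    # Every earlier occurrence of that prefix bans the word that followed it.
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n - 1]) == prefix:
            banned.add(words[i + n - 1])
    return banned


history = "I think that the idea is a stretch . I think".split()
print(banned_next_words(history, 3))  # {'that'}
```

During decoding, a banned word's probability is set to zero before the next word is chosen. The loop above would then have to continue "I think" with something other than "that".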

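## 5. Named entity recognition
Find entities such as person names, place names, and organizations in a piece of text. Which entity types are available depends on the model. The default checkpoint used here was fine-tuned on CoNLL-2003, whose tag set covers persons (PER), locations (LOC), organizations (ORG), and miscellaneous entities (MISC), so other items such as phone numbers would need a different model.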

In [None]:
from transformers import pipeline
ner_pipe = pipeline("ner")
sequence = '''
  Hugging Face Inc. is a compy based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window.
'''
for entity in ner_pipe(sequence):
  print(entity)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


{'entity': 'I-ORG', 'score': 0.99942017, 'index': 1, 'word': 'Hu', 'start': 3, 'end': 5}
{'entity': 'I-ORG', 'score': 0.98855025, 'index': 2, 'word': '##gging', 'start': 5, 'end': 10}
{'entity': 'I-ORG', 'score': 0.99820006, 'index': 3, 'word': 'Face', 'start': 11, 'end': 15}
{'entity': 'I-ORG', 'score': 0.99952805, 'index': 4, 'word': 'Inc', 'start': 16, 'end': 19}
{'entity': 'I-LOC', 'score': 0.99940395, 'index': 12, 'word': 'New', 'start': 41, 'end': 44}
{'entity': 'I-LOC', 'score': 0.9993794, 'index': 13, 'word': 'York', 'start': 45, 'end': 49}
{'entity': 'I-LOC', 'score': 0.9993648, 'index': 14, 'word': 'City', 'start': 50, 'end': 54}
{'entity': 'I-LOC', 'score': 0.986231, 'index': 20, 'word': 'D', 'start': 80, 'end': 81}
{'entity': 'I-LOC', 'score': 0.94759804, 'index': 21, 'word': '##UM', 'start': 81, 'end': 83}
{'entity': 'I-LOC', 'score': 0.9357034, 'index': 22, 'word': '##BO', 'start': 83, 'end': 85}
{'entity': 'I-LOC', 'score': 0.9844134, 'index': 29, 'word': 'Manhattan', 's
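The default output is one entry per sub-word token (`Hu`, `##gging`, ...). The pipeline can merge them into whole entities itself via `pipeline("ner", aggregation_strategy="simple")`. To show what such grouping involves, here is a simplified pure-Python sketch, not the library's algorithm. It merges consecutive tokens that share an entity type and are at most one character apart, averages their scores, and reads the entity text back from `sequence` using the `start`/`end` offsets. The token dicts are the first ten results printed above.

```python
# Same input string as in the cell above: a newline and two spaces precede
# "Hugging", which is why the first offset is 3.
sequence = (
    "\n  Hugging Face Inc. is a compy based in New York City. "
    "Its headquarters are in DUMBO, therefore very close to the "
    "Manhattan Bridge which is visible from the window.\n"
)

# First ten token-level results copied from the output above.
tokens = [
    {"entity": "I-ORG", "score": 0.99942017, "start": 3, "end": 5},
    {"entity": "I-ORG", "score": 0.98855025, "start": 5, "end": 10},
    {"entity": "I-ORG", "score": 0.99820006, "start": 11, "end": 15},
    {"entity": "I-ORG", "score": 0.99952805, "start": 16, "end": 19},
    {"entity": "I-LOC", "score": 0.99940395, "start": 41, "end": 44},
    {"entity": "I-LOC", "score": 0.9993794, "start": 45, "end": 49},
    {"entity": "I-LOC", "score": 0.9993648, "start": 50, "end": 54},
    {"entity": "I-LOC", "score": 0.986231, "start": 80, "end": 81},
    {"entity": "I-LOC", "score": 0.94759804, "start": 81, "end": 83},
    {"entity": "I-LOC", "score": 0.9357034, "start": 83, "end": 85},
]


def group_entities(tokens, text, max_gap=1):
    groups = []
    for tok in tokens:
        etype = tok["entity"].split("-")[-1]  # "I-ORG" -> "ORG"
        last = groups[-1] if groups else None
        if last is not None and last["type"] == etype and tok["start"] - last["end"] <= max_gap:
            # Same type and directly adjacent (or one space apart): extend the entity.
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            groups.append({"type": etype, "start": tok["start"],
                           "end": tok["end"], "scores": [tok["score"]]})
    return [
        {"entity_group": g["type"],
         "word": text[g["start"]:g["end"]],
         "score": sum(g["scores"]) / len(g["scores"])}  # mean token score
        for g in groups
    ]


for ent in group_entities(tokens, sequence):
    print(ent["entity_group"], repr(ent["word"]), round(ent["score"], 3))
```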

## 6. Text summarization

In [None]:
from transformers import pipeline
summarizer = pipeline("summarization")
article = """
When kids return to school this month, all Berkeley students will have the option to eat school meals for free, regardless of family income.
It’s the third year that Berkeley will offer free meals to all its students. The program started in the spring of 2020, when the school district began offering grab-and-go meals for students learning remotely.
Then in 2021, as students returned to the classroom, California became the first state to offer universal free meals. Borne out of a state budget surplus, the program has stuck around. Advocates say universal meals help reduce the stigma of free lunch and reach families in need who did not qualify under federal guidelines.
Last year, a quarter of students in Berkeley schools qualified for free or reduced lunch, though the number varies by school.
In 2010, Berkeley made school meals free for students who qualify for reduced-price lunch, concerned that families could not always afford it and citing reports that students are less likely to purchase the reduced-price meals toward the end of the month.
Families do not need to fill out an application for children to get free school meals, though filling out an application may qualify them for other benefits.
This summer, all children under 18 can pick up free meals at Berkeley Arts Magnet, Rosa Parks Elementary and Oxford Elementary. They do not need to be a BUSD student to be eligible. The program ends Aug. 11.
"""
summarizer(article, max_length=130, min_length=20, do_sample=False)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': " Berkeley students will be able to eat school meals for free this summer . It's the third year free meals have been offered to all students . Last year, a quarter of students qualified for free meals ."}]
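The generated summary above starts with a space and has a space before each period (`summer .`). If that matters for display, a small regex clean-up can be applied to the `summary_text` field afterwards. This is an optional post-processing step chosen for this sketch, not something the pipeline does.

```python
import re


def tidy_summary(text: str) -> str:
    """Drop whitespace in front of punctuation and trim both ends."""
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    return text.strip()


# summary_text copied from the output above
raw = (" Berkeley students will be able to eat school meals for free this summer . "
       "It's the third year free meals have been offered to all students . "
       "Last year, a quarter of students qualified for free meals .")
print(tidy_summary(raw))
```

In the notebook this would be applied as `tidy_summary(summarizer(article, ...)[0]["summary_text"])`.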

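## 7. Translation
English-to-Chinese translation cannot be run through `pipeline` by task name alone. There is no default model for that language pair, so a model has to be supplied explicitly, as in the model-replacement section at the end. This example therefore translates English into French, for which the default `t5-base` model is used.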

In [None]:
from transformers import pipeline
translator = pipeline("translation_en_to_fr")
sentence = "When kids return to school this month, all Berkeley students will have the option to eat school meals for free, regardless of family income."
translator(sentence, max_length=200)

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


[{'translation_text': "Lorsque les enfants retourneront à l'école ce mois-ci, tous les élèves de Berkeley auront la possibilité de manger gratuitement à l'école, quel que soit leur revenu familial."}]
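T5 is a multi-task text-to-text model that learned which task to perform from a text prefix. The `t5-base` config stores one prefix per task in `task_specific_params`. For `translation_en_to_fr` it is `"translate English to French: "`, which the pipeline prepends to the input before generating. The hypothetical helpers below mimic that step: they parse a `translation_XX_to_YY` task name and build the prefixed input. The language-name table is an assumption for illustration and only lists the English-to-German/French/Romanian directions that the `t5-base` config provides prefixes for.

```python
# Hypothetical table: language codes -> names as they appear in T5's prefixes.
LANG_NAMES = {"en": "English", "de": "German", "fr": "French", "ro": "Romanian"}


def parse_translation_task(task: str):
    """Split a task name such as 'translation_en_to_fr' into ('en', 'fr')."""
    head = "translation_"
    if not task.startswith(head) or "_to_" not in task:
        raise ValueError(f"not a translation_XX_to_YY task: {task!r}")
    src, tgt = task[len(head):].split("_to_", 1)
    return src, tgt


def t5_input(task: str, text: str) -> str:
    """Build the prefixed text that T5 actually receives for a translation task."""
    src, tgt = parse_translation_task(task)
    return f"translate {LANG_NAMES[src]} to {LANG_NAMES[tgt]}: {text}"


print(t5_input("translation_en_to_fr", "Hello."))  # translate English to French: Hello.
```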

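## Replacing the model
Instead of relying on a task's default, you can load a specific model and tokenizer from the Hub and hand them to `pipeline`.

Chinese-to-English translation: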

In [3]:
# !pip install SentencePiece accelerate sacremoses
# !pip install transformers
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

translator = pipeline(
    task="translation_zh_to_en",
    model=model,
    tokenizer=tokenizer,
)
sentence = "我叫李航，我住在拉萨"
translator(sentence, max_length = 20)



[{'translation_text': 'My name is Li Sheng. I live in Lassard.'}]

English-to-Chinese translation:

In [4]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

translator = pipeline(
    task="translation_en_to_zh",  # task name matches the en->zh checkpoint
    model=model,
    tokenizer=tokenizer,
)
sentence = "My name is Li Sheng. I live in Lassard."
translator(sentence, max_length = 20)

[{'translation_text': '我叫李胜,住在拉萨'}]
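The two cells above differ only in the language pair, and it is easy to let the task name and the checkpoint drift apart (for example, an `en-zh` model paired with a `translation_zh_to_en` task). The hypothetical helper below derives both from one `(src, tgt)` pair. It assumes the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming used by the two checkpoints above. Not every pair exists on the Hub, so the generated name should still be checked before use.

```python
def opus_mt(src: str, tgt: str):
    """Return a (task, checkpoint) pair for one OPUS-MT translation direction."""
    task = f"translation_{src}_to_{tgt}"
    checkpoint = f"Helsinki-NLP/opus-mt-{src}-{tgt}"
    return task, checkpoint


task, checkpoint = opus_mt("en", "zh")
print(task, checkpoint)  # translation_en_to_zh Helsinki-NLP/opus-mt-en-zh

# With transformers installed and network access, the pair can be passed on:
# from transformers import pipeline
# translator = pipeline(task, model=checkpoint)
```

Passing the checkpoint name as `model=` lets `pipeline` load both the model and its tokenizer. The explicit `AutoModelForSeq2SeqLM`/`AutoTokenizer` calls above are only needed when you want to work with those objects directly.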