## The ultimate guide to Hugging Face

### Pipeline


In [None]:
# Sentiment Analysis
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(["I am suffering from Cancer", "My healthi is not good", "I am not saying I don't have Cancer"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9983137845993042},
 {'label': 'NEGATIVE', 'score': 0.9997236132621765},
 {'label': 'POSITIVE', 'score': 0.9991331696510315}]
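
The warning printed above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the checkpoint name and revision hash from that warning. The helper name `build_classifier` is hypothetical, and the import sits inside it so that nothing is downloaded until the helper is actually called:

```python
# Checkpoint name and revision copied from the warning printed by the cell above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_classifier():
    """Hypothetical helper: build the sentiment pipeline with a pinned checkpoint."""
    # Imported lazily so the model is only fetched when the helper is called.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)
```

Calling `build_classifier()(["I am suffering from Cancer"])` should then behave like the cell above, without depending on whatever default the library ships.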

In [None]:
# Zero-shot classification pipeline
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "I am suffering from cold and cough",
    candidate_labels=["Health", "Season", "Human", "Education", "Medical"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'I am suffering from cold and cough',
 'labels': ['Health', 'Human', 'Medical', 'Season', 'Education'],
 'scores': [0.3902086615562439,
  0.347473680973053,
  0.156494602560997,
  0.08563731610774994,
  0.020185789093375206]}
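
The returned labels are sorted by descending score, and in the default single-label mode the scores form a distribution over the candidate labels. A small sketch of reading the result, using the output copied from the cell above; the `THRESHOLD` value is an arbitrary assumption for illustration:

```python
# Output copied from the zero-shot classification cell above.
result = {
    "sequence": "I am suffering from cold and cough",
    "labels": ["Health", "Human", "Medical", "Season", "Education"],
    "scores": [
        0.3902086615562439,
        0.347473680973053,
        0.156494602560997,
        0.08563731610774994,
        0.020185789093375206,
    ],
}

# Labels are sorted by score, so the first entry is the top prediction.
top_label, top_score = result["labels"][0], result["scores"][0]

# The scores should add up to roughly 1 across the candidate labels.
total = sum(result["scores"])

# Keep every label whose score clears a (hypothetical) threshold.
THRESHOLD = 0.3
confident = [
    label
    for label, score in zip(result["labels"], result["scores"])
    if score >= THRESHOLD
]

print(top_label, round(total, 6), confident)
```

Here the two top labels are close together, which suggests the model is not strongly separating "Health" from "Human" for this sentence.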

In [None]:
# Text generation pipeline
from transformers import pipeline

generator = pipeline("text-generation")
generator("I am not feeling well")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I am not feeling well. The first three times I have been going for about 15 minutes but have not been able to finish. There is no sleep.\n\nI am very weak in the left side of my knees. The ligaments in the'}]

In [None]:
# Text generation with an explicitly chosen model (distilgpt2)
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "I am Badal",
    max_length=30,
    num_return_sequences=2,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I am Badalaya (born 17 January 1951 in Londonderry, UK) and his father, S.N.R. (19'},
 {'generated_text': 'I am Badal, is in a relationship with you. When I am married on this date I had no sex with you. I am happy for'}]

In [None]:
# Fill-mask
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("head-ache <mask> pain", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.14543795585632324,
  'token': 7050,
  'token_str': ' chest',
  'sequence': 'head-ache chest pain'},
 {'score': 0.10714221000671387,
  'token': 337,
  'token_str': 'al',
  'sequence': 'head-acheal pain'}]
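
The second prediction, `'al'`, has no leading space, which is why it is glued onto the previous word in `'head-acheal pain'`. RoBERTa-style tokenizers keep the leading space as part of the token, so a token that starts with a space begins a new word and one that does not continues the previous word. A rough sketch of that behaviour, using the predictions from the output above; the `fill` helper is hypothetical and only approximates what the pipeline does:

```python
template = "head-ache <mask> pain"

# token_str and score values copied from the fill-mask output above.
predictions = [
    {"token_str": " chest", "score": 0.14543795585632324},
    {"token_str": "al", "score": 0.10714221000671387},
]


def fill(text: str, token_str: str) -> str:
    """Hypothetical helper approximating how the mask is filled.

    Simplifying assumption: the mask absorbs the space in front of it, so the
    predicted token contributes its own leading space (or none at all).
    """
    return text.replace(" <mask>", token_str)


filled = []
for pred in predictions:
    starts_word = pred["token_str"].startswith(" ")
    kind = "starts a new word" if starts_word else "continues the previous word"
    filled.append(fill(template, pred["token_str"]))
    print(repr(filled[-1]), "->", kind)
```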

In [None]:
# NER pipeline to identify entities such as persons, organizations, or locations in a sentence

from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("Hi, my name is Anushka, I have been suffering from back pain for nearly 2 days now. I am not able to sit, walk, I am in deep trouble. Kindly help me out")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.94391847,
  'word': 'Anushka',
  'start': 15,
  'end': 22}]
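
The `start` and `end` fields are character offsets into the input string, so each entity can be recovered by slicing. A short sketch using the output above; the `MIN_SCORE` cut-off is an assumed value for illustration:

```python
text = (
    "Hi, my name is Anushka, I have been suffering from back pain for nearly "
    "2 days now. I am not able to sit, walk, I am in deep trouble. Kindly help me out"
)

# Entity copied from the NER output above.
entities = [
    {"entity_group": "PER", "score": 0.94391847, "word": "Anushka", "start": 15, "end": 22}
]

MIN_SCORE = 0.9  # hypothetical confidence cut-off

spans = []
for ent in entities:
    if ent["score"] < MIN_SCORE:
        continue  # skip low-confidence entities
    # Slice the original text with the character offsets.
    spans.append((ent["entity_group"], text[ent["start"]:ent["end"]]))

print(spans)
```

Note that only the person name is tagged: the default checkpoint is fine-tuned on CoNLL-03, so it does not recognise medical terms such as "back pain" as entities.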

In [None]:
# Question-answering

from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question = "Where medical issues the person is having?",
    context = "I am suffering from severe back pain and abdominal pain"
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.5107333660125732,
 'start': 20,
 'end': 55,
 'answer': 'severe back pain and abdominal pain'}

In [None]:
# Summarization

from transformers import pipeline

summarizer = pipeline("summarization")
summarizer("""
    Hi, my name is Sara, I am pursuing bacherlor's degree in Computer Science. From last three times, my period cycle has showing massive changes. My gap has changed abnormally, kindly help me out
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Your max_length is set to 142, but your input_length is only 51. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=25)


[{'summary_text': " Sara's period cycle has showing massive changes . From last three times, my period cycles have shown massive changes. My gap has changed abnormally, kindly help me out here . I am pursuing bacherlor's degree in Computer Science. Please help us out of this situation ."}]

In [None]:
# Translation

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est")

[{'translation_text': 'This course is'}]

### The carbon footprint of Transformers

Training Transformers produces carbon emissions. Some ways to reduce them:
 - Fine-tune a pretrained model instead of training from scratch whenever possible
 - Do a literature review to choose sensible hyperparameter ranges
 - Start with smaller experiments and debug before scaling up
 - Prefer random search over grid search for hyperparameter tuning
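
To make the last point concrete: grid search trains one model per combination of values, while random search trains a fixed number of sampled configurations. A minimal sketch in plain Python; the search space, the trial budget of 8, and the helper names are all assumptions chosen for illustration:

```python
import itertools
import random

# Hypothetical search space, not taken from any particular experiment.
search_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "weight_decay": [0.0, 0.01, 0.1],
}


def grid_search(space):
    """Enumerate every combination of hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def random_search(space, n_trials, seed=0):
    """Sample a fixed number of configurations, one value per hyperparameter."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()} for _ in range(n_trials)]


grid_trials = grid_search(search_space)
random_trials = random_search(search_space, n_trials=8)

# Each trial corresponds to one training run, so fewer trials means less compute.
print(len(grid_trials), "grid runs vs", len(random_trials), "random runs")
```

The grid grows multiplicatively with every added hyperparameter, whereas the random-search budget stays whatever you set it to.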

### Transfer learning

Transfer learning is the act of initializing a model with another model's weights. Examples: vision models pretrained on ImageNet, and pretrained language models in NLP.
Usually, it is applied by throwing away the head of the pretrained model while keeping its body.
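
The "keep the body, replace the head" idea can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: the nested-list "weights", the layer sizes, and the `init_layer` and `transfer` helpers are hypothetical stand-ins, not a real framework API:

```python
import copy
import random


def init_layer(n_in, n_out, rng):
    """Return a randomly initialized weight matrix with n_out rows of n_in values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)

# Toy stand-in for a pretrained model: a body plus a head for 1000 source classes.
pretrained = {
    "body": {"layer1": init_layer(4, 8, rng), "layer2": init_layer(8, 8, rng)},
    "head": init_layer(8, 1000, rng),
}


def transfer(pretrained_model, num_new_classes, rng):
    """Keep the pretrained body, discard the head, attach a fresh head."""
    hidden_size = len(pretrained_model["head"][0])  # input width of the old head
    return {
        "body": copy.deepcopy(pretrained_model["body"]),        # weights reused as-is
        "head": init_layer(hidden_size, num_new_classes, rng),  # randomly re-initialized
    }


new_model = transfer(pretrained, num_new_classes=2, rng=rng)
print(len(pretrained["head"]), "->", len(new_model["head"]), "output classes")
```

Only the new head starts from random values, so fine-tuning on the target task begins from the knowledge already stored in the body.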