In [3]:
#!pip install transformers datasets accelerate

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


The `pipeline()` function lets us run any Hugging Face model for inference in a few lines of code.

[Tutorial](https://huggingface.co/docs/transformers/v4.28.1/en/pipeline_tutorial)

[Details on API usage](https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#transformers.pipeline)

[Details on NLP Task Pipelines](https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#natural-language-processing)


# Default pipeline

`pipeline()` is the easiest and fastest way to use a pretrained model for inference. It works out of the box for many tasks across different modalities. [Details](https://huggingface.co/docs/transformers/v4.28.1/en/quicktour)


## Sentiment Classification

In [73]:
from transformers import pipeline
# With no model specified, pipeline() downloads and caches a default pretrained
# model and tokenizer for sentiment analysis.
# device_map="auto" is omitted here: it requires the accelerate package, and the
# install line at the top of this notebook is commented out.
classifier = pipeline(task="sentiment-analysis")
# Now you can use the classifier on your target text:
classifier("This movie is really good.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
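
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, reusing the model name and revision hash printed in the log; the names `PINNED_SENTIMENT` and `load_pinned_classifier` are hypothetical helpers, not part of the library, and the import is kept inside the function so the configuration can be inspected without loading transformers:

```python
# Model name and revision are copied from the log message above.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def load_pinned_classifier(config=PINNED_SENTIMENT):
    """Build a pipeline from an explicit task/model/revision (hypothetical helper)."""
    # Imported lazily so that defining the config does not require transformers.
    from transformers import pipeline

    return pipeline(**config)
```

Calling `load_pinned_classifier()("This movie is really good.")` should then run the same checkpoint on every machine, rather than whatever default the installed library version selects.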

## Summarization

In [62]:
from transformers import pipeline
summarizer = pipeline(task="summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [63]:
text = """
The Federal Reserve on Friday released its assessment of what led to Silicon Valley Bank's 
collapse, saying the lender's failure was due to a "textbook case of mismanagement" 
while taking some responsibility for insufficient supervision of the institution.
The report details the bank’s rapid growth, the challenges Fed supervisors faced in 
identifying SVB's vulnerabilities and their reluctance to force the bank to fix them. 
The review was led by Fed Vice Chair for Supervision Michael S. Barr, who wrote in a 
letter summarizing the report that SVB "failed because of a textbook case of 
mismanagement by the bank. Its senior leadership failed to manage basic 
interest rate and liquidity risk. Its board of directors failed to oversee senior 
leadership and hold them accountable."
The report said SVB's leadership did not fully appreciate the bank's vulnerabilities, 
pointing to foundational and widespread managerial weaknesses, its highly concentrated 
business model catering overwhelmingly to the venture capital community, and its reliance 
on uninsured deposits which left the bank "acutely exposed to rising interest rates" 
amid a slowdown in the tech sector.
"""

In [64]:
summarizer(text)

[{'summary_text': " Federal Reserve releases its assessment of what led to Silicon Valley Bank's collapse . Report details the bank’s rapid growth, the challenges Fed supervisors faced in identifying SVB's vulnerabilities and their reluctance to force the bank to fix them . Review was led by Fed Vice Chair for Supervision Michael S. Barr, who wrote in a letter summarizing the report ."}]
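
The call above uses the model's default generation settings. Summary length can usually be steered by passing generation arguments through the pipeline call. The sketch below wraps that in a hypothetical helper, `summarize`; the length values are illustrative assumptions, not tuned settings:

```python
def summarize(summarizer, text, min_length=30, max_length=80):
    """Return a single summary string (hypothetical helper).

    `summarizer` is a summarization pipeline, or any callable with the same
    interface. Lengths are counted in tokens, not characters.
    """
    outputs = summarizer(
        text,
        min_length=min_length,
        max_length=max_length,
        do_sample=False,   # deterministic decoding
        truncation=True,   # truncate inputs longer than the model accepts
    )
    # The pipeline returns a list with one dict per input text.
    return outputs[0]["summary_text"].strip()
```

With the objects defined in this notebook, `summarize(summarizer, text, max_length=60)` would request a shorter summary than the default.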

## Text Generation

In [60]:
from transformers import pipeline
txt_gen = pipeline(task="text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [65]:
txt_gen("The capital of India is")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The capital of India is now the second-largest economy of the world and has become a major centre of global commerce. With the expansion of Indian technology and manufacturing, the growing presence of international businesses and a rising middle class, the country is poised to'}]
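
Generation can be controlled in the same way. A minimal sketch, assuming a hypothetical helper `generate_continuations`; the default `pad_token_id` of 50256 is GPT-2's end-of-sequence id taken from the log message above, so it would need changing for other models:

```python
def generate_continuations(generator, prompt, n=3, max_new_tokens=20,
                           pad_token_id=50256):
    """Sample `n` continuations of `prompt` (hypothetical helper).

    `generator` is a text-generation pipeline, or any callable with the
    same interface.
    """
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        num_return_sequences=n,         # number of samples to return
        do_sample=True,                 # sampling is needed for distinct outputs
        pad_token_id=pad_token_id,      # set explicitly to avoid the notice above
    )
    return [o["generated_text"] for o in outputs]
```

For example, `generate_continuations(txt_gen, "The capital of India is", n=2)` would return two sampled strings, each beginning with the prompt.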

# Pipeline with specific models


## Sentiment Classification

In [66]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# find models for tasks at: https://huggingface.co/models
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [67]:
from transformers import pipeline
# Reuse the model and tokenizer objects loaded above instead of loading them again by name.
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

## Example usage over entire dataset

In [68]:
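# find datasets for tasks at: https://huggingface.co/datasets
from datasets import load_dataset
# Note: the SST-2 test split has hidden labels (all -1); use split="validation" if gold labels are needed.
dataset = load_dataset("sst2", split="test")
text, labels = dataset["sentence"], dataset["label"]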



In [69]:
sst2_results = classifier(text)
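
The classifier returns one `{'label', 'score'}` dict per sentence. A minimal sketch of scoring those predictions against dataset labels, assuming the SST-2 convention of 0 for negative and 1 for positive; `LABEL_TO_ID` and `accuracy` are hypothetical names introduced here:

```python
# Maps the model's string labels to SST-2 integer labels (0 = negative, 1 = positive).
LABEL_TO_ID = {"NEGATIVE": 0, "POSITIVE": 1}


def accuracy(results, labels):
    """Fraction of predictions matching the gold labels (hypothetical helper).

    Examples whose gold label is not 0 or 1 (for instance the hidden -1
    labels of the test split) are skipped.
    """
    pairs = [
        (LABEL_TO_ID[r["label"]], gold)
        for r, gold in zip(results, labels)
        if gold in (0, 1)
    ]
    if not pairs:
        raise ValueError("no labelled examples to score")
    return sum(pred == gold for pred, gold in pairs) / len(pairs)
```

Because the test split's labels are hidden, `accuracy(sst2_results, labels)` is only meaningful on a labelled split such as `split="validation"`. For large datasets, passing `batch_size` to the classifier call (for example `classifier(text, batch_size=32)`) may speed up inference.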

## Question-Answering

In [71]:
# Question answering pipeline, specifying the checkpoint identifier
oracle = pipeline(
    task="question-answering",
    model="distilbert-base-cased-distilled-squad",
    tokenizer="bert-base-cased",
)

In [4]:
oracle
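
Evaluating `oracle` on its own only displays the pipeline object. To get an answer, the pipeline is called with a question and a context passage. A minimal sketch, using a hypothetical helper `ask` whose `min_score` threshold is an illustrative assumption:

```python
def ask(qa_pipeline, question, context, min_score=0.0):
    """Return the extracted answer, or None if the score is too low (hypothetical helper).

    `qa_pipeline` is a question-answering pipeline, or any callable with the
    same interface.
    """
    # For a single question the pipeline returns one dict with the keys
    # 'score', 'start', 'end' and 'answer'.
    result = qa_pipeline(question=question, context=context)
    if result["score"] < min_score:
        return None
    return result["answer"]
```

With the objects in this notebook, `ask(oracle, "Who led the review?", text)` would search the Silicon Valley Bank passage defined earlier for an answer span.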