In [2]:
!pip install torch



In [3]:
!pip install --upgrade torch



In [4]:
!pip install -q transformers

In [5]:
!pip install datasets evaluate transformers[sentencepiece]



In [6]:
!pip install sacremoses



Please **restart the kernel** after running the above installs.

### Importing Required Libraries

_We recommend you import all required libraries in one place (here):_

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModel



# **Let's Practice**


### Exercise 1 - Sentiment Analysis
For sentiment analysis, we can also use a specific model that is better suited to our use case by passing its name. For example, if we want a sentiment analysis model for tweets, we can specify the model id "cardiffnlp/twitter-roberta-base-sentiment". This model has been trained on ~58M tweets and fine-tuned for sentiment analysis on the "TweetEval" benchmark.
The output labels for this model are: 0 -> Negative; 1 -> Neutral; 2 -> Positive.

In this Exercise, use the "cardiffnlp/twitter-roberta-base-sentiment" model, pre-trained on tweets, to analyze any tweet of your choice. Optionally, run the default model (used in Example 1) on the same tweet to see whether the result changes.


In [3]:
# TODO
classifier = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")



In [4]:
# TODO
classifier("Taped up and dripped out. Know what I'm talking bout #paint #painting #diy #homeimprovement…")

[{'label': 'LABEL_1', 'score': 0.7315167784690857}]

In [5]:
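# Same model again, loaded without naming the task (the pipeline infers it from the model's Hub metadata)
specific_model = pipeline(model="cardiffnlp/twitter-roberta-base-sentiment")
data = "Taped up and dripped out. Know what I'm talking bout #paint #painting #diy #homeimprovement…"
specific_model(data)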

[{'label': 'LABEL_1', 'score': 0.7315167784690857}]
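The pipeline reports this model's classes as raw ids (`LABEL_0`, `LABEL_1`, `LABEL_2`) rather than by name, so the `LABEL_1` above corresponds to Neutral under the mapping stated at the top of this exercise. A minimal sketch of translating those ids, assuming that 0/1/2 mapping; `LABEL_NAMES` and `readable_labels` are hypothetical helpers, not part of `transformers`. Labels outside the mapping (for instance `POSITIVE`/`NEGATIVE` from the default sentiment model) are passed through unchanged.

```python
# Mapping stated for cardiffnlp/twitter-roberta-base-sentiment:
# 0 -> Negative, 1 -> Neutral, 2 -> Positive (hypothetical helper table)
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive"}


def readable_labels(predictions):
    """Return a copy of pipeline predictions with LABEL_n ids replaced by names.

    Labels not found in LABEL_NAMES are kept as-is.
    """
    return [
        {**pred, "label": LABEL_NAMES.get(pred["label"], pred["label"])}
        for pred in predictions
    ]


# Usage with the output printed above
print(readable_labels([{"label": "LABEL_1", "score": 0.7315167784690857}]))
# prints [{'label': 'Neutral', 'score': 0.7315167784690857}]
```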

### Exercise 2 - Topic Classification
In this Exercise, classify any sentence of your choice under classes/topics of your choice. Use the "zero-shot-classification" task and specify model="facebook/bart-large-mnli".


In [6]:
# TODO
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
classifier(
    "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.",
    candidate_labels=["mobile", "website", "billing", "account access"],
)

{'sequence': 'Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.',
 'labels': ['mobile', 'account access', 'billing', 'website'],
 'scores': [0.9600788354873657,
  0.016832027584314346,
  0.014393891207873821,
  0.008695190772414207]}
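The zero-shot pipeline recasts classification as natural language inference: each candidate label is inserted into a hypothesis (the pipeline's default template is `"This example is {}."`), the NLI model scores whether the input sentence entails each hypothesis, and in the default single-label mode the entailment logits are softmaxed across the candidates, which is why the four scores above sum to 1. The sketch below mimics only the hypothesis-building and ranking steps; `build_hypotheses` and `rank_labels` are hypothetical helpers, and the logits are placeholder values, not real outputs of `facebook/bart-large-mnli`.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # default template of the zero-shot pipeline


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Create one NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax entailment logits across labels and sort, as in single-label mode."""
    max_logit = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


labels = ["mobile", "website", "billing", "account access"]
print(build_hypotheses(labels))
# prints ['This example is mobile.', 'This example is website.', 'This example is billing.', 'This example is account access.']

# Hypothetical entailment logits, one per label (placeholders, not model output)
print(rank_labels(labels, [3.0, -1.0, -0.5, 0.0]))
```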

### Exercise 3 - Text Generation Models

In this Exercise, use the 'text-generation' task and the 'gpt2' model to complete any sentence. Set the number of returned sequences to any value you like.


In [9]:
# TODO
generator = pipeline("text-generation", model="gpt2")
generator(
    "Natural resources can be utilized in ways",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Natural resources can be utilized in ways that help us make some money," says Ms. Jafari. "We want to build those opportunities into our'},
 {'generated_text': 'Natural resources can be utilized in ways that allow for a better local economy, while at the same time minimizing the potential for corruption and exploitation.\n\n'}]
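Each `generated_text` above repeats the prompt before the model's continuation. Note also that `max_length=30` counts the prompt tokens as well; `max_new_tokens` limits only the newly generated ones. If only the continuation is wanted, the text-generation pipeline accepts `return_full_text=False`; alternatively, results that were already returned can be post-processed. A minimal sketch of the latter; `continuations` is a hypothetical helper, and the sample input reuses the second output shown above.

```python
def continuations(results, prompt):
    """Strip the prompt from each `generated_text`, keeping only what the model added."""
    texts = []
    for item in results:
        text = item["generated_text"]
        # By default the pipeline echoes the prompt at the start of each sequence
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


prompt = "Natural resources can be utilized in ways"
results = [
    {"generated_text": prompt + " that allow for a better local economy, while at the same time "
                                "minimizing the potential for corruption and exploitation.\n\n"},
]
print(continuations(results, prompt))
# prints ['that allow for a better local economy, while at the same time minimizing the potential for corruption and exploitation.']
```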

### Exercise 4 - Named Entity Recognition
In this Exercise, extract person, location and organization entities from any sentence of your choice using the Named Entity Recognition task; specify the model as "Jean-Baptiste/camembert-ner".


In [7]:
# TODO
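# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
# (it replaces the deprecated grouped_entities=True)
ner = pipeline("ner", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")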
ner("In the morning at Dian Nuswantoro University, the excitement of the cultural degree event organized by the Cultural Student Association was starting to get busy. The event was officially opened by the dean of the cultural sciences faculty, Reno Wicaksono.")


[{'entity_group': 'LOC',
  'score': 0.705639,
  'word': 'Dian Nuswantoro University',
  'start': 17,
  'end': 44},
 {'entity_group': 'ORG',
  'score': 0.9890928,
  'word': 'Cultural Student Association',
  'start': 106,
  'end': 135},
 {'entity_group': 'PER',
  'score': 0.99900717,
  'word': 'Reno Wicaksono',
  'start': 239,
  'end': 254}]
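With an aggregation strategy set, the pipeline returns one dict per entity span, so collecting names per type is a single pass over the result. Note that each `start` offset above points at the space before the entity (17 rather than 18 for "Dian Nuswantoro University"), so strip the text if you slice the input with it. A minimal sketch; `entities_by_type` and its `min_score` threshold are hypothetical, not pipeline options.

```python
from collections import defaultdict


def entities_by_type(entities, min_score=0.5):
    """Group aggregated NER results into {entity_group: [words]} above a score threshold."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Entities as printed above (offsets omitted)
entities = [
    {"entity_group": "LOC", "score": 0.705639, "word": "Dian Nuswantoro University"},
    {"entity_group": "ORG", "score": 0.9890928, "word": "Cultural Student Association"},
    {"entity_group": "PER", "score": 0.99900717, "word": "Reno Wicaksono"},
]
print(entities_by_type(entities))
# prints {'LOC': ['Dian Nuswantoro University'], 'ORG': ['Cultural Student Association'], 'PER': ['Reno Wicaksono']}
print(entities_by_type(entities, min_score=0.9))
# prints {'ORG': ['Cultural Student Association'], 'PER': ['Reno Wicaksono']}
```

Raising the threshold drops the low-confidence LOC tag on the university, which is arguably an organization rather than a location.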

In [8]:
del ner

### Exercise 5 - Question Answering
In this Exercise, use any context passage and a question of your choice to extract information with the "distilbert-base-cased-distilled-squad" model.


In [9]:
# TODO
qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
question = "Which prize did Frederick Buechner create?"
context = """
    The university is the major seat of the Congregation of Holy Cross (albeit not its official headquarters, which are in Rome). 
    Its main seminary, Moreau Seminary, is located on the campus across St. Joseph lake from the Main Building. 
    Old College, the oldest building on campus and located near the shore of St. Mary lake, houses undergraduate seminarians. 
    Retired priests and brothers reside in Fatima House (a former retreat center), Holy Cross House, as well as Columba Hall near the Grotto. 
    The university through the Moreau Seminary has ties to theologian Frederick Buechner. 
    While not Catholic, Buechner has praised writers from Notre Dame and Moreau Seminary created a Buechner Prize for Preaching.
"""
qa_model(question=question, context=context)

{'score': 0.7424728870391846,
 'start': 705,
 'end': 733,
 'answer': 'Buechner Prize for Preaching'}
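The `start` and `end` fields are character offsets into `context`, so `context[start:end]` recovers the answer and the offsets can be used to mark it in place. A minimal sketch using a hypothetical toy context and a hand-built result in the pipeline's output format (not a real model call); `answer_span` and `highlight` are hypothetical helpers.

```python
def answer_span(context, result):
    """Slice the answer out of the context using the pipeline's character offsets."""
    return context[result["start"]:result["end"]]


def highlight(context, result, left="[", right="]"):
    """Return the context with the answer span wrapped in markers."""
    s, e = result["start"], result["end"]
    return context[:s] + left + context[s:e] + right + context[e:]


context = "Moreau Seminary created a Buechner Prize for Preaching."
answer = "Buechner Prize for Preaching"
start = context.find(answer)
# Hand-built result shaped like a question-answering pipeline output (score is a placeholder)
result = {"score": 0.0, "start": start, "end": start + len(answer), "answer": answer}

print(answer_span(context, result) == result["answer"])  # prints True
print(highlight(context, result))
# prints Moreau Seminary created a [Buechner Prize for Preaching].
```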

### Exercise 6 - Text Summarization
In this Exercise, summarize any document/paragraph of your choice using the "sshleifer/distilbart-cnn-12-6" model.


In [10]:
# TODO
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
summarizer(
    """
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris.
Its base is square, measuring 125 metres (410 ft) on each side.
During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.
It was the first structure to reach a height of 300 metres.
Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft).
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.

"""
)

[{'summary_text': ' The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building . It was the first structure to reach a height of 300 metres . It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft)'}]
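The summary above starts with a space and has a stray space before each sentence-ending period (`building .`), a formatting quirk of this model's output. A minimal regex clean-up sketch; `tidy_summary` is a hypothetical helper and only removes whitespace before common punctuation marks.

```python
import re


def tidy_summary(text):
    """Remove whitespace before punctuation and trim the ends of a generated summary."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()


raw = (" The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey "
       "building . It was the first structure to reach a height of 300 metres .")
print(tidy_summary(raw))
# prints The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. It was the first structure to reach a height of 300 metres.
```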

In [11]:
del summarizer

### Exercise 7 - Translation
In this Exercise, translate any English sentence of your choice into German using the "translation_en_to_de" task; the model used below is "t5-small".


In [12]:
# TODO
en_de_translator = pipeline("translation_en_to_de", model="t5-small")
en_de_translator("I love eating fried chicken!")

[{'translation_text': 'Ich liebe es, gebratenes Hühner zu essen!'}]
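T5 is a text-to-text model: for `translation_en_to_de`, the pipeline prepends a task prefix taken from the model's configuration (`task_specific_params`), which for `t5-small` is `"translate English to German: "`, before generating. The sketch below mimics only that input-building step; `TASK_PREFIXES` is a hypothetical hand-copied table of those settings rather than values loaded from the model, and `t5_input` is a hypothetical helper.

```python
# Prefixes as stored in t5-small's task_specific_params (copied by hand here)
TASK_PREFIXES = {
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}


def t5_input(task, text):
    """Build the prefixed string T5 receives for a given pipeline task."""
    return TASK_PREFIXES[task] + text


print(t5_input("translation_en_to_de", "I love eating fried chicken!"))
# prints translate English to German: I love eating fried chicken!
```

Because the task is encoded in the input text, the same `t5-small` checkpoint can serve the English-to-French and English-to-Romanian tasks by swapping the prefix.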