# Cracking Open the Hugging Face Transformers Library

Code authored by: Shawhin Talebi <br>
Blog: https://medium.com/towards-data-science/cracking-open-the-hugging-face-transformers-library-350aa0ef0161

### import modules

In [1]:
# Note: `Conversation` exists only in older transformers releases; newer ones removed it
from transformers import pipeline, Conversation
import gradio as gr



### code

#### Sentiment Analysis

In [8]:
# toy example 1
pipeline(task="sentiment-analysis")("Love this!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998745918273926}]

In [9]:
# toy example 2: same task, with the model named explicitly
pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")("Love this!")

[{'label': 'POSITIVE', 'score': 0.9998745918273926}]
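
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A minimal sketch of unpacking that structure follows. It uses the output recorded above as a literal so it runs without downloading a model, and the helper name `describe` is hypothetical.

```python
# Output copied from the cell above; in practice it would come from the pipeline call
result = [{'label': 'POSITIVE', 'score': 0.9998745918273926}]

def describe(prediction):
    """Format one prediction dict as 'LABEL (xx.xx%)'."""
    return f"{prediction['label']} ({prediction['score']:.2%})"

print(describe(result[0]))
```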

#### More Sentiment Analysis

In [10]:
# defining classifier
classifier = pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

In [11]:
classifier("Hate this.")

[{'label': 'NEGATIVE', 'score': 0.9997110962867737}]

In [14]:
# we can also pass in a list to classifier
text_list = ["This is great",
             "Thanks for nothing",
             "You've got to work on your face",
             "You're beautiful, never change!",
             "You are the most beautifal and stuipd i have met"]

classifier(text_list)

[{'label': 'POSITIVE', 'score': 0.9998785257339478},
 {'label': 'POSITIVE', 'score': 0.9680058360099792},
 {'label': 'NEGATIVE', 'score': 0.8776116371154785},
 {'label': 'POSITIVE', 'score': 0.9998120665550232},
 {'label': 'POSITIVE', 'score': 0.9955606460571289}]
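
The predictions come back in the same order as the inputs, so they can be paired with `zip`. Note that the sarcastic "Thanks for nothing" is still labeled POSITIVE with a high score. The sketch below reuses the recorded output and flags low-scoring predictions; the cutoff `CONFIDENCE_THRESHOLD` is an arbitrary, hypothetical value chosen for illustration.

```python
text_list = ["This is great",
             "Thanks for nothing",
             "You've got to work on your face",
             "You're beautiful, never change!",
             "You are the most beautifal and stuipd i have met"]

# Output copied from the cell above
predictions = [{'label': 'POSITIVE', 'score': 0.9998785257339478},
               {'label': 'POSITIVE', 'score': 0.9680058360099792},
               {'label': 'NEGATIVE', 'score': 0.8776116371154785},
               {'label': 'POSITIVE', 'score': 0.9998120665550232},
               {'label': 'POSITIVE', 'score': 0.9955606460571289}]

# Hypothetical cutoff, not a value recommended by the library
CONFIDENCE_THRESHOLD = 0.95

# Pair each input with its prediction (the order is preserved)
for text, pred in zip(text_list, predictions):
    print(f"{pred['label']:<8} {pred['score']:.3f}  {text}")

# Collect the inputs whose top score falls below the cutoff
low_confidence = [text for text, pred in zip(text_list, predictions)
                  if pred['score'] < CONFIDENCE_THRESHOLD]
print(low_confidence)
```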

In [15]:
# if there are multiple target labels, we can return them all
classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=None)



In [19]:
classifier(text_list[0])

[[{'label': 'admiration', 'score': 0.9526104927062988},
  {'label': 'approval', 'score': 0.030472073704004288},
  {'label': 'neutral', 'score': 0.015236252918839455},
  {'label': 'excitement', 'score': 0.00606377562507987},
  {'label': 'gratitude', 'score': 0.005296191666275263},
  {'label': 'joy', 'score': 0.004475211258977652},
  {'label': 'curiosity', 'score': 0.004322331864386797},
  {'label': 'realization', 'score': 0.004089601803570986},
  {'label': 'optimism', 'score': 0.004077218472957611},
  {'label': 'disapproval', 'score': 0.004076561890542507},
  {'label': 'annoyance', 'score': 0.0035287411883473396},
  {'label': 'surprise', 'score': 0.0029730682726949453},
  {'label': 'disappointment', 'score': 0.002734640846028924},
  {'label': 'love', 'score': 0.0026945816352963448},
  {'label': 'amusement', 'score': 0.0024867451284080744},
  {'label': 'confusion', 'score': 0.0023607397451996803},
  {'label': 'pride', 'score': 0.002101337304338813},
  {'label': 'sadness', 'score': 0.0017
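
With `top_k=None` the output is nested: one inner list per input, holding a score for every label. A minimal sketch of keeping only the labels above a cutoff, using the first few recorded entries as sample data. The helper `labels_above` and its default `threshold` are hypothetical.

```python
# First entries of the output above (the full list contains more labels)
result = [[
    {'label': 'admiration', 'score': 0.9526104927062988},
    {'label': 'approval', 'score': 0.030472073704004288},
    {'label': 'neutral', 'score': 0.015236252918839455},
    {'label': 'excitement', 'score': 0.00606377562507987},
]]

def labels_above(scores, threshold=0.01):
    """Return the labels whose score is at least `threshold`."""
    return [item['label'] for item in scores if item['score'] >= threshold]

# result[0] is the list of label scores for the first (and only) input
print(labels_above(result[0]))
```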

#### Summarization

In [20]:
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")



In [25]:
text = """
Hugging Face is an AI company that has become a major hub for open-source machine learning. 
Their platform has 3 major elements which allow users to access and share machine learning resources. 
First, is their rapidly growing repository of pre-trained open-source machine learning models for things such as natural language processing (NLP), computer vision, and more. 
Second, is their library of datasets for training machine learning models for almost any task. 
Third, and finally, is Spaces which is a collection of open-source ML apps.

The power of these resources is that they are community generated, which leverages all the benefits of open source i.e. cost-free, wide diversity of tools, high quality resources, and rapid pace of innovation. 
While these make building powerful ML projects more accessible than before, there is another key element of the Hugging Face ecosystem—their Transformers library.
"""
summarized_text = summarizer(text, min_length=5, max_length=120)[0]['summary_text']
summarized_text

'Hugging Face is an AI company that has become a major hub for open-source machine learning. They have 3 major elements which allow users to access and share machine learning resources.'

In [22]:
classifier(summarized_text)

[[{'label': 'neutral', 'score': 0.9101783633232117},
  {'label': 'approval', 'score': 0.08781369775533676},
  {'label': 'realization', 'score': 0.023256273940205574},
  {'label': 'annoyance', 'score': 0.006623789668083191},
  {'label': 'admiration', 'score': 0.0049810768105089664},
  {'label': 'disapproval', 'score': 0.0047301193699240685},
  {'label': 'optimism', 'score': 0.0033590742386877537},
  {'label': 'disappointment', 'score': 0.0026190048083662987},
  {'label': 'confusion', 'score': 0.001953981351107359},
  {'label': 'excitement', 'score': 0.001241705846041441},
  {'label': 'disgust', 'score': 0.0011407802812755108},
  {'label': 'joy', 'score': 0.0010540130315348506},
  {'label': 'amusement', 'score': 0.0009572383714839816},
  {'label': 'love', 'score': 0.0008871059399098158},
  {'label': 'desire', 'score': 0.0008553271181881428},
  {'label': 'curiosity', 'score': 0.000826106930617243},
  {'label': 'anger', 'score': 0.0007336385897360742},
  {'label': 'caring', 'score': 0.0006

#### Conversational

In [26]:
chatbot = pipeline(model="facebook/blenderbot-400M-distill")



In [34]:
conversation = Conversation("who is the president of the united state?")
conversation = chatbot(conversation)

In [35]:
conversation

Conversation id: a70c9f83-07b0-439d-b4f1-30ab2647ee18
user: who is the president of the united state?
assistant:  The current president is Donald J. Trump. He is the 45th President of the United States.

In [36]:
conversation.add_user_input("Where do you work?")
conversation = chatbot(conversation)

In [37]:
conversation

Conversation id: a70c9f83-07b0-439d-b4f1-30ab2647ee18
user: who is the president of the united state?
assistant:  The current president is Donald J. Trump. He is the 45th President of the United States.
user: Where do you work?
assistant:  I don't work. I am a student. Donald Trump was born in New York City.

### Deploy Chatbot UI

#### Text Sentiment Chatbot

In [42]:
def top3_text_classes(message, history):
    return str(classifier(message)[0][:3]).replace('}, {', '\n').replace('[{', '').replace('}]', '')

demo_sentiment = gr.ChatInterface(top3_text_classes, title="Text Sentiment Chatbot", description="Enter your text, and the chatbot will classify the sentiment.")

demo_sentiment.launch(share=True)

Running on local URL:  http://127.0.0.1:7863

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
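
`top3_text_classes` turns the list of dictionaries into text by stripping brackets with `str.replace`. An alternative sketch builds the string directly from the `label` and `score` fields, which avoids depending on how Python prints dictionaries. The helper `format_top_k` is hypothetical, and the sample data is the start of the classifier output recorded earlier.

```python
# Sample scores for one input, copied from the earlier classifier output
scores = [
    {'label': 'admiration', 'score': 0.9526104927062988},
    {'label': 'approval', 'score': 0.030472073704004288},
    {'label': 'neutral', 'score': 0.015236252918839455},
    {'label': 'excitement', 'score': 0.00606377562507987},
]

def format_top_k(scores, k=3):
    """Return the k highest-scoring labels, one 'label: score' per line."""
    top = sorted(scores, key=lambda item: item['score'], reverse=True)[:k]
    return "\n".join(f"{item['label']}: {item['score']:.3f}" for item in top)

print(format_top_k(scores))
```

Inside the Gradio callback this would be called as `format_top_k(classifier(message)[0])`, assuming the nested output shape shown earlier.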




#### Summarizer Chatbot

In [43]:
def summarizer_bot(message, history):
    return summarizer(message, min_length=5, max_length=140)[0]['summary_text']

demo_summarizer = gr.ChatInterface(summarizer_bot, title="Summarizer Chatbot", description="Enter your text, and the chatbot will return the summarized version.")

demo_summarizer.launch()

Running on local URL:  http://127.0.0.1:7864

To create a public link, set `share=True` in `launch()`.




#### Vanilla Chatbot

In [44]:
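# module-level lists handed to every Conversation as prior context
message_list = []
response_list = []

def vanilla_chatbot(message, history):
    # `history` (supplied by Gradio) is unused; context comes from the lists above
    conversation = Conversation(text=message, past_user_inputs=message_list, generated_responses=response_list)
    conversation = chatbot(conversation)

    # return the most recent model reply
    return conversation.generated_responses[-1]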

demo_chatbot = gr.ChatInterface(vanilla_chatbot, title="Vanilla Chatbot", description="Enter text to start chatting.")

demo_chatbot.launch()

Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
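
`vanilla_chatbot` ignores the `history` argument and relies on module-level lists instead. One alternative is to derive the prior turns from `history` itself. The sketch below assumes Gradio passes `history` as a list of `[user_message, bot_reply]` pairs; this depends on the Gradio version and configuration, so check the format your install uses. The helper `split_history` is hypothetical, and the sample turn is taken from the conversation recorded earlier.

```python
def split_history(history):
    """Split [user, bot] pairs into two parallel lists.

    Assumes each history entry is a two-item pair (an assumption about
    the Gradio history format, not something the notebook confirms).
    """
    past_user_inputs = [user for user, _ in history]
    generated_responses = [bot for _, bot in history]
    return past_user_inputs, generated_responses

# Sample history with one completed turn
history = [["who is the president of the united state?",
            " The current president is Donald J. Trump. He is the 45th President of the United States."]]

past_inputs, past_responses = split_history(history)
print(past_inputs)
print(past_responses)
```

The two lists could then be passed as `past_user_inputs` and `generated_responses` when constructing the `Conversation`, in place of the shared module-level lists.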


