# T5 Paraphrasing with the Hugging Face Transformers Library

<img src="https://frenzy86.s3.eu-west-2.amazonaws.com/python/hugging.png" width="1300">




Hugging Face is an AI and deep learning platform focused on NLP, with the goal of democratizing AI technologies. It has streamlined and simplified the process of applying and fine-tuning pre-trained language models.
Transformers is an open-source library with a model hub that lets users implement state-of-the-art deep learning models based on general-purpose architectures such as BERT, XLM, DistilBERT, and others. It is built on top of PyTorch, TensorFlow, and JAX, and is known for good interoperability between these frameworks.

In [1]:
!pip install transformers --quiet
!pip install sentencepiece --quiet
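
Before loading a model, it can help to confirm that both packages are actually present in the active environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and is not part of Transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "sentencepiece")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Hypothetical helper: report one line per package.
for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If either package is reported as not installed, rerunning the install cell (and restarting the runtime if needed) is usually the first thing to try.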


## Load the tokenizer and the model from Transformers

In [2]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")


In [3]:
def paraphrase(sentence):
    # Prefix the task name and append T5's end-of-sequence token.
    text = "paraphrase: " + sentence + " </s>"

    # Tokenize into PyTorch tensors (input IDs plus attention mask).
    encoding = tokenizer(text, padding=True, return_tensors="pt")
    input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

    # Sample a single paraphrase with top-k and top-p (nucleus) sampling.
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=256,
        do_sample=True,
        top_k=120,
        top_p=0.95,
        early_stopping=False,
        num_return_sequences=1,
    )

    print("\n")
    print("Original sentence:")
    print(sentence)
    print("\n")
    print("Paraphrasing:")

    # Decode each generated sequence back into text, dropping special tokens.
    for output in outputs:
        line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        print(line)
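
Because sampling is enabled, each call can produce a different result, so it is sometimes more useful to collect several candidates than to print one. The sketch below is one possible variant, not part of the original notebook: the name `paraphrase_candidates` and its `num_candidates` parameter are hypothetical, and the tokenizer and model are passed in explicitly rather than read from globals. It reuses the same prompt format and sampling settings as `paraphrase`.

```python
def paraphrase_candidates(sentence, tokenizer, model, num_candidates=3):
    """Return a list of distinct sampled paraphrases instead of printing them."""
    # Same prompt format as paraphrase().
    text = "paraphrase: " + sentence + " </s>"
    encoding = tokenizer(text, padding=True, return_tensors="pt")

    outputs = model.generate(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        max_length=256,
        do_sample=True,
        top_k=120,
        top_p=0.95,
        num_return_sequences=num_candidates,
    )

    candidates = []
    for output in outputs:
        line = tokenizer.decode(
            output, skip_special_tokens=True, clean_up_tokenization_spaces=True
        ).strip()
        # Sampling can repeat itself or echo the input, so keep only new strings.
        if line and line != sentence and line not in candidates:
            candidates.append(line)
    return candidates
```

A call such as `paraphrase_candidates(sentence_19, tokenizer, model)` would then return up to three strings. Fewer may come back when the sampled outputs coincide or simply repeat the input.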

In [9]:
sentence_1 = "Washing your hands Properly will keep you away from COVID-19."
sentence_2 = "Wikipedia was launched on January 15, 2001, and was created by Jimmy Wales and Larry Sanger."
sentence_3 = "NLP is one of the interesting fields for Data Scientists to focus on."
sentence_4 = "Do I really need to take a flu shot if I’m healthy with few or no underlying conditions?"
sentence_5 = "Which course should I take to get started in data science?"
sentence_6 = "There will be 3 Walmart Black Friday events held in November starting on November 4, November 11 and November 25!"
sentence_7 = "The FCC says the $200 million civil penalty is the largest fixed-amount penalty in the commission's history."
sentence_8 = "Southwest Airlines travelers can now fly directly from San Diego to Honolulu on a new service that took off Wednesday out of the San Diego International Airport."
sentence_9 = "Gasoline production averaged 9.1 million bpd last week, slightly down on the previous week."
sentence_10 = "If you fall into the latter group, here’s how to replace Google’s new icons for Gmail, Calendar, and other apps with the older, arguably better versions on Android, iPhone, and Chrome."
sentence_11 = "Apple has been working on ARM-based Macs for some time, but only made them official at this year's WWDC."
sentence_12 = "Microsoft is investigating reports that some users are seeing error 0x80070426 when using their Microsoft account to sign into various apps."
sentence_13 = "On Saturday, Connery’s family announced that the Oscar-winning Scottish actor died peacefully in his sleep at home in the Bahamas."
sentence_14 = "Baby Shark Dance, from South Korean brand Pinkfong, officially surpassed the song by Luis Fonsi as the most viewed YouTube video of all time, having racked up 7.05 billion views to 7.04 billion."
sentence_15 = "The University of Washington has informed the NFL office that due to an increase in COVID-19 infection rate and indications of increased community spread in the local area, NFL personnel are no longer allowed to attend games at Husky Stadium."
sentence_16 = "The NBA's basketball-related income was down $1.5 billion last season, according to data provided to teams and obtained by ESPN."
sentence_17 = "Yesterday, the huge orbiting laboratory celebrated 20 years of continuous human occupation, a big milestone in humanity's push to extend its footprint into the final frontier."
sentence_18 = "A team of researchers led by Osaka University and National Taiwan University created a system of nanoscale silicon resonators that can act as logic gates for light pulses."
sentence_19 = "The research on 100 people shows that all had T-cell responses against a range of the coronavirus’s proteins, including the spike protein used as a marker in many vaccine studies, after half a year."
sentence_20 = "A group of researchers at MIT recently developed an artificial intelligence model that can detect asymptomatic COVID-19 cases by listening to subtle differences in coughs between healthy people and infected people."
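
With twenty separately named variables, running every example means calling `paraphrase` twenty times by hand. One way to loop over them is to gather the numbered variables into a list first. This is a small sketch; `collect_sentences` is a hypothetical helper, and the `demo` dictionary only stands in for the notebook's `globals()`.

```python
def collect_sentences(namespace, prefix="sentence_"):
    """Return the values of prefix1, prefix2, ... from a namespace, in numeric order."""
    numbered = []
    for name, value in namespace.items():
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            numbered.append((int(suffix), value))
    # Sort by the numeric suffix so sentence_10 comes after sentence_9, not after sentence_1.
    return [value for _, value in sorted(numbered, key=lambda pair: pair[0])]


# Toy namespace standing in for the notebook's globals().
demo = {"sentence_10": "tenth", "sentence_2": "second", "sentence_1": "first", "model": None}
print(collect_sentences(demo))  # ['first', 'second', 'tenth']
```

In the notebook itself, `for s in collect_sentences(globals()): paraphrase(s)` would run the model on all twenty sentences in order.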

In [11]:
#sentence_19
paraphrase(sentence_19)



Original sentence:
The research on 100 people shows that all had T-cell responses against a range of the coronavirus’s proteins, including the spike protein used as a marker in many vaccine studies, after half a year.


Paraphrasing:
 The research on 100 people shows that after a half year all had T-cell responses against a range of the coronavirus proteins, including spike protein used as a marker in many vaccine studies.
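
The paraphrase above mostly reorders the original clauses, which is common for this kind of output. A rough way to quantify how much a paraphrase differs from its source is a word-level similarity ratio. The sketch below uses the standard library's `difflib` for this; the function name `word_overlap` is hypothetical, the two strings are shortened versions of the example above, and the ratio is only a crude surface measure, not a judgment of paraphrase quality.

```python
from difflib import SequenceMatcher


def word_overlap(original, paraphrased):
    """Similarity ratio in [0, 1] between two sentences, compared word by word."""
    a = original.lower().split()
    b = paraphrased.lower().split()
    return SequenceMatcher(None, a, b).ratio()


# Shortened versions of the sentence pair shown above.
original = "The research on 100 people shows that all had T-cell responses after half a year."
candidate = "The research on 100 people shows that after a half year all had T-cell responses."
print(f"overlap: {word_overlap(original, candidate):.2f}")
```

A ratio of 1.0 means the word sequences are identical, so values very close to 1.0 suggest the model returned the input nearly unchanged.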
