<a href="https://colab.research.google.com/github/vlordier/colabs/blob/main/T5_Large_Diverse_High_Quality_Paraphraser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Author: **Ramsri Goutham Golla** [LinkedIn](https://www.linkedin.com/in/ramsrig/) [Twitter](https://twitter.com/ramsri_goutham/)

In [None]:
!pip install transformers==4.10.2
!pip install sentencepiece==0.1.96

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("ramsrigouthamg/t5-large-paraphraser-diverse-high-quality")
tokenizer = AutoTokenizer.from_pretrained("ramsrigouthamg/t5-large-paraphraser-diverse-high-quality")

In [None]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print ("device ",device)
model = model.to(device)

device  cuda
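
The generation cells that follow all repeat the same three steps: build the `paraphrase: ` prompt, call `model.generate`, and decode each returned sequence. As a minimal sketch, those steps could be wrapped in one helper. The function name `paraphrase` and its signature are hypothetical and not part of the original notebook; it expects the `model`, `tokenizer`, and `device` objects created above and forwards any extra keyword arguments to `generate` unchanged.

```python
def paraphrase(model, tokenizer, device, context, max_length=128, **generate_kwargs):
    """Return the decoded paraphrases of `context` (hypothetical helper)."""
    # Same prompt format as the notebook cells: task prefix plus explicit end token.
    text = "paraphrase: " + context + " </s>"

    # truncation=True makes max_length take effect on long inputs.
    encoding = tokenizer(text, max_length=max_length, truncation=True,
                         padding=True, return_tensors="pt")
    input_ids = encoding["input_ids"].to(device)
    attention_mask = encoding["attention_mask"].to(device)

    model.eval()
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=max_length,
        early_stopping=True,
        **generate_kwargs,  # e.g. num_beams, num_return_sequences
    )

    # Decode every returned sequence back to text.
    return [
        tokenizer.decode(out, skip_special_tokens=True,
                         clean_up_tokenization_spaces=True)
        for out in outputs
    ]
```

Under these assumptions, `paraphrase(model, tokenizer, device, context, num_beams=15, num_return_sequences=3)` should correspond to the beam-search call in the next cell.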


In [None]:
# Beam Search

context = "Once, a group of frogs was roaming around the forest in search of water."
text = "paraphrase: "+context + " </s>"

# Tokenize the prompt; truncation=True makes max_length take effect on long inputs.
encoding = tokenizer(text, max_length=128, truncation=True, padding=True, return_tensors="pt")
input_ids, attention_mask = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

model.eval()
# Plain beam search: keep 15 beams and return the 3 highest-scoring sequences.
beam_outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_mask,
    max_length=128,
    early_stopping=True,
    num_beams=15,
    num_return_sequences=3
)

print ("\n\n")
print ("Original: ",context)
for beam_output in beam_outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True,clean_up_tokenization_spaces=True)
    print (sent)




Original:  Once, a group of frogs was roaming around the forest in search of water.
paraphrasedoutput: A herd of frogs was wandering around the woods in search of water.
paraphrasedoutput: A herd of frogs was wandering around the woods in search of water once more.
paraphrasedoutput: A gang of frogs was wandering around the woods in search of water once more.
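
The cells that follow switch to diverse beam search, which splits `num_beams` into `num_beam_groups` groups and uses `diversity_penalty` to discourage a group from repeating tokens picked by earlier groups. A minimal sketch of the argument constraints, as I understand them from the transformers documentation, is shown here. The helper name `check_diverse_beam_args` is hypothetical, and the last check (a positive penalty) is an assumption about sensible usage and may not be enforced by every library version.

```python
def check_diverse_beam_args(num_beams, num_beam_groups,
                            num_return_sequences, diversity_penalty):
    """Validate diverse-beam-search settings; return the beams per group."""
    if num_beam_groups < 1 or num_beam_groups > num_beams:
        raise ValueError("num_beam_groups must be between 1 and num_beams")
    if num_beams % num_beam_groups != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences must not exceed num_beams")
    # Assumption: without a positive penalty nothing pushes the groups apart.
    if num_beam_groups > 1 and diversity_penalty <= 0.0:
        raise ValueError("diversity_penalty should be positive with several groups")
    return num_beams // num_beam_groups


# Settings used in the diverse beam search cells of this notebook.
print(check_diverse_beam_args(5, 5, 5, 0.70))  # prints 1
```

With `num_beams=5` and `num_beam_groups=5`, each group holds a single beam, so every returned sequence comes from a different group.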


In [None]:
# Diverse Beam search

context = "Once, a group of frogs was roaming around the forest in search of water."
text = "paraphrase: "+context + " </s>"

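# Tokenize the prompt; truncation=True makes max_length take effect on long inputs.
encoding = tokenizer(text, max_length=128, truncation=True, padding=True, return_tensors="pt")
input_ids, attention_mask = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

model.eval()
# Diverse beam search: 5 groups of 1 beam each; diversity_penalty discourages
# a group from choosing tokens already picked by earlier groups at the same step.
diverse_beam_outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_mask,
    max_length=128,
    early_stopping=True,
    num_beams=5,
    num_beam_groups=5,
    num_return_sequences=5,
    diversity_penalty=0.70
)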

print ("\n\n")
print ("Original: ",context)
for beam_output in diverse_beam_outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True,clean_up_tokenization_spaces=True)
    print (sent)

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect."





Original:  Once, a group of frogs was roaming around the forest in search of water.
paraphrasedoutput: A herd of frogs was wandering around the woods in search of water.
paraphrasedoutput: A herd of frogs was wandering around the woods in search of water.
paraphrasedoutput: A gang of frogs was wandering around the forest in search of water at one time.
paraphrasedoutput: A herd of frogs was swaning around the woods in search of water.
paraphrasedoutput: A gang of frogs was roaming about the woods in search of water once more.
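
The decoded strings above still carry the `paraphrasedoutput: ` prefix, and the first two candidates are identical. A minimal post-processing sketch, assuming exact (case-insensitive) matching is enough to detect repeats; the helper name `clean_paraphrases` is hypothetical and not part of the original notebook.

```python
PREFIX = "paraphrasedoutput: "


def clean_paraphrases(decoded):
    """Strip the output prefix and drop exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for sent in decoded:
        sent = sent.strip()
        if sent.startswith(PREFIX):
            sent = sent[len(PREFIX):].strip()
        key = sent.lower()
        if not sent or key in seen:
            continue  # skip empty strings and repeats
        seen.add(key)
        cleaned.append(sent)
    return cleaned


# The five decoded outputs printed above, copied verbatim.
decoded = [
    "paraphrasedoutput: A herd of frogs was wandering around the woods in search of water.",
    "paraphrasedoutput: A herd of frogs was wandering around the woods in search of water.",
    "paraphrasedoutput: A gang of frogs was wandering around the forest in search of water at one time.",
    "paraphrasedoutput: A herd of frogs was swaning around the woods in search of water.",
    "paraphrasedoutput: A gang of frogs was roaming about the woods in search of water once more.",
]
for sent in clean_paraphrases(decoded):
    print(sent)
```

On these five outputs the helper keeps four distinct sentences. Near-duplicates that differ by a word or two are left in, since judging those would need a similarity threshold that the notebook does not define.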


In [None]:
# Diverse Beam search on a longer news sentence

context = "Four private astronauts launched to orbit by Elon Musk’s SpaceX returned to Earth Saturday evening, splashing down into the ocean off the east coast of Florida after a three-day mission."
text = "paraphrase: "+context + " </s>"

# Tokenize the prompt; truncation=True makes max_length take effect on long inputs.
encoding = tokenizer(text, max_length=128, truncation=True, padding=True, return_tensors="pt")
input_ids, attention_mask = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

model.eval()
# Same diverse beam search settings as the previous cell.
diverse_beam_outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_mask,
    max_length=128,
    early_stopping=True,
    num_beams=5,
    num_beam_groups=5,
    num_return_sequences=5,
    diversity_penalty=0.70
)

print ("\n\n")
print ("Original: ",context)
for beam_output in diverse_beam_outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True,clean_up_tokenization_spaces=True)
    print (sent)

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect."





Original:  Four private astronauts launched to orbit by Elon Musk’s SpaceX returned to Earth Saturday evening, splashing down into the ocean off the east coast of Florida after a three-day mission.
paraphrasedoutput: After a three-day mission, four private astronauts sent by Elon Musk's SpaceX returned to Earth on Saturday evening, splashing down into the ocean off the east coast of Florida.
paraphrasedoutput: Four private astronauts sent by Elon Musk's SpaceX returned to Earth on Saturday evening, splashing down into the ocean off the east coast of Florida after a three-day mission.
paraphrasedoutput: After a three-day mission, four private astronauts sent by Elon Musk's SpaceX returned to Earth on Saturday evening, splashing down into the ocean off the east coast of Florida.
paraphrasedoutput: Four private astronauts sent by Elon Musk's SpaceX returned to Earth on Saturday evening, splashing down into the ocean off the east coast of Florida following a three-day mission.
paraphras