Paraphrase Generator with T5

A Paraphrase-Generator built with transformers that takes an English sentence as input and produces a set of paraphrased sentences. This is a conditional text-generation NLP task. The model used here is T5ForConditionalGeneration from the Hugging Face transformers library. It is trained on Google's PAWS dataset and published on the Hugging Face model hub under the name Vamsi/T5_Paraphrase_Paws.

List of publications using Paraphrase-Generator (please open a pull request to add missing entries):

DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models

Sports Narrative Enhancement with Natural Language Generation

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

Wissensgenerierung für deutschprachige Chatbots (Knowledge Generation for German-Language Chatbots)

Causal Document-Grounded Dialogue Pre-training

Creativity Evaluation Method for Procedural Content Generated Game Items via Machine Learning

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Streamlit library
Hugging Face Transformers library
PyTorch
TensorFlow

Installing

Streamlit

$ pip install streamlit

Hugging Face Transformers library

$ pip install transformers

TensorFlow

$ pip install --upgrade tensorflow

PyTorch

Follow the installation instructions at https://pytorch.org/ and install a build that matches your operating system and CUDA version.
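To confirm that the prerequisites are installed before launching either app, a small check like the one below can help. It is a minimal sketch, not part of the repository: `check_environment` and `REQUIRED` are hypothetical names, and it assumes the standard PyPI distribution names (a TensorFlow variant such as `tensorflow-cpu` will be reported as "unknown" rather than missing).

```python
from importlib import metadata
from importlib.util import find_spec

# Import name -> PyPI distribution name (assumption: the standard package names).
REQUIRED = {
    "streamlit": "streamlit",
    "transformers": "transformers",
    "torch": "torch",
    "tensorflow": "tensorflow",
}


def check_environment(required=REQUIRED):
    """Return {module: version}, with None for modules that cannot be found."""
    report = {}
    for module_name, dist_name in required.items():
        # find_spec locates a top-level module without importing it.
        if find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[module_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```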

Running the web app

Clone the repository

$ git clone [repolink]

Running the Streamlit app

$ cd Streamlit

$ streamlit run paraphrase.py

Running the Flask app

$ cd Server

$ python server.py

The first call to the server takes a while because it downloads the model weights. Later calls are faster because the weights are then loaded from the local cache.
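The server's loading code is not reproduced in this README, so the following is only a hypothetical sketch of the caching idea. `from_pretrained` downloads the files once and reuses them from the Hugging Face cache on disk (by default under `~/.cache/huggingface`); wrapping the loader in `functools.lru_cache` additionally keeps the loaded model in memory, so each process loads it at most once. `load_from_hub`, `make_cached_loader` and `get_model` are illustrative names, not functions from the repository.

```python
from functools import lru_cache
from typing import Any, Callable

MODEL_NAME = "Vamsi/T5_Paraphrase_Paws"


def load_from_hub(model_name: str) -> Any:
    """Download on first use (or read from the on-disk cache), then load into memory."""
    # Imported lazily so the caching helper itself works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def make_cached_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a loader so each model name is loaded at most once per process."""

    @lru_cache(maxsize=None)
    def get(model_name: str) -> Any:
        return load_fn(model_name)

    return get


# Hypothetical accessor a request handler could call on every request;
# only the first call pays the download/load cost.
get_model = make_cached_loader(load_from_hub)
```

A handler would call `model, tokenizer = get_model(MODEL_NAME)` per request. The model name is passed explicitly because `lru_cache` keys on the arguments as given.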

General Usage

Both PyTorch and TensorFlow models are available. The example below uses PyTorch:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")

# Use the GPU when available; the model and its inputs must be on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

sentence = "This is something which i cannot understand at all"

# Input format used with this model: task prefix + sentence + end-of-sequence token.
text = "paraphrase: " + sentence + " </s>"

encoding = tokenizer(text, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

# Sample 5 candidates using top-k and nucleus (top-p) sampling.
outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    num_return_sequences=5
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
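With `do_sample=True`, each generated token is drawn at random from a truncated next-token distribution controlled by `top_k` and `top_p`. The pure-Python sketch below illustrates that technique on a toy distribution stored in a dict. It is not the transformers implementation (which operates on batched logit tensors); `filter_top_k_top_p` and `sample_next_token` are hypothetical names, and the sketch applies top-k first and then top-p.

```python
import random
from typing import Dict, List, Optional, Tuple


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Apply top-k, then top-p (nucleus) filtering, and renormalize what survives."""
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top-p: keep the smallest high-probability prefix whose share of the
    # top-k mass reaches top_p (at least one token always survives).
    total = sum(p for _, p in ranked)
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {token: p / mass for token, p in kept}


def sample_next_token(probs: Dict[str, float], top_k: int, top_p: float,
                      rng: Optional[random.Random] = None) -> str:
    """Draw one token from the filtered distribution."""
    rng = rng or random.Random()
    filtered = filter_top_k_top_p(probs, top_k, top_p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]
```

For example, with `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, `top_k=3` and `top_p=0.8`, only "a" and "b" survive (renormalized to 0.625 and 0.375). Larger `top_k`/`top_p` values keep more candidates and therefore give more varied paraphrases.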

Dockerfile

The repository also contains a minimal reproducible Dockerfile that can be used to spin up a server with the API endpoints to perform text paraphrasing.

Note: The Dockerfile uses Flask's built-in development server, so it is not suitable for production. For production use, replace it with a production-ready WSGI server such as Gunicorn.

After cloning the repository, starting the local server takes two commands:

docker build -t paraphrase .
docker run -p 5000:5000 paraphrase

The API is then available at localhost:5000. For example:

curl -XPOST localhost:5000/run_forward \
-H 'content-type: application/json' \
-d '{"sentence": "What is the best paraphrase of a long sentence that does not say much?", "decoding_params": {"tokenizer": "", "max_len": 512, "strategy": "", "top_k": 168, "top_p": 0.95, "return_sen_num": 3}}'
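The same request can be sent from Python using only the standard library. This is a hypothetical client sketch: `API_URL`, `build_request` and `paraphrase_via_api` are illustrative names, the payload fields are copied from the curl example, and because the response format is not documented here the function simply returns the parsed JSON. The generous timeout is an assumption to allow for the slow first call while the model downloads.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "http://localhost:5000/run_forward"  # endpoint from the curl example


def build_request(sentence: str, top_k: int = 168, top_p: float = 0.95,
                  max_len: int = 512, return_sen_num: int = 3) -> Dict[str, Any]:
    """Build the JSON body with the same fields as the curl example."""
    return {
        "sentence": sentence,
        "decoding_params": {
            "tokenizer": "",
            "max_len": max_len,
            "strategy": "",
            "top_k": top_k,
            "top_p": top_p,
            "return_sen_num": return_sen_num,
        },
    }


def paraphrase_via_api(sentence: str, url: str = API_URL,
                       timeout: float = 120.0, **params: Any) -> Any:
    """POST the request and return the server's parsed JSON response."""
    body = json.dumps(build_request(sentence, **params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `paraphrase_via_api("What is the best paraphrase of a long sentence that does not say much?", return_sen_num=5)` sends the equivalent of the curl call with five requested sentences.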

Built With

Streamlit - The fastest way to build data apps
Flask - Backend framework
Hugging Face Transformers - On a mission to solve NLP, one commit at a time.

Authors

Sai Vamsi Alisetti

Citing

@misc{alisetti2021paraphrase,
  title={Paraphrase generator with t5},
  author={Alisetti, Sai Vamsi},
  year={2021}
}
