HuggingTweets - Train a model to generate tweets

Create a tweet generator based on your favorite Tweeter in 5 minutes

I developed HuggingTweets to try to predict Elon Musk's next breakthrough ;)

Try the demo →

Introduction

This project fine-tunes a pre-trained transformer on a user's tweets using HuggingFace, an awesome open source library for Natural Language Processing.

Training and results are automatically logged on W&B through the HuggingFace integration.

Usage

If you just want to test the demo, click the link below and share your predictions on Twitter with #huggingtweets!

To understand how the model works, check huggingtweets-dev.ipynb or use the following link.

Results

My favorite sample is definitely the one generated from Andrej Karpathy's tweets, starting from the prompt "I don't like":

I don't like this :) 9:20am: Forget this little low code and preprocessor optimization. Even if it's neat, for top-level projects. 9:27am: Other useful code examples? It's not kind of best code, :) 9:37am: Python drawing bug like crazy, restarts regular web browsing ;) 9:46am: Okay, I don't mind. Maybe I should try that out! I'll investigate it :) 10:00am: I think I should try Shigemitsu's imgur page. Or the minimalist website if you're after 10/10 results :) Also maybe Google ImageNet on "Yelp" instead :) 10:05am: Looking forward to watching it talk!

I had a lot of fun running predictions on other people too!

Explore the live report →

Future research

There is a lot more interesting research to do:

test training top layers vs bottom layers to see how it affects learning of the lexical field (subject of content) vs word predictions, and memorization vs creativity;
optimize data pre-processing (padding, end tokens, definition of one sample…);
augment text data with adversarial approaches;
test more models and do some fine-tuning;
pre-train on a large Twitter dataset covering many people;
explore few-shot learning approaches, since we have limited data per user, though there are probably only a few writing styles;
implement a pipeline to continuously train the network on new tweets;
cluster users and identify topics, writing styles…
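
To illustrate the pre-processing item in the list above, here is a minimal sketch of two ways "one sample" could be defined once tweets are cleaned. It is not the project's actual pipeline: the helper names `clean_tweet` and `build_samples`, the retweet and link filters, and the choice of `<|endoftext|>` as end token (the marker used by GPT-2 style models) are all assumptions made for illustration.

```python
import re

# Assumption: a GPT-2 style end-of-text marker; other models use other tokens.
END_TOKEN = "<|endoftext|>"

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Drop retweets, strip links, and collapse whitespace (hypothetical rules)."""
    if text.startswith("RT @"):
        return ""
    text = URL_RE.sub("", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def build_samples(tweets, one_sample_per_tweet=True):
    """Turn raw tweets into training samples, each terminated by END_TOKEN."""
    cleaned = [c for c in (clean_tweet(t) for t in tweets) if c]
    if one_sample_per_tweet:
        # Option 1: every tweet is its own sample (needs padding when batched).
        return [c + END_TOKEN for c in cleaned]
    # Option 2: one long stream, to be cut into fixed-size blocks later.
    return [END_TOKEN.join(cleaned) + END_TOKEN] if cleaned else []


tweets = [
    "RT @someone: not my words",
    "Training a model today   https://example.com/x",
    "I don't like\nmessy data",
]
samples = build_samples(tweets)
stream = build_samples(tweets, one_sample_per_tweet=False)
```

With one sample per tweet, short tweets need padding to be batched together; with a single stream, no padding is needed but a training block can straddle two tweets. Which trade-off works better here is exactly the open question raised in the list.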

About

Built by Boris Dayma

My main goals with this project are:

to experiment with how to train, deploy and maintain neural networks in production;
to make AI accessible to everyone.

To see how the model works, visit the project repository.

Disclaimer: this project is not to be used to publish any false generated information but to perform research on Natural Language Generation.

FAQ

Does this project pose a risk of being used for disinformation?

Large NLP models can be misused to publish false data. OpenAI performed a staged release of GPT-2 to study any potential misuse of their models.

I want the latest AI technologies to be accessible to everyone, to ensure fairness and prevent social inequality.

HuggingTweets shall not be used for creating inappropriate content, nor for any illicit or unethical purposes. Any text generated from other users' tweets must explicitly be referenced as such and cannot be published with the intent of hiding its origin. No generated content can be published against a person unwilling to have their data used as such.

Why is the demo in Colab instead of being a real independent web app?

It actually looks much better with Voilà, as the code cells are hidden and automatically executed. Also, we can easily deploy it for free on Binder.

However, training such large neural networks requires a GPU (not available on Binder, and not cheap), and I wanted to make HuggingTweets accessible to everybody. Google Colab generously offers free GPUs, so it is the perfect place to host the demo.

Resources

A Step by Step Guide to Tracking Hugging Face Model Performance
W&B Forum: if you have any questions, reach out to the Slack community

Acknowledgements

I was able to make the first version of this program in just a few days.

It would not have been possible without these people and these open-source tools:

W&B for the great tracking & visualization tools for ML experiments;
HuggingFace for providing a great framework for Natural Language Understanding;
Tweepy for providing a great API to interact with Twitter (used in the dev notebook);
Chris Van Pelt for hacking with me on the demo;
Lavanya Shukla for her great continuous feedback on the demo;
Google Colab for letting people access free GPUs!
