text2video GPT

For details of the model structure, refer to the paper listed under References.

A PyTorch implementation of text-to-video generation based on transformers. To generate videos, an encoder and a decoder map frames into the same hidden space as the text, and the transformer models the temporal and sequential relationships between frames.
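As a rough illustration of that idea, the sketch below lays text tokens and frame latents out in one sequence and builds the causal mask a GPT-style transformer would apply to it. The layout (text first, then frames) and the helper names `build_sequence` and `causal_mask` are assumptions made for illustration; they are not taken from this repository.

```python
def build_sequence(num_text_tokens, num_frames):
    """Label each position in a joint sequence: text tokens first, then frames.

    Assumption: text and frames share one sequence; the actual model may differ.
    """
    text = [f"text_{i}" for i in range(num_text_tokens)]
    frames = [f"frame_{i}" for i in range(num_frames)]
    return text + frames


def causal_mask(length):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]


if __name__ == "__main__":
    seq = build_sequence(num_text_tokens=3, num_frames=2)
    mask = causal_mask(len(seq))
    for label, row in zip(seq, mask):
        visible = [seq[j] for j, allowed in enumerate(row) if allowed]
        print(f"{label:>8} attends to {visible}")
```

Under this assumed layout, every frame position can see the full prompt and all earlier frames, which is one way a causal transformer can capture temporal order.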

Usage

Here's how you'd instantiate the video model on top of a GPT-2 (124M param version) backbone:

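from VideoVAEGPT import VideoVAEGPT as VideoGPT
from VideoData import loadData, getlabels
from TextVideoDataset import TextVideoDataset

# `config`, `tokenizer`, and `name2label` are not defined in this snippet
# and must be set up beforehand.
config.model.vocab_size = tokenizer.vocab_size  # size of the text tokenizer's vocabulary
config.model.block_size = 1024                  # maximum sequence length
config.model.classes = len(name2label)          # number of entries in the label mapping
model = VideoGPT(config.model)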
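
The snippet above assumes `config`, `tokenizer`, and `name2label` already exist. Here is a minimal sketch of how such a nested config could be assembled, using `types.SimpleNamespace` as a stand-in for the project's real config object. The tokenizer stand-in and the two label names are placeholder values chosen for illustration, not taken from the repository.

```python
from types import SimpleNamespace

# Placeholder stand-ins (assumptions): a real run would use an actual tokenizer
# and the label mapping built from the dataset.
tokenizer = SimpleNamespace(vocab_size=50257)  # GPT-2 BPE vocabulary size
name2label = {"FloorGymnastics": 0, "Archery": 1}

# Nested namespace so that attributes can be set as config.model.<field>.
config = SimpleNamespace(model=SimpleNamespace())
config.model.vocab_size = tokenizer.vocab_size
config.model.block_size = 1024
config.model.classes = len(name2label)

print(vars(config.model))
```

Once the real tokenizer and label mapping are in place, `config.model` is what gets passed to `VideoGPT`.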

And here's how you'd train it:

python train_text2video.py

To train a simpler version without the encoder/decoder:

python train_videogpt.py

Dataset

The dataset consists of (text, video) pairs derived from the UCF101 dataset. Email info@vividitytech.com to request a download.
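
To make the (text, video) pairing concrete, the sketch below builds a label mapping and a list of pairs from UCF101-style file names, which follow the pattern `v_<ClassName>_g<group>_c<clip>.avi`. The `TextVideoPair` record, the helpers `class_name` and `build_pairs`, and the caption dictionary format are hypothetical; the repository's own `VideoData` and `TextVideoDataset` modules may organize the data differently.

```python
import re
from dataclasses import dataclass

# UCF101 clips are named v_<ClassName>_g<group>_c<clip>.avi
UCF101_NAME = re.compile(r"^v_(?P<cls>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$")


@dataclass
class TextVideoPair:
    text: str        # caption describing the clip
    video_path: str  # path to the clip file
    label: int       # integer class id


def class_name(filename):
    """Extract the action class from a UCF101-style file name."""
    match = UCF101_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a UCF101-style file name: {filename}")
    return match.group("cls")


def build_pairs(captions):
    """Turn {filename: caption} into (name2label, list of TextVideoPair)."""
    classes = sorted({class_name(f) for f in captions})
    name2label = {name: idx for idx, name in enumerate(classes)}
    pairs = [TextVideoPair(text, f, name2label[class_name(f)])
             for f, text in sorted(captions.items())]
    return name2label, pairs


if __name__ == "__main__":
    captions = {  # hypothetical file names and captions
        "v_FloorGymnastics_g01_c01.avi":
            "a girl with white clothes is doing floor gymnastics exercise from right to left",
        "v_Archery_g02_c03.avi": "a man is shooting an arrow at a target",
    }
    name2label, pairs = build_pairs(captions)
    print(name2label)
    print(pairs[0])
```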

Samples

prompt = "a girl with white clothes is doing floor gymnastics exercise from right to left"

Library Dependencies

PyTorch

pip install torch

minGPT

If you want to import mingpt into your project:

git clone https://github.com/karpathy/minGPT.git
cd minGPT
pip install -e .

References

Code:

minGPT
openai/image-gpt classification part
huggingface/transformers
imagen-pytorch for the Unet module

Papers + some implementation notes:

A simple text to video model via transformer

License

MIT
