
text2video GPT

For details of the model structure, see text2video.

A PyTorch implementation of text-to-video generation based on transformers. To generate videos, we need an encoder and a decoder that map frames into the same hidden space as the text. The transformer then models the temporal, sequential relationships between frames.
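The idea above can be sketched minimally: an encoder projects each frame into the same latent space the transformer operates in, and a decoder maps latents back to pixels. This is an illustrative sketch only; the class, layer sizes, and names below are assumptions, not the repository's actual `VideoVAEGPT` implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: frames are encoded into the same hidden space as
# text tokens, so a transformer can attend over them as one sequence.
class FrameAutoencoder(nn.Module):
    def __init__(self, frame_dim=3 * 64 * 64, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Linear(frame_dim, hidden_dim)  # frame -> latent
        self.decoder = nn.Linear(hidden_dim, frame_dim)  # latent -> frame

    def forward(self, frames):            # frames: (B, T, frame_dim)
        z = self.encoder(frames)          # latents in the text hidden space
        return self.decoder(z), z

autoenc = FrameAutoencoder()
frames = torch.randn(2, 8, 3 * 64 * 64)  # 2 clips of 8 flattened frames
recon, z = autoenc(frames)
print(z.shape)                            # (2, 8, 256): transformer input
```

In a real model the linear layers would be convolutional (VAE-style), but the shape contract is the same: the transformer only ever sees the latent sequence `z`.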

Usage

Here's how you'd instantiate the model (using the GPT-2 124M-parameter configuration):

```python
from VideoVAEGPT import VideoVAEGPT as VideoGPT
from VideoData import loadData, getlabels
from TextVideoDataset import TextVideoDataset

# config, tokenizer, and name2label are set up beforehand
# (see the training scripts)
config.model.vocab_size = tokenizer.vocab_size
config.model.block_size = 1024
config.model.classes = len(name2label)
model = VideoGPT(config.model)
```

And here's how you'd train it:

```shell
python train_text2video.py
```

To train a simpler version without the encoder/decoder:

```shell
python train_videogpt.py
```
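The core of such a training step is next-frame prediction in latent space. The following is a toy, self-contained sketch of that objective, not the repository's `train_text2video.py`; the stand-in model, loss choice, and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy training step: given latent frames 0..T-1, predict frames 1..T.
model = nn.Linear(16, 16)                 # stand-in for the VideoGPT model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

latents = torch.randn(4, 9, 16)           # (batch, frames, hidden_dim)
inputs, targets = latents[:, :-1], latents[:, 1:]

pred = model(inputs)                      # predict the next latent frame
loss = nn.functional.mse_loss(pred, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

The real model replaces the linear layer with the transformer and adds the text conditioning, but the teacher-forced shift between `inputs` and `targets` is the same pattern.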

Dataset

The dataset consists of (text, video) pairs drawn from the UCF101 dataset. Email info@vividitytech.com to request a download.
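Conceptually, each sample pairs a caption with a clip and its action class. The structure below is a hypothetical sketch of that pairing; the field names, example paths, and captions are illustrative assumptions, not the actual layout used by `TextVideoDataset`.

```python
from dataclasses import dataclass

# Hypothetical (text, video) pair record; fields are assumptions.
@dataclass
class TextVideoPair:
    caption: str      # free-text description of the clip
    video_path: str   # path to the UCF101 clip on disk
    label: int        # action-class index (e.g. via getlabels)

pairs = [
    TextVideoPair("a person doing floor gymnastics", "ucf101/clip_0001.avi", 0),
    TextVideoPair("a person playing violin", "ucf101/clip_0002.avi", 1),
]
print(len(pairs))  # 2
```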

Samples

`prompt = "a girl with white clothes is doing floor gymnastics exercise from right to left"` → result video
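At sampling time, generation is autoregressive: each predicted latent frame is fed back in to produce the next one, and a decoder would then turn the latents into pixels. The toy loop below illustrates only that feedback pattern; the class and method names are assumptions, not the repository's API.

```python
import torch
import torch.nn as nn

# Toy autoregressive generator over latent frames (illustrative only).
class ToyFramePredictor(nn.Module):
    def __init__(self, hidden_dim=32):
        super().__init__()
        self.step = nn.Linear(hidden_dim, hidden_dim)  # stands in for GPT

    @torch.no_grad()
    def generate(self, text_latent, num_frames):
        frames, h = [], text_latent           # condition on the prompt latent
        for _ in range(num_frames):
            h = torch.tanh(self.step(h))      # predict the next latent frame
            frames.append(h)                  # feed it back in next iteration
        return torch.stack(frames, dim=1)     # (B, num_frames, hidden_dim)

toy = ToyFramePredictor()
latents = toy.generate(torch.randn(1, 32), num_frames=16)
print(latents.shape)                          # (1, 16, 32)
```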

Library Dependencies

PyTorch

```shell
pip install torch
```

minGPT

If you want to import mingpt into your project:

```shell
git clone https://github.com/karpathy/minGPT.git
cd minGPT
pip install -e .
```


License

MIT
