Text2Image

This project uses Generative Adverserial Networks (GAN's) to generate a 256x256 image given a text description. I chose to use a GAN over other generative models, such as Pixel RNN, Gated Pixel CNN, Variational Autoencoders, etc. because GAN's have been shown to generate more clear images, although they are much more difficult to train. Both the generator and the discriminator in GAN's are trained using the gradients of the discriminator, so were the discriminator's loss to reach a maximum/minimum quickly, training cannot take place properly. Instead, we want to reach the Nash Equilibrium, and in order to do this I adopted several extra measures, including using the LeakyReLU activation function and noisy labels for the loss function.

I used the Tensorflow, Keras, and Numpy libraries for Python to implement my model, and I trained it using the Caltech/UCSD Birds Dataset. My model can be generalized to generate images other than birds by instead using the MS COCO dataset, a larger and more extensive dataset.

Note: This repository does NOT include the dataset used to train the model, as it took an extraordinarily long time to push the data. Instead, I will leave a link to the dataset below.

Dataset: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
README.md		README.md
encoder.py		encoder.py
glove.py		glove.py
main.py		main.py
model.py		model.py
preprocessing.py		preprocessing.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text2Image

About

Releases

Packages

Languages

vdopp234/Text2Image

Folders and files

Latest commit

History

Repository files navigation

Text2Image

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages