To build a model that can generate a descriptive caption for an image we provide it.
We will be implementing the Show, Attend, and Tell paper.
This model learns where to look. As you generate a caption, word by word, you can see the model's gaze shifting across the image. This is possible because of its Attention mechanism, which allows it to focus on the part of the image most relevant to the word it is going to utter next.
- Image captioning.
- Encoder-Decoder architecture. Typically, a model that generates sequences will use an Encoder to encode the input into a fixed form and a Decoder to decode it, word by word, into a sequence. A minimal sketch of this setup appears after this list.
- Attention. The use of Attention networks is widespread in deep learning, and with good reason. This is a way for a model to choose only those parts of the encoding that it thinks are relevant to the task at hand. The same mechanism you see employed here can be used in any model where the Encoder's output has multiple points in space or time. In image captioning, you consider some pixels more important than others. In sequence-to-sequence tasks like machine translation, you consider some words more important than others. A sketch of such an Attention module follows the list below.
- Transfer Learning. This is when you borrow from an existing model by using parts of it in a new model. This is almost always better than training a new model from scratch (i.e., knowing nothing). As you will see, you can always fine-tune this second-hand knowledge to the specific task at hand. Using pretrained word embeddings is a dumb but valid example. For our image captioning problem, we will use a pretrained Encoder, and then fine-tune it as needed; see the fine-tuning sketch after this list.
- Beam Search. This is where you don't let your Decoder be lazy and simply choose the word with the best score at each decode-step. Beam Search is useful for any language modeling problem because it usually finds a better-scoring overall sequence than greedy decoding would, although it does not guarantee the true optimum. A short sketch of the procedure closes out this list.
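To make the Encoder-Decoder idea concrete, here is a minimal PyTorch sketch of the shape of such a model. The class names, layer sizes, and the choice of a ResNet-101 backbone are illustrative assumptions, not the tutorial's exact code.

```python
# A minimal sketch of the Encoder-Decoder shape, assuming a ResNet-101 backbone
# and hypothetical layer sizes; illustrative only.
import torch
import torch.nn as nn
import torchvision

class Encoder(nn.Module):
    """Encodes an image into a grid of feature vectors."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet101(pretrained=True)
        # keep the convolutional trunk, drop the pooling and classification head
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):
        # (batch, 3, H, W) -> (batch, 2048, H/32, W/32)
        return self.cnn(images)

class Decoder(nn.Module):
    """Decodes the encoding into a caption, one word per step."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, enc_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + enc_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def step(self, prev_word, context, state):
        # one decode-step: previous word + image context -> scores over the vocabulary
        # (in the full model, `context` comes from the Attention module at each step)
        x = torch.cat([self.embed(prev_word), context], dim=1)
        h, c = self.lstm(x, state)
        return self.fc(h), (h, c)
```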
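The Attention bullet above describes soft attention over the Encoder's output. One common way to realize it, in the spirit of the paper's soft attention, is an additive scoring network, sketched below with assumed dimensions.

```python
# A minimal sketch of additive (soft) Attention over the Encoder's feature grid;
# the projection sizes are assumptions.
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, enc_dim=2048, dec_dim=512, attn_dim=512):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)   # project encoder features
        self.dec_proj = nn.Linear(dec_dim, attn_dim)   # project decoder hidden state
        self.score = nn.Linear(attn_dim, 1)            # one relevance score per pixel

    def forward(self, enc_out, dec_hidden):
        # enc_out: (batch, num_pixels, enc_dim); dec_hidden: (batch, dec_dim)
        scores = self.score(torch.tanh(
            self.enc_proj(enc_out) + self.dec_proj(dec_hidden).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)           # where to look; sums to 1 over pixels
        context = (alpha * enc_out).sum(dim=1)         # weighted average of the feature grid
        return context, alpha.squeeze(-1)
```

The returned `alpha` is what lets you visualize the model's gaze: it is a weight per pixel of the encoding at every decode-step.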
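For Transfer Learning, the pretrained Encoder can be frozen and then partially unfrozen for fine-tuning. Exactly which layers to unfreeze is a design choice; the sketch below unfreezes only the deepest blocks as an example.

```python
# A minimal sketch of transfer learning for the Encoder: reuse ImageNet weights,
# freeze everything, then unfreeze only the deepest blocks for fine-tuning.
import torch.nn as nn
import torchvision

resnet = torchvision.models.resnet101(pretrained=True)  # second-hand knowledge from ImageNet
encoder = nn.Sequential(*list(resnet.children())[:-2])

for param in encoder.parameters():
    param.requires_grad = False                    # start fully frozen

fine_tune = True
if fine_tune:
    for block in list(encoder.children())[-2:]:    # e.g. only the last two residual stages
        for param in block.parameters():
            param.requires_grad = True
```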
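Finally, Beam Search keeps the `beam_size` best partial captions at every step instead of committing to a single word. The sketch below assumes a hypothetical `step_fn` that maps a partial caption to log-probabilities over the next word; a real implementation would also carry the decoder state and attention context along with each hypothesis.

```python
# A minimal sketch of Beam Search; `step_fn` is a hypothetical hook standing in
# for one decode-step of the model, returning a (vocab_size,) tensor of log-probs.
def beam_search(step_fn, start_token, end_token, beam_size=5, max_len=50):
    # each hypothesis is (sequence of word indices, cumulative log-probability)
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:              # finished captions are kept as-is
                candidates.append((seq, score))
                continue
            log_probs = step_fn(seq)              # log-probabilities for the next word
            top_lp, top_ix = log_probs.topk(beam_size)
            for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                candidates.append((seq + [ix], score + lp))
        # keep only the beam_size best partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]                            # highest-scoring completed caption
```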