ImageLingo

ImageLingo is an image captioning project that uses deep learning to generate captions for images. The project is built using PyTorch and includes training, evaluation, and deployment components.

Setup

Prerequisites

Python 3.8 or higher
PyTorch
Docker
DVC
MLflow

Installation

Clone the repository:

```
git clone https://github.com/Yuval728/imagelingo.git
cd imagelingo
```

Create a virtual environment and activate it:

```
python -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```
Pull the DVC-tracked data:
```
dvc pull
```

Explanation and Approach

Overview

ImageLingo is designed to generate descriptive captions for images using a deep learning model. The project leverages a combination of Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs) for sequence generation.

Data Preparation

The model is trained on the Flickr8k dataset, which contains about 8,000 images, each paired with five different captions. Data preparation involves the following steps:

Image Preprocessing: Images are resized and normalized to ensure consistency.
Caption Tokenization: Captions are tokenized and converted into sequences of word indices.
Vocabulary Creation: A vocabulary is created based on the frequency of words in the captions.
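The tokenization and vocabulary steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the special-token names (`<pad>`, `<start>`, `<end>`, `<unk>`), the regex tokenizer, and the `min_freq` cutoff are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, List

# Special tokens; these names are illustrative assumptions, not the project's.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def tokenize(caption: str) -> List[str]:
    """Lowercase a caption and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", caption.lower())


def build_vocab(captions: List[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each word seen at least `min_freq` times to an integer index."""
    counts = Counter(word for c in captions for word in tokenize(c))
    vocab = {tok: i for i, tok in enumerate([PAD, START, END, UNK])}
    # Sort by descending frequency, then alphabetically, for deterministic indices.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(caption: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a caption into word indices wrapped in start/end tokens."""
    unk = vocab[UNK]
    body = [vocab.get(w, unk) for w in tokenize(caption)]
    return [vocab[START], *body, vocab[END]]
```

With `min_freq=2`, rare words fall back to `<unk>`, which keeps the output layer of the decoder small.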

Model Architecture

The model consists of two main components:

Encoder (CNN): A pre-trained CNN (such as ResNet) is used to extract features from the images. The final convolutional layer's output is used as the image representation.
Decoder (RNN): An RNN (such as LSTM) is used to generate captions based on the image features. The decoder is trained to predict the next word in the sequence given the previous words and the image features.

Training

The training process involves optimizing the model to minimize the difference between the generated captions and the actual captions. The following steps are performed:

Forward Pass: The image is passed through the encoder to obtain the image features. The decoder then generates a caption based on these features.
Loss Calculation: The loss measures how well the predicted word distributions match the reference caption (typically a per-word cross-entropy loss).
Backward Pass: The gradients are computed and the model parameters are updated to minimize the loss.
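The three training steps map onto a single optimization step in PyTorch. The sketch below assumes a hypothetical model interface, `model(images, captions)` returning logits of shape `(batch, length, vocab_size)` whose position `t` scores `captions[:, t]`, and assumes a cross-entropy loss with padding index 0; neither is confirmed by the project.

```python
import torch
import torch.nn as nn


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, captions: torch.Tensor, pad_idx: int = 0) -> float:
    """Run one forward/loss/backward/update cycle and return the loss value."""
    model.train()
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)  # padded positions add no loss
    logits = model(images, captions)                       # forward pass: (B, T, V)
    loss = criterion(logits.reshape(-1, logits.size(-1)),  # loss over every position
                     captions.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                        # backward pass
    optimizer.step()                                       # parameter update
    return loss.item()
```

In a full training loop this function would be called once per batch, with the optimizer built over only the trainable parameters.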

Evaluation

The model is evaluated using standard metrics such as the BLEU score, which measures n-gram overlap between the generated captions and the reference captions. The evaluation process involves:

Generating Captions: The model generates captions for the test images.
Calculating Metrics: The generated captions are compared with the reference captions using metrics like the BLEU score.
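For reference, corpus-level BLEU can be computed in pure Python as below. This is an illustrative implementation of the standard definition (clipped n-gram precision, geometric mean with uniform weights, brevity penalty); the project may instead use a library such as NLTK, and the function name and argument layout here are assumptions. Because each Flickr8k image has five captions, `references` holds a list of reference captions per image.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with uniform weights over 1..max_n-grams.

    references: per image, a list of reference token lists
    hypotheses: per image, one generated token list (aligned with references)
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: closest to the hypothesis (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            hyp_counts = ngrams(hyp, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matched) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

For example, the hypothesis "the cat sat" against the single reference "the cat sat on the mat" has perfect 1- and 2-gram precision, so with `max_n=2` its score is just the brevity penalty, exp(1 - 6/3) = exp(-1).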

Deployment

The trained model is deployed using Docker and TorchServe. The deployment process involves:

Creating Model Archive: The model is packaged into a model archive file (MAR) using Torch Model Archiver.
Building Docker Image: A Docker image is created with the necessary dependencies and the model archive.
Running Docker Container: The Docker container is run to serve the model and provide an API for generating captions.
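Once the container is running, a client can request a caption through TorchServe's inference API, which serves models at `/predictions/<model_name>` on port 8080 by default. The model name `imagelingo`, the raw-bytes request body, and the response format are assumptions that depend on how the archive and its handler were built.

```python
import urllib.request


def build_caption_request(image_bytes: bytes, model_name: str = "imagelingo",
                          host: str = "localhost", port: int = 8080) -> urllib.request.Request:
    """Build a POST to TorchServe's inference endpoint for the (assumed) model name."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    return urllib.request.Request(
        url,
        data=image_bytes,  # raw image bytes; assumes the handler accepts a raw body
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def request_caption(image_path: str, **kwargs) -> str:
    """Send an image to a running TorchServe container and return the response body."""
    with open(image_path, "rb") as f:
        req = build_caption_request(f.read(), **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `request_caption("example.jpg")` would work against a container started with the inference port published (e.g. `docker run -p 8080:8080 <image>`).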

License

This project is licensed under the MIT License - see the LICENSE file for details.
