Image-to-Text Captioning System

This project implements an image-to-text system that generates descriptive captions for images by combining computer vision and natural language processing. It uses Xception (a pre-trained CNN) to extract image features and an LSTM network to generate captions.

Features

Extracts high-level features from images using a pre-trained Xception model.
Generates contextually relevant captions with an LSTM.
Supports BLEU score evaluation for caption quality.
Modular implementation for easy extensibility.

Dataset

This project uses the Flickr8k dataset, which includes:

8,000 images with 5 captions per image.

Dataset Structure:

Dataset/
├── Images/              # Folder containing image files
└── captions.txt         # Text file with image-caption mappings

Installation

Prerequisites

Python 3.7+
pip package manager

Install Dependencies

Install the required Python packages:

pip install -r requirements.txt

Usage

1. Set Up the Dataset

Place the dataset under the Dataset/ folder as described in the dataset structure.
Ensure captions.txt contains the mappings of image filenames to their captions.
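
As a minimal sketch of loading that mapping, the snippet below assumes captions.txt is a comma-separated file with an image,caption header row; the file used in this project may be laid out differently. The function name load_captions, the startseq/endseq markers, and the sample rows are illustrative assumptions, not taken from the project script.

```python
import csv
import io
from collections import defaultdict


def load_captions(lines):
    """Group cleaned captions by image id (filename without extension).

    Assumes each row is `filename,caption`, preceded by one header row.
    """
    mapping = defaultdict(list)
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        # Re-join in case an unquoted caption itself contains commas.
        filename, caption = row[0], ",".join(row[1:])
        image_id = filename.rsplit(".", 1)[0]
        # Lowercase, keep purely alphabetic words, wrap in sequence markers.
        words = [w for w in caption.lower().split() if w.isalpha()]
        mapping[image_id].append("startseq " + " ".join(words) + " endseq")
    return dict(mapping)


# Made-up sample rows standing in for the real captions.txt.
sample = io.StringIO(
    "image,caption\n"
    "example_image.jpg,A boy smiles underwater .\n"
    "example_image.jpg,A child swims in a pool .\n"
)
mapping = load_captions(sample)
print(mapping["example_image"])
```

With the real file, the same function could be called on an open file handle, for example load_captions(open("Dataset/captions.txt", newline="")).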

2. Train the Model

Run the script to preprocess the data, extract features, and train the model:

python Untitled-1.py

Default parameters:
- Epochs: 13
- Batch size: 32
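
Caption models of this kind are usually trained by turning each caption into several (prefix, next word) pairs, so the LSTM learns to predict one word at a time. The sketch below shows that expansion in plain Python. The helper make_training_pairs, the toy vocabulary, and the left zero-padding are assumptions for illustration and may not match what the project script does.

```python
def make_training_pairs(caption, word_to_index, max_length):
    """Expand one caption into (padded prefix, next-word index) pairs."""
    ids = [word_to_index[w] for w in caption.split() if w in word_to_index]
    pairs = []
    for i in range(1, len(ids)):
        # Keep at most max_length previous words as the input prefix.
        prefix, target = ids[:i][-max_length:], ids[i]
        # Left-pad with 0 so every prefix has the same length.
        padded = [0] * (max_length - len(prefix)) + prefix
        pairs.append((padded, target))
    return pairs


# Toy vocabulary; index 0 is reserved for padding.
word_to_index = {"startseq": 1, "a": 2, "boy": 3, "smiles": 4, "endseq": 5}
pairs = make_training_pairs("startseq a boy smiles endseq", word_to_index, max_length=6)
for padded, target in pairs:
    print(padded, "->", target)
```

Each pair would be combined with the image's feature vector to form one training sample; a caption of n tokens yields n - 1 samples.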

3. Generate Captions

Use the generate_caption function in the script to predict captions for images:

lst, pred = generate_caption(model, "1096165011_cc5eb16aa6", image_directory, mapping, featuresx, tokenizer, max_length)
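
The internals of generate_caption are not shown in this README. A common approach is greedy decoding: start from a start token, repeatedly take the model's most likely next word, and stop at the end token or once max_length words have been produced. The sketch below illustrates that loop with a stand-in predictor instead of a trained model; greedy_decode, fake_predict_next, and the startseq/endseq tokens are hypothetical names.

```python
def greedy_decode(predict_next, max_length, start="startseq", end="endseq"):
    """Build a caption one word at a time, always taking the top prediction."""
    words = [start]
    for _ in range(max_length):
        next_word = predict_next(words)
        if next_word is None or next_word == end:
            break  # the model signalled the end of the caption
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


# Stand-in for a trained model: replays a fixed caption word by word.
canned = ["a", "dog", "runs", "endseq"]


def fake_predict_next(words_so_far):
    position = len(words_so_far) - 1
    return canned[position] if position < len(canned) else None


caption = greedy_decode(fake_predict_next, max_length=10)
print(caption)  # a dog runs
```

In a real pipeline, predict_next would encode the words so far with the tokenizer, pad them to max_length, run the model on the image features plus that sequence, and map the highest-scoring index back to a word.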

4. Evaluate the Model

Evaluate the performance using BLEU scores:

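from nltk.translate.bleu_score import corpus_bleu

# actual: for each test image, a list of tokenized reference captions
# predicted: for each test image, one tokenized generated caption
# Both lists must be filled from the test set before scoring.
actual, predicted = list(), list()
bleu_score = corpus_bleu(actual, predicted)
print(f"BLEU Score: {bleu_score}")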
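
BLEU is built on clipped n-gram precision: the share of n-grams in the generated caption that also appear in a reference caption. The sketch below computes that quantity for one made-up example in plain Python to show what is being measured. It is a simplified illustration (no brevity penalty, no combination across n-gram orders), and clipped_precision is a hypothetical helper, not an NLTK function.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(references, hypothesis, n):
    """Fraction of hypothesis n-grams found in the references, counts clipped."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, the most times it occurs in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


# Made-up example: two reference captions and one generated caption.
references = ["a boy swims underwater".split(), "a child is smiling in a pool".split()]
hypothesis = "a boy is smiling underwater".split()

p1 = clipped_precision(references, hypothesis, 1)  # unigram precision
p2 = clipped_precision(references, hypothesis, 2)  # bigram precision
print(p1, p2)
```

In this example every word of the generated caption appears in some reference, yet only half of its word pairs do, which is why scores drop quickly at higher n-gram orders even for reasonable captions.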

Results

Example Output:

Generated Caption: "A boys is smiling underwater."

BLEU Score:

Achieved BLEU score: 0.066

Project Structure

.
├── Untitled-1.py          # Main script for training and testing
├── Dataset/               # Contains images and captions.txt
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation

Limitations

Subjective Captions: BLEU scores can be low because many different captions can validly describe the same image, while BLEU only rewards n-gram overlap with the reference captions.
Complex Scenes: The model struggles with images containing multiple objects or intricate details.

Future Enhancements

Advanced Architectures:
- Experiment with Vision Transformers (ViT) or GPT-based models for improved caption generation.
Larger Datasets:
- Incorporate datasets like COCO or Visual Genome for better generalization.
Multilingual Captioning:
- Extend functionality to support captions in multiple languages.

Acknowledgements

Dataset: Flickr8k Dataset
Xception Model: Keras Applications
References:
- Understanding BLEU Score
- Understanding LSTMs
