GenAI Image to Voice Description

Convert images into descriptive stories with voice narration using Generative AI models.

Overview

This project uses Generative AI models to turn an uploaded image into a short story and narrate it aloud. A computer vision model captions the image, a language model expands the caption into a narrative, and a text-to-speech model reads the result. The story is also summarized, so users can read a condensed version alongside the audio.

Features

Image-to-Text Conversion: Uses the BLIP image-captioning model to generate a text description of the uploaded image.
Story Generation: Generates a short story from the image description using GPT (Generative Pre-trained Transformer) models.
Text-to-Speech Conversion: Converts the generated story into audio using the ESPnet text-to-speech model from Hugging Face.
Summarization: Produces a shorter summary of the generated story.
Streamlit Web App: Provides a web interface for uploading images and listening to the generated stories.
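
The features above form a four-stage pipeline. The sketch below is one way to wire those stages together, not the code in app.py. The names `Narration`, `narrate_image`, and `make_blip_captioner` are hypothetical, the BLIP checkpoint name is an assumption, and the sketch assumes the full story (rather than the summary) is what gets narrated. Each stage is passed in as a callable so the model behind it can be swapped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Narration:
    """Everything produced for one image."""
    caption: str
    story: str
    summary: str
    audio: bytes


def narrate_image(
    image_path: str,
    captioner: Callable[[str], str],
    storyteller: Callable[[str], str],
    summarizer: Callable[[str], str],
    synthesizer: Callable[[str], bytes],
) -> Narration:
    """Run caption -> story -> summary -> speech, one stage per callable."""
    caption = captioner(image_path)
    story = storyteller(caption)
    summary = summarizer(story)
    audio = synthesizer(story)  # assumption: the full story is narrated
    return Narration(caption, story, summary, audio)


def make_blip_captioner(model_name: str = "Salesforce/blip-image-captioning-base"):
    """Build a captioner backed by a BLIP checkpoint (model name is an assumption)."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import pipeline

    image_to_text = pipeline("image-to-text", model=model_name)

    def caption(image_path: str) -> str:
        # The pipeline returns a list of dicts with a "generated_text" key.
        return image_to_text(image_path)[0]["generated_text"]

    return caption
```

With stand-in stages, `narrate_image("cat.jpg", lambda p: "a cat", str.title, str.upper, str.encode)` returns a `Narration` whose fields can be inspected without loading any model.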

Usage

Upload an image.
Wait for the model to process the image and generate a story.
Listen to the generated story in audio format.
Optionally, view the summarized version of the story.
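
The steps above map onto a handful of Streamlit widgets. The following is a minimal sketch of that flow, assuming a hypothetical `narrate` callable that takes image bytes and returns a dict with "story", "summary", and "audio" (WAV bytes) keys; the actual layout of app.py may differ. The Streamlit module is passed in as `st` so the function can be exercised without a running server.

```python
def render_page(st, narrate):
    """Draw the upload -> story -> audio -> summary flow.

    `st` is the imported streamlit module. `narrate` is a hypothetical
    callable mapping image bytes to a dict with "story", "summary", "audio".
    """
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return None  # nothing to do until the user picks a file

    image_bytes = uploaded.getvalue()
    st.image(image_bytes)

    # Model inference can take a while, so show a spinner while waiting.
    with st.spinner("Generating story..."):
        result = narrate(image_bytes)

    st.audio(result["audio"], format="audio/wav")
    with st.expander("Story"):
        st.write(result["story"])
    with st.expander("Summary"):
        st.write(result["summary"])
    return result
```

In an app script this would be called as `import streamlit as st` followed by `render_page(st, narrate)`.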

Requirements

Python 3.6+
Streamlit
Transformers
LangChain
HuggingFace API
ESPnet
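
A quick way to confirm the environment meets these requirements is to check the Python version and look for each package before launching the app. The sketch below does that with the standard library only. The import names in `REQUIRED_MODULES` are assumptions about how the listed packages are imported, and the function names are hypothetical.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ["streamlit", "transformers", "langchain", "espnet"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 6)):
    """Check the interpreter against the minimum version stated above."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.6 or newer is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found")
```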

Installation

Clone the repository:

```
git clone https://github.com/tushark01/GenAI-Image-to-Voice-Description.git
cd GenAI-Image-to-Voice-Description
```

Install dependencies:
```
pip install -r requirements.txt
```

How to Run

Run the following command:

```
streamlit run app.py
```

