Convert images into descriptive stories with voice narration using Generative AI models.
This project uses generative AI models to turn an uploaded image into a short story with voice narration. It combines computer vision and natural language processing: an image-to-text model describes the image, a language model expands that description into a narrative, and a text-to-speech model reads the story aloud. A summarized version of each story is also produced.
- Image-to-Text Conversion: Uses the BLIP model to generate a text description (caption) of the uploaded image.
- Story Generation: Generates a short story from the image description using GPT (Generative Pre-trained Transformer) models.
- Text-to-Speech Conversion: Converts generated stories into audio format using the ESPnet text-to-speech model from HuggingFace.
- Summarization: Summarizes the generated stories for brevity and clarity.
- Streamlit Web App: Provides a user-friendly interface for uploading images and listening to generated stories.
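The features above chain into a single pipeline: image, caption, story, summary, audio. The following is a minimal sketch of that flow, not the project's actual code. The names `narrate_image`, `Narration`, and `make_blip_captioner` are hypothetical, each stage is passed in as a plain callable so it can be swapped or stubbed, and the checkpoint `Salesforce/blip-image-captioning-base` is an assumption (the README only says "blip"). The sketch also assumes the full story, rather than the summary, is what gets narrated.

```python
from typing import Callable, NamedTuple


class Narration(NamedTuple):
    """Everything produced for one image (hypothetical container)."""
    caption: str
    story: str
    summary: str
    audio: bytes


def narrate_image(
    image_path: str,
    caption_image: Callable[[str], str],
    write_story: Callable[[str], str],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Narration:
    """Run the four stages in order and collect their outputs."""
    caption = caption_image(image_path)  # image -> short description
    story = write_story(caption)         # description -> short story
    summary = summarize(story)           # story -> shorter version
    audio = synthesize(story)            # story -> audio (assumed: full story is narrated)
    return Narration(caption, story, summary, audio)


def make_blip_captioner(
    model_name: str = "Salesforce/blip-image-captioning-base",  # assumed checkpoint
) -> Callable[[str], str]:
    """Build a captioning stage on the Transformers image-to-text pipeline.

    The import is deferred so the rest of the sketch runs without
    Transformers installed. Requires a Transformers release that still
    ships the "image-to-text" pipeline task.
    """
    from transformers import pipeline

    pipe = pipeline("image-to-text", model=model_name)

    def caption_image(image_path: str) -> str:
        # The pipeline returns a list of dicts with a "generated_text" key.
        return pipe(image_path)[0]["generated_text"]

    return caption_image
```

Keeping the stages as separate callables means the GPT, summarization, and ESPnet steps can each be replaced or mocked without touching the others.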
- Upload an image.
- Wait for the model to process the image and generate a story.
- Listen to the generated story in audio format.
- Optionally, view the summarized version of the story.
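The usage steps above map onto a small Streamlit page. This is a rough sketch, not the contents of `app.py`. `render_story_page` and `generate` are hypothetical names, the accepted file types and the `audio/wav` format are assumptions, and the Streamlit module is passed in as `st` so the flow can be exercised without a running server.

```python
def render_story_page(st, generate):
    """Draw the upload-and-listen page.

    st:       the streamlit module (for example, `import streamlit as st`)
    generate: hypothetical callable, image bytes -> (story, summary, audio_bytes)
    """
    st.title("Image to Voice Story")

    # Step 1: upload an image (accepted extensions are an assumption).
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return None  # nothing to do until the user uploads a file

    image_bytes = uploaded.getvalue()
    st.image(image_bytes, caption="Uploaded image")

    # Step 2: wait while the models process the image.
    with st.spinner("Generating story..."):
        story, summary, audio = generate(image_bytes)

    # Step 3: listen to the narrated story (audio format is an assumption).
    st.audio(audio, format="audio/wav")

    # Step 4: optionally view the summarized version.
    with st.expander("Summary"):
        st.write(summary)

    return story
```

In a real app this would be called as `render_story_page(st, generate)` after `import streamlit as st`, with `generate` wrapping the captioning, story, summarization, and text-to-speech steps.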
- Python 3.6+
- Streamlit
- Transformers
- LangChain
- HuggingFace API
- ESPnet
- Clone the repository:
  `git clone https://github.com/tushark01/GenAI-Image-to-Voice-Description.git`
- Install dependencies:
  `pip install -r requirements.txt`
Run the following command to start the app:
`streamlit run app.py`