A web app that converts audio to text and enhances transcription with Retrieval-Augmented Generation (RAG). Upload audio, get accurate transcriptions with contextual enrichment using external knowledge sources

omkartidke42/Audio-text-rag-app

RAG Audio Q&A Application

This is a Retrieval-Augmented Generation (RAG) application that lets users upload an audio file, ask a question about it, and receive a text answer. The system uses Whisper to transcribe the audio, SentenceTransformers to create vector embeddings of the transcript chunks, and FAISS to retrieve the chunks most relevant to the question. A GPT model then generates the answer from the retrieved chunks.
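The chunk, embed, and retrieve steps described above can be sketched in plain Python. This is only an illustration of the idea, not the code in this repository: the word-count `embed` function is a toy stand-in for a SentenceTransformers model, the brute-force `retrieve` is a stand-in for a FAISS index, and all function names and the sample transcript are assumptions made for this example.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (NOT a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Brute-force top-k search (stand-in for a FAISS index lookup)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical transcript, as if returned by the transcription step.
transcript = (
    "The meeting opened with a budget review. "
    "Marketing asked for more funds next quarter. "
    "The launch date moved to the first week of June."
)
chunks = chunk_words(transcript)
context = retrieve("When is the launch date?", chunks, k=1)
print(context)
```

The retrieved chunk is what would be passed to the GPT model as context, together with the question.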

Make sure you have the following installed:

Python 3.7+ (Recommended version: 3.8 or 3.9)

Git (for version control)

FFmpeg (required for audio processing)

FFmpeg Installation:

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

macOS (using Homebrew):

brew install ffmpeg

Windows: Download FFmpeg from ffmpeg.org, follow the installation instructions, and make sure the ffmpeg executable is on your PATH.
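To confirm that FFmpeg is reachable before starting the app, a small check like the following may help. It is a minimal sketch that uses only the standard library; the helper name `find_executable` is an assumption for this example and is not part of the repository.

```python
import shutil


def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path:
    print(f"ffmpeg found at {path}")
else:
    print("ffmpeg not found on PATH; install it before running the app")
```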

⚙️ Setup

  1. Clone the Repository

First, clone the repository to your local machine (replace yourusername with the GitHub account that hosts the repository):

git clone https://github.com/yourusername/RAG-Audio-QA.git
cd RAG-Audio-QA

  2. Create a Virtual Environment

Create a virtual environment to manage dependencies:

python3 -m venv venv

Activate the virtual environment:

Linux/macOS:

source venv/bin/activate

Windows:

venv\Scripts\activate

  3. Install Dependencies

With the virtual environment active, install the required Python libraries:

pip install -r requirements.txt
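If you are unsure whether the virtual environment is active when you run pip, a quick check from Python can tell you. This is a small sketch based on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("packages will be installed under:", sys.prefix)
```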

🚀 Running the Application

  1. Set Flask App Environment Variable

Set the FLASK_APP environment variable to the main Flask module:

Linux/macOS:

export FLASK_APP=app.py

Windows (Command Prompt):

set FLASK_APP=app.py

  2. Run the Application

Start the Flask development server:

flask run

This should output something like:

 * Running on http://127.0.0.1:5000

  3. Open in Browser

Open your web browser and go to http://127.0.0.1:5000 to interact with the application.

  4. Upload Audio & Ask a Question

Upload an audio file (e.g., .mp3, .wav).

Type a question related to the content of the audio.

Click Submit, and the app will return an answer based on the transcribed text.
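The upload step implies some validation of the submitted form. The sketch below shows one way such a check could look. It is an assumption about how a handler might behave, not the logic in app.py: the names `ALLOWED_EXTENSIONS`, `is_supported_audio`, and `validate_submission` are hypothetical, and the extension list only contains the two formats mentioned above.

```python
from pathlib import Path

# Only the formats named in this README; the real app may accept more.
ALLOWED_EXTENSIONS = {".mp3", ".wav"}


def is_supported_audio(filename):
    """Check the file extension, ignoring case."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def validate_submission(filename, question):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    if not filename or not is_supported_audio(filename):
        errors.append("Please upload a .mp3 or .wav file.")
    if not question or not question.strip():
        errors.append("Please enter a question.")
    return errors


print(validate_submission("talk.MP3", "What was decided?"))
print(validate_submission("notes.txt", "   "))
```

Checking the extension is only a first filter. A file with a valid extension can still fail during transcription if FFmpeg cannot decode it.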

📂 Folder Structure

Here's an overview of the folder structure:

RAG-Audio-QA/
├── .gitignore                  # Git ignore file
├── app.py                      # Main Flask app entry point
├── requirements.txt            # All Python dependencies
├── audio_processing/           # Audio transcription related files
│   └── transcriber.py
├── text_processing/            # Text chunking and embedding
│   ├── chunker.py
│   └── embedder.py
├── retrieval/                  # FAISS for efficient retrieval
│   └── retriever.py
├── generation/                 # Answer generation (using GPT)
│   └── qa_generator.py
├── static/                     # Static files (CSS, JS, images)
│   └── style.css
├── templates/                  # HTML files for the frontend
│   └── index.html
├── venv/                       # Virtual environment (ignored by Git)
└── README.md                   # Documentation (this file)
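One way to picture how these modules fit together is a single function that calls each stage in order. The sketch below is hypothetical: `answer_question` and its parameters are names invented for this example, and the lambdas are trivial stubs standing in for transcriber.py, chunker.py, retriever.py (which would use embedder.py internally), and qa_generator.py. The real function names and signatures in the repository may differ.

```python
def answer_question(audio_path, question, transcribe, chunk, retrieve, generate):
    """Run the four stages in order, passing each result to the next stage."""
    transcript = transcribe(audio_path)      # audio_processing/transcriber.py
    chunks = chunk(transcript)               # text_processing/chunker.py
    context = retrieve(question, chunks)     # retrieval/retriever.py
    return generate(question, context)       # generation/qa_generator.py


# Stub stages so the sketch runs without Whisper, FAISS, or an API key.
answer = answer_question(
    "meeting.wav",
    "What moved?",
    transcribe=lambda path: "The launch moved to June. Budget is unchanged.",
    chunk=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    retrieve=lambda q, chunks: [c for c in chunks if "moved" in c],
    generate=lambda q, ctx: f"Based on the audio: {ctx[0]}",
)
print(answer)
```

Passing the stages in as arguments keeps each module replaceable and makes the flow easy to test with stubs like the ones shown.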

Results:

[Screenshot from 2025-04-23 11-32-22]

🧑‍💻 Git Ignore

The .gitignore file excludes files that should not be committed, such as:

Virtual environments (venv/)

Large model files (*.pt, *.h5, etc.)

System-specific files (.DS_Store, Thumbs.db)

Logs, temporary files, and more.

Here's a quick view of what the .gitignore excludes:

# Python
*.pyc
*.pyo
*.pyd
__pycache__/
venv/
*.egg-info/

# Flask
instance/
*.db
*.sqlite

# Virtual Environment
venv/
env/
ENV/
*.env

# Model files (large files)
*.h5
*.bin
*.pt
*.ckpt
*.model

# Temporary files
*.log
*.tmp
*.swp

# System-specific files
.DS_Store
Thumbs.db
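To get a rough feel for which paths these patterns cover, the following sketch approximates the matching with the standard library. It is deliberately simplified and does not implement full gitignore semantics (no negation, no anchored patterns, no `**`); `git check-ignore -v <path>` remains the authoritative check. The `PATTERNS` subset and the function name `is_ignored` are assumptions for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns listed above.
PATTERNS = ["*.pyc", "__pycache__/", "venv/", "*.env", "*.pt", "*.log", ".DS_Store"]


def is_ignored(path, patterns=PATTERNS):
    """Approximate check: does any pattern match a component of `path`?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match only against parent directories.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Pattern without a slash: match any path component.
            return True
    return False


for candidate in ["models/whisper.pt", "venv/bin/python", "app.py", "logs/server.log"]:
    print(candidate, "->", "ignored" if is_ignored(candidate) else "tracked")
```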
