AI Audio Generator

Transform documents into natural-sounding audio summaries using Azure OpenAI's GPT-4o and gpt-4o-audio-preview models.

Features

📄 Support for PDF, TXT, DOC, and DOCX files
🔍 Multiple processing methods (Vision Only or Text + Vision)
🎯 Customizable processing goals
⏱️ Adjustable summary lengths
🌍 Support for multiple languages (shown with flag icons)
🗣️ Various voice options and tones
🎨 Modern, responsive UI with intuitive controls
💾 PostgreSQL database for reliable history storage
🔄 Rerun capability with modified settings
🗑️ Easy deletion of history entries
📱 Mobile-friendly design with glass morphism effects

History Management

The application includes a history management system:

View all processed documents with timestamps
See processing settings for each entry
Rerun previous entries with new settings
Delete individual entries with confirmation
Clear visual feedback for selected entries
Automatic audio player updates when selecting entries

Prerequisites

Docker and Docker Compose (recommended), or for local development:
- Python 3.8 or higher
- Poppler (required for PDF processing)
- PostgreSQL 15 or higher

Both setups also need:
- Azure OpenAI API access
- An Azure OpenAI API key and endpoint

Quick Start with Docker (Recommended)

Clone the repository:

git clone <repository-url>
cd <repository-directory>

Set up environment variables: Create a file named keys.env in the project root:

AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_KEY=your_api_key_here

Configure model settings: If needed, edit config.py in the project root with the names of your Azure OpenAI model deployments:

# Model Deployments
AZURE_MODELS = {
    'text': os.getenv('AZURE_OPENAI_TEXT_DEPLOYMENT', 'gpt-4o'),
    'audio': os.getenv('AZURE_OPENAI_AUDIO_DEPLOYMENT', 'gpt-4o-audio-preview'),
}
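
Because each entry goes through `os.getenv`, a deployment name can be overridden from the environment without editing config.py. The following is a minimal sketch of that lookup, assuming only the variable names and defaults shown above. The `resolve_models` helper is hypothetical (it is not part of the project) and takes the environment as a parameter so it can be tried without changing real settings:

```python
import os

# Environment variable name and default for each deployment,
# mirroring the config.py snippet above.
DEFAULT_DEPLOYMENTS = {
    "text": ("AZURE_OPENAI_TEXT_DEPLOYMENT", "gpt-4o"),
    "audio": ("AZURE_OPENAI_AUDIO_DEPLOYMENT", "gpt-4o-audio-preview"),
}


def resolve_models(env=None):
    """Return deployment names, preferring environment overrides.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {
        kind: env.get(var, default)
        for kind, (var, default) in DEFAULT_DEPLOYMENTS.items()
    }


if __name__ == "__main__":
    # Shows which deployment names the current environment would select.
    print(resolve_models())
```

If your Azure deployments use different names, setting the two variables before starting the application should be enough; editing config.py is only needed if you want to change the defaults themselves.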

Start the application:

# Default setup (port 5001)
docker-compose up -d

# If port 5001 is in use, you can override it:
PORT=8080 docker-compose up -d

Access the application:
- Default: http://localhost:5001
- Or if you changed the port: http://localhost:YOUR_PORT

View logs:
```
docker-compose logs -f
```

Stop the application:

docker-compose down  # Keeps the database data
docker-compose down -v  # Removes all data (fresh start)

Local Development Setup

Install Poppler:
- macOS:
```
brew install poppler
```
- Ubuntu/Debian:
```
sudo apt-get install poppler-utils
```
- Windows:
  - Download a build from the poppler-windows project
  - Extract it and add its bin directory to PATH

Install PostgreSQL:
- Download it from the PostgreSQL Downloads page
- Create a database named aoai_audio

Create and activate virtual environment:

python -m venv venv

# Windows
.\venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```

Configure environment: Create keys.env with your Azure OpenAI credentials and database URL:

AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_KEY=your_api_key_here
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/aoai_audio
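
A missing variable or a leftover placeholder in keys.env is an easy mistake to make. The sketch below is one way to check the file before starting the application. It assumes keys.env contains plain KEY=VALUE lines as shown above; `parse_env_text` and `check_env` are hypothetical helpers, not part of the project, and the application itself may load the file differently:

```python
from pathlib import Path

# Variables the local setup above expects to find in keys.env.
REQUIRED = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "DATABASE_URL")
# Placeholder values copied from the example file.
PLACEHOLDERS = {"your_endpoint_here", "your_api_key_here"}


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values, required=REQUIRED):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    for key in required:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still has its placeholder value")
    return problems


if __name__ == "__main__":
    path = Path("keys.env")
    if not path.exists():
        print("keys.env not found in the current directory")
    else:
        found = check_env(parse_env_text(path.read_text()))
        for message in found or ["keys.env looks complete"]:
            print(message)
```

For the Docker setup, where keys.env holds only the two Azure variables, pass a shorter `required` tuple to `check_env`.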

Run the application:

python -m flask run --host=0.0.0.0 --port=5001

Troubleshooting

Port Already in Use

If port 5001 is already in use:

Docker setup:

# Use a different port (e.g., 8080)
PORT=8080 docker-compose up -d

Local setup:

# Use a different port (e.g., 8080)
python -m flask run --host=0.0.0.0 --port=8080
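
To find out whether a port is taken before picking a replacement, a small probe can help. This is a minimal sketch using only the standard library. The candidate ports and both function names are illustrative assumptions, and the probe only detects services that are actively accepting connections on the local machine:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # which means some service is already listening there.
        return sock.connect_ex((host, port)) != 0


def first_free_port(candidates=(5001, 8080, 8081)):
    """Return the first candidate port that looks free, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port()
    if port is None:
        print("No candidate port is free")
    else:
        print(f"Port {port} looks free")
```

The reported port can then be used as the `PORT` value for Docker or the `--port` argument for Flask.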

Database Issues

Ensure PostgreSQL is running.
Check the connection string (DATABASE_URL) in keys.env.
For Docker, run docker-compose logs db to check the database logs.
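
The first two checks can be partly automated. The sketch below splits a connection string into its parts and tests whether the database host accepts TCP connections. It uses only the standard library and assumes the URL format shown in the local setup. Both function names are hypothetical, and a successful TCP connection does not confirm that the credentials or database name are valid:

```python
import socket
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a PostgreSQL URL into its parts (the password is left out)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example URL from the local setup; substitute your own value.
    url = "postgresql://postgres:postgres@localhost:5432/aoai_audio"
    info = describe_database_url(url)
    print(info)
    print("reachable:", can_connect(info["host"], info["port"]))
```

If the host is unreachable, PostgreSQL is probably not running or is listening on a different port. If it is reachable but the application still fails, the user, password, or database name in the URL is the next thing to check.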

Best Practices

Environment Variables:
- Never commit keys.env (it's in .gitignore)
- Use strong passwords in production

Data Persistence:
- Database data persists in Docker volume postgres_data
- Backup regularly in production

Security:
- Change default PostgreSQL password in production
- Use SSL in production
- Keep dependencies updated

Monitoring:
- Check application logs: docker-compose logs -f web
- Check database logs: docker-compose logs -f db
