Tone Twist 🎙✨

Craft your voice into unique characters with AI magic.

Overview

Tone Twist is a playful AI-powered web app: upload or record your voice, select a character persona, and hear your speech transformed into that character's voice and speaking style.

Under the hood, the app chains three services:

Google Cloud Speech-to-Text for accurate transcription
Gemini 2.0 Flash (Google AI) for rewriting the text in the chosen character's speaking style
ElevenLabs AI for ultra-realistic, expressive voice synthesis based on the persona
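To make the first stage concrete, here is a minimal sketch of a request to the Speech-to-Text v1 `speech:recognize` REST method. It is not taken from the app's source: the helper name `buildRecognizeRequest`, the `en-US` default, and the use of REST rather than a client library are assumptions. It only assembles the request; authentication is left to the caller.

```typescript
// Hypothetical helper (not from the Tone Twist source): assembles a
// Speech-to-Text v1 `speech:recognize` request. Assumes the caller has already
// converted the audio to 16 kHz mono LINEAR16 and base64-encoded it.
// Authentication (OAuth bearer token) is intentionally omitted.
interface RecognizeRequest {
  url: string;
  body: {
    config: {
      encoding: "LINEAR16";
      sampleRateHertz: number;
      audioChannelCount: number;
      languageCode: string;
    };
    audio: { content: string };
  };
}

const RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize";

function buildRecognizeRequest(
  audioBase64: string,
  languageCode = "en-US", // assumption: English-language input
): RecognizeRequest {
  if (audioBase64.length === 0) {
    throw new Error("audio content must not be empty");
  }
  return {
    url: RECOGNIZE_URL,
    body: {
      config: {
        encoding: "LINEAR16",
        sampleRateHertz: 16000, // matches the 16 kHz mono preprocessing in Notes
        audioChannelCount: 1,
        languageCode,
      },
      audio: { content: audioBase64 },
    },
  };
}

// Usage: POST JSON.stringify(req.body) to req.url with an Authorization header.
```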

Features

🎙 Upload or record your voice easily from the browser.
🎭 Choose your character (pirate, anime villain, robot assistant, and more).
✨ Transform your voice in three stages: Transcription → Text Enhancement → Voice Synthesis.
🔥 Live demo for each character with speaker previews.
🎧 Play and download your transformed voice instantly.
📈 Seamless progress tracking through interactive UI stages.
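The three-stage progress UI can be modeled as a small ordered state machine. This is a hypothetical sketch: the stage names, `nextStage`, and `progressPercent` are illustrative and not part of the project's code.

```typescript
// Hypothetical model of the progress UI; stage names are assumptions.
type Stage = "idle" | "transcribing" | "enhancing" | "synthesizing" | "done";

// Stages in the order the pipeline runs them.
const ORDER: readonly Stage[] = ["idle", "transcribing", "enhancing", "synthesizing", "done"];

function nextStage(current: Stage): Stage {
  const i = ORDER.indexOf(current);
  // "done" is terminal: advancing past it leaves the state unchanged.
  return i === ORDER.length - 1 ? current : ORDER[i + 1];
}

// Percentage for a progress bar: 0 at "idle", 100 at "done".
function progressPercent(current: Stage): number {
  return Math.round((ORDER.indexOf(current) / (ORDER.length - 1)) * 100);
}
```

Keeping the order in one array means the UI labels, the progress bar, and the "what runs next" logic cannot drift apart.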

Tech Stack

Next.js 14 (App Router + Server Actions)
Shadcn/UI for components
TailwindCSS for styling
Google Cloud Speech-to-Text API (Transcription)
Gemini 2.0 Flash API (Text enhancement by persona)
ElevenLabs Text-to-Speech API (Realistic voice generation)
Vercel for hosting
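As an illustration of the ElevenLabs step, the sketch below assembles a call to the `POST /v1/text-to-speech/{voice_id}` endpoint using the `xi-api-key` header. The helper name `buildTtsRequest`, the default `eleven_multilingual_v2` model, and the voice-setting values are assumptions, not the app's actual configuration.

```typescript
// Hypothetical helper: assembles (but does not send) an ElevenLabs TTS request.
interface TtsRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId = "eleven_multilingual_v2", // assumption: the README does not name a model
): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg", // request MP3 audio back
      },
      body: JSON.stringify({
        text,
        model_id: modelId,
        voice_settings: { stability: 0.5, similarity_boost: 0.75 }, // illustrative values
      }),
    },
  };
}

// Usage (server-side only, so the key never reaches the browser):
//   const { url, init } = buildTtsRequest(process.env.ELEVENLABS_API_KEY!, voiceId, text);
//   const mp3 = await (await fetch(url, init)).arrayBuffer();
```

Building the request in a plain function keeps the key on the server (for example inside a Server Action) and lets the payload be tested without network access.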

How It Works

Step 1: Add Your Voice
Upload an audio file (or record directly in-browser).
Step 2: Choose Your Character
Select a pre-made persona that defines the emotion, accent, and style.
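A persona could be represented as a small record whose fields feed the text-enhancement prompt. The `Persona` fields, the sample pirate values, the prompt wording, and the placeholder `voiceId` are hypothetical; only the request body shape follows the Gemini `generateContent` REST method (`contents` → `parts` → `text`).

```typescript
// Hypothetical persona shape: the README says a persona defines emotion, accent, and style.
interface Persona {
  id: string;
  name: string;
  emotion: string;
  accent: string;
  style: string;
  voiceId: string; // ElevenLabs voice for this persona (placeholder below)
}

// Illustrative sample values, not the app's real persona definition.
const PIRATE: Persona = {
  id: "pirate",
  name: "Pirate",
  emotion: "boisterous",
  accent: "West Country English",
  style: "nautical slang, exclamations like 'Arr!'",
  voiceId: "YOUR_VOICE_ID",
};

// Builds the instruction sent to the text-enhancement model.
function buildEnhancePrompt(persona: Persona, transcript: string): string {
  return [
    `Rewrite the following text as spoken by a ${persona.name}.`,
    `Emotion: ${persona.emotion}. Accent: ${persona.accent}. Style: ${persona.style}.`,
    "Keep the original meaning and return only the rewritten text.",
    "",
    `Text: ${transcript.trim()}`,
  ].join("\n");
}

// Request body for Gemini `generateContent`: one user turn containing the prompt.
function buildGeminiBody(prompt: string) {
  return { contents: [{ role: "user", parts: [{ text: prompt }] }] };
}
```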
Step 3: Start Cookin'
The system:
- Transcribes your speech into text
- Enhances the text based on the chosen character’s style
- Synthesizes natural-sounding audio using ElevenLabs voices
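The three steps can be chained in one async function. In this sketch (an assumption, not the app's actual code) the stage functions are injected, so the Speech-to-Text, Gemini, and ElevenLabs calls can be swapped for stubs in tests; `twistVoice` and the `onStage` callback are hypothetical names.

```typescript
// Hypothetical stage signatures; the real app's functions are not shown in this README.
type Transcribe = (audio: Uint8Array) => Promise<string>;
type Enhance = (text: string, personaId: string) => Promise<string>;
type Synthesize = (text: string, personaId: string) => Promise<Uint8Array>;

interface PipelineResult {
  transcript: string;
  enhancedText: string;
  audio: Uint8Array;
}

async function twistVoice(
  audio: Uint8Array,
  personaId: string,
  stages: { transcribe: Transcribe; enhance: Enhance; synthesize: Synthesize },
  // Called before each stage starts, e.g. to update the progress UI.
  onStage: (name: "transcribe" | "enhance" | "synthesize") => void = () => {},
): Promise<PipelineResult> {
  onStage("transcribe");
  const transcript = await stages.transcribe(audio);
  if (transcript.trim() === "") {
    // Stop early rather than paying for enhancement and synthesis of nothing.
    throw new Error("No speech detected in the input audio");
  }
  onStage("enhance");
  const enhancedText = await stages.enhance(transcript, personaId);
  onStage("synthesize");
  const output = await stages.synthesize(enhancedText, personaId);
  return { transcript, enhancedText, audio: output };
}
```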
Step 4: Enjoy Your Creation!
Listen, download, or share your newly twisted voice!

Installation & Setup (for development)

git clone https://github.com/MuditST/tonetwist.git
cd tonetwist
npm install

Create a .env.local file in the project root with:

GOOGLE_APPLICATION_CREDENTIALS=path/to/your-google-credentials.json
ELEVENLABS_API_KEY=your-elevenlabs-api-key
OPENAI_API_KEY=your-openai-or-gemini-api-key
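A fail-fast check at startup catches a missing key before the first request. This is a hypothetical helper (`readEnv` is not part of the project); it checks exactly the three variables listed above and takes the environment as a parameter so it can be tested.

```typescript
// Hypothetical startup check for the variables listed in .env.local.
const REQUIRED_ENV = [
  "GOOGLE_APPLICATION_CREDENTIALS",
  "ELEVENLABS_API_KEY",
  "OPENAI_API_KEY",
] as const;

type RequiredEnv = Record<(typeof REQUIRED_ENV)[number], string>;

function readEnv(env: Record<string, string | undefined>): RequiredEnv {
  // Treat unset and blank values the same way.
  const missing = REQUIRED_ENV.filter((k) => (env[k] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    GOOGLE_APPLICATION_CREDENTIALS: env.GOOGLE_APPLICATION_CREDENTIALS!,
    ELEVENLABS_API_KEY: env.ELEVENLABS_API_KEY!,
    OPENAI_API_KEY: env.OPENAI_API_KEY!,
  };
}

// Usage in a server-only module: const config = readEnv(process.env);
```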

Then run locally:

npm run dev

Notes

Maximum input audio length: 60 seconds (synchronous Google Speech-to-Text requests accept at most about one minute of audio)
Input audio is automatically downsampled to 16 kHz mono WAV for best transcription performance
User audio is processed securely and is not stored permanently
Project optimized for modern browsers (Chrome, Edge, Safari)
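The preprocessing described in these notes (downmix to mono, resample to 16 kHz, cap at 60 seconds) could look like the sketch below. It works on decoded Float32 samples, and linear interpolation is an assumption: the app's actual resampler is not documented, and a production resampler would low-pass filter before downsampling to avoid aliasing. WAV encoding of the result is omitted.

```typescript
// Sketch only: linear-interpolation resampling is an assumed technique,
// not necessarily what Tone Twist does.
const TARGET_RATE = 16000;
const MAX_SECONDS = 60;

// Averages all channels into one. Assumes every channel has the same length.
function toMono(channels: Float32Array[]): Float32Array {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const ch of channels) {
    for (let i = 0; i < length; i++) mono[i] += ch[i] / channels.length;
  }
  return mono;
}

// Resamples by linearly interpolating between the two nearest input samples.
function resampleLinear(input: Float32Array, fromRate: number, toRate = TARGET_RATE): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Rejects clips over the limit, then downmixes and resamples to 16 kHz.
function preprocess(channels: Float32Array[], sampleRate: number): Float32Array {
  if (channels.length === 0) throw new Error("no audio channels");
  const seconds = channels[0].length / sampleRate;
  if (seconds > MAX_SECONDS) {
    throw new Error(`Audio is ${seconds.toFixed(1)} s; the limit is ${MAX_SECONDS} s`);
  }
  return resampleLinear(toMono(channels), sampleRate);
}
```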

Future Enhancements

🔒 User authentication (Clerk integration)
📦 Cloud history for saving previous transformations
🌐 Public sharing links for generated audios
🧠 Personalized AI voice personas

License

This project is licensed under the MIT License.

Credits

Built with ❤️ by Mudit Tushir
Powered by OpenAI, Google Cloud, ElevenLabs, Vercel
