PolyLingo is an advanced multilingual AI voice chatbot designed to understand users, respond to them, and speak with them in multiple languages while maintaining emotional awareness and distinct personality tones. The system combines Natural Language Processing (NLP), Speech-to-Text (STT), Text-to-Speech (TTS), and Emotion AI into one seamless interactive assistant. It serves as an intelligent communication interface that analyzes language, detects sentiment, adapts its personality, and provides voice feedback dynamically based on the user's input.
PolyLingo acts as a real-time voice assistant that can listen to user speech, transcribe it into text, process it through language and emotion analysis, generate an intelligent context-based response, and finally speak it out loud in the same detected language. It supports multiple languages and personas, each persona defining a distinct conversational tone. For instance, a “friendly” persona replies warmly and casually, while a “professional” one responds formally and politely. It’s a full-stack application consisting of a FastAPI-based backend and a React-based frontend, integrated with Groq’s LLaMA model for fast and efficient NLP responses.
- Multilingual voice and text-based interaction
- Automatic language detection using langdetect and fasttext
- Emotion-aware response generation using HuggingFace emotion classification models
- Persona-based tone control (Friendly, Caring, Witty, Professional)
- Text-to-Speech synthesis in the detected language via the Web Speech API
- Voice input through Speech-to-Text conversion for natural interaction
- Real-time mood tracking and an XP system for user engagement
- Elegant, glassmorphic UI designed for a modern chatbot experience
- Modular architecture allowing separate testing of the NLP, persona, emotion, and UI modules
```bash
git clone https://github.com/tung-programming/PolyLingo
```
- Running the backend:
```bash
cd backend
python -m venv venv
venv\Scripts\Activate   # Windows; use `source venv/bin/activate` on macOS/Linux
pip install -r requirements.txt
pip install numpy==1.26.4
pip install openai
uvicorn main:app --reload
```
- Running the frontend:
```bash
cd frontend/polylingo-ui
npm install
npm run dev
```
| Layer | Technology | Description |
|---|---|---|
| Frontend | React + Vite | Interactive chat UI, microphone and speaker integration |
| Backend | FastAPI (Python) | Handles API routes, NLP pipeline, and response generation |
| AI Model | Groq (LLaMA 3.1 – 8B Instant) | Generates intelligent, context-aware replies |
| Emotion Analysis | HuggingFace Transformers | Classifies emotion from text |
| Language Detection | langdetect + fasttext | Automatically detects language |
| Speech | Web Speech API | STT + TTS integration in browsers |
| Styling | CSS (Glassmorphism) | Smooth, minimal dark theme design |
| Environment | Python 3.11, Node.js 18+ | Runs locally or deploys on cloud easily |
1. User Input: The user can either speak through a microphone or type a message.
2. STT Module: Speech-to-Text conversion using the Web Speech API.
3. NLP Pipeline: The backend receives the message and performs:
   - Language detection
   - Emotion classification
   - Persona tone mapping
   - Response generation via the Groq model
4. TTS Module: The backend's text output is sent back to the frontend, which plays it aloud using Text-to-Speech in the detected language.
5. UI Display: The dashboard shows both user and bot messages, along with emotion, persona, and XP updates.
This architecture ensures fluid multilingual interaction while keeping emotion and tone consistent throughout the conversation.
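The flow above can be sketched as a small pipeline function. The stage functions below are simplified stand-ins for the real modules in /backend/app/nlp, not the project's actual code:

```python
# Illustrative sketch of the PolyLingo request flow; each stage function
# is a stub standing in for the corresponding module in /backend/app/nlp.

def detect_language(text: str) -> str:
    # The real module uses langdetect/fasttext; stubbed here.
    return "fr" if text.lower().startswith("bonjour") else "en"

def classify_emotion(text: str) -> dict:
    # The real module uses a HuggingFace classifier; stubbed here.
    if "happy" in text.lower():
        return {"label": "joy", "confidence": 0.9}
    return {"label": "neutral", "confidence": 0.8}

def generate_reply(text: str, language: str, emotion: dict, persona: str) -> str:
    # The real module calls the Groq llama-3.1-8b-instant model; stubbed here.
    return f"[{persona}/{language}/{emotion['label']}] echoing: {text}"

def chat_pipeline(user_input: str, persona: str = "friendly") -> dict:
    """Chain detection -> emotion -> persona-aware generation into one response."""
    language = detect_language(user_input)
    emotion = classify_emotion(user_input)
    reply = generate_reply(user_input, language, emotion, persona)
    return {"success": True, "language": language, "emotion": emotion,
            "persona": persona, "reply": reply}

result = chat_pipeline("I am happy today")
```

The returned dict mirrors the response shape of the `/api/nlp/response` endpoint described later.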
The testing modules reside in the /backend/app/nlp directory and include several NLP and emotion modules that were tested independently before integration.
- Language Detection
The language_detector.py file automatically detects the language of the user’s input text. It uses langdetect and fasttext to identify over 20 languages, enabling multilingual response generation.
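For illustration only, here is a stripped-down, stdlib-only stand-in that tells a few scripts apart by Unicode ranges. It is not the project's actual detector, which relies on langdetect and fasttext for far broader and more accurate coverage:

```python
# Toy script-range detector illustrating the idea behind language_detector.py.
# The real module uses langdetect + fasttext; this sketch only distinguishes
# a handful of scripts and defaults to English for Latin text.

def detect_language(text: str) -> str:
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:   # Hiragana / Katakana
            return "ja"
        if 0x0900 <= code <= 0x097F:   # Devanagari
            return "hi"
        if 0xAC00 <= code <= 0xD7AF:   # Hangul syllables
            return "ko"
        if 0x0400 <= code <= 0x04FF:   # Cyrillic
            return "ru"
        if 0x4E00 <= code <= 0x9FFF:   # CJK ideographs
            return "zh"
    return "en"  # fallback for Latin-script input
```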
- Emotion Analysis
The emotion_analyzer.py file uses a HuggingFace transformer model (j-hartmann/emotion-english-distilroberta-base) to determine the user’s emotion, such as joy, sadness, anger, fear, or neutral.
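A keyword-based stand-in, mirroring only the output shape of the real HuggingFace classifier (the keyword lists and confidence values here are invented for illustration):

```python
# Keyword-based stand-in for emotion_analyzer.py. The real module runs
# j-hartmann/emotion-english-distilroberta-base; this sketch only imitates
# its {"label": ..., "confidence": ...} output shape.
EMOTION_KEYWORDS = {
    "joy": ["happy", "great", "amazing", "glad"],
    "sadness": ["sad", "unhappy", "depressed"],
    "anger": ["angry", "furious", "annoyed"],
    "fear": ["scared", "afraid", "worried"],
}

def analyze_emotion(text: str) -> dict:
    lowered = text.lower()
    for label, words in EMOTION_KEYWORDS.items():
        if any(w in lowered for w in words):
            return {"label": label, "confidence": 0.75}
    return {"label": "neutral", "confidence": 0.5}
```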
- Persona Engine
The persona_engine.py file adjusts the chatbot’s tone and response structure based on the selected persona. It defines how each personality communicates, adds appropriate emojis, and modulates style.
- Example personas include:
- Friendly: Cheerful and emoji-filled responses.
- Professional: Concise, polite, and formal tone.
- Caring: Empathetic, reassuring, and understanding replies.
- Witty: Playful, clever, and humorous tone.
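A minimal sketch of how such a persona table might look; the prefixes and emojis below are invented examples, not persona_engine.py's actual templates:

```python
# Simplified sketch of the persona_engine.py idea: each persona maps to a
# tone template and an optional emoji. Values are illustrative only.
PERSONAS = {
    "friendly":     {"prefix": "Hey!", "emoji": "😊"},
    "professional": {"prefix": "Certainly.", "emoji": ""},
    "caring":       {"prefix": "I hear you.", "emoji": "🤗"},
    "witty":        {"prefix": "Well, well...", "emoji": "😏"},
}

def apply_persona(reply: str, persona: str) -> str:
    """Wrap a generated reply in the selected persona's tone markers."""
    style = PERSONAS.get(persona, PERSONAS["friendly"])
    return f"{style['prefix']} {reply} {style['emoji']}".strip()
```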
- Mood Tracker
The mood_tracker.py file tracks the user’s XP and dominant emotional state. Each interaction grants XP, and once it crosses a certain threshold, the user “levels up,” simulating progress and engagement.
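The XP mechanic can be sketched as follows; the per-interaction XP and threshold values are assumptions, not the project's actual settings:

```python
# Minimal sketch of the mood_tracker.py idea: each interaction grants XP,
# and crossing the threshold raises the level. Constants are illustrative.
XP_PER_INTERACTION = 10
LEVEL_THRESHOLD = 50

class MoodTracker:
    def __init__(self):
        self.xp = 0
        self.level = 1
        self.emotions = []

    def record(self, emotion: str) -> bool:
        """Log one interaction; return True if the user leveled up."""
        self.emotions.append(emotion)
        self.xp += XP_PER_INTERACTION
        if self.xp >= LEVEL_THRESHOLD * self.level:
            self.level += 1
            return True
        return False

    def dominant_emotion(self) -> str:
        if not self.emotions:
            return "neutral"
        return max(set(self.emotions), key=self.emotions.count)
```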
- Response Generation
The response_generator.py file connects to the Groq API using the llama-3.1-8b-instant model. It takes the detected language, emotion, and persona as input and generates contextually relevant, emotionally aligned replies.
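The project's exact prompt isn't reproduced here; the sketch below shows one plausible way to assemble such a request. Groq exposes an OpenAI-compatible endpoint (presumably why the setup installs the `openai` package), but the system prompt and base_url usage below are assumptions:

```python
# Hypothetical sketch of how response_generator.py might assemble its request.
# The system prompt is invented; only the model name comes from the project docs.

def build_messages(user_input: str, language: str, emotion: str, persona: str) -> list:
    system = (
        f"You are a {persona} assistant. Reply in language '{language}'. "
        f"The user currently feels '{emotion}'; match that tone."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]

# The actual API call would look roughly like this (requires GROQ_API_KEY):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="...")
# resp = client.chat.completions.create(
#     model="llama-3.1-8b-instant",
#     messages=build_messages("I am happy today", "en", "joy", "friendly"),
# )
```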
You can test this module locally by running:
```bash
python -m app.nlp.response_generator
```
Sample output:
```json
{
  "user_input": "I am happy today",
  "language": "en",
  "persona": "friendly",
  "emotion": "joy",
  "reply": "That's amazing! I'm really glad you're feeling happy today 😊",
  "tts_hint": {"pitch": "+4%", "rate": "fast"}
}
```
The backend routes are managed through FastAPI under /app/routes/chat_pipeline.py.
Primary endpoint:
```
POST /api/nlp/response
```
Request example:
```json
{
  "user_input": "Bonjour, comment allez-vous?",
  "user_id": "guest",
  "persona": "friendly"
}
```
Response example:
```json
{
  "success": true,
  "language": "fr",
  "emotion": {"label": "neutral", "confidence": 0.91},
  "persona": "friendly",
  "reply": "Bonjour! 😊 Je vais très bien, merci. Comment allez-vous?",
  "tts_hint": {"pitch": "+2%", "rate": "normal"}
}
```
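From Python, a client could exercise this endpoint roughly as follows. The live call is commented out since it needs the backend running (uvicorn defaults to port 8000); the sketch instead builds the documented request body and parses the documented response shape:

```python
import json

# Build the request body documented above.
payload = json.dumps({
    "user_input": "Bonjour, comment allez-vous?",
    "user_id": "guest",
    "persona": "friendly",
})

# With the backend running (uvicorn's default is http://127.0.0.1:8000):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8000/api/nlp/response",
#     data=payload.encode(), headers={"Content-Type": "application/json"},
# )
# body = json.load(urllib.request.urlopen(req))

# Parsing the documented response shape:
body = json.loads('{"success": true, "language": "fr", '
                  '"emotion": {"label": "neutral", "confidence": 0.91}, '
                  '"persona": "friendly", "reply": "Bonjour!", '
                  '"tts_hint": {"pitch": "+2%", "rate": "normal"}}')
reply, lang = body["reply"], body["language"]
```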
This endpoint powers the entire chatbot flow, linking frontend user queries to NLP-generated responses.
Start the backend server with:
```bash
uvicorn main:app --reload
```
Disclaimer: Before running the command above, make sure the virtual environment is activated and that all dependencies were installed inside it. Your prompt should look similar to the following, depending on the name of your environment:
```
(venv) PS D:\Ai-verse\multichat\backend> uvicorn main:app --reload
```
The frontend interface, located in /frontend/polylingo-ui/, handles all user interaction, visual display, and voice features.
Main Components:
- Dashboard.jsx → Chat display and input logic.
- PersonaSelector.jsx → Dropdown for choosing persona type.
- VoiceRecorder.jsx → Microphone input using browser SpeechRecognition API.
- ResponsePlayer.jsx → Text-to-Speech playback for bot replies.
- api.js → Connects frontend to FastAPI endpoints.
- App.css → Complete UI design with smooth glassmorphism, shadows, and dark theme.
Speech Handling:
- Speech-to-Text: Captures user voice and converts it to text dynamically.
- Text-to-Speech: Speaks the bot’s response in the detected language, automatically switching voice engines for correct pronunciation (e.g., Japanese → Google 日本語 voice).
Command:
```bash
npm run dev
```
The frontend runs on http://localhost:5173.
PolyLingo supports a wide range of languages both in text and speech, including:
- English
- Hindi
- Japanese
- French
- Spanish
- Portuguese
- Chinese (Simplified & Traditional)
- Korean
- Russian
- Italian

PolyLingo automatically detects the input language and switches to an appropriate TTS voice (e.g., Japanese → ja-JP, Hindi → hi-IN).
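The language-to-voice switch amounts to a lookup from detected language code to a BCP-47 voice locale. The mapping below is an illustrative assumption about the frontend's internals (sketched in Python for consistency with the other examples, though the actual frontend is JavaScript):

```python
# Illustrative mapping from detected language codes to BCP-47 TTS locales,
# as used when selecting a Web Speech API voice. Entries are examples; the
# real frontend may cover different variants (e.g., pt-PT vs pt-BR).
TTS_LOCALES = {
    "en": "en-US", "hi": "hi-IN", "ja": "ja-JP", "fr": "fr-FR",
    "es": "es-ES", "pt": "pt-BR", "zh": "zh-CN", "ko": "ko-KR",
    "ru": "ru-RU", "it": "it-IT",
}

def tts_locale(lang_code: str, default: str = "en-US") -> str:
    """Pick a TTS voice locale for a detected language, with a safe fallback."""
    return TTS_LOCALES.get(lang_code, default)
```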
| Endpoint | Method | Purpose |
|---|---|---|
| /api/nlp/response | POST | Generates a language- and emotion-aware chatbot reply. |
| /api/stt/transcribe | POST | Converts user speech to text (browser handled). |
| /api/tts/speak | POST | Converts text to speech (browser handled). |
| /api/persona/list | GET | Retrieves all available persona options. |
1. The user types or speaks to the bot.
2. The system detects language and emotion.
3. Persona tone and style are applied.
4. The Groq NLP model generates the reply.
5. The message appears on-screen and is spoken aloud in the correct language.
6. The XP tracker updates based on user emotion and interaction frequency.
- Add long-term memory for context continuity.
- Integrate user-specific profiles and mood history.
- Deploy to the cloud (AWS/GCP) with HTTPS endpoints.
- Add emotion-driven voice modulation for realism.
- Optionally connect a 3D avatar for visual expressions.
PolyLingo is a fully functional multilingual AI chatbot that merges NLP, speech recognition, text-to-speech, and emotion analysis into a single unified interface. It understands language context, emotional state, and personality tones to deliver intelligent, human-like responses. With real-time voice interaction and a beautifully designed UI, PolyLingo stands as an advanced demonstration of multilingual and emotion-aware AI communication. It showcases how technology can bridge linguistic and emotional gaps, creating conversations that feel truly alive.