https://victor-gp.github.io/figmatrix/
This is an interactive speech therapy application that helps children with speech difficulties practice pronouncing challenging sounds through tongue twisters and targeted sentences. It gives real-time feedback on pronunciation accuracy by comparing the spoken words against the expected text.
Features:
- Sentence Display: The application presents sentences and tongue twisters tailored to specific speech sounds that children struggle with.
- Interactive Practice: Speak each word into your device's microphone, with visual guidance showing which word to pronounce next.
- Space Bar Navigation: Press the space bar between words to move the "cursor" to the next word in the sequence.
- Real-time Validation: Your pronunciation of each word is instantly checked against the expected word using ElevenLabs speech-to-text.
- Instant Feedback: See whether your pronunciation matches the target word, helping you improve over time.
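The word-by-word flow described in the features (a cursor on the current word, a check of each spoken attempt, and the space bar to move on) can be sketched as a small state object. This is a minimal Python sketch, not the project's code: `PracticeSession`, `normalize`, and the exact comparison after lower-casing and trimming punctuation are all hypothetical choices made for illustration.

```python
import string
from typing import Dict, List, Optional

# Characters trimmed from both ends of a word before comparing (assumption).
_TRIM = string.punctuation + string.whitespace


def normalize(word: str) -> str:
    """Lower-case a word and trim surrounding punctuation and whitespace."""
    return word.strip(_TRIM).lower()


class PracticeSession:
    """Hypothetical word cursor over one practice sentence."""

    def __init__(self, sentence: str) -> None:
        # Target words in order, e.g. "She sells..." -> ["she", "sells", ...].
        self.words: List[str] = [normalize(w) for w in sentence.split() if normalize(w)]
        self.index = 0
        # Latest result for each word position that has been attempted.
        self.results: Dict[int, bool] = {}

    @property
    def finished(self) -> bool:
        return self.index >= len(self.words)

    @property
    def current_word(self) -> Optional[str]:
        """The word to pronounce next, or None once the sentence is done."""
        return None if self.finished else self.words[self.index]

    def check(self, transcript: str) -> bool:
        """Compare a transcribed attempt with the word under the cursor."""
        target = self.current_word
        if target is None:
            raise IndexError("no words left to practice")
        ok = normalize(transcript) == target
        self.results[self.index] = ok  # a retry overwrites the earlier result
        return ok

    def advance(self) -> None:
        """Move the cursor to the next word, as the space bar does in the UI."""
        if not self.finished:
            self.index += 1
```

For example, with `PracticeSession("She sells sea shells.")` the cursor starts on `"she"`; after `advance()` a transcript of `"shells"` fails the check because the target is `"sells"`. A fuzzier comparison, such as a similarity score with a threshold, could replace the exact match in `check`.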
This application is specifically designed for:
- Speech-language pathologists and therapists
- Children with speech articulation disorders
Prerequisites:
- Python 3.8+
- Node.js (for frontend development)
- An ElevenLabs account and API key
Installation:
- Clone the repository
- Install backend dependencies: `pip install -r requirements.txt`
- Install frontend dependencies: `cd frontend && npm install`
- Set up your ElevenLabs API key as described in `.env.example`
- Start the backend server: `python main.py`
- Start the frontend development server: `cd frontend && npm run dev`
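A backend usually reads such a key from the process environment or from a `.env` file at startup. The sketch below uses only the standard library and assumes the variable is named `ELEVENLABS_API_KEY` and lives in a `.env` file created from `.env.example`; both the name and the file layout are assumptions, and `read_dotenv` and `load_api_key` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Dict

# Assumed variable name; use whatever name .env.example actually defines.
KEY_NAME = "ELEVENLABS_API_KEY"


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes: KEY="abc" -> abc
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_api_key(env_file: str = ".env") -> str:
    """Return the API key, preferring the process environment over the .env file."""
    key = os.environ.get(KEY_NAME) or read_dotenv(Path(env_file)).get(KEY_NAME)
    if not key:
        raise RuntimeError(f"{KEY_NAME} is not set in the environment or in {env_file}")
    return key
```

Failing fast with a clear message at startup is easier to diagnose than an authentication error on the first transcription request.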
Usage:
- Open the application in your web browser
- Allow microphone access when prompted
- Select a tongue twister or sentence to practice
- Speak each word clearly into your microphone
- Press space bar to advance to the next word
- Receive instant feedback on your pronunciation accuracy
Tech stack:

Speech and audio:
- ElevenLabs Speech-to-Text API - Converts spoken audio to text for comparison
- Web Audio API - Captures microphone input from the browser

Backend:
- ElevenLabs Python SDK - Interface for speech-to-text conversion
- Python 3.8+ - Core programming language
- FastAPI - Modern, fast web framework for building APIs

Frontend:
- HTML5 - Structure and semantics
- CSS3 - Styling and responsive design
- JavaScript/TypeScript - Client-side interactivity
- Vite - Build tool and development server
- React - User interface components
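The Web Audio API delivers microphone audio as floating-point samples in the range -1.0 to 1.0, while the backend works with PCM audio. The sketch below shows one way to do that conversion in Python, assuming mono audio, 16-bit little-endian samples, and a 16 kHz sample rate; none of these settings are specified by the project, and the function names are hypothetical.

```python
import io
import struct
import wave
from typing import List, Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples
        ints.append(int(round(s * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_float(data: bytes) -> List[float]:
    """Decode 16-bit little-endian PCM back to floats in [-1.0, 1.0]."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    values = struct.unpack(f"<{count}h", data[: count * 2])
    # -32768 would map slightly below -1.0, so clamp it.
    return [max(-1.0, v / 32767) for v in values]


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap mono 16-bit PCM in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Wrapping the samples in WAV rather than sending raw PCM keeps the sample rate and sample width alongside the audio, so the receiving side does not have to guess them.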
Technical highlights:
- Real-time Speech Recognition: Uses ElevenLabs speech-to-text to transcribe each spoken word
- Pronunciation Comparison: String-similarity matching with a configurable threshold decides whether the transcription matches the expected word
- Progressive Word Navigation: Word-by-word progression controlled with the space bar
- Responsive Design: Works across desktop and mobile devices
- Audio Processing: Handles PCM audio encoding and decoding
- Error Handling: Handles errors that can occur during speech recognition
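The README does not name the string-matching algorithm, so the following is only one possible approach: it scores an attempt with the ratio from Python's standard `difflib.SequenceMatcher` and accepts it when the score reaches a configurable threshold. The function names and the default threshold of 0.8 are assumptions.

```python
import difflib
import string

# Characters trimmed from both ends before comparing (assumption).
_TRIM = string.punctuation + string.whitespace


def normalize(text: str) -> str:
    """Lower-case and trim surrounding punctuation and whitespace."""
    return text.strip(_TRIM).lower()


def similarity(spoken: str, expected: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between two words."""
    return difflib.SequenceMatcher(None, normalize(spoken), normalize(expected)).ratio()


def is_match(spoken: str, expected: str, threshold: float = 0.8) -> bool:
    """Accept the attempt when its similarity reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return similarity(spoken, expected) >= threshold
```

A character-level ratio is only a rough proxy for pronunciation: "shells" scores about 0.91 against "sells", so a lenient threshold would accept an s/sh substitution, while "fin" against "thin" scores about 0.57. The threshold therefore needs tuning for each target sound.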
Team:
- Afaf Driouech
- Daniele Pala
- Rahimakhan Abduqodirova
- Thao Phuong Pham
- Victor Gonzalez Prieto