Technical Documentation
MeMa's Speech Recognition uses the SpeechRecognition library. The module at codebank/mema_speech_recognition wraps that library (imported as speech_recognition and aliased as sr) to capture audio from a microphone and convert it to text. It also uses other Python modules for handling C data types, creating context managers, and managing threads. Most of that extra machinery exists only to stop ALSA errors from spamming the terminal, so it can largely be ignored.
Integrating speech recognition into your program is easy thanks to the MeMa Page Framework, which automatically transcribes spoken content and supplies it through the callback function. This spares developers from interfacing directly with the speech recognition functions described below. More information on how to detect and handle speech recognition requests is available in the Mema Page Framework Wiki.
The functions supplied by the Speech Recognition class are described below. They should not be used directly; use the callback function on the MeMa page instance instead. They are documented here only in case someone needs to modify the internal workings of the speech recognition process.
⚠️ Please do not call any of these functions directly. Instead, use the callback function on your current page to enable speech recognition. (See Mema Page Framework Wiki)
listen(input_queue: Queue, stop: bool) -> None
Starts a thread that recognizes speech using the speech_recognition library. Recognized phrases are added to the input queue that was passed in.
recognize_speech_thread(input_queue: Queue, stop: bool) -> None
This function runs the recognize_speech_internal() function inside a context manager (noalsaerr) to suppress ALSA warnings. It continuously recognizes speech and adds the recognized phrases to the provided input_queue. It stops when the stop flag becomes True.
recognize_speech_internal() -> str|None
This function performs speech recognition using the Google Speech Recognition API. It listens on the microphone for speech and returns the recognized text as a string. If no speech is recognized or there is an error during the recognition process, it returns None.
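The three functions above follow a common producer pattern: a background thread repeatedly runs a recognizer and pushes results onto a queue. The sketch below illustrates that pattern and is not the module's actual code. It makes several assumptions: the stop flag is a threading.Event rather than a plain bool (a bool passed by value cannot be flipped by the caller afterwards), listen() returns the thread so callers can join it, the recognize parameter is a hypothetical hook for swapping in a fake recognizer, and the ALSA suppression is left out.

```python
import queue
import threading
from typing import Callable, Optional


def recognize_speech_internal() -> Optional[str]:
    """Listen once on the default microphone; return the text, or None on failure."""
    try:
        # Imported lazily so the sketch still loads when the library is absent.
        import speech_recognition as sr

        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        # Sends the captured audio to Google's speech recognition service.
        return recognizer.recognize_google(audio)
    except Exception:
        # Covers unintelligible audio, network errors, and missing audio backends.
        return None


def recognize_speech_thread(
    input_queue: queue.Queue,
    stop: threading.Event,  # assumption: an Event instead of the documented bool
    recognize: Callable[[], Optional[str]] = recognize_speech_internal,  # hypothetical hook
) -> None:
    """Keep recognizing phrases and queueing them until stop is set."""
    while not stop.is_set():
        phrase = recognize()
        if phrase is not None:
            input_queue.put(phrase)
        else:
            # Short pause so repeated failures do not become a busy loop.
            stop.wait(0.1)


def listen(
    input_queue: queue.Queue,
    stop: threading.Event,
    recognize: Callable[[], Optional[str]] = recognize_speech_internal,
) -> threading.Thread:
    """Start the recognition loop on a background thread and return that thread."""
    thread = threading.Thread(
        target=recognize_speech_thread,
        args=(input_queue, stop, recognize),
        daemon=True,
    )
    thread.start()
    return thread
```

A caller would create a queue.Queue and a threading.Event, call listen(), read phrases from the queue, and call stop.set() to end the loop. In MeMa itself this is handled by the page callback, so the sketch is only relevant when changing the internals.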
Speech Recognition ALSA Warnings
This file can look intimidating because of the long-winded (but working) code that stops ALSA from flooding stdout with warnings. That code is not integral to the program's function; it exists only to silence ALSA. If it is causing errors, this is the area to look into.
noalsaerr() -> None
This context manager switches the libasound.so error handler to the py_error_handler function (which is empty), essentially blocking the ALSA warnings that would otherwise be spammed in the terminal.
py_error_handler(filename, line, function, err, fmt) -> None
This empty error handler is used by the noalsaerr() context manager to block ALSA warnings.
ERROR_HANDLER_FUNC
This variable represents the C function type of the error handler used to handle ALSA errors.
ERROR_HANDLER_FUNC(py_error_handler)
This call wraps the py_error_handler function in the C function type, producing a callback that can be installed as the handler for ALSA errors.
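For readers who need to debug this area, here is a minimal sketch of how these four pieces usually fit together with ctypes. It is an illustration, not a copy of the module. The argument types given to CFUNCTYPE, the name c_error_handler, and the fallback to a no-op when libasound.so cannot be loaded are all assumptions. snd_lib_error_set_handler is the ALSA library function that installs a handler, and passing None to it restores the default.

```python
from contextlib import contextmanager
from ctypes import CFUNCTYPE, c_char_p, c_int, cdll

# C function type for the handler: returns nothing and takes
# (file name, line, function name, error code, format string).
# Assumption: this is the commonly used signature for this trick.
ERROR_HANDLER_FUNC = CFUNCTYPE(None, c_char_p, c_int, c_char_p, c_int, c_char_p)


def py_error_handler(filename, line, function, err, fmt):
    """Deliberately empty: ALSA messages are dropped instead of printed."""
    pass


# Keep a module-level reference so the callback is not garbage-collected
# while the C library still holds a pointer to it (name is hypothetical).
c_error_handler = ERROR_HANDLER_FUNC(py_error_handler)


@contextmanager
def noalsaerr():
    """Silence ALSA error output for the duration of the with-block."""
    try:
        asound = cdll.LoadLibrary("libasound.so")
    except OSError:
        # Assumption: on systems without ALSA the context manager does nothing.
        asound = None
    if asound is not None:
        asound.snd_lib_error_set_handler(c_error_handler)
    try:
        yield
    finally:
        if asound is not None:
            # Passing None (NULL) restores ALSA's default handler.
            asound.snd_lib_error_set_handler(None)
```

Code that opens the microphone would then sit inside a with noalsaerr(): block. If the library fails to load under the name libasound.so on a given system, the sketch silently skips suppression, which is one place to check when warnings reappear.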
MeMa's Facial Recognition uses the cv2 and face_recognition libraries. The facial-recognition code was built upon this example.
MeMa's Text to Speech uses the gTTS (Google Text To Speech) library. The module provides functions to generate speech from input text and play it aloud using the playsound library. The speech is played on a separate thread so that other parts of the program can keep running in the meantime.
speak(text: str) -> None
The speak() function provides a higher-level interface to the TTS capabilities. It converts the provided text into speech and plays it asynchronously on a separate thread using speak_thread(). It also checks for an empty input string.
speak_thread(text: str) -> None
This function generates speech from the provided input text using the gTTS library and plays it using playsound. It is intended to run in a separate thread so that generation and playback do not block the main program.
⚠️ This shouldn't be called directly; call the speak() function above instead.
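The relationship between speak() and speak_thread() can be sketched as follows. This is an illustration under assumptions, not the module's actual code: the temporary-file handling, the "en" language setting, the daemon thread, and the error message wording are all chosen for the example. The gTTS and playsound calls used are gTTS(text=..., lang=...).save(path) and playsound(path).

```python
import os
import tempfile
import threading


def speak_thread(text: str) -> None:
    """Generate speech for text with gTTS and play it; errors are printed, not raised."""
    path = None
    try:
        # Imported lazily so the sketch still loads when the libraries are absent.
        from gtts import gTTS
        from playsound import playsound

        # Reserve a temporary .mp3 path for the generated audio (assumption).
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            path = tmp.name
        gTTS(text=text, lang="en").save(path)  # needs an internet connection
        playsound(path)
    except Exception as exc:
        # A TTS failure should not take down the rest of the program.
        print(f"TTS failed: {exc}")
    finally:
        if path is not None and os.path.exists(path):
            os.remove(path)


def speak(text: str) -> None:
    """Speak text on a background thread; empty input is ignored."""
    if not text or not text.strip():
        return
    threading.Thread(target=speak_thread, args=(text,), daemon=True).start()
```

Because the thread is a daemon in this sketch, speech may be cut off if the program exits while audio is still playing. A non-daemon thread would avoid that at the cost of delaying shutdown.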
- The gTTS library requires an internet connection to convert text to speech, as it relies on Google's Text-to-Speech service. To make this work offline, switch to pyttsx3.
- The TTS playback is executed asynchronously using threads, allowing the rest of the program to continue running without waiting for speech playback to finish.
- The TTS module includes error handling for cases where the file generation or playback might fail. Any exceptions encountered during this process will be caught and printed, but the TTS functionality will not interrupt the rest of the program's execution.
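For the offline option mentioned in the notes above, a pyttsx3-based replacement might look like the sketch below. The function name speak_offline and its boolean return value are hypothetical and not part of the MeMa module. The pyttsx3 calls used are init(), say(), and runAndWait(). Unlike the gTTS version, this sketch blocks until speech has finished, so it would still need to be run on a thread to match the asynchronous behaviour described above.

```python
def speak_offline(text: str) -> bool:
    """Speak text with pyttsx3 (no internet needed); return True on success."""
    if not text or not text.strip():
        return False
    try:
        # Imported lazily so the sketch still loads when pyttsx3 is absent.
        import pyttsx3

        engine = pyttsx3.init()   # selects the platform's speech driver
        engine.say(text)          # queue the utterance
        engine.runAndWait()       # block until the queued speech has been spoken
        return True
    except Exception as exc:
        # Mirror the documented behaviour: report the error, do not raise.
        print(f"Offline TTS failed: {exc}")
        return False
```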