### Real-Time Speech Recognition System
#### Theory
Real-time speech recognition converts spoken language into text as the audio arrives, rather than after a recording is complete. The Vosk library is a popular choice for this task because it is lightweight, runs offline, and supports multiple languages. It uses pre-trained models to recognize speech patterns and convert them into text. The pipeline has three stages: capturing audio input, passing it through the model, and emitting text output.

Key components of the system include:
- **Audio Input Stream**: Captures real-time audio data using a microphone.
- **Vosk Model**: A pre-trained model that processes the audio data and performs speech-to-text conversion.
- **Recognizer**: A component that uses the model to analyze audio frames and generate results.
- **Queue**: A thread-safe FIFO buffer that hands audio blocks from the capture callback to the recognition loop.
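
How the queue connects the audio callback to the recognition loop can be sketched without a microphone. The example below is a minimal illustration, not part of the notebook: `fake_callback`, `producer`, and `consume` are hypothetical names, the audio blocks are dummy bytes, and the `None` sentinel is an assumption added so the demo can terminate.

```python
import queue
import threading

audio_queue = queue.Queue()  # thread-safe FIFO shared by both sides


def fake_callback(chunk: bytes) -> None:
    """Stand-in for the audio callback: enqueue the block and return quickly."""
    audio_queue.put(chunk)


def producer(n_chunks: int, chunk_size: int) -> None:
    """Simulate the audio thread delivering n_chunks blocks of dummy bytes."""
    for i in range(n_chunks):
        fake_callback(bytes([i % 256]) * chunk_size)
    audio_queue.put(None)  # sentinel: no more audio (demo only)


def consume() -> int:
    """Simulate the recognition loop: block on the queue until the sentinel."""
    total = 0
    while True:
        chunk = audio_queue.get()  # waits until a block is available
        if chunk is None:
            break
        total += len(chunk)
    return total


thread = threading.Thread(target=producer, args=(5, 320))
thread.start()
total_bytes = consume()
thread.join()
print(f"Consumed {total_bytes} bytes")
```

Keeping the callback limited to a single `put` matters because it runs on the audio thread; any slow work there risks dropped input.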

The system also provides partial results during processing, allowing for continuous feedback before the final recognition result is generated.
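
The difference between partial and final results shows up in the JSON payloads: the notebook's code reads a `partial` key from one and a `text` key from the other. The sketch below classifies a payload by those keys. `describe_result` is a hypothetical helper, and the two sample strings are hand-written illustrations of that shape, not captured recognizer output.

```python
import json
from typing import Tuple


def describe_result(raw_json: str) -> Tuple[str, str]:
    """Return (kind, text), where kind is "partial" or "final"."""
    payload = json.loads(raw_json)
    if "partial" in payload:
        return ("partial", payload["partial"])
    # Final results carry the recognized utterance under "text".
    return ("final", payload.get("text", ""))


# Hand-written sample payloads (assumed shape, for illustration only).
sample_partial = '{"partial": "hello wor"}'
sample_final = '{"text": "hello world"}'

print(describe_result(sample_partial))
print(describe_result(sample_final))
```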
#### Objectives
1. **Understand Real-Time Speech Recognition**: Learn how to implement a real-time speech recognition system using the Vosk library.
2. **Explore Audio Input Handling**: Understand how to capture and process audio data using the `sounddevice` library.
3. **Implement Speech-to-Text Conversion**: Use the Vosk model to convert spoken words into text in real-time.
4. **Handle Partial and Final Results**: Differentiate between partial and final recognition results and display them appropriately.
5. **Terminate on Command**: Implement functionality to stop the recognition process when a specific word (e.g., "exit") is spoken.
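
Objective 5 can be sketched as a small, separately testable helper. `contains_stop_word` is a hypothetical name, and matching on whole words (rather than on the entire utterance or on substrings) is a design assumption: it lets "please exit now" stop the loop while "exited" does not.

```python
def contains_stop_word(text: str, stop_word: str = "exit") -> bool:
    """True if stop_word appears as a whole word in the recognized text."""
    return stop_word.lower() in text.lower().split()


print(contains_stop_word("exit"))
print(contains_stop_word("please exit now"))
print(contains_stop_word("exited the room"))
```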

In [4]:
import queue
import sounddevice as sd
import vosk
import sys
import json

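# Path to an unpacked Vosk model directory.
MODEL_PATH = "model"
model = vosk.Model(MODEL_PATH)

samplerate = 16000  # Hz; sample rate of the captured audio, also passed to the recognizer
device = None       # None selects the default input device
q = queue.Queue()   # hands audio blocks from the callback to the main loop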

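def callback(indata, frames, time, status):
    """Called by sounddevice for each audio block, on a separate audio thread."""
    if status:
        # Report stream problems such as input overflow.
        print(status, file=sys.stderr)
    q.put(bytes(indata))  # copy the buffer and enqueue it for the main loop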

recognizer = vosk.KaldiRecognizer(model, samplerate)
print("Real-time Speech Recognition started. Say exit to exit.")
with sd.RawInputStream(samplerate=samplerate, blocksize=8000, device=device,
                       dtype="int16", channels=1, callback=callback):
    while True:
        data = q.get()  # blocks until the callback delivers audio
        if recognizer.AcceptWaveform(data):
            # End of utterance detected: fetch the final result.
            result = json.loads(recognizer.Result())
            print(f"\nRecognized: {result['text']}")
            # Stop when "exit" is among the recognized words.
            if "exit" in result["text"].split():
                break
        else:
            # Utterance still in progress: show the running hypothesis.
            partial = json.loads(recognizer.PartialResult())
            print(f"\rPartial: {partial['partial']}", end="", flush=True)

LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from model/graph/HCLr.fst model/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo model/graph/phones/word_boundary.int


Real-time Speech Recognition started. Say exit to exit.
Partial: this is real thing is is to cognitions his team also with continuous time listening
Recognized: this is real thing is is to cognitions his team also with continuous time listening
Partial: exit
Recognized: exit
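
The stream in the code cell above requests blocks of 8000 frames at 16000 Hz with `int16` mono samples. The short calculation below works out what that means per queue item; the constant names are illustrative, and 2 bytes per sample follows from the `int16` dtype.

```python
SAMPLE_RATE_HZ = 16000   # samplerate used in the cell above
BLOCK_FRAMES = 8000      # block size passed to the input stream
BYTES_PER_SAMPLE = 2     # int16 is 16 bits = 2 bytes
CHANNELS = 1             # mono

block_seconds = BLOCK_FRAMES / SAMPLE_RATE_HZ
block_bytes = BLOCK_FRAMES * BYTES_PER_SAMPLE * CHANNELS

print(f"Each block covers {block_seconds} s of audio")
print(f"Each queue item holds {block_bytes} bytes")
```

Each block therefore carries half a second of audio, so partial results can refresh at most about twice per second; a smaller block size would give more frequent updates at the cost of more callback invocations.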
