### Speech-to-Text Conversion Using the Google API
#### Theory
Speech recognition is a technology that enables machines to convert human speech into text. It uses algorithms and models to analyze audio signals, extract features, and map them to corresponding linguistic representations. The process typically includes the following steps:

1. **Audio Signal Acquisition**: Capturing audio input using a microphone or other recording devices.
2. **Preprocessing**: Removing noise and normalizing the audio signal for better accuracy.
3. **Feature Extraction**: Extracting relevant features such as Mel-Frequency Cepstral Coefficients (MFCCs) or spectrograms from the audio signal.
4. **Recognition**: Using machine learning models, such as Hidden Markov Models (HMMs) or neural networks, to map features to text.
5. **Postprocessing**: Refining the output to improve readability and accuracy.
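
The preprocessing and feature-extraction steps above can be illustrated with a small standard-library sketch. It is not what the `speech_recognition` library or Google's service does internally. The helper names (`peak_normalize`, `pre_emphasis`, `frame_signal`, `log_energy`), the 0.97 pre-emphasis coefficient, and the 25 ms / 10 ms framing are assumptions chosen for illustration, and per-frame log energy stands in for richer features such as MFCCs.

```python
import math


def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]


def pre_emphasis(samples, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    rest = [samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))]
    return [samples[0]] + rest


def frame_signal(samples, frame_len, hop_len):
    """Split the signal into overlapping frames, dropping an incomplete tail."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop_len)]


def log_energy(frame, floor=1e-10):
    """Log of the summed squared amplitude, floored to avoid log(0)."""
    return math.log(max(sum(s * s for s in frame), floor))


# Synthetic input: 25 ms of silence followed by 100 ms of a 440 Hz tone.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 10)]
signal = [0.0] * 400 + tone

normalized = peak_normalize(signal)
emphasized = pre_emphasis(normalized)
# 400 samples = 25 ms frames, 160 samples = 10 ms hop at 16 kHz.
frames = frame_signal(emphasized, frame_len=400, hop_len=160)
energies = [log_energy(f) for f in frames]

print(f"{len(frames)} frames")
print(f"first frame log energy: {energies[0]:.2f}")
print(f"last frame log energy:  {energies[-1]:.2f}")
```

The silent first frame ends up at the energy floor while frames containing the tone score much higher, which is the same contrast that energy-based speech detection relies on.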

**In this lab, the `speech_recognition` library is used to implement speech-to-text functionality. The library provides tools for capturing audio, processing it, and recognizing speech through several backends, including Google Speech Recognition.**

#### Objectives
1. To understand the fundamentals of speech recognition and its workflow.
2. To implement a speech recognition system using the `speech_recognition` library in Python.
3. To capture audio input from a microphone and preprocess it for recognition.
4. To use the Google Speech Recognition API to convert speech to text.
5. To analyze and interpret the recognized text for further applications.

In [9]:
import speech_recognition as sr

In [10]:
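duration = 5  # recording length in seconds

recognizer = sr.Recognizer()
microphone = sr.Microphone()

print("Calibrating microphone for ambient noise...")
with microphone as mic:
    # Sample 2 seconds of background noise to set the energy threshold.
    recognizer.adjust_for_ambient_noise(mic, duration=2)
    # The threshold settings are used by recognizer.listen();
    # record() below captures a fixed-length clip regardless.
    recognizer.dynamic_energy_threshold = True
    recognizer.pause_threshold = 0.8

    print(f"Recording for {duration} secs...")
    audio_data = recognizer.record(mic, duration=duration)

print("Processing audio...")
try:
    # Sends the audio to Google's speech service; needs internet access.
    out = recognizer.recognize_google(audio_data, show_all=False)

    print("Recognized text:")
    print(out)
except sr.UnknownValueError:
    # Raised when the audio contains no intelligible speech.
    print("No speech detected.")
except sr.RequestError as err:
    # Raised when the service is unreachable or rejects the request.
    print(f"Could not reach the recognition service: {err}")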

ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave


Calibrating microphone for ambient noise...
Recording for 5 secs...
Processing audio...
Recognized text:
my favourite subject is digital image and speech processing
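
The transcript above comes back lowercase and unpunctuated, so some postprocessing is usually needed before it can be analyzed or reused. The following is a minimal sketch using only the standard library. The helper names (`tidy_transcript`, `word_counts`, `contains_keywords`) and the keyword list are hypothetical and are not part of `speech_recognition`.

```python
import re
from collections import Counter


def tidy_transcript(text):
    """Collapse whitespace, capitalize the first letter, add a final period."""
    cleaned = " ".join(text.split())
    if not cleaned:
        return ""
    cleaned = cleaned[0].upper() + cleaned[1:]
    if cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


def word_counts(text):
    """Count lowercase word occurrences in the transcript."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def contains_keywords(text, keywords):
    """Report, for each keyword, whether it appears as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {kw: kw.lower() in words for kw in keywords}


# The transcript printed by the recognition cell above.
recognized = "my favourite subject is digital image and speech processing"

print(tidy_transcript(recognized))
print(len(recognized.split()), "words")
print(contains_keywords(recognized, ["speech", "video"]))
```

Keyword spotting of this kind is one simple way to turn recognized text into an action, for example routing a spoken command, though real applications would typically need more robust matching.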
