# ASR Module

## Install Dependencies

In [39]:
pip install --upgrade --no-cache-dir openai-whisper


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.
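The tokenizers warning printed above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the variable is set in an early cell, before any later cell forks a subprocess (the value `"false"` is a choice made here, not something the notebook specifies):

```python
import os

# Set this before any cell spawns a subprocess (for example `pip install`),
# so the tokenizers library has an explicit setting and skips the fork warning.
# "false" disables tokenizer parallelism; "true" is the other accepted value.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```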


In [40]:
pip install sounddevice transformers sentence-transformers faiss-cpu



Note: you may need to restart the kernel to use updated packages.


## ASR

In [41]:
import whisper
import faiss
import torch
import sounddevice as sd
import numpy as np
import wave
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer


In [42]:
# Load Whisper model (using 'small' for efficiency)
asr_model = whisper.load_model("small")

# Load embedding model for memory storage
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize FAISS index for vector storage
embedding_dim = embedding_model.get_sentence_embedding_dimension()  # 384 for all-MiniLM-L6-v2
index = faiss.IndexFlatL2(embedding_dim)


In [43]:
def record_audio(filename, duration=10, samplerate=16000):
    """
    Records audio from the microphone and saves it as a WAV file.
    Args:
        filename (str): Path to save the recorded audio.
        duration (int): Duration of recording in seconds.
        samplerate (int): Sampling rate for audio recording.
    """
    print("Recording...")
    audio_data = sd.rec(int(duration * samplerate), samplerate=samplerate, channels=1, dtype=np.int16)
    sd.wait()
    print("Recording finished.")
    
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(samplerate)
        wf.writeframes(audio_data.tobytes())
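Before passing a recording to the ASR model, it can help to confirm that the WAV file has the expected layout. The sketch below uses only the standard-library `wave` module; `wav_info` and the synthetic `demo.wav` file are hypothetical names introduced for illustration, and the demo file is one second of silence, not a real recording.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (channels, sample_width_bytes, samplerate, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), rate, frames / rate


# Demo on a synthetic file: one second of 16-bit mono silence at 16 kHz,
# written with the same header settings that record_audio uses.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

info = wav_info(demo_path)
print(info)  # (1, 2, 16000, 1.0)
```

Calling `wav_info("recorded_audio.wav")` after `record_audio` should report a duration close to the requested `duration`.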


In [44]:
def transcribe_audio(audio_path):
    """
    Transcribes speech from an audio file using Whisper ASR.
    Args:
        audio_path (str): Path to the audio file.
    Returns:
        str: Transcribed text.
    """
    result = asr_model.transcribe(audio_path)
    return result["text"]

def store_transcription_in_memory(text):
    """
    Converts transcribed text into an embedding and stores it in FAISS.
    Args:
        text (str): Transcribed speech text.
    """
    embedding = embedding_model.encode([text])
    embedding = np.array(embedding).astype('float32')
    index.add(embedding)  # Store in FAISS
    print("Stored in Memory Module:", text)
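Note that the FAISS index holds only vectors, so the transcribed text cannot be recovered from it later. One common pattern is to keep a list of texts in insertion order alongside the index, since a flat index numbers its rows sequentially. The sketch below illustrates that bookkeeping with a hypothetical `TextMemory` class; to stay dependency-free it replaces FAISS with a brute-force squared-L2 search over plain lists, and the 2-D vectors are toy values rather than MiniLM embeddings.

```python
class TextMemory:
    """Hypothetical helper: vectors plus the texts they came from."""

    def __init__(self):
        self.vectors = []  # stands in for the FAISS index
        self.texts = []    # texts[i] belongs to vectors[i]

    def add(self, vector, text):
        # Append both in lockstep so row positions stay aligned.
        self.vectors.append(list(vector))
        self.texts.append(text)

    def search(self, query, k=1):
        """Return up to k (text, squared_l2_distance) pairs, nearest first."""
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort()
        return [(self.texts[i], dist) for dist, i in scored[:k]]


memory = TextMemory()
memory.add([0.0, 1.0], "weekend plans")
memory.add([1.0, 0.0], "weather report")

nearest = memory.search([0.1, 0.9], k=1)
print(nearest[0][0])  # weekend plans
```

With the real index, the same idea would mean appending `text` to a list inside `store_transcription_in_memory` and using the row positions returned by `index.search` to look the texts up.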


In [46]:
# Test: record from the microphone, transcribe the recording, and store the result
audio_file = "recorded_audio.wav"
record_audio(audio_file)
transcription = transcribe_audio(audio_file)
if transcription:
    store_transcription_in_memory(transcription)


Recording...
Recording finished.



Stored in Memory Module: Do you have any plans for this weekend?你这周末有什么计划吗?
